If the vote you rocked, your personal info can be grokked

Your voter data could be used against you. A foreign intelligence service that wished to identify the family members of deployed military personnel could do so by cross-referencing public voter record data and social media posts.

An employer who only wanted to hire employees with a specific political affiliation could do so by analyzing the primary ballot history of job applicants.

An identity fraud ring seeking to open credit accounts in the names of other people could identify voters whose mail has been returned (via voter file suspense indicators) to take over those addresses using bogus change-of-address requests.

These scenarios are possible thanks to the ability to link publicly available voter data to other data sets, according to Noah M. Kenney, founder of consultancy Digital 520.

"I picked two different counties that kind of represented opposite ends of the spectrum," Kenney told The Register in a phone interview. 

"In Texas, they hide a lot of information and then North Carolina makes a lot of it public in terms of the specific records. And what I was looking at specifically is if you go and merge this data set or link this data set with other data sets, how likely are you to be able to re-identify a person?"

More than 25 years ago, research by Latanya Sweeney, currently a professor at Harvard, demonstrated that most of the US population (87 percent) could likely be uniquely identified with just three seemingly anonymous data points – a five-digit ZIP code, gender, and date of birth.

Re-identification rates climb further when those data points are combined with other data sets. And recent research has shown that identifying people from seemingly anonymous data points becomes even easier with AI tools.
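To make Sweeney's point concrete, here is a minimal Python sketch that measures what share of records are unique on the (ZIP, gender, date of birth) combination. The records and the `unique_fraction` helper are hypothetical toy illustrations, not Sweeney's data or method. Any row whose combination appears only once can, in principle, be linked to an outside dataset that carries the same three fields.

```python
from collections import Counter

# Hypothetical toy records: (zip5, gender, date_of_birth). Not real people.
records = [
    ("78704", "F", "1985-03-14"),
    ("78704", "M", "1985-03-14"),
    ("78704", "F", "1990-07-01"),
    ("78704", "F", "1990-07-01"),
    ("28358", "M", "1972-11-30"),
]


def unique_fraction(rows):
    """Return the share of rows whose quasi-identifier tuple appears exactly once."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


print(f"{unique_fraction(records):.0%} unique on (ZIP, gender, DOB)")  # prints: 60% unique on (ZIP, gender, DOB)
```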

In a research paper titled "Public Voting Records: A Record, or an Attack Surface?", Kenney describes how he analyzed public records from Travis County, Texas, and Robeson County, North Carolina, to show that the adversarial scenarios cited above are practical with public data.

The Texas file provides fewer data points than the North Carolina file, but the research suggests redaction doesn't make much of a difference in the re-identification scenarios evaluated.

With the less detailed Texas info, Kenney was able to use a Python script to link the voter records to other public records like the Federal Election Commission's individual-contribution data.
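An exact-key join of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Kenney's actual script: the toy rows, the field names, and the `TRIVIAL_EMPLOYERS` set are hypothetical and do not reflect the real FEC or Travis County schemas. It de-duplicates contributors on an exact (last name, first name, ZIP) key, then counts how many match any voter record, how many match exactly one, and how many matches carry a non-trivial employer.

```python
from collections import Counter

# Hypothetical toy rows; field names are illustrative, not the real FEC or voter-file schemas.
contributions = [
    {"last": "GARCIA", "first": "ANA", "zip": "78704", "employer": "ACME CORP"},
    {"last": "GARCIA", "first": "ANA", "zip": "78704", "employer": "ACME CORP"},  # repeat donation
    {"last": "SMITH", "first": "JOHN", "zip": "78704", "employer": "NONE"},
    {"last": "NGUYEN", "first": "LINH", "zip": "78704", "employer": "SELF-EMPLOYED"},
]
voters = [
    {"last": "GARCIA", "first": "ANA", "zip": "78704", "voter_id": "V1"},
    {"last": "SMITH", "first": "JOHN", "zip": "78704", "voter_id": "V2"},
    {"last": "SMITH", "first": "JOHN", "zip": "78704", "voter_id": "V3"},
]

# Hypothetical rule for a "trivial" employer value; the paper's exact rule may differ.
TRIVIAL_EMPLOYERS = {"", "NONE", "N/A", "NOT EMPLOYED"}


def key(row):
    # Exact match only: no fuzzy matching, nickname normalization, or suffix handling.
    return (row["last"], row["first"], row["zip"])


# 1. De-duplicate contributors on the join key (later rows overwrite earlier ones).
contributors = {key(r): r for r in contributions}

# 2. Count voter records per key to separate unique matches from ambiguous ones.
voter_counts = Counter(key(v) for v in voters)

matched = [k for k in contributors if voter_counts[k] >= 1]
unique = [k for k in contributors if voter_counts[k] == 1]
with_employer = [
    k for k in matched
    if contributors[k]["employer"].strip().upper() not in TRIVIAL_EMPLOYERS
]

n = len(contributors)
print(f"{n} unique contributors")
print(f"{len(matched)} matched any voter ({len(matched) / n:.2%})")
print(f"{len(unique)} matched a uniquely identifiable voter ({len(unique) / n:.2%})")
print(f"{len(with_employer) / len(matched):.1%} of matches have a non-trivial employer")
```

Counting matches per key, rather than stopping at the first hit, is what separates "matched any voter" from "matched a uniquely identifiable voter": two voters sharing a name and ZIP make the match ambiguous.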

"We pulled 500 contribution records for ZIP 78704 (an Austin-core ZIP including South Congress and Travis Heights neighborhoods) from the 2024 cycle via the FEC OpenAPI on May 1, 2026," he explains in his paper. 

"We de-duplicated to 181 unique contributors by exact match on (last name, first name, ZIP), and inner-joined to the voter file on the same key, no fuzzy matching, no nickname normalization, no suffix handling. Of the 181 contributors, 105 (58.01 percent) matched any voter record and 95 (52.49 percent) matched a uniquely-identifiable voter. Of the 105 matches, 74.3 percent had a non-trivial employer field in FEC."

That 52 percent unique-match rate between voter rolls and FEC data would be more like 90–95 percent, Kenney said, using the kinds of tools commercial data brokers employ.
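Kenney's join deliberately skipped fuzzy matching, nickname normalization, and suffix handling. As a rough illustration of what that kind of cleanup involves, here is a hypothetical Python sketch that builds a looser name key. The tiny `NICKNAMES` table and the `normalize_name` helper are assumptions for illustration only, not a description of any data broker's actual tooling.

```python
import re

# Hypothetical, deliberately tiny lookup tables; real tools use far larger curated lists.
NICKNAMES = {"BOB": "ROBERT", "BOBBY": "ROBERT", "BILL": "WILLIAM", "LIZ": "ELIZABETH"}
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def normalize_name(last, first):
    """Build a looser (last, first) key: uppercase, drop punctuation and suffixes, map nicknames.

    Note: the [^A-Z ] filter also drops non-ASCII letters such as accented characters.
    """
    last_tokens = [
        t for t in re.sub(r"[^A-Z ]", "", last.upper()).split() if t not in SUFFIXES
    ]
    first_tokens = re.sub(r"[^A-Z ]", "", first.upper()).split()
    first_name = first_tokens[0] if first_tokens else ""
    return (" ".join(last_tokens), NICKNAMES.get(first_name, first_name))


# An exact-match join would miss this pair; the normalized keys line up.
print(normalize_name("Smith Jr.", "Bob") == normalize_name("SMITH", "Robert"))  # True
```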

The North Carolina voter dataset includes a phone number for the majority of voters. According to the paper, 88.53 percent of voters with a listed phone number have a number that is unique within the county. As a result, external datasets containing phone numbers can be joined to the voter file using that field as a key, with a similar share of matches narrowing down to a single likely individual.
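Measuring that uniqueness is straightforward. A minimal sketch, assuming a hypothetical list of (voter ID, phone) pairs: it computes the share of listed numbers that appear only once, then builds a lookup from those unique numbers to a single voter, which is what makes a phone-keyed join effective. The `phone_uniqueness` helper and the data are illustrative, not the paper's code.

```python
from collections import Counter

# Hypothetical toy rows: (voter_id, phone); None means no number on file.
# 555-01xx numbers are reserved for fictional use.
voters = [
    ("V1", "910-555-0101"),
    ("V2", "910-555-0101"),  # shared household line
    ("V3", "910-555-0102"),
    ("V4", None),
    ("V5", "910-555-0103"),
]


def phone_uniqueness(rows):
    """Share of voters with a listed phone whose number appears only once."""
    phones = [phone for _, phone in rows if phone]
    if not phones:
        return 0.0
    counts = Counter(phones)
    return sum(1 for phone in phones if counts[phone] == 1) / len(phones)


# Unique numbers act as a join key: an outside record with that phone resolves to one voter.
counts = Counter(phone for _, phone in voters if phone)
phone_to_voter = {phone: vid for vid, phone in voters if phone and counts[phone] == 1}

print(phone_uniqueness(voters))  # 0.5
print(phone_to_voter.get("910-555-0102"))  # V3
```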

Among the report's other findings: 

There's currently no comprehensive federal privacy law. While many states have privacy rules, there's a lot of variation.

"Even within a specific state, most of the counties are individually handling these public records requests, so they all handle them differently across the country," said Kenney. 

"Some of them, you can't get them. Some of them, you need an ID to get them. Some of them you have to go through a request process for public records or you have to pay for them. The two counties I used are both freely available. You can go and download zip files of them without even putting in an email address or your name from anywhere in the world."

Kenney said he believes access controls are a better answer than redacting individual data fields, pointing to his finding that redaction doesn't necessarily protect against privacy harms. He recommends measures such as rate limits on bulk file requests, identity verification, state ID requirements, audit logs of requests, and a ban on commercial resale of these records, which are often used by data brokers.
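As a hypothetical sketch of how a records office might combine several of these controls, the Python class below gates bulk downloads on identity verification, enforces a per-requester limit within a sliding time window, and records every attempt in an audit log. `BulkDownloadGate` and its parameters are invented for illustration; they do not describe any county's actual system.

```python
import time
from collections import defaultdict, deque


class BulkDownloadGate:
    """Hypothetical gate: verified requesters only, a per-requester rate limit, and an audit log."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.history = defaultdict(deque)  # requester_id -> timestamps of granted downloads
        self.audit_log = []  # (timestamp, requester_id, file_name, granted, reason)

    def request(self, requester_id, id_verified, file_name):
        now = self.clock()
        recent = self.history[requester_id]
        # Discard granted requests that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if not id_verified:
            granted, reason = False, "identity not verified"
        elif len(recent) >= self.max_requests:
            granted, reason = False, "rate limit exceeded"
        else:
            granted, reason = True, "ok"
            recent.append(now)
        # Every attempt is logged, whether or not it was granted.
        self.audit_log.append((now, requester_id, file_name, granted, reason))
        return granted


# Usage: allow two bulk downloads per verified requester per day.
gate = BulkDownloadGate(max_requests=2, window_seconds=24 * 60 * 60)
print(gate.request("requester-1", id_verified=True, file_name="voters.zip"))  # True
print(gate.request("requester-2", id_verified=False, file_name="voters.zip"))  # False
```

Logging denied attempts as well as granted ones is a deliberate choice here: repeated refusals are themselves a useful signal of scraping.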

Beyond specific fixes based on his findings – Texas should generalize voter registration dates to a year rather than a day and armed forces mailing codes should be excluded from voter rolls – Kenney argues that people should be allowed to opt out of inclusion in public data sets and that general data privacy protections would be helpful.
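Generalizing a date helps because it collapses distinct values into shared ones. In this minimal Python sketch on hypothetical toy rows of (ZIP, registration date), every row is unique when the full date is kept, and none is unique once the date is cut to the year. Real voter files would show a smaller, data-dependent effect; the `to_year` and `unique_share` helpers are illustrative only.

```python
from collections import Counter

# Hypothetical toy rows: (zip5, registration_date). Not real voters.
rows = [
    ("78704", "2012-10-09"),
    ("78704", "2012-03-22"),
    ("78704", "2016-10-11"),
    ("78704", "2016-02-05"),
]


def to_year(row):
    """Generalize the registration date from YYYY-MM-DD to YYYY."""
    zip5, reg_date = row
    return (zip5, reg_date[:4])


def unique_share(records):
    """Share of records whose tuple appears exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


before = unique_share(rows)
after = unique_share([to_year(r) for r in rows])
print(f"unique before: {before:.0%}, after: {after:.0%}")  # unique before: 100%, after: 0%
```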

Last week, House Republicans introduced the Secure Data Act in an effort to create federal privacy rules. But Kenney says it's significantly weaker than many state regulations, and he doesn't expect it to pass.

"The industry consensus is that the likelihood of it passing is extremely low, at least in its current form," he said. "This represents the third attempt to pass comprehensive data privacy in recent years, most recent being the American Data Privacy and Protection Act, which failed to pass." ®

Source: The Register
