Postprocessing, Consistency, and the Challenges of Negative Numbers

There are two core components to the 2020 Disclosure Avoidance System (DAS): noise injection and postprocessing.

Step 1: Injecting Noise

To protect privacy, the DAS injects a small amount of noise into every statistic that it produces from the confidential data. The amount of noise for each statistic is randomly drawn from a distribution centered on zero. Values close to zero (e.g., 0, +1, or -2) are the most typical. Larger values are possible, but they become less likely the farther they are from zero.

The amount of privacy-loss budget expended on the calculation determines how likely the DAS is to draw a noise value close to (or far from) zero. The DAS then adds that value, which may be positive or negative, to the statistic and reports the result as the answer to the query.

If the true answer to a query (e.g., how many people in this block are 43 years old and female?) is 1 person, the DAS might return the answer as -1 person. Viewing just the noisy output, you can't distinguish whether the true answer was 0 (high probability), 1 (relatively high probability), or larger (less likely), so the system protects the privacy of that lone individual.
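The noise-injection step can be sketched in a few lines of Python. This is only an illustration, not the DAS implementation: it assumes a two-sided geometric (discrete Laplace) distribution, and the names `two_sided_geometric`, `noisy_count`, and the parameter `epsilon` (standing in for the privacy-loss budget spent on one query) are hypothetical.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Draw integer noise k with probability proportional to exp(-epsilon * |k|).

    Assumption: this distribution is chosen for illustration only.
    A larger epsilon (more budget spent) concentrates the noise near zero.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    alpha = math.exp(-epsilon)

    def geometric():
        # Inverse-CDF draw of a geometric variable on 0, 1, 2, ...
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return int(math.floor(math.log(u) / math.log(alpha)))

    # The difference of two independent geometric draws is symmetric about zero.
    return geometric() - geometric()


def noisy_count(true_count, epsilon, rng):
    """Return the true count plus one draw of integer noise."""
    return true_count + two_sided_geometric(epsilon, rng)


rng = random.Random(2020)
# A block with exactly one 43-year-old woman, queried five times.
print([noisy_count(1, epsilon=0.5, rng=rng) for _ in range(5)])
```

Because the noise is symmetric about zero, the average of many such noisy answers approaches the true count, yet any single answer may be negative.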

These noisy tabulations, with possible fractional or negative results, have high statistical value, as the noise added to protect privacy is drawn from an unbiased distribution, preserving their statistical validity. Sophisticated data users can even build the underlying noise distributions into their models to improve the accuracy of their analyses.

However, presenting official statistics in this way would be confusing for most census data users. The Census Bureau has always produced counts or estimates that reflect whole numbers of people, not “partial” or “negative” people, and the numbers of people with different attributes (e.g., males and females) in a geography have always added up to the total population of that geography.

 

Step 2: Postprocessing the Noisy Statistics

To meet these expectations, after the DAS adds noise to the data it performs another step called “postprocessing.” The DAS processes the statistics one geographic level at a time, starting at the national level and working down the geographic hierarchy, or “spine,” to census blocks. At each level, the DAS processes all of the units at that level at once, to ensure that they properly add up to the geographic unit above. For example, the DAS examines the noisy counts for all blocks within a block group, then finds the set of whole-number, non-negative block counts that stays as close as possible to the noisy counts while adding up to the total population of the block group.
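The block-group example above can be illustrated with a toy sketch. It is not the DAS's actual optimization: it assumes a simple least-squares projection onto non-negative values with a fixed sum, followed by largest-remainder rounding, and the function name `fit_to_total` is hypothetical.

```python
import math


def fit_to_total(noisy, total):
    """Return non-negative integers that sum to `total` and stay near `noisy`.

    Assumption: "closest" is measured by squared distance; this is a toy
    stand-in for whatever constrained optimization a production system uses.
    """
    if total < 0 or (not noisy and total > 0):
        raise ValueError("total must be non-negative and reachable")
    if total == 0:
        return [0] * len(noisy)

    # Step 1: find the shift `theta` such that clipping (value - theta) at zero
    # yields real numbers summing to `total` (projection onto a scaled simplex).
    theta, running = 0.0, 0.0
    for j, value in enumerate(sorted(noisy, reverse=True), start=1):
        running += value
        candidate = (running - total) / j
        if value - candidate > 0:
            theta = candidate
    real = [max(v - theta, 0.0) for v in noisy]

    # Step 2: round down, then hand the leftover people to the entries
    # with the largest fractional parts so the sum is exactly `total`.
    counts = [math.floor(x) for x in real]
    leftover = total - sum(counts)
    by_fraction = sorted(range(len(real)),
                         key=lambda i: real[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Four blocks with noisy counts, inside a block group fixed at 7 people.
print(fit_to_total([3, -1, 0, 6], total=7))  # [2, 0, 0, 5]
```

Note that the block with a noisy count of -1 is raised to 0, and the two larger blocks each give up one person to keep the block-group total fixed.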

Because this postprocessing step makes additional changes to the already noisy statistics, it also introduces error. As was clear with the early version of the DAS used to produce the 2010 Demonstration Data Products, postprocessing can sometimes produce distortions with significant biases.

Most notably, that version of the DAS added population to small-area geographies and, because the state-level counts are invariant, removed population from larger areas. The relative impact of the distortions was more noticeable for small-area geographies and for populations whose characteristics are less common.
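One plausible contributor to this pattern is the requirement that published counts cannot be negative. The sketch below computes, exactly, the average result of forcing a noisy count up to zero. The five-point noise distribution `NOISE` and the function `expected_after_clipping` are made up for illustration and are not the DAS's distribution or method.

```python
from fractions import Fraction

# Assumption: a small symmetric noise distribution, centered on zero.
NOISE = {
    -2: Fraction(1, 10),
    -1: Fraction(2, 10),
    0: Fraction(4, 10),
    1: Fraction(2, 10),
    2: Fraction(1, 10),
}


def expected_after_clipping(true_count):
    """Average published value when negative noisy counts are raised to zero."""
    return sum(p * max(0, true_count + k) for k, p in NOISE.items())


for true_count in (0, 1, 50):
    print(true_count, expected_after_clipping(true_count))
# 0 2/5
# 1 11/10
# 50 50
```

Under these assumptions, a true count of 0 is published as 0.4 on average and a true count of 1 as 1.1, while a count of 50 is unaffected. When an upper-level total is held fixed, that upward drift in small counts has to be offset by reductions elsewhere.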

 

Addressing Postprocessing Distortion in the DAS

The distortions caused by postprocessing in the early DAS significantly exceeded those caused by the privacy-protecting algorithm behind the DAS. 

Correcting this problem has been a top priority of the Census Bureau's DAS development team. In March, the team developed a "multipass" method of applying the DAS's "TopDown" algorithm in postprocessing. Early results are promising. 

We'll be sharing more about the multipass method and results of ongoing analyses of iterative DAS progress through our blogs and this newsletter. 

Did You Know? 

Did you know that from the 1990 through 2010 decennial censuses, the Census Bureau protected privacy by injecting noise using a “record swapping” method? Processing rules would identify households most at risk of being re-identified and would swap their records with those of a household in a different geographic location.

The Census Bureau’s recent research has determined that, in the age of Big Data and artificial intelligence, this method is no longer sufficient to protect respondent identity.

 

Have Suggestions?

Do you have specific questions you'd like us to answer in this newsletter, or topics you'd like discussed? Send us an email at 2020DAS@census.gov and let us know!

About Disclosure Avoidance Modernization

The Census Bureau is protecting 2020 Census data products with a powerful new cryptography-based disclosure avoidance system known as “differential privacy.” We are committed to producing 2020 Census data products that are of the same high quality you've come to expect while protecting respondent confidentiality from emerging privacy threats in today’s digital world.

 
