New Privacy-Protected Microdata Files

Protecting Your Privacy and Security

To keep our data users informed of our progress on the Disclosure Avoidance System (DAS) for the 2020 Census, we have released a new set of demonstration data for evaluation. The 2010 Demonstration Privacy-Protected Microdata Files 2020-09-17 (PPMFs) were generated by running the 2010 Census data through an updated version of the DAS that incorporates system improvements we have been developing and testing over the last several months.

Updates to American Indian and Alaska Native Data Processing and Post-Processing Algorithms

One of the most notable system improvements reflected in the new PPMFs is a change in how the DAS processes data for American Indian and Alaska Native (AIAN) tribal areas. Previously, the DAS performed all of the post-processing along the standard geographic hierarchy. This led to notable problems for key “off-spine” geographies like the AIAN tribal areas. The new PPMFs reflect system design changes to address this issue. We have implemented a new geographic hierarchy specifically for AIAN tribal areas within each state and have added a new state-level population invariant for AIAN tribal areas.

The new files also reflect a number of additional improvements to the DAS post-processing algorithms, which are designed to reduce distortions that our data users have identified in our earlier demonstration data products.

Improvements in Population Count Accuracy and Fitness-for-Use

Together, these changes have yielded substantial improvement to the accuracy and fitness-for-use of the resulting statistics. For example, in the 2010 Demonstration Data Products released in October 2019, the total population count for the average county was off by approximately 82 people. In the May 27 PPMF, that error was reduced to 16 people. The new PPMFs have further reduced that average error to 6.7 people. Similarly, average error in total population counts for federal American Indian Reservations and Off-Reservation Trust Lands has been reduced to 6.5 people (down from an error of 32.6 people in the May 27 PPMF).
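For readers who want to reproduce this kind of figure from their own tabulations, here is a minimal sketch of one way an "average error" can be computed. It assumes the metric is a mean absolute error across geographic units, which this newsletter does not state explicitly. The function name and the county counts are hypothetical illustrations, not Census data.

```python
def mean_absolute_error(published, protected):
    """Return the mean of |protected - published| across geographic units."""
    if len(published) != len(protected) or not published:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(p - q) for p, q in zip(protected, published))
    return total / len(published)


# Hypothetical total-population counts for four counties (not real data).
published_counts = [1200, 56000, 830, 240000]   # as originally published
protected_counts = [1194, 56009, 835, 239993]   # as tabulated from a PPMF

mae = mean_absolute_error(published_counts, protected_counts)
print(f"Mean absolute error: {mae:.2f} people")
```

Absolute differences are used so that overcounts and undercounts do not cancel each other out in the average.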

These system improvements reflect our primary goal to maximize the accuracy of population counts (total population counts and counts by specific characteristics), rather than accuracy of percentages or rates. While the overall accuracy of these counts is roughly comparable across all levels of geography, the amount of noise necessary to protect privacy may be relatively larger (in percentage terms) at lower levels of geography with smaller total populations.
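The difference between count accuracy and percentage accuracy can be illustrated with a small sketch. It applies the same absolute error to two geographies of different size. The populations and the helper name are hypothetical, and the 6.7-person error is borrowed from the county average above purely for illustration.

```python
def relative_error_pct(abs_error, population):
    """Express an absolute count error as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * abs_error / population


abs_error = 6.7  # illustrative absolute error, in people

# Hypothetical populations: a large county and a small block group.
for label, population in [("large county", 100_000), ("small block group", 500)]:
    pct = relative_error_pct(abs_error, population)
    print(f"{label}: {abs_error} people is {pct:.4f}% of {population}")
```

The count error is identical in both cases, but it is a much larger share of the smaller population, which is why percentages and rates for small areas can look noisier than the counts themselves.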

Updated Metrics and Tabulations to Follow

To help our data users evaluate the impact of these design improvements on fitness-for-use for their specific uses of census data, we will be releasing new Detailed Summary Metrics in the coming days. These metrics have been revised and expanded based on extensive feedback from our data users. The Committee on National Statistics and the IPUMS National Historical Geographic Information System (NHGIS) will also be producing comprehensive tabulations of the PPMF data for data users to evaluate.

This PPMF is Focused on Redistricting Tables

The Census Bureau’s recent operational schedule changes have necessitated that we focus all of our current attention on preparing for the production of the PL94-171 redistricting data files by the statutory deadline of March 31, 2021. Consequently, these new PPMFs include only the data necessary to support tabulation of tables P1-P5 and H1 of the redistricting data. Data elements necessary to support other data product tabulations are not included in this release. The full set of data elements necessary to support the Demographic Profiles and Demographic and Housing Characteristics files will be reincorporated into DAS runs at a future date.

We encourage our data users to evaluate these new demonstration data to provide feedback that will help us continue our efforts to improve the 2020 DAS and to inform upcoming policy decisions about the DAS system and parameters. To provide additional time for your analysis and feedback for these decisions, we have postponed the upcoming meeting of the Data Stewardship Executive Policy Committee (DSEP) on invariants and system architecture until October 8. We will also provide   in advance of the December 2020 DSEP meeting that will set the privacy-loss budget for the first of the 2020 Census data products.

The statistics included in this newsletter have been cleared for public dissemination by the Census Bureau’s Disclosure Review Board (CBDRB-FY20-DSEP-001).


Have Suggestions?

Do you have specific questions you'd like us to answer in this newsletter, or topics you'd like discussed? Send us an email at 2020DAS@census.gov and let us know!

About Disclosure Avoidance Modernization

The Census Bureau is protecting 2020 Census data products with a powerful new cryptography-based disclosure avoidance system known as “differential privacy.” We are committed to producing 2020 Census data products that are of the same high quality you've come to expect while protecting respondent confidentiality from emerging privacy threats in today's digital world.
