First Release of Post-Baseline Quality Metrics Results

The Census Bureau today released the first set of results that measure improvements made through the iterative design of the 2020 Disclosure Avoidance System (DAS).

As we develop the final DAS throughout 2020, we will continue to produce new results so you can compare the impact of this work against the baseline version of the DAS: the 2010 Demonstration Data Products released in October 2019.

The “2010 Demonstration Metrics 2” release measures the accuracy of the DAS output using the draft measures released in March (see the March 27, 2020, entry on the DAS Updates page for more information). We are still evaluating user feedback on these metrics and will issue updates as we move forward.

About the Latest DAS Development Work Reflected in These Results

This revised set of metrics was calculated on a national run of the 2020 Disclosure Avoidance System (DAS) on the 2010 Census data following the conclusion of DAS development Sprint II (March 2-March 31, 2020). The most notable change implemented during Sprint II affects how the DAS TopDown Algorithm (TDA) converts the noisy measurements taken from the confidential data into the counts that will be tabulated and published, an operation that we call “postprocessing.”

Previously, the TDA conducted the postprocessing of all of the statistics for a particular geographic level at the same time. Unfortunately, as we saw in the 2010 demonstration data, the TDA had difficulty accurately performing this optimization when there were large quantities of statistics with zeros or very small values processed at the same time. The result was distortions in the data that effectively moved individuals from high- to low-density populations (e.g., from cities to rural areas, or from larger race groups to smaller race groups).
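The effect described above can be reproduced in miniature. The sketch below is not the TDA: it replaces the DAS optimization with simple clipping at zero followed by proportional rescaling to a fixed total, and the Laplace noise, cell counts, and noise scale are all invented for illustration. It shows only why forcing many zero-valued cells to be non-negative tends to pull counts away from the large cells.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def naive_postprocess(true_counts, scale, rng):
    # Hypothetical stand-in for postprocessing: add noise, clip negatives
    # to zero, then rescale so the grand total matches the true total.
    noisy = [c + laplace_noise(scale, rng) for c in true_counts]
    clipped = [max(0.0, x) for x in noisy]
    factor = sum(true_counts) / sum(clipped)
    return [x * factor for x in clipped]

rng = random.Random(0)
dense = [1000] * 10    # a few large cells (true total 10,000)
sparse = [0] * 1000    # many empty cells (true total 0)

result = naive_postprocess(dense + sparse, scale=2.0, rng=rng)
dense_total = sum(result[:10])
sparse_total = sum(result[10:])
print(round(dense_total), round(sparse_total))
```

Because clipping can only raise the empty cells, their combined total becomes positive, and holding the grand total fixed then takes that amount out of the large cells. Processing fewer sparse statistics at once, as the multipass approach does, limits how much of this upward bias accumulates in any one step.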

New Multipass Approach

With the changes implemented during Sprint II, the TDA now conducts the postprocessing in a series of passes through all the geographic levels.

The first pass of the algorithm runs at the national level, then at the state level, then at each lower level of geography, and determines only the total population count for each unit within that geographic level (e.g., for all census tracts within a county).

Once those total population counts are determined, the second pass of the algorithm processes just the statistics necessary to produce the redistricting data (also known as the Public Law 94-171 data file), constraining those statistics to the population counts determined in the first pass.

The third pass through the algorithm then processes the core statistics necessary to support population by age, sex, and broad race/ethnicity categories for the demographic analyses that underlie the Population Estimates program. Third-pass statistics are constrained to the sum of the statistics produced for the redistricting data.

A final pass through TDA processes the remainder of the statistics necessary for the Demographic and Housing Characteristics files and the Demographic Profiles, constraining these values to the sum of the ones produced in the third pass.
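The nesting of constraints across passes can be sketched as follows. This is a simplified illustration, not the TDA: the helper `fit_to_total` is hypothetical and uses proportional allocation with largest-remainder rounding in place of the DAS optimization, and every number and cell label is invented. The point is only that each pass treats the totals from the previous pass as fixed.

```python
def fit_to_total(noisy, total):
    """Return non-negative integers summing to `total`, roughly
    proportional to the noisy estimates (hypothetical helper)."""
    clipped = [max(0.0, x) for x in noisy]
    weight = sum(clipped)
    if weight == 0:
        # No usable signal: fall back to an even split.
        clipped = [1.0] * len(noisy)
        weight = float(len(noisy))
    shares = [total * x / weight for x in clipped]
    counts = [int(s) for s in shares]
    # Hand out the units lost to truncation, largest remainder first.
    leftover = total - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Pass 1: total population for each tract, constrained to the county
# total, which is treated as already fixed by the level above.
county_total = 100
noisy_tract_totals = [41.3, 35.8, 24.1]
tract_totals = fit_to_total(noisy_tract_totals, county_total)

# Pass 2: redistricting-style cells per tract (hypothetical split into
# two cells), constrained to the pass 1 tract totals.
noisy_pass2 = [[9.6, 30.2], [-1.4, 37.0], [6.1, 18.5]]
pass2 = [fit_to_total(n, t) for n, t in zip(noisy_pass2, tract_totals)]

# Pass 3: finer cells for the first tract (hypothetical age bands),
# constrained to that tract's pass 2 cells.
noisy_pass3 = [[4.2, 5.9], [12.7, 10.1, 8.8]]
pass3 = [fit_to_total(n, t) for n, t in zip(noisy_pass3, pass2[0])]

print(tract_totals, pass2, pass3)
```

Each later pass handles a larger number of statistics, but it can only redistribute counts within totals that earlier passes have already settled, so errors in the detailed cells cannot change the population totals.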

Privacy-Loss Budget

To allow an apples-to-apples comparison and better isolate the impact of iterative DAS changes, this version of the DAS uses the same global privacy-loss budget (PLB) that was applied to the 2010 Demonstration Data Products, ε=6.0. Of this total budget, the person records use ε=4.0, and the housing records use ε=2.0.
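The budget arithmetic can be checked in a few lines. The allocation values come from the paragraph above; the `laplace_scale` helper is an assumption added for illustration, using the textbook Laplace mechanism (noise scale equals sensitivity divided by ε) rather than the mechanism the DAS actually applies.

```python
GLOBAL_EPSILON = 6.0
allocation = {"person": 4.0, "housing": 2.0}

# Budgets spent on separate releases add up to the global budget.
total_spent = sum(allocation.values())
assert total_spent == GLOBAL_EPSILON

def laplace_scale(epsilon, sensitivity=1.0):
    # Textbook Laplace mechanism (an assumption, not the DAS mechanism):
    # a smaller epsilon means a larger noise scale.
    return sensitivity / epsilon

scales = {name: laplace_scale(eps) for name, eps in allocation.items()}
print(scales)
```

These scales are only a lower bound on the noise in any single statistic, because each share is divided further across geographic levels and queries.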

While more work remains to be done to further improve and optimize the DAS algorithms, these new accuracy metrics are intended to keep our data users informed of our progress in addressing the limitations observed in the 2010 demonstration data.

About Disclosure Avoidance Modernization

The Census Bureau is protecting 2020 Census data products with a powerful new cryptography-based disclosure avoidance system known as “differential privacy.” We are committed to producing 2020 Census data products that are of the same high quality you’ve come to expect while protecting respondent confidentiality from emerging privacy threats in today’s digital world.