Fine-Tuning the Disclosure Avoidance System to Ensure Accuracy
While much of the Census Bureau’s efforts are focused on production of the first of the 2020 Census data products—the data used to apportion seats in the U.S. House of Representatives—work continues on the first set of 2020 Census data products that will be protected using the TopDown Algorithm (TDA) central to the Disclosure Avoidance System (DAS).
Our current schedule points to April 30, 2021, for release of the apportionment counts. We will announce the release schedule for the first DAS-protected product—the redistricting (P.L. 94-171) data—once planning for this product is complete.
In the meantime, we are continuing to improve the DAS to ensure that the published data for the 2020 Census meet legislative, programmatic, and data user needs. Over the last few months, we implemented a number of system design changes that addressed the post-processing distortions observed in our earlier demonstration data.
We have already evaluated the effectiveness of these system improvements on the tabulations necessary to support the redistricting (P.L. 94-171) data. We are extending these system improvements to provide similar benefits for the remaining tabulations necessary to support the Demographic and Housing Characteristics (DHC) File.
With these improvements in place, we are confident that the accuracy of the resulting data is now under the direct control of the privacy-loss budget (PLB) and related tunable system parameters, and we can shift our focus to setting those parameters. The Census Bureau’s Data Stewardship Executive Policy (DSEP) committee, which is composed solely of senior career executives, will choose the ultimate PLB and tunable system parameters.
Underway: Experimental Data Runs to Identify Optimal PLB and Other Parameters
Over the last several weeks, the DAS team has been conducting numerous full-scale experimental runs of the DAS using 2010 Census data. These experimental runs allow us to evaluate different settings of key system parameters, including:
- The optimal set and processing order of queries against the confidential data.
- The share of PLB allocated to the different queries.
- The share of PLB allocated to tabulations at different geographic levels (e.g., county and tract).
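As a rough illustration of the last two parameters, the sketch below splits a total PLB across geographic levels and then across queries within each level. It is a minimal example under stated assumptions, not the DAS implementation: the function name `allocate_plb`, the level and query names, and all numeric values are hypothetical, and it assumes the shares at each stage sum to 1 so that the pieces add back up to the total budget.

```python
from fractions import Fraction


def allocate_plb(total_plb, geo_shares, query_shares):
    """Split a total privacy-loss budget across (geographic level, query) pairs.

    Hypothetical helper for illustration only; this is not the DAS code.
    Each set of shares must sum to exactly 1 so the total budget is preserved.
    """
    for name, shares in (("geo_shares", geo_shares), ("query_shares", query_shares)):
        if sum(shares.values()) != 1:
            raise ValueError(f"{name} must sum to 1")
    return {
        (geo, query): total_plb * geo_share * query_share
        for geo, geo_share in geo_shares.items()
        for query, query_share in query_shares.items()
    }


# All values below are made up for the example.
total_plb = Fraction(4)
geo_shares = {
    "state": Fraction(1, 4),
    "county": Fraction(1, 4),
    "tract": Fraction(1, 2),
}
query_shares = {
    "total_population": Fraction(3, 5),
    "voting_age": Fraction(2, 5),
}

allocation = allocate_plb(total_plb, geo_shares, query_shares)
for (geo, query), budget in sorted(allocation.items()):
    print(f"{geo:>6} / {query:<16} PLB = {float(budget):.2f}")
```

Exact fractions are used so that the check on the shares and the sum of the allocated pieces are not affected by floating-point rounding.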
Current experiments focus on the redistricting data release. The results of current and future experiments will allow DSEP to determine the optimal configuration of the parameters necessary to prioritize accuracy and fitness-for-use across the different tables that make up the redistricting and the DHC data products.
The resulting data will also allow the DSEP to identify the overall PLB and allocation combination necessary to support the many governmental, demographic, and research use cases that our data users and other stakeholders have brought to our attention. As a reminder, the apportionment data are not subject to the TopDown Algorithm; those data include state-level totals only, and aggregation of records to the state level is sufficient privacy protection of the individual underlying records.
DSEP’s decision-making will be based on comprehensive empirical analyses of these experimental DAS runs (at a wide range of PLBs) to identify the PLB needed to best achieve these fitness-for-use targets. At the same time, the DSEP will also evaluate the impact of those PLBs on the privacy guarantees that we can make to our respondents for the overall protection of their various characteristics at different levels of geography.
Upcoming Data and Metrics Will Reflect the Tunable Parameters
Although the November 16, 2020, privacy-protected microdata files (PPMFs) substantially reduced the post-processing errors identified in earlier DAS runs, many of you expressed concerns that these files still produced unacceptably high levels of error for some of your key tabulations. We want to assure you that this remaining error is a function of the fixed PLB that has been used for all PPMFs. Future results will not be held to that fixed PLB.
We intentionally set the PLB for all PPMF releases to be functionally equal to the PLB used for the October 2019 Demonstration Data Products release. Keeping the PLB constant across these demonstration runs of the system allowed us and our data users to evaluate subsequent iterations of the DAS for improvements in the post-processing portion of the algorithm. Thus, improvements seen across these releases can be directly attributed to algorithmic improvements aimed at mitigating post-processing errors. With these improvements now in place, the overall data accuracy and the relative accuracy of certain tables over others can be directly controlled by the setting and allocation of PLB, as discussed above.
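To give a sense of how a privacy-loss budget can directly control accuracy, here is a minimal sketch using the textbook Laplace mechanism as a stand-in. This is an assumption made for illustration: this update does not specify the noise distribution the DAS uses, and the function names and values below are hypothetical. For Laplace noise with scale `sensitivity / epsilon`, the expected absolute error of a noisy count equals that scale, so doubling the budget halves the expected error.

```python
import random


def laplace_expected_abs_error(epsilon, sensitivity=1.0):
    """Expected absolute error of a Laplace-noised count (equals the noise scale)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace noise to a count.

    The difference of two independent exponential draws with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    scale = laplace_expected_abs_error(epsilon, sensitivity)
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical budgets: a larger epsilon means less noise on average.
for epsilon in (0.5, 1.0, 2.0, 4.0):
    error = laplace_expected_abs_error(epsilon)
    print(f"epsilon = {epsilon:>3}: expected absolute error = {error:.2f}")
```

In this simplified picture, a table given a larger share of the budget receives less noise, which is the sense in which relative accuracy across tables follows from how the budget is allocated.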
Once these experimental runs are complete, we plan to release an additional set of PPMFs and Detailed Summary Metrics for the redistricting data product. These PPMFs will use a higher allocation of PLB to allow our data users to evaluate demonstration data that will more readily approximate the anticipated privacy/accuracy tradeoff of the 2020 Census data products.
We plan to release these data prior to the DSEP’s final decisions on PLB allocation for the redistricting data products, giving data users an opportunity to submit feedback to inform those decisions. Then, after DSEP has chosen the PLB and final DAS parameters, we will release a final “production-ready” set of redistricting data PPMFs produced using the 2010 Census data. These will reflect the actual PLB and system parameters that will produce the official 2020 Census redistricting data product.
Our primary objective throughout these efforts continues to be the production of high-quality data products that meet our data users’ needs while ensuring that our respondents’ data are protected against the new privacy threats that we have identified. As we work toward this goal, we remain grateful to the many data users who have evaluated the PPMFs and provided invaluable feedback to inform our efforts and decision-making.
Have Suggestions?
Do you have specific questions you'd like us to answer in this newsletter or topics you'd like discussed? Send us an email at 2020DAS@census.gov and let us know!