Preliminary Research into Alternatives and Supplements to Differential Privacy; Webinar Friday 6/4



At advisory committee meetings last week, Census Bureau researchers presented preliminary findings from experiments designed to illustrate how techniques used to protect privacy in earlier censuses would affect the results if applied today to 2010 Census data.

The findings offer stakeholders a tool for comparing the trade-offs between the earlier methods and the TopDown Algorithm, the new approach designed for the P.L. 94-171 redistricting data and based on the principles of differential privacy.

The research concludes that reusing the 1980 suppression techniques would significantly limit the amount of data that could be published.  Relaxing and extending the 2010 Census swapping algorithm would not improve the re-identification outcomes, regardless of the swapping rate used.

By releasing this analysis, we aim to give stakeholders a better empirical understanding of the need to modernize our disclosure avoidance methods. Early results from our internal research in 2017 found that the protections used for the 2010 Census could no longer withstand a data reconstruction attack. Continuing to use those methods for the 2020 Census was not an option.

It is also important to note that reverting to a system based on traditional methods would require significant time, at least six months after such a decision was made, to retool systems and processes and to conduct quality assurance.

To ensure timely redistricting data delivery, the Data Stewardship Executive Policy Committee is set to make a determination on the privacy loss budget for the 2020 redistricting data in early June. 

Findings - Suppression Analysis

Suppression – removing information from published tables to protect privacy – was last used as a primary disclosure avoidance technique in 1980.

If the suppression rules from the 1980 Census were applied to the 2010 Census P.L. 94-171 redistricting data, whole tables would be suppressed for geographies with between 1 and 14 persons. The counts would cover a reduced set of race and ethnicity categories based on OMB Directive 15, representing only 14 OMB-designated race and ethnicity groups. The effects would be as follows:

  • For Table P3 - Race for the Population 18 Years and Over: Tables for more than 84% of blocks, block groups and tracts would be suppressed.
  • For Table P4 - Hispanic or Latino, and not Hispanic or Latino, by Race for the Population 18 Years and Over: Tables for more than 87% of blocks, block groups and tracts would be suppressed.
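To make the whole-table rule concrete, here is a minimal Python sketch. It assumes the 1-to-14-person threshold is checked against a single person count per geography; the function names and the toy data are hypothetical and are not taken from the 1980 rules or from the research team's code.

```python
from typing import Optional

Table = dict[str, int]  # category label -> count


def apply_whole_table_suppression(total_persons: int, table: Table) -> Optional[Table]:
    """Return None (table withheld) when the geography has 1 to 14 persons.

    Assumption: the threshold is applied to one person count per geography.
    Geographies with 0 persons or with 15 or more persons keep their table.
    """
    if 1 <= total_persons <= 14:
        return None
    return dict(table)


def share_suppressed(geographies: list[tuple[int, Table]]) -> float:
    """Fraction of geographies whose table would be withheld."""
    if not geographies:
        return 0.0
    withheld = sum(
        1
        for persons, table in geographies
        if apply_whole_table_suppression(persons, table) is None
    )
    return withheld / len(geographies)
```

Run over every block, block group and tract, a calculation of this kind would produce percentages like those reported above.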

For additional tables, individual cells would be suppressed (replaced with “0”) if counts in those cells were 1 or 2:

  • For Table P1 - Race: Cells would change to a “zero” value for 8% of blocks, 10% of block groups, and 5.5% of tracts. That number would likely be much higher with a fuller analysis, since suppression of complementary cells would also be required.
  • For Table P2 - Hispanic or Latino, and not Hispanic or Latino, by Race: Cells would change to a “zero” value for 6% of blocks, 14.5% of block groups, and 11% of tracts. That number would likely be much higher with a fuller analysis, since suppression of complementary cells would also be required.
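The cell-level rule, and the reason complementary suppression matters, can be illustrated with a short Python sketch. It is a toy example under stated assumptions: the function names and the three-cell table are hypothetical, and complementary suppression itself is not implemented here.

```python
def suppress_small_cells(table: dict[str, int]) -> dict[str, int]:
    """Replace counts of 1 or 2 with 0, as in the cell rule described above."""
    return {cell: (0 if count in (1, 2) else count) for cell, count in table.items()}


def recover_by_subtraction(
    published_total: int, published_cells: dict[str, int], suppressed_cell: str
) -> int:
    """Recover a single suppressed cell from a published total.

    This works only when exactly one cell in the table was suppressed,
    which is why complementary cells must also be suppressed in practice.
    """
    others = sum(
        count for cell, count in published_cells.items() if cell != suppressed_cell
    )
    return published_total - others


# Hypothetical block: category "B" has a true count of 2 and is zeroed out.
true_counts = {"A": 40, "B": 2, "C": 7}
published = suppress_small_cells(true_counts)
recovered = recover_by_subtraction(sum(true_counts.values()), published, "B")
```

In this toy table the published total still reveals the suppressed count by subtraction, so a fuller analysis would have to zero out at least one more cell, which increases the share of tables affected.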

When the previously used suppression rules were applied to the 2010 Census Summary File 1 (SF1) tables, more than 38% of person tables and 32% of housing unit tables would be suppressed at the block level.

Findings - Swapping Analysis

The research team analyzed the impact of relaxing and extending the 2010 Census swapping algorithm. Options explored included combinations of the following:

  • Swap rates ranging from 5% to 50% of housing units,
  • Altering household size +/- 1 for up to 80% of housing units,
  • Altering the target tract for the swap partner within a county or within a state for up to 70% of housing units.
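For readers unfamiliar with swapping, the Python sketch below shows the general idea behind a simple swap rate. It is only a toy illustration: the actual 2010 Census swapping algorithm and its parameters are not described in this bulletin, so the matching rule (same household size, different tract), the Household fields and the function names are all assumptions made for the example.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Household:
    hh_id: int
    tract: str
    size: int


def tract_size_counts(households: list[Household]) -> Counter:
    """Count households by (tract, size); a same-size swap leaves this unchanged."""
    return Counter((h.tract, h.size) for h in households)


def simple_swap(households: list[Household], swap_rate: float, seed: int = 0):
    """Toy swap: exchange tracts between same-size households in different tracts.

    Returns a modified copy and the number of households that were swapped.
    Pairs are swapped until roughly swap_rate of all households are affected.
    """
    rng = random.Random(seed)
    result = [Household(h.hh_id, h.tract, h.size) for h in households]
    target = int(len(result) * swap_rate)
    order = list(range(len(result)))
    rng.shuffle(order)
    swapped: set[int] = set()
    for i in order:
        if len(swapped) >= target:
            break
        if i in swapped:
            continue
        partners = [
            j for j in order
            if j != i and j not in swapped
            and result[j].size == result[i].size
            and result[j].tract != result[i].tract
        ]
        if not partners:
            continue
        j = rng.choice(partners)
        result[i].tract, result[j].tract = result[j].tract, result[i].tract
        swapped.update((i, j))
    return result, len(swapped)
```

Because partners are matched on household size in this sketch, tract-level counts by household size are preserved while the households behind them change places. The variants listed above relax that kind of matching by allowing household size to differ by one and by widening where the swap partner may be located.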

The analysis revealed:

  • Low swap rates (defined as a simple swap percentage of 5%, without altering households or tracts) yielded essentially the same high re-identification outcomes as for the actual 2010 Census Summary File 1.
  • High swap rates (defined as a swap percentage of 50%, altering 50% of household sizes and 70% of swap partner tracts) only minimally improved re-identification outcomes. Moreover, accuracy metrics when using high swap rates were inferior to the most recent (4/28/2021) Privacy-Protected Microdata File (PPMF).

These results imply that mid-level swap rates, as implemented, may match the TopDown Algorithm in terms of accuracy but would do little to reduce re-identification.


Join Us Friday, June 4, for a Webinar on Research on Alternatives to Differential Privacy

We are hosting a webinar this Friday, June 4, to walk through the research and take audience questions. The webinar will be recorded and posted as part of our series on Disclosure Avoidance. There you will also find transcripts and recordings for the previous webinars in the series.

Details:

Time: 2:00 – 3:00 pm (ET) 

WebEx log-in:  Click here to join the meeting

WebEx event number (If needed):  199 855 0149

WebEx event password (If needed):   Census#1

Audio: Listen to the webinar in one of two ways: 

Using your computer's speakers (choose "Audio Broadcast," which is 1-way audio)  -OR-  Using your TELEPHONE (call 888-996-4917, code: 9385910#)


2021 Key Dates, Redistricting (P.L. 94-171) Data Product

The Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) will meet in early June to review the latest data regarding the TopDown Algorithm and approve settings and parameters. Its decisions will be informed by the feedback we’ve received from numerous stakeholders, which has resulted in ongoing fine-tuning of the algorithm since the release of the last demonstration data set on April 28. Additional fine-tuning as directed by the DSEP will continue through June, with quality control analysis leading to the FTP release of the redistricting data by August 16.

Early June:                   

  • The Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) makes the final determination of the privacy loss budget (PLB) and system parameters for P.L. 94-171, based on data user feedback.

Late June:                    

  • Final DAS production run and quality control analysis begins for P.L. 94-171 data.

By August 16:

  • Release 2020 Census P.L. 94-171 data as Legacy Format Summary File*.

September:                 

  • Census Bureau releases PPMFs and Detailed Summary Metrics from applying the production version of the DAS to the 2010 Census data.
  • Census Bureau releases production code base for P.L. 94-171 redistricting summary data file and related technical papers.

By September 30:         

  • Release 2020 Census P.L. 94-171 data** and Differential Privacy Handbook.

*   Released via Census Bureau FTP site.

** Released via data.census.gov.



About Disclosure Avoidance Modernization

The Census Bureau is protecting 2020 Census data products with a powerful new cryptography-based disclosure avoidance system known as “differential privacy.”  We are committed to producing 2020 Census data products that are of the same high quality you've come to expect while protecting respondent confidentiality from emerging privacy threats in today's digital world. 

 
