Challenge Winners Awarded $161,000 for Outperforming Traditional Privacy Techniques for Both Accuracy and Privacy
DrivenData, the host of data science competitions that advance solutions for social good, and HeroX, the social network for innovation and the world's leading platform for crowdsourced solutions, today announced the winners of the third and final sprint of the Algorithm Contest of the Differential Privacy Temporal Map Challenge, which was sponsored by the Public Safety Communications Research (PSCR) Division of the National Institute of Standards and Technology (NIST).
The challenge carried a prize purse totaling $161,000, and the third algorithm sprint announced today awarded $25,000 to its first-place winner. First place went to "N - CRiPT", a team of differential privacy researchers from the National University of Singapore and Alibaba Group whose goal was to bring differential privacy into a practical setting. Second place went to the "Minutemen", a team of differential privacy graduate students from the University of Massachusetts Amherst.
The focus of this prize challenge was to create synthetic data that preserves the characteristics of a dataset containing time and geographic information. Synthetic data can offer stronger privacy protection than traditional anonymization techniques. Differentially private synthetic data can be shared with researchers, policy makers, and even the public without the risk of exposing individuals in the original data. However, synthetic records are only useful if they preserve the trends and relationships in the original data.
Contestants in this challenge were charged with developing algorithms that de-identify datasets while maintaining a high level of accuracy, so that the resulting data is both private and useful. Top contestants in the final sprint demonstrated algorithms that produce records with both stronger privacy and greater accuracy than the subsampling techniques many government agencies typically use to release records.
The first sprint featured data from 911 calls made in Baltimore, Maryland, over the course of one year. Participants in this sprint were tasked with developing de-identification algorithms that generate privatized datasets from the monthly reported incident counts for each incident type by neighborhood. Winners of this sprint were announced previously.
The second sprint used demographic data from the U.S. Census Bureau's American Community Survey, which surveyed individuals in various U.S. states from 2012 to 2018. The dataset included 35 survey features (such as age, sex, income, education, work, and health insurance data) for every individual surveyed. Simulated longitudinal data was created by linking individual records across multiple years, which increased the difficulty of protecting each simulated person's privacy. To succeed in this sprint, participants needed to build de-identification algorithms that generated a set of synthetic, privatized survey records preserving the patterns in the original data as accurately as possible. Winners of this sprint were announced previously.
The third sprint centered on taxi rides taken in Chicago, Illinois. Because the sprint focused on protecting the taxi drivers rather than just their individual trips, competitors needed to provide privacy for up to 200 records per driver, a very challenging problem. Submissions were evaluated across 77 Chicago community areas. The de-identified synthetic data needed to preserve the characteristics of taxi trips in each community area, the patterns of traffic between communities, and the population characteristics of the taxi drivers themselves (typical working times and locations). The top two teams each produced synthetic data that provided very strong privacy protection and was also more accurate for analysis than data protected by traditional privacy techniques such as subsampling.
Challenge participants are now eligible to earn up to $5,000 for creating and executing a development plan that further improves the code quality of their solutions and advances their usefulness to the public safety community. Participants can also earn the Open Source prize, an additional $4,000, by releasing their solutions in an open source repository. To win, solutions must satisfy differential privacy and be uploaded to an open source repository.
About DRIVENDATA
DrivenData is a social enterprise dedicated to bringing the data tools and methods that are transforming industry to the world's biggest challenges. As part of that work, DrivenData's competition platform channels the skills and passion of data scientists, researchers, and other quantitative experts to build solutions for social good. These online machine learning challenges are designed to engage a large expert community, connect them with real-world data problems, and highlight their best solutions.
About HEROX
HeroX is a social network for crowdsourcing innovation and human ingenuity, co-founded in 2013 by serial entrepreneur Christian Cotichini and XPRIZE founder and futurist Peter Diamandis. HeroX offers a turnkey, easy-to-use platform that enables anyone, anywhere, to solve everyday business and world challenges using the power of the crowd. Uniquely positioned as the Social Network for Innovation, HeroX is the only place you can build, grow, and curate your very own crowd.
Media Contact:
Alexandra Pony
250.858.0656
To learn about eligibility requirements, visit challenge.gov, and for additional information about the challenge, visit DrivenData.org.
NIST, a nonregulatory agency of the U.S. Department of Commerce, promotes U.S. innovation and industrial competitiveness by advancing measurement science, standards and technology in ways that enhance economic security and improve our quality of life. To learn more about NIST, visit NIST.gov.
Media Contacts:
To arrange an interview or for other media inquiries with NIST, please contact Jennifer Huergo at (202) 309-1027.
SOURCE DrivenData; HeroX