CPG at ICML 2024
Jul 22, 2024

Matthieu Meeus and Igor Shilov presented their paper [Copyright Traps for Large Language Models](https://arxiv.org/pdf/2402.09363) at ICML 2024 in Vienna.
News from the Computational Privacy Group at Imperial College London
Nataša Krčo and Igor Shilov led a session exploring the robustness of modern data privacy systems. Learn more [here](https://www.imperial.ac.uk/news/254914/imperials-computational-privacy-group-lead-session/).
Dr Yves-Alexandre de Montjoye hosted a session on using technology to detect illegal content and on assessing the robustness of modern data privacy mechanisms.
Our paper, [Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks](https://dl.acm.org/doi/10.1145/3626232.3653272), co-authored by Ana-Maria Cretu, Miruna Rusu, and Yves-Alexandre de Montjoye, has received a Best Paper Award at the ACM CODASPY '24 conference!
The CPG went on a two-day retreat to the south of England.
The CPG attended the CNIL Privacy Research Day in Paris in June 2023. Ana-Maria Crețu presented her paper on automated privacy attacks (QuerySnout), Shubham Jain presented both papers on perceptual hashing, and Florent Guépin presented his paper on correlation inference attacks.
Ana-Maria Cretu and CPG alumnus Florimond Houssiau (currently a postdoc at The Alan Turing Institute) presented their paper “QuerySnout: Automating the Discovery of Attribute Inference Attacks against Query-Based Systems” at the ACM CCS 2022 conference in Los Angeles.
In a new paper published in Science Advances, Arnaud J. Tournier and Yves-Alexandre de Montjoye propose an entropy-based profiling attack which shows that much more auxiliary information than previously believed is available to re-identify individuals in location data. The attack correctly identifies individuals 79% of the time in a large location dataset of 0.5 million people. It is robust to state-of-the-art noise addition and learns time-persistent profiles whose accuracy decreases only slowly over time (roughly linearly, about 1% per week).
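For intuition, here is a minimal sketch of a profiling-based re-identification attack: build a visit-frequency profile for each known individual from a reference period, then attribute a pseudonymous trace to the profile under which it is most likely. The data structures and function names are hypothetical, and this is a simplification, not the paper's entropy-based method.

```python
import numpy as np

def build_profiles(reference_traces, n_locations):
    """Build a smoothed visit-frequency profile for each known individual.

    reference_traces: dict mapping individual id -> list of visited location ids
    (integers in [0, n_locations)).
    """
    profiles = {}
    for person, locations in reference_traces.items():
        counts = np.bincount(locations, minlength=n_locations).astype(float)
        profiles[person] = (counts + 1.0) / (counts.sum() + n_locations)  # Laplace smoothing
    return profiles

def identify(target_trace, profiles):
    """Attribute a pseudonymous trace to the profile under which it has the highest
    log-likelihood (equivalently, the lowest cross-entropy)."""
    scores = {person: np.log(p[target_trace]).sum() for person, p in profiles.items()}
    return max(scores, key=scores.get)

# Toy usage: two known individuals and one pseudonymous trace to re-identify.
profiles = build_profiles({"alice": [0, 0, 1, 2], "bob": [3, 3, 3, 4]}, n_locations=5)
print(identify([0, 1, 0], profiles))  # -> "alice"
```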
The CPG attended the USENIX Security Symposium in Boston on 10-12 August 2022. Ana-Maria Cretu and Andrea Gadotti presented their papers on evaluating the robustness of perceptual hashing-based client-side scanning systems and on pool inference attacks against Apple's Count Mean Sketch, respectively.
In their new paper, Andrea Gadotti, Florimond Houssiau, Meenatchi Sundaram Muthu Selva Annamalai, and Yves-Alexandre de Montjoye investigate the practical guarantees of Apple’s implementation of local differential privacy in iOS and macOS. They propose a new type of attack, called pool inference attacks, in which an adversary with access to a user’s obfuscated data defines pools of objects and exploits the user’s polarized behavior across multiple data collections to infer the user’s preferred pool. The results show that pool inference attacks are a concern for data protected by local differential privacy mechanisms with a large ε, such as Apple’s Count Mean Sketch mechanism, emphasizing the need for additional technical safeguards and for more research on how to apply local differential privacy across multiple collections.
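As a rough illustration of the idea (not the paper's attack, which targets Apple's Count Mean Sketch), the toy sketch below uses generalized randomized response as the local differential privacy mechanism and infers the preferred pool by counting which pool the obfuscated reports fall into; all names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(item, n_items, epsilon):
    """Generalized randomized response: report the true item with probability
    e^eps / (e^eps + n_items - 1), otherwise a uniformly random other item."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + n_items - 1)
    if rng.random() < p_true:
        return item
    other = int(rng.integers(n_items - 1))
    return other if other < item else other + 1

def pool_inference(reports, pools):
    """Guess the user's preferred pool: the pool whose items show up most often in the
    obfuscated reports, normalized for pool size."""
    rates = [sum(r in pool for r in reports) / len(pool) for pool in pools]
    return int(np.argmax(rates))

# Toy experiment: 100 objects split into two pools; the user is polarized towards pool 0;
# epsilon = 4 (a comparatively large value) and 50 data collections.
pools = [set(range(50)), set(range(50, 100))]
true_items = rng.integers(0, 50, size=50)
reports = [randomized_response(int(x), 100, 4.0) for x in true_items]
print(pool_inference(reports, pools))  # usually prints 0
```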
Our work was featured in Last Week Tonight with John Oliver in their episode on data brokers. The paper appears at 11:31 and features a statistic from “Estimating the success of re-identifications in incomplete datasets using generative models”, namely that 99.98% of Americans can be correctly identified in any dataset using 15 demographic attributes.
On January 27th, 2022, Yves-Alexandre participated in a roundtable of the Digital Regulation Co-operation Forum (DRCF) on perceptual hashing-based client-side scanning and presented CPG’s work on adversarial detection avoidance attacks.
A new Nature Communications paper by Ana-Maria Crețu, Federico Monti, Stefano Marrone, Xiaowen Dong, Michael Bronstein, and Yves-Alexandre de Montjoye reveals that data about people’s interactions can be used to identify individuals in anonymous datasets. The paper shows that the learned profiles are stable and that people’s behavior is still identifiable over a long period of time. The results provide strong evidence that disconnected and even re-pseudonymized interaction data can be linked together, making them personal data under the European Union’s General Data Protection Regulation (GDPR).
Florimond Houssiau, Luc Rocher, and Yves-Alexandre de Montjoye show in this short paper that the privacy guarantees given by Google for the aggregated data they shared from 300M Google Maps users are incorrect.
Our workshop paper titled “Interaction data are identifiable even across long periods of time” (a long version of which will be published soon in Nature Communications) was accepted as a *contributed talk* at the PPML 2021 workshop. A recording of the talk given by lead author Ana-Maria Crețu is available on YouTube.
Ana-Maria Crețu, Shubham Jain, and Yves-Alexandre de Montjoye’s paper on the robustness of perceptual hashing-based client-side scanning to detection avoidance attacks was selected for an *oral presentation* at the Conference on Applied Machine Learning in Information Security (CAMLIS 2021). Ana-Maria gave the talk on Nov 4, 2021.
In their new paper due to appear at USENIX Security 2022, Shubham Jain, Ana-Maria Crețu, and Yves-Alexandre de Montjoye showed perceptual hashing-based client-side scanning mechanisms to be highly vulnerable to detection avoidance attacks. The paper proposes a general black-box attack and demonstrates that >99.9% of images can be successfully modified while preserving the image content.
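To give a sense of what a detection avoidance attack does, here is a simplified sketch (not the paper's black-box attack): it perturbs an image within a small per-pixel budget until enough bits of a toy average hash flip, leaving the image visually unchanged. The hash here is a stand-in; production systems use proprietary hashes such as PhotoDNA, which are not reproduced.

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Toy perceptual hash: block-average a grayscale image down to hash_size x hash_size
    and threshold each block at the global mean."""
    bh, bw = img.shape[0] // hash_size, img.shape[1] // hash_size
    blocks = img[:bh * hash_size, :bw * hash_size].reshape(
        hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).ravel()

def evade(img, budget=10.0, target_bits=10, steps=2000, seed=0):
    """Black-box random search for a perturbation, bounded by `budget` grey levels per pixel,
    that flips at least `target_bits` hash bits (or as many as it manages)."""
    rng = np.random.default_rng(seed)
    img = img.astype(float)
    reference = average_hash(img)
    best, best_dist = img.copy(), 0
    for _ in range(steps):
        candidate = np.clip(best + rng.normal(0, 2, img.shape), img - budget, img + budget)
        candidate = np.clip(candidate, 0, 255)
        dist = int(np.sum(average_hash(candidate) != reference))
        if dist > best_dist:
            best, best_dist = candidate, dist
        if best_dist >= target_bits:
            break
    return best.astype(np.uint8), best_dist
```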
At the HotPETs 2021 workshop at Privacy Enhancing Technologies Symposium (PETS), Shubham Jain presented his joint work with Ana-Maria Crețu and Yves-Alexandre de Montjoye on vulnerabilities of perceptual-hashing client-side scanning mechanisms to detection avoidance attacks.
CPG paper, “Unique in the Crowd: The privacy bounds of human mobility” was mentioned in a motion to dismiss (MTD) to the US district court for the Central District of California in “Justin Sanchez, et al. v. Los Angeles Department of Transportation, et al.”
“The risk of re-identification remains high even in country-scale location datasets” by Ali Farzanehfar, Florimond Houssiau, and Yves-Alexandre de Montjoye appeared today in Patterns (Cell Press). The paper measures, mathematically models, and provides a lower bound on the relationship between the size of a dataset and the risk of re-identification as measured by unicity. The results show that the risk of re-identification decreases very slowly with increasing dataset size, contradicting previous claims.
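For readers unfamiliar with the metric, unicity is the fraction of individuals who are uniquely identified by a handful of points drawn from their own trace. Below is a minimal sketch of how it can be estimated by sampling; the data structures are hypothetical and this is not the paper's code.

```python
import random

def estimate_unicity(traces, n_points=4, n_trials=1000, seed=0):
    """Estimate unicity: the fraction of individuals whose trace is the only one in the
    dataset containing `n_points` points drawn at random from their own trace.

    traces: dict mapping individual id -> set of (location, time) points;
    assumes every trace has at least `n_points` points.
    """
    rng = random.Random(seed)
    ids = list(traces)
    unique = 0
    for _ in range(n_trials):
        person = rng.choice(ids)
        points = rng.sample(list(traces[person]), n_points)
        matches = [other for other in ids if all(p in traces[other] for p in points)]
        unique += (matches == [person])
    return unique / n_trials
```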
Andrea was invited by Computerphile to present the anonymity problems in location data. The full video is available on YouTube.
While governments are ramping up their efforts to slow down the spread of COVID-19, contact tracing apps are being developed to record interactions and warn users if one of their contacts is later diagnosed positive. These apps could help avoid long-term confinement, but also record fine-grained location or close-proximity data. In this blog post, we propose 8 questions one should ask to understand how protective of privacy an app is.
Used correctly, mobile phone data could help monitor the effectiveness of lockdown measures and track contacts of people who have tested positive. We've been asked if the data could be collected and used effectively without enabling mass surveillance. This is our response.
Yves-Alexandre is organizing a panel in Davos on ‘Europe’s digital leadership: can AI and privacy co-exist?’
Our noise-exploitation attack against Aircloak's Diffix system was presented by Andrea and Luc at USENIX Security 2019! The paper is available on the USENIX website, together with the slides and the video from the presentation. For more details about this paper, you can read our blog post and an article on TechCrunch.
In a new paper published in Nature Communications, Luc and Yves-Alexandre show how the incompleteness of datasets does not provide plausible deniability to participants. Contradicting previous claims, they show that sampling does not decrease the risk of re-identification.
On 17 June 2019, Andrea Gadotti presented CPG’s research at Westminster as part of the "In conversation with the National Statistician" event. On 26 June, he presented at the Evidence Week, organised by Sense About Science.
On 26 June 2019, Ali Farzanehfar was invited to the European Commission in Brussels to present the group's work on data anonymization. The event was part of the European Commission’s Connect Summer School (DG CONNECT) on pressing matters in the modern world.
On 9 May 2019, Andrea Gadotti was invited to the Ministry of the Interior in Berlin to speak at the round table meeting of the Data Ethics Commission. The meeting was live streamed on the Ministry's website and is now available on YouTube (Andrea's presentation starts at min 27:56).
Our work on capturing and visualising the data leaked by mobile devices' WiFi has been accepted at The Web Conference 2019 (WWW ‘19).
Olivia Solon writing for the Guardian on the (sharp) limits of data anonymization and ways forward includes quotes from CPG group leader, Yves-Alexandre de Montjoye.
We studied Diffix, a system developed and commercialized by Aircloak to anonymise data by adding noise to the answers of SQL queries sent by analysts. In a manuscript we just published on arXiv, we show that Diffix is vulnerable to a noise-exploitation attack. In short, our attack uses the noise added by Diffix to infer people’s private information with high accuracy. We share the opinion of Diffix’s creators that it is time to take a fresh look at building practical anonymization systems.
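To illustrate the general flavour of a noise-exploitation attack, here is a deliberately simplified sketch (Diffix's real noise is sticky and layered per query condition, and the paper's attack is more involved): if a query-based system adds fresh zero-mean noise to each counting query, an attacker can compare many pairs of queries with and without the victim and average the noise away.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(records, predicate, noise_sd=2.0):
    """Toy query-based system: answer a counting query with fresh zero-mean Gaussian noise."""
    return sum(predicate(r) for r in records) + rng.normal(0, noise_sd)

def infer_private_attribute(records, is_victim, has_attribute, n_queries=200):
    """Noise-exploitation sketch: repeatedly compare a count with and without the victim
    and average the differences. The average converges to 1 if the victim has the
    attribute and to 0 otherwise, so the per-query noise is washed out."""
    diffs = [
        noisy_count(records, has_attribute)
        - noisy_count(records, lambda r: has_attribute(r) and not is_victim(r))
        for _ in range(n_queries)
    ]
    return float(np.mean(diffs)) > 0.5

# Toy usage: learn whether user 42 has a private binary attribute.
records = [{"id": i, "attr": bool(rng.random() < 0.3)} for i in range(1000)]
print(infer_private_attribute(records,
                              is_victim=lambda r: r["id"] == 42,
                              has_attribute=lambda r: r["attr"]),
      records[42]["attr"])
```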
Recent revelations from Cambridge Analytica show how vulnerable our privacy is to seemingly innocuous apps installed by our friends. Here we show how our privacy is affected by the people we interact with. Node-based intrusions are becoming one of the main threats to our privacy.
Artificial Intelligence (AI) has the potential to fundamentally change the way we work, live, and interact. There is, however, no general AI out there, and the accuracy of current machine learning models largely depends on the data on which they have been trained. For the coming decades, the development of AI will depend on access to ever larger and richer medical and behavioral datasets. We now have strong evidence that the tool we have historically used to balance using data in aggregate with protecting people’s privacy, de-identification, does not scale to big data. The development and deployment of modern privacy-enhancing technologies (PETs), allowing data controllers to make data available in a safe and transparent way, will be key to unlocking the great potential of AI.