Can Data Already Entered Be Extracted?
Yes. In November 2023, researchers from Google DeepMind, Carnegie Mellon University, and several other universities showed that ChatGPT can be prompted to reproduce verbatim training data. One example prompt asked it to repeat a single word forever. With only $200 worth of queries, they extracted over 10,000 unique memorized examples of training data. These included real email signatures with personal contact information, phone numbers, news articles, Stack Overflow source code, and copyrighted legal disclaimers. They estimated that an adversary willing to send more queries could extract roughly ten times as much data.
If information you entered into a free-tier AI tool was used for training, the model may have memorized it, and memorized text can, in principle, be extracted. This risk differs from a conventional data breach. It is diffuse and hard to quantify, but it is real.