It is important whenever designing new technologies to ask “how will this affect people’s privacy?” This topic is especially important with regard to machine learning, where machine learning models are often trained on sensitive user data and then released to the public. For example, in the last few years we have seen models trained on users’ private emails, text messages, and medical records.
This article covers two aspects of our upcoming USENIX Security paper that investigates to what extent neural networks memorize rare and unique aspects of their training data.
Specifically, we quantitatively study to what extent following problem actually occurs in practice:
While our paper focuses on many directions, in this post we investigate two questions. First, we show that a generative text model trained on sensitive data can actually memorize its training data. For example, we show that given access to a language model trained on the Penn Treebank with one credit card number inserted, it is possible to completely extract this credit card number from the model.
Second, we develop an approach to quantify this memorization. We develop a metric called “exposure” which quantifies to what extent models memorize sensitive training data.
Read the rest of this post here