What measures, if any, including input and output filters have proved effective in enabling data subject rights in the context of generative AI models?
Short answer
We do not consider that input and output filters alone are sufficient to meet the full spectrum of data subject rights in respect of GenAI models. PETs (see question four) and other technology solutions need to be deployed alongside such filters in a consistent and easily accessible manner.
More detailed explanation and analysis
Input and output filters can contribute to complying with data subject rights and, more broadly, form part of the technical and organisational measures required under Article 32 of the UK GDPR. However, based on our research, we have not come across any evidence to suggest that such filters are, on their own, sufficient to satisfy those rights.
For example, and as mentioned specifically in the consultation paper, machine unlearning could support compliance with a data subject’s right to be forgotten by using techniques that allow a model to “forget” certain data. It is worth noting that even if a model does not “memorise” a data subject’s personal data exactly or reproduce that data in its outputs, a sophisticated cyberattack (for example, one designed to recover residual information from the model’s parameters) could potentially expose that personal data.
There are numerous machine unlearning methods, including (i) ‘Exact Unlearning’, meaning a complete retrain of the model without the data to be forgotten; (ii) ‘Approximate Unlearning’, meaning the use of algorithms that adjust the model incrementally to remove the influence of that data; (iii) ‘Gradient-based Methods’, meaning the use of gradient updates to reverse the effect of specific data on the model’s parameters; and (iv) ‘Data Partitioning’, meaning training separate sub-models on partitions (shards) of the data so that unlearning can be targeted at the partition containing the data to be forgotten rather than requiring a full retrain. A simplified illustration of the data partitioning approach is set out below.
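By way of illustration only, the sketch below shows the data partitioning idea in miniature, using a toy averaging “model” rather than a real GenAI model; all class and function names are hypothetical. The point it demonstrates is that, when records are assigned to shards and one sub-model is trained per shard, honouring an erasure request only requires retraining the shard that contained the relevant record.

```python
# Minimal, illustrative sketch of the "data partitioning" (sharded training)
# approach to unlearning. The "model" here is a toy per-shard average; all
# names are hypothetical and for illustration only.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shard:
    records: dict = field(default_factory=dict)   # record_id -> numeric value
    model: Optional[float] = None                 # toy "model": mean of values

    def train(self) -> None:
        values = list(self.records.values())
        self.model = sum(values) / len(values) if values else None


class PartitionedModel:
    """One sub-model per shard, so forgetting a record only requires
    retraining the shard that contained it."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, record_id: int) -> Shard:
        return self.shards[record_id % len(self.shards)]

    def add(self, record_id: int, value: float) -> None:
        self._shard_for(record_id).records[record_id] = value

    def train_all(self) -> None:
        for shard in self.shards:
            shard.train()

    def forget(self, record_id: int) -> None:
        # Erasure request: drop the record and retrain only its shard.
        shard = self._shard_for(record_id)
        shard.records.pop(record_id, None)
        shard.train()

    def predict(self) -> float:
        # Aggregate the trained sub-models (here, a simple average).
        trained = [s.model for s in self.shards if s.model is not None]
        return sum(trained) / len(trained)


# Usage sketch: train on 20 records, then honour an erasure request for one.
pm = PartitionedModel(num_shards=4)
for i in range(20):
    pm.add(record_id=i, value=float(i))
pm.train_all()
pm.forget(record_id=7)   # only the shard containing record 7 is retrained
print(round(pm.predict(), 2))
```

In practice, sharded training of large generative models involves significant engineering and accuracy trade-offs, but the structural idea of limiting the scope of retraining triggered by an erasure request is the same.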
As mentioned in response to question four, OpenAI has an online portal through which an individual can request to exercise their data rights (e.g. to download their data, to request removal of their data from the model’s outputs, to delete their account or to ask OpenAI to stop using their data to train the model). One of our colleagues recently submitted a privacy request to OpenAI and received a swift response; however, the response was not helpful and did not provide the requested data in the first instance. He has since sent OpenAI another message through the privacy portal but is yet to receive a response. As such, this approach does not appear to be particularly effective: rather than simply providing the requested data in accordance with Article 15 of the UK GDPR, our colleague was pointed to various other steps he needed to take in order to download his data. This did not make it straightforward or easy to obtain the data.
As noted in the ICO’s analysis, when developers are asked why they have not provided details of all of the personal data they are processing about an individual, the argument they put forward is that they cannot identify the individual in the database. However, we consider it to be the controller’s responsibility to collect and store data in a way that includes adequate metadata and supports querying the data to find all of the information held about an individual. A simple illustration of this kind of subject-level indexing is sketched below.
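Purely by way of illustration, and assuming a conventional relational store rather than any particular developer’s architecture, the sketch below shows how records can be tagged with a data subject identifier and provenance at collection time so that a single query can locate everything held about an individual. The table and column names are hypothetical.

```python
# Hypothetical sketch of subject-level metadata indexing: every stored record
# carries a data subject identifier and provenance, so one query can locate
# all information held about an individual. Names are illustrative only.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE training_records (
           record_id     INTEGER PRIMARY KEY,
           subject_id    TEXT,        -- pseudonymous data subject identifier
           source        TEXT,        -- provenance of the record
           collected_at  TEXT,        -- when the record was collected
           content       TEXT         -- the record itself (or a pointer to it)
       )"""
)
conn.execute("CREATE INDEX idx_subject ON training_records (subject_id)")

# Ingest records with subject-level metadata attached at collection time.
conn.executemany(
    "INSERT INTO training_records VALUES (?, ?, ?, ?, ?)",
    [
        (1, "subj-123", "web-form", "2024-01-10", "example record A"),
        (2, "subj-456", "crm-export", "2024-02-02", "example record B"),
        (3, "subj-123", "support-ticket", "2024-03-15", "example record C"),
    ],
)


def records_for_subject(subject_id: str) -> list:
    """Return every record held about a given data subject, e.g. to respond
    to an access request or to scope an erasure request."""
    cur = conn.execute(
        "SELECT record_id, source, collected_at, content "
        "FROM training_records WHERE subject_id = ?",
        (subject_id,),
    )
    return cur.fetchall()


print(records_for_subject("subj-123"))
```

Where training data is pseudonymised, the same lookup can be keyed on the pseudonymous identifier, allowing the controller to scope access and erasure requests without holding direct identifiers alongside the content.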
Another measure that organisations implement is the use of input filters to minimise the personal data fed into GenAI models, combined with output filters that screen the models’ responses to ensure they do not contain personal or otherwise sensitive data. Based on our research, this appears to be the most common and effective way of enabling an organisation to fully comprehend the data being used in a GenAI model and, as such, of better enabling data subject rights. A simplified illustration of such filtering is set out below.
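As a minimal sketch only, the example below applies a regex-based redaction step to the prompt before it is sent to the model and again to the model’s response before it is released. Production systems typically rely on more robust PII and named-entity detection; the patterns and the stubbed model call are hypothetical placeholders, not a description of any particular provider’s implementation.

```python
# Illustrative regex-based input and output filters. The patterns below are
# deliberately simple (emails and UK-style phone numbers) and the model call
# is a stub; both are hypothetical placeholders, not a production design.

import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b"),
}


def redact(text: str) -> str:
    """Replace matched personal data with labelled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


def filtered_completion(prompt: str, generate) -> str:
    """Run the input filter before the model call and the output filter after it."""
    clean_prompt = redact(prompt)        # input filter: minimise personal data sent to the model
    raw_output = generate(clean_prompt)  # model call (stubbed for illustration)
    return redact(raw_output)            # output filter: screen the response before release


# Usage sketch with a stubbed "model" that simply echoes its input.
def stub_model(prompt: str) -> str:
    return f"Echo: {prompt} Please also contact jane.doe@example.com."


print(filtered_completion("Call me on 07700 900123 about my account.", stub_model))
```

Filters of this kind reduce the personal data entering and leaving the model, but, as noted above, they do not by themselves address personal data already embedded in the model’s parameters.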