Do you agree with the analysis presented in this document?
Short answer
Yes. We agree with most of it. For GenAI to comply with the UK GDPR, providers need to honour the letter and spirit of data subjects’ rights. Poorly-worded or long-winded policies or responses which lack precision and meaning are likely to make things worse, not better, for data subjects. Equally, hard-to-access processes water down fundamental rights and freedoms of data subjects.
More detailed explanation and analysis
We agree with much of the analysis presented in the document, but we have reservations about how useful it will be, both for developers trying to build GenAI models using personal data in a way which is consistent with the law, and for data subjects seeking to understand and control the way in which their personal data is used by GenAI. The analysis provides broad objectives, but we consider that more definitive guidance from the ICO as to what would and would not be sufficient to achieve those objectives would be useful for all concerned.
We agree with the analysis that there are three key elements to ensuring individuals’ rights are protected:
- Awareness / Right to be informed – ensuring data subjects are aware that their personal data is being processed and why, how and by whom
- Right of access – enabling data subjects to access their information which is being processed and to gather more specific details about that processing
- Control – giving data subjects the ability to control how their personal data is processed by protecting their rights to erasure, rectification and restriction of processing and their right to object to processing
We have addressed each of these elements below.
Awareness / Right to be informed
The ICO’s analysis makes numerous references to developers needing to be clear and transparent with data subjects about how their data is being used for GenAI training and emphasises the need for individuals to have a meaningful understanding and clear expectations of what happens to their data. However, it provides little meaningful detail about (or examples of) how this transparency can be achieved.
We think it will be hard for GenAI providers to be clear and transparent with data subjects about personal data collected via web-scraping, or where that data is otherwise not collected directly from the data subject. We are conducting our own independent analysis of personal data scraped from the internet and used by LLMs; so far, our research shows that LLMs are not able to segment one data subject from another at either the training stage or the application stage of processing.
The ICO’s analysis appears to invite developers to rely on the Article 14 exception where large numbers of records are being processed, but it is not clear about the circumstances in which the exception will be appropriate, or the extent of the steps developers should take to make information publicly available. We feel the guidance risks encouraging some organisations to rely too heavily on the Article 14 exceptions (i.e. that compliance is impossible or requires disproportionate effort) so as to avoid building in data subject rights and following the privacy by design principle. Given the complexity of GenAI models and their inherent ongoing learning, it will prove challenging for organisations to explain how GenAI models work to data subjects in a concise and easy-to-understand way.
Right of access
The analysis refers to the argument most likely to be put forward by developers when refusing to respond to a SAR relating to GenAI model datasets, namely that they cannot identify the individual (in the training data or anywhere else). However, the ICO’s own guidance on finding and retrieving information states that, when using large datasets and unstructured data (which can pose difficulties when producing all of the data held on an individual), it is even more important that controllers practise good data management, not just to facilitate the right of access but also because of the UK GDPR’s legal requirements on accountability and documentation.
As we have addressed in our response to question five, we consider it to be the controller’s responsibility to ensure that data is collected and stored in a way which ensures there is adequate metadata and the ability to query the data to find all the information held on an individual.
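The data-management responsibility described above can be illustrated with a minimal sketch. This is a hypothetical example (the table layout, column names and `subject_access_report` function are our own illustration, not an ICO or developer specification): if every record carries metadata linking it to a data subject and recording its provenance, a subject access request can be answered with a single query.

```python
import sqlite3

# Hypothetical store: every record carries metadata linking it back to a
# data subject, so a subject access request can be answered with one query.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        subject_id TEXT NOT NULL,     -- stable identifier for the data subject
        source TEXT NOT NULL,         -- provenance: where the data was collected
        collected_at TEXT NOT NULL,   -- when it was collected
        payload TEXT NOT NULL         -- the personal data itself
    )
""")
conn.execute("CREATE INDEX idx_subject ON records(subject_id)")

conn.executemany(
    "INSERT INTO records (subject_id, source, collected_at, payload) "
    "VALUES (?, ?, ?, ?)",
    [
        ("subject-42", "web-scrape", "2024-01-10", "example profile text"),
        ("subject-42", "chat-log", "2024-02-03", "example conversation excerpt"),
        ("subject-99", "web-scrape", "2024-01-11", "unrelated record"),
    ],
)

def subject_access_report(conn, subject_id):
    """Return every record held on one data subject, with its provenance."""
    return conn.execute(
        "SELECT source, collected_at, payload FROM records WHERE subject_id = ?",
        (subject_id,),
    ).fetchall()

report = subject_access_report(conn, "subject-42")
print(len(report))  # both records held on subject-42 are retrievable
```

The point is not the particular technology but the design decision: where data is collected without subject-level metadata (as with much web-scraped training data), this kind of retrieval becomes difficult or impossible, which is exactly the problem we describe above.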
In our experience, trying to access information being processed by OpenAI or by assistants such as Alexa initially appeared straightforward, but it was clear that the information provided was far from a full list of all the data being processed about the individual making the request.
Ongoing Subject Access Request
One of the contributing parties submitted a right of access request to OpenAI as part of this project, to gain hands-on experience of this issue. The response provided was generic and non-specific, and did not actually give the data subject a copy of their personal data. Indeed, we have seen firsthand as a group how poor a response to a subject access request can be: the response was vague, only providing access to the inputs from the data subject rather than explaining the sources in an easy-to-understand way, and it deflected the data subject back to a tapestry of privacy links, all of them vague, to hunt for the relevant information. The subject access request response also highlighted that the only tools for exercising data subject rights are usually via:
- Direct emails with the provider; or
- Logging into the GenAI solution and following a series of links to either erase search history or chat functions.
Neither of these options was particularly easy to use or effective and, at the time of writing, it still appears that personal data of the data subject is being processed which was not provided to him in the subject access response. It also appears to the writers that data subject rights were not part of the design of the GenAI system architecture; if they had been, these processes would be easier to observe.
Control
As with exercising the right of access, current mechanisms used to enable data subjects to exercise their rights to erasure, rectification and restriction of processing and their right to object to processing are not fit for purpose as a result of the difficulties associated with identifying data in large datasets.
Privacy by design
We have limited experience of technical tools such as input / output mechanisms and mitigation measures (see our response to question four). However, we consider it to be important for developers and deployers of GenAI models to ensure they follow a privacy by design approach and identify at the outset how they are going to ensure the protection of data subjects’ rights.
Simply raising awareness and publicising statements informing people that, by putting personal data onto the internet, that data may be used for developing GenAI does not provide sufficient protection or meaningful control for data subjects. Other steps, such as full anonymisation, need to be considered to limit the extent of the personal data being processed; where significant minimisation is not possible, enhanced security, encryption and pseudonymisation of data should be applied.
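One of the measures mentioned above, pseudonymisation, can be sketched briefly. This is an illustrative example only (the `pseudonymise` function and key-handling arrangement are our own assumptions, not a description of any provider's actual practice): direct identifiers are replaced with a keyed hash before the data is used, with the key held separately from the dataset.

```python
import hmac
import hashlib

# Hypothetical pseudonymisation step: direct identifiers are replaced with a
# keyed hash before data enters a corpus. The key is held separately from the
# dataset (e.g. in a key management service), so the pseudonyms cannot be
# reversed by anyone who holds only the data.
SECRET_KEY = b"held-separately-from-the-dataset"  # assumption: stored elsewhere

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"email": "jane@example.com", "text": "example content"}
safe_record = {"subject": pseudonymise(record["email"]), "text": record["text"]}

# The same identifier always maps to the same pseudonym, so records about one
# person can still be linked (e.g. to action an erasure request) without the
# identifier itself being exposed in the dataset.
assert pseudonymise("jane@example.com") == safe_record["subject"]
```

Under the UK GDPR, pseudonymised data of this kind remains personal data, because the key holder can re-identify individuals; the measure reduces risk but does not remove the controller's obligations.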
We agree with the view expressed by Jamie Susskind in his book ‘The Digital Republic’ regarding what he calls the ‘consent trap’. It is our view that simply giving people information about the ways in which their personal data is going to be used for GenAI development does nothing to increase the data subject’s power and control, nor does it rebalance the relationship between technology companies and individuals. Instead, GenAI companies should have separate obligations to protect the rights and interests of data subjects and to remain accountable for their GenAI models.
It appears that current models have been developed with a build-first, comply-second approach, with engagement and scale being prioritised over accuracy and compliance. This has resulted in the rights of individuals being given little more than lip service and not being adequately protected. We call on the ICO to ensure a robust approach is taken to protecting the rights of data subjects. This is not to stifle creativity and development, but to ensure the UK’s GenAI industry creates and deploys systems which are safe, ethical and trustworthy and to ensure the remarkable benefits of GenAI are not marred by risks and negative consequences.