Navigating the legal intricacies of scraping personal data for AI development
 
Part 1: 5 Essential safeguards for website operators
In the rapidly evolving world of artificial intelligence, data scraping is a hot topic. The copying of online text, images and videos has beneficial use cases (e.g. training AI models for more accurate fraud detection or collecting contact details of business representatives for marketing purposes).
But is it legal? The answer isn’t straightforward. The legality of data scraping depends on how and why the data was scraped, its intended use, and whether access was authorised. This week, however, 16 global regulators[1] issued updated guidance on data scraping for AI development, focusing on privacy compliance.
Key takeaways
Data scraping for AI development is a complex issue with significant legal and ethical implications. The key points are:
- website operators must proactively protect personal data from unlawful scraping;
- contractual terms alone cannot render data scraping lawful; and
- mass data scraping incidents can be high-risk data breaches.
Part 1 of this analysis highlights 5 of the essential safeguards website operators must put in place to protect personal data from unlawful scraping.
The current regulatory landscape
16 data protection regulators, including the UK Information Commissioner’s Office (ICO), co-signed a statement addressing privacy concerns in data scraping. More information is available in the ICO’s announcement, “Global privacy authorities issue follow-up joint statement on data scraping after industry engagement”.
What is the key point of this statement? Publicly accessible personal data remains subject to data protection laws.
Organisations using scraped data for AI training must comply with relevant laws. This includes ensuring there is a lawful basis for the processing. Public interest, research or statistical purposes might be a lawful basis for this activity in some countries. In others, it might be consent. But it is important to get this right – and the obligations of users of scraped personal data will be considered in part 2.
The remainder of this article will consider how website operators can protect personal data from unlawful scraping.
5 Essential safeguards
Challenges in preventing data scraping are recognised by the industry. There are increasingly sophisticated scrapers, some using intelligent bots that can simulate real user activity. Website operators also find it difficult to differentiate scrapers from lawful users while maintaining a user-friendly interface. So, what can be done?
To protect against unlawful data scraping, website operators should implement a combination of safeguards. Here are 5 essential recommended measures:
- Implement data minimisation: As a first step, limit the personal data that is publicly accessible, and ensure that none of it is “sensitive”.
- Consider contractual terms: Contractual terms authorising the scraping of personal data should:
 a) indicate that third parties must comply with applicable laws,
 b) specify limitations on what information can be scraped and the purposes for which it may be used,
 c) detail the consequences for non-compliance with those terms.
 Note: Even this, in and of itself, can’t render such scraping lawful. Contracts are just one important safeguard.
- Implement website design elements:
 a) Use features like random account URLs to deter automated scraping
 b) Use CAPTCHAs and IP blocking to prevent excessive data collection
 c) Limit the number of visits per day by one account
 d) Consider whether to provide access to the publicly accessible personal data via an API, with credentials to be verified
- Stay vigilant and take legal action: Send cease and desist letters when unauthorised scraping is suspected or confirmed. Then, require the deletion of unlawfully scraped information.
- Designate a dedicated team: Assign responsibility for monitoring and responding to unlawful data scraping activities and ensure you have an effective data governance procedure.
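For technical teams, three of the website design elements listed above (random account URLs, a daily visit limit per account, and verified API credentials) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a recommended production design: the names new_profile_slug, DailyVisitLimiter and api_key_is_valid are hypothetical, the counter is held in memory only, and a real site would normally persist these values and combine them with CAPTCHAs, IP blocking and monitoring.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict
from datetime import date
from typing import Optional, Set


def new_profile_slug(num_bytes: int = 16) -> str:
    """Return a random, URL-safe identifier for an account page.

    Random slugs stop a scraper from walking sequential URLs
    such as /users/1, /users/2, and so on.
    """
    return secrets.token_urlsafe(num_bytes)


class DailyVisitLimiter:
    """Hypothetical in-memory counter of page views per account per day."""

    def __init__(self, max_visits_per_day: int) -> None:
        self.max_visits_per_day = max_visits_per_day
        # Keyed by (account_id, calendar day); assumption: one process only.
        self._counts = defaultdict(int)

    def allow(self, account_id: str, today: Optional[date] = None) -> bool:
        """Record a visit and return True, or return False once the cap is hit."""
        day = today if today is not None else date.today()
        key = (account_id, day)
        if self._counts[key] >= self.max_visits_per_day:
            return False
        self._counts[key] += 1
        return True


def api_key_is_valid(presented_key: str, stored_key_hashes: Set[str]) -> bool:
    """Check an API key against stored SHA-256 hex digests of issued keys.

    Assumption: only hashes of issued keys are stored, never the keys
    themselves. compare_digest avoids leaking information through timing.
    """
    digest = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return any(hmac.compare_digest(digest, h) for h in stored_key_hashes)
```

In use, a request handler would call limiter.allow(account_id) before serving a profile page and refuse or challenge the request when it returns False. The appropriate daily limit is a policy decision for the website operator and is not specified in the regulators' statement.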
By implementing these safeguards, you’ll be taking some positive steps towards navigating the legal intricacies of web scraping while fostering responsible AI innovation. However, it is important to continuously monitor both the threat landscape and the personal data you make publicly available on your website, and to adjust your safeguards regularly in response. The regulators have confirmed: failure to implement adequate safeguards to prevent unlawful data scraping could result in regulatory enforcement action.
Contact Linzi Penman for more information.
[1] Australia, Canada, UK, China, Norway, Switzerland, New Zealand, Colombia, Morocco, Jersey, Argentina, Mexico, Spain, Guernsey, Monaco, Israel