Human Rights Watch is grateful for the invitation to provide evidence to the Select Committee on Adopting Artificial Intelligence (AI) at its public hearings on July 16 and 17, 2024. In advance of the hearings, this submission covers our recent research on the scraping and misuse of Australian children’s personal photos to build AI tools without their knowledge or consent.
While this submission focuses on the protection of children’s rights, we also urge the Select Committee to examine the broader risks to everyone’s human rights.
Australian Children’s Personal Photos Misused to Power AI Tools
In July 2024, Human Rights Watch reported that it had uncovered the scraping and use of personal photos of Australian children to create powerful AI tools without the knowledge or consent of the children or their families.[1] These photos are scraped off the web into a large data set that companies then use to train their AI tools. In turn, others use these tools to create malicious deepfakes that put even more children at risk of exploitation and harm.
Analysis by Human Rights Watch found that LAION-5B, a data set used to train popular AI tools and built by scraping most of the internet, contains links to identifiable photos of Australian children. Some children’s names are listed in the accompanying caption or the URL where the image is stored. In many cases, their identities are easily traceable, including information on when and where the child was at the time their photo was taken.
One such photo features two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural. The accompanying caption reveals both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia. Information about these children does not appear to exist anywhere else on the internet.
Human Rights Watch found 190 photos of children from all of Australia’s states and territories. This is likely to be a significant undercount of the amount of children’s personal data in LAION-5B, as Human Rights Watch reviewed fewer than 0.0001 percent of the 5.85 billion images and captions contained in the data set.
The photos Human Rights Watch reviewed span the entirety of childhood. They capture intimate moments of babies being born into the gloved hands of doctors and still connected to their mother through their umbilical cord; young children blowing bubbles or playing instruments in preschools; children dressed as their favorite characters for Book Week; and girls in swimsuits at their school swimming carnival.
The photos also capture First Nations children, including those identified in captions as being from the Anangu, Arrernte, Pitjantjatjara, Pintupi, Tiwi, and Warlpiri peoples. These photos include toddlers dancing to a song in their Indigenous language; a girl proudly holding a sand goanna lizard by its tail; and three young boys with traditional body paint and their arms around each other.
Many of these photos were originally seen by few people and previously had a measure of privacy. They do not appear to be findable through an online search. Some were posted by children or their families on personal blogs and photo- and video-sharing sites. Others were uploaded by schools, or by photographers hired by families to capture personal moments and portraits. Some are no longer retrievable on the publicly accessible versions of these websites. Some were uploaded years, or even a decade, before LAION-5B was created.
Human Rights Watch found that LAION-5B also contained photos from sources that had taken steps to protect children’s privacy. One such photo is a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating Schoolies week after their final exams. The video’s creator took precautions to protect the privacy of those featured in the video: Its privacy settings are set to “unlisted,” and the video does not show up in YouTube’s search results.
YouTube’s terms of service prohibit scraping or harvesting information that might identify a person, including images of their faces, except under certain circumstances; this instance appears to violate these policies. YouTube did not respond to our request for comment.
Once their data is swept up and fed into AI systems, these children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information; they can reproduce identical copies of the material they were trained on, including medical records and photos of real people.[2] Guardrails set by some companies to prevent the leakage of sensitive data have been repeatedly broken.[3]
Moreover, current AI models cannot forget data they were trained on, even if the data was later removed from the training data set. This perpetuity risks harming Indigenous Australians in particular, as many First Nations peoples restrict the reproduction of photos of deceased people during periods of mourning.
These privacy risks pave the way for further harm. Training on photos of real children enables AI models to create convincing clones of any child, based on a handful of photos or even a single image.[4] Malicious actors have used LAION-trained AI tools to generate explicit imagery of children using innocuous photos, as well as explicit imagery of child survivors whose images of sexual abuse were scraped into LAION-5B.[5]
Likewise, the presence of Australian children in LAION-5B contributes to the ability of AI models trained on this data set to produce realistic imagery of Australian children. This substantially amplifies the existing risk children face that someone will steal their likeness from photos or videos of themselves posted online and use AI to manipulate them into saying or doing things that they never said or did.
In June 2024, about 50 girls from Melbourne reported that photos from their social media profiles were taken and manipulated using AI to create sexually explicit deepfakes of them, which were then circulated online.[6]
Fabricated media have always existed, but required time, resources, and expertise to create, and were largely unrealistic. Current AI tools create lifelike outputs in seconds, are often free, and are easy to use, risking the proliferation of nonconsensual deepfakes that could recirculate online forever and inflict lasting harm.
LAION, the German nonprofit organization that manages LAION-5B, confirmed on June 1 that the data set contained the children’s personal photos found by Human Rights Watch, and pledged to remove them, saying they would send Human Rights Watch confirmation of the removal once it was completed. As of July 9, it has not provided confirmation that it has removed the children’s data from its data set. LAION also disputed that AI models trained on LAION-5B could reproduce personal data verbatim. It said: “We urge the HRW to reach out to the individuals or their guardians to encourage removing the content from public domains, which will help prevent its recirculation.”
Mark Dreyfus, Australia’s attorney general, recently introduced a bill in parliament banning the nonconsensual creation or sharing of sexually explicit deepfakes of adults, noting that such imagery of children would continue to be treated as child abuse material under the Criminal Code.[7] However, this approach misses the deeper problem that children’s personal data remains unprotected from misuse, including the use of AI to nonconsensually manipulate real children’s likenesses into any kind of deepfake.
Human Rights Watch encourages the Select Committee to recommend that any proposed AI regulations or policies:
- Respect, protect, and promote human rights throughout the development and use of AI. Such regulations or policies should also ensure that all stakeholders—government, companies, organizations, and individuals—refrain from or cease the development or use of AI systems that are inconsistent with international and national human rights laws or that pose undue risks to the enjoyment of human rights.[8]
- Incorporate data privacy protections for children. Children are entitled to special protections that guard their privacy, which is vital to ensuring their safety, agency, and dignity. The government should take special care to protect children’s privacy with respect to AI, as the nature of the technology’s development and use does not permit children and their guardians to meaningfully consent to how children’s data privacy is handled.
- Prohibit the scraping of children’s personal data into AI systems, given the privacy risks involved and the potential for new forms of misuse as the technology evolves.
- Prohibit the nonconsensual digital replication or manipulation of children’s likenesses.
- Provide those who experience harm through the development and use of AI with mechanisms to seek meaningful justice and remedy.
[1] “Australia: Children’s Personal Photos Misused to Power AI Tools,” Human Rights Watch news release, July 2, 2024, https://www.hrw.org/news/2024/07/03/australia-childrens-personal-photos-misused-power-ai-tools.
[2] Carlini et al., “Extracting Training Data from Diffusion Models,” January 30, 2023, https://arxiv.org/abs/2301.13188 (accessed July 9, 2024); Benj Edwards, “Artist finds private medical record photos in popular AI training data set,” Ars Technica, September 21, 2022, https://arstechnica.com/information-technology/2022/09/artist-finds-private-medical-record-photos-in-popular-ai-training-data-set/ (accessed July 9, 2024).
[3] Carlini et al., “Extracting Training Data from Diffusion Models,” January 30, 2023, https://arxiv.org/abs/2301.13188 (accessed July 9, 2024); Nasr et al., “Extracting Training Data from ChatGPT,” November 28, 2023, https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html (accessed July 9, 2024); Mehul Srivastava and Cristina Criddle, “Nvidia’s AI software tricked into leaking data,” Financial Times, June 8, 2023, https://www.ft.com/content/5aceb7a6-9d5a-4f1f-af3d-1ef0129b0934 (accessed July 9, 2024); Matt Burgess, “OpenAI’s Custom Chatbots Are Leaking Their Secrets,” WIRED, November 29, 2023, https://www.wired.com/story/openai-custom-chatbots-gpts-prompt-injection-attacks/ (accessed July 9, 2024).
[4] Benj Edwards, “AI image generation tech can now create life-wrecking deepfakes with ease,” Ars Technica, December 9, 2022, https://arstechnica.com/information-technology/2022/12/thanks-to-ai-its-probably-time-to-take-your-photos-off-the-internet/ (accessed July 9, 2024); Benj Edwards, “Microsoft’s VASA-1 can deepfake a person with one photo and one audio track,” Ars Technica, April 19, 2024, https://arstechnica.com/information-technology/2024/04/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track/ (accessed July 9, 2024).
[5] Emanuel Maiberg, “a16z Funded AI Platform Generated Images That ‘Could Be Categorized as Child Pornography,’ Leaked Documents Show,” 404 Media, December 3, 2023, https://www.404media.co/a16z-funded-ai-platform-generated-images-that-could-be-categorized-as-child-pornography-leaked-documents-show/ (accessed July 9, 2024); David Thiel, “Identifying and Eliminating CSAM in Generative ML Training Data and Models,” Stanford Internet Observatory, December 23, 2023, https://stacks.stanford.edu/file/druid:kh752sm9123/ml_training_data_csam_report-2023-12-23.pdf (accessed July 9, 2024).
[6] “Police investigate fake nude photos of about 50 Bacchus Marsh Grammar students being circulated online,” Australian Broadcasting Corporation, June 11, 2024, https://www.abc.net.au/news/2024-06-11/bacchus-marsh-grammar-explicit-images-ai-nude/103965298 (accessed July 10, 2024).
[7] Australia Attorney-General Mark Dreyfus, “New criminal laws to combat sexually explicit deepfakes,” June 5, 2024, https://ministers.ag.gov.au/media-centre/new-criminal-laws-combat-sexually-explicit-deepfakes-05-06-2024 (accessed July 9, 2024); Criminal Code Amendment (Deepfake Sexual Material) Bill 2024, https://www.aph.gov.au/Parliamentary_Business/Bills_Legislation/Bills_Search_Results/Result?bId=r7205 (accessed July 9, 2024).
[8] This recommendation is lightly adapted from UN General Assembly Resolution A/78/L.49, “Seizing the opportunities of safe, secure and trustworthy artificial intelligence systems for sustainable development,” for which Australia served as a key sponsor. See: United Nations General Assembly, Resolution A/78/L.49 (2024), https://www.undocs.org/Home/Mobile?FinalSymbol=A%2F78%2FL.49&Language=E&DeviceType=Desktop&LangRequested=False (accessed July 9, 2024), para 5.