Meta Platforms used public posts on Facebook and Instagram to train its new Meta AI virtual assistant, but excluded private posts shared only with family and friends in an effort to respect consumer privacy, the company’s chief policy officer said in an interview with Reuters.
Meta also did not use private chats on its messaging services as training data for the model, and took steps to filter personal details out of the public datasets used for training, said Nick Clegg, Meta’s president of global affairs, speaking on the sidelines of the company’s annual Connect conference this week.
“We tried to exclude datasets with a high preponderance of personal information,” Clegg said, adding that the “vast majority” of the data Meta used for training was publicly available.
He cited LinkedIn as an example of a website whose content Meta deliberately chose not to use because of privacy concerns.
Clegg’s comments come as technology companies, including Meta, OpenAI and Alphabet’s Google, have been criticized for using information obtained from the internet without permission to train their AI models, which ingest massive amounts of data to summarize the information and create images.
Companies are considering how to handle proprietary or copyrighted material that their AI systems may reproduce in the process, while at the same time facing lawsuits from authors who accuse them of copyright infringement.
Meta AI was the lead product among the first consumer AI tools unveiled by CEO Mark Zuckerberg on Wednesday at Meta’s annual product conference, Connect. Conversations about artificial intelligence dominated this year’s event, unlike previous conferences that focused on augmented and virtual reality.
Meta built the assistant using a custom model based on the powerful Llama 2 large language model that the company released for public commercial use in July, the company said.
The assistant can generate text, audio, images, and access real-time information through a partnership with Microsoft’s Bing search engine.
Public Facebook and Instagram posts used to train the Meta AI included text and images, Clegg said.
He said Meta also imposed security restrictions on the content the tool could create, such as prohibiting the creation of realistic images of public figures.
Regarding copyrighted material, Clegg said he expects “a fair amount of litigation” over the question of “whether creative content is covered or not by existing fair use doctrine,” which permits limited use of protected works for purposes such as commentary, research and parody.
“We think it is, but I strongly suspect that’s going to play out in litigation,” Clegg said.
Some companies with image-generation tools make it easy to reproduce iconic characters like Mickey Mouse, while others have paid for the materials they use or deliberately excluded them from training data.
For example, OpenAI signed a six-year deal this summer with content provider Shutterstock to use the company’s photo, video, and music libraries for training.
When asked whether Meta had taken such steps to prevent the reproduction of copyrighted images, a Meta spokesperson pointed to new terms of service that prevent users from creating content that violates privacy and intellectual property rights.