The race to lead AI has become a desperate hunt for the digital data needed to advance the technology. To obtain it, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law.
The artificial intelligence lab OpenAI had exhausted every reservoir of reputable English-language text on the internet as it developed its latest AI system. So its researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an AI system smarter. Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said.
The companies’ actions illustrate how online information (news stories, fictional works, message board posts, Wikipedia articles, computer programs, photos, podcasts and movie clips) has increasingly become the lifeblood of the booming AI industry. Google and Meta, which have billions of users who produce search queries and social media posts every day, were largely prevented by privacy laws and their own policies from drawing on much of that content for AI.
Google said its AI models “are trained on some YouTube content”, which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. Justine Bateman, a filmmaker, former actress and author of two books, told the Copyright Office that AI models were taking content, including her writing and films, without permission or payment.

‘Scale is all you need’
Researchers often “cleaned” the data by removing hate speech and other unwanted text before using it to train AI models. In 2022, DeepMind, an AI lab owned by Google, went further: it tested 400 AI models, varying the amount of training data and other factors. By late 2021, OpenAI’s supplies of reputable English-language text were depleted, said eight people with knowledge of the company, who are not authorised to speak publicly.
But YouTube prohibits people not only from using its videos for “independent” applications but also from accessing them by “any automated means”. OpenAI employees knew they were wading into a legal gray area, the sources said, but believed that training AI with the videos was fair use. Google itself trained its AI models on some YouTube content, a practice that may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods.
Billions of words sat in people’s Google Docs and other free Google apps. But the company’s privacy restrictions limited how it could use that data, three people with knowledge of Google’s practices said. Two privacy team members said that in August they had pressed managers on whether Google could start using data from free consumer versions of Google Docs, Google Sheets and Google Slides. Mr Bryant said that the privacy policy changes had been made for clarity and that Google did not use information from Google Docs or related apps to train language models “without explicit permission” from users, referring to a voluntary program that allows users to test experimental features.
Some Meta executives debated paying $10 a book for the full licensing rights to new titles. They also discussed buying Simon & Schuster, which publishes authors such as Stephen King, according to the recordings. Meta was also limited by privacy changes it introduced after a 2018 scandal over sharing its users’ data with Cambridge Analytica, a voter-profiling company.
Meta’s executives said OpenAI seemed to have used copyrighted material without permission. It would take Meta too long to negotiate licences with publishers, artists, musicians and the news industry, they said, according to the recordings. At least two employees raised concerns about using intellectual property and not paying authors and other artists fairly or at all, according to the recordings.