Elon Musk says AI has already gobbled up all human-produced data to train itself and now relies on hallucination-prone synthetic data

By Sasha Rogelberg, Reporter
January 10, 2025, 1:24 PM ET
Elon Musk said AI models have already exhausted all human-made data to train themselves. (Photo: Marc Piasecki/Getty Images)
  • Artificial intelligence relies on vast amounts of data to train itself. But Elon Musk says models have already run out of human-created data and have turned to AI-generated information to teach themselves.

AI takes an immense amount of resources—from endless water to an estimated $1 trillion worth of investor dollars—but Elon Musk warned the technology has already run out of its primary training resource: human-created data.

Engineers and data scientists train AI by essentially reducing the entire internet, all books, and every interesting video published into tokens that AI can digest and learn from, Musk told Mark Penn, CEO of marketing company Stagwell, in an interview streamed on X Wednesday. But AI has already consumed that information, and requires even more data to fine-tune itself.

“The cumulative sum of human knowledge has been exhausted in AI training,” Musk said. “That happened basically last year.”

To continue training, AI models now rely on synthetic data, which is itself generated by AI. Musk likened the process to an AI model writing an essay and then grading the essay itself.

Tech giants like Microsoft, Google, and Meta have already turned to synthetic data to train their respective AI models. Google DeepMind used an artificially generated pool of 100 million unique examples to train its system AlphaGeometry to solve complex math problems, “sidestepping the data bottleneck” of human-generated information. In September, OpenAI introduced o1, an AI model that can fact-check itself. 

There are drawbacks to the widespread use of synthetic data for training models, Musk said. Synthetic data usage increases the likelihood of hallucinations, or nonsensical content that AI presents as though it were completely true. Dubbed AI slop, these heaps of incomprehensible or just plain wrong information have already flooded the internet, raising concern among tech experts and users. Nick Clegg, president of global affairs at Meta, said in February the company is working to identify AI-generated content on its platforms.

“As the difference between human and synthetic content gets blurred, people want to know where the boundary lies,” Clegg said in a blog post.

Musk did not respond to Coins2Day’s request for comment.

Scientists agree: Human data is finite

The finiteness of human-produced data to train AI has become a widely accepted issue in the tech community. A study released in June by research group Epoch AI predicted tech companies will run out of publicly available content to train AI language models between 2028 and 2032—a slightly more conservative projection compared to what Musk claims happened last year. The limited training resources could slow the current rate of AI development.

“There is a serious bottleneck here,” Tamay Besiroglu, one of the study’s authors, told the Associated Press. “If you start hitting those constraints about how much data you have, then you can’t really scale up your models efficiently anymore. And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output.”

Human-created information is becoming scarce not just because AI is digesting it all, but also because owners of some of that data are apprehensive about AI using it. The MIT-led Data Provenance Initiative published a study in July finding the once-vast well of data for AI training was drying up. Looking at 14,000 web domains used in data sets for AI training, researchers found the online sources behind some of the data sets were restricting its usage, some by 45%, to keep bots from scraping their data. It’s part of a trend of data owners becoming sensitive to AI using their information, or wanting to be fairly compensated for that usage.
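
For readers curious how a site can keep bots from scraping its data, one common mechanism is a robots.txt file. The article does not say which mechanisms the sites in the study used, so treat the following as an illustrative sketch only. The robots.txt content, the bot name ExampleAIBot, and the helper can_crawl are all hypothetical; the parsing itself uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    # The hypothetical AI crawler is disallowed; other agents fall under "*".
    print("ExampleAIBot allowed:", can_crawl(ROBOTS_TXT, "ExampleAIBot", url))
    print("SomeBrowser allowed:", can_crawl(ROBOTS_TXT, "SomeBrowser", url))
```

A robots.txt rule is a request, not an enforcement mechanism: it only works when the crawler chooses to honor it.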

The future of AI training

Tech companies may no longer be able to rely on human-generated data for AI training, but they aren’t out of options.

“I don’t think anyone is panicking at the large AI companies,” Pablo Villalobos, lead author of the Epoch AI study, said in an interview with science journal Nature. “Or at least they don’t email me if they are.”

Some data scientists have turned not only to synthetic data, but also to private information and to deals with publications for access to their content. OpenAI even reportedly had employees transcribe podcasts and YouTube videos to gather more training data, potentially violating copyright laws, according to the New York Times. OpenAI did not immediately respond to Coins2Day’s request for comment.

Still, synthetic data continues to be the future of AI training. OpenAI CEO Sam Altman told the Sohn Conference Foundation in 2023 that the company would run out of content to feed its models, but suggested that as the production of synthetic data continues to improve, it will help solve the content crisis.

“As long as you can get over the synthetic data event horizon where the model is good enough to create good synthetic data, I think you should be alright,” he said.

About the Author

Sasha Rogelberg is a reporter and former editorial fellow on the news desk at Coins2Day, covering retail and the intersection of business and popular culture.

