Elon Musk Warns AI Training Data is Exhausted, Calls Synthetic Data Key
Elon Musk acknowledges the depletion of real-world data for AI training and highlights synthetic data as the path forward.
Elon Musk has joined other AI experts in declaring that we’ve run out of quality real-world data to train AI models. Speaking with Stagwell chairman Mark Penn during a livestream on X, Musk said, “We’ve now exhausted the cumulative sum of human knowledge in AI training. That happened last year.”
Musk’s observations align with those of Ilya Sutskever, the former OpenAI chief scientist, who described this phase as “peak data” at the NeurIPS conference. Sutskever predicted that the scarcity of fresh training data would force a shift in how AI models are developed.
Musk suggested synthetic data — AI-generated data — as the future for AI training. He explained, “The only way to supplement [real-world data] is with synthetic data… [AI] will grade itself and self-learn.”
Microsoft, Meta, OpenAI, and Anthropic are among the companies leading this shift. Gartner estimated that 60% of the data used in AI projects in 2024 would be synthetic. For example, Microsoft’s Phi-4 and Google’s Gemma were trained on a mix of real and synthetic data, while Anthropic and Meta also rely heavily on AI-generated data.
Synthetic data is also generally cheaper to generate than real data. AI writing startup Writer created its Palmyra X 004 model for $700,000, well below the $4.6 million that OpenAI’s models reportedly cost. However, challenges persist. Some findings suggest that training on synthetic data can lead to catastrophic failure, in which a model’s outputs degrade over time, losing functionality and becoming more biased.
Despite potential risks, Musk believes synthetic data is essential to AI’s future. He emphasizes the need for innovation in data creation to avoid stagnation in model performance.