The Chip Industry's Wake-Up Call: Shifting from GPUs to Purpose-Built AI Chips

In a recent video, Steeve Morin, the Founder & CEO of ZML, shed light on a critical shift in the chip industry: the transition from general-purpose GPUs to purpose-built AI chips. This transition is driven in part by the rise of edge AI, which runs AI workloads directly on devices rather than in the cloud. Moving computation to the edge cuts latency, enables real-time decision-making, and strengthens security, since sensitive data no longer has to leave the device for processing.

As of 2025, AI chip sales are projected to exceed $50 billion, representing 8.5% of total expected chip sales. Long-term forecasts are even more striking, with the potential for AI chips to achieve $400 billion in sales by 2027. This massive growth signals a profound transformation in how AI will be powered in the future.

Below, we dive deeper into some fascinating insights shared by Steeve Morin, supported by key time codes from the video.

Inference vs. Training: A Shift in Market Dynamics

One of the key trends identified in the video is the dramatic shift in the AI chip market from training to inference. While training AI models requires immense computational power, inference—the process of using trained models to make predictions—has become the dominant use case for AI chips.

In five years, inference is predicted to account for 95% of the AI chip market, with training making up the remaining 5%. This shift reflects the growing demand for AI models to be deployed at scale for real-time decision-making, a crucial aspect of AI adoption in industries ranging from healthcare to finance and beyond. [35:05]

Efficiency Gains with Hardware

The video also discusses the significant potential for hardware optimizations to drive efficiency in AI model training and inference.

  • Switching from NVIDIA to AMD for running a 70B model can yield roughly a 4x improvement in cost efficiency. This illustrates the importance of choosing the right hardware, especially as companies look to cut costs while scaling up AI adoption; the sketch after this list makes the arithmetic concrete. [03:45]

  • NVIDIA, the dominant player in the GPU market, operates at a 90% margin, while its manufacturing partner TSMC sells at roughly a 60% margin. The gap underscores how lucrative the AI chip market is for chip designers like NVIDIA. [15:28]
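
To see how a cost-efficiency ratio like that 4x is computed, here is a minimal sketch of the arithmetic: dollars per million tokens is hourly hardware cost divided by tokens generated per hour. All prices and throughput numbers below are hypothetical placeholders, not figures from the video.

```python
# Minimal cost-per-token arithmetic for serving a 70B model.
# All hourly prices and throughputs are hypothetical placeholders;
# only the shape of the calculation is the point.

def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Dollars spent per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical figures for an 8-GPU node running a 70B model.
nvidia = cost_per_million_tokens(hourly_price_usd=32.0, tokens_per_second=2400)
amd = cost_per_million_tokens(hourly_price_usd=16.0, tokens_per_second=4800)

print(f"NVIDIA node: ${nvidia:.2f} per 1M tokens")
print(f"AMD node:    ${amd:.2f} per 1M tokens")
print(f"Cost-efficiency ratio: {nvidia / amd:.1f}x")  # 4.0x with these inputs
```

With these placeholder inputs the ratio comes out to exactly 4x; the real ratio depends on actual prices and measured throughput.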

GPU Performance and Limitations

Steeve Morin also points out the limitations of current GPUs, especially when it comes to their performance in AI tasks.

  • The H100 is only about twice as fast as the A100 at inference, despite costing five times more. While newer GPUs do bring improvements, the cost-to-performance ratio may not justify the premium for certain use cases. [07:23]

  • Interestingly, doubling the number of GPUs in a setup might yield only about a 10% improvement in inference performance. This diminishing return is a key challenge as AI models continue to grow in complexity; the toy scaling model below illustrates the effect. [45:19]
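
Why would doubling the hardware buy so little? One common explanation is that inter-GPU communication grows with device count and eats into the added compute. Below is a toy Amdahl-style model of that effect; the overhead constant is a made-up value chosen to reproduce a roughly 10% gain from doubling, not a figure from the video.

```python
# Toy Amdahl-style scaling model: useful compute grows with GPU count,
# but synchronization cost grows too, so throughput saturates.
# c = 0.36 is a made-up overhead constant chosen so that going from
# 8 to 16 GPUs yields roughly a 10% gain; it is illustrative only.

def relative_throughput(n_gpus: int, c: float = 0.36) -> float:
    """Throughput relative to one GPU under a simple overhead model."""
    return n_gpus / (1 + c * (n_gpus - 1))

base = relative_throughput(8)
doubled = relative_throughput(16)
print(f"8 GPUs:  {base:.2f}x a single GPU")
print(f"16 GPUs: {doubled:.2f}x a single GPU")
print(f"Gain from doubling: {(doubled / base - 1) * 100:.0f}%")  # ~10%
```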

Memory and Model Capacity: Scaling Up for Bigger Models

AI models are rapidly increasing in size, requiring larger memory and more powerful hardware to handle them.

  • With just eight H100 GPUs, it's possible to run two 70B models, highlighting the need for high-memory, high-performance systems to keep pace with the demands of state-of-the-art AI models. [45:12]

  • Groq's current generation of AI chips features 230 MB of SRAM per chip; keeping model data in on-chip SRAM gives far faster memory access than off-chip DRAM for demanding AI tasks. [28:23]

  • Cerebras, known for its innovative wafer-scale engine, provides 44 GB of on-chip SRAM, which is essential for running very large models efficiently. [28:40]

  • A 70B model in BF16 requires an immense 140 GB of memory (70 billion parameters at 2 bytes each), underlining the importance of memory capacity as models scale to new heights; the back-of-envelope sketch after this list ties these figures together. [28:29]
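
The memory figures above all follow from the same rule: weight memory equals parameter count times bytes per parameter, with BF16 taking 2 bytes each. A quick sketch, assuming the common 80 GB H100 variant (the per-GPU capacity is not stated in the video):

```python
# Back-of-envelope memory math: weight memory = params * bytes per param.
# BF16 stores each parameter in 2 bytes.

GB = 10**9

def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for model weights alone, in GB."""
    return n_params * bytes_per_param / GB

model_70b = weight_memory_gb(70e9)
print(f"70B model in BF16: {model_70b:.0f} GB")  # 140 GB

# Eight H100s at an assumed 80 GB each give 640 GB of HBM. Two 70B
# models take 280 GB of weights, leaving the rest for KV cache,
# activations, and framework overhead.
hbm_total = 8 * 80
print(f"8x H100: {hbm_total} GB HBM; two 70B models need {2 * model_70b:.0f} GB of weights")

# On-chip SRAM for comparison: at 230 MB per Groq chip, the weights of
# a 70B model alone would span hundreds of chips.
groq_chips = model_70b * GB / (230 * 10**6)
print(f"Groq chips to hold 70B weights: {groq_chips:.0f}")  # ~609
```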

NVIDIA's Market Position: Dominance and Profitability

Despite the rising competition, NVIDIA remains a powerful player in the AI chip market:

  • Currently, 40% of NVIDIA's revenue comes from inference-related sales. This speaks to the growing dominance of inference workloads in the AI chip market and highlights the company's strong foothold in the industry. [25:00]

  • As mentioned earlier, NVIDIA operates at a 90% margin, showing the high profitability of its business model. This margin reflects the premium pricing of NVIDIA’s GPUs, particularly the A100 and H100 models, which are heavily utilized for AI training and inference tasks. [16:12]

Switching Costs and Market Entry: Overcoming Barriers to Change

While there are clear opportunities for more efficient hardware solutions, Steeve Morin emphasizes the challenge of switching providers in the chip industry.

  • Even if a new chip provider is seven times better on a specific metric, that may not be enough to convince companies to make the switch. The costs and risks of transitioning to new hardware and retraining models can deter even the most innovative companies from adopting alternative solutions. [40:54]

DeepSeek's Cluster: A New Approach to GPU Deployment

Finally, Steeve Morin shares insights into DeepSeek's approach to scaling GPU infrastructure:

  • DeepSeek has adopted a distinctive strategy, organizing its GPU resources into four groups of 25,000 GPUs rather than a single cluster of 100,000. This modular approach allows more efficient scaling and better resource management, making it a notable example of how AI companies are optimizing their infrastructure; the sketch below illustrates the idea. [53:07]
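
As a loose illustration of that layout, the sketch below partitions a fleet into equally sized groups. The 4 × 25,000 split comes from the video; treating each group as an independently scheduled failure domain is an assumption made here for the sketch.

```python
# Sketch of a fleet split into independent groups rather than one giant
# cluster. The 4 x 25,000 layout is from the video; treating each group
# as its own scheduling and failure domain is an assumption of this sketch.

from dataclasses import dataclass, field

@dataclass
class GpuGroup:
    name: str
    n_gpus: int
    jobs: list[str] = field(default_factory=list)  # jobs stay inside one group

def build_fleet(total_gpus: int = 100_000, n_groups: int = 4) -> list[GpuGroup]:
    """Partition a fleet into equally sized, independently managed groups."""
    per_group = total_gpus // n_groups
    return [GpuGroup(name=f"group-{i}", n_gpus=per_group) for i in range(n_groups)]

for group in build_fleet():
    print(group.name, group.n_gpus)  # four groups of 25,000 GPUs each
```

Under this reading, a failure or maintenance window in one group leaves three quarters of the fleet untouched, which is one plausible way the modular layout aids resource management.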

Conclusion

The chip industry is on the cusp of a major transformation as it shifts away from traditional GPU architectures and embraces purpose-built AI chips. With the rise of edge AI, AI chip sales are poised for significant growth, with projections indicating a surge from $50 billion in 2025 to $400 billion by 2027.

As inference continues to dominate the AI chip market, companies will need to embrace new hardware solutions that offer better efficiency, performance, and scalability. The competition is intensifying, and companies like NVIDIA, AMD, Groq, and Cerebras are at the forefront of this evolution, helping shape the future of AI processing.

Steeve Morin’s insights highlight the challenges and opportunities ahead, emphasizing the importance of staying ahead of the curve in an industry that is rapidly evolving to meet the needs of artificial intelligence at scale.