AWS and Cerebras team up to supercharge AI inference
AWS is partnering with Cerebras on a new AI inference service that is set to launch soon.
By combining Cerebras's chips with AWS's Trainium3, they're aiming to make AI apps run faster and more efficiently, especially for things like chatbots and coding tools.
The system splits tasks for better performance
Instead of making one chip do all the work, the system splits inference into its two phases: Trainium3 handles the "prefill" (processing the incoming prompt in one parallel pass to set up the model's state), while Cerebras chips take care of the "decode" (generating the response one token at a time).
The hand-off between the two stages runs over custom networking designed to keep responses fast.
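To make the split concrete, here is a minimal sketch of disaggregated prefill/decode serving. Every class and function name below is a hypothetical illustration of the general pattern, not an actual AWS or Cerebras API, and the toy token logic stands in for real model forward passes.

```python
# Hypothetical sketch of disaggregated prefill/decode inference.
# Names are illustrative only, not AWS or Cerebras APIs.

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Per-request state built during prefill and extended during decode."""
    entries: list = field(default_factory=list)


class PrefillAccelerator:
    """Stand-in for a Trainium3-style chip: ingests the whole prompt
    in one pass and populates the cache."""

    def prefill(self, prompt_tokens: list) -> KVCache:
        cache = KVCache()
        for tok in prompt_tokens:
            # Real hardware attends over all prompt tokens at once;
            # here we just record them to model the cache hand-off.
            cache.entries.append(tok)
        return cache


class DecodeAccelerator:
    """Stand-in for a Cerebras CS-3: emits output tokens one at a time,
    reading the cache produced by the prefill stage."""

    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        output = []
        for _ in range(max_new_tokens):
            # Toy next-token rule; a real model runs a forward pass per step.
            next_tok = (sum(cache.entries) + len(output)) % 50_000
            output.append(next_tok)
            cache.entries.append(next_tok)  # decode extends the cache
        return output


def serve(prompt_tokens: list, max_new_tokens: int = 8) -> list:
    """Route one request through both stages, mimicking the article's
    split: prefill on one chip, decode on another."""
    cache = PrefillAccelerator().prefill(prompt_tokens)
    return DecodeAccelerator().decode(cache, max_new_tokens)


if __name__ == "__main__":
    print(serve([101, 2023, 2003, 1037, 3231, 102]))
```

The design point the pairing exploits: prefill is compute-heavy and parallel, while decode is sequential and latency-sensitive, so routing each phase to hardware suited for it can beat running both on one chip.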
Cerebras's chips are already outperforming NVIDIA's GPUs
Cerebras's WSE-3 chip is seriously powerful: it packs 4 trillion transistors and 900,000 AI cores, and a CS-3 system can process up to 1,800 tokens per second on Llama 3.1 8B, roughly 20 times faster than comparable NVIDIA GPU-based solutions on some models.
Meanwhile, AWS's Trainium3 takes on the compute-heavy prefill stage, a role it is designed to handle at high performance.
Amazon says Trainium3, and the future Trainium4, are expected to lead in price-performance versus merchant GPUs.