Research Note: Cerebras Inference Service

Cerebras Inference

Cerebras Systems recently introduced Cerebras Inference, an AI inference service that the company positions as both fast and low-cost. The service reaches 1,800 tokens per second on Meta’s Llama 3.1 8B model and 450 tokens per second on the 70B model, which Cerebras says makes it 20 times faster than NVIDIA GPU-based alternatives.
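
To put the quoted throughput figures in concrete terms, the following minimal Python sketch converts tokens per second into the time needed to generate a response, and derives the GPU baseline that the 20x claim would imply. Only the three numbers (1,800, 450, and 20) come from the note. The model labels, function names, and the 1,000-token response length are hypothetical, and the sketch assumes that the 20x figure applies equally to both models and that throughput is constant, ignoring time to first token and network latency.

```python
# Throughput figures quoted in the note, in tokens per second.
# The dictionary keys are illustrative labels, not official model identifiers.
CEREBRAS_TOKENS_PER_SEC = {
    "llama-3.1-8b": 1800.0,
    "llama-3.1-70b": 450.0,
}

# Speedup over GPU-based alternatives, as claimed by Cerebras.
CLAIMED_SPEEDUP = 20.0


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to emit num_tokens at a constant output rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


def implied_baseline_tokens_per_sec(
    tokens_per_sec: float, speedup: float = CLAIMED_SPEEDUP
) -> float:
    """Baseline throughput implied by taking the claimed speedup at face value."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return tokens_per_sec / speedup


if __name__ == "__main__":
    response_tokens = 1000  # assumed response length, for illustration only
    for model, tps in CEREBRAS_TOKENS_PER_SEC.items():
        fast = generation_seconds(response_tokens, tps)
        baseline_tps = implied_baseline_tokens_per_sec(tps)
        slow = generation_seconds(response_tokens, baseline_tps)
        print(
            f"{model}: {fast:.2f} s at {tps:.0f} tok/s, "
            f"{slow:.2f} s at implied baseline {baseline_tps:.1f} tok/s"
        )
```

Under these assumptions, a 1,000-token response would take roughly 0.56 seconds on the 8B model and 2.22 seconds on the 70B model, and the 20x claim would correspond to baselines of 90 and 22.5 tokens per second respectively. These baselines are derived from the claim, not measured figures.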