Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates

Tuesday, August 27, 2024, 06:00 PM, from The Register
Faster than you can read? More like blink and you'll miss the hallucination
Hot Chips: Inference performance in many modern generative AI workloads is usually a function of memory bandwidth rather than compute. The faster you can shuttle bits in and out of high-bandwidth memory (HBM), the faster the model can generate a response.…
https://go.theregister.com/feed/www.theregister.com/2024/08/27/cerebras_ai_inference/
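A rough way to see why bandwidth matters: in the bandwidth-bound regime, generating each token for a single request means reading every model weight from memory once, so the decode rate is capped at roughly memory bandwidth divided by the model's size in bytes. The sketch below assumes that simplified model. It ignores KV-cache reads, batching, and compute limits. The function name and the inputs (an 8-billion-parameter model at 2 bytes per weight, 3.35 TB/s of bandwidth) are hypothetical illustrations, not figures from the article.

```python
def max_tokens_per_second(bandwidth_bytes_per_s: float, params: float, bytes_per_param: float) -> float:
    """Hypothetical helper: upper bound on single-stream decode rate when every
    generated token must read all model weights from memory exactly once.
    Ignores KV-cache traffic, batching, and compute limits (assumptions)."""
    weight_bytes = params * bytes_per_param  # total bytes streamed per token
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical inputs: 8e9 parameters stored at 16 bits (2 bytes) each,
# served from memory with 3.35e12 bytes/s of bandwidth.
rate = max_tokens_per_second(3.35e12, 8e9, 2)
print(f"{rate:.0f} tokens/s")  # 3.35e12 / 1.6e10 = 209.375, prints "209 tokens/s"
```

Under this model, doubling bandwidth or halving the bytes per weight (for example, by quantizing to 8 bits) doubles the ceiling, which is why memory bandwidth rather than raw compute tends to set the pace for token generation.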
