We might be entering the era of diffusion LLMs! Google's new model hints at it
Google's new model, "Cheetah," achieves about 400 tokens per second. And judging by its ability to restructure messy but functional code into a cleanly structured version, something GPT-5 High repeatedly failed at, it's likely a diffusion model rather than a conventional autoregressive one.