ThursdAI - The top AI news from the past week

AGI is here? Jensen says yes, ARC-AGI-3 says AI scores under 1%

Mar 27, 2026
Daniel Han, open-source ML engineer and Unsloth contributor who builds tools for fine-tuning and quantization, joins to unpack model compression and deployment. He explains weight quantization, KV cache trade-offs, and the real-world limits of TurboQuant. The conversation also covers Gemini 3.1 Flash Live, new voice and music models, and the week's infrastructure and quota dramas in the AI world.
ADVICE

Use Subconversations For Fast Follow-Ups While Agents Run

  • Use 'slash btw' or similar sub-conversation features to ask quick follow-up questions without triggering tool calls, getting fast answers while long tasks run.
  • Alex described OpenClaw/OpenCall's slash btw, which spawns a lightweight side chat for quick follow-ups while a long-running task continues.
INSIGHT

Agents Tend To Inflate Code With Irrelevant Defensive Logic

  • Agentic code generation accumulates defensive, irrelevant checks over time, making the generated code brittle and bloated.
  • Yam and Mario Zichner's 'Slowing The Fuck Down' visualization showed agents adding layer upon layer of checks that degrade long-term code quality.
ANECDOTE

OpenAI Sunsets Sora To Reallocate GPUs Toward Core Models

  • OpenAI shut down the Sora app, its API, and the $1B Disney deal to refocus compute on core models like Codex and GPT 5.4.
  • Alex explained the reasoning as GPU costs and a focus on AGI-aligned priorities, not a moral judgment about content.