SIGNAL//SYNTH

Really Big Test-Time Compute in AI Changes Benchmarks, Safety and Research with OpenAI Research Scientist Noam Brown

No Priors: Artificial Intelligence | Technology | Startups

aired Jun 26, 2026 · 36.0m

Signal

83.9 / 100

High signal

confidence 0.99

Orig 53.7

Actn 100.0

Dens 100.0

Dpth 100.0

Clty 60.6

Summary

But with GPT-3, you couldn't scale test-time compute. Like if you gave it a budget of $10 million and said, OK, well, let's see what GPT-3 can do.

Why listen

It goes beyond the title with direct discussion of how test-time compute scales across models, including: "Like if you gave it a budget of $10 million and said, OK, well, let's see what GPT-3 can do."

Key takeaways

01 But with GPT-3, you couldn't scale test-time compute
02 The preparedness frameworks and responsible scaling policies, they don't really account for the amount of test-time compute
03 They just say, OK, well, what's the capability of the model?

Best for

research-minded practitioners comparing model behavior