SIGNAL//SYNTH
AI

Really Big Test-Time Compute in AI Changes Benchmarks, Safety and Research with OpenAI Research Scientist Noam Brown

aired Jun 26, 2026 · 36.0m
Signal
83.9 / 100
High signal
confidence 0.99
Orig 53.7
Actn 100.0
Dens 100.0
Dpth 100.0
Clty 60.6
Summary

But GPT-3, you couldn't scale test-time compute. Like if you gave it a budget of $10 million and said, OK, well, let's see what GPT-3 can do.

Why listen

It goes beyond the title with direct discussion of how much test-time compute earlier models could actually use, including: "Like if you gave it a budget of $10 million and said, OK, well, let's see what GPT-3 can do."

Key takeaways
  1. But GPT-3, you couldn't scale test-time compute
  2. The preparedness frameworks and responsible scaling policies, they don't really account for the amount of test-time compute
  3. They just say, OK, well, what's the capability of the model?
Best for
research-minded practitioners comparing model behavior