SIGNAL//SYNTH

OpenAI’s Karan Singal on HealthBench and the Future of Medical AI

NEJM AI Grand Rounds

aired Jun 17, 2026

Signal

86.8/100

Essential

confidence 0.99

Orig 53.6

Actn 100.0

Dens 100.0

Dpth 100.0

Clty 60.2

Summary

"We got together a team with several amazing colleagues and proposed this Brain moonshot. Large language models had begun to scale."

Why listen

It goes beyond the title with direct discussion of how the project was pitched: "Back then, this was pre-ChatGPT era, predating ChatGPT by more than a year when we were pitching this."

Key takeaways

01 Large language models had begun to scale.
02 "We had results around GPT-3 time, and you'd start to see more than coherent pieces of text from large language models, and you'd also seen the early..."
03 It was unclear whether these models would have capabilities in medicine to begin with, and whether they could be made better in medicine.

Best for

research-minded practitioners comparing model behavior