Clinical Performance of AI on Real Cases
This is a pretty interesting paper published in the April 30 edition of Science (Peter G. Brodeur et al., Performance of a large language model on the reasoning tasks of a physician. Science 392, 524-527 (2026). DOI: 10.1126/science.adz4433).
https://www.science.org/doi/10.1126/science.adz4433
It discusses how some of the earlier OpenAI models (e.g. o1-preview and GPT-4) performed at generating differential diagnoses, and then looks at how o1 and 4o performed on real-world ED and ICU admissions compared with two Internal Medicine physicians at Beth Israel Deaconess in Boston.
Excerpts from the article:

