Andrew White | EdisonScientific | GTC 26
Building, Measuring, and Using AI Scientists
Andrew White
EdisonScientific
GTC[S81694]
March 2026
Science is changing independently of AI
arXiv.org; doi:10.6084/m9.figshare.17064419.v3
Number of Researchers Is Growing
International R&D spending
PhD Researchers
NSF - https://ncses.nsf.gov/pubs/nsf24332; UNESCO UISI SDG9
Intellectual bottlenecks are growing
📝 Increasing paper count ($\approx$10M per year)
🧬 Larger data sets from cheaper experiments (genome at $200 per person, $1/GB of sequencing)
🔍 95% decline in disruptive papers since 1980
Park, M. et al. Nature 613, 138-144 (2023); Scannell, J.W. et al. Nat. Rev. Drug Discov. 11, 191–200 (2012); Deloitte 2025: Pharma innovation returns.
Building an AI Scientist
What is an AI Scientist?
An AI Scientist is a system whose input is a general direction of discovery and whose output is experimental results, analysis, and a paper describing a novel discovery.
| System | Components (capability) |
| --- | --- |
| LLM | model (knowledge, text generation) |
| Agent | model + tools + task (can take actions) |
| Co-Scientist | agent + conversation + long-running (human-in-the-loop) |
| AI Scientist | autonomous + long-running + can execute experiments (novel discoveries) |

| System | Examples |
| --- | --- |
| LLM | GPT-5.2¹, GLM 4.5² |
| Agent | Deep Research³, ChemCrow⁴ |
| Co-Scientist | Google Co-Scientist⁵, Biomni⁶ |
| AI Scientist | Kosmos⁷ |

¹ OpenAI, 2025. ² Zhipu AI, 2025. ³ OpenAI, 2025; arXiv:2312.07559. ⁴ Bran et al., Nat. Mach. Intell., 2024. ⁵ Gottweis & Natarajan, Google Research, 2025. ⁶ biomni.stanford.edu. ⁷ arXiv:2511.02824, 2025.
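The Agent row above (model + tools + task) can be read as a small data structure with a loop. What follows is a minimal sketch of that reading, not the design of any system cited here: `Agent`, `toy_model`, and the `word_count` tool are hypothetical names, and `toy_model` is a hard-coded stand-in for a real LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# An action is (tool_name, argument); the name "finish" ends the episode.
Action = Tuple[str, str]
# A "model" is any function from the transcript so far to the next action.
Model = Callable[[List[str]], Action]

TASK_PREFIX = "task: "


@dataclass
class Agent:
    model: Model                              # knowledge / text generation
    tools: Dict[str, Callable[[str], str]]    # actions the model may take
    task: str                                 # what the agent is asked to do
    transcript: List[str] = field(default_factory=list)

    def run(self, max_steps: int = 5) -> str:
        self.transcript = [TASK_PREFIX + self.task]
        for _ in range(max_steps):
            name, arg = self.model(self.transcript)
            if name == "finish":
                return arg
            observation = self.tools[name](arg)
            self.transcript.append(f"{name}({arg!r}) -> {observation}")
        return "no answer within step budget"


def toy_model(transcript: List[str]) -> Action:
    """Hypothetical stand-in for an LLM: call one tool, then report its result."""
    if len(transcript) == 1:
        return ("word_count", transcript[0][len(TASK_PREFIX):])
    return ("finish", transcript[-1].split(" -> ")[-1])


agent = Agent(
    model=toy_model,
    tools={"word_count": lambda text: str(len(text.split()))},
    task="count the words in this task",
)
answer = agent.run()
print(answer)  # 6
```

In this framing, the later rows of the table add properties around the same loop (conversation with a human, long running time, the ability to execute experiments) rather than a different core structure.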
Benchmarking an AI Scientist
Devil facial tumor disease (DFTD) is a disease that is decimating the population of Tasmanian devils. The disease passes from one animal to another through bites and is caused by parasites. The parasites cause cancerous tumors that spread throughout an infected animal's body and kill it. What is the best description of DFTD?
A frameshift mutation is created when
Approximately what percentage of Drosophila with a H3.3K36R mutation finish developing and eclose?
In a bioinformatics lab, Watterson's estimator (θ) and π (nucleotide diversity) will be calculated from variant call files. Will these calculations be biased if we are aiming to measure diversity of a whole population?
Deletion of which residues from C. elegans protein COSA-1 would most likely affect the ability of COSA-1 to recruit MSH5 and ZHP3?
Residues 31–40
Training Agents
RL expands capabilities beyond what supervised data can teach
Computational checks (code execution, tool outputs) produce objective reward signals — no human feedback needed
BixBench: verifiable bioinformatics tasks as RL training signal
Swanson et al., BixBench, 2025; Narayanan et al., Aviary, 2024; NVIDIA Technical Blog
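One way to picture a computational check as a reward signal is a function that scores an agent's final answer against a stored reference, with no human in the loop. The sketch below assumes a hypothetical `verifiable_reward` helper and made-up answers; BixBench and Aviary define their own graders, which this does not reproduce.

```python
import math


def verifiable_reward(agent_answer: str, reference: str, rel_tol: float = 1e-6) -> float:
    """Return 1.0 if the answer matches the reference, else 0.0.

    Numeric answers are compared with a tolerance; anything else is
    compared as case-insensitive, whitespace-trimmed text.
    """
    a, r = agent_answer.strip(), reference.strip()
    try:
        return float(math.isclose(float(a), float(r), rel_tol=rel_tol))
    except ValueError:
        # At least one side is not a number: fall back to text comparison.
        return float(a.lower() == r.lower())


# Hypothetical episode outputs scored against hypothetical references.
rewards = [
    verifiable_reward("7", "7.0"),     # numeric match
    verifiable_reward(" Yes ", "yes"),  # text match after normalization
    verifiable_reward("12", "7"),      # numeric mismatch
]
print(rewards)  # [1.0, 1.0, 0.0]
```

Because the score depends only on the answer and the reference, the same function can be called on every rollout during RL training.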
How many peripheral immune cell types show significant differential expression (adjusted p-value < 0.05) of SOCS3?
Narayanan et al., Aviary: training language agents on challenging scientific tasks, 2024
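A question like the SOCS3 one is verifiable because its answer is a single integer that falls out of a computation. The sketch below shows what that computation could look like, under two loud assumptions: the adjustment is Benjamini-Hochberg (the slide does not say which method is used), and the p-values and cell-type names are invented for illustration.

```python
from typing import Dict


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values keyed like the input."""
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])  # ascending p
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


# Invented raw p-values for one gene across hypothetical cell types.
raw = {
    "CD4 T": 0.001,
    "CD8 T": 0.012,
    "B": 0.030,
    "NK": 0.200,
    "Monocyte": 0.004,
}
adj = benjamini_hochberg(raw)
n_significant = sum(1 for p in adj.values() if p < 0.05)
print(n_significant)  # 4
```

The integer printed at the end is the kind of output an automated grader can compare directly against a reference answer.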
Wide: mechanism to target to drug
ROBIN: A Multi-Agent System for Automating Scientific Discovery
Ali Essam Ghareeb*, Benjamin Chang*, Ludovico Mitchener, Angela Yiu, Caralyn J. Szostkiewicz, Jon M. Laurent, Muhammed T. Razzak, Andrew D. White†, Michaela M. Hinks‡, Samuel G. Rodriques
Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C Landsness, Daniel L Barabasi, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha Foiani, Aizad Kamal, Leah P Shriver, Fang Cao, Asmamaw T Wassie, Jon M Laurent, Edwin Melville-Green, Mayk Caldas, Albert Bou, Kaleigh F Roberts, Sladjana Zagorac, Timothy C Orr, Miranda E Orr, Kevin J Zwezdaryk, Ali E Ghareeb, Laurie McCoy, Bruna Gomes, Euan A Ashley, Karen E Duff, Tonio Buonassisi, Tom Rainforth, Randall J Bateman, Michael Skarlinski, Samuel G Rodriques, Michaela M Hinks, Andrew D White
arXiv:2511.02824, 2025
How do you validate systems like this?
Work with external groups. Input is their experimental data. Three discoveries reproduced findings from unpublished work. Four discoveries were novel.
Kosmos overview
Independent expert annotation of task difficulty and correctness
Example Discovery: What Kosmos found
Human validation
Kosmos Scale