LBNL

Automating Scientific Discovery

Andrew White

FutureHouse
August 2025

FutureHouse Structure

Non-profit founded in 2023
Funded primarily by Eric Schmidt
Based in San Francisco
25 employees

Science is changing independent of AI

arXiv.org, doi:10.6084/m9.figshare.17064419.v3

Intellectual bottlenecks are growing

📝 Increasing paper count ($\approx$5M per year)

🧬 Larger data sets from cheaper experiments (genome at $200 per person, $1 / GB of sequencing)

🔍 95% decline in disruptive papers since 1980

Park, M. et al. Nature 613, 138-144 (2023); Scannell, J.W. et al. Nat. Rev. Drug Discov. 11, 191–200 (2012); Deloitte 2025: Pharma innovation returns.

FutureHouse Mission

Accelerate Scientific Discovery

What is an agent?

Agent: trained, makes decisions

Environment: untrained, has tools, state

Protein Design Environment

Protein design with 5 existing deep learning models
Molecular dynamics, bioinformatics, literature research agent
Input: "design 92 binders for PD-L1"

Wet lab validation

Aviary Code

First message to agent

Aviary Code

Create tools

Aviary Code

Use environment
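The three steps named on these slides (first message to the agent, creating tools, using the environment) can be illustrated with a minimal, self-contained sketch. This is not the Aviary API: every name here (`ToyEnvironment`, `search_literature`, `submit`, `scripted_agent`, `run_episode`) is hypothetical, the tools return canned strings, and the "agent" is a fixed script standing in for a trained policy.

```python
from dataclasses import dataclass, field
from typing import Callable


def search_literature(query: str) -> str:
    """Hypothetical tool: a stub that returns a canned observation."""
    return f"(stub) 3 papers found for '{query}'"


def submit(answer: str) -> str:
    """Hypothetical tool: records the final answer and ends the episode."""
    return f"Submitted: {answer}"


@dataclass
class ToyEnvironment:
    """Untrained side of the loop: owns the task, the tools, and the state."""
    task: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    state: dict = field(default_factory=dict)

    def reset(self) -> str:
        """Clear the state and return the first message shown to the agent."""
        self.state = {"calls": [], "done": False}
        return f"Task: {self.task}. Tools: {', '.join(sorted(self.tools))}"

    def step(self, tool_name: str, argument: str) -> str:
        """Run the tool call chosen by the agent and return the observation."""
        if tool_name not in self.tools:
            return f"Unknown tool: {tool_name}"
        observation = self.tools[tool_name](argument)
        self.state["calls"].append(tool_name)
        if tool_name == "submit":
            self.state["done"] = True
        return observation


def scripted_agent(observation: str) -> tuple[str, str]:
    """Stand-in for a trained agent: maps the latest observation to a tool call."""
    if observation.startswith("Task:"):
        return "search_literature", "PD-L1 binders"
    return "submit", "candidate design list"


def run_episode(env: ToyEnvironment, agent, max_steps: int = 5) -> list[str]:
    """Alternate agent decisions and environment steps until done."""
    observation = env.reset()
    transcript = [observation]
    for _ in range(max_steps):
        tool_name, argument = agent(observation)
        observation = env.step(tool_name, argument)
        transcript.append(observation)
        if env.state["done"]:
            break
    return transcript


env = ToyEnvironment(
    task="design binders for PD-L1",
    tools={"search_literature": search_literature, "submit": submit},
)
transcript = run_episode(env, scripted_agent)
for message in transcript:
    print(message)
```

The split mirrors the earlier definition: the decision-making part (`scripted_agent`) is the only piece that would be trained, while the environment holds the tools and state and needs no training.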

Learning vs Frontier Models

Crows

Agent	Environment	Key Tools
Crow/Falcon	Literature research	Search, citation traversal
ProteinCrow	Designing novel proteins	AlphaFold2, molecular dynamics
ChemCrow/Phoenix	Designing new molecules	Retrosynthesis, self-driving robotic lab
Data Analysis Crow	Generating discoveries from data	Bioinformatics tools, code, file system

Agent vs ML Model

Modify surface residues of IL-10 to increase expression and solubility in E. coli without disrupting dimerization or receptor interaction.


Automating research of scientific literature

Language agents achieve superhuman synthesis of scientific knowledge

Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, Andrew D. White. arXiv:2409.13740, 2024

Better at answering questions than PhD biology experts

Improving over time

Better than human-written Wikipedia articles

FutureHouse Platform

Free, with rate limits
API - can be incorporated into your pipeline/agents
Majority of code is open source

API

Literature Research Agent Scale

Tasks per minute: 150
Research papers: 100,000,000
Wiki page for all diseases: every 14 hours
All arXiv papers: per week (25,000 papers / month)
Check for contradictions: 6.3M papers / year
All Wikipedia: every 6 weeks
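To put the stated rate in perspective, the short calculation below converts 150 tasks per minute into task counts over the time windows mentioned above. It is only arithmetic on the slide's figure, under the assumption of a constant rate with no downtime; the slide does not say how many tasks a single wiki page or paper check consumes, so the results are task budgets and not page counts. The function name `tasks_completed` is illustrative.

```python
TASKS_PER_MINUTE = 150  # rate stated on the slide


def tasks_completed(hours: float, rate_per_minute: int = TASKS_PER_MINUTE) -> int:
    """Number of tasks finished in `hours`, assuming a constant rate."""
    return int(hours * 60 * rate_per_minute)


per_14_hours = tasks_completed(14)             # window quoted for disease wiki pages
per_day = tasks_completed(24)
per_six_weeks = tasks_completed(6 * 7 * 24)    # window quoted for all of Wikipedia

print(f"14 hours: {per_14_hours:,} tasks")
print(f"1 day:    {per_day:,} tasks")
print(f"6 weeks:  {per_six_weeks:,} tasks")
```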

Model intelligence will continue to increase

Complete cycle of disease to mechanism to target to drug

ROBIN: A Multi-Agent System for Automating Scientific Discovery

Ali Essam Ghareeb*, Benjamin Chang*, Ludovico Mitchener, Angela Yiu, Caralyn J. Szostkiewicz, Jon M. Laurent, Muhammed T. Razzak, Andrew D. White†, Michaela M. Hinks‡, Samuel G. Rodriques

Scientific Reasoning Models

Training a Scientific Reasoning Model for Chemistry

Siddharth M. Narayanan, James D. Braza, Ryan-Rhys Griffiths, Albert Bou, Geemi Wellawatte, Mayk Caldas Ramos, Ludovico Mitchener, Samuel G. Rodriques, Andrew D. White. arXiv:2506.17238, 2025

Improving Models

Pretraining	Large Data, Large Compute
Scaffolding	Domain knowledge
Reasoning	Domain knowledge, small data, small compute

Reasoning scaling

Can we build scientific reasoning models?

Chemistry reasoning model

Works with molecular structures, but reasons in English

Start from base LLM and teach it chemistry

What can a reasoning model do?

Q: Propose a 1-step synthesis path that uses only commercially available reagents

Q: Propose a modification to this molecule to increase its solubility by about 1 LogS unit without affecting its scaffold.
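For scale, assuming LogS here denotes the base-10 logarithm of aqueous solubility in mol/L (the usual convention, though the slide does not define it), a change of about 1 LogS unit corresponds to roughly a tenfold increase in solubility:

```latex
\Delta \log S = \log_{10} S_{\text{new}} - \log_{10} S_{\text{old}} \approx 1
\quad\Longrightarrow\quad
\frac{S_{\text{new}}}{S_{\text{old}}} \approx 10^{1} = 10
```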

Data

Task	Subtasks	Examples	Verifier	Templates	Data source name
functional group	1	74562	code	6	ChEMBL
organism molecular formula	1	74164	molecule comparison	10	COCONUT
IUPAC name	1	74994	code	10	COCONUT
SMILES completion	1	74990	code	10	COCONUT
solubility edit	3	115977	ML model, code	15	ChEMBL
scent	180	4240	multiple choice	8	pyFUME
reaction prediction	1	61205	molecule comparison	10	ORD
retrosynthesis	1	67252	ML model, database	8	mcule
BBB permeability	2	2064	multiple choice	8	BBB
pKa	4	336	multiple choice	8	IUPAC
safety	11	5687	multiple choice	8	PubChem
molecular formula	1	18738	code	10	COCONUT
ADME	12	1030	multiple choice	8	Fang ADME
LD50	2	342	multiple choice	8	PubChem
Human receptor binding	150	1663	multiple choice	8	EveBio
property-regression-solubility	2	464	multiple choice	8	AqSolDB
property-regression-photo	1	23	multiple choice	8	Photoswitches
Total	374	577790	8	81*	12
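Many rows in the table list "multiple choice" as the verifier. A minimal sketch of what such a verifier could look like is given below. The source does not show its implementation, so the details are assumptions: the `<answer>` tag format, the function name `multiple_choice_reward`, and the binary 0/1 reward are all illustrative.

```python
import re

# Assumed output convention: the model wraps its final choice in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>\s*([A-Za-z])\s*</answer>")


def multiple_choice_reward(completion: str, correct_choice: str) -> float:
    """Return 1.0 if the completion contains exactly one tagged answer
    and it matches the correct choice (case-insensitive), else 0.0."""
    matches = ANSWER_PATTERN.findall(completion)
    if len(matches) != 1:
        # Missing or ambiguous answers earn no reward.
        return 0.0
    return 1.0 if matches[0].upper() == correct_choice.upper() else 0.0


example = "The polar groups suggest low permeability. <answer>B</answer>"
print(multiple_choice_reward(example, "B"))
```

Requiring exactly one tagged answer is a design choice in this sketch: it prevents a completion from collecting reward by listing every option.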

Training Stages

Can learn from zero accuracy

Results vs humans and frontier models

Reasoning behavior

Ablation

More data efficient

Ablation

Questions