Research at the Speed of AI: What Works, What Breaks, and What Still Needs You
by Frank Schilder | at Minnebar20
Karpathy's AutoResearch ran 700 experiments while he slept. It works because the objective is locked: one metric, lower is better, don't touch the evaluation function. Most research doesn't start there. Deciding what to measure, whether your benchmark is even valid, what a result means: that's the part that still requires a human. This talk is about that part.
I built a stateful AI research assistant to help me run an enterprise knowledge graph (KG) project built around one question: can organizational KG signals (reporting chains, projects, meeting attendance) help LLM agents resolve ambiguous people references in enterprise conversations? Think "set up a meeting with the data scientist on the Atlas project" when three people hold that role, two of whom changed positions last quarter. We built a synthetic org, generated a benchmark, and ran experiments across three models and six retrieval conditions.
Here's what I learned the hard way.
The assistant has knowledge boundaries it won't volunteer. Infrastructure failures can corrupt results silently. Design flaws in your data go unquestioned if you don't ask. Confident-sounding write-ups can outrun what the numbers actually support. And fast-generated code works until you need to defend it.
These aren't reasons to stop. They're reasons to develop practices: specify behavior before you ask for implementation, verify outputs against raw sources rather than summaries, make "why did you do it this way?" a standing question, and treat the first draft narrative as a starting point, not a conclusion.
I'll walk through the failure modes concretely, show what the collaboration actually looked like, and share what moved the needle on both speed and rigor.
Format: Story + examples + Q&A
Frank Schilder
Frank Schilder is a senior principal scientist at Thomson Reuters Labs, specializing in AI and machine learning. He holds a Ph.D. in Cognitive Science from the University of Edinburgh and previously taught at the University of Hamburg, Germany.