Research at the Speed of AI: What Works, What Breaks, and What Still Needs You

Karpathy's AutoResearch ran 700 experiments while he slept. It works because the objective is locked: one metric, lower is better, don't touch the evaluation function. Most research doesn't start there. Deciding what to measure, whether your benchmark is even valid, what a result means: that's the part that still requires a human. This talk is about that part.

I built a stateful AI research assistant to help me run an enterprise knowledge graph (KG) project. The research question: can organizational KG signals (reporting chains, projects, meeting attendance) help LLM agents resolve ambiguous references to people in enterprise conversations? Think "set up a meeting with the data scientist on the Atlas project" when three people hold that role, two of whom changed positions last quarter. We built a synthetic org, generated a benchmark, and ran experiments across three models and six retrieval conditions.

Here's what I learned the hard way.

The assistant has knowledge boundaries it won't volunteer. Infrastructure failures can corrupt results silently. Design flaws in your data go unquestioned if you don't ask. Confident-sounding write-ups can outrun what the numbers actually support. And fast-generated code works until you need to defend it.

These aren't reasons to stop. They're reasons to develop practices: specify behavior before you ask for implementation, verify outputs against raw sources rather than summaries, make "why did you do it this way?" a standing question, and treat the first draft narrative as a starting point, not a conclusion.

I'll walk through the failure modes concretely, show what the collaboration actually looked like, and share what moved the needle on both speed and rigor.

Format: Story + examples + Q&A

Advanced
Development
AI/ML

Frank Schilder

Frank Schilder is a senior principal scientist at Thomson Reuters Labs, specializing in AI and machine learning. He holds a Ph.D. in Cognitive Science from the University of Edinburgh and previously taught at the University of Hamburg, Germany.
