Mouth to MIDI in 11 Milliseconds: A Live Beatbox Drum Engine I Built on Nights and Weekends

by Sean Hobin | at Minnebar20 | 1:25 – 2:05 in Texas

Lots of AI sessions this year are about text. This one is about your mouth.

BeastBox is a real-time beatbox-to-MIDI engine I've been building on nights and weekends. You make a kick-drum sound. 11 milliseconds later, a real kick drum hits. Snare, hi-hat? Same thing. No quantization, no lag you can feel, no cloud round-trip. Just a microphone, a tiny CNN, and a lot of C++ behind the curtain.

What you'll see, live on stage:

  • I'll beatbox a full drum pattern and you'll hear it come back as a real drum kit in real time
  • The visualizer shows the onset detector firing and the CNN's class probabilities updating hit by hit
  • A guided tour of the architecture: an ONNX model trained in PyTorch, a native JUCE audio engine, cross-platform native code, and a Flutter UI bridged in with a lock-free ring buffer
  • What breaks, what surprised me, what I'd do differently
  • The specific tricks that got end-to-end latency under 20 ms (typically 9 to 11 ms) on a laptop CPU
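
For readers who haven't met the lock-free ring buffer mentioned in the architecture tour, here is a minimal sketch of the general technique in C++17. It is not BeastBox's code. The names `HitEvent`, `SpscRing`, `tryPush`, and `tryPop` are hypothetical, and the sketch assumes exactly one producer thread (say, the audio callback) and one consumer thread (say, the UI side).

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical message type: one classified drum hit.
struct HitEvent {
    int   drumClass;   // illustrative mapping: 0 = kick, 1 = snare, 2 = hi-hat
    float confidence;  // classifier probability for that class
};

// Single-producer / single-consumer ring buffer (assumed design, not the
// author's). Capacity must be a power of two so indices wrap with a mask.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Never blocks, locks, or allocates.
    bool tryPush(const T& item) noexcept {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) {
            return false;  // full: caller drops the event instead of waiting
        }
        slots_[head & (Capacity - 1)] = item;
        // Release: the slot write becomes visible before the new head does.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the buffer is empty.
    std::optional<T> tryPop() noexcept {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) {
            return std::nullopt;
        }
        T item = slots_[tail & (Capacity - 1)];
        // Release: the slot read completes before the producer may reuse it.
        tail_.store(tail + 1, std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // total items ever pushed
    std::atomic<std::size_t> tail_{0};  // total items ever popped
};
```

In a setup like this, the audio thread would call `tryPush` after each classified hit and the UI side would drain with `tryPop` on its own schedule. If the buffer is full the push simply fails, so the audio thread never waits on the UI.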

BeastBox is a personal passion project, not a pitch. I want feedback from a room of builders, especially people who care about DSP, on-device ML, or real-time systems. If you've ever wanted to beatbox a whole drum kit into existence, or if you just want to see what it takes to run an extremely low-latency neural net inside an audio-thread budget, come hang out.

Sean Hobin

I'm an AI engineer at Securian Financial, where I build production ML systems and cloud infrastructure. My background spans healthcare ML, RAG pipelines, and production audio-thread C++ - the last of which I learned entirely by accident while building BeastBox. I'm also an avid rock climber and hobbyist musician, which mostly means I spend my free time either on a wall or making mouth sounds at a computer 😛 This is my first Minnebar session.

