Mouth to MIDI in 11 Milliseconds: A Live Beatbox Drum Engine I Built on Nights and Weekends

by Sean Hobin | at Minnebar20

Lots of AI sessions this year are about text. This one is about your mouth.

BeastBox is a real-time beatbox-to-MIDI engine I've been building on nights and weekends. You make a kick-drum sound. Eleven milliseconds later, a real kick drum hits. Snare, hi-hat? Same thing. No quantization, no lag you can feel, no cloud round-trip. Just a microphone, a tiny CNN, and a lot of C++ behind the curtain.
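The pipeline pairs an onset detector with the CNN, but the description doesn't say how the detector decides that a hit has started. One common, simple approach is energy-based onset detection: flag a hit when a block's energy jumps well above its recent background level. The C++ sketch below shows that generic technique. It is not BeastBox's code. The class name `OnsetDetector`, the 4x jump ratio, the energy floor, and the refractory period are hypothetical values chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical energy-based onset detector (illustrative, not BeastBox's code).
// Flags an onset when a block's mean-square energy jumps well above an
// exponential moving average of recent block energy.
class OnsetDetector {
public:
    // ratio: how far above the moving average a block must be (assumed 4x)
    // floor: minimum energy that can count as a hit (assumed value)
    // refractoryBlocks: blocks to ignore after a hit (assumed value)
    explicit OnsetDetector(float ratio = 4.0f, float floor = 1e-4f,
                           int refractoryBlocks = 4)
        : ratio_(ratio), floor_(floor), refractoryBlocks_(refractoryBlocks) {}

    // Returns true if this block of samples starts a new hit.
    bool process(const float* samples, std::size_t n) {
        if (n == 0) return false;
        float energy = 0.0f;
        for (std::size_t i = 0; i < n; ++i) energy += samples[i] * samples[i];
        energy /= static_cast<float>(n);  // mean-square energy of the block

        bool onset = false;
        if (cooldown_ > 0) {
            --cooldown_;  // still inside the refractory window of the last hit
        } else if (energy > floor_ && energy > ratio_ * average_) {
            onset = true;
            cooldown_ = refractoryBlocks_;
        }
        // Exponential moving average of recent block energy.
        average_ = 0.9f * average_ + 0.1f * energy;
        return onset;
    }

private:
    float ratio_;
    float floor_;
    int refractoryBlocks_;
    int cooldown_ = 0;
    float average_ = 0.0f;
};
```

The refractory window keeps one loud kick from being reported as several hits while its tail decays. Many practical detectors use spectral flux rather than raw block energy, at the cost of an FFT per block.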

What you'll see, live on stage:

  • I'll beatbox a full drum pattern and you'll hear it come back as a real drum kit in real time
  • The visualizer shows the onset detector firing and the CNN's class probabilities updating hit by hit
  • A guided tour of the architecture: an ONNX model trained in PyTorch, a JUCE audio engine in cross-platform native C++, and a Flutter UI bridged to the engine with a lock-free ring buffer
  • What breaks, what surprised me, what I'd do differently
  • The specific tricks that got end-to-end latency under 20 ms (typically 9 to 11 ms) on a laptop CPU
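The session describes the Flutter UI as bridged in with a lock-free ring buffer. A minimal single-producer, single-consumer (SPSC) version in C++ might look like the sketch below, assuming the audio thread is the only writer and the UI side is the only reader. `HitEvent`, its fields, and `SpscRingBuffer` are hypothetical names, and how the events finally cross into Dart is not shown.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical event the audio thread publishes for the UI.
struct HitEvent {
    int drumClass;      // assumed labels, e.g. 0 = kick, 1 = snare, 2 = hi-hat
    float probability;  // classifier confidence for that class
};

// Single-producer / single-consumer lock-free ring buffer.
// One slot is always left empty so "full" and "empty" can be told apart,
// so a buffer with Capacity slots holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRingBuffer {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer (audio) thread only. Never locks, waits, or allocates.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full: drop
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Consumer (UI) thread only.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T item = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return item;
    }

private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The design choice that matters for real-time audio is in `push`: when the UI falls behind, the audio thread drops the event instead of blocking the callback, so a slow frame on the visualizer can never cause an audible glitch.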

BeastBox is a personal passion project, not a pitch. I want feedback from a room of builders, especially people who care about DSP, on-device ML, or real-time systems. If you've ever wanted to beatbox a whole drum kit into existence, or if you just want to see what it takes to run an extremely low-latency neural net inside an audio-thread budget, come hang out.
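An audio-thread budget is the wall-clock time one audio callback gets before the next block of samples is due: block size divided by sample rate. The tiny sketch below computes it. The function name `callbackBudgetMs` is hypothetical, and the 48 kHz rate and block sizes used with it are assumptions, since the session doesn't state BeastBox's settings.

```cpp
// Time available per audio callback, in milliseconds.
// In a pipeline like the one described, onset detection, CNN inference,
// and triggering the drum sound must all finish within this window.
constexpr double callbackBudgetMs(int blockSize, double sampleRate) {
    return 1000.0 * blockSize / sampleRate;
}
```

At 48 kHz, a 128-sample block leaves about 2.67 ms per callback for all of that work combined. Input samples also wait up to one block before processing even begins, which is why buffer size shows up directly in end-to-end latency.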
