No Cloud Required: Running an AI RAG Pipeline on your Phone

by Eric Collom | at Minnebar 20

Every RAG system you've ever seen has a dirty secret: it's phoning home. Every query, document chunk, and embedding is sent to a server, processed in a data center, and returned over a network you don't control. What if you could cut the cord entirely?

In this session, we'll walk through a fully on-device Retrieval-Augmented Generation pipeline featuring embedding models, a vector store, and a language model running entirely on consumer mobile hardware. No API keys. No latency spikes. No round trips. Full privacy.

Through live demos and architectural deep dives, we'll cover the full stack: converting embedding models to run on device, building an efficient chunking and indexing pipeline, performing vector similarity search without a backend, and using a local LLM.
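The last two steps above (vector similarity search without a backend, feeding a local LLM) come down to very little code. As a language-agnostic sketch of the idea (the session itself presumably demos this in Swift on-device; the chunk texts and vectors here are made up for illustration), brute-force cosine similarity over a small in-memory index is all a minimal "vector store without a backend" needs:

```python
import math

def cosine_similarity(a, b):
    # Similarity between two embedding vectors: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    # Score every chunk against the query and keep the k best.
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Toy index: (chunk text, embedding) pairs. Real embeddings would come
# from an on-device embedding model and have hundreds of dimensions.
index = [
    ("chunk about climbing", [0.9, 0.1, 0.0]),
    ("chunk about coffee",   [0.1, 0.9, 0.0]),
    ("chunk about hiking",   [0.8, 0.2, 0.1]),
]

query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, index, k=2))
# → ['chunk about climbing', 'chunk about hiking']
```

The retrieved chunks would then be stuffed into the local LLM's prompt; at mobile index sizes a linear scan like this is often fast enough that no approximate-nearest-neighbor structure is needed.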

The cloud isn't always the answer; sometimes the edge is.

Intermediate

Eric Collom

Eric has been an iOS developer for 12 years and gets unreasonably excited about on-device AI. He builds Swift applications across a wide range of domains, and lately he has been deep in the world of on-device machine learning, exploring what becomes possible when you stop waiting on the cloud and start trusting the hardware already in your pocket.

When he's not coding, you can find him rock climbing, hiking, or camping. He likes his beer hoppy, his food spicy, and his coffee black.


