NLP intro: Some (bad) ways to generate text

by Sam Cartford | at Minnebar 15 | Thu, Oct 8 • 10:30 – 10:55 in Phalen Track

Natural Language Processing (NLP) is probably the fastest-growing and most impactful Machine Learning sub-field. New state-of-the-art language models have been leapfrogging each other for a few years, and it's hard to keep up. Here we'll investigate interpretable ways to generate text, and see how the training set and model size affect the output. We'll start with simple statistical models and work toward neural networks that use word embeddings, looking at the quirks along the way!

This is meant to be a fun demo of basic approaches not used in production. If you've ever been curious about how algorithms might generate text, this should be a good intro. If you're looking for ways to improve your models' F1 scores, you probably won't find that here, but I'd love to have you join to help answer questions and improve the general discussion. If you'd like to see any of the demos or ask questions ahead of time, feel free to email me!


Sam Cartford

Hello and welcome! I'm Sam. Usually I'm working on cybersecurity and data projects, and now I also moonlight as an entrepreneur trying to build a research platform for retail investors. I love climbing and snowboarding, and my projects are usually built in Python on AWS. Please reach out if you 1) are interested in helping retail investors increase their alpha, 2) have any questions you think I may be able to answer, 3) are a freelance designer, or 4) teach the tenor saxophone (I'm a good student, I swear). You can reach me at