Jupyter Notebooks are cool: from a MySQL MOOC to an Apache Spark Cluster

by Mark McCahill | at MinneBar 12 | 12:10 – 1:00 in Texas

Jupyter notebooks are open-source interactive web applications that make it fun and easy to author documents combining live code, visualizations, equations, and text: just what you want when learning or experimenting with a data analysis tool.

This session will demonstrate how Jupyter is used at Duke University for interactive course applications, ranging from the Coursera MOOC "Managing Big Data with MySQL", which features live MySQL queries for thousands of students, to a graduate-level computational biology class where students develop and run genome analysis tools on an Apache Spark cluster.

We will also look at the infrastructure used to run Jupyter at scale: how to containerize Jupyter notebook servers with Docker, how to automate deployment of new versions of the notebooks with GitLab continuous integration, and the pitfalls of giving students live SQL query capability against million-row databases, along with countermeasures for those pitfalls.

All levels

Mark McCahill

Mark McCahill has been involved in developing and popularizing a number of Internet technologies since the late 1980s.

McCahill led the development of the Gopher protocol, the effective predecessor of the World Wide Web. He was involved in creating and codifying the standard for Uniform Resource Locators (URLs), and he led the development of POPmail, one of the first e-mail clients, which had a foundational influence on later e-mail clients and on the popularization of graphical user interfaces in Internet technologies more broadly. He also coined the phrase "surfing the Internet."

He currently works as a systems architect at the Office of Information Technology at Duke University.

Wikipedia entry: https://en.wikipedia.org/wiki/Mark_P._McCahill