As part of our Data Science tools and models course, aimed at biomedical informatics graduate and postdoctoral students, we use several common data science tools (PostgreSQL, Python/NumPy, Spark, and TensorFlow) that can be complex or time-consuming to install on individual computers. Two infrastructure tools, Docker and JupyterHub, have significantly reduced the time it takes to get students up and running with these tools. Docker is a suite of tools for user-friendly containerization that reduces the burden of environment setup, while JupyterHub is a multi-user server that handles authentication and serves Jupyter Notebooks, our main assignment delivery mechanism.
We will demonstrate the steps involved in setting up and using Docker for complex data science assignments and show how we use JupyterHub to provide an install-free workspace for students. We will also discuss the challenges and opportunities we have encountered as we scale our installation in preparation for deploying these tools across the Department of Engineering.
Gabe Vacaliuc, Rice University
Risa Myers (Presenter)