Much of the first wave of Data Science programs was built on a foundation of existing computing-oriented classes; less effort was spent on how people from diverse backgrounds and disciplines approach, or could optimally collaborate on, data science problems. At Carnegie Mellon, the Department of Statistics & Data Science teaches thousands of students pursuing degrees ranging from Pre-Med to Rhetoric to Chemistry to Business to Statistics & Machine Learning, and is well positioned to tackle this pedagogical challenge. Over the last three years, we have built and developed ISLE (Interactive Statistics Learning Environment), an interactive platform that removes the cognitive load of computing and lets students and re-training professionals explore Statistics & Data Science concepts in both structured and unstructured ways. See http://www.stat.cmu.edu/isle for more details. The platform also supports student-driven inquiry and case studies. We track and model every click, word used, and decision made throughout the data analysis pipeline, from loading the data to the final written report. The platform is flexible enough to allow adaptation, providing different modes of data analysis and active learning, as well as collaborative opportunities, for different subsets of the population. The resulting data sets are invaluable in capturing behavioral data science information, and they raise interesting statistical methodological questions about how to model learning processes using data of mixed modality (clicks, text, audio, video, etc.). We present some initial methodological work with an emphasis on developing variable selection methods when clustering circular data (text). In short, we are teaching Data Science while simultaneously learning how we (should) do it.
During the Spring 2020 academic quarter, the CSSS Seminar Series will be conducted online. Please contact Will Brown if you are interested in attending (brownw at uw dot edu).