Finished the final course of the Master of Computer Science program, CS 513: Theory and Practice of Data Cleaning. The course is a joint effort between the Library and Information Sciences (LIS) department and the CS department. That was awesome for me, since at one point I had considered continuing on after undergrad to the LIS Master's program at UIUC.

Over the course of the class, I was introduced to a number of tools. My favorite was OpenRefine, an open-source tool originally from Google with a lot of options for cleaning data.

For my final project, I worked with data crowdsourced by the NYPL: over 1 million menu items from over 17,000 menus. Because anyone can enter it, the data is extremely messy, but it can also tell a lot of interesting stories spanning the last 120+ years. The project was humbling and brought my personal Mac to its knees many times, at points eating up over half of its 32 GB of RAM. It showed me that truly 80% or more of the time goes into cleaning the data needed to answer even a simple research question.

LinkedIn Post