Open Data

College Scorecard Application

Valerie Parham-Thompson

Still working on the College Scorecard dataset. Previously I explored the dataset in a real-world application, talked about how to clean the data, and worked with the data API.

Now I’ve decided I want to put this in a web application so others can use the dataset in the same flexible way that I have been using it. (Reminder that several popular college-search websites exist, but they are limited in the ways you can filter the data. Also they tend to gather personal data for I suppose ad generation.)

College Scorecard API

Valerie Parham-Thompson

After I finished the YugabyteDB universe network mapping example, I started thinking about other things to map. Anything with latitude and longitude will work. College locations from my previous work on the College Scorecard data set were an obvious choice.

Previously, I had exported the data and transformed it to allow for sorting and analysis. That’s still a valid method if you want to play with the pull data set, since the API allows only page size of max 100 at a time. However, with the right filters, that might be enough, and the API is a quicker path to getting the data.

Code as Instructional Technology

Valerie Parham-Thompson

I’ve had the chance to share my database expertise in a variety of venues: speaking at meetups and conferences, leading hands-on workshops, mentoring new technologists, and of course writing.

I had been brewing a new idea for sharing content when a great opportunity landed in my lap.

The idea was: share what I know about managing a specific database product in code. Instead of creating a runbook for how to set up replication, I would write code that sets up replication. The key part is that it would have to be well-organized, commented, and documented to be useful to learners. Making it interactive would help users understand the options and parameters as they chose the commands and added flags. Even the error statements would give them insight into how it all works.

Cleaning College Scorecard Data

Valerie Parham-Thompson

Cleaning the College Scorecard data before using it locally to query columns of interest to us in a college search allowed me to use correct datatypes and to fit the data into a Postgres table. In case you haven’t had a chance to see other walkthroughs of my automation process for various demo needs, the full Ansible setup will download data from a source and then load it into a table. Previously, I have used the process for MoMA art and artists data, and for generating a million-row table, and storing these in different YugabyteDB topologies. I leveraged this recently to load College Scorecard data for our child’s college search.

Open Data College Scorecard

Valerie Parham-Thompson

I’ve been pretty quiet recently, I know. My youngest has been going through the college application phase, which has taken a lot of time for both of us. I wouldn’t give up all the lovely college visits and overnights, but I might easily part ways with the paperwork.

I do have something fun to share from the experience. I didn’t like the limitations of common college search websites. In particular, we were looking for a college in a subset of surrounding states. Most college search forms allow you to enter only a region, and the Southeastern region was too broad for her search. I also didn’t like that signing up for the sites subjected you to a lot of marketing.