In my first semester of grad school, I took DSC-530, a Data Visualization course. The course centered on using D3.js to create effective visualizations for a given data set and to expose patterns that are not obvious in standard charts. For the final project, we each picked our own data source and built a visualization with three different views of that data.

Here is my completed interactive visualization, with some explanation below:

Phil Igoe Data Visualization (DSC 530/602-02), 2015 College Crime in the United States

A brief explanation of how to use it: there are three main views, the country view, the state view, and the college view. You start at the country view. Clicking on a state takes you into the state view, and clicking on a dot (each dot represents a college) brings you to the college view.

The legend and vis controls are on the right. When you are in the state view (after clicking on a state), a table below the vis lists all the values in the current view.
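The navigation between the three views can be thought of as a tiny state machine. The sketch below is only an illustration of that idea, not the code behind the vis: the function name `nextView` and the event names `clickState` and `clickCollege` are hypothetical.

```javascript
// Illustrative sketch only: names and event shapes are assumptions.
// Given the current view and a click event, return the view to show next.
function nextView(current, event) {
  switch (current.view) {
    case "country":
      // Clicking a state on the country map drills into that state.
      if (event.type === "clickState") {
        return { view: "state", state: event.state };
      }
      break;
    case "state":
      // Clicking a dot (a college) in the state view drills into that college.
      if (event.type === "clickCollege") {
        return { view: "college", state: current.state, college: event.college };
      }
      break;
  }
  return current; // any other event leaves the view unchanged
}
```

Keeping the transition logic in one pure function like this makes it easy to check each drill-down step without touching the DOM.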

Obtaining and parsing data

The first step in creating this vis was to obtain the data. Luckily, the FBI keeps publicly available crime statistics on colleges in the United States. The statistics are labeled as Table 9 in the FBI’s Crime in the United States dataset and are available as a CSV download. To view or download the data, visit the following URL: FBI Table 9

One challenge with the data was that it did not give the latitude and longitude of the colleges. As a result, I had to write a post-processing script that uses Mapzen to look up each college by its name, and by its campus name where applicable, to get the college's coordinates. I also wrote a second post-processing script to aggregate the values of every college into per-state totals. I could have done this in the frontend JavaScript, but for performance reasons it made more sense to me to generate another CSV with the per-state aggregated values.
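The lookup step might be sketched roughly as follows. This is a minimal illustration rather than the script I actually wrote: the column names `school` and `campus` are assumptions about the CSV, and `geocode` is a hypothetical async function (query string in, `{ lat, lon }` or `null` out) standing in for the real Mapzen request.

```javascript
// Illustrative sketch only: column names and the geocode callback are assumptions.

// Build the free-text query for one row: school name, plus campus when present.
function buildQuery(row) {
  const school = (row.school || "").trim();
  const campus = (row.campus || "").trim();
  return campus ? `${school}, ${campus}` : school;
}

// Return a copy of every row with lat/lon attached (null when nothing was found).
async function addCoordinates(rows, geocode) {
  const out = [];
  for (const row of rows) {
    // Look rows up one at a time so a rate-limited service is not flooded.
    const hit = await geocode(buildQuery(row));
    out.push({
      ...row,
      lat: hit ? hit.lat : null,
      lon: hit ? hit.lon : null,
    });
  }
  return out;
}
```

Passing the geocoder in as a function keeps the row handling testable offline and makes it simple to swap in a different geocoding service.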

Here’s the script I wrote to process the initial data:

And here’s a peek at what the output data looks like:

I then wrote another script to calculate the aggregate totals for all colleges within a state, providing the vis with state-by-state totals. Here’s what the first few rows look like:

Now we’re all set: the vis needs only these two output CSV files, one containing the aggregated totals for each state and one containing information for each college in all states.
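For illustration, the per-state aggregation could look something like the sketch below. It is not the original script: the `state` column name and the `columns` list are assumptions, and I am assuming the rows have already been parsed from CSV into plain objects with string cells.

```javascript
// Illustrative sketch only: column names are assumptions about the CSV.
// Sum the given numeric columns of every college row into one row per state.
function aggregateByState(rows, columns) {
  const totals = new Map();
  for (const row of rows) {
    if (!totals.has(row.state)) {
      const empty = { state: row.state, colleges: 0 };
      for (const c of columns) empty[c] = 0;
      totals.set(row.state, empty);
    }
    const t = totals.get(row.state);
    t.colleges += 1;
    for (const c of columns) {
      // CSV cells are strings and may contain thousands separators.
      const n = Number(String(row[c] ?? "").replace(/,/g, ""));
      t[c] += Number.isFinite(n) ? n : 0; // blank or unparsable cells count as 0
    }
  }
  return [...totals.values()];
}
```

Each returned object can then be written out as one line of the per-state CSV, so the browser never has to recompute the totals.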

Wrap up

Looking back, the one thing I would have changed is to make the code more componentized and modular. When I started out, the scope wasn’t as big, and as it grew it became more difficult to add new features and fix bugs. The code definitely would have benefited from that kind of organization.

You can see the source code by inspecting the above live example, or by looking at the public GitHub repository here.