Exploring Data Journalism: November 2011

Monday, November 28, 2011

As we have mentioned before, one table in particular has been giving us headaches: the locations. We have over 2,400 locations that were entered as an intersection instead of a single street with a block number. The first set of instructions that our professor, Chris Krug, gave us was to use Google Maps to try to find a block number for the larger of the two streets in each intersection.

After we realized that this would be an extremely long process, we were advised to contact the City Planner's Office in Norman. I talked to Joyce Green, manager of the GIS Services Division. I was hoping that the department would have some sort of list of the intersections within the city limits and which block each one falls into on both streets. Joyce informed me that no such list exists. The office actually works with geographic information system (GIS) software. I was also told that there is a detailed map that we could print for $200. Since neither Lilly nor I have large amounts of money lying around, this isn't a possibility.

So, we have now been advised to try to find the latitude and longitude of the intersections for which block numbers have been impossible to find. The most problematic ones involve Highway 9 and Interstate 35. We hope to use another facet of Google Maps to figure out this information for the more difficult intersections. We are hoping that by building our table around Google Maps and latitude and longitude, future viewers of our database and website will be able to see the locations of these tickets in a more visual way on a map.
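A minimal sketch of how the latitude and longitude lookup could be organized, written in Python. Everything here is an assumption for illustration: the function names are hypothetical, and `geocode` stands in for whatever mapping service ends up being used. It is passed in as a plain function so that no particular service's interface is implied.

```python
def intersection_query(street_a, street_b, city="Norman", state="OK"):
    """Build a search string for an intersection (format is an assumption)."""
    return f"{street_a} and {street_b}, {city}, {state}"


def geocode_all(intersections, geocode):
    """Look up every intersection with a caller-supplied geocode function.

    geocode is hypothetical: it takes a query string and returns a
    (latitude, longitude) tuple, or None when nothing is found.
    """
    found = {}      # (street_a, street_b) -> (latitude, longitude)
    missing = []    # pairs that still need to be checked by hand
    for pair in intersections:
        result = geocode(intersection_query(*pair))
        if result is None:
            missing.append(pair)
        else:
            found[pair] = result
    return found, missing
```

Keeping the unmatched pairs in a separate list would leave a short worklist for the intersections that still need individual attention.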

Tuesday, November 22, 2011

We decided to go ahead and process the simpler tables because many of our tables were extremely complex. The most complex one by far is the location where the ticket was issued. Some of the entries are block numbers with the accompanying street name. For example: 800 Classen Blvd. Others are entered as intersections. For example: Robinson St. and 24th St. This has caused major delays in assembling the table. Our professor, Chris Krug, has suggested that we organize all the locations by block number. This has required Lilly and me to go through each piece of data, choose the more prominent of the two streets, and figure out what block of that street the intersection is on. Since there are over 2,400 unique locations where tickets were given, we are attempting to work with the city planner's office to see if they have records that already hold this information in order to speed up the process. The picture above shows how inconsistent the data is. Lindsey St. and 12th St. are entered into the police database in four different ways.
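The two kinds of entries described here, and the problem of one intersection being typed several ways, can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the separator characters, the abbreviation map, and the function names are hypothetical, and the real police data may well use spellings this does not cover.

```python
import re

# Hypothetical abbreviation map; the real data may need more entries.
SUFFIXES = {"street": "st", "boulevard": "blvd", "avenue": "ave"}

# Assumed ways of joining the two streets of an intersection.
SEPARATOR = re.compile(r"\s*(?:&|/|@|\band\b)\s*", re.IGNORECASE)


def normalize_street(name):
    """Lowercase, drop periods and commas, and shorten known suffixes."""
    words = re.sub(r"[.,]", "", name).lower().split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


def parse_location(raw):
    """Classify one location entry as an intersection, a block, or unknown."""
    text = raw.strip()
    parts = SEPARATOR.split(text)
    if len(parts) == 2 and all(parts):
        # Sort the two streets so their order in the entry does not matter.
        streets = sorted(normalize_street(part) for part in parts)
        return ("intersection", tuple(streets))
    block = re.match(r"^(\d+)\s+(.+)$", text)
    if block:
        return ("block", int(block.group(1)), normalize_street(block.group(2)))
    return ("unknown", text)
```

With this sketch, made-up variants such as "Lindsey St. and 12th St." and "12TH ST & LINDSEY ST" reduce to the same key, while "800 Classen Blvd." is recognized as a block entry. Anything returned as "unknown" would still have to be sorted by hand.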

Sunday, November 20, 2011

We have started to sort through our thousands of entries of data and group them into categories. Some of the tables are easier to sort through than others. For example, the gender of the driver is rather simple, but others, like the location where the ticket was issued, are much more complex because sometimes block numbers are given and other times intersections. This requires us to give each piece of data individual attention. Because of the complex categories, we have decided to go ahead and begin building the tables in our database that contain the simpler categories. These categories are: Citations, Genders, Ticket Types, Officers, and License Plate States. We will come back and add the more complex categories after they have been properly sorted through.
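One way the simpler tables could fit together is sketched below in Python with the standard sqlite3 module. The layout is an assumption, not a description of our actual database: it treats Citations as the main table and the other four as lookup tables that each store every distinct value once. The column names and the function name are hypothetical.

```python
import sqlite3

# Lookup table name -> spreadsheet column it is built from (names assumed).
LOOKUPS = {
    "genders": "gender",
    "ticket_types": "ticket_type",
    "officers": "officer",
    "plate_states": "plate_state",
}


def build_database(rows):
    """Load spreadsheet rows (dicts) into lookup tables plus a citations table."""
    conn = sqlite3.connect(":memory:")
    for table in LOOKUPS:
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
        )
    conn.execute(
        "CREATE TABLE citations (id INTEGER PRIMARY KEY, gender_id INTEGER, "
        "ticket_type_id INTEGER, officer_id INTEGER, plate_state_id INTEGER)"
    )
    for row in rows:
        ids = []
        for table, column in LOOKUPS.items():
            # Add the value only if it is new, then fetch its id.
            conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (row[column],))
            cursor = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (row[column],))
            ids.append(cursor.fetchone()[0])
        conn.execute(
            "INSERT INTO citations (gender_id, ticket_type_id, officer_id, plate_state_id) "
            "VALUES (?, ?, ?, ?)",
            ids,
        )
    conn.commit()
    return conn
```

Storing each gender, ticket type, officer, and plate state once and pointing to it by id means a spelling fix only has to be made in one place.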

Thursday, November 10, 2011

After a few semesters of building websites, reporting on in-depth trend stories in our communities and classes full of discussions on what the future of journalism might look like, Ashley and I decided to take all of this preparation and actually do something with it. We've been doing class projects for years--some that could definitely be considered good, solid journalism--but they were always graded and filed away, never actually doing their job of informing citizens. So, during our last year of school (or last semester, in Ashley's case), we're building the real deal. Although this project is technically an independent study under Chris Krug, our professor, it is a real journalistic endeavor for us.

We have been working with Capt. Tom Easley of the Norman Police Department to gather information on every traffic ticket or warning issued in the past year in Norman, Okla. We have a massive spreadsheet with the date, time, location, issuing officer and violation of the ticket/warning, the make, model, color and state of the car pulled over, and the driver's gender, race and age. We are in the middle of processing all the information--building spreadsheets for each individual variable and fleshing out the information. Once we get all this in order, we are going to build a website with a detailed search function so that users may compare any of the variables to find trends. The website will also eventually contain summaries and infographics of the main trends we've found, and maybe a forum or ability for users to comment on the findings.

Right now, we're excited and overwhelmed with this project. We initially wanted to look at tickets and warnings for the past five years, but when we found out that just one year of data consists of over 38,000 entries, we agreed to start with that. Already, we are finding that a lot of the records are incomplete or ambiguous, and that there's no set system for how the information is entered. Going through and trying to decipher the abbreviations for violations, car models, etc. is taking longer than we thought it would, but getting all of the data cleaned up and organized will make things easier in the long run.

This is my first foray into data journalism, and I'm really excited about it. The data is straightforward and objective, and we'll definitely see a lot of patterns emerge without having to rely on human opinion. It's especially thrilling to apply it to the mysterious process of traffic violations--there are so many questions and rumors about how this works, and these can only really be answered through data. If you ask an officer whether they pull over more red cars than others, whether they give out more tickets at the end of the month, or other similar questions, you may get an ambiguous answer. By answering these questions through crunching data, you get undeniable facts. That's what appeals to me most about data journalism.

Although we don't even have a name for our website yet, we are really optimistic about this project. Depending on how things go, we will probably continue to work on this long after the semester has ended. If it's successful, we may expand to other towns and compare their data to Norman's. Hopefully this will be something widely utilized by citizens and journalists alike, and will promote the endless possibilities that data journalism has to offer.
-Lilly