Monday, November 28, 2011
Tuesday, November 22, 2011

We decided to go ahead and process the simpler tables first because many of our tables are extremely complex. By far the most complex is the location where each ticket was issued. Some entries are block numbers with the accompanying street name, for example 800 Classen Blvd. Others are entered as intersections, for example Robinson St. and 24th St. This has caused major delays in assembling the table. Our professor, Chris Krug, has suggested that we organize all the locations by block number. This has required Lilly and me to go through each entry, choose the more prominent of the two streets, and figure out which block of that street the intersection is on. Since tickets were given at over 2,400 unique locations, we are trying to work with the city planner's office to see whether it already has records of this information, which would speed up the process. The picture above shows how inconsistent the data is: the intersection of Lindsey St. and 12th St. is entered in the police database in four different ways.
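Part of this cleanup can be automated before anyone picks streets and blocks by hand. Below is a minimal Python sketch, assuming intersections are joined by "and", "&", "/" or "@" and using a small, hypothetical suffix map. It collapses spelling variants of an intersection into one order-independent key and maps a block address to its hundred block. The four real Lindsey St. and 12th St. variants are only in the picture, so the variants below are made up. Choosing the more prominent street and its block still needs a person or a reference table, such as the city planner's records.

```python
import re
from typing import Optional, Tuple

# Hypothetical suffix map; the real police data may use other abbreviations.
SUFFIXES = {
    "street": "st", "st": "st",
    "avenue": "ave", "ave": "ave", "av": "ave",
    "boulevard": "blvd", "blvd": "blvd",
    "drive": "dr", "dr": "dr",
}

# Assumed separators between the two streets of an intersection entry.
SEPARATOR = re.compile(r"\s*(?:&|/|@|\band\b)\s*", re.IGNORECASE)

# A block address: a leading house number, then the street name.
BLOCK_ADDRESS = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def normalize_street(name: str) -> str:
    """Lowercase a street name, drop punctuation, and standardize its suffix."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    if words and words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


def intersection_key(raw: str) -> Optional[Tuple[str, str]]:
    """Order-independent key for an intersection, or None if it isn't one."""
    parts = [p for p in SEPARATOR.split(raw) if p.strip()]
    if len(parts) != 2:
        return None
    first, second = sorted(normalize_street(p) for p in parts)
    return (first, second)


def block_key(raw: str) -> Optional[Tuple[int, str]]:
    """(hundred block, street) for an address such as '815 Classen Blvd.'.

    Assumes the common convention that one block spans 100 house numbers.
    """
    match = BLOCK_ADDRESS.match(raw)
    if match is None:
        return None
    return (int(match.group(1)) // 100 * 100, normalize_street(match.group(2)))


# Made-up spelling variants of one intersection.
variants = [
    "Lindsey St. and 12th St.",
    "12TH ST & LINDSEY ST",
    "Lindsey Street / 12th Street",
    "LINDSEY ST @ 12TH ST",
]
print({intersection_key(v) for v in variants})  # {('12th st', 'lindsey st')}
print(block_key("815 Classen Blvd."))  # (800, 'classen blvd')
```

Grouping by these keys shrinks the list of unique locations that still need a block assigned by hand.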
Sunday, November 20, 2011
We have started to sort through our thousands of data entries and group similar entries into categories. Some categories are easier to sort through than others. For example, the driver's gender is rather simple, but others, like the location where the ticket was issued, are much more complex because sometimes a block number is given and other times an intersection. This requires us to give each piece of data individual attention. Because of the complex categories, we have decided to go ahead and begin building the tables in our database that contain the simpler categories: Citations, Genders, Ticket Types, Officers, and License Plate States. We will come back and add the more complex categories after they have been properly sorted.
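One common way to lay out simple categories like these is a small normalized schema: one lookup table per category, plus a citations table that points to them by id. The Python sketch below uses the built-in sqlite3 module. The column names (gender, ticket_type, officer, plate_state) are hypothetical stand-ins for the real spreadsheet headers, and trimming and upper-casing each value is an assumed cleanup rule.

```python
import sqlite3
from typing import Dict, List

# Hypothetical mapping of lookup table -> spreadsheet column.
LOOKUP_COLUMNS = {
    "genders": "gender",
    "ticket_types": "ticket_type",
    "officers": "officer",
    "plate_states": "plate_state",
}


def build_database(rows: List[Dict[str, str]]) -> sqlite3.Connection:
    """Load spreadsheet rows into lookup tables plus a citations table."""
    conn = sqlite3.connect(":memory:")
    for table in LOOKUP_COLUMNS:
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL)"
        )
    fk_columns = ", ".join(
        f"{column}_id INTEGER REFERENCES {table}(id)"
        for table, column in LOOKUP_COLUMNS.items()
    )
    conn.execute(f"CREATE TABLE citations (id INTEGER PRIMARY KEY, {fk_columns})")

    id_columns = ", ".join(f"{column}_id" for column in LOOKUP_COLUMNS.values())
    placeholders = ", ".join("?" for _ in LOOKUP_COLUMNS)
    for row in rows:
        ids = []
        for table, column in LOOKUP_COLUMNS.items():
            # Assumed cleanup rule: trim and upper-case, so "f " and "F" match.
            value = row[column].strip().upper()
            conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
            (value_id,) = conn.execute(
                f"SELECT id FROM {table} WHERE value = ?", (value,)
            ).fetchone()
            ids.append(value_id)
        conn.execute(
            f"INSERT INTO citations ({id_columns}) VALUES ({placeholders})", ids
        )
    conn.commit()
    return conn


# Made-up rows for illustration only.
rows = [
    {"gender": "F", "ticket_type": "Warning", "officer": "101", "plate_state": "OK"},
    {"gender": "m", "ticket_type": "Citation", "officer": "101", "plate_state": "TX"},
    {"gender": "F ", "ticket_type": "warning", "officer": "102", "plate_state": "ok"},
]
conn = build_database(rows)
print(conn.execute(
    "SELECT g.value, COUNT(*) FROM citations c "
    "JOIN genders g ON c.gender_id = g.id GROUP BY g.value ORDER BY g.value"
).fetchall())  # [('F', 2), ('M', 1)]
```

Because each category value is stored only once, a complex category such as location can later be added as one more lookup table without reworking the others.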
Thursday, November 10, 2011
We have been working with Capt. Tom Easley of the Norman Police Department to gather information on every traffic ticket or warning issued in Norman, Okla., in the past year. We have a massive spreadsheet with the date, time, location, issuing officer and violation of each ticket or warning; the make, model, color and state of the car pulled over; and the driver's gender, race and age. We are in the middle of processing all of it: building spreadsheets for each individual variable and fleshing out the entries. Once we get all this in order, we are going to build a website with a detailed search function so that users can compare any of the variables to find trends. The website will also eventually contain summaries and infographics of the main trends we've found, and possibly a forum or a way for users to comment on the findings.
Right now, we're excited about and overwhelmed by this project. We initially wanted to look at tickets and warnings for the past five years, but when we found out that just one year of data consists of over 38,000 entries, we agreed to start with that. Already, we are finding that many of the records are incomplete or ambiguous, and that there's no set system for how the information is entered. Deciphering the abbreviations for violations, car models, etc. is taking longer than we thought it would, but cleaning up and organizing all of the data will make things easier in the long run.
This is my first foray into data journalism, and I'm really excited about it. The data is straightforward and objective, and we'll definitely see a lot of patterns emerge without having to rely on human opinion. It's especially thrilling to apply it to the mysterious process of traffic violations: there are so many questions and rumors about how it works, and these can only really be answered through data. If you asked an officer whether they pull over more red cars than others or give out more tickets at the end of the month, you might get an ambiguous answer. By crunching the data, you get undeniable facts. That's what appeals to me most about data journalism.
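For a sense of what "crunching data" could look like on the end-of-month question, the hypothetical Python function below compares the share of tickets issued in the last week of each month with the share you would expect if tickets within each month were spread evenly over its days. It assumes the ticket dates have already been parsed into Python `date` objects. It is a rough comparison, not a statistical test.

```python
import calendar
from datetime import date
from typing import Iterable, Tuple


def end_of_month_share(dates: Iterable[date], last_days: int = 7) -> Tuple[float, float]:
    """Return (observed, expected) share of tickets in a month's last `last_days` days.

    `expected` assumes tickets within each month are spread evenly over its days.
    """
    total = hits = 0
    expected_sum = 0.0
    for d in dates:
        days_in_month = calendar.monthrange(d.year, d.month)[1]
        total += 1
        if d.day > days_in_month - last_days:
            hits += 1
        expected_sum += last_days / days_in_month
    if total == 0:
        raise ValueError("no ticket dates given")
    return hits / total, expected_sum / total


# Made-up dates for illustration only.
sample = [date(2011, 1, 31), date(2011, 1, 1), date(2011, 2, 28), date(2011, 2, 10)]
observed, expected = end_of_month_share(sample)
print(f"observed {observed:.2f}, expected {expected:.2f}")  # observed 0.50, expected 0.24
```

An observed share well above the expected one across a full year of tickets would be a lead worth reporting, not proof on its own.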
Although we don't even have a name for our website yet, we are really optimistic about this project. Depending on how things go, we will probably continue to work on this long after the semester has ended. If it's successful, we may expand to other towns and compare their data to Norman's. Hopefully this will be something widely utilized by citizens and journalists alike, and will promote the endless possibilities that data journalism has to offer.
-Lilly