Pennsylvania's Department of Transportation released data on every crash reported to the police between 2004 and 2015 in the five-county southwestern Pennsylvania area. The data are available from the Western Pennsylvania Regional Data Center (WPRDC).
It's a huge dataset, currently holding almost 150,000 crashes and almost 200 variables. I built a web app using R Shiny that displays some of the more understandable variables, such as whether the crash resulted in a fatality or whether pedestrians or cyclists were involved. The app lets users explore how, for example, the distribution of crash types or injury counts varies when filtering on different variables.
Scroll through the app below, or open the full-screen app here.
Crash data are derived from the information recorded for a reportable crash, which generally means that someone was injured or killed, or that there was “damage to any vehicle to the extent that it cannot be driven under its own power in its customary manner without further damage or hazard to the vehicle”. Crash data do not include non-reportable crashes or near misses. This exclusion, along with some reporting errors, means the data may undercount crashes involving pedestrians and cyclists, as explained in a report from BikePGH.
I've started a preliminary analysis of the data. So far, all crashes and the more severe crashes may differ in how they are distributed across weekends versus weekdays, time of day, and commute versus off-peak hours.
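To make that comparison concrete, here is a minimal Python sketch of one way to set it up: compute the weekend share for all crashes, then for the severe subset only, and compare the two. It is an illustration, not the code behind my analysis. The records are toy values rather than PennDOT data, and the field names `day_of_week` and `fatal_or_major_injury` are hypothetical stand-ins for whatever the real dataset calls those variables.

```python
from collections import Counter

# Toy records for illustration only; these are NOT drawn from the PennDOT data.
# The field names "day_of_week" and "fatal_or_major_injury" are hypothetical.
crashes = [
    {"day_of_week": "Mon", "fatal_or_major_injury": False},
    {"day_of_week": "Tue", "fatal_or_major_injury": False},
    {"day_of_week": "Wed", "fatal_or_major_injury": False},
    {"day_of_week": "Thu", "fatal_or_major_injury": True},
    {"day_of_week": "Fri", "fatal_or_major_injury": False},
    {"day_of_week": "Sat", "fatal_or_major_injury": True},
    {"day_of_week": "Sun", "fatal_or_major_injury": True},
    {"day_of_week": "Sat", "fatal_or_major_injury": False},
]

WEEKEND = {"Sat", "Sun"}


def day_type_shares(records):
    """Return the fraction of records on weekdays and on weekends."""
    counts = Counter(
        "weekend" if r["day_of_week"] in WEEKEND else "weekday" for r in records
    )
    total = sum(counts.values())
    if total == 0:
        # No records: report zero shares instead of dividing by zero.
        return {"weekday": 0.0, "weekend": 0.0}
    return {key: counts[key] / total for key in ("weekday", "weekend")}


# Compare the full set of crashes against the severe subset.
severe = [r for r in crashes if r["fatal_or_major_injury"]]
all_shares = day_type_shares(crashes)
severe_shares = day_type_shares(severe)

print(f"Weekend share, all crashes:    {all_shares['weekend']:.1%}")
print(f"Weekend share, severe crashes: {severe_shares['weekend']:.1%}")
```

The same pattern extends to time of day or commute versus off-peak hours by swapping the grouping rule. With the real data, a gap between the two shares would still need a significance test before reading much into it.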
I found a similar difference in an analysis of UK crash data last year (snapshot below, full report here). My conclusion there was that more crashes happened in the city during commutes, but more severe crashes happened on weekends, at night, and in rural areas of the country. I plan to do a similar, deeper analysis of the Allegheny County data, using lasso regression and random forests to identify which factors predict more injuries and more fatalities, which could, in turn, inform policy responses.