### The case for untrustworthy data

A graph found on the Colorado Department of Transportation (CODOT) website (Figure 1) aims to visualize how 2016 monthly vehicle fatalities compare to the historical minimum, maximum, and average monthly fatality numbers in the state from 2002-2015. The graph has several dangerous flaws, which could possibly be a result of intentional deception. For one thing, there are two different scales on either side of the graph, one ranging from 0 to 120 for the historical numbers and the other ranging from -30 to 90 for the 2016 numbers, even though both measure exactly the same unit (vehicle fatalities). The graph also neglects the important principle of representing data points as positions along a scale, instead opting for wide intervals between gridlines and fatality numbers printed on top of the graph. As a result, numbers fall on top of one another to the point where they become illegible (see November). In addition, bars, which are typically used to represent categorical data, are in this case used to represent minima, maxima, and averages.

CODOT’s data collection methods appear to be extremely careless. The source of the above graph, accessed on October 24, 2016, showed the minimum value for January as 15, the average as 33, the maximum as 53, and the 2016 value as 29. The same graph accessed on November 7, 2016 shows completely different numbers. Comparisons between fatality data obtained from the National Highway Traffic Safety Administration (NHTSA), which is the stated source of the graph's data, and the data used by CODOT reveal many similar discrepancies.

A news briefing from CODOT titled Colorado Traffic Fatalities Surge in 2015 states “preliminary data from [CODOT] indicates that traffic fatalities rose by 10% in 2015. In 2015 there were 545 traffic fatalities in Colorado, compared to the 488 fatalities in 2014” (https://www.codot.gov/news/2016-news-releases/01-2016/colorado-traffic-fatalities-surge-in-2015). What CODOT fails to take into account is the statistical significance of these numbers, and whether they are even accurate to begin with. What is the true nature of the historical and current patterns of vehicle fatalities in Colorado?

### Visualizing accurate fatality trends over time

Historical fatality data for each county in Colorado was mined from the NHTSA Fatality Analysis Reporting System (FARS) encyclopedia. Though FARS data is currently only available from 1994 up to 2014, the tools used to construct the redesigned graphs can also be used to visualize 2016 data upon its release. A series of choropleth maps (Figure 2) was constructed in R using a package called choroplethr (https://cran.r-project.org/web/packages/choroplethr/). This package was chosen because one of its functions, county_choropleth, uses a unique identifier for each U.S. county, unlike most other shapefile-based packages. The companion package choroplethrMaps provides a dataset called county.regions, which was filtered to obtain the Colorado county identifiers. The identifiers were then matched with a variety of different FARS data.
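
The matching step can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the code used to build Figure 2. It assumes that each county record carries a numeric identifier in a `region` field and that the map function expects rows with a `region` and a `value`. The function name `build_choropleth_rows` is hypothetical, and the fatality counts are placeholders, not FARS figures.

```python
# Small illustrative subset of a county lookup table.
# The "region" values are the counties' FIPS codes.
county_regions = [
    {"region": 8001, "county_name": "adams", "state_name": "colorado"},
    {"region": 8013, "county_name": "boulder", "state_name": "colorado"},
    {"region": 8031, "county_name": "denver", "state_name": "colorado"},
    {"region": 8041, "county_name": "el paso", "state_name": "colorado"},
    {"region": 4013, "county_name": "maricopa", "state_name": "arizona"},
]

# PLACEHOLDER counts keyed by county name. These are NOT real FARS data;
# they exist only to show the shape of the join.
placeholder_counts = {"adams": 3, "boulder": 1, "denver": 4, "el paso": 2}


def build_choropleth_rows(regions, counts, state):
    """Keep one state's counties and attach a value to each identifier.

    Counties with no entry in `counts` get a value of None so that gaps
    in the data stay visible instead of being silently dropped.
    """
    rows = []
    for record in regions:
        if record["state_name"] != state:
            continue  # skip counties outside the requested state
        rows.append({
            "region": record["region"],
            "value": counts.get(record["county_name"]),
        })
    return rows


rows = build_choropleth_rows(county_regions, placeholder_counts, "colorado")
for row in rows:
    print(row["region"], row["value"])
```

Keying the final table on the county identifier rather than the county name avoids mismatches caused by spelling or capitalization differences between data sources.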

The ability to visualize how the number of vehicle fatalities in Colorado has changed over time, per region, is far more valuable than simply showing how one year’s worth of fatality data compares to historical numbers, and it paints a less dismal picture of where the trends are headed than CODOT’s original graph does. Clearly, vehicle fatalities in Colorado have decreased over the last 20-year span. Most counties have seen a decrease of upwards of 16 fatalities, while the counties that show an increase have gone up by at most 5 fatalities. Though fatalities in a few major counties seem to have increased, fatalities in most other counties appear to have decreased.

Though choropleths are a great tool for visualizing large amounts of regional fatality data, they are limited in that they do not provide a microscopic view of how fatalities vary as a function of time, or where the numbers will end up in the future. This is where statistical modeling comes into play.

### Predicting future fatalities

Did 2015 demonstrate a true “surge” in fatalities, as CODOT asserted? This would be difficult to assess using the exact data CODOT used to back up this claim, given their track record with faulty data. NHTSA data provides more expansive and reliable numbers that would be far better suited for analysis.

A time series analysis based on historical data from 1994-2014 gives a 95% confidence interval of [422.0754, 553.9234] for the number of fatalities predicted to occur in 2015. Because the observed 2015 count of 545 falls within this interval, the rise from 488 to 545 between 2014 and 2015 is within the range the historical data would lead one to expect. CODOT’s characterization of this increase as a “surge” is therefore not necessarily warranted, mathematically speaking.
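
The arithmetic behind this comparison can be checked with a short sketch. It uses only the figures quoted in this article (the 2014 and 2015 counts and the interval bounds) and does not reproduce the time series model itself. The helper names `percent_change` and `within_interval` are illustrative.

```python
# Figures quoted in the text above
fatalities_2014 = 488
fatalities_2015 = 545
interval_low, interval_high = 422.0754, 553.9234  # 95% interval for 2015


def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


def within_interval(value, low, high):
    """True if value lies inside the closed interval [low, high]."""
    return low <= value <= high


change = percent_change(fatalities_2014, fatalities_2015)
inside = within_interval(fatalities_2015, interval_low, interval_high)

print(f"Change 2014 -> 2015: {change:.1f}%")
print(f"2015 count inside the 95% interval: {inside}")
```

On the quoted counts, the change works out to about 11.7%, slightly above the rounded 10% in the news briefing, and the 2015 count still sits inside the interval.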

As evidenced by Figure 3, a graph of the number of fatalities over time, vehicle fatalities in Colorado have noticeably declined over the last two decades. There has been a gradual increase in fatalities since 2010, which CODOT claims in its news briefing is the result of fewer people wearing seatbelts. However, calling the increase in fatalities between 2014 and 2015 a “surge” is misleading, as this language implies that the increase between the two years was statistically significant, when in fact it was not. This goes to show that presenting numbers and percentages to an audience with no further historical or mathematical context can easily manipulate people’s perception of the data.