Data and Algorithmic Modeling Approaches in Crash Analysis

Yashu Kang, University of Nebraska - Lincoln


Traffic crashes cause significant loss of life and property worldwide. Analyzing transportation safety data provides insight and helps identify cause-and-effect relationships that affect crash probabilities and outcomes. There are two paradigms for analyzing crash data, the data modeling approach and the algorithmic modeling approach, which use statistical (or machine learning) methods for inference and prediction, respectively. This study used eight years of police-reported crash data obtained from the Nebraska Department of Transportation. Data models, including negative binomial regression and multinomial logistic regression, and algorithmic models, such as stacked regression and deep neural networks, were applied. The algorithmic models outperformed the data models in predictive capability on both the regression (crash frequency) and classification (crash injury severity) problems. Although the limited interpretability of algorithmic models restricts their use, adopting SHAP (SHapley Additive exPlanations) values improved it. Conclusions drawn from the two approaches are generally consistent. Salient findings regarding contributing factors include the following. At the 95% confidence level, vehicle miles traveled (VMT) was statistically significant and associated with crash frequency: each increase of one million VMT per year in a county could lead to an estimated increase of 17.67% to 24.47% in the expected number of crashes over eight years. Sufficient statistical evidence was not found to indicate significance of the other variables, such as unemployment rate, median age, median household income, and gender ratio. At the 95% confidence level, 16 variables were statistically significant in at least one of two comparisons: injury relative to property damage only (PDO), and disabling/fatal injury relative to PDO. These variables include total trucks/buses, total pedestrians, total occupants, female driver involvement, motorcycle involvement, pedestrian involvement, farm equipment involvement, alcohol involvement, rural area, crash type, painted median, road surface, and crash occurrence on roadway and in traffic, among others. For instance, holding all other variables constant, an alcohol-related crash was about 5.1 times as likely to be associated with a disabling or fatal outcome as with a PDO outcome (a 417.6% higher risk). These findings are beneficial for improving highway safety and informing transportation safety policy.
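The percentages quoted in the abstract can be related to model coefficients with a short calculation. The sketch below assumes the usual log-link reading of negative binomial and multinomial logistic models, in which an exponentiated coefficient is a multiplicative ratio; the abstract does not give the dissertation's exact specification, and the function names are hypothetical.

```python
import math


def ratio_to_percent_change(ratio: float) -> float:
    """Convert a multiplicative effect (rate or risk ratio) to a percent change."""
    return (ratio - 1.0) * 100.0


def percent_change_to_coefficient(percent: float) -> float:
    """Return the log-scale coefficient implied by a percent change.

    Assumes a log link, so that exp(coefficient) is the ratio.
    """
    return math.log(1.0 + percent / 100.0)


# Ratio implied by the reported 417.6% higher risk for alcohol-related crashes.
alcohol_ratio = 1.0 + 417.6 / 100.0

# Log-scale coefficients implied by the reported VMT interval (17.67% to 24.47%).
vmt_lower = percent_change_to_coefficient(17.67)
vmt_upper = percent_change_to_coefficient(24.47)

if __name__ == "__main__":
    print(f"alcohol ratio: {alcohol_ratio:.3f}")
    print(f"alcohol percent change: {ratio_to_percent_change(alcohol_ratio):.1f}%")
    print(f"implied VMT coefficient range: {vmt_lower:.4f} to {vmt_upper:.4f}")
```

Under this reading, a ratio near 5.18 corresponds to the reported 417.6% figure, and the VMT interval maps back to a coefficient range on the log scale.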

Subject Area

Transportation; Statistical physics; Geographic information science; Applied mathematics; Civil engineering

Recommended Citation

Kang, Yashu, "Data and Algorithmic Modeling Approaches in Crash Analysis" (2021). ETD collection for University of Nebraska - Lincoln. AAI28713639.