Supplemental Materials

The association between crashes and safety-critical events:
synthesized evidence from crash reports and naturalistic driving data among commercial truck drivers

2020-05-12

This website serves as the supplementary materials for the manuscript “The association between crashes and safety-critical events: synthesized evidence from crash reports and naturalistic driving data among commercial truck drivers” by Miao Cai, Mohammad Ali Alamdar Yazdi, Amir Mehdizadeh, Qiong Hu, Alexander Vinel, Karen Davis, Fadel Megahed, Hong Xian, and Steven Rigdon.

The website is hosted in a GitHub repository. A fake ping data set (similar to the real data, but not the real data, for confidentiality reasons) is accessible in the data folder. The Rmarkdown file that includes all the code to reproduce this website is accessible here.

1 Ping data and trip aggregation

1.1 Ping data demonstration

Fake ping data
driver         ping_time            speed  lat   long
a fake driver  2014-09-30 23:00:00      0  35.4  -77.9
a fake driver  2014-09-30 23:15:00      0  35.4  -77.9
a fake driver  2014-09-30 23:16:00      0  35.4  -77.9
a fake driver  2014-09-30 23:18:00      0  35.4  -77.9
a fake driver  2014-09-30 23:21:00     12  35.4  -77.9
a fake driver  2014-09-30 23:24:00     54  35.4  -77.9
a fake driver  2014-09-30 23:30:00     56  35.4  -78.0
a fake driver  2014-09-30 23:30:00     58  35.4  -78.0
a fake driver  2014-09-30 23:30:00      1  35.4  -78.0
a fake driver  2014-09-30 23:37:00     23  35.4  -78.0

The above table shows a fake copy of our ping data, which is similar to the real data but is not the real data, for confidentiality reasons. It is provided here to give readers a sense of what our ping data look like. The data set includes the date and time of the record (year, month, day, hour, and minute [the seconds here are rounded to 0]), latitude and longitude (specific to five decimal places), speed, and the driver's anonymized unique ID.

1.2 Aggregating ping data into trips

The self-defined function segment_0() below is used to separate the fake ping data into trips according to a threshold value.

The code block below shows R code that separates the fake ping data into trips by adding a trip_id column. Here we set the threshold to 30 minutes, which means that the ping data are split into different trips whenever the vehicle is not moving (the speed of the ping equals zero) for more than 30 minutes. Pings with a trip_id of 0 are stopped pings.
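The segmentation rule described above can be sketched in base R as follows. This is a minimal illustration, not the segment_0() function itself: the name assign_trip_id() and its arguments are hypothetical, and it assumes that a stop is measured from the first zero-speed ping to the next ping, that only stops lying between two moving stretches can be merged into a trip, and that a stop of at least threshold_min minutes splits trips.

```r
# Hypothetical helper (not the authors' segment_0()): label each ping with a
# trip ID, using 0 for pings that belong to a stop.
assign_trip_id <- function(ping_time, speed, threshold_min = 30) {
  stopifnot(length(ping_time) == length(speed))
  n <- length(speed)
  if (n == 0) return(integer(0))

  # Runs of consecutive moving (TRUE) or stopped (FALSE) pings
  runs <- rle(speed > 0)
  k <- length(runs$lengths)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1

  # Minutes from the first ping of each run to the first ping of the next run
  # (for the final run: to its own last ping)
  next_ping <- pmin(run_end + 1, n)
  run_min <- as.numeric(
    difftime(ping_time[next_ping], ping_time[run_start], units = "mins")
  )

  # Assumption: a stopped run shorter than the threshold that sits between
  # two moving runs is absorbed into the surrounding trip
  idx <- seq_len(k)
  short_stop <- (!runs$values) & (idx > 1) & (idx < k) & (run_min < threshold_min)
  runs$values[short_stop] <- TRUE

  # Re-compute runs after merging, then number the moving runs 1, 2, ...
  merged <- rle(inverse.rle(runs))
  merged$values <- ifelse(merged$values, cumsum(merged$values), 0L)
  as.integer(inverse.rle(merged))
}

# The ten fake pings shown in Section 1.1
pings <- data.frame(
  ping_time = as.POSIXct(
    c("2014-09-30 23:00:00", "2014-09-30 23:15:00", "2014-09-30 23:16:00",
      "2014-09-30 23:18:00", "2014-09-30 23:21:00", "2014-09-30 23:24:00",
      "2014-09-30 23:30:00", "2014-09-30 23:30:00", "2014-09-30 23:30:00",
      "2014-09-30 23:37:00"),
    tz = "UTC"
  ),
  speed = c(0, 0, 0, 0, 12, 54, 56, 58, 1, 23)
)
pings$trip_id <- assign_trip_id(pings$ping_time, pings$speed)
```

Under these assumptions, the four leading zero-speed pings receive trip_id 0 and the six moving pings receive trip_id 1. A different convention for measuring stop duration would change where trips are split.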

Sample ping data with time difference and trip ID
driver         ping_time            speed  lat   long   diff  trip_id
a fake driver  2014-09-30 23:00:00      0  35.4  -77.9     0        0
a fake driver  2014-09-30 23:15:00      0  35.4  -77.9    15        0
a fake driver  2014-09-30 23:16:00      0  35.4  -77.9     1        0
a fake driver  2014-09-30 23:18:00      0  35.4  -77.9     2        0
a fake driver  2014-09-30 23:21:00     12  35.4  -77.9     3        1
a fake driver  2014-09-30 23:24:00     54  35.4  -77.9     3        1
a fake driver  2014-09-30 23:30:00     56  35.4  -78.0     6        1
a fake driver  2014-09-30 23:30:00     58  35.4  -78.0     0        1
a fake driver  2014-09-30 23:30:00      1  35.4  -78.0     0        1
a fake driver  2014-09-30 23:37:00     23  35.4  -78.0     7        1

The table below shows the aggregated trip data: the start time, the end time, the trip length in minutes (trip_length), and the midpoint between the start and end times (trip_median). The ping table and the trip table can be merged into one table using the driver ID and the trip ID.

Aggregated trips from sample ping
driver         trip_id  start_time           end_time             trip_length  trip_median
a fake driver        1  2014-09-30 23:21:00  2014-10-01 05:45:00          384  2014-10-01 02:33:00
a fake driver        2  2014-10-01 06:26:00  2014-10-01 07:45:00           79  2014-10-01 07:05:30
a fake driver        3  2014-10-01 08:54:00  2014-10-01 12:26:00          212  2014-10-01 10:40:00
a fake driver        4  2014-10-01 23:06:00  2014-10-02 00:04:00           58  2014-10-01 23:35:00
a fake driver        5  2014-10-02 01:15:00  2014-10-02 02:00:00           45  2014-10-02 01:37:30
a fake driver        6  2014-10-02 04:21:00  2014-10-02 05:32:00           71  2014-10-02 04:56:30
a fake driver        7  2014-10-02 08:13:00  2014-10-02 12:45:00          272  2014-10-02 10:29:00
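One way to build a trip table like the one above from labeled pings, and then merge it back onto the pings, is sketched below in base R. The ping values are made up for illustration, summarise_trip() is a hypothetical helper, and the sketch assumes that trip_length is the end time minus the start time in minutes and that trip_median is the midpoint of the two, which is consistent with the rows of the table.

```r
# Made-up labeled pings for illustration; trip_id 0 marks stopped pings
pings <- data.frame(
  driver = "a fake driver",
  ping_time = as.POSIXct(
    c("2014-09-30 23:00:00", "2014-09-30 23:21:00", "2014-09-30 23:24:00",
      "2014-09-30 23:37:00", "2014-10-01 00:30:00", "2014-10-01 01:15:00",
      "2014-10-01 02:00:00"),
    tz = "UTC"
  ),
  speed = c(0, 12, 54, 23, 0, 40, 35),
  trip_id = c(0L, 1L, 1L, 1L, 0L, 2L, 2L)
)

# Hypothetical helper: collapse the pings of one trip into a single row
summarise_trip <- function(d) {
  start <- min(d$ping_time)
  end <- max(d$ping_time)
  secs <- as.numeric(difftime(end, start, units = "secs"))
  data.frame(
    driver = d$driver[1],
    trip_id = d$trip_id[1],
    start_time = start,
    end_time = end,
    trip_length = secs / 60,        # minutes between first and last ping
    trip_median = start + secs / 2  # midpoint of the trip
  )
}

# Drop stopped pings, then summarise each driver-trip combination
moving <- pings[pings$trip_id > 0, ]
groups <- split(moving, list(moving$driver, moving$trip_id), drop = TRUE)
trips <- do.call(rbind, lapply(groups, summarise_trip))
rownames(trips) <- NULL

# Merge trip-level columns back onto the pings by driver ID and trip ID;
# stopped pings (trip_id 0) keep NA in the trip-level columns
pings_with_trips <- merge(pings, trips, by = c("driver", "trip_id"), all.x = TRUE)
```

For larger data, the same grouping could be written with data.table or dplyr, both of which appear in the session information; base R is used here only to keep the sketch free of dependencies.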

1.3 Ping and trip data visualization

The figure above shows the data aggregation process. The x-axis shows the date and time of the pings, and the y-axis shows the speed of each ping in miles per hour (MPH). Each point represents a ping at that date and time, with different colors indicating the real-time speed category. Whenever the truck stopped (the grey points) for at least 30 minutes, the pings were separated into different trips, indicated by the purple arrows at the bottom (Trip 1, Trip 2, \(\ldots\), Trip 6). The trip time was then calculated as the difference between the trip end time and the trip start time.

2 Statistical modeling

2.1 Bayesian NB regression using rstanarm

The code below fits the Bayesian negative binomial (NB) regression models. For demonstration purposes, we use only the first 1,000 observations of the data and a single Markov chain with 1,000 iterations, the first 500 of which are warm-up iterations.
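A fit with these sampler settings might look like the sketch below. It is an illustration under stated assumptions rather than the model used in the manuscript: the data are simulated, the column names n_crash, sce_rate, and miles are hypothetical, and the log-mileage offset is an assumed way of turning counts into rates. The rstanarm call is skipped if the package is not installed.

```r
set.seed(2020)

# Simulated stand-in data; all column names and values are hypothetical
n <- 1200
dat <- data.frame(
  miles = runif(n, 1000, 50000),          # exposure
  sce_rate = rgamma(n, shape = 2, rate = 1)
)
dat$n_crash <- rnbinom(
  n,
  mu = dat$miles / 1e4 * exp(-1 + 0.1 * dat$sce_rate),
  size = 2
)

# Keep only the first 1,000 observations, as in the demonstration
dat_demo <- head(dat, 1000)

if (requireNamespace("rstanarm", quietly = TRUE)) {
  fit <- rstanarm::stan_glm(
    n_crash ~ sce_rate,
    offset = log(miles / 1e4),            # assumed exposure offset
    data = dat_demo,
    family = rstanarm::neg_binomial_2(),  # NB likelihood with log link
    chains = 1,                           # one Markov chain
    iter = 1000,                          # total iterations per chain
    warmup = 500,                         # first 500 are warm-up
    seed = 123,
    refresh = 0                           # silence sampler progress output
  )
  print(fit, digits = 3)
}
```

A single short chain keeps the demonstration fast but is not enough to assess convergence; a real analysis would typically use several chains and more iterations.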

The table above shows the incidence rate ratios (IRRs) from the Bayesian NB regression, along with their 95% credible intervals. The Bayesian interpretation is that, given the model and the data, there is a 95% posterior probability that each IRR lies within its interval.
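As a sketch of how such a table can be computed, an IRR is the exponential of a log-scale regression coefficient, and a 95% credible interval can be read off the 2.5% and 97.5% quantiles of the exponentiated posterior draws. The draws below are random placeholders with hypothetical coefficient names, not results from the manuscript. With an rstanarm fit, as.matrix(fit) would return a draws matrix of the same shape.

```r
set.seed(1)

# Placeholder posterior draws of log-scale coefficients (rows = draws);
# the names and values are hypothetical
draws <- cbind(
  sce_rate = rnorm(500, mean = 0.08, sd = 0.02),
  age = rnorm(500, mean = -0.01, sd = 0.005)
)

# Exponentiate each draw to move from the log scale to the IRR scale
irr_draws <- exp(draws)

# Posterior median and central 95% credible interval for each IRR
irr_table <- t(apply(irr_draws, 2, quantile, probs = c(0.025, 0.5, 0.975)))
colnames(irr_table) <- c("lower_95", "median", "upper_95")
round(irr_table, 3)
```

Exponentiating the draws before taking quantiles, rather than exponentiating a summary of the coefficients, keeps the interval a statement about the IRR itself.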

2.2 Model comparison and diagnostics using loo

The code above produces a figure showing a posterior predictive check, which assesses how well the model reproduces a feature of the observed data. It compares the observed data to 100 replicated datasets generated from the posterior distributions of the parameters. For each simulated dataset, the proportion of zero crashes is computed, and the blue histogram shows the distribution of these simulated proportions. The black solid vertical line is the proportion of zero crashes in the observed data. An observed proportion (black solid line) near the center of the histogram indicates good model fit.
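The computation behind this check can be sketched in base R. The observed counts and the 100 replicated datasets below are simulated placeholders, and prop_zero() is a hypothetical helper. With an rstanarm fit, posterior_predict(fit, draws = 100) would supply a replicate matrix with one row per simulated dataset.

```r
set.seed(42)
n <- 1000

# Placeholder observed crash counts and 100 replicated datasets
y <- rnbinom(n, mu = 0.3, size = 0.5)
yrep <- matrix(rnbinom(100 * n, mu = 0.3, size = 0.5), nrow = 100)

# Test statistic: proportion of observations with zero crashes
prop_zero <- function(x) mean(x == 0)

obs_prop <- prop_zero(y)               # the vertical line in the figure
rep_prop <- apply(yrep, 1, prop_zero)  # one value per replicated dataset

# Share of replicates at or above the observed proportion; values near
# 0 or 1 would place the observed line in a tail of the histogram
tail_share <- mean(rep_prop >= obs_prop)
c(observed = obs_prop, replicate_mean = mean(rep_prop), tail_share = tail_share)
```

Calling hist(rep_prop) followed by abline(v = obs_prop) would draw a basic version of the figure described above.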

The above block shows the expected log predictive density (elpd_loo), the estimated effective number of parameters (p_loo), and the LOO information criterion (looic) for a new dataset, obtained from Pareto smoothed importance-sampling leave-one-out (PSIS-LOO) cross-validation (CV). They can be used to check goodness of fit and to compare different models (Vehtari, Gelman, and Gabry 2017, 2015). In a well-specified model, p_loo should be close to the total number of parameters. Similar to the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), the looic in the output can be used to compare different models, with lower values indicating better models.
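For reference, the three quantities are related as follows, using the definitions in Vehtari, Gelman, and Gabry (2017). Here \(y_{-i}\) denotes the data with observation \(i\) left out, and \(\mathrm{lpd}\) is the log pointwise predictive density computed on the full data:

\[
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}), \qquad
\mathrm{lpd} = \sum_{i=1}^{n} \log p(y_i \mid y), \qquad
p_{\mathrm{loo}} = \mathrm{lpd} - \mathrm{elpd}_{\mathrm{loo}}, \qquad
\mathrm{looic} = -2\,\mathrm{elpd}_{\mathrm{loo}}.
\]

Because looic is elpd_loo multiplied by \(-2\), ranking models by lower looic is the same as ranking them by higher elpd_loo.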

With the two model fits above, fit and fit_new, researchers can compare model fit using compare() from the loo package, as shown above. It compares expected predictive accuracy through the difference in elpd_loo: a positive elpd_diff favors the second model, while a negative elpd_diff favors the first.
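The arithmetic behind such a comparison can be sketched in base R from pointwise elpd_loo values. The two vectors below are random placeholders, not output from the models above. With loo objects, the pointwise values would normally come from the elpd_loo column of the object's pointwise matrix. The standard error uses the paired-difference formula from Vehtari, Gelman, and Gabry (2017).

```r
set.seed(7)
n <- 1000

# Placeholder pointwise elpd_loo values for the first and second model
elpd_fit <- rnorm(n, mean = -0.60, sd = 0.8)
elpd_fit_new <- elpd_fit + rnorm(n, mean = 0.01, sd = 0.1)

# Second model minus first model, observation by observation
d <- elpd_fit_new - elpd_fit

elpd_diff <- sum(d)          # positive values favor the second model
se_diff <- sqrt(n * var(d))  # standard error of the summed difference

c(elpd_diff = elpd_diff, se_diff = se_diff)
```

A difference that is small relative to its standard error gives little basis for preferring either model.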

Session information

The R session information when building this website is shown below:

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## system code page: 936
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] corrgram_1.13     scales_1.1.0      corrplot_0.84     psych_1.9.12.31  
## [5] kableExtra_1.1.0  lubridate_1.7.8   ggplot2_3.3.0     dplyr_0.8.5      
## [9] data.table_1.12.8
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6       lattice_0.20-41    gtools_3.8.2       assertthat_0.2.1  
##  [5] digest_0.6.25      foreach_1.5.0      R6_2.4.1           evaluate_0.14     
##  [9] highr_0.8          httr_1.4.1         pillar_1.4.3       gplots_3.0.3      
## [13] rlang_0.4.5        rstudioapi_0.11    gdata_2.18.0       rmarkdown_2.1     
## [17] webshot_0.5.2      readr_1.3.1        stringr_1.4.0      munsell_0.5.0     
## [21] compiler_4.0.0     xfun_0.13          pkgconfig_2.0.3    mnormt_1.5-7      
## [25] htmltools_0.4.0    tidyselect_1.0.0   gridExtra_2.3      tibble_3.0.1      
## [29] seriation_1.2-8    codetools_0.2-16   dendextend_1.13.4  viridisLite_0.3.0 
## [33] crayon_1.3.4       withr_2.2.0        MASS_7.3-51.5      bitops_1.0-6      
## [37] grid_4.0.0         nlme_3.1-147       gtable_0.3.0       lifecycle_0.2.0   
## [41] registry_0.5-1     pacman_0.5.1       magrittr_1.5       KernSmooth_2.23-16
## [45] stringi_1.4.6      farver_2.0.3       viridis_0.5.1      xml2_1.3.2        
## [49] ellipsis_0.3.0     generics_0.0.2     vctrs_0.2.4        iterators_1.0.12  
## [53] tools_4.0.0        glue_1.4.0         purrr_0.3.4        gclus_1.3.2       
## [57] hms_0.5.3          parallel_4.0.0     yaml_2.2.1         colorspace_1.4-1  
## [61] cluster_2.1.0      caTools_1.18.0     TSP_1.1-10         rvest_0.3.5       
## [65] knitr_1.28

References

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2015. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.