Initialization

The snippet below documents the R packages and functions that were used in this research. For convenience, we use the pacman package, which installs (if needed) and loads the required packages in one step. Please make sure that pacman itself is installed on your system, using the command install.packages("pacman"), before running this code chunk.

rm(list = ls()) # clear global environment
graphics.off() # close all graphics
library(pacman) # needs to be installed first
# p_load is equivalent to combining both install.packages() and library()
p_load(dataPreparation, DataExplorer, DT, tidyverse, MVN)
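To make the "install and load in one step" behavior concrete, the following is a minimal base R sketch of the same idea. The helper name load_or_install() is hypothetical and is not part of pacman; pacman's actual implementation may differ.

```r
# Hypothetical helper (not part of pacman): install a package only if it is
# missing, then attach it, mimicking what p_load() does conceptually.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # runs only when the package is absent
    }
    library(pkg, character.only = TRUE)  # attach by name given as a string
  }
  invisible(pkgs)
}

# Usage with packages that ship with R, so nothing is downloaded here
loaded <- load_or_install(c("stats", "utils"))
```

In the report itself, p_load() remains the simpler choice; the sketch only shows what it saves us from writing.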

1 Useable Security

As an illustration of the statistical (process control) opportunities in improving the utilization of human-generated data for cyber security applications, let us examine the dataset from Mohamed and Saxena (2016).

1.1 Loading the Data into R

source("functions/functions.R") # to load data.types()
df <- read.csv("data/dcg_fetears_all.csv")

cat(paste("We have read the full dataset into a data.frame titled df.", "The df consists of", nrow(df), "rows and",
          ncol(df), "columns.", "Additionally, R initially assigns a type to each column. We summarize these types in the table below."))

We have read the full dataset into a data.frame titled df. The df consists of 8651 rows and 67 columns. Additionally, R initially assigns a type to each column. We summarize these types in the table below.

# First tab, where we summarize the column types
cat(paste("###","Column Types","{-}","\n"))

Column Types

types <- data.types(df) # see functions.R file
types

cat("\n") # Printing a line break
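The data.types() helper is defined in functions/functions.R, which is not reproduced here. As a rough illustration of the kind of summary it may produce, the sketch below counts columns by their R class on a small made-up data frame. The object names (toy, col_types, type_summary) are assumptions for illustration only and do not come from functions.R.

```r
# Toy data frame standing in for df (values are made up for illustration)
toy <- data.frame(
  id    = 1:3,
  speed = c(0.5, 1.2, 0.9),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# First class of each column, e.g. "integer", "numeric", "character"
col_types <- vapply(toy, function(x) class(x)[1], character(1))

# Number of columns per type, returned as a small data frame
type_summary <- as.data.frame(table(type = col_types), responseName = "n_columns")
print(type_summary)
```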

## Second Tab (Missing Data)
cat(paste("###","Missing Data","{-}","\n"))

Missing Data

cat("In the plot below, we sample 40 columns at random from the dataset to show the actual percentage of the data that is missing for each variable. The colors are used to denote the data quality for that column using a traffic light scheme (where green is good and red is bad).")

In the plot below, we sample 40 columns at random from the dataset to show the actual percentage of the data that is missing for each variable. The colors are used to denote the data quality for that column using a traffic light scheme (where green is good and red is bad).

df.na.plot <- df[,sample(colnames(df),40)] %>% plot_missing()
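Because the columns are sampled at random, the plot changes from run to run unless a seed is fixed. The sketch below, on a made-up data frame, shows one way to make the sample reproducible and to compute the percentage of missing values per column directly. The seed value and the toy data are assumptions; the original analysis does not state a seed.

```r
set.seed(2019)  # assumption: any fixed seed makes the column sample reproducible

# Toy data frame with known missingness (made up for illustration)
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4,
  stringsAsFactors = FALSE
)

# Percentage of missing values in every column
pct_missing_all <- colMeans(is.na(toy)) * 100

# Same computation restricted to a random sample of two columns
sampled_cols <- sample(colnames(toy), 2)
pct_missing_sample <- pct_missing_all[sampled_cols]
print(pct_missing_sample)
```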

## Third Tab (Clean Data)
cat(paste("###","Clean Data","{-}","\n"))

Clean Data

cat("Using fastFilterVariables() from the dataPreparation R package, we can remove: (a) constant columns: they take the same value in every row; (b) double columns: they have an exact copy in the dataset; and (c) bijection columns: another column contains exactly the same information, possibly coded differently (for example, col1: Men/Women and col2: M/W). The result of this analysis is saved into a data frame titled df.cleaned.")

Using fastFilterVariables() from the dataPreparation R package, we can remove: (a) constant columns: they take the same value in every row; (b) double columns: they have an exact copy in the dataset; and (c) bijection columns: another column contains exactly the same information, possibly coded differently (for example, col1: Men/Women and col2: M/W). The result of this analysis is saved into a data frame titled df.cleaned.

df.cleaned <- fastFilterVariables(df)

[1] "fastFilterVariables: I check for constant columns."
[1] "fastFilterVariables: I delete 1 constant column(s) in dataSet."
[1] "fastFilterVariables: I check for columns in double."
[1] "fastFilterVariables: I check for columns that are bijections of another column."
[1] "fastFilterVariables: I delete 15 column(s) that are bijections of another column in dataSet."
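To illustrate the three kinds of redundant columns, the following base R sketch applies the same three checks to a small made-up data frame. It is an illustration of the definitions only, not the way dataPreparation implements fastFilterVariables(); all object names are assumptions.

```r
# Toy data: one constant column, one exact copy, and one recoded copy
toy <- data.frame(
  const      = rep(1, 4),
  sex_long   = c("Men", "Women", "Women", "Men"),
  sex_short  = c("M", "W", "W", "M"),
  score      = c(10, 20, 30, 40),
  score_copy = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# (a) constant columns: a single distinct value
is_constant <- vapply(toy, function(x) length(unique(x)) <= 1, logical(1))

# (b) double columns: identical to an earlier column
is_double <- duplicated(as.list(toy))

# (c) bijection columns: same row grouping as an earlier column, coded differently.
# Each value is replaced by the order in which it first appears, so recodings match.
pattern <- lapply(toy, function(x) match(x, unique(x)))
is_bijection <- duplicated(pattern) & !is_double

kept <- toy[, !(is_constant | is_double | is_bijection), drop = FALSE]
print(colnames(kept))
```

Only the first column of each redundant group is kept, which matches the intent described above: no information is lost by dropping the others.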

cat(paste0("The data frame df.cleaned consists of ", ncol(df.cleaned), " columns. Note that the original data frame df had ", ncol(df), " columns."))

The data frame df.cleaned consists of 51 columns. Note that the original data frame df had 67 columns.

saveRDS(df.cleaned, "results/sec_cleaned.RDS")

1.2 Subsetting the Data and MVN Test

df.cleaned <- readRDS("results/sec_cleaned.RDS") %>% 
  subset(select = c(ID, speed_touch_mean, 
                    pause_and_drop_mean))
df.cleaned.num <- select_if(df.cleaned, is.numeric) %>% 
  slice(1:5000)
mvn(df.cleaned.num, mvnTest = "hz", univariatePlot = "qqplot", 
    multivariatePlot = "contour")$multivariateNormality
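For intuition about what a multivariate normality check looks at, the sketch below computes squared Mahalanobis distances on simulated data and pairs them with chi-square quantiles, which is the basis of a chi-square Q-Q plot. This is a simpler diagnostic that we swap in for illustration; it is not the Henze-Zirkler test run by mvn() above, and the simulated columns merely stand in for speed_touch_mean and pause_and_drop_mean.

```r
set.seed(1)  # assumption: fixed seed so the simulated example is reproducible
n <- 500

# Simulated bivariate normal data standing in for the two selected features
x <- cbind(speed = rnorm(n), pause = rnorm(n))

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Under multivariate normality, d2 is approximately chi-square with p = ncol(x) df
theoretical <- qchisq(ppoints(n), df = ncol(x))

# Points close to the 45-degree line are consistent with multivariate normality
if (interactive()) {
  plot(theoretical, sort(d2),
       xlab = "Chi-square quantiles", ylab = "Ordered squared distances")
  abline(0, 1)
}
```

A formal test such as Henze-Zirkler summarizes this kind of departure in a single statistic and p-value, which is what the $multivariateNormality element reports.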

2 Statistical modeling

3 Summary

References

Mohamed, Manar, and Nitesh Saxena. 2016. “Gametrics: Towards Attack-Resilient Behavioral Authentication with Simple Cognitive Games.” In Proceedings of the 32nd Annual Conference on Computer Security Applications, 277–88. ACM.

¹ Email: fmegahed@miamioh.edu | Phone: +1-513-529-4185 | Website: Miami University Official
² Email: farmerl2@miamioh.edu | Phone: +1-513-529-4823 | Website: Miami University Official
³ Email: miao.cai@slu.edu | Phone: +1-314-326-8418 | Website: Saint Louis University
⁴ Email: steve.rigdon@slu.edu | Phone: +1-314-977-8127 | Website: Saint Louis University Official
⁵ Email: mohamem@miamioh.edu | Phone: +1-513-529-0346 | Website: Miami University Official

Supplementary materials: A Statistical (Process Monitoring) Perspective on Human Performance Modeling in the Age of Cyber-Physical Systems

Fadel M. Megahed¹

Allison Jones-Farmer²

Miao Cai³

Steve Rigdon⁴

Manar Mohamed⁵

December 16, 2019

Initialization

1 Useable Security

1.1 Loading the Data into R

Column Types

Missing Data

Clean Data

1.2 Subsetting the Data and MVN Test

2 Statistical modeling

3 Summary

References
