1. Introduction

The United States, which has more mass shootings than any other country in the world, suffered at least 15 mass shootings in the past decade. According to Wikipedia, 305 people died and more than 1,100 were injured in those shootings. On the night of October 1, 2017, a gunman opened fire on a crowd attending the final night of a country music festival in Las Vegas, killing 58 people and injuring more than 800. It was the deadliest mass shooting in modern United States history.

Gun control has been debated for a long time, probably ever since guns were invented. However, this project is not about arguing gun control or the Second Amendment. Its purpose is only to gain some statistical insight into these mass shooting incidents and to better understand the relationships between the characteristics of the shooters and the incidents, by performing spatial and statistical analysis on the mass shooting dataset from Kaggle. The dataset is also available on my GitHub.

2. Methodology

1. Load all the required libraries

library(data.table)
library(leaflet)
library(lubridate)
library(magrittr)
library(maps)
library(plotly)
library(stringr)
library(tidyverse)

2. Load the dataset and take a glimpse of it

shoot <- read.csv("C:/SPS/projects/Mass Shootings Dataset Ver 5.csv",header=TRUE,stringsAsFactors=FALSE)
head(shoot)
##   S.                               Title               Location       Date
## 1  1          Texas church mass shooting Sutherland Springs, TX  11/5/2017
## 2  2 Walmart shooting in suburban Denver           Thornton, CO  11/1/2017
## 3  3     Edgewood businees park shooting           Edgewood, MD 10/18/2017
## 4  4       Las Vegas Strip mass shooting          Las Vegas, NV  10/1/2017
## 5  5          San Francisco UPS shooting      San Francisco, CA  6/14/2017
## 6  6   Pennsylvania supermarket shooting        Tunkhannock, PA   6/7/2017
##                                 Incident.Area Open.Close.Location
## 1                                      Church               Close
## 2                                    Wal-Mart                Open
## 3                            Remodeling Store               Close
## 4 Las Vegas Strip Concert outside Mandala Bay                Open
## 5                                UPS facility               Close
## 6                                Weis grocery               Close
##      Target     Cause
## 1    random   unknown
## 2    random   unknown
## 3 coworkers   unknown
## 4    random   unknown
## 5 coworkers          
## 6 coworkers terrorism
##                                                                                                                                                                                                                                                                                                                                         Summary
## 1                                                                                                                                                                                    Devin Patrick Kelley, 26, an ex-air force officer, shot and killed 26 people and wounded 20 at a church in Texas. He was found dead later in his vehicle. 
## 2                  Scott Allen Ostrem, 47, walked into a Walmart in a suburb north of Denver and fatally shot two men and a woman, then left the store and drove away. After an all-night manhunt, Ostrem, who had financial problems but no serious criminal history, was captured by police after being spotted near his apartment in Denver.
## 3 Radee Labeeb Prince, 37, fatally shot three people and wounded two others around 9am at Advance Granite Solutions, a home remodeling business where he worked near Baltimore. Hours later he shot and wounded a sixth person at a car dealership in Wilmington, Delaware. He was apprehended that evening following a manhunt by authorities.
## 4                                                                                                                                     Stephen Craig Paddock, opened fire from the 32nd floor of Manadalay Bay hotel at Last Vegas concert goers for no obvious reason. He shot himself and died on arrival of law enforcement agents. He was 64
## 5                                                                                                                                                             Jimmy Lam, 38, fatally shot three coworkers and wounded two others inside a UPS facility in San Francisco. Lam killed himself as law enforcement officers responded to the scene.
## 6                                                                             Randy Stair, a 24-year-old worker at Weis grocery fatally shot three of his fellow employees. He reportedly fired 59 rounds with a pair of shotguns before turning the gun on himself as another co-worker fled the scene for help and law enforcement responded.
##   Fatalities Injured Total.victims Policeman.Killed Age Employeed..Y.N.
## 1         26      20            46                0  26              NA
## 2          3       0             3                0  47              NA
## 3          3       3             6                0  37              NA
## 4         59     527           585                1  64              NA
## 5          3       2             5                0  38               1
## 6          3       0             3               NA  24               1
##             Employed.at Mental.Health.Issues  Race Gender Latitude
## 1                                         No White      M       NA
## 2                                         No White      M       NA
## 3 Advance Granite Store                   No Black      M       NA
## 4                                    Unclear White      M 36.18127
## 5                                        Yes Asian      M       NA
## 6          Weis grocery              Unclear White      M       NA
##   Longitude
## 1        NA
## 2        NA
## 3        NA
## 4 -115.1341
## 5        NA
## 6        NA
glimpse(shoot)
## Observations: 323
## Variables: 21
## $ S.                   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
## $ Title                <chr> "Texas church mass shooting", "Walmart sh...
## $ Location             <chr> "Sutherland Springs, TX", "Thornton, CO",...
## $ Date                 <chr> "11/5/2017", "11/1/2017", "10/18/2017", "...
## $ Incident.Area        <chr> "Church", "Wal-Mart", "Remodeling Store",...
## $ Open.Close.Location  <chr> "Close", "Open", "Close", "Open", "Close"...
## $ Target               <chr> "random", "random", "coworkers", "random"...
## $ Cause                <chr> "unknown", "unknown", "unknown", "unknown...
## $ Summary              <chr> "Devin Patrick Kelley, 26, an ex-air forc...
## $ Fatalities           <int> 26, 3, 3, 59, 3, 3, 5, 3, 3, 5, 5, 3, 5, ...
## $ Injured              <int> 20, 0, 3, 527, 2, 0, 0, 0, 0, 6, 0, 3, 11...
## $ Total.victims        <int> 46, 3, 6, 585, 5, 3, 5, 3, 3, 11, 5, 6, 1...
## $ Policeman.Killed     <int> 0, 0, 0, 1, 0, NA, NA, 1, NA, NA, NA, 3, ...
## $ Age                  <chr> "26", "47", "37", "64", "38", "24", "45",...
## $ Employeed..Y.N.      <int> NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA, N...
## $ Employed.at          <chr> "", "", "Advance Granite Store", "", "", ...
## $ Mental.Health.Issues <chr> "No", "No", "No", "Unclear", "Yes", "Uncl...
## $ Race                 <chr> "White", "White", "Black", "White", "Asia...
## $ Gender               <chr> "M", "M", "M", "M", "M", "M", "M", "M", "...
## $ Latitude             <dbl> NA, NA, NA, 36.18127, NA, NA, NA, NA, NA,...
## $ Longitude            <dbl> NA, NA, NA, -115.13413, NA, NA, NA, NA, N...
summary(shoot)
##        S.           Title             Location             Date          
##  Min.   :  1.0   Length:323         Length:323         Length:323        
##  1st Qu.: 81.5   Class :character   Class :character   Class :character  
##  Median :162.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :162.0                                                           
##  3rd Qu.:242.5                                                           
##  Max.   :323.0                                                           
##                                                                          
##  Incident.Area      Open.Close.Location    Target         
##  Length:323         Length:323          Length:323        
##  Class :character   Class :character    Class :character  
##  Mode  :character   Mode  :character    Mode  :character  
##                                                           
##                                                           
##                                                           
##                                                           
##     Cause             Summary            Fatalities        Injured       
##  Length:323         Length:323         Min.   : 0.000   Min.   :  0.000  
##  Class :character   Class :character   1st Qu.: 1.000   1st Qu.:  1.000  
##  Mode  :character   Mode  :character   Median : 3.000   Median :  3.000  
##                                        Mean   : 4.437   Mean   :  6.176  
##                                        3rd Qu.: 5.500   3rd Qu.:  5.000  
##                                        Max.   :59.000   Max.   :527.000  
##                                                                          
##  Total.victims    Policeman.Killed     Age            Employeed..Y.N. 
##  Min.   :  3.00   Min.   :0.0000   Length:323         Min.   :0.0000  
##  1st Qu.:  4.00   1st Qu.:0.0000   Class :character   1st Qu.:0.0000  
##  Median :  5.00   Median :0.0000   Mode  :character   Median :1.0000  
##  Mean   : 10.26   Mean   :0.1293                      Mean   :0.6269  
##  3rd Qu.:  9.00   3rd Qu.:0.0000                      3rd Qu.:1.0000  
##  Max.   :585.00   Max.   :5.0000                      Max.   :1.0000  
##                   NA's   :6                           NA's   :256     
##  Employed.at        Mental.Health.Issues     Race          
##  Length:323         Length:323           Length:323        
##  Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character     Mode  :character  
##                                                            
##                                                            
##                                                            
##                                                            
##     Gender             Latitude       Longitude      
##  Length:323         Min.   :21.33   Min.   :-161.79  
##  Class :character   1st Qu.:33.57   1st Qu.:-110.21  
##  Mode  :character   Median :36.44   Median : -88.12  
##                     Mean   :37.23   Mean   : -94.43  
##                     3rd Qu.:41.48   3rd Qu.: -81.70  
##                     Max.   :60.79   Max.   : -69.71  
##                     NA's   :20      NA's   :20

3. Reconstruct the dataset

3.1 Separate Location into two variables, City and State

shoot <- shoot %>% separate(Location,into = c("City","State"), sep = ", ",remove = FALSE)

Judging from the warning messages, almost 15% of the samples in the Location column contain NA. The locations of those incidents are not difficult to figure out, so filling in the NAs manually seems necessary. The manually corrected file is loaded below.
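Before filling the gaps by hand, it can help to measure how many rows fail to split. The following is a minimal sketch on a made-up data frame: `toy_locations` and its values are hypothetical and are not taken from the Kaggle file. It assumes that `fill = "right"` in `tidyr::separate()` is acceptable here, which pads a missing State with NA instead of raising a warning.

```r
library(tidyr)

# Hypothetical sample standing in for the Location column
toy_locations <- data.frame(
  Location = c("Las Vegas, NV", "Texas", NA, "Orlando, Florida"),
  stringsAsFactors = FALSE
)

# fill = "right" pads a missing State with NA without emitting a warning
toy_split <- separate(toy_locations, Location, into = c("City", "State"),
                      sep = ", ", remove = FALSE, fill = "right")

# Share of rows that still lack a State after the split
missing_share <- mean(is.na(toy_split$State))
print(missing_share)
```

Running the same check on the real data would show how much manual work remains.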

shoot1 <- read.csv("C:/SPS/projects/Mass Shootings Dataset fixed.csv",header=TRUE,stringsAsFactors=FALSE)
shoot1 <- shoot1 %>% 
  separate(Location,into = c("City","State"), sep = ", ", remove = FALSE)

The state abbreviations need to be replaced by the full state names.

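# Pair each abbreviation with its own full name; a named vector applies the
# replacements one by one, and \\b restricts each match to a whole word.
state_names <- c("\\bCA\\b"="California","\\bCO\\b"="Colorado","\\bLA\\b"="Louisiana",
                 "\\bMD\\b"="Maryland","\\bNV\\b"="Nevada","\\bPA\\b"="Pennsylvania",
                 "\\bTX\\b"="Texas","\\bWA\\b"="Washington","\\bVA\\b"="Virginia")
shoot1$State <- shoot1$State %>%
  str_replace_all(state_names)
# Remove the stray whitespace left around two of the state names
shoot1$State <- shoot1$State %>%
  str_replace_all(c("Texas "="Texas"," Virginia"="Virginia"))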
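An alternative to listing the abbreviations by hand is a lookup against the `state.abb` and `state.name` vectors that ship with base R. The sketch below is only an illustration on a hypothetical input vector (`toy_state`), not the approach used in this project. Note that these built-in vectors cover the 50 states only, so a value such as "DC" would be left unchanged.

```r
# Hypothetical input mixing abbreviations, full names, and stray whitespace
toy_state <- c("CA", "Texas ", " Virginia", "NV", "Florida")

# Trim whitespace first so abbreviations can match exactly
trimmed <- trimws(toy_state)

# state.abb and state.name are parallel built-in vectors in base R
idx <- match(trimmed, state.abb)

# Keep the original value when it is not a known abbreviation
full_names <- ifelse(is.na(idx), trimmed, state.name[idx])
print(full_names)
```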

3.2 Extract year and month from column Date

shoot1 <- shoot1 %>% 
  mutate(Date=mdy(Date),year=year(Date))
shoot1 <- shoot1 %>% 
  mutate(month=month(Date))
shoot1 <- shoot1 %>%
  mutate(decade = case_when(year >=1960 & year<1970 ~ "1960s",
                            year >=1970 & year<1980 ~ "1970s",
                            year >=1980 & year<1990 ~ "1980s",
                            year >=1990 & year<2000 ~ "1990s",
                            year >=2000 & year<2010 ~ "2000s",
                            year >=2010 & year<2020 ~ "2010s"))
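The decade label can also be derived arithmetically, which avoids typing each range by hand. This is a minimal sketch on hypothetical years (`toy_years` is made up for illustration), shown as an alternative rather than as the method used above.

```r
# Hypothetical years spanning the dataset's range
toy_years <- c(1966, 1984, 1999, 2007, 2017)

# Integer-divide by 10 to drop the last digit, then rebuild the label
toy_decades <- paste0(toy_years %/% 10 * 10, "s")
print(toy_decades)
```

Inside the pipeline this would correspond to something like `mutate(decade = paste0(year %/% 10 * 10, "s"))`, assuming `year` contains no missing values.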

3.3 Deal with column Mental.Health.Issues

shoot1$Mental.Health.Issues <- if_else(shoot1$Mental.Health.Issues=="unknown","Unknown",shoot1$Mental.Health.Issues)

3.4 Deal with column Race

shoot1$Race <- if_else(str_detect(shoot1$Race,"Black American or African American"),"Black",shoot1$Race)
shoot1$Race <- if_else(str_detect(shoot1$Race,"White American or European American"),"White",shoot1$Race)
shoot1$Race <- if_else(str_detect(shoot1$Race,"Asian American"),"Asian",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "Some other race","Other",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "Two or more races","Other",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "Native American or Alaska Native","Native",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "","Other",shoot1$Race)
shoot1$Race <- str_to_upper(shoot1$Race)

3.5 Replace the abbreviations in column Gender

shoot1$Gender <- if_else(shoot1$Gender=="M","Male",shoot1$Gender)
shoot1$Gender <- if_else(shoot1$Gender=="F","Female",shoot1$Gender)
shoot1$Gender <- if_else(shoot1$Gender=="M/F","Male/Female",shoot1$Gender)

3.6 Deal with column Cause

shoot1$Cause <- if_else(shoot1$Cause=="","unknown",shoot1$Cause)

3.7 Deal with column Age

temp <- shoot1 %>%
  separate_rows(Age,sep=",") %>%
  select(Age) %>%
  mutate(Age = cut(as.integer(Age),breaks = c(10,20,30,40,50,60,70),
                   labels=c("10-20","20-30","30-40","40-50","50-60","60-70"))) %>%
  na.omit()
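To make the Age step easier to follow, here is a small sketch of the same idea on made-up values (`toy_ages` is hypothetical). Incidents with several shooters store the ages in one comma-separated string, so `separate_rows()` first produces one row per shooter and `cut()` then assigns each age to a bin. The intervals created by `cut()` are closed on the right by default, so an age of exactly 20 lands in the "10-20" bin.

```r
library(tidyr)
library(dplyr)

# Hypothetical ages: one single shooter, one pair, one missing, one edge case
toy_ages <- data.frame(Age = c("26", "17,18", NA, "20"),
                       stringsAsFactors = FALSE)

toy_binned <- toy_ages %>%
  separate_rows(Age, sep = ",") %>%                 # one row per shooter
  mutate(Age = cut(as.integer(Age),
                   breaks = seq(10, 70, by = 10),   # (10,20], (20,30], ...
                   labels = c("10-20", "20-30", "30-40",
                              "40-50", "50-60", "60-70"))) %>%
  na.omit()                                         # drop unknown ages
print(toy_binned)
```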

4. Exploratory Data Analysis

4.1 Analyze the trend by year

Victims by year

victims_year <- shoot1 %>%
  group_by(year) %>%
  summarise(total=sum(Total.victims)) %>%
  ggplot(aes(x=year,y=total))+
  geom_bar(stat = 'identity',fill='blue')+
  labs(title = "US Mass Shooting Victims from 1966 to 2017",
       x = "Year", y = "Number of Victims")
ggplotly(victims_year)

Incidents by year

incidents_year <- shoot1 %>%
  group_by(year) %>%
  count() %>%
  ggplot(aes(x=year,y=n))+
  geom_bar(stat = 'identity',fill='blue')+
  labs(title = "US Mass Shooting Incidents from 1966 to 2017")+
  xlab("Year")+
  ylab("Number of Incidents")
ggplotly(incidents_year)

Incidents by month

incidents_month <- shoot1 %>%
  group_by(month) %>%
  count() %>%
  ggplot(aes(x=factor(month),y=n))+
  geom_bar(stat = 'identity',fill='blue')+
  labs(title = "Incidents by month")+
  xlab("Month")+
  ylab("Number of Incidents")
ggplotly(incidents_month)

4.2 Analyze the distribution of the incidents

The 10 states with the most incidents

incidents_states <- shoot1 %>%
  group_by(State) %>%
  count() %>%
  arrange(desc(n)) %>%
  head(10) %>%
  ggplot(aes(x=reorder(State,n),y=n))+
  geom_bar(stat = 'identity',aes(fill = State))+
  labs(title = "The 10 states with the most incidents")+
  xlab("State")+
  ylab("Number of Incidents")+
  theme(legend.position='none')+
  coord_flip()
ggplotly(incidents_states)

Analyze incidents on a map

incid_states <- shoot1 %>%
  group_by(State) %>%
  count() 
colnames(incid_states)<-c("region","Incidents")
incid_states$region <- str_to_lower(incid_states$region)

states <- map_data("state")  

comb <- states %>%
  left_join(incid_states,by="region")

incident_map <- comb %>% 
  ggplot() + 
  geom_polygon(aes(x = long, y = lat, group = group,fill = Incidents))+ 
  geom_point(data=shoot1,aes(x=Longitude,y=Latitude,size=Total.victims),color='red',alpha=0.5)+
  xlim(-130,-65)+
  ylim(25,50)
ggplotly(incident_map)

Further analyze total victims on a map

shoot1 %>%
  leaflet() %>%
  addProviderTiles(providers$OpenStreetMap) %>%
  fitBounds(-124,30,-66,43) %>%
  addCircles(color = "blue",lng = ~Longitude,lat = ~Latitude,weight = 1,
             radius = ~sqrt(Total.victims) * 25000,popup = ~Summary)

Click to see details on the map

shoot1 %>%
  leaflet() %>%
  fitBounds(-124,30,-66,43) %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group="Dark") %>%
  addProviderTiles(providers$CartoDB.Positron, group="Light") %>%
  addTiles(group="OSM") %>%
  addLayersControl(baseGroups=c('Dark','Light','OSM')) %>%
  addMarkers(~Longitude, ~Latitude, 
             clusterOptions = markerClusterOptions(),
             popup = ~Summary,
             label = ~Location)

4.3 Analyze the characteristics of the shooters

Race of the shooter

shooters_race <- shoot1 %>%
  group_by(Race) %>%
  summarise(total=sum(Total.victims)) %>%
  ggplot(aes(x=reorder(Race,total),y=total))+
  geom_bar(stat = 'identity',aes(fill = Race))+
  labs(title = 'Race of the shooter')+
  xlab("Race")+
  ylab("Number of Victims")+
  theme(legend.position='none')
ggplotly(shooters_race)

Ratio of the shooters' race by decade

trend_race <- shoot1 %>%
  ggplot(aes(x=decade,fill=Race))+
  geom_bar(position="fill")+
  labs(title = "Ratio of the race")+
  xlab("Decade")+
  ylab("Ratio")
ggplotly(trend_race)

Gender of the shooter

shooters_gender <- shoot1 %>%
  group_by(Gender) %>%
  count() %>%
  ggplot(aes(x=reorder(Gender,n),y=n))+
  geom_bar(stat = 'identity',aes(fill = Gender))+
  labs(title = 'Gender of the shooter')+
  xlab("Gender")+
  ylab("Number")+
  coord_flip()+
  theme(legend.position='none')
ggplotly(shooters_gender)

Motive of the shooter

cause_shoot <- shoot1 %>%
  group_by(Cause) %>%
  count() %>%
  arrange(desc(n)) %>%
  ggplot(aes(x=reorder(Cause,n),y=n))+
  geom_bar(stat = 'identity',aes(fill = Cause))+
  labs(title = "Motive of the shooter")+
  xlab("Cause")+
  ylab("")+
  theme(legend.position='none')+
  coord_flip()
ggplotly(cause_shoot)

Mental Status of the shooter

shooters_mental <- shoot1 %>%
  group_by(Mental.Health.Issues) %>%
  count() %>%
  ggplot(aes(x=reorder(Mental.Health.Issues,n),y=n))+
  geom_bar(stat = 'identity',aes(fill = Mental.Health.Issues))+
  labs(title = 'Mental Status of the shooter')+
  xlab("Mental Status")+
  ylab("Number")+
  theme(legend.position='none')
ggplotly(shooters_mental)

Age of the shooter

age_shoot <- temp %>%
  group_by(Age) %>%
  count() %>%
  ggplot(aes(x=Age,y=n))+
  geom_bar(stat = 'identity',aes(fill = Age))+
  labs(title = "Age of the shooter")+
  xlab("Age")+
  ylab("Number of shooters")+
  theme(legend.position='none')
ggplotly(age_shoot)

Age Distribution

age_dis <- shoot1 %>%
  ggplot(aes(x=decade,y=as.numeric(Age),fill=decade))+
  geom_boxplot()+
  labs(title = "Age Distribution")+
  xlab("")+
  ylab("Age")+
  theme(legend.position='none')
ggplotly(age_dis)
