The United States has more mass shootings than any other country in the world, and it suffered at least 15 of them in the past decade. According to Wikipedia, 305 people died and more than 1,100 were injured in those shootings. On the night of October 1, 2017, a gunman opened fire on a crowd attending the final night of a country music festival in Las Vegas, killing 58 people and injuring more than 800. It is the deadliest mass shooting in the modern history of the United States.
Gun control has been debated for a long time, probably ever since guns were invented. However, this project is not about arguing for or against gun control or the Second Amendment. Its purpose is only to gain some statistical insight into these mass shooting incidents and to better understand the relationships between the characteristics of the shooters and the incidents, by performing spatial and statistical analysis on the mass shooting dataset from Kaggle. The dataset is also available on my GitHub.
1. Load all the required libraries
library(data.table)
library(leaflet)
library(lubridate)
library(magrittr)
library(maps)
library(plotly)
library(stringr)
library(tidyverse)
2. Load & take a glimpse of the dataset
shoot <- read.csv("C:/SPS/projects/Mass Shootings Dataset Ver 5.csv",header=TRUE,stringsAsFactors=FALSE)
head(shoot)
## S. Title Location Date
## 1 1 Texas church mass shooting Sutherland Springs, TX 11/5/2017
## 2 2 Walmart shooting in suburban Denver Thornton, CO 11/1/2017
## 3 3 Edgewood businees park shooting Edgewood, MD 10/18/2017
## 4 4 Las Vegas Strip mass shooting Las Vegas, NV 10/1/2017
## 5 5 San Francisco UPS shooting San Francisco, CA 6/14/2017
## 6 6 Pennsylvania supermarket shooting Tunkhannock, PA 6/7/2017
## Incident.Area Open.Close.Location
## 1 Church Close
## 2 Wal-Mart Open
## 3 Remodeling Store Close
## 4 Las Vegas Strip Concert outside Mandala Bay Open
## 5 UPS facility Close
## 6 Weis grocery Close
## Target Cause
## 1 random unknown
## 2 random unknown
## 3 coworkers unknown
## 4 random unknown
## 5 coworkers
## 6 coworkers terrorism
## Summary
## 1 Devin Patrick Kelley, 26, an ex-air force officer, shot and killed 26 people and wounded 20 at a church in Texas. He was found dead later in his vehicle.
## 2 Scott Allen Ostrem, 47, walked into a Walmart in a suburb north of Denver and fatally shot two men and a woman, then left the store and drove away. After an all-night manhunt, Ostrem, who had financial problems but no serious criminal history, was captured by police after being spotted near his apartment in Denver.
## 3 Radee Labeeb Prince, 37, fatally shot three people and wounded two others around 9am at Advance Granite Solutions, a home remodeling business where he worked near Baltimore. Hours later he shot and wounded a sixth person at a car dealership in Wilmington, Delaware. He was apprehended that evening following a manhunt by authorities.
## 4 Stephen Craig Paddock, opened fire from the 32nd floor of Manadalay Bay hotel at Last Vegas concert goers for no obvious reason. He shot himself and died on arrival of law enforcement agents. He was 64
## 5 Jimmy Lam, 38, fatally shot three coworkers and wounded two others inside a UPS facility in San Francisco. Lam killed himself as law enforcement officers responded to the scene.
## 6 Randy Stair, a 24-year-old worker at Weis grocery fatally shot three of his fellow employees. He reportedly fired 59 rounds with a pair of shotguns before turning the gun on himself as another co-worker fled the scene for help and law enforcement responded.
## Fatalities Injured Total.victims Policeman.Killed Age Employeed..Y.N.
## 1 26 20 46 0 26 NA
## 2 3 0 3 0 47 NA
## 3 3 3 6 0 37 NA
## 4 59 527 585 1 64 NA
## 5 3 2 5 0 38 1
## 6 3 0 3 NA 24 1
## Employed.at Mental.Health.Issues Race Gender Latitude
## 1 No White M NA
## 2 No White M NA
## 3 Advance Granite Store No Black M NA
## 4 Unclear White M 36.18127
## 5 Yes Asian M NA
## 6 Weis grocery Unclear White M NA
## Longitude
## 1 NA
## 2 NA
## 3 NA
## 4 -115.1341
## 5 NA
## 6 NA
glimpse(shoot)
## Observations: 323
## Variables: 21
## $ S. <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
## $ Title <chr> "Texas church mass shooting", "Walmart sh...
## $ Location <chr> "Sutherland Springs, TX", "Thornton, CO",...
## $ Date <chr> "11/5/2017", "11/1/2017", "10/18/2017", "...
## $ Incident.Area <chr> "Church", "Wal-Mart", "Remodeling Store",...
## $ Open.Close.Location <chr> "Close", "Open", "Close", "Open", "Close"...
## $ Target <chr> "random", "random", "coworkers", "random"...
## $ Cause <chr> "unknown", "unknown", "unknown", "unknown...
## $ Summary <chr> "Devin Patrick Kelley, 26, an ex-air forc...
## $ Fatalities <int> 26, 3, 3, 59, 3, 3, 5, 3, 3, 5, 5, 3, 5, ...
## $ Injured <int> 20, 0, 3, 527, 2, 0, 0, 0, 0, 6, 0, 3, 11...
## $ Total.victims <int> 46, 3, 6, 585, 5, 3, 5, 3, 3, 11, 5, 6, 1...
## $ Policeman.Killed <int> 0, 0, 0, 1, 0, NA, NA, 1, NA, NA, NA, 3, ...
## $ Age <chr> "26", "47", "37", "64", "38", "24", "45",...
## $ Employeed..Y.N. <int> NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA, N...
## $ Employed.at <chr> "", "", "Advance Granite Store", "", "", ...
## $ Mental.Health.Issues <chr> "No", "No", "No", "Unclear", "Yes", "Uncl...
## $ Race <chr> "White", "White", "Black", "White", "Asia...
## $ Gender <chr> "M", "M", "M", "M", "M", "M", "M", "M", "...
## $ Latitude <dbl> NA, NA, NA, 36.18127, NA, NA, NA, NA, NA,...
## $ Longitude <dbl> NA, NA, NA, -115.13413, NA, NA, NA, NA, N...
summary(shoot)
## S. Title Location Date
## Min. : 1.0 Length:323 Length:323 Length:323
## 1st Qu.: 81.5 Class :character Class :character Class :character
## Median :162.0 Mode :character Mode :character Mode :character
## Mean :162.0
## 3rd Qu.:242.5
## Max. :323.0
##
## Incident.Area Open.Close.Location Target
## Length:323 Length:323 Length:323
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Cause Summary Fatalities Injured
## Length:323 Length:323 Min. : 0.000 Min. : 0.000
## Class :character Class :character 1st Qu.: 1.000 1st Qu.: 1.000
## Mode :character Mode :character Median : 3.000 Median : 3.000
## Mean : 4.437 Mean : 6.176
## 3rd Qu.: 5.500 3rd Qu.: 5.000
## Max. :59.000 Max. :527.000
##
## Total.victims Policeman.Killed Age Employeed..Y.N.
## Min. : 3.00 Min. :0.0000 Length:323 Min. :0.0000
## 1st Qu.: 4.00 1st Qu.:0.0000 Class :character 1st Qu.:0.0000
## Median : 5.00 Median :0.0000 Mode :character Median :1.0000
## Mean : 10.26 Mean :0.1293 Mean :0.6269
## 3rd Qu.: 9.00 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :585.00 Max. :5.0000 Max. :1.0000
## NA's :6 NA's :256
## Employed.at Mental.Health.Issues Race
## Length:323 Length:323 Length:323
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Gender Latitude Longitude
## Length:323 Min. :21.33 Min. :-161.79
## Class :character 1st Qu.:33.57 1st Qu.:-110.21
## Mode :character Median :36.44 Median : -88.12
## Mean :37.23 Mean : -94.43
## 3rd Qu.:41.48 3rd Qu.: -81.70
## Max. :60.79 Max. : -69.71
## NA's :20 NA's :20
3. Reconstruct the dataset
3.1 Separate Location into two variables, City and State
shoot <- shoot %>% separate(Location,into = c("City","State"), sep = ", ",remove = FALSE)
According to the warning messages, almost 15% of the entries in the Location column are missing (NA). The locations of those incidents are not difficult to figure out, so filling them in manually seems necessary.
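Rather than relying on the warning text alone, the affected rows can be listed directly by checking which Location values lack the ", " separator. This is a minimal sketch on a made-up data frame; `toy` and its rows are hypothetical and not taken from the dataset.

```r
# Hypothetical rows standing in for the Title and Location columns
toy <- data.frame(
  Title    = c("A", "B", "C"),
  Location = c("Las Vegas, NV", "", "Washington D.C."),
  stringsAsFactors = FALSE
)

# TRUE when the value contains the "City, State" separator
has_sep <- grepl(", ", toy$Location, fixed = TRUE)

# Rows that separate() cannot split and that need a manual fix
needs_fix <- toy[!has_sep, ]

# Share of rows affected
share_missing <- mean(!has_sep)

print(needs_fix$Title)
print(share_missing)
```

Running the same check on the real column would give the exact list of incidents to fill in by hand.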
shoot1 <- read.csv("C:/SPS/projects/Mass Shootings Dataset fixed.csv",header=TRUE,stringsAsFactors=FALSE)
shoot1 <- shoot1 %>%
separate(Location,into = c("City","State"), sep = ", ", remove = FALSE)
The state abbreviations need to be replaced with the full state names.
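# Named vector: each abbreviation (matched as a whole word) maps to its own state name.
# A single "CA|CO|..." pattern with a vector of replacements would not pair them up.
state_names <- c("\\bCA\\b"="California","\\bCO\\b"="Colorado","\\bLA\\b"="Louisiana",
"\\bMD\\b"="Maryland","\\bNV\\b"="Nevada","\\bPA\\b"="Pennsylvania",
"\\bTX\\b"="Texas","\\bWA\\b"="Washington","\\bVA\\b"="Virginia")
shoot1$State <- shoot1$State %>%
str_replace_all(state_names)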
shoot1$State <- shoot1$State %>%
str_replace_all(c("Texas "="Texas"," Virginia"="Virginia"))
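As an alternative sketch, base R ships two parallel vectors, `state.abb` and `state.name`, that cover all 50 states, so the lookup does not have to be typed by hand. The helper name `abbr_to_name` is hypothetical, and values that are not a state abbreviation (for example a full state name, or D.C.) are returned unchanged.

```r
# state.abb and state.name are parallel vectors from base R's datasets package
abbr_to_name <- function(x) {
  x <- trimws(x)                 # drop stray spaces such as "Texas "
  idx <- match(x, state.abb)     # position of each abbreviation, NA if not one
  ifelse(is.na(idx), x, state.name[idx])
}

result <- abbr_to_name(c("CA", " TX", "Nevada", "VA "))
print(result)
```

Because `match()` compares whole strings, there is no risk of replacing two capital letters that happen to appear inside a longer value.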
3.2 Extract year and month from column Date
shoot1 <- shoot1 %>%
mutate(Date=mdy(shoot1$Date),year=year(Date))
shoot1 <- shoot1 %>%
mutate(month=month(Date))
shoot1 <- shoot1 %>%
mutate(decade = case_when(year >=1960 & year<1970 ~ "1960s",
year >=1970 & year<1980 ~ "1970s",
year >=1980 & year<1990 ~ "1980s",
year >=1990 & year<2000 ~ "1990s",
year >=2000 & year<2010 ~ "2000s",
year >=2010 & year<2020 ~ "2010s"))
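The decade label can also be derived arithmetically, which avoids typing each range by hand. This is only a sketch of that alternative; the function name `decade_label` is hypothetical.

```r
# Integer-divide by 10 to drop the last digit, then rebuild the label
decade_label <- function(year) {
  ifelse(is.na(year), NA_character_, paste0((year %/% 10) * 10, "s"))
}

labels <- decade_label(c(1966, 1984, 1999, 2017, NA))
print(labels)
```

Inside the pipeline this would be used as `mutate(decade = decade_label(year))`.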
3.3 Deal with column Mental.Health.Issues
shoot1$Mental.Health.Issues <- if_else(shoot1$Mental.Health.Issues=="unknown","Unknown",shoot1$Mental.Health.Issues)
3.4 Deal with column Race
shoot1$Race <- if_else(str_detect(shoot1$Race,"Black American or African American"),"Black",shoot1$Race)
shoot1$Race <- if_else(str_detect(shoot1$Race,"White American or European American"),"White",shoot1$Race)
shoot1$Race <- if_else(str_detect(shoot1$Race,"Asian American"),"Asian",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "Some other race","Other",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "Two or more races","Other",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "Native American or Alaska Native","Native",shoot1$Race)
shoot1$Race <- if_else(shoot1$Race == "","Other",shoot1$Race)
shoot1$Race <- str_to_upper(shoot1$Race)
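The same recoding can be expressed as one lookup table, which keeps every category mapping in a single place. This is a simplified sketch: `race_map` and `clean_race` are hypothetical names, and unlike the `str_detect()` calls above it only handles exact matches, so combined values would need their own entries.

```r
# Lookup table: original category on the left, short label on the right
race_map <- c(
  "Black American or African American"  = "Black",
  "White American or European American" = "White",
  "Asian American"                      = "Asian",
  "Some other race"                     = "Other",
  "Two or more races"                   = "Other",
  "Native American or Alaska Native"    = "Native"
)

clean_race <- function(x) {
  out <- x
  hit <- x %in% names(race_map)              # values found in the table
  out[hit] <- unname(race_map[x[hit]])       # swap in the short label
  out[!is.na(out) & out == ""] <- "Other"    # blanks become "Other"
  toupper(out)                               # unify case, e.g. "white" and "White"
}

cleaned <- clean_race(c("White American or European American", "black",
                        "", "Two or more races"))
print(cleaned)
```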
3.5 Replace the abbreviations in column Gender
shoot1$Gender <- if_else(shoot1$Gender=="M","Male",shoot1$Gender)
shoot1$Gender <- if_else(shoot1$Gender=="F","Female",shoot1$Gender)
shoot1$Gender <- if_else(shoot1$Gender=="M/F","Male/Female",shoot1$Gender)
3.6 Deal with column Cause
shoot1$Cause <- if_else(shoot1$Cause=="","unknown",shoot1$Cause)
3.7 Deal with column Age
temp <- shoot1 %>%
separate_rows(Age,sep=",") %>%
select(Age) %>%
mutate(Age = cut(as.integer(Age),breaks = c(10,20,30,40,50,60,70),
labels=c("10-20","20-30","30-40","40-50","50-60","60-70"))) %>%
na.omit()
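One detail of `cut()` worth keeping in mind is that its intervals are closed on the right by default, so a 20-year-old falls into "10-20" and a 30-year-old into "20-30". The sketch below uses made-up ages to compare the default with `right = FALSE`.

```r
ages   <- c(20, 21, 30, 70, 10)
breaks <- c(10, 20, 30, 40, 50, 60, 70)
labs6  <- c("10-20", "20-30", "30-40", "40-50", "50-60", "60-70")

# Default: intervals are (10,20], (20,30], ... so 10 itself is left out
right_closed <- cut(ages, breaks = breaks, labels = labs6)

# right = FALSE: intervals are [10,20), [20,30), ... so 70 itself is left out
left_closed <- cut(ages, breaks = breaks, labels = labs6, right = FALSE)

print(as.character(right_closed))
print(as.character(left_closed))
```

Either convention is fine as long as the labels are read accordingly; `include.lowest = TRUE` can be added to keep the value sitting on the outermost break.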
4. Exploratory Data Analysis
4.1 Analyze the trend by year
Victims by year
victims_year <- shoot1 %>%
group_by(year) %>%
summarise(total=sum(Total.victims)) %>%
ggplot(aes(x=year,y=total))+
geom_bar(stat = 'identity',fill='blue')+
labs(title = "US Mass Shooting Victims from 1966 to 2017",
x = "Year", y = "Number of Victims")
ggplotly(victims_year)
Incidents by year
incidents_year <- shoot1 %>%
group_by(year) %>%
count() %>%
ggplot(aes(x=year,y=n))+
geom_bar(stat = 'identity',fill='blue')+
labs(title = "US Mass Shooting Incidents from 1966 to 2017")+
xlab("Year")+
ylab("Number of Incidents")
ggplotly(incidents_year)
Incidents by month
incidents_month <- shoot1 %>%
group_by(month) %>%
count() %>%
ggplot(aes(x=factor(month),y=n))+
geom_bar(stat = 'identity',fill='blue')+
labs(title = "Incidents by month")+
xlab("Month")+
ylab("Number of Incidents")
ggplotly(incidents_month)
4.2 Analyze the distribution of the incidents
The 10 states with the most incidents
incidents_states <- shoot1 %>%
group_by(State) %>%
count() %>%
arrange(desc(n)) %>%
head(10) %>%
ggplot(aes(x=reorder(State,n),y=n))+
geom_bar(stat = 'identity',aes(fill = State))+
labs(title = "The 10 states with the most incidents")+
xlab("State")+
ylab("Number of Incidents")+
theme(legend.position='none')+
coord_flip()
ggplotly(incidents_states)
Analyze incidents on a map
incid_states <- shoot1 %>%
group_by(State) %>%
count()
colnames(incid_states)<-c("region","Incidents")
incid_states$region <- str_to_lower(incid_states$region)
states <- map_data("state")
comb <- states %>%
left_join(incid_states,by="region")
incident_map <- comb %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group,fill = Incidents))+
geom_point(data=shoot1,aes(x=Longitude,y=Latitude,size=Total.victims),color='red',alpha=0.5)+
xlim(-130,-65)+
ylim(25,50)
ggplotly(incident_map)
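A left join on region names fails silently when the names do not line up: map regions without a match get NA as their fill value, and incident states without a polygon never appear in the choropleth. As far as I know, `map_data("state")` covers only the contiguous states, so Alaska and Hawaii are likely candidates. The sketch below shows the check with made-up region vectors; in the analysis they would be `incid_states$region` and `unique(states$region)`.

```r
# Hypothetical region names, lower-cased as in the join above
incident_regions <- c("california", "texas", "alaska", "hawaii")
map_regions      <- c("california", "texas", "nevada")

# Incident states with no polygon: they are missing from the choropleth
not_on_map <- setdiff(incident_regions, map_regions)

# Map regions with no incidents: their fill value will be NA
no_incidents <- setdiff(map_regions, incident_regions)

print(not_on_map)
print(no_incidents)
```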
Further analyze total victims on a map
shoot1 %>%
leaflet() %>%
addProviderTiles(providers$OpenStreetMap) %>%
fitBounds(-124,30,-66,43) %>%
addCircles(color = "blue",lng = ~Longitude,lat = ~Latitude,weight = 1,
radius = ~sqrt(Total.victims) * 25000,popup = ~Summary)
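Taking the square root of the victim count before scaling the radius makes the area of each circle, rather than its radius, proportional to the number of victims. A short check with made-up counts illustrates this; the factor 25000 is the one used in the map above.

```r
victims  <- c(4, 16, 64)              # made-up victim counts
radius_m <- sqrt(victims) * 25000     # same formula as in addCircles()
area_m2  <- pi * radius_m^2           # circle area in square metres

# Area grows in step with the victim count: 4x the victims, 4x the area
area_ratio <- area_m2 / area_m2[1]
print(radius_m)
print(area_ratio)
```

Without the square root, an incident with 16 times the victims would be drawn with 256 times the area, which would exaggerate the largest incidents.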
Click the markers on the map to see details
shoot1 %>%
leaflet() %>%
fitBounds(-124,30,-66,43) %>%
addProviderTiles(providers$CartoDB.DarkMatter, group="Dark") %>%
addProviderTiles(providers$CartoDB.Positron, group="Light") %>%
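# Give the default OpenStreetMap tiles their own group so they do not cover the other base layers
addTiles(group="OSM") %>%
addLayersControl(baseGroups=c('Dark','Light','OSM')) %>%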
addMarkers(~Longitude, ~Latitude,
clusterOptions = markerClusterOptions(),
popup = ~Summary,
label = ~Location)
4.3 Analyze the characteristics of the shooters
Race of the shooter
shooters_race <- shoot1 %>%
group_by(Race) %>%
summarise(total=sum(Total.victims)) %>%
ggplot(aes(x=reorder(Race,total),y=total))+
geom_bar(stat = 'identity',aes(fill = Race))+
labs(title = 'Race of the shooter')+
xlab("Race")+
ylab("Number of Victims")+
theme(legend.position='none')
ggplotly(shooters_race)
Share of shooters by race in each decade
trend_race <- shoot1 %>%
group_by(decade) %>%
ggplot(aes(x=decade,fill=Race))+
geom_bar(position="fill")+
labs(title = "Ratio of the race")+
xlab("Decade")+
ylab("Ratio")
ggplotly(trend_race)
Gender of the shooter
shooters_gender <- shoot1 %>%
group_by(Gender) %>%
count() %>%
ggplot(aes(x=reorder(Gender,n),y=n))+
geom_bar(stat = 'identity',aes(fill = Gender))+
labs(title = 'Gender of the shooter')+
xlab("Gender")+
ylab("Number")+
coord_flip()+
theme(legend.position='none')
ggplotly(shooters_gender)
Motive of the shooter
cause_shoot <- shoot1 %>%
group_by(Cause) %>%
count() %>%
arrange(desc(n)) %>%
ggplot(aes(x=reorder(Cause,n),y=n))+
geom_bar(stat = 'identity',aes(fill = Cause))+
labs(title = "Motive of the shooter")+
xlab("Cause")+
ylab("")+
theme(legend.position='none')+
coord_flip()
ggplotly(cause_shoot)
Mental Status of the shooter
shooters_mental <- shoot1 %>%
group_by(Mental.Health.Issues) %>%
count() %>%
ggplot(aes(x=reorder(Mental.Health.Issues,n),y=n))+
geom_bar(stat = 'identity',aes(fill = Mental.Health.Issues))+
labs(title = 'Mental Status of the shooter')+
xlab("Mental Status")+
ylab("Number")+
theme(legend.position='none')
ggplotly(shooters_mental)
Age of the shooter
age_shoot <- temp %>%
group_by(Age) %>%
count() %>%
ggplot(aes(x=Age,y=n))+
geom_bar(stat = 'identity',aes(fill = Age))+
labs(title = "Age of the shooter")+
xlab("Age")+
ylab("Number of shooters")+
theme(legend.position='none')
ggplotly(age_shoot)
Age Distribution
age_dis <- shoot1 %>%
ggplot(aes(x=decade,y=as.numeric(Age),fill=decade))+
geom_boxplot()+
labs(title = "Age Distribution")+
xlab("")+
ylab("Age")+
theme(legend.position='none')
ggplotly(age_dis)