U.S Tornadoes

Author

jabir

Data

This data-set is about the tornadoes that effected the U.S from 1950-2022 showing their magnitudes and other useful information

Data Dictionary

variable class description
om integer Tornado number. Effectively an ID for this tornado in this year.
yr integer Year, 1950-2022.
mo integer Month, 1-12.
dy integer Day of the month, 1-31.
date date Date.
time time Time.
tz character Canonical tz database timezone.
datetime_utc datetime Date and time normalized to UTC.
st character Two-letter postal abbreviation for the state (DC = Washington, DC; PR = Puerto Rico; VI = Virgin Islands).
stf integer State FIPS (Federal Information Processing Standards) number.
mag integer Magnitude on the F scale (EF beginning in 2007). Some of these values are estimated (see fc).
inj integer Number of injuries. When summing for state totals, use sn == 1 (see below).
fat integer Number of fatalities. When summing for state totals, use sn == 1 (see below).
loss double Estimated property loss information in dollars. Prior to 1996, values were grouped into ranges. The reported number for such years is the maximum of its range.
slat double Starting latitude in decimal degrees.
slon double Starting longitude in decimal degrees.
elat double Ending latitude in decimal degrees.
elon double Ending longitude in decimal degrees.
len double Length in miles.
wid double Width in yards.
ns integer Number of states affected by this tornado. 1, 2, or 3.
sn integer State number for this row. 1 means the row contains the entire track information for this state, 0 means there is at least one more entry for this state for this tornado (om + yr).
f1 integer FIPS code for the 1st county.
f2 integer FIPS code for the 2nd county.
f3 integer FIPS code for the 3rd county.
f4 integer FIPS code for the 4th county.
fc logical Was the mag column estimated?

Packages

packages(“ggplot2”)

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(maps)

Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map
library(mapproj)
library(usmap)
library(readr)
library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(RColorBrewer)
library(esquisse)

Explorations

What are the states most at risk?

Texas is most at risk for tornadoes and some secondary states like florida.

#reading in the data
tornado <- read_csv("tornado.csv")
Rows: 68693 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (2): tz, st
dbl  (21): om, yr, mo, dy, stf, mag, inj, fat, loss, slat, slon, elat, elon,...
lgl   (1): fc
dttm  (1): datetime_utc
date  (1): date
time  (1): time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
us_states <- map_data("state")

tornfixed <- tornado |>
  select(st,ns) |>
  mutate(region = recode(st,
  "AL" = "alabama",
  "AK" = "alaska",
  "AZ" = "arizona",
  "AR" = "arkansas",
  "CA" = "california",
  "CO" = "colorado",
  "CT" = "connecticut",
  "DE" = "delaware",
  "FL" = "florida",
  "GA" = "georgia",
  "HI" = "hawaii",
  "ID" = "idaho",
  "IL" = "illinois",
  "IN" = "indiana",
  "IA" = "iowa",
  "KS" = "kansas",
  "KY" = "kentucky",
  "LA" = "louisiana",
  "ME" = "maine",
  "MD" = "maryland",
  "MA" = "massachusetts",
  "MI" = "michigan",
  "MN" = "minnesota",
  "MS" = "mississippi",
  "MO" = "missouri",
  "MT" = "montana",
  "NE" = "nebraska",
  "NV" = "nevada",
  "NH" = "new hampshire",
  "NJ" = "new jersey",
  "NM" = "new mexico",
  "NY" = "new york",
  "NC" = "north carolina",
  "ND" = "north dakota",
  "OH" = "ohio",
  "OK" = "oklahoma",
  "OR" = "oregon",
  "PA" = "pennsylvania",
  "RI" = "rhode island",
  "SC" = "south carolina",
  "SD" = "south dakota",
  "TN" = "tennessee",
  "TX" = "texas",
  "UT" = "utah",
  "VT" = "vermont",
  "VA" = "virginia",
  "WA" = "washington",
  "WV" = "west virginia",
  "WI" = "wisconsin",
  "WY" = "wyoming",
  "DC" = "district of columbia"
)) |>
  select(region,ns)

torn_totals <- tornfixed |>
  group_by(region) |>
  summarize(n_tornadoes = sum(ns))


map <- left_join(us_states,torn_totals)
Joining with `by = join_by(region)`
map |>
  ggplot(aes(long, lat, group = subregion)) +
  geom_map(
    aes(map_id = region),
    map = us_states
  ) +
  geom_polygon(aes(group = group, fill = n_tornadoes), color = "black") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red") +
  coord_quickmap()

What state usually have the most tornadoes?

It looks like Texas has had the most tornadoes over the years

most_tornadoes <- tornado |>
  select(st,ns) |>
  group_by(st) |>
  summarise(n = n()) |>
  arrange(desc(n)) |>
  top_n(10)
Selecting by n
most_tornadoes |>
  ggplot(aes(x = reorder(st, -n), y = n)) +
  geom_bar(stat = "identity", fill = "#d16002") +
  labs(
    title = "States with the most tornadoes over the years",
    x = "states in abbreviations",
    y = "number of tornadoes"
  ) +
  theme_dark()

But i want to see which state has had the most tornadoes in the past 5 years?

Texas is in the top 10 in 3 of the past 5 years Texas has the most in 2019,2022 and 2020.

past5 <- tornado |>
  select(st,ns,yr) |>
  filter(yr %in% c("2018","2019","2020","2021","2022"), st %in% c("TX","OK","IA","MS","AL")) |> 
  group_by(st,yr) |>
  summarise(n = n()) |>
  arrange(desc(n))
`summarise()` has grouped output by 'st'. You can override using the `.groups`
argument.
past5 |>
  ggplot(aes(x = n, y = yr, color = st)) +
  geom_line() +
  labs(
    title = "Most Tornadoes In The Past 5 Years",
    subtitle = "Showing the Unpredictability of Tornadoes",
    x = "Number of Tornadoes",
    y = "year"
  ) +
  theme_classic()

past5 |>
  ggplot(aes(x = st, y = n, color = yr)) +
  geom_point()

past5 |>
  ggplot(aes(x = st,y = n, fill = yr)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Tornadoes over The Past 5 Years",
    subtitle = "showing the impact more clearly",
    x = "states",
    y = "Number of Tornadoes"
  )

What was the most dangerous tornado in the U.S between 1950-2022?

The tornado that was the most dangerous from 1950 to 2022 was the tornado in 2011 that happened in Missouri there were 158 fatalities the tornado was called Joplin.

tornado |>
  select(st,mag,yr,fat) |>
  filter(mag %in% "5",) |>
  group_by(fat) |>
  arrange(desc(fat))
# A tibble: 59 × 4
# Groups:   fat [37]
   st      mag    yr   fat
   <chr> <dbl> <dbl> <dbl>
 1 MO        5  2011   158
 2 MI        5  1953   116
 3 TX        5  1953   114
 4 OK        5  1955    80
 5 AL        5  2011    72
 6 MS        5  1966    58
 7 LA        5  1971    47
 8 KS        5  1957    44
 9 MS        5  1953    38
10 OH        5  1974    36
# ℹ 49 more rows