Data Wrangling and Exploring Data with R
Overview
Exploratory Data Analysis
(Live)
Use R Projects because:
Variable names can’t include spaces and must start with a letter.
[1] 5
[1] 4
[1] 9
[1] 82 89 92 75 74 99
Tip
c() is a function that combines (‘concatenates’) values into a vector. More on vectors in a bit!
Functions are pre-defined code that accomplish one specific task. A function has two components: (1) the name of the function; and (2) the input or ‘arguments’. The value returned is called the ‘output.’ Running or executing a function is called ‘calling’ a function.
[1] NA
[1] 10.33333
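As a rough illustration of calling a function, the sketch below passes a vector and then an extra argument to mean(). The values in scores are made up for this example and are not necessarily the ones behind the output above; na.rm is a standard argument of mean() that drops missing values before averaging.

```r
# Hypothetical values for illustration; NA marks a missing observation
scores <- c(12, 9, NA, 10)

# Function name: mean; argument: scores; output: a single number
mean(scores)                 # NA, because one element is missing

# A second argument changes the behavior: drop NAs before averaging
mean(scores, na.rm = TRUE)   # 10.33333
```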
Types
Factors are used for categorical variables, including ordered (‘ordinal’) ones.
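A minimal sketch of the difference between an unordered and an ordered factor, using base R only. The sizes vector is a hypothetical example, not data from the slides.

```r
# Hypothetical categorical responses
sizes <- c("small", "large", "medium", "small")

# Unordered factor: levels default to alphabetical order
sizes_fct <- factor(sizes)
levels(sizes_fct)

# Ordered factor: the levels carry a ranking, so comparisons are meaningful
sizes_ord <- factor(
  sizes,
  levels = c("small", "medium", "large"),
  ordered = TRUE
)
sizes_ord < "large"   # TRUE FALSE TRUE TRUE
```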
A vector is a one dimensional collection of elements of the same class.
[1] 94 83 79 55 65
[1] "apples" "carrots" "ice cream" "hot sauce"
[1] TRUE TRUE FALSE FALSE
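The three vectors printed above could be created along these lines. The variable names are assumptions chosen for this sketch; class() reports the single class shared by every element.

```r
# One vector per basic class; names are hypothetical
test_scores <- c(94, 83, 79, 55, 65)                          # numeric
foods <- c("apples", "carrots", "ice cream", "hot sauce")     # character
answers <- c(TRUE, TRUE, FALSE, FALSE)                        # logical

class(test_scores)   # "numeric"
class(foods)         # "character"
class(answers)       # "logical"
length(foods)        # 4
```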
Important
Be wary of R’s implicit coercion
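To make the warning about implicit coercion concrete, here is a small base R sketch. Because a vector can hold only one class, R silently converts mixed inputs to the most flexible class present. The values are illustrative only.

```r
# Mixing classes: everything becomes character
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"

# Logical values become 1/0 when combined with numbers
num_and_lgl <- c(1.5, TRUE, FALSE)
num_and_lgl    # 1.5 1.0 0.0

# This is also why summing a logical vector counts the TRUEs
sum(c(TRUE, TRUE, FALSE))   # 2

# Explicit conversion that fails yields NA (R also emits a warning,
# suppressed here so the script runs quietly)
converted <- suppressWarnings(as.numeric(c("1", "2", "three")))
converted      # 1 2 NA
```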
Elements can be accessed using ‘subscripts’ or ‘indices’, which are specified using brackets:
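A short sketch of bracket indexing on a vector, using base R. The vector name is an assumption; note that R indices start at 1, and a negative index drops an element rather than counting from the end.

```r
test_scores <- c(94, 83, 79, 55, 65)

test_scores[1]                  # first element: 94
test_scores[2:3]                # second and third: 83 79
test_scores[c(1, 5)]            # first and fifth: 94 65
test_scores[-1]                 # everything except the first: 83 79 55 65
test_scores[test_scores > 80]   # logical index: 94 83
```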
A matrix is a two-dimensional collection of elements of the same type.
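As an illustration, a matrix can be built from a vector with matrix(). This is a minimal base R sketch with made-up values; by default the values fill the matrix column by column.

```r
# 2 rows x 3 columns, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
m

dim(m)     # 2 3
m[2, 3]    # row 2, column 3: 6
m[, 1]     # the whole first column: 1 2
```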
A data frame is a table made of equal length vectors.
x y z
1 1 5 a
2 2 6 b
3 3 7 c
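A data frame like the one printed above could be built with data.frame(), where each argument is a vector that becomes a column. The name df is an assumption for this sketch.

```r
# Three equal-length vectors become three columns
df <- data.frame(
  x = 1:3,
  y = 5:7,
  z = c("a", "b", "c")
)
df

nrow(df)   # 3
ncol(df)   # 3
str(df)    # compact summary of each column's class and values
```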
Use $ to access columns (vectors) within a data frame.
Subscripts/indices for data frames are pairs specifying the row and column numbers.
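The two access styles described above can be sketched as follows, using a small example data frame (df is a hypothetical name). Leaving the row or column position empty selects all rows or all columns.

```r
df <- data.frame(
  x = 1:3,
  y = 5:7,
  z = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

df$y        # the y column as a vector: 5 6 7
df[2, 3]    # row 2, column 3: "b"
df[2, ]     # all columns of row 2
df[, "x"]   # all rows of column x: 1 2 3
```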
Category | Operator | Operation | Example |
---|---|---|---|
Arithmetic | + | Addition | x + y |
Arithmetic | - | Subtraction | x - y |
Arithmetic | * | Multiplication | x * y |
Arithmetic | / | Division | x / y |
Arithmetic | ^ | Exponent | x ^ y |
Arithmetic | %% | Modulus (Remainder from division) | x %% y |
Comparison | == | Equal | x == y |
Comparison | != | Not equal | x != y |
Comparison | > | Greater than | x > y |
Comparison | < | Less than | x < y |
Comparison | >= | Greater than or equal to | x >= y |
Comparison | <= | Less than or equal to | x <= y |
Logical | & | AND | x & y |
Logical | \| | OR | x \| y |
Logical | ! | NOT | !(x > y) |
Logical | %in% | IN | x %in% y |
Sequence | : | Sequence | 1:10 |
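A few of the operators from the table, tried on two made-up scalar values in base R:

```r
x <- 7
y <- 2

x %% y               # remainder of 7 / 2: 1
x ^ y                # 49
x != y               # TRUE
(x > 5) & (y > 5)    # both must be TRUE: FALSE
(x > 5) | (y > 5)    # at least one is TRUE: TRUE
!(x > y)             # FALSE
x %in% 1:10          # is 7 among 1, 2, ..., 10? TRUE
```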
Many of R’s operations are ‘vectorized’, meaning a given operation will operate on each element of a vector without explicit specification.
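For example, arithmetic and comparison operators apply to every element of a vector without a loop. This sketch reuses the test-score values printed earlier; the variable name is an assumption.

```r
test_scores <- c(82, 89, 92, 75, 74, 99)

# Arithmetic is applied element by element
curved <- test_scores + 5
curved                   # 87 94 97 80 79 104

# So are comparisons, which return a logical vector
passed <- test_scores >= 80
passed                   # TRUE TRUE TRUE FALSE FALSE TRUE

# Summing a logical vector counts the TRUE values
sum(passed)              # 4
```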
Packages extend R’s functionality beyond the functions available in the ‘base’ version.
Before you can use the functions from a package, the package must first be installed.
After a package has been installed, it can then be loaded into your session.
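The install-then-load workflow might look like the sketch below. The install.packages() call is left commented out because it needs internet access and only has to run once per machine; the choice of tidyverse as the example package is an assumption.

```r
# Install once per machine (needs internet access)
# install.packages("tidyverse")

# Check whether the package is installed, without attaching it
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

# Load it at the start of each session
if (has_tidyverse) {
  library(tidyverse)
} else {
  message("tidyverse is not installed; run install.packages(\"tidyverse\") first")
}
```

Installing happens once, while library() has to be called again in every new R session.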
The Comprehensive R Archive Network (CRAN) serves as a repository for most packages (more than 21,000 as of August 2024).
“The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.”
Rows: 344 Columns: 8
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
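library(RSocrata)
# Read a dataset from the DataSF open data portal via its Socrata endpoint
crashes <- read.socrata("https://data.sfgov.org/resource/dau3-4s8f.csv")
# Alternative without RSocrata, using readr:
# crashes <- read_csv("https://data.sfgov.org/resource/dau3-4s8f.csv")
# Print one line per column with its type and first few values
glimpse(crashes)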
Rows: 287
Columns: 27
$ unique_id <int> 269, 286, 268, 291, 271, 263, 302, 116, 267, 262…
$ case_id_fkey <chr> "200252819", "200592740", "", "200704616", "2002…
$ latitude <dbl> 37.75243, 37.74968, 37.76478, 37.76578, 37.75243…
$ longitude <dbl> -122.3946, -122.3973, -122.4241, -122.4075, -122…
$ collision_year <int> 2020, 2020, 2020, 2020, 2020, 2020, 2021, 2016, …
$ death_date <dttm> 2020-04-21, 2020-10-02, 2020-03-17, 2020-12-07,…
$ death_time <chr> "19:35:00", "04:50:00", "21:44:00", "", "21:45:0…
$ death_datetime <dttm> 2020-04-21 19:35:00, 2020-10-02 04:50:00, 2020-…
$ collision_date <dttm> 2020-04-21, 2020-10-02, 2020-03-17, 2020-11-21,…
$ collision_time <chr> "17:30:00", "", "20:15:00", "19:31:00", "17:30:0…
$ collision_datetime <dttm> 2020-04-21 17:30:00, 2020-10-02 00:00:00, 2020-…
$ location <chr> "Dakota Street and 25th Street", "Cesar Chavez S…
$ age <int> 28, 55, 36, NA, 32, 49, 27, 72, 45, 67, 39, 66, …
$ sex <chr> "Female", "Male", "Male", "Male", "Female", "Mal…
$ deceased <chr> "Driver", "Pedestrian", "Motorcyclist", "Driver"…
$ collision_type <chr> "Motor Vehicle Collision", "Pedestrian vs Motor …
$ street_type <chr> "City Street", "City Street", "City Street", "Ci…
$ on_vz_hin_2017 <chr> "false", "true", "true", "true", "false", "true"…
$ in_coc_2018 <chr> "true", "false", "false", "true", "true", "true"…
$ publish <chr> "true", "true", "true", "true", "true", "true", …
$ on_vz_hin_2022 <chr> "true", "true", "true", "true", "false", "true",…
$ in_epa_2021 <chr> "false", "false", "false", "false", "false", "fa…
$ point <chr> "POINT (-122.394586978 37.752426801)", "POINT (-…
$ data_loaded_at <dttm> 2024-07-01 16:48:29, 2024-07-01 16:48:29, 2024-…
$ analysis_neighborhood <chr> "Bayview Hunters Point", "Bayview Hunters Point"…
$ supervisor_district <int> 10, 10, 8, 9, 10, 9, 7, 11, 6, 3, 9, 5, 5, 5, 7,…
$ police_district <chr> "BAYVIEW", "BAYVIEW", "MISSION", "MISSION", "BAY…
# A tibble: 344 × 4
species island sex body_mass_g
<fct> <fct> <fct> <int>
1 Adelie Torgersen male 3750
2 Adelie Torgersen female 3800
3 Adelie Torgersen female 3250
# … with 341 more rows
# A tibble: 344 × 4
species island bill_length_mm bill_depth_mm
<fct> <fct> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7
2 Adelie Torgersen 39.5 17.4
3 Adelie Torgersen 40.3 18
# … with 341 more rows
# A tibble: 344 × 4
bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<dbl> <dbl> <int> <int>
1 39.1 18.7 181 3750
2 39.5 17.4 186 3800
3 40.3 18 195 3250
# … with 341 more rows
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Dream 32.1 15.5 188 3050 fema… 2009
2 Adelie Dream 33.1 16.1 178 2900 fema… 2008
3 Adelie Torgersen 33.5 19 190 3600 fema… 2008
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_le…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Gentoo Biscoe 59.6 17 230 6050 male 2007
2 Chinstrap Dream 58 17.8 181 3700 fema… 2007
3 Gentoo Biscoe 55.9 17 228 5600 male 2009
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² Sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 344 × 8
genus isle bill_length_mm bill_depth_mm flipper_le…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 3 × 1
sex
<fct>
1 male
2 female
3 <NA>
# A tibble: 3 × 1
island
<fct>
1 Torgersen
2 Biscoe
3 Dream
# A tibble: 5 × 2
island species
<fct> <fct>
1 Torgersen Adelie
2 Biscoe Adelie
3 Dream Adelie
# … with 2 more rows
# A tibble: 165 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
2 Adelie Torgersen 40.3 18 195 3250 fema… 2007
3 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
# … with 162 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 81 × 8
species island bill_length_mm bill_depth_mm flipper_leng…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Gentoo Biscoe 50 16.3 230 5700 male 2007
2 Gentoo Biscoe 50 15.2 218 5700 male 2007
3 Gentoo Biscoe 47.6 14.5 215 5400 male 2007
# … with 78 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 22 × 8
species island bill_length_mm bill_depth_mm flipper_leng…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Gentoo Biscoe 45.4 14.6 211 4800 fema… 2007
2 Gentoo Biscoe 46.2 14.5 209 4800 fema… 2007
3 Gentoo Biscoe 45.1 14.5 215 5000 fema… 2007
# … with 19 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 2 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen NA NA NA NA <NA> 2007
2 Gentoo Biscoe NA NA NA NA <NA> 2009
# … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
# A tibble: 342 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 339 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 292 × 8
species island bill_length_mm bill_depth_mm flipper_leng…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Biscoe 37.8 18.3 174 3400 fema… 2007
2 Adelie Biscoe 37.7 18.7 180 3600 male 2007
3 Adelie Biscoe 35.9 19.2 189 3800 fema… 2007
# … with 289 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 292 × 8
species island bill_length_mm bill_depth_mm flipper_leng…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Biscoe 37.8 18.3 174 3400 fema… 2007
2 Adelie Biscoe 37.7 18.7 180 3600 male 2007
3 Adelie Biscoe 35.9 19.2 189 3800 fema… 2007
# … with 289 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 52 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 49 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 52 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 49 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 276 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 273 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 344 × 9
species island bill_length_mm bill_de…¹ flipp…² body_…³ sex year body_…⁴
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007 8.27
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007 8.38
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007 7.16
# … with 341 more rows, and abbreviated variable names ¹bill_depth_mm,
# ²flipper_length_mm, ³body_mass_g, ⁴body_mass_lb
usa_penguins <- mutate(
  penguins,
  body_mass_lb = body_mass_g / 453.6,
  flipper_length_in = flipper_length_mm / 25.4
)
select(usa_penguins, species, body_mass_lb, flipper_length_in)
# A tibble: 344 × 3
species body_mass_lb flipper_length_in
<fct> <dbl> <dbl>
1 Adelie 8.27 7.13
2 Adelie 8.38 7.32
3 Adelie 7.16 7.68
# … with 341 more rows
For a two-way condition, use if_else(). Otherwise, use case_when().
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <dbl> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
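A minimal sketch contrasting if_else() with case_when() inside mutate(), assuming dplyr is installed. The birds table and the size thresholds are hypothetical stand-ins, not the code that produced the output above.

```r
library(dplyr)

# Hypothetical miniature data set standing in for penguins
birds <- tibble(
  island = c("Biscoe", "Dream", "Torgersen"),
  body_mass_g = c(5000, 3700, 3750)
)

birds <- mutate(
  birds,
  # Two outcomes: if_else(condition, value_if_true, value_if_false)
  is_heavy = if_else(body_mass_g > 4000, "yes", "no"),
  # More than two outcomes: conditions are checked in order,
  # and TRUE acts as the catch-all for anything left over
  size = case_when(
    body_mass_g >= 5000 ~ "heavy",
    body_mass_g >= 3725 ~ "medium",
    TRUE ~ "light"
  )
)
birds
```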
new_measurements <- mutate(
  penguins,
  new_body_mass_g = case_when(
    island == "Biscoe" ~ body_mass_g - 50,
    island == "Dream" ~ body_mass_g - 75,
    island == "Torgersen" ~ body_mass_g - 100
  )
)
select(new_measurements, island, body_mass_g, new_body_mass_g)
# A tibble: 344 × 3
island body_mass_g new_body_mass_g
<fct> <int> <dbl>
1 Torgersen 3750 3650
2 Torgersen 3800 3700
3 Torgersen 3250 3150
# … with 341 more rows
penguins_with_ids <- mutate(penguins, id = paste(island, species, sex, year, sep = "-"))
select(penguins_with_ids, island, species, sex, year, id)
# A tibble: 344 × 5
island species sex year id
<fct> <fct> <fct> <int> <chr>
1 Torgersen Adelie male 2007 Torgersen-Adelie-male-2007
2 Torgersen Adelie female 2007 Torgersen-Adelie-female-2007
3 Torgersen Adelie female 2007 Torgersen-Adelie-female-2007
# … with 341 more rows
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <chr> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 m 2007
2 Adelie Torgersen 39.5 17.4 186 3800 f 2007
3 Adelie Torgersen 40.3 18 195 3250 f 2007
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <chr> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 Male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 Fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 Fema… 2007
# … with 341 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
# A tibble: 3 × 2
sex n
<fct> <int>
1 female 165
2 male 168
3 <NA> 11
# A tibble: 3 × 2
species n
<fct> <int>
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
# A tibble: 8 × 3
sex species n
<fct> <fct> <int>
1 female Adelie 73
2 male Adelie 73
3 male Gentoo 61
4 female Gentoo 58
5 female Chinstrap 34
6 male Chinstrap 34
7 <NA> Adelie 6
8 <NA> Gentoo 5
# A tibble: 3 × 2
island n_island_dwellers
<fct> <int>
1 Biscoe 168
2 Dream 124
3 Torgersen 52
# A tibble: 2 × 2
`island == "Biscoe"` n
<lgl> <int>
1 FALSE 176
2 TRUE 168
# A tibble: 3 × 2
`body_mass_g < 3000` n
<lgl> <int>
1 FALSE 333
2 TRUE 9
3 NA 2
# A tibble: 1 × 1
mean_flipper_length
<dbl>
1 NA
summarize(
  penguins,
  mean_flipper_length = mean(flipper_length_mm, na.rm = TRUE),
  mean_body_mass = mean(body_mass_g, na.rm = TRUE),
  mean_bill_length = mean(bill_length_mm, na.rm = TRUE)
)
# A tibble: 1 × 3
mean_flipper_length mean_body_mass mean_bill_length
<dbl> <dbl> <dbl>
1 201. 4202. 43.9
penguins_grouped_by_sex <- group_by(penguins, sex)
summarize(penguins_grouped_by_sex, mean_body_mass = mean(body_mass_g, na.rm = TRUE))
# A tibble: 3 × 2
sex mean_body_mass
<fct> <dbl>
1 female 3862.
2 male 4546.
3 <NA> 4006.
penguins_grouped_by_sex_and_species <- group_by(penguins, sex, species)
summarize(penguins_grouped_by_sex_and_species, mean_body_mass = mean(body_mass_g, na.rm = TRUE))
# A tibble: 8 × 3
# Groups: sex [3]
sex species mean_body_mass
<fct> <fct> <dbl>
1 female Adelie 3369.
2 female Chinstrap 3527.
3 female Gentoo 4680.
4 male Adelie 4043.
5 male Chinstrap 3939.
6 male Gentoo 5485.
7 <NA> Adelie 3540
8 <NA> Gentoo 4588.
# A tibble: 3 × 2
sex mean_body_mass
<fct> <dbl>
1 male 4546.
2 female 3862.
3 <NA> 4006.
We typically want to run numerous operations on a data frame, and saving the intermediate outputs as separate variables is tedious. The ‘pipe’ operator (%>% or |>) passes the output from one function directly into the next as its first argument.
Keyboard shortcut in RStudio: Windows/Linux: Ctrl + Shift + M; Mac: Cmd + Shift + M
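To see what the pipe buys you, compare a nested call with the piped version. This sketch uses the native pipe |>, which requires R 4.1 or later and no packages; %>% behaves the same way here but comes from magrittr (loaded with dplyr or the tidyverse). The test scores are the values printed earlier.

```r
test_scores <- c(82, 89, 92, 75, 74, 99)

# Nested calls have to be read from the inside out
nested <- mean(head(sort(test_scores), 3))

# The piped version reads top to bottom: sort, keep the three lowest, average
piped <- test_scores |>
  sort() |>
  head(3) |>
  mean()

nested   # 77
piped    # 77
```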
Source: Air Traffic Passenger Statistics
air_traffic <- read.socrata("https://data.sfgov.org/resource/rkru-6vcg.csv")
# How many passengers deplaned from airlines with 'China' in their name?
air_traffic %>%
  filter(
    str_detect(operating_airline, "China"),
    activity_type_code == "Deplaned"
  ) %>%
  group_by(operating_airline) %>%
  summarize(passengers = sum(passenger_count)) %>%
  arrange(desc(passengers))
# How many flights for each operating airline in 2020?
air_traffic %>%
  filter(
    activity_period_start_date >= as.Date("2020-01-01") &
      activity_period_start_date <= as.Date("2020-12-31")
  ) %>%
  count(operating_airline, sort = TRUE, name = "flights") %>%
  head()
In a left join, if a row in x (the left-hand table) matches a row in y (the right-hand table), the columns from y are joined onto x. Rows in x without a match are kept, with NA in the columns that come from y.
# A tibble: 3 × 1
x
<int>
1 1
2 2
3 3
# A tibble: 2 × 2
x y
<dbl> <chr>
1 1 first
2 2 second
# A tibble: 3 × 2
x y
<dbl> <chr>
1 1 first
2 2 second
3 3 <NA>
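The three tables above can be reproduced with a sketch like this, assuming dplyr is installed. The names x_tbl and y_tbl are hypothetical.

```r
library(dplyr)

x_tbl <- tibble(x = 1:3)
y_tbl <- tibble(x = c(1, 2), y = c("first", "second"))

# Every row of x_tbl is kept; x = 3 has no match, so its y is NA
joined <- left_join(x_tbl, y_tbl, by = "x")
joined
```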
If a row in x (the left-hand table) has multiple matches in y (the right-hand table), every matching row in y is joined to it, so that row of x is repeated once per match.
# A tibble: 3 × 1
id
<int>
1 1
2 2
3 3
# A tibble: 3 × 2
code y
<dbl> <chr>
1 1 first
2 1 second
3 2 third
# A tibble: 4 × 2
id y
<dbl> <chr>
1 1 first
2 1 second
3 2 third
4 3 <NA>
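A sketch of the multiple-match case shown above, assuming dplyr is installed. The key columns have different names in the two tables, so the by argument maps id to code; the table names ids and codes are hypothetical.

```r
library(dplyr)

ids <- tibble(id = 1:3)
codes <- tibble(code = c(1, 1, 2), y = c("first", "second", "third"))

# id = 1 matches two rows in codes, so it appears twice in the result
joined <- left_join(ids, codes, by = c("id" = "code"))
joined

nrow(joined)   # 4
```

With dplyr 1.1.0 or later the same mapping can also be written as join_by(id == code).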
penguins_2007 <- penguins %>% filter(year == 2007)
penguins_2008 <- penguins %>% filter(year == 2008)
nrow(penguins_2007)
[1] 110
nrow(penguins_2008)
[1] 114
# A tibble: 224 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
7 Adelie Torgersen 38.9 17.8 181 3625 fema… 2007
8 Adelie Torgersen 39.2 19.6 195 4675 male 2007
# … with 216 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
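The 224 rows above equal 110 + 114, which is what stacking the two yearly subsets would give. A minimal sketch of stacking tables, assuming dplyr's bind_rows() was the function used; the two small tables here are hypothetical stand-ins for penguins_2007 and penguins_2008.

```r
library(dplyr)

# Hypothetical yearly subsets with the same columns
obs_2007 <- tibble(species = c("Adelie", "Gentoo"), year = 2007)
obs_2008 <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"), year = 2008)

# Stack the rows of the second table under the first
combined <- bind_rows(obs_2007, obs_2008)
combined

nrow(combined)   # 2 + 3 = 5
```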
flights <- read_rds("data/flights.rds")
airlines <- read_rds("data/airlines.rds")
planes <- read_rds("data/planes.rds")
airports <- read_rds("data/airports.rds")
left_join(flights, airlines, by = join_by(carrier))
flights %>%
  left_join(airports, join_by(dest == faa)) %>%
  select(year, month, day, origin, dest, tzone)
flights %>%
  inner_join(planes, join_by(tailnum)) %>%
  select(flight, month, day, type, engine)
Reshape your data into something longer (more rows, fewer columns) or into something wider (more columns, fewer rows).
Rows: 18
Columns: 11
$ religion <chr> "Agnostic", "Atheist", "Buddhist", "Catholic", "D…
$ `<$10k` <dbl> 27, 12, 27, 418, 15, 575, 1, 228, 20, 19, 289, 29…
$ `$10-20k` <dbl> 34, 27, 21, 617, 14, 869, 9, 244, 27, 19, 495, 40…
$ `$20-30k` <dbl> 60, 37, 30, 732, 15, 1064, 7, 236, 24, 25, 619, 4…
$ `$30-40k` <dbl> 81, 52, 34, 670, 11, 982, 9, 238, 24, 25, 655, 51…
$ `$40-50k` <dbl> 76, 35, 33, 638, 10, 881, 11, 197, 21, 30, 651, 5…
$ `$50-75k` <dbl> 137, 70, 58, 1116, 35, 1486, 34, 223, 30, 95, 110…
$ `$75-100k` <dbl> 122, 73, 62, 949, 21, 949, 47, 131, 15, 69, 939, …
$ `$100-150k` <dbl> 109, 59, 39, 792, 17, 723, 48, 81, 11, 87, 753, 4…
$ `>150k` <dbl> 84, 74, 53, 633, 18, 414, 54, 78, 6, 151, 634, 42…
$ `Don't know/refused` <dbl> 96, 76, 54, 1489, 116, 1529, 37, 339, 37, 162, 13…
# A tibble: 180 × 3
religion income count
<chr> <chr> <dbl>
1 Agnostic <$10k 27
2 Agnostic $10-20k 34
3 Agnostic $20-30k 60
4 Agnostic $30-40k 81
5 Agnostic $40-50k 76
6 Agnostic $50-75k 137
7 Agnostic $75-100k 122
8 Agnostic $100-150k 109
# … with 172 more rows
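The wide-to-long reshape shown above can be sketched with tidyr's pivot_longer(), assuming the data is the relig_income dataset that ships with tidyr (it has the same 18 rows and 11 columns). The names income_long and income_wide are hypothetical.

```r
library(tidyr)

# Longer: every income column becomes a (income, count) pair of values
income_long <- pivot_longer(
  relig_income,
  cols = -religion,        # pivot everything except religion
  names_to = "income",     # old column names go here
  values_to = "count"      # old cell values go here
)
dim(income_long)   # 180 3

# Wider: the reverse operation restores one column per income bracket
income_wide <- pivot_wider(
  income_long,
  names_from = income,
  values_from = count
)
dim(income_wide)   # 18 11
```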
adelie_males_on_torgersen_in_2007 <- penguins %>%
  filter(
    species == "Adelie",
    sex == "male",
    island == "Torgersen",
    year == 2007
  ) %>%
  select(bill_length_mm:body_mass_g)
write_csv(adelie_males_on_torgersen_in_2007, "data/adelie_males_on_torgersen_in_2007.csv")
write_rds(adelie_males_on_torgersen_in_2007, "data/adelie_males_on_torgersen_in_2007.rds")
library(writexl)
write_xlsx(adelie_males_on_torgersen_in_2007, "data/adelie_males_on_torgersen_in_2007.xlsx")
library(gt)
penguins %>%
  group_by(island, species, sex) %>%
  summarize(
    mean_body_mass = mean(body_mass_g, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  drop_na(sex) %>%
  pivot_wider(
    names_from = sex,
    values_from = mean_body_mass
  ) %>%
  mutate(island = paste("On", island, "island")) %>%
  rename(
    Island = island,
    Species = species,
    Female = female,
    Male = male
  ) %>%
  gt(
    groupname_col = "Island",
    rowname_col = "Species"
  ) %>%
  tab_style(
    style = list(cell_text(align = "right")),
    locations = cells_stub(rows = TRUE)
  ) %>%
  tab_header(
    title = "Penguin Body Mass",
    subtitle = "Adult penguins near Palmer Station"
  )
Penguin Body Mass
Adult foraging penguins near Palmer Station
Island | Species | Female | Male |
---|---|---|---|
On Biscoe island | Adelie | 3369.318 | 4050.000 |
On Biscoe island | Gentoo | 4679.741 | 5484.836 |
On Dream island | Adelie | 3344.444 | 4045.536 |
On Dream island | Chinstrap | 3527.206 | 3938.971 |
On Torgersen island | Adelie | 3395.833 | 4034.783 |
Stack Overflow
Posit Community
Twitter/X/Mastodon/Bluesky
CCSF Teams Channel
dplyr selection helpers:
starts_with()/ends_with()
contains()/matches()
first_col()/last_col()
everything()
across()
where()
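A few of these helpers in action, as a sketch assuming dplyr 1.0 or later is installed. The measurements table is a hypothetical two-row stand-in for penguins; across() is used inside summarize() rather than select().

```r
library(dplyr)

# Hypothetical two-row table with penguins-style column names
measurements <- tibble(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 50.0),
  bill_depth_mm = c(18.7, 16.3),
  body_mass_g = c(3750L, 5700L)
)

# Select columns by name pattern
bill_cols <- select(measurements, starts_with("bill"))
mm_cols <- select(measurements, ends_with("_mm"))

# Select columns by a property of their contents
numeric_cols <- select(measurements, where(is.numeric))

# Select by position
last_one <- select(measurements, last_col())

# Apply the same summary to every numeric column
means <- summarize(measurements, across(where(is.numeric), mean))
means
```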
Reports and dashboards: Tutorial: Hello, Quarto
Everything about ggplot2: ggplot2: Elegant Graphics for Data Analysis
Everything about gt: Introduction to Creating gt tables
Writing good functions: Chapter 19, ‘Functions’, in R for Data Science
Working with databases: Chapter 21, ‘Databases’, in R for Data Science
Spatial Stuff: Geocomputation in R
Watch and learn from a pro: David Robinson’s Tidy Tuesday screencasts
Interactive JavaScript visualizations in R: htmlwidgets gallery
Interactive web applications: Welcome to Shiny
Package development: R packages
Automating data pipelines: The targets user manual
Make any chart: The R Graph Gallery
Give us your feedback! (Please respond to the survey sent out after class)