Section 2. Introduction to R

Sam Frederick

5/24/23

Last Section

Setting Working Directory to Course Folder
- setwd("/path/to/your/folder")
- RProjects

RScript and RMarkdown files

Beginning functions in R
- e.g., sum(), mean(), min(), max(), sqrt()

Last Section

Vectors
- c()

Objects
- x <- 1:3

Today’s Section

Types of Objects in R
- Numeric
- Categorical
- Logical

Summarizing Data in One Variable
Working with Real Data in R

Numeric Data

Integers (int type)
Doubles (dbl type)
Ways of Summarizing (Univariate):
- Mean, median, min, max, range, IQR, standard deviation
- summary() function
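
A minimal sketch of the integer/double distinction. The object names are illustrative only; typeof() reports how R stores a value.

```r
# 1:3 creates integers; c(1, 2, 3) creates doubles
int_vec <- 1:3
dbl_vec <- c(1, 2, 3)

typeof(int_vec)   # "integer"
typeof(dbl_vec)   # "double"

# The L suffix forces a single number to be stored as an integer
typeof(3L)        # "integer"

# Both kinds count as numeric data
is.numeric(int_vec)
is.numeric(dbl_vec)
```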

Summary Statistics - Central Tendency: Mean

x <- c(1, 100, 7, 6, 5)
sum(x)/length(x)

[1] 23.8

mean(x)

[1] 23.8

Summary Statistics - Central Tendency: Median

Median
- arrange the vector in numerical order
- find the middle value (50% above and 50% below)
- less sensitive to outliers than the mean/average
What’s the median of this vector?

x <- c(1, 100, 7, 6, 5)

median(x)

[1] 6
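
The steps above can be sketched by hand for a vector with an odd number of values. Dropping the outlier (100) is only for illustration, to show how much more the mean moves than the median.

```r
x <- c(1, 100, 7, 6, 5)

# Step 1: arrange in numerical order
sorted_x <- sort(x)                      # 1 5 6 7 100

# Step 2: pick the middle position (works when length is odd)
middle <- (length(sorted_x) + 1) / 2     # 3
manual_median <- sorted_x[middle]
manual_median                            # 6

# Compare mean and median with and without the outlier
x_no_outlier <- x[x != 100]
mean(x);   mean(x_no_outlier)            # 23.8 vs 4.75
median(x); median(x_no_outlier)          # 6 vs 5.5
```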

Summary Statistics: Measures of Spread

Standard Deviation
- Measures spread around mean
- Square root of the variance

Summary Statistics: Measures of Spread

var(x)

[1] 1819.7

sqrt(var(x))

[1] 42.65794

sd(x)

[1] 42.65794
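
A sketch of what var() and sd() compute, assuming the usual sample formulas (dividing by n - 1). The names deviations, manual_var, and manual_sd are illustrative.

```r
x <- c(1, 100, 7, 6, 5)

# Distance of each value from the mean
deviations <- x - mean(x)

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum(deviations^2) / (length(x) - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(x))   # TRUE
all.equal(manual_sd, sd(x))     # TRUE
```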

Summary Statistics: Measures of Spread

Range (minimum, maximum)

range(x)

[1]   1 100

min(x)

[1] 1

max(x)

[1] 100

Summary Statistics: Measures of Spread

Interquartile Range (IQR)
- Arrange in numerical order
- Find the values below which 25% and 75% of the data lie
- IQR is the difference between these two values

quantile(x, probs = c(0.25, 0.75))

25% 75% 
  5   7

IQR(x)

[1] 2
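
A short sketch showing that IQR() matches the gap between the 75% and 25% quantiles. The names quartiles and manual_iqr are illustrative.

```r
x <- c(1, 100, 7, 6, 5)

# 25th and 75th percentiles
quartiles <- quantile(x, probs = c(0.25, 0.75))

# IQR is the 75% value minus the 25% value; unname() drops the "75%" label
manual_iqr <- unname(quartiles["75%"] - quartiles["25%"])

manual_iqr   # 2
IQR(x)       # 2
```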

Summary Statistics

summary() function
- min, 1st quartile, median, mean, 3rd quartile, max, and # of missing observations (if any)

summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     5.0     6.0    23.8     7.0   100.0
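
The vector x has no missing values, so summary() shows no missing count. A sketch with one NA added for illustration (x_missing is a made-up example):

```r
# Same values as x, plus one missing observation
x_missing <- c(1, 100, 7, 6, 5, NA)

# summary() adds an NA's entry when values are missing
x_summary <- summary(x_missing)
x_summary

# Most summary functions return NA unless told to drop missing values
mean(x_missing)                 # NA
mean(x_missing, na.rm = TRUE)   # 23.8
```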

Tidyverse Digression

Install tidyverse

install.packages("tidyverse")

Load tidyverse for use

library(tidyverse)

Tidyverse Digression

Pipe Operator: x %>% f()
- Passes the object x as the first argument of the function, so x %>% f() is the same as f(x)
- Lets you write and read code left to right
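
A small sketch comparing nested calls with a piped version of the same steps. It assumes tidyverse is installed and loaded; nested and piped are illustrative names.

```r
library(tidyverse)

x <- c(1, 100, 7, 6, 5)

# Nested calls: read from the inside out
nested <- round(sqrt(mean(x)), 2)

# Piped calls: read left to right, one step at a time
piped <- x %>%
  mean() %>%
  sqrt() %>%
  round(2)

identical(nested, piped)   # TRUE
```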

Tidyverse Digression

Tibbles:
- Tidyverse version of data.frame
- Work with many helpful tidyverse functions for common data operations
  - Example: mutate() to create and change column(s)

df <- tibble(x = x, y = 1:5)
df <- df %>%
  mutate(z = 6:10)
df

# A tibble: 5 × 3
      x     y     z
  <dbl> <int> <int>
1     1     1     6
2   100     2     7
3     7     3     8
4     6     4     9
5     5     5    10
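
mutate() can also build a column from existing columns or overwrite one in place. A sketch, assuming tidyverse is loaded; the column name x_centered is made up for illustration.

```r
library(tidyverse)

df <- tibble(x = c(1, 100, 7, 6, 5), y = 1:5)

df <- df %>%
  mutate(
    x_centered = x - mean(x),  # new column computed from an existing one
    y = y * 10                 # existing column changed in place
  )
df
```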

Categorical Data

Character (chr) data
Factors
Ways of Summarizing (Univariate):
- Tables
- Proportion Tables

Factors

Character data is usually converted to factors for analysis
- factor()
When fitting models, R often turns factors into dummy/indicator variables
- Indicator variables: take on a value of 1 if some condition is met, 0 otherwise
- e.g., Male (1 if individual identifies as a man, 0 otherwise)
Levels come in a default order (alphabetical or numerical) unless you set them yourself
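
A sketch of the default level order and of building an indicator variable by hand. The gender vector is made-up example data, not from the course files.

```r
# Hypothetical survey responses
gender <- c("Man", "Woman", "Woman", "Man", "Nonbinary")

# Default factor levels are in alphabetical order
gender_factor <- factor(gender)
levels(gender_factor)   # "Man" "Nonbinary" "Woman"

# Indicator variable: 1 if the condition is met, 0 otherwise
male <- as.integer(gender == "Man")
male                    # 1 0 0 1 0
```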

Factors

factor(variable, levels = c(...), labels = c(...))
- levels argument:
  - must match exact spelling of categories
  - can be used to reorder the levels/categories
- labels argument:
  - doesn’t have to match spelling (can be anything)
  - must be same length as number of levels/categories
  - must be in the same order as the levels argument
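
A sketch of why the levels argument must match the spelling of the categories exactly: any value not listed in levels becomes NA. The size vector is made-up example data.

```r
# Hypothetical categories
size <- c("low", "high", "medium", "low")

# Correct spelling: levels reordered, labels applied in the same order
size_ok <- factor(size,
                  levels = c("low", "medium", "high"),
                  labels = c("Low", "Medium", "High"))
size_ok

# "High" does not match "high", so that observation turns into NA
size_bad <- factor(size, levels = c("low", "medium", "High"))
is.na(size_bad)   # FALSE TRUE FALSE FALSE
```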

Factors

grp <- c(rep("A", 3), rep("B", 6), rep("C", 8))
grp

 [1] "A" "A" "A" "B" "B" "B" "B" "B" "B" "C" "C" "C" "C" "C" "C" "C" "C"

grp <- factor(grp)
grp

 [1] A A A B B B B B B C C C C C C C C
Levels: A B C

grp <- factor(grp, levels = c("C", "B", "A"))
grp

 [1] A A A B B B B B B C C C C C C C C
Levels: C B A

grp <- factor(grp, 
              levels = c("C", "B", "A"), 
              labels = c("Group C", "Group B", "Group A"))
grp

 [1] Group A Group A Group A Group B Group B Group B Group B Group B Group B
[10] Group C Group C Group C Group C Group C Group C Group C Group C
Levels: Group C Group B Group A

Tables and Proportion Tables

table()
- Number of observations in each category

table(grp)

grp
Group C Group B Group A 
      8       6       3

prop.table()
- Proportion of total observations in each category

prop.table(table(grp))

grp
  Group C   Group B   Group A 
0.4705882 0.3529412 0.1764706
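
A sketch of what prop.table() does: divide each count by the total. It uses the original A/B/C groups; counts and props are illustrative names.

```r
grp <- factor(c(rep("A", 3), rep("B", 6), rep("C", 8)))

# Counts per category
counts <- table(grp)

# Proportions by hand: each count divided by the total number of observations
props <- counts / sum(counts)

all.equal(as.numeric(props), as.numeric(prop.table(counts)))   # TRUE
sum(props)                                                     # 1

# Often easier to read as percentages
round(100 * props, 1)
```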

Working with Data in R

Download 2020 House elections results from Courseworks
- “Files” > “house2020_elections.csv”
Put file into course folder
Set working directory in R or open course RProject
Read file into R using tidyverse:

house2020 <- read_csv("house2020_elections.csv")
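
A hedged sketch for checking that the file is where R expects it before reading. It assumes the Courseworks file house2020_elections.csv is in the working directory; the object names data_file and house2020 are illustrative.

```r
library(tidyverse)

# File name from Courseworks; assumed to be in the working directory
data_file <- "house2020_elections.csv"

if (file.exists(data_file)) {
  house2020 <- read_csv(data_file)
  glimpse(house2020)   # column names, types, and first few values
} else {
  # Usually means the working directory is not the course folder
  message("File not found in: ", getwd())
}
```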

Next Section

Types of Data in R
Working with Data in R