Section 2. Introduction to R

Sam Frederick

5/24/23

Last Section

  • Setting Working Directory to Course Folder
    • setwd("/path/to/your/folder")
    • RProjects
  • RScript and RMarkdown files
  • Beginning functions in R
    • e.g., sum(), mean(), min(), max(), sqrt()

Last Section

  • Vectors
    • c()
  • Objects
    • x <- 1:3

Today’s Section

  • Types of Objects in R
    • Numeric
    • Categorical
    • Logical
  • Summarizing Data in One Variable

  • Working with Real Data in R

Numeric Data

  • Integers (int type): whole numbers

  • Doubles (dbl type): numbers that can have decimal places

  • Ways of Summarizing (Univariate):

    • Mean, median, min, max, range, IQR, standard deviation
    • summary() function
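To see the difference between the two numeric types, here is a minimal sketch using base R's typeof(). The object names int_vec and dbl_vec are made up for illustration.

```r
# Whole numbers built with : (or typed with an L suffix) are stored as integers
int_vec <- 1:3
typeof(int_vec)      # "integer"

# Numbers typed without the L suffix are stored as doubles, even if whole
dbl_vec <- c(1, 2, 3)
typeof(dbl_vec)      # "double"

# Both count as numeric, so the same summary functions work on either
is.numeric(int_vec)  # TRUE
is.numeric(dbl_vec)  # TRUE
mean(int_vec) == mean(dbl_vec)  # TRUE
```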

Summary Statistics - Central Tendency: Mean

x <- c(1, 100, 7, 6, 5)
sum(x)/length(x)
[1] 23.8
mean(x)
[1] 23.8

Summary Statistics - Central Tendency: Median

  • Median
    • arrange vector in numerical order
    • find the middle value (50% above and 50% below)
    • less sensitive to outliers than the mean/average
  • What’s the median of this vector?
x <- c(1, 100, 7, 6, 5)
median(x)
[1] 6
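One way to see how the outlier affects each measure is to drop the value 100 and recompute both. This is only an illustrative sketch; the name x_no_outlier is made up here.

```r
x <- c(1, 100, 7, 6, 5)

# Remove the outlier (100), leaving 1, 7, 6, 5
x_no_outlier <- x[x != 100]

# The mean changes a lot when the outlier is dropped
mean(x)               # 23.8
mean(x_no_outlier)    # 4.75

# The median barely moves
median(x)             # 6
median(x_no_outlier)  # 5.5
```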

Summary Statistics: Measures of Spread

  • Standard Deviation
    • Measures spread around mean
    • Square root of the variance

Summary Statistics: Measures of Spread

var(x)
[1] 1819.7
sqrt(var(x))
[1] 42.65794
sd(x)
[1] 42.65794
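To connect these functions to the definition, the sample variance can be computed step by step. This is a sketch assuming the usual n - 1 denominator, which is what var() and sd() use; the names deviations, var_by_hand, and sd_by_hand are made up for illustration.

```r
x <- c(1, 100, 7, 6, 5)

# Distance of each value from the mean
deviations <- x - mean(x)

# Sample variance: sum of squared deviations divided by n - 1
var_by_hand <- sum(deviations^2) / (length(x) - 1)

# Standard deviation: square root of the variance
sd_by_hand <- sqrt(var_by_hand)

# Both match the built-in functions
all.equal(var_by_hand, var(x))  # TRUE
all.equal(sd_by_hand, sd(x))    # TRUE
```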

Summary Statistics: Measures of Spread

  • Range (minimum, maximum)
range(x)
[1]   1 100
min(x)
[1] 1
max(x)
[1] 100
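Note that range() returns the two endpoints rather than a single number. If the width of the range is wanted, one option is to take the difference, as in this short sketch:

```r
x <- c(1, 100, 7, 6, 5)

# range() is the same as combining min() and max()
identical(range(x), c(min(x), max(x)))  # TRUE

# Width of the range: maximum minus minimum
diff(range(x))   # 99
max(x) - min(x)  # 99
```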

Summary Statistics: Measures of Spread

  • Interquartile Range (IQR)
    • Arrange in numerical order
    • Find values below which 25% and 75% of the data lie
quantile(x, probs = c(0.25, 0.75))
25% 75% 
  5   7 
IQR(x)
[1] 2
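The IQR is the distance between those two quantiles. A minimal sketch that checks this by hand (the name q is made up for illustration):

```r
x <- c(1, 100, 7, 6, 5)

# 25th and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.75))

# IQR is the 75th percentile minus the 25th percentile
unname(q["75%"] - q["25%"])  # 2

# Same answer as the built-in function
IQR(x)                       # 2
```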

Summary Statistics

  • summary() function
    • min, max, median, mean, 1st and 3rd quartiles, and # of missing observations (if any)
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     5.0     6.0    23.8     7.0   100.0 
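The vector above has no missing values, so no missing-value count is shown. As a sketch of what happens when a value is missing, the made-up vector x_missing below adds one NA:

```r
# Same values as x, plus one missing observation
x_missing <- c(1, 100, 7, 6, 5, NA)

# summary() now reports an extra NA's entry with the count of missing values
summary(x_missing)

# Most summary functions return NA unless missing values are removed
mean(x_missing)                # NA
mean(x_missing, na.rm = TRUE)  # 23.8
```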

Tidyverse Digression

  • Install tidyverse
install.packages("tidyverse")
  • Load tidyverse for use
library(tidyverse)

Tidyverse Digression

  • Pipe Operator: x %>% function()
    • Passes the object x to the function as its first argument
    • Lets you write and read code left to right
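A small sketch of the pipe, assuming the tidyverse is installed and reusing the vector x from the earlier slides:

```r
library(tidyverse)

x <- c(1, 100, 7, 6, 5)

# These two lines do the same thing
mean(x)        # 23.8
x %>% mean()   # 23.8

# Nested calls are read from the inside out...
round(sqrt(var(x)), 2)               # 42.66

# ...while the piped version is read left to right
x %>% var() %>% sqrt() %>% round(2)  # 42.66
```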

Tidyverse Digression

  • Tibbles:
    • Tidyverse version of data.frame
    • A lot of helpful functions that perform various operations
      • Example: mutate() to create and change column(s)
df <- tibble(x = x, y = 1:5)
df <- df %>%
  mutate(z = 6:10)
df
# A tibble: 5 × 3
      x     y     z
  <dbl> <int> <int>
1     1     1     6
2   100     2     7
3     7     3     8
4     6     4     9
5     5     5    10
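mutate() can also change an existing column and build a new one in the same call. The sketch below assumes the tidyverse is loaded; the column name w is made up for illustration.

```r
library(tidyverse)

df <- tibble(x = c(1, 100, 7, 6, 5), y = 1:5)

df <- df %>%
  mutate(y = y * 10,  # change an existing column
         w = x + y)   # create a new column (uses the updated y)

df$y  # 10 20 30 40 50
df$w  # 11 120 37 46 55
```

Because mutate() evaluates its arguments in order, w is computed from the already-updated y.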

Categorical Data

  • Character chr data

  • Factors

  • Ways of Summarizing (Univariate):

    • Tables
    • Proportion Tables

Factors

  • Usually turn character data into factors for analysis
    • factor()
  • R often turns these into dummy/indicator variables (e.g., when fitting regression models)
    • Indicator variables: take on a value of 1 if some condition is met, 0 otherwise
    • e.g., Male (1 if individual identifies as a man, 0 otherwise)
  • Levels come in a default order (i.e., alphabetical or numerical order)
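As an illustration of indicator variables and default level order, here is a sketch with a small made-up factor. The name is_a is hypothetical; model.matrix() is the base R function that builds the columns used in model fitting.

```r
grp <- factor(c("A", "B", "C", "A"))

# Levels are sorted alphabetically by default
levels(grp)  # "A" "B" "C"

# An indicator made by hand: 1 if the observation is in group A, 0 otherwise
is_a <- as.integer(grp == "A")
is_a         # 1 0 0 1

# How R expands a factor for modeling: one 0/1 column per level
# except the first (A), which serves as the reference category
model.matrix(~ grp)
```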

Factors

  • factor(variable, levels = c(...), labels = c(...))
    • levels argument:
      • must match exact spelling of categories
      • can be used to reorder the levels/categories
    • labels argument:
      • doesn’t have to match spelling (can be anything)
      • must be same length as number of levels/categories
      • must be in the same order as the levels argument
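The spelling rule for levels is easy to trip over. In this sketch with made-up data, a level that does not match exactly (here a capital "Y") silently turns the affected values into NA:

```r
answers <- c("yes", "no", "yes")

# "Yes" does not match "yes", so those observations become NA
f_bad <- factor(answers, levels = c("Yes", "no"))
f_bad         # <NA> no <NA>
sum(is.na(f_bad))  # 2

# Levels spelled exactly as in the data; labels can be anything,
# as long as they are in the same order as the levels
f_ok <- factor(answers,
               levels = c("no", "yes"),
               labels = c("Opposes", "Supports"))
f_ok          # Supports Opposes Supports
```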

Factors

grp <- c(rep("A", 3), rep("B", 6), rep("C", 8))
grp
 [1] "A" "A" "A" "B" "B" "B" "B" "B" "B" "C" "C" "C" "C" "C" "C" "C" "C"
grp <- factor(grp)
grp
 [1] A A A B B B B B B C C C C C C C C
Levels: A B C
grp <- factor(grp, levels = c("C", "B", "A"))
grp
 [1] A A A B B B B B B C C C C C C C C
Levels: C B A
grp <- factor(grp, 
              levels = c("C", "B", "A"), 
              labels = c("Group C", "Group B", "Group A"))
grp
 [1] Group A Group A Group A Group B Group B Group B Group B Group B Group B
[10] Group C Group C Group C Group C Group C Group C Group C Group C
Levels: Group C Group B Group A

Tables and Proportion Tables

  • table()
    • Number of observations in each category
table(grp)
grp
Group C Group B Group A 
      8       6       3 
  • prop.table()
    • Proportion of total observations in each category
prop.table(table(grp))
grp
  Group C   Group B   Group A 
0.4705882 0.3529412 0.1764706 
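prop.table() simply divides each count by the total. A short sketch that checks this by hand and converts the proportions to percentages (the name counts is made up for illustration):

```r
grp <- factor(c(rep("A", 3), rep("B", 6), rep("C", 8)))

counts <- table(grp)

# Dividing each count by the total gives the same proportions
counts / sum(counts)

# Percentages rounded to one decimal place
round(100 * prop.table(counts), 1)  # A 17.6, B 35.3, C 47.1
```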

Working with Data in R

  • Download 2020 House elections results from Courseworks

    • “Files” > “house2020_elections.csv”
  • Put file into course folder

  • Set working directory in R or open course RProject

  • Read file into R using tidyverse:

house <- read_csv("house2020_elections.csv")
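The course file is not reproduced here, so the sketch below writes a tiny made-up CSV to a temporary file and reads it back, just to show what read_csv() returns. The column names district and votes and the object name dat are hypothetical and are not taken from the real data.

```r
library(tidyverse)

# Write a small, made-up CSV to a temporary location
tmp <- tempfile(fileext = ".csv")
write_csv(tibble(district = c("D1", "D2"), votes = c(100, 250)), tmp)

# read_csv() returns a tibble and guesses each column's type
dat <- read_csv(tmp, show_col_types = FALSE)

# Quick checks after reading any file
glimpse(dat)  # column names and types
nrow(dat)     # 2
```

With the real file, the same checks (glimpse() and nrow()) are a quick way to confirm that the data loaded as expected.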

Next Section

  • Types of Data in R

  • Working with Data in R