Columbia University
2/21/23
Correlation Coefficients
Law of Large Numbers
The Central Limit Theorem
Hypothesis Testing
\(Cor(X, Y) = \frac{Cov(X,Y)}{\sqrt{Var(X) \cdot Var(Y)}} = \frac{\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 \cdot \frac{1}{n-1} \sum_{i=1}^n(Y_i-\bar{Y})^2}}\)
Equivalently, since the \(\frac{1}{n-1}\) factors cancel:
\(Cor(X,Y) = \frac{\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})}{ \sqrt{\sum_{i=1}^n(X_i-\bar{X})^2 \cdot \sum_{i=1}^n(Y_i-\bar{Y})^2}}\)
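As a quick check that the two forms agree, here is a minimal sketch that computes the correlation by hand and compares it with R's built-in cor(). The vectors x and y are made-up illustrative values, not data from the lecture.

```r
# Illustrative data (hypothetical, not from the lecture)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

# First form: covariance divided by the square root of the product of variances
cor_from_cov <- cov(x, y) / sqrt(var(x) * var(y))

# Second form: the 1/(n-1) factors cancel, leaving sums of deviations
dx <- x - mean(x)
dy <- y - mean(y)
cor_from_sums <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Both should agree with the built-in Pearson correlation
all.equal(cor_from_cov, cor(x, y))
all.equal(cor_from_sums, cor(x, y))
```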
As \(n \to \infty\), the sample mean approaches the true population mean: \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \to \mu\)
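One way to see this is a small simulation. The sketch below assumes a Bernoulli population with true mean 0.3 (an arbitrary choice for illustration) and tracks the running sample mean as more draws arrive.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

mu <- 0.3                                      # assumed true population mean
draws <- rbinom(100000, size = 1, prob = mu)   # Bernoulli(0.3) draws

# Running sample mean after 1, 2, ..., n draws
running_mean <- cumsum(draws) / seq_along(draws)

# Sample mean at a few sample sizes; the gap to mu should tend to shrink
n_check <- c(10, 100, 1000, 10000, 100000)
data.frame(n = n_check,
           sample_mean = running_mean[n_check],
           abs_error = abs(running_mean[n_check] - mu))
```

Plotting running_mean against seq_along(draws) would show the same thing graphically: early values bounce around, later values settle near mu.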
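library(tidyverse)  # provides read_csv(), filter(), drop_na(), and %>%

# Keep Democrats (100) and Republicans (200) in the 118th Congress
nominate <- read_csv("~/Downloads/HSall_members.csv") %>%
  filter(congress == 118 & party_code %in% c(100, 200) &
           chamber != "President")
dem <- nominate %>% filter(party_code == 100) %>% drop_na(nominate_dim1)
rep <- nominate %>% filter(party_code == 200) %>% drop_na(nominate_dim1)

# z-score: difference in means divided by its standard error
# (missing values were already dropped above, so na.rm is not needed)
diff_in_means <- mean(rep$nominate_dim1) - mean(dem$nominate_dim1)
denominator <- sqrt((var(rep$nominate_dim1) / nrow(rep)) +
                      (var(dem$nominate_dim1) / nrow(dem)))
z_score <- diff_in_means / denominator
round(z_score, 3)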
[1] 73.525
We often use the t-distribution instead of the normal distribution
The t-distribution places more probability in the tails
In large samples, the t-distribution is nearly identical to the normal distribution
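To make the comparison concrete, the sketch below contrasts t and normal critical values and tail probabilities. The degrees of freedom and the cutoff of 2 are arbitrary choices for illustration.

```r
# Critical values for a two-sided 95% test at several degrees of freedom
dfs <- c(5, 30, 100, 500)
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# Probability of landing beyond +/- 2 under each distribution
t_tail <- 2 * pt(-2, df = dfs)
z_tail <- 2 * pnorm(-2)

# t values sit above the normal ones and move toward them as df grows
data.frame(df = dfs, t_crit = t_crit, z_crit = z_crit,
           t_tail = t_tail, z_tail = z_tail)
```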
In R, use the t.test() function, e.g. t.test(rep$nominate_dim1, dem$nominate_dim1):
Welch Two Sample t-test
data: rep$nominate_dim1 and dem$nominate_dim1
t = 73.525, df = 500.98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.8825648 0.9310273
sample estimates:
mean of x  mean of y
0.5222045 -0.3845916
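The Welch statistic in this output is the same "difference in means over standard error" quantity as the z-score, paired with an approximate degrees of freedom. A minimal sketch of that calculation follows, using simulated data as a stand-in for the NOMINATE scores (the group sizes, means, and standard deviations are hypothetical).

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated stand-ins for the two groups (hypothetical values)
x <- rnorm(220, mean = 0.5, sd = 0.15)
y <- rnorm(210, mean = -0.4, sd = 0.12)

# Variance of each group's sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Test statistic: difference in means over its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Compare with the built-in test (Welch is the default: var.equal = FALSE)
fit <- t.test(x, y)
c(by_hand = t_stat, t.test = unname(fit$statistic))
c(by_hand = df_welch, t.test = unname(fit$parameter))
```

The non-integer degrees of freedom in the output above (500.98) come from this kind of approximation.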
We often want some idea of the uncertainty in our estimates
Use the Central Limit Theorem to construct “confidence intervals” around our estimates
For a 95% confidence interval, where qnorm(0.975) \(\approx 1.96\):
Lower end: sample estimate \(-\) qnorm(0.975) \(\times\) standard error
Upper end: sample estimate \(+\) qnorm(0.975) \(\times\) standard error
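As a rough illustration, the sketch below applies these two formulas to the numbers reported earlier in these notes. The standard error is not printed anywhere, so backing it out as estimate / z-score is an assumption made here for illustration.

```r
# Values reported earlier in these notes
mean_rep <- 0.5222045    # "mean of x" in the t.test() output
mean_dem <- -0.3845916   # "mean of y" in the t.test() output
z_score  <- 73.525       # reported z-score (rounded to 3 decimals)

estimate <- mean_rep - mean_dem

# Assumption: recover the standard error from estimate / z_score,
# since the notes do not print it directly
std_error <- estimate / z_score

# 95% interval: estimate +/- qnorm(0.975) * standard error
crit <- qnorm(0.975)
ci <- c(lower = estimate - crit * std_error,
        upper = estimate + crit * std_error)
ci
```

The result should land very close to the interval reported by t.test(); any small gap comes from using the normal quantile rather than the t quantile, plus rounding in the reported z-score.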
The Central Limit Theorem (and the Law of Large Numbers) are central to many scientific tasks
They are used for calculating p-values, hypothesis testing, and constructing confidence intervals
p-value: the probability, if the null hypothesis is true, of observing a Z-score/t-statistic at least as extreme (in absolute value, for a two-sided test) as the one actually observed
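The p-value definition translates directly into a tail-probability calculation. Below is a minimal sketch for the two-sided case; the helper names p_value_z and p_value_t are hypothetical, introduced here only for illustration.

```r
# Two-sided p-value for a z-score under a standard normal null distribution
p_value_z <- function(z) {
  2 * pnorm(-abs(z))
}

# Same idea with the t distribution, given the degrees of freedom
p_value_t <- function(t_stat, df) {
  2 * pt(-abs(t_stat), df = df)
}

p_value_z(1.96)            # close to 0.05, matching the usual 1.96 cutoff
p_value_z(73.525)          # effectively 0 at double precision
p_value_t(73.525, 500.98)  # tiny; t.test() reports such values as < 2.2e-16
```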
Statistical Relationships