01-Biostatistics

Review and Introduction

Dean Adams, Iowa State University

The Particulars:

Perspectives on Statistics

Inferential vs. Exploratory Statistics

Some examples

Statistical Hypothesis Testing

Parametric Statistics: Basic Concepts

Parametric Distributions

Parametric statistical theory has generated numerous expected distributions for various parameters, which were derived from theory by considering:

  1. The type of data examined
  2. A model of the process that generates variation

Parametric Distributions: Example

A Panoply of Parametric Distributions

Over decades, many parametric distributions have been derived from theory

Each is used to evaluate the test statistics produced by particular hypothesis tests

Empirical Sampling Distributions

Sampling distributions may be obtained in other ways (e.g., permutation)

NOTE: the null hypothesis and expected values derived from it are extremely important for designing proper permutation tests
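
As a minimal sketch of how a permutation-based sampling distribution can be built, the Python example below tests for a difference in group means. Under the null hypothesis of no group difference, the group labels are exchangeable, so shuffling them generates the values expected under the null. The function name `permutation_test`, the data, and the number of permutations are illustrative assumptions, not taken from the lecture.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=999, seed=1):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 1  # the observed value is counted as one permutation
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign observations to groups at random
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            count += 1
    return observed, count / (n_perm + 1)

# Hypothetical measurements, invented for illustration only
group_a = [158.1, 160.4, 157.2, 159.0, 156.8, 158.7]
group_b = [161.3, 160.9, 162.4, 159.8, 163.0, 161.7]

observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {observed:.3f}, permutation p = {p_value:.3f}")
```

Which quantity is shuffled depends on the null hypothesis being tested; shuffling group labels is appropriate here only because the null is "no difference among groups".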

Parameter Estimation and Model Evaluation

Parametric statistical hypothesis testing comprises two distinct steps:

  1. Parameter Estimation: Here we fit the model to the data and estimate parameters that summarize that fit. These are commonly model coefficients, which for linear models are the regression parameters.

  2. Model Evaluation: Here we use summary statistics (e.g., \(\small{R^2}\), \(\small{F}\)) that describe how well the model fits the data.
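
The two steps can be sketched for a simple linear regression. The Python example below is only an illustration under assumed data: the `x` and `y` values are invented, and least squares is used as the estimation method.

```python
from statistics import mean

# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

# Step 1: parameter estimation (least-squares slope and intercept)
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Step 2: model evaluation (sums of squares, R-squared, F-ratio)
fitted = [intercept + slope * xi for xi in x]
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_model = ss_total - ss_resid
df_model, df_resid = 1, len(x) - 2
r_squared = ss_model / ss_total
f_stat = (ss_model / df_model) / (ss_resid / df_resid)

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"R^2 = {r_squared:.4f}, F = {f_stat:.2f}")
```

The F-ratio would then be compared to a parametric F distribution, or to an empirical sampling distribution, to obtain a p-value.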

Type I and Type II Error

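Don’t forget that model evaluation and hypothesis testing are themselves subject to error: we may reject a true null hypothesis (Type I error) or fail to reject a false one (Type II error).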

Statistical Power

Univariate Versus Multivariate Analyses

Distributional Tests

Distributions: Binomial

\[\small{Pr}=\binom{n}{x}p^{x}q^{n-x}\]

where \(n\) is the total number of events (trials), \(x\) is the number of successes, and \(p\) and \(q=1-p\) are the probabilities of success and failure
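
A short Python sketch of the binomial probability, using the same symbols as the formula. The function name `binomial_pmf` and the example values (3 successes in 10 trials with p = 0.5) are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Illustrative values: 3 successes in 10 trials with p = 0.5
prob = binomial_pmf(3, 10, 0.5)

# Probabilities over all possible outcomes (x = 0..n) should sum to 1
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(prob, total)
```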

Distributions: Poisson

Distributions: Chi-Square

Distributions: Normal

Distributions: F

Data Transformations

Measures of Central Tendency

A statistic summarizing the ‘typical’ location for a sample on the number line

Moment Statistics

Moment Statistics: deviations around the mean, raised to powers

Standard Deviation

\[s=\sqrt{\frac{1}{n-1}\sum{\left(Y_i-\bar{Y}\right)^2}}\]
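
The formula above, with its \(n-1\) denominator, is the sample standard deviation. A minimal Python sketch follows, checked against the standard library's `statistics.stdev`; the function name `sample_sd` and the data are invented for illustration.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Standard deviation with an n - 1 denominator."""
    n = len(values)
    y_bar = sum(values) / n
    sum_sq_dev = sum((y - y_bar) ** 2 for y in values)  # squared deviations
    return sqrt(sum_sq_dev / (n - 1))

# Hypothetical data, invented for illustration only
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sd = sample_sd(values)
print(sd, stdev(values))  # the two values should agree
```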

Roadmap of Inferential Statistics

Inferential Statistics: Calculations

Analysis of Variance (ANOVA)

If \(\small\mathbf{X}\) contains one or more categorical factors, the linear model (LM) is a comparison of groups

\[\mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\]

Analysis of Variance (ANOVA): Example

Do male and female sparrows differ in total length?

##             Df      SS      MS     Rsq      F      Z Pr(>F)   
## bumpus$sex   1  187.49 187.491 0.10953 16.483 2.9719  0.001 **
## Residuals  134 1524.24  11.375 0.89047                        
## Total      135 1711.74                                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Intercept) bumpus$sexm 
##  157.979592    2.445696
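
The entries of the ANOVA table are linked by simple arithmetic, which the Python sketch below reproduces from the reported values. The variable names are illustrative. The reading of the coefficients assumes the default coding in which females (`f`) are the baseline level, so the intercept is the female mean and the `sexm` coefficient is the male-minus-female difference.

```python
# Values copied from the ANOVA table above
ss_model, ss_resid, ss_total = 187.49, 1524.24, 1711.74
df_model, df_resid = 1, 134

ms_model = ss_model / df_model   # mean square for sex
ms_resid = ss_resid / df_resid   # residual mean square
r_squared = ss_model / ss_total  # proportion of variation explained
f_stat = ms_model / ms_resid     # F-ratio

# Coefficients copied from the output above
intercept, sex_m = 157.979592, 2.445696
mean_f = intercept          # assumed baseline group (females)
mean_m = intercept + sex_m  # male mean = baseline + difference

print(f"MS_resid = {ms_resid:.3f}, Rsq = {r_squared:.5f}, F = {f_stat:.3f}")
print(f"mean f = {mean_f:.4f}, mean m = {mean_m:.4f}")
```

The recovered group means agree with the group means reported in the t-test example later in these notes.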

Linear Regression

If \(\small\mathbf{X}\) contains one or more continuous variables, the LM exemplifies a regression analysis

\[\mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\]

Linear Regression: Example

Does sparrow wingspan (alar extent) covary with total length?

##            Df     SS      MS     Rsq      F      Z Pr(>F)   
## bumpus$TL   1 1964.7 1964.68 0.47744 122.43 6.1954  0.001 **
## Residuals 134 2150.3   16.05 0.52256                        
## Total     135 4115.0                                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Intercept)   bumpus$TL 
##   74.264953    1.071341

Logistic Regression

\[\ln{\left[\frac{p}{1-p}\right]}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\]

Note: the above formulation is a linear regression on the logits of the proportions
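
A minimal Python sketch of the logit transformation and its inverse, which maps a value of the linear predictor back to a proportion. The function names `logit` and `inv_logit` are illustrative choices.

```python
from math import exp, log

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear-predictor value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + exp(-eta))

# A proportion of 0.5 corresponds to log-odds of 0
print(logit(0.5))

# The two functions are inverses of one another
print(inv_logit(logit(0.8)))
```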

Contingency Tables

Expanded Roadmap of Inferential Statistics

Basic Univariate Statistics: Student’s T-Test

Note: the 2-sample \(\small{t}\)-test (\(\small{n_{1}=n_{2}}\)) yields results equivalent to a 2-group ANOVA (\(\small{F=t^{2}}\))
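
The equivalence can be checked numerically: for two groups, the squared \(t\) statistic equals the ANOVA \(F\)-ratio. The Python sketch below assumes a pooled-variance t-test and uses invented data with equal group sizes; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data with equal group sizes, invented for illustration only
group_1 = [4.1, 5.3, 4.8, 5.9, 5.1]
group_2 = [6.2, 5.8, 7.1, 6.6, 6.9]
n1, n2 = len(group_1), len(group_2)
m1, m2 = mean(group_1), mean(group_2)

# Two-sample t statistic with pooled variance
pooled_var = ((n1 - 1) * variance(group_1)
              + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
t_stat = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F-ratio for the same two groups
grand_mean = mean(group_1 + group_2)
ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
ss_within = (sum((y - m1) ** 2 for y in group_1)
             + sum((y - m2) ** 2 for y in group_2))
f_stat = (ss_between / 1) / (ss_within / (n1 + n2 - 2))

print(t_stat ** 2, f_stat)  # the two values should agree
```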

T-Test: Example

Do male and female sparrows differ in total length?

## 
##  Welch Two Sample t-test
## 
## data:  bumpus$TL by bumpus$sex
## t = -3.9134, df = 89.245, p-value = 0.0001775
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
##  -3.687434 -1.203957
## sample estimates:
## mean in group f mean in group m 
##        157.9796        160.4253

Basic Univariate Statistics: Correlation

\[\small{r}_{ij}=\frac{cov_{ij}}{s_is_j}=\frac{\frac{1}{n-1}\sum(Y_i-\bar{Y}_i)(Y_j-\bar{Y}_j)}{\sqrt{\frac{1}{n-1}\sum(Y_i-\bar{Y}_i)^2\frac{1}{n-1}\sum(Y_j-\bar{Y}_j)^2}}=\frac{\sum(Y_i-\bar{Y}_i)(Y_j-\bar{Y}_j)}{\sqrt{\sum(Y_i-\bar{Y}_i)^2\sum(Y_j-\bar{Y}_j)^2}}\]

Correlation: Example

## 
##  Pearson's product-moment correlation
## 
## data:  bumpus$AE and bumpus$TL
## t = 11.065, df = 134, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5914289 0.7697695
## sample estimates:
##       cor 
## 0.6909709
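
The correlation output can be tied back to the linear regression of alar extent on total length reported earlier: the squared correlation equals the regression \(R^2\), and the squared \(t\) statistic equals the regression \(F\). The Python sketch below checks this with the reported values; the variable names are illustrative.

```python
from math import sqrt

# Values copied from the correlation output above
r, t_reported, df = 0.6909709, 11.065, 134

# Values copied from the linear regression example
rsq_reported, f_reported = 0.47744, 122.43

r_squared = r ** 2                        # should match the regression Rsq
t_from_r = r * sqrt(df / (1 - r ** 2))    # t statistic implied by r
f_from_t = t_reported ** 2                # should match the regression F

print(f"r^2 = {r_squared:.5f}, t = {t_from_r:.3f}, t^2 = {f_from_t:.2f}")
```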

Correlation Coefficient: Comments

\[\small{var}_i=\frac{\sum(Y_i-\bar{Y}_i)^2}{n-1}=\frac{\sum(Y_i-\bar{Y}_i)(Y_i-\bar{Y}_i)}{n-1}\]

\[\small{cov}_{ij}=\frac{\sum(Y_i-\bar{Y}_i)(Y_j-\bar{Y}_j)}{n-1}\]
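
These formulas show that a variance is the covariance of a variable with itself, and that the correlation is the covariance standardized by the two standard deviations. A minimal Python sketch follows; the function names and data are invented for illustration.

```python
from math import sqrt

def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (n - 1)

def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    var_a = covariance(a, a)  # variance = covariance of a variable with itself
    var_b = covariance(b, b)
    return covariance(a, b) / sqrt(var_a * var_b)

# Hypothetical paired measurements, invented for illustration only
trait_i = [2.0, 4.0, 5.0, 7.0, 9.0]
trait_j = [1.5, 3.0, 6.5, 6.0, 10.0]

r_ij = correlation(trait_i, trait_j)
print(covariance(trait_i, trait_j), r_ij)
```

Because the \(n-1\) terms cancel, the correlation can equally be computed from the raw sums of cross-products and sums of squares.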

Basic Univariate Statistics: Chi-Square