Elements of Applied Biostatistics
Preface
0.1
Math
0.2
R and programming
Part I: R fundamentals
1
Getting Started – R Projects and R Markdown
1.1
R vs R Studio
1.2
Download and install R and R Studio
1.3
Install R Markdown
1.4
Importing Packages
1.5
Create an R Studio Project for this textbook
1.5.1
Create an R Markdown file for this Chapter
1.5.2
Create a “fake-data” chunk
1.5.3
Create a “plot” chunk
1.5.4
Knit
2
Analyzing experimental data with a linear model
2.1
This text is about the estimation of treatment effects and the uncertainty in our estimates. This raises the question: what is “an effect”?
Background physiology to the experiments in Figure 2 of “ASK1 inhibits browning of white adipose tissue in obesity”
Analyses for Figure 2 of “ASK1 inhibits browning of white adipose tissue in obesity”
2.2
useful functions
2.3
figure 2b – effect of ASK1 KO on growth (body weight)
2.3.1
figure 2b – import
2.3.2
figure 2b – exploratory plots
2.4
Figure 2c – Effect of ASK1 KO on final body weight
2.4.1
Figure 2c – import
2.4.2
Figure 2c – check own computation of weight change vs. imported value
2.4.3
Figure 2c – exploratory plots
2.4.4
Figure 2c – fit the model: m1 (lm)
2.4.5
Figure 2c – check the model: m1
2.4.6
Figure 2c – fit the model: m2 (gamma glm)
2.4.7
Figure 2c – check the model, m2
2.4.8
Figure 2c – inference from the model
2.4.9
Figure 2c – plot the model
2.4.10
Figure 2c – report
2.5
Figure 2d – Effect of ASK1 KO on glucose tolerance (whole curve)
2.5.1
Figure 2d – Import
2.5.2
Figure 2d – exploratory plots
2.5.3
Figure 2d – fit the model
2.5.4
Figure 2d – check the model
2.5.5
Figure 2d – inference
2.5.6
Figure 2d – plot the model
2.6
Figure 2e – Effect of ASK1 KO on glucose tolerance (summary measure)
2.6.1
Figure 2e – massage the data
2.6.2
Figure 2e – exploratory plots
2.6.3
Figure 2e – fit the model
2.6.4
Figure 2e – check the model
2.6.5
Figure 2e – inference from the model
2.6.6
Figure 2e – plot the model
2.7
Figure 2f – Effect of ASK1 on glucose infusion rate
2.7.1
Figure 2f – import
2.7.2
Figure 2f – exploratory plots
2.7.3
Figure 2f – fit the model
2.7.4
Figure 2f – check the model
2.7.5
Figure 2f – inference
2.7.6
Figure 2f – plot the model
2.8
Figure 2g
2.8.1
Figure 2g – import
2.8.2
Figure 2g – exploratory plots
2.8.3
Figure 2g – fit the model
2.8.4
Figure 2g – check the model
2.8.5
Figure 2g – inference
2.8.6
Figure 2g – plot the model
2.9
Figure 2h
2.10
Figure 2i
2.11
Figure 2j
3
Data – Reading, Wrangling, and Writing
3.1
Learning from this chapter
3.2
Working in R
3.2.1
Importing data
3.3
Data wrangling
3.3.1
Reshaping data – Wide to long
3.3.2
Reshaping data – Transpose (turning the columns into rows)
3.3.3
Combining data
3.3.4
Subsetting data
3.3.5
Wrangling columns
3.3.6
Missing data
3.4
Saving data
3.5
Exercises
4
Plotting Models
4.1
Pretty good plots show the model and the data
4.1.1
Pretty good plot component 1: Modeled effects plot
4.1.2
Pretty good plot component 2: Modeled mean and CI plot
4.1.3
Combining Effects and Modeled mean and CI plots – an Effects and response plot.
4.2
Some comments on plot components
4.3
Working in R
4.3.1
Unpooled SE bars and confidence intervals
4.3.2
Adding bootstrap intervals
4.3.3
Adding modeled means and error intervals
4.3.4
Adding p-values
4.3.5
Adding custom p-values
4.3.6
Plotting two factors
4.3.7
Interaction plot
4.3.8
Plot components
Part II: Some Fundamentals of Statistical Modeling
5
Variability and Uncertainty (Standard Deviations, Standard Errors, Confidence Intervals)
5.1
The sample standard deviation vs. the standard error of the mean
5.1.1
Sample standard deviation
5.1.2
Standard error of the mean
5.2
Using Google Sheets to generate fake data to explore the standard error
5.2.1
Steps
5.3
Using R to generate fake data to explore the standard error
5.3.1
part I
5.3.2
part II - means
5.3.3
part III - how do SD and SE change as sample size (n) increases?
5.3.4
Part IV – Generating fake data with for-loops
5.4
Bootstrapped standard errors
5.4.1
An example of bootstrapped standard errors using vole data
5.5
Confidence Interval
5.5.1
Interpretation of a confidence interval
Part III: Introduction to Linear Models
6
An Introduction to Statistical Modeling
6.1
This text is about the estimation of treatment effects and the uncertainty in our estimates. This raises the question: what is “an effect”?
6.2
An introduction to linear models
6.3
Two specifications of a linear model
6.3.1
The “error draw” specification
6.3.2
The “conditional draw” specification
6.3.3
Comparing the two ways of specifying the linear model
6.4
Statistical models are used for prediction, explanation, and description
6.5
What do we call the \(X\) and \(Y\) variables?
6.6
Modeling strategy
6.7
Fitting the model
6.8
Models fit to data in which the \(X\) are treatment variables are regression models
6.9
Assumptions for inference with a statistical model
6.10
Specific assumptions for inference with a linear model
6.11
“linear model,” “regression model,” or “statistical model”?
7
Models with a single, continuous X
7.1
A linear model with a single, continuous X is classical “regression”
7.1.1
Analysis of “green-down” data
7.1.2
Learning from the green-down example
7.1.3
What a regression coefficient means
7.1.4
Using the linear model for prediction – prediction models
7.1.5
Using a linear model for “explanation” – causal models
7.2
Working in R
7.2.1
Fitting the linear model
7.2.2
Getting to know the linear model: the summary function
7.2.3
Inference – the coefficient table and Confidence intervals
7.2.4
How good is our model?
8
A linear model with a single, categorical X
8.1
A linear model with a single, categorical X estimates the effects of X on the response.
8.1.1
Table of model coefficients
8.1.2
The linear model
8.1.3
Reporting results
8.2
Comparing the results of a linear model to classical hypothesis tests
8.2.1
t-tests are special cases of a linear model
8.2.2
ANOVA is a special case of a linear model
8.3
Working in R
8.3.1
Fitting the model
8.3.2
Changing the reference level
8.3.3
An introduction to contrasts
8.3.4
Harrell plot
9
Model Checking
9.1
Do coefficients make numeric sense?
9.2
All statistical analyses should be followed by model checking
9.3
Linear model assumptions
9.4
Diagnostic plots use the residuals from the model fit
9.4.1
Residuals
9.4.2
A Normal Q-Q plot is used to check normality
9.4.3
Outliers - an outlier is a point that is highly unexpected given the modeled distribution.
9.5
Model checking homoskedasticity
9.6
Model checking independence - happiness adverse example.
9.7
Using R
10
Model Fitting and Model Fit (OLS)
10.1
Least Squares Estimation and the Decomposition of Variance
10.2
OLS regression
10.3
How well does the model fit the data? \(R^2\) and “variance explained”
11
Best Practices – Issues in Inference
11.1
Power
11.1.1
“Types” of Error
11.2
multiple testing
11.2.1
Some background
11.2.2
Multiple testing – working in R
11.2.3
False Discovery Rate
11.3
difference in p is not different
11.4
Inference when data are not Normal
11.4.1
Working in R
11.4.2
Bootstrap Confidence Intervals
11.4.3
Permutation test
11.4.4
Non-parametric tests
11.4.5
Log transformations
11.4.6
Performance of parametric tests and alternatives
11.5
max vs. mean
11.6
pre-post, normalization
Part IV: More than one \(X\) – Multivariable Models
12
Adding covariates to a linear model
12.1
Adding covariates can increase the precision of the effect of interest
12.2
Adding covariates can decrease prediction error in predictive models
12.3
Adding covariates can reduce bias due to confounding in explanatory models
12.4
Best practices 1: A pre-treatment measure of the response should be a covariate and not subtracted from the post-treatment measure (regression to the mean)
12.4.1
Regression to the mean in words
12.4.2
Regression to the mean in pictures
12.4.3
Do not use percent change, believing that percents account for effects of initial weights
12.4.4
Do not “test for balance” of baseline measures
12.5
Best practices 2: Use a covariate instead of normalizing a response
13
Two (or more) Categorical \(X\) – Factorial designs
13.1
Factorial experiments
13.1.1
Model coefficients: an interaction effect is what is leftover after adding the treatment effects to the control
13.1.2
What is the biological meaning of an interaction effect?
13.1.3
The interpretation of the coefficients in a factorial model is entirely dependent on the reference…
13.1.4
Estimated marginal means
13.1.5
In a factorial model, there are multiple effects of each factor (simple effects)
13.1.6
Marginal effects
13.1.7
The additive model
13.1.8
Reduce models for the right reason
13.1.9
What about models with more than two factors?
13.2
Reporting results
13.2.1
Text results
13.3
Working in R
13.3.1
Model formula
13.3.2
Modeled means
13.3.3
Marginal means
13.3.4
Contrasts
13.3.5
Simple effects
13.3.6
Marginal effects
13.3.7
Plotting results
13.4
Problems
14
ANOVA Tables
14.1
Summary of usage
14.2
Example: a one-way ANOVA using the vole data
14.3
Example: a two-way ANOVA using the urchin data
14.3.1
How to read an ANOVA table
14.3.2
How to read ANOVA results reported in the text
14.3.3
Better practice – estimates and their uncertainty
14.4
Unbalanced designs
14.4.1
What is going on in unbalanced ANOVA? – Type I, II, III sum of squares
14.4.2
Back to interpretation of main effects
14.4.3
The anova tables for Type I, II, and III sum of squares are the same if the design is balanced.
14.5
Working in R
14.5.1
Type I sum of squares in R
14.5.2
Type II and III Sum of Squares
15
Predictive Models
15.1
Overfitting
15.2
Model building vs. Variable selection vs. Model selection
15.2.1
Stepwise regression
15.2.2
Cross-validation
15.2.3
Penalization
15.3
Shrinkage
Additional tools and information
16
Linear mixed models
16.1
Random effects
16.2
Random effects in statistical models
16.3
Linear mixed models are flexible
16.4
Blocking
16.4.1
Visualizing variation due to blocks
16.4.2
Blocking increases precision of point estimates
16.5
Pseudoreplication
16.5.1
Visualizing pseudoreplication
16.6
Mapping NHST to estimation: A paired t-test is a special case of a linear mixed model
16.7
Advanced topic – Linear mixed models shrink coefficients by partial pooling
16.8
Working in R
16.8.1
coral data
17
Generalized linear models I: Count data
17.1
The generalized linear model
17.2
Count data example – number of trematode worm larvae in eyes of threespine stickleback fish
17.2.1
Modeling strategy
17.2.2
Checking the model I – a Normal Q-Q plot
17.2.3
Checking the model II – scale-location plot for checking homoskedasticity
17.2.4
Two distributions for count data – Poisson and Negative Binomial
17.2.5
Fitting a GLM with a Poisson distribution to the worm data
17.2.6
Model checking fits to count data
17.2.7
Fitting a GLM with a Negative Binomial distribution to the worm data
17.3
Working in R
17.3.1
Fitting a GLM to count data
17.3.2
Fitting a generalized linear mixed model (GLMM) to count data
17.3.3
Fitting a generalized linear model to continuous data
17.4
Problems
18
Linear models with heterogeneous variance
18.1
gls
Part V: Expanding the Linear Model – Generalized Linear Models and Multilevel (Linear Mixed) Models
19
Plotting functions (#ggplotsci)
19.1
odd-even
19.2
estimate response and effects with emmeans
19.3
emm_table
19.4
pairs_table
19.5
gg_mean_error
19.6
gg_ancova
19.7
gg_mean_ci_ancova
19.8
gg_effects
Appendix 1: Getting Started with R
19.9
Get your computer ready
19.9.1
Start here
19.9.2
Install R
19.9.3
Install R Studio
19.9.4
Install R Markdown
19.9.5
(optional) Alternative LaTeX installations
19.10
Start learning R Studio
Appendix 2: Online Resources for Getting Started with Statistical Modeling in R
Appendix 3: Fake Data Simulations
19.11
Performance of Blocking relative to a linear model
Elements of Statistical Modeling for Experimental Biology
Part III: Introduction to Linear Models