Elements of Applied Biostatistics
Preface
0.1
Math
0.2
R and programming
Part I: Getting Started
1
Getting Started – R Projects and R Markdown
1.1
R vs R Studio
1.2
Download and install R and R studio
1.3
Open R Studio and modify the workspace preference
1.4
If you didn’t modify the workspace preferences from the previous section, go back and do it
1.5
R Markdown in a nutshell
1.6
Install R Markdown
1.7
Importing Packages
1.8
Create an R Studio Project for this textbook
1.9
Working on a project, in a nutshell
1.10
Create an R Markdown file for this Chapter
1.10.1
Modify the yaml header
1.10.2
Modify the “setup” chunk
1.10.3
Create a “fake-data” chunk
1.10.4
Create a “plot” chunk
1.10.5
Knit
2
Analyzing experimental data with a linear model
2.1
This text is about the estimation of treatment effects and the uncertainty in our estimates using linear models. This raises the question: what is “an effect”?
Background physiology to the experiments in Figure 2 of “ASK1 inhibits browning of white adipose tissue in obesity”
Analyses for Figure 2 of “ASK1 inhibits browning of white adipose tissue in obesity”
2.2
Setup
2.3
Data source
2.4
control the color palette
2.5
useful functions
2.6
figure 2b – effect of ASK1 deletion on growth (body weight)
2.6.1
figure 2b – import
2.6.2
figure 2b – exploratory plots
2.7
Figure 2c – Effect of ASK1 deletion on final body weight
2.7.1
Figure 2c – import
2.7.2
Figure 2c – check own computation of weight change v imported value
2.7.3
Figure 2c – exploratory plots
2.7.4
Figure 2c – fit the model: m1 (lm)
2.7.5
Figure 2c – check the model: m1
2.7.6
Figure 2c – fit the model: m2 (gamma glm)
2.7.7
Figure 2c – check the model, m2
2.7.8
Figure 2c – inference from the model
2.7.9
Figure 2c – plot the model
2.7.10
Figure 2c – report
2.8
Figure 2d – Effect of ASK1 KO on glucose tolerance (whole curve)
2.8.1
Figure 2d – Import
2.8.2
Figure 2d – exploratory plots
2.8.3
Figure 2d – fit the model
2.8.4
Figure 2d – check the model
2.8.5
Figure 2d – inference
2.8.6
Figure 2d – plot the model
2.9
Figure 2e – Effect of ASK1 deletion on glucose tolerance (summary measure)
2.9.1
Figure 2e – massage the data
2.9.2
Figure 2e – exploratory plots
2.9.3
Figure 2e – fit the model
2.9.4
Figure 2e – check the model
2.9.5
Figure 2e – inference from the model
2.9.6
Figure 2e – plot the model
2.10
Figure 2f – Effect of ASK1 deletion on glucose infusion rate
2.10.1
Figure 2f – import
2.10.2
Figure 2f – exploratory plots
2.10.3
Figure 2f – fit the model
2.10.4
Figure 2f – check the model
2.10.5
Figure 2f – inference
2.10.6
Figure 2f – plot the model
2.11
Figure 2g – Effect of ASK1 deletion on tissue-specific glucose uptake
2.11.1
Figure 2g – import
2.11.2
Figure 2g – exploratory plots
2.11.3
Figure 2g – fit the model
2.11.4
Figure 2g – check the model
2.11.5
Figure 2g – inference
2.11.6
Figure 2g – plot the model
2.12
Figure 2h
2.13
Figure 2i – Effect of ASK1 deletion on liver TG
2.13.1
Figure 2i – fit the model
2.13.2
Figure 2i – check the model
2.13.3
Figure 2i – inference
2.13.4
Figure 2i – plot the model
2.13.5
Figure 2i – report the model
2.14
Figure 2j
Part III: R fundamentals
3
Data – Reading, Wrangling, and Writing
3.1
Learning from this chapter
3.2
Working in R
3.2.1
Importing data
3.3
Data wrangling
3.3.1
Reshaping data – Wide to long
3.3.2
Reshaping data – Transpose (turning the columns into rows)
3.3.3
Combining data
3.3.4
Subsetting data
3.3.5
Wrangling columns
3.3.6
Missing data
3.4
Saving data
3.5
Exercises
4
Plotting Models
4.1
Pretty good plots show the model and the data
4.1.1
Pretty good plot component 1: Modeled effects plot
4.1.2
Pretty good plot component 2: Modeled mean and CI plot
4.1.3
Combining Effects and Modeled mean and CI plots – an Effects and response plot.
4.1.4
Some comments on plot components
4.2
Working in R
4.2.1
Source data
4.2.2
How to plot the model
4.2.3
How to use the Plot the Model functions
4.2.4
How to generate a Response Plot using ggpubr
4.2.5
How to generate a Response Plot with a grid of treatments using ggplot2
4.2.6
How to generate an Effects Plot
4.2.7
How to combine the response and effects plots
4.2.8
How to add the interaction effect to response and effects plots
Part IV: Some Fundamentals of Statistical Modeling
5
Variability and Uncertainty (Standard Deviations, Standard Errors, Confidence Intervals)
5.1
The sample standard deviation vs. the standard error of the mean
5.1.1
Sample standard deviation
5.1.2
Standard error of the mean
5.2
Using Google Sheets to generate fake data to explore the standard error
5.2.1
Steps
5.3
Using R to generate fake data to explore the standard error
5.3.1
part I
5.3.2
part II - means
5.3.3
part III - how do SD and SE change as sample size (n) increases?
5.3.4
Part IV – Generating fake data with for-loops
5.4
Bootstrapped standard errors
5.4.1
An example of bootstrapped standard errors using vole data
5.5
Confidence Interval
5.5.1
Interpretation of a confidence interval
6
P-values
6.1
A p-value is the probability of sampling a value as or more extreme than the test statistic if sampling from a null distribution
6.2
Pump your intuition – Creating a null distribution
6.3
A null distribution of t-values – the t distribution
6.4
P-values from the perspective of permutation
6.5
Parametric vs. non-parametric statistics
6.6
frequentist probability and the interpretation of p-values
6.6.1
Background
6.6.2
This book covers frequentist approaches to statistical modeling and when a probability arises, such as the p-value of a test statistic, this will be a frequentist probability.
6.6.3
Two interpretations of the p-value
6.6.4
NHST
6.7
Some major misconceptions of the p-value
6.7.1
Misconception: p is the probability that the null is true and \(1-p\) is the probability that the alternative is true
6.7.2
Misconception: a p-value is repeatable
6.7.3
Misconception: 0.05 is the lifetime rate of false discoveries
6.7.4
Misconception: a low p-value indicates an important effect
6.7.5
Misconception: a low p-value indicates high model fit or high predictive capacity
6.8
What the p-value does not mean
6.9
Recommendations
6.9.1
Primary sources for recommendations
6.10
Problems
7
Errors in inference
7.1
Classical NHST concepts of wrong
7.1.1
Type I error
7.1.2
Power
7.2
A non-Neyman-Pearson concept of power
7.2.1
Estimation error
7.2.2
Coverage
7.2.3
Type S error
7.2.4
Type M error
Part V: Introduction to Linear Models
8
An introduction to linear models
8.1
Two specifications of a linear model
8.1.1
The “error draw” specification
8.1.2
The “conditional draw” specification
8.1.3
Comparing the error-draw and conditional-draw ways of specifying the linear model
8.1.4
ANOVA notation of a linear model
8.2
A linear model can be fit to data with continuous, discrete, or categorical \(X\) variables
8.2.1
Fitting linear models to experimental data in which the \(X\) variable is continuous or discrete
8.2.2
Fitting linear models to experimental data in which the \(X\) variable is categorical
8.3
Statistical models are used for prediction, explanation, and description
8.4
What do we call the \(X\) and \(Y\) variables?
8.5
Modeling strategy
8.6
Predictions from the model
8.7
Inference from the model
8.7.1
Assumptions for inference with a statistical model
8.7.2
Specific assumptions for inference with a linear model
8.8
“linear model,” “regression model,” or “statistical model”?
9
Linear models with a single, continuous X (“regression”)
9.1
A linear model with a single, continuous X is classical “regression”
9.1.1
Analysis of “green-down” data
9.1.2
Learning from the green-down example
9.1.3
Using a regression model for “explanation” – causal models
9.1.4
Using a regression model for prediction – prediction models
9.1.5
Using a regression model for creating a new response variable – comparing slopes of longitudinal data
9.1.6
Using a regression model for calibration
9.2
Working in R
9.2.1
Fitting the linear model
9.2.2
Getting to know the linear model: the summary function
9.2.3
Inference – the coefficient table
9.2.4
How good is our model? – Model checking
9.2.5
Plotting models with continuous X
9.2.6
Creating a table of predicted values and 95% prediction intervals
9.3
Hidden code
9.3.1
Import and plot of fig2c (ecosystem warming experimental) data
9.3.2
Import and plot of efig_3d (ecosystem warming observational) data
9.3.3
Import and plot of fig1f (methionine restriction) data
9.4
Try it
9.4.1
A prediction model from the literature
9.5
Intuition pumps
9.5.1
Correlation and \(R^2\)
10
Linear models with a single, categorical X (“t-tests” and “ANOVA”)
10.1
A linear model with a single, categorical X variable estimates the effects of the levels of X on the response
10.1.1
Example 1 – two treatment levels (“groups”)
10.1.2
Understanding the analysis with two treatment levels
10.1.3
Example 2 – three treatment levels (“groups”)
10.1.4
Understanding the analysis with three (or more) treatment levels
10.2
Working in R
10.2.1
Fit the model
10.2.2
Controlling the output in tables using the coefficient table as an example
10.2.3
Using the emmeans function
10.2.4
Using the contrast function
10.2.5
How to generate ANOVA tables
10.3
Hidden Code
10.3.1
Importing and wrangling the fig_3d data for example 1
10.3.2
Importing and wrangling the fig2a data for example 2
11
Model Checking
11.1
All statistical analyses should be followed by model checking
11.2
Linear model assumptions
11.2.1
A bit about IID
11.3
Diagnostic plots use the residuals from the model fit
11.3.1
Residuals
11.3.2
A Normal Q-Q plot is used to check for characteristic departures from Normality
11.3.3
Mapping QQ-plot departures from Normality
11.3.4
Model checking homoskedasticity
11.4
Using R
11.4.1
Normal Q-Q plots
12
Linear models with violations of independence, homogeneity, or Normality
12.1
Lack of independence
12.1.1
A paired t-test is a special case of a linear model for correlated data with two groups
12.1.2
Example 2 – more than two groups
12.2
Heterogeneity of variances
12.3
The conditional response isn’t Normal
12.3.1
Example 1 (fig6f) – Linear models for non-normal count data
12.4
Hidden Code
12.4.1
Importing and wrangling the fig1b data
12.4.2
Importing and wrangling the fig2a data
12.4.3
Importing and wrangling the fig6f data
13
Issues in inference
13.1
Pre-post designs (change from baseline)
13.2
Longitudinal designs
13.3
Comparing responses normalized to a standard
13.4
Comparing responses that are ratios
13.5
Researcher degrees of freedom
Part VI: More than one \(X\) – Multivariable Models
14
Linear models with an added covariate (“ANCOVA”)
14.1
Adding covariates can increase the precision of the effect of interest
14.2
Understanding a linear model with an added covariate – heart necrosis data
14.2.1
Fit the model
14.2.2
Plot the model
14.2.3
Interpretation of the model coefficients
14.2.4
Everything adds up
14.2.5
Interpretation of the estimated marginal means
14.2.6
Interpretation of the contrasts
14.2.7
Adding the covariate improves inference
14.3
Understanding interaction effects with covariates
14.3.1
Fit the model
14.3.2
Plot the model with interaction effect
14.3.3
Interpretation of the model coefficients
14.3.4
What is the effect of a treatment, if interactions are modeled? – it depends.
14.3.5
Which model do we use, \(\mathcal{M}_1\) or \(\mathcal{M}_2\)?
14.4
Understanding ANCOVA tables
14.5
Working in R
14.5.1
Importing the heart necrosis data
14.5.2
Fitting the model
14.5.3
Using the emmeans function
14.5.4
ANCOVA tables
14.5.5
Plotting the model
14.6
Best practices
14.6.1
Do not use a ratio of part:whole as a response variable – instead add the denominator as a covariate
14.6.2
Do not use change from baseline as a response variable – instead add the baseline measure as a covariate
14.6.3
Do not “test for balance” of baseline measures
14.7
Best practices 2: Use a covariate instead of normalizing a response
15
Linear models with two categorical \(X\) – Factorial designs (“two-way ANOVA”)
15.1
Factorial experiments
15.2
Example 1: Maize defense response
15.2.1
Analysis
15.2.2
A factorial model adds an interaction effect to the coefficients table
15.2.3
The biological interpretation of an interaction effect
15.2.4
Conditional and marginal means
15.2.5
Simple (conditional) effects
15.2.6
Marginal effects
15.2.7
The additive model
15.2.8
Reduce models for the right reason
15.3
Working in R
15.3.1
Model formula
Part VII – Expanding the Linear Model
16
Linear models for longitudinal experiments – I. pre-post designs
16.1
Best practice models
16.2
Common alternatives that are not recommended
16.3
Advanced models
16.4
Understanding the alternative models
16.4.1
(M1) Linear model with the baseline measure as the covariate (ANCOVA model)
16.4.2
(M2) Linear model of the change score (change-score model)
16.4.3
(M3) Linear model of post-baseline values without the baseline as a covariate (post model)
16.4.4
(M4) Linear model with factorial fixed effects (fixed-effects model)
16.4.5
(M5) Repeated measures ANOVA
16.4.6
(M6) Linear mixed model
16.4.7
(M7) Linear model with correlated error
16.4.8
(M8) Constrained fixed effects model with correlated error (cLDA model)
16.4.9
Comparison table
16.5
Example 1 – a single post-baseline measure (pre-post design)
16.6
Working in R
16.7
Hidden code
16.7.1
Import and wrangle mouse sociability data
17
Linear models for count data – Generalized Linear Models I
17.1
The generalized linear model
17.2
Count data example – number of trematode worm larvae in eyes of threespine stickleback fish
17.2.1
Modeling strategy
17.2.2
Checking the model I – a Normal Q-Q plot
17.2.3
Checking the model II – scale-location plot for checking homoskedasticity
17.2.4
Two distributions for count data – Poisson and Negative Binomial
17.2.5
Fitting a GLM with a Poisson distribution to the worm data
17.2.6
Model checking fits to count data
17.2.7
Fitting a GLM with a Negative Binomial distribution to the worm data
17.3
Working in R
17.3.1
Fitting a GLM to count data
17.3.2
Fitting a generalized linear mixed model (GLMM) to count data
17.3.3
Fitting a generalized linear model to continuous data
17.4
Problems
18
Linear models with heterogeneous variance
Part V: Expanding the Linear Model – Generalized Linear Models and Multilevel (Linear Mixed) Models
Appendix 1: Getting Started with R
18.1
Get your computer ready
18.1.1
Start here
18.1.2
Install R
18.1.3
Install R Studio
18.1.4
Install R Markdown
18.1.5
(optional) Alternative LaTeX installations
18.2
Start learning R Studio
Appendix 2: Online Resources for Getting Started with Statistical Modeling in R
Elements of Statistical Modeling for Experimental Biology