| title | Midterm 2 |
|---|---|
| subtitle | October 22, 2020 |
Contents:
- General Linear Models
- F Ratio Statistic
- ANOVA
- Variance Component
- Block Design
- Coding
- Coding ANOVA
- Simple Plots
Lecture Portion of Exam:
- Will cover multiple regression up through Lecture 15 (power analyses)
R portion of Exam:
- Regression up through ANCOVA and Mixed Models (what was on PS5)
- Regression
- Multiple Regression
- ANOVA
- 1 continuous response variable
- 1 continuous predictor variable
- y = beta-0 + (beta-1)(x) + error
- Purposes:
- Description: relationship between Y and X
- Explanation: how much variation in Y is explained by the linear relationship with X
- Prediction: predict new Y-values from new X-values & determine precision of those estimates
- Linear Regression by hand (see the R sketch after these steps)
- calculate mean values of x and y
- calculate deviations from means of x and y (xi - x-bar) and (yi - y-bar)
- calculate xy deviation cross-products (xi - x-bar)*(yi - y-bar)
- calculate slope (beta-1) sum(deviation cross products) / sum of squares for x
- solve for the intercept (beta-0) y-bar = beta-0 + (beta-1)(x-bar) -> beta-0 = y-bar - (beta-1)(x-bar)
- based on the slope and intercept, calculate yi-hat (predicted y) values for each x
- hypothesis test with F-ratio = MS-model / MS-error, MS = SS/df = variance
- null hypothesis: slope = 0
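- A minimal R sketch of these by-hand steps (the x and y vectors are made-up example data, not from the course):
x<-c(1, 2, 3, 4, 5)
y<-c(2.1, 3.9, 6.2, 8.1, 9.8)
xbar<-mean(x); ybar<-mean(y)
b1<-sum((x-xbar)*(y-ybar))/sum((x-xbar)^2) #slope = deviation cross-products / SS of x
b0<-ybar-b1*xbar #intercept from y-bar = beta-0 + beta-1 * x-bar
yhat<-b0+b1*x #predicted y values
MSmodel<-sum((yhat-ybar)^2)/1 #df = 1
MSerror<-sum((y-yhat)^2)/(length(y)-2) #df = n - 2
MSmodel/MSerror #F ratio; compare with summary(lm(y~x))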
- Ordinary Least Squares
- SS-total = SS(yi-hat - y-bar) + SS(yi - yi-hat)
- the line that is, on average, closest to all points in the sample
- minimizes sum of squared deviations
- Analysis of Variance
- Regression: SS = sum((yi-hat - y-bar)^2), df = 1, variance = Mean Square = SS/1
- Residual: SS = sum((yi - yi-hat)^2), df = n-2, variance = MS = SS/(n-2)
- F = MSregression / MSresidual
- if MSreg >> MSresid -> variation explained by regression >> variation explained by error -> F ratio >> 1
- r2, coefficient of determination, measures explained variation
- r2 = SSregression / SStotal
- proportion of variation in Y explained by linear relationship with X
- r2 = square of the correlation coefficient, r (see the quick check below)
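- Quick R check of these relationships, reusing the x, y, ybar, and yhat objects from the by-hand sketch above:
SSreg<-sum((yhat-ybar)^2)
SSresid<-sum((y-yhat)^2)
SSreg/(SSreg+SSresid) #r2 = SSregression / SStotal
cor(x, y)^2 #same value: square of the correlation coefficient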
- Assumptions of Linear Regression
- Normality
- homogeneity of variance
- independence of samples
- linearity between x and y
- Residuals
- used for examining model assumptions
- yi - yi-hat
- check normality of residuals
- plot residuals against predicted y (yi-hat)
- want no pattern in residuals
- Leverage
- Cook's D to test if outliers are influential
Multiple Regression
- more variables can reduce unexplained variation
- smaller residual error produces more powerful tests of each predictor
- plot residuals of factor A regression vs factor B
- there may be a relationship between factor B and the response variable that was hidden (masked) by the effect of factor A
- the same applies in reverse for factor B residuals and factor A (see the sketch below)
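- A sketch of that residual check, assuming a hypothetical data frame mydata with response y and predictors A and B (names are illustrative only):
modA<-lm(y~A, data=mydata)
plot(residuals(modA)~mydata$B) #a trend here suggests B explains variation that was masked by A
modB<-lm(y~B, data=mydata)
plot(residuals(modB)~mydata$A) #same idea in the other direction
#car::avPlots(lm(y~A+B, data=mydata)) gives the related added-variable plots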
- Hypothesis test: whether overall regression equation is significant
- H0: slope = 0
- t-tests or F tests (R gives t-tests)
- Assumptions
- normality and homogeneity of variance for response variable
- independence of observations
- linearity
- no collinearity
- collinearity = predictors are correlated = unreliable estimates of slopes
- Checks for collinearity
- correlation matrix (visual)
- tolerance for each predictor: 1-r2 for regression of that predictor on all others
- VIF values (variance inflation factor): 1/tolerance, look for large values (>10); see the R sketch below
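- A sketch of the tolerance/VIF calculation, assuming hypothetical predictors x1, x2, x3 (and response y) in mydata:
r2_x1<-summary(lm(x1~x2+x3, data=mydata))$r.squared #predictor x1 regressed on all other predictors
tolerance_x1<-1-r2_x1 #tolerance for x1
1/tolerance_x1 #VIF for x1; worry if >10
#or get all VIFs at once from the fitted model with the car package:
library(car)
vif(lm(y~x1+x2+x3, data=mydata))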
ANOVA
- 1 continuous response variable
- 1+ categorical predictor variable(s) = factor(s)
- tests whether means differ (like t-test but with more than 2 means)
- Assumptions
- applied to residuals from linear model
- normality
- homogeneity of variances
- independence
- F tests compare explained variation from our model / unexplained variation (residual/error)
- F statistic is used to get probability that our Explained = Unexplained
- Null hypothesis: slope = 0
- p < 0.05 indicates slope is significantly different from 0 and there is a relationship between X and Y
- MS = SS / df = variance
- F = MS1 / MS2
- % variation explained by elements of model = element's variance / total variance * 100
- Residuals
- difference between observed value and predicted value
Fixed Factor
- the factor of interest in our study
- testing the means to get a p value
- all levels or groups of interest are used in the study
- conclusions are restricted to those particular groups (cannot extrapolate)
- Among SS increased by an added component due to treatment
- have "sampled" the whole population of treatment levels
- have no variance components
- there are no unknowns - have measured the exact effect of a treatment level
- could replicate and should theoretically get the same result
Random Factor
- random sampling of all groups of interest are used in the study
- conclusions can be extrapolated to all possible groups
- do not get a p value
- don't care about testing the mean
- used for explaining variation
- used for avoiding pseudoreplication
- Among SS increased by an added variance component
- sampled a subset of all levels of a treatment that you care about
- don't know the exact effect of each level (represent a broader distribution)
- repeating the experiment with a different set of levels will give a different estimate of added variance
One-Way ANOVA: partitioning variation
- SS Total = SS Between Groups + SS Within Groups
- Explained variation is between groups (difference between group means and grand means)
- unexplained variation is within groups (difference between actual value and group mean)
One-Way ANOVA table
- Groups
- SS = n * sum of ((group mean - grand mean)^2)
- df = p-1
- MS = SS-groups / df-groups
- Residual
- SS = sum of each group's sum of ((data point - group mean)^2)
- df = p(n-1)
- MS = SS-resid / df-resid
- Total
- df = pn-1
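- A by-hand sketch of this partitioning, assuming a balanced data frame mydata with response y and factor group (hypothetical names):
groupmeans<-tapply(mydata$y, mydata$group, mean)
grandmean<-mean(mydata$y)
n<-as.numeric(table(mydata$group)[1]) #replicates per group (balanced design)
p<-length(groupmeans) #number of groups
SSgroups<-n*sum((groupmeans-grandmean)^2)
SSresid<-sum((mydata$y-groupmeans[as.character(mydata$group)])^2)
MSgroups<-SSgroups/(p-1)
MSresid<-SSresid/(p*(n-1))
MSgroups/MSresid #F ratio; compare with anova(lm(y~group, data=mydata))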
Two-Way ANOVA
- Two factors (2 categorical predictor variables)
- Factor A (with p groups or levels)
- Factor B (with q groups or levels)
- Crossed design (= orthogonal)
- every level of one factor is crossed with every level of the second factor
- if not every level is crossed with every other level, then do a nested anova
- Greater efficiency
- can answer two questions with one experiment
Two-way ANOVA table
- degrees of freedom are the same as a one-way anova for each independent factor
- df factor A = p-1 (# levels - 1)
- df factor B = q-1 (# levels - 1)
- df for the interaction term is (p-1)*(q-1)
- df residual/error is pq*(n-1)
- interaction null hypothesis: no interaction between predictor factors
- p < 0.05 shows significant interaction
- Look at test of interaction first, THEN determine whether to interpret main effects
- no interaction if roughly parallel lines
- different slopes means interaction
- non intersecting: can still interpret main effects
- intersecting: do not interpret main effects
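- A quick way to look for an interaction visually (hypothetical factors A and B and response y in mydata):
with(mydata, interaction.plot(A, B, y)) #roughly parallel lines suggest little interaction
#emmip(model1, B~A) from the emmeans package plots estimated marginal means the same way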
Nested ANOVA
- used to account for subsamples nested within replicates
- if nesting is not acknowledged, these designs are pseudoreplicated
- when all levels of one factor are not present in all levels of another factor
- usually random factors
- Partitioning Total Variation
- SS-A: variation among A means
- SS-B(A): variation among B means within each level of A
- SS-residual: variation among replicates within each B(A)
- if you have nested factors within your treatment, you need to replicate the nested factor, not the subsamples
- will want to know
- variance at each level (variance components)
Nested ANOVA Table
- Factor A
- SS-A = pq * sum of ((group mean - grand mean)^2)
- df = p-1
- MS-A = SS-A / (p-1)
- F = MS-A / MS-B(A)
- Factor B
- SS-B(A) = q * sum of each group's sum of ((data point - group mean)^2)
- df = p(q-1)
- MS-B(A) = SS-B(A) / (p(q-1))
- F = MS-B(A) / MS-residual
- Residual
- SS = sum of each group's sum of ((data point - nested group mean)^2)
- df = pq(n-1)
- MS = SS-resid / (pq(n-1))
- Fixed Factor: Among SS increased by an added component due to treatment
- Random Factor: Among SS increased by an added variance component (sigma-squared-A)
Mixed Model ANOVA Table
- Both factors Fixed
- MS-A: F-ratio = MS-A / MS-residual, df = p-1
- MS-B: F-ratio = MS-B / MS-residual, df = q-1
- MS-AB: F-ratio = MS-AB / MS-residual, df = (p-1)(q-1)
- MS-residual
- Both factors Random
- MS-A: F-ratio = MS-A / MS-AB
- MS-B: F-ratio = MS-B / MS-AB
- MS-AB: F-ratio = MS-AB / MS-residual
- MS-residual
- Factor A Fixed, Factor B Random
- MS-A: F-ratio = MS-A / MS-AB
- MS-B: F-ratio = MS-B / MS-residual
- MS-AB: F-ratio = MS-AB / MS-residual
- MS-residual
Variance Components
- estimates the added variance that a group of means contributes to the total
- sigma-squared-among = sigma-squared-e + n * sigma-squared-A (see the sketch below)
- sigma-squared-e: variance component due to error
- sigma-squared-A: variance component due to the random factor
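- A sketch of estimating the added variance component from the mean squares (method-of-moments), reusing MSgroups, MSresid, and n from the one-way ANOVA sketch above; in practice we usually read the components from summary(lmer(...)):
sigma2_e<-MSresid #variance component due to error
sigma2_A<-(MSgroups-MSresid)/n #added variance due to the random factor
sigma2_A/(sigma2_A+sigma2_e)*100 #% of total variation explained by the random factor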
Block Design
- group replicates into blocks
- reduce unexplained variation, without increasing size of experiment
- used to unconfound effect of treatment from another variable that you know is present and may influence response to the treatment
- Types
- completely randomized, no blocking
- randomized block - no replication within blocks (one of each treatment)
- replicated randomized block - maximize blocks, minimize replication within blocks (treatments replicated within blocks)
- df-A: 1
- df-Block:(# blocks) - 1
- df-AB
- df-resid
- replicated randomized block - minimize blocks, maximize replication within blocks (treatments replicated within blocks)
- df-A: 1
- df-Block: (# blocks) - 1 = 1
- df-AB: 2-1 = 1
- df-resid: (p-1)(q-1) = 20
Split Plot Design
- usually 3 factors, with one partly nested within the others
- 1 treatment is applied to the whole plot
- Another factor (B) has each of its levels applied to subplots nested within A
- 1 factor (C) is a block that includes each level of A
- often unreplicated
Repeated Measures
- same sampling units are measured more than once
- two types of Repeated Measures
- all treatments applied to all replicates in random order
- A (treatment): df = p-1, F = MS-A/MS-AB
- B (subject = block): df = q-1, F = MS-B/MS-residual
- Residual: df = (p-1)(q-1)
- each treatment applied to a subset of replicates
- A (treatment, fixed): df = p-1, F = MS-A/MS-B(A)
- B(A) (subject, random): df = q-1, F = MS-B/MS-resid
- C (time, fixed): df = r-1, F = MS-C/MS-resid
- AxC (fixed): df = (p-1)(r-1), F = MS-AC/MS-resid
- Residual: df = pq(r-1)
ANCOVA
- at least one Categorical and at least one Continuous predictor variable
- linear model
- combines ANOVA and regression
- Purposes
- reduce unexplained variation to make tests more powerful
- compare slopes of two or more regression lines
- same slope between groups: no significant interaction between treatment and covariate
- different slopes: significant interaction between treatment and covariate
Other approaches
- Tests that allow unequal variances
- Welch's test, Wilcox Z Test
- limited to simple experimental designs
- Rank-based "nonparametric tests"
- Kruskal Wallis
- limited to simple experimental designs (one-way 'ANOVA' style)
- Generalized Linear Modeling
- if non-normal distribution is known (e.g. Poisson)
- Permutation Tests
- PERMANOVA
- Pearson's r: linear relationships (parametric correlation coefficient); assumes normality
mytest1<-cor.test(mydata$cryduration, mydata$IQ, method="pearson")
mytest1
library(car)
qqp(mydata$cryduration, "norm")
qqp(mydata$IQ, "norm")
- Spearman's rho: nonlinear relationships (non-parametric correlation coefficients); calculated on ranks
- if data are not normal, could either do a Spearman's rho or could transform data for normality and do Pearson's
mytest2<-cor.test(mydata$cryduration, mydata$IQ, method="spearman")
mytest2
- Kendall's tau: nonlinear relationships (non-parametric correlation coefficients); concordant and discordant pairs
mytest3<-cor.test(mydata$cryduration, mydata$IQ, method="kendall")
mytest3
- Bonferroni test
- only works after you've already fit a model to your data
- Tells you whether most extreme value has undue influence
- If P<0.05, then point is a potential outlier
library(car)
model1<-lm(Bush~Buchanan, data=mydata)
outlierTest(model1)
- Cook's D for leverage
- Points greater than 0.1 are potential outliers
- We should be concerned about leverage if the value is > 0.2 and very concerned if it is >0.5. For influence, Cook’s D values > 1-2 should cause you to carefully scrutinize the data
influencePlot(model1)
#or simply
plot(model1, 4)
#or more complex
cutoff <- 4/((nrow(mydata)-length(model1$coefficients)-2))
plot(model1, which=4, cook.levels=cutoff)
General Linear Models
- one or more predictor (independent) variable(s), X
- one response (dependent) variable, Y
- F ratio
- Regression: variation in Y explained by regression
- df = 1
- MS = SS / 1
- Residual: variation in Y not explained by regression
- df = n-2
- MS = SS / n-2
- r2 coefficient of determination
- measures explained variation; proportion of variation in Y explained by linear relationship with X
- r2 = square of correlation coefficient, r
- SS total = SS-reg + SS-resid
- r2 = SS-reg / SS-total
- Create the model
- Before we can fit a regression line, we need to describe the model
model<-lm(dependent~independent, data=mydata)
- Check linearity
plot(NumberSpp~Biomass, data=data4) #, col="blue", xlab="Biomass (mg/m2)", ylab="Species Richness")
abline(model4, col="blue") #adds fit from the model
- Check assumptions
# diagnostic plots
plot(model)
# check normality of residuals
library(car)
qqp(residuals(model))
# homogeneity of variance using box plot
boxplot(residuals(model)~mydata$Nitrogen)
- Check for outliers and high leverage points
#Check for outliers using Bonferroni Test:
outlierTest(model)
#Check for high leverage points
# With the leverage plot, you want Cook's distance values less than 1:
plot(model,4)
# or
influencePlot(model) #Points greater than 1 are potential outliers
- test model significance
- F test: Get R2 value, F, df, and p for goodness of fit of the model
summary(model)
# Estimate = slope
# gives t value and p value for each predictor variable
- Create the full model
- First we want to know which model fits the data best
fullmodel<-lm(krat_density~shrubcover+seedproduction+snakedensity, data=mydata)
- Check for collinearity in your data
library(GGally)
X<-mydata[,c(2,3,4)] #gives you just columns 2,3,4
ggpairs(X)
#To get actual numbers on collinearity, a better test:
library(mctest)
imcdiag(fullmodel)
- Get AIC for model and reduced models
- Want to choose a simpler model (one without collinearity problems) with the lowest AIC value
AIC(fullmodel)
#Now let's make every other possible model:
reduced1<-lm(krat_density~shrubcover+seedproduction, data=mydata) #We might be cautious with this model because these variables are co-linear
reduced2<-lm(krat_density~shrubcover+snakedensity, data=mydata)
reduced3<-lm(krat_density~seedproduction+snakedensity, data=mydata)
#We also have the simple models (single regressions). Let's get the AIC value for each model:
AIC(fullmodel)
AIC(reduced1)
AIC(reduced2)
AIC(reduced3)
- Check assumptions
plot(reduced2) #homogeneity of variances of the model
library(car)
qqp(residuals(reduced2)) #normality
- Test model significance or get ANOVA table
summary(reduced2)
# Traditional ANOVA table, if you like that output better
# Gives F tests instead of t and gives df
anova(reduced2)
- Get the partial standardized regression coefficients for each factor
- Sign doesn't matter; just the value from 0 to 1 gives the effect size/importance of each predictor variable
library(QuantPsyc)
lm.beta(reduced2)
Model I and Model II Regression
- Model I
- X is measured without error; you chose specific values of X (i.e. fixed; X is a set variable)
- Model II
- There is error in the measure of x and y (X is a measured variable)
- both x and y are random variables, measured with error
- in biology, often do Model II designs but use Model I regression
- fine for hypothesis tests (p values) but not for accurate estimates of slope and intercept
- can use Reduced Major Axis regression (Model II)
- handles error in both x and y
- lmodel2 package (RMA method)
#model II regression.
library(lmodel2)
model1<-lmodel2(area~height, range.y="relative", range.x="relative", data=mydata, nperm=99)
model1 #Use these parameters (from RMA) to get estimates of slope and intercept
- AIC (Akaike's Information Criterion)
- takes the natural log of L (our likelihood) times 2, subtracted from double the number of parameters
- AIC = 2k - 2ln(L)
- k = number of parameters
- lower values are better fits of the model
- can compare a model of 2 parameters to a model of 10 parameters and see if our model is stronger. AIC takes into account (penalizes you for) more parameters in your model
- Likelihood Ratio Test
AIC(fullmodel)
AIC(reduced1)
AIC(reduced2)
# LRT
anova(fullmodel, reduced1)
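- Sanity check of the AIC formula by hand for the lm fit above (k counts the coefficients plus the residual variance):
k<-length(coef(fullmodel))+1 #number of parameters
2*k-2*as.numeric(logLik(fullmodel)) #AIC = 2k - 2ln(L)
AIC(fullmodel) #same value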
- Load packages
library(tidyverse)
library(car) # qqp() and Anova()
library(emmeans) # post-hoc test emmeans()
library(agricolae) # HSD.test()
library(lme4) # for testing random effects
library(lmerTest) #Need this to get anova results from lmer
library(MASS)
- Make sure predictors are factor type
#Need to make Treatment a factor
mydata$Treatment<-as.factor(mydata$Treatment)
- Check if data are balanced
summary(mydata)
- Create model
# One-way and Two-way Fixed Factor ANOVA
model<-aov(seed.mass~Treatment, data=mydata)
# or
model<-lm(seed.mass~Treatment, data=mydata)
# Random Factor and Mixed Model ANOVA
library(lme4)
library(lmerTest)
model<-lmer(seed.mass~Treatment + (1|random), data=mydata) # for random factors in two or more way anova's
# fixed: treatment
# random: random
# Nested ANOVA
model2<-lmer(Calyxarea~Genotype + (1|Ramet:Genotype), data=mydata)
# fixed: genotype
# nested: ramet within genotype
# Two-Way Fixed Factor ANOVA
model1<-lm(growth~Temperature + CO2 + Temperature:CO2, data=mydata)
# Shortcut to writing the same thing:
model1<-lm(growth~Temperature*CO2, data=mydata)
# Mixed Model ANOVA
model1<-lmer(WBCcolonies~Treatment + (1|Donor) + (1|Treatment:Donor), data=mydata)
# Mixed model with Block Design
model1<-lmer(abundance~disease*parasite + (1|block) + (1|disease:block) + (1|parasite:block) + (1|disease:parasite:block), data=mydata)
# Repeated Measures ANOVA (for time as a fixed factor)
model<-lmer(FvFm ~ temperature*day + (1|temperature:coral), data=data4)
# fixed: temperature, day
# random: coral nested in temperature because different corals used in different treatments
# ANCOVA
model1<-lm(Weight ~ TidalHeight * Length, data=mydata)
# Make sure that the relationship with covariate is linear
plot(Weight~Length, data=mydata)
#You can stop right there if you want, but if you wanted to go further and do model selection
#Interaction term is highly non-significant, so you might want to do model selection and decide whether or not to drop it
- Check assumptions of data and transform if necessary
plot(model) # first option to look for spread less than 3x for any group
library(car)
qqp(residuals(model), "norm") #When we're using lmer, we have to do the probability plot ourselves
- AIC and Likelihood Ratio Test
model5<-lmer(Calyxarea~Genotype + (1|Ramet:Genotype), data=data5) #the full model with the random effect
model5b<-lm(Calyxarea~Genotype, data=data5) # model without the random effect
anova(model5, model5b)
# this is the Likelihood Ratio Test. It gives AIC values for both models
# look for lowest AIC
# and it gives us a p-value for whether one is a better fit. We COULD technically use that p-value as a hypothesis test for the random effect
# ANCOVA
model1<-lm(Weight ~ TidalHeight * Length, data=mydata)
model2<-lm(Weight ~ TidalHeight + Length, data=mydata)
AIC(model1)
AIC(model2)
- Check results of ANOVA
anova(model2) # balanced data, fixed factor models
Anova(model, type="III") # unbalanced data and mixed models
- Get variance components
- Run all factors as random factors
- summary table: random effects: % variation explained by each factor = its Variance / Total Variance (sum the variances yourself; the total is not given in the table)
- Residual = variation within replicates
- Group = variation between groups
- Nested factor = variation within groups
#To get variance components, we need to model both variables as independent random variables
library(lme4)
model1<-lmer(Calyxarea~1 + (1|Genotype) + (1|Ramet:Genotype), data=mydata)
#Check assumptions
plot(model1) #homogeneity looks good
library(car)
qqp(residuals(model1), "norm") #normality looks good
#Get variance components
summary(model1)
- Contrasts aka Planned Comparisons (a priori)
- Create two vectors of coefficients for the two contrasts
- Check the level order R has
- coefficients must be orthogonal
- the coefficients within each contrast sum to 0
- the sum of the products of corresponding coefficients across the two contrasts (element i of c1 * element i of c2) = 0
- grouped levels must have same sign
- compared levels must have opposite sign
- levels we aren't testing are 0
# We want to know what order R thinks our levels are in (alphabetical order)
levels(mydata$Treatment)
c1<-c(1,0,-1) #Control vs. Fence Control
c2<-c(0.5,-1,0.5) #(Control + Fence Control) vs. Exclusion
#You MUST make sure that these are orthogonal. The coefficients within each contrast must sum to zero.
#And the products of corresponding coefficients across the two contrasts must also sum to zero:
(1*1/2) + (0*-1) + (-1*1/2) # = 0
#And now let's combine these two vectors together into a matrix:
contrastmatrix<-cbind(c1,c2)
#And now we're going to attach this contrast matrix to the dataset
contrasts(mydata$Treatment)<- contrastmatrix
#Now run the two contrasts like this:
summary(model1, split=list(Treatment=list("Effect of Fence"=1, "Effect of Rodents"=2)))
- Bonferroni adjustment if you cannot make coefficients orthogonal
- Bonferroni adjusted pairwise tests (e.g. t-tests)
- Not as powerful as Tukey HSD
- Do multiple t-tests for each pair using a conservative correction:
- divide the desired level of Type I error by the number of comparisons
- new critical P-value = alpha (0.05) / c (number of comparisons)
c3<-c(0, -1, 1) #this is not orthogonal, so if I Bonferroni-adjust, it means I need p<0.025 to consider it significant
# Bonferroni adjustment: new critical p value = 0.05 / 2 (only comparing 2 levels) = 0.025
contrastmatrix<-cbind(c3)
#And now we're going to attach this contrast matrix to the dataset
contrasts(mydata$Treatment)<- contrastmatrix
#Now run the two contrasts like this:
summary(model1, split=list(Treatment=list("Better Test of Rodents"=1)))
#To do a post-hoc Tukey test
TukeyHSD(model2) # requires aov() instead of lm() for model function
#Could also use HSD.test (gives you letters for different groups which is convenient)
library(agricolae)
HSD.test(model2, "Nitrogen", console=TRUE)
#And let's create a bar plot of this data
# One-Way ANOVA
#I'm going to just get the estimated marginal means for the graph
library(emmeans)
graphdata<-as.data.frame(emmeans(model2, "Nitrogen"))
# or
# here's another way to get Estimated Marginal Means for every level, while also doing a Tukey test
emmeans(model3log, pairwise ~ Nitrogen, adjust="tukey")
#I'm going to append a new column to graph data with the Tukey results in order
graphdata$tukey<-c("a","b","c","d")
# Nested ANOVA
# If we want to get the marginal means for Genotype, we need to make it a fixed effect first
model2<-lmer(Calyxarea~(Genotype) + (1|Ramet:Genotype), data=mydata)
graphdata<-as.data.frame(emmeans(model2, ~Genotype))
#keeping Ramet as a nested random effect will make sure the average of each ramet is used as a replicate
# Two-Way ANOVA
graphdata<-as.data.frame(emmeans(model1, ~Temperature*CO2))
# Repeated Measures ANOVA
graphdata<-as.data.frame(emmeans(model1, ~Temperature:day)) #ignoring the random factor
#I just want to see what these means look like independent of one another
emmeans(model1, ~disease)
emmeans(model1, ~parasite)
library(ggplot2)
#GGplot plots the levels in alphabetical order. So instead I'm going to first
#reorder the temperature factor levels to the order that I want:
graphdata$temperature<- factor(graphdata$temperature, levels = c("ambient", "warm", "hot"))
# One-Way ANOVA
ggplot(data=graphdata, aes(x=Nitrogen, y=emmean, fill=Nitrogen)) +
theme_bw()+
theme(axis.text.x=element_text(face="bold", color="black", size=16), axis.text.y=element_text(face="bold", color="black", size=13), axis.title.x = element_text(color="black", size=18, face="bold"), axis.title.y = element_text(color="black", size=18, face="bold"),panel.grid.major=element_blank(), panel.grid.minor=element_blank()) +
geom_bar(colour="black", fill="DodgerBlue", width=0.5, stat="identity") +
guides(fill=FALSE) +
ylab("log Root:Shoot Ratio") +
xlab("Soil Nitrogen") +
scale_x_discrete(labels=c("Low" = "Low", "Ambient" = "Ambient", "1.5N" = "1.5 N", "2N" = "2 N")) + # old level = new label
geom_errorbar(aes(ymax=emmean +SE, ymin=emmean - SE), stat="identity", position=position_dodge(width=0.9), width=0.1)+
geom_text(aes(label=tukey), position = position_dodge(width=0.9), vjust = c(2,3.5,-2,-3.5))+ # add significance letters and adjust location on each bar
# geom_text(aes(label=c("a","b","c","d"), position = position_dodge(width=0.9), vjust = c(2,3.5,-2,-3.5))+
geom_hline(yintercept=0)
# Two-Way ANOVA
ggplot(data=graphdata, aes(x=Temperature, y=emmean, fill=factor(CO2), group=factor(CO2))) +
theme_bw() +
theme(axis.text.x=element_text(face="bold", color="black", size=16), axis.text.y=element_text(face="bold", color="black", size=13), axis.title.x = element_text(color="black", size=18, face="bold"), axis.title.y = element_text(color="black", size=18, face="bold"),panel.grid.major=element_blank(), panel.grid.minor=element_blank()) +
geom_bar(colour="black", width=0.5, stat="identity", position="dodge", aes(fill=CO2)) +
labs(x="Temperature (*C)", y="Growth of Basal Area (mm2)", fill="CO2") + #labels the x and y axes
geom_errorbar(aes(ymax=emmean+SE, ymin=emmean-SE), stat="identity", position=position_dodge(width=0.5), width=0.1)
# ANCOVA
#To plot the data:
predThick<-predict(model3) #Gets the predicted values from the regression lines in the ANCOVA
graphdata<-cbind(mydata, predThick) #attaches those predictions to the dataset
ggplot(data=graphdata, aes(lnThick, lnBreak, color=species)) +
theme_bw()+
theme(legend.title=element_text(colour="black", size=14), axis.text.x=element_text(face="bold", color="black", size=16), axis.text.y=element_text(face="bold", color="black", size=13), axis.title.x = element_text(color="black", size=18, face="bold"), axis.title.y = element_text(color="black", size=18, face="bold"),panel.grid.major=element_blank(), panel.grid.minor=element_blank()) +
geom_point() + geom_line(aes(y=predThick)) +
labs(x="ln Stipe Thickness", y="ln Break Force", fill="species")
Nested ANOVA
- to get variance components, you must have everything as random factors
- don't use AIC to tell you whether your factors are random or fixed
- Ramets were random effects and nested within genotypes
- Variance of residuals: % of variation not explained by our model; due to variation among measurements within each Ramet
- often the first thing people do when working with a nested effect is check the variance components (depends on what question you're trying to answer)
- by putting ramets in as a nested effect in the model, we're telling the model and functions to take the mean of the ramet values so we don't pseudoreplicate (emmeans)
library(tidyverse)
library(car) # qqp() and Anova()
library(emmeans) # post-hoc test emmeans()
library(agricolae) # HSD.test()
library(lme4) # for testing random effects
library(lmerTest) #Need this to get anova results from lmer
library(MASS)
#Two Factor ANOVA: main effects of Location and Tidal Height
model1<-lm(Length~Location + TidalHeight, data=mydata)
plot(model1) # check assumptions
anova(model1)
#Two Factor ANOVA: main effects of Location and Tidal Height with their interaction
model2<-lm(Length~Location + TidalHeight + Location:TidalHeight, data=mydata)
#or a shortcut that includes both factors plus the interaction:
model2<-lm(Length~Location*TidalHeight, data=mydata)
plot(model2) # check assumptions
anova(model2)
#gather my summary data using emmeans
graphdata<-as.data.frame(emmeans(model2, ~ Location*TidalHeight))
graphdata
#graph
ggplot(graphdata, aes(x=Location, y=emmean, fill=factor(TidalHeight), group=factor(TidalHeight))) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
geom_bar(stat="identity", position="dodge", size=0.6) +
geom_errorbar(aes(ymax=emmean+SE, ymin=emmean-SE), stat="identity", position=position_dodge(width=0.9), width=0.1) +
labs(x="Location", y="Snail Length", fill="TidalHeight") +
scale_fill_manual(values=c("Low"="tomato","High"="dodgerblue2")) #fill colors for the bars # fill factor identities
Plots and Graphs
- Scatterplot
plot(cryduration~IQ, data=mydata)
- Plot with Confidence intervals
library(ggplot2)
ggplot(mydata, aes(x=lnHeight, y=lnSA))+
theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank()) +
geom_point(shape=1) +
guides(fill=FALSE) +
ylab("ln Surface Area") +
xlab("ln Height") +
geom_smooth(method="lm")