Last updated: 2022-01-24
Checks: 7 passed, 0 failed
Knit directory: genes-to-foodweb-stability/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200205) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
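As a minimal illustration (separate from the analysis code), resetting the same seed reproduces the same pseudo-random draw:
set.seed(20200205)
sample(1:60, size = 5) # returns the same five values on every run that resets the seed first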
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 18d1722. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit; see the usage sketch after the status report below). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: code/.Rhistory
Ignored: output/.Rapp.history
Untracked files:
Untracked: output/all.mar1.brm.adj.rds
Untracked: output/all.mar1.brm.unadj.ar2.lag.rds
Untracked: output/all.mar1.brm.unadj.noBRBRonLYER.rds
Untracked: output/all.mar1.brm.unadj.rds
Untracked: output/all.mar1.brm.unadj.xAOP2.rds
Untracked: output/initial.mar1.brm.adj.rds
Untracked: output/initial.mar1.brm.unadj.rds
Unstaged changes:
Modified: figures/AOP2-growth-no-insects.pdf
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
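As a usage sketch (the commit message here is illustrative, not from this repository's history), wflow_publish commits the R Markdown source, rebuilds the HTML, and commits the result in one step; wflow_git_commit can be used for supporting scripts or data files that workflowr does not check automatically.
workflowr::wflow_publish("analysis/prep-timeseries.Rmd",
                         message = "Rebuild time-series preparation")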
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/prep-timeseries.Rmd) and HTML (docs/prep-timeseries.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 18d1722 | mabarbour | 2022-01-24 | Add mark-recapture and omnibus tests, fix data point error, update analyses, and edit readme |
html | c802852 | mabarbour | 2021-06-24 | Build website |
Rmd | a054471 | mabarbour | 2021-06-24 | Update location of .rds files to Release v2.1 |
Rmd | f467085 | mabarbour | 2021-06-24 | Revise analysis to address reviewer’s comments |
original_df <- read_csv("data/InsectAbundanceSurvival.csv")
We only use data from the last week of the experiment, for which we have counts of LYER and Ptoids (mummies, in fact) underneath the leaves.
# only retain cages where LYER or Parasitoids survived
# no surviving BRBR, so we can't estimate this
LYER_df_17 <- filter(original_df, Week == "17") %>% filter(LYER_Survival == 1) %>% dplyr::select(Cage, LYER)
Mummy_df_17 <- filter(original_df, Week == "17") %>% filter(Mummy_Ptoids_Survival == 1) %>% dplyr::select(Cage, Mummy_raw)
# underneath leaf counts, only available for last week of experiment
underneath_df <- read_csv("data/insect_abundance_data_2018-09-26_underneathleaf.csv") # counted to individual resolution
# merge with original data
LYER_df_17 <- left_join(LYER_df_17, dplyr::select(underneath_df, Cage, underneath_LYER = LYER))
Mummy_df_17 <- left_join(Mummy_df_17, dplyr::select(underneath_df, Cage, underneath_Mummy = Mummy))
## exploratory plots ----
# all data
x11(); ggplot(LYER_df_17, aes(x = log1p(LYER), y = log(underneath_LYER))) +
geom_point() + geom_smooth(method = "lm")
# remove outlier
x11(); ggplot(LYER_df_17[-31,], aes(x = log1p(LYER), y = log(underneath_LYER))) +
geom_point() + geom_smooth(method = "lm")
## GLM ----
# I dropped the outlier because I think it was a genuinely unusual data point that would bias our other estimates.
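# (which = 5 selects the residuals-vs-leverage diagnostic from plot.lm, used here to judge that point's influence)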
plot(MASS::glm.nb(underneath_LYER ~ log1p(LYER), LYER_df_17), which = 5)
summary(glm(underneath_LYER ~ log1p(LYER), LYER_df_17[-31,], family = poisson)) # residual deviance suggests a lot of overdispersion, so I'm going to use a negative binomial model
Call:
glm(formula = underneath_LYER ~ log1p(LYER), family = poisson,
data = LYER_df_17[-31, ])
Deviance Residuals:
Min 1Q Median 3Q Max
-6.9703 -3.1734 -0.9052 2.8304 7.9681
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.47991 0.05716 60.88 <2e-16 ***
log1p(LYER) 0.29613 0.01141 25.95 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1316.37 on 30 degrees of freedom
Residual deviance: 481.88 on 29 degrees of freedom
AIC: 683.74
Number of Fisher Scoring iterations: 5
LYER_underneath_glm.nb <- MASS::glm.nb(underneath_LYER ~ log1p(LYER), LYER_df_17[-31,])
summary(LYER_underneath_glm.nb)
Call:
MASS::glm.nb(formula = underneath_LYER ~ log1p(LYER), data = LYER_df_17[-31,
], init.theta = 5.709483328, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2762 -0.7531 -0.2120 0.6421 1.9575
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.41254 0.18541 18.405 < 2e-16 ***
log1p(LYER) 0.31152 0.04166 7.477 7.59e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(5.7095) family taken to be 1)
Null deviance: 80.035 on 30 degrees of freedom
Residual deviance: 32.974 on 29 degrees of freedom
AIC: 327.12
Number of Fisher Scoring iterations: 1
Theta: 5.71
Std. Err.: 1.57
2 x log-likelihood: -321.123
LYER_correction_glm.nb <- function(x) round(predict(LYER_underneath_glm.nb, newdata = data.frame(LYER = x), type = "response"), 0)
# will add between 30 and 286 individual LYER to LYER_t
LYER_correction_glm.nb(c(0,max(original_df$LYER, na.rm = T)))
1 2
30 286
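As a quick sanity check, the prediction at a zero above-rosette count is just the back-transformed intercept of the negative binomial model:
round(exp(coef(LYER_underneath_glm.nb)["(Intercept)"]), 0) # 30, matching the first value above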
## exploratory plot ----
x11(); ggplot(Mummy_df_17, aes(x = log1p(Mummy_raw), y = log(underneath_Mummy))) +
geom_point() + geom_smooth(method = "lm")
Warning: Removed 1 rows containing non-finite values (stat_smooth).
# the predicted increase is based on 1 data point
summary(Mummy_outlier_glm.nb <- MASS::glm.nb(underneath_Mummy ~ log1p(Mummy_raw), Mummy_df_17))
Call:
MASS::glm.nb(formula = underneath_Mummy ~ log1p(Mummy_raw), data = Mummy_df_17,
init.theta = 1.465286968, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9833 -1.0999 -0.3254 0.4295 1.1074
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.4214 0.3633 3.913 9.13e-05 ***
log1p(Mummy_raw) 1.5949 0.3934 4.054 5.03e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(1.4653) family taken to be 1)
Null deviance: 37.1182 on 7 degrees of freedom
Residual deviance: 9.2703 on 6 degrees of freedom
AIC: 53.379
Number of Fisher Scoring iterations: 1
Theta: 1.47
Std. Err.: 1.08
2 x log-likelihood: -47.379
# hmmm, this model predicts that there could be as many as 55,286 mummies underneath the leaves, which is clearly implausible
round(predict(Mummy_outlier_glm.nb, newdata = data.frame(Mummy_raw = c(0, max(original_df$Mummy_raw, na.rm = T))), type = "response"),0)
1 2
4 55286
# I think the baseline value, estimated from the cages with a zero above-rosette mummy count, is reasonable,
# but for the increase I'm going to use the slope for LYER, which seems reasonable given that this was the primary
# consumer-resource relationship and LYER was found underneath the leaves far more often than BRBR
summary(Mummy_underneath_glm.nb <- MASS::glm.nb(underneath_Mummy ~ 1, filter(Mummy_df_17, Mummy_raw == 0)))
Call:
MASS::glm.nb(formula = underneath_Mummy ~ 1, data = filter(Mummy_df_17,
Mummy_raw == 0), init.theta = 1.094073437, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8510 -0.9981 -0.5861 0.5828 0.9838
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.4214 0.4063 3.499 0.000468 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(1.0941) family taken to be 1)
Null deviance: 7.731 on 6 degrees of freedom
Residual deviance: 7.731 on 6 degrees of freedom
AIC: 39.452
Number of Fisher Scoring iterations: 1
Theta: 1.094
Std. Err.: 0.781
2 x log-likelihood: -35.452
round(exp(coef(Mummy_underneath_glm.nb)["(Intercept)"]),0) # predicts ~4 underneath the leaves when the above-rosette count is zero
(Intercept)
4
Mummy_correction_glm.nb <- function(x) round(exp(coef(Mummy_underneath_glm.nb)["(Intercept)"] + coef(LYER_underneath_glm.nb)["log1p(LYER)"]*log1p(x)),0)
Mummy_correction_glm.nb(c(0,max(original_df$Mummy_raw, na.rm = T))) # now that's more reasonable
[1] 4 26
Below, I estimate measurement error for BRBR, LYER, and Ptoid counts. This uses data for which we had overlapping counts from MAB (the first author) and DTV (a technician who was instrumental to the study).
# identify cage-week combinations with repeated measurements
rep_df <- original_df %>%
distinct(Cage, Week, Counter) %>%
group_by(Cage, Week) %>%
summarise(n = n()) %>%
filter(n > 1)
# make BRBR measurement error (me) df
BRBR_me_df <- left_join(rep_df, dplyr::select(original_df, Cage, Week, Counter, BRBR, BRBR_Survival)) %>%
drop_na() %>%
filter(BRBR_Survival == 1) %>%
spread(key = Counter, value = BRBR) %>%
mutate(log1p_DTV = log1p(DTV),
log1p_MAB = log1p(MAB)) %>%
ungroup()
BRBR_me <- apply(select(BRBR_me_df, log1p_DTV, log1p_MAB), MARGIN = 1, FUN = sd)
hist(BRBR_me)
median(BRBR_me, na.rm = T)
[1] 0.2829706
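Since each cage-week has exactly two counts, the per-observation sd reduces to the counters' absolute disagreement divided by sqrt(2); a quick check with made-up counts (not from the data):
sd(c(log1p(10), log1p(14)))            # ~0.219
abs(log1p(10) - log1p(14)) / sqrt(2)   # identical value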
# make LYER measurement error (me) df
LYER_me_df <- left_join(rep_df, dplyr::select(original_df, Cage, Week, Counter, LYER, LYER_Survival)) %>%
drop_na() %>%
filter(LYER_Survival == 1) %>%
spread(key = Counter, value = LYER) %>%
mutate(log1p_DTV = log1p(DTV),
log1p_MAB = log1p(MAB),
abs.diff_log1p = log1p_DTV - log1p_MAB,
mean.diff_log1p = log1p_DTV - (log1p_DTV+log1p_MAB)/2) %>%
ungroup()
LYER_me <- apply(select(LYER_me_df, log1p_DTV, log1p_MAB), MARGIN = 1, FUN = sd)
hist(LYER_me)
median(LYER_me, na.rm = T)
[1] 0.2433557
# make Mummy_Ptoids measurement error (me) df
Mummy_Ptoids_me_df <- left_join(rep_df, dplyr::select(original_df, Cage, Week, Counter, Mummy_Ptoids, Mummy_Ptoids_Survival)) %>%
drop_na() %>%
filter(Mummy_Ptoids_Survival == 1) %>%
spread(key = Counter, value = Mummy_Ptoids) %>%
mutate(log1p_DTV = log1p(DTV),
log1p_MAB = log1p(MAB)) %>%
ungroup()
Mummy_Ptoids_me <- apply(select(Mummy_Ptoids_me_df, log1p_DTV, log1p_MAB), MARGIN = 1, FUN = sd)
hist(Mummy_Ptoids_me)
median(Mummy_Ptoids_me, na.rm = T)
[1] 0.1323786
# helper function: average counts across counters and truncate to an integer
mean_integer <- function(x) as.integer(mean(x, na.rm = TRUE)) # remove NAs; otherwise the averaged count is lost whenever only one counter counted
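# quick illustration with made-up counts: na.rm = TRUE keeps a lone counter's value,
# whereas mean() without na.rm would propagate the NA
# mean_integer(c(12, NA))        # 12
# as.integer(mean(c(12, NA)))    # NA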
# make timeseries dataset
timeseries_df <- original_df %>%
select(Cage, Week, temp = Temperature, rich = Richness, com = Composition, Col, gsm1, AOP2, AOP2.gsoh, BRBR_Survival, LYER_Survival, Mummy_Ptoids_Survival, BRBR_t = BRBR, LYER_t = LYER, Ptoid_t = Mummy_Ptoids, Mummy_t_raw = Mummy_raw) %>%
# applying adjustments before averaging across counters
mutate(LYER_t_adj = ifelse(LYER_Survival == 0 | is.na(LYER_Survival) == T, 1, # replace extinction zeros with value of 1
ifelse(LYER_Survival == 1, LYER_t + LYER_correction_glm.nb(LYER_t), LYER_t)), # add adjusted values
# Ptoid should always be zero for Week 1 and 2, as they were added after sampling aphids in week 2. But we set them to 1 for extrapolating predictions
Ptoid_t_adj = ifelse(Mummy_Ptoids_Survival == 0 | is.na(Mummy_Ptoids_Survival) == T, 1, # replace extinction zeros with value of 1
ifelse(Mummy_Ptoids_Survival == 1, Ptoid_t + Mummy_correction_glm.nb(Mummy_t_raw), Ptoid_t))) %>% # add adjusted values
# apply adjustments for log-log analysis, using minimum count resolutions
# note that we NEVER used extinct species in fitting the multivariate model, only for extrapolating predictions from multivariate model to extant species
mutate(BRBR_t = ifelse(BRBR_Survival == 0 | is.na(BRBR_Survival) == T, 1, # replace extinction zeros with value of 1
ifelse(BRBR_t == 0 & BRBR_Survival == 1 & Week < 2, 1, # replaced undetected zeros before week 2 with value of 1
ifelse(BRBR_t == 0 & BRBR_Survival == 1 & Week > 1, 5, BRBR_t))), # replace undetected zeros with value of 5
LYER_t = ifelse(LYER_Survival == 0 | is.na(LYER_Survival) == T, 1, # replace extinction zeros with value of 1
ifelse(LYER_t == 0 & LYER_Survival == 1 & Week < 2, 1, # replaced undetected zeros before week 2 with value of 1
ifelse(LYER_t == 0 & LYER_Survival == 1 & Week > 1, 5, LYER_t))), # replaced undetected zeros after week 1 with value of 5
Ptoid_t = ifelse(Mummy_Ptoids_Survival == 0 | is.na(Mummy_Ptoids_Survival) == T, 1, # replace extinction zeros with value of 1
ifelse(Ptoid_t == 0 & Mummy_Ptoids_Survival == 1, 1, Ptoid_t))) %>% # replaced undetected zeros with value of 1
group_by(Cage, Week, temp, rich, com, Col, gsm1, AOP2, AOP2.gsoh) %>%
# average across counters
summarise_all(~ mean_integer(.)) %>% # funs(mean_integer)
ungroup()
# initial additions of aphids (4 individuals)
init_timeseries_df <- timeseries_df %>%
distinct(Cage, temp, rich, com, Col, gsm1, AOP2, AOP2.gsoh) %>%
# set Ptoid values to 1 for extrapolating predictions
mutate(Week = 0, BRBR_Survival = 1, LYER_Survival = 1, Mummy_Ptoids_Survival = NA,
BRBR_t = 4, LYER_t = 4, Ptoid_t = 1, Mummy_t_raw = NA, # set Mummy_t to be consistent with its raw classification in the data
LYER_t_adj = 4, Ptoid_t_adj = 1)
init_timeseries_df
# A tibble: 60 × 18
Cage temp rich com Col gsm1 AOP2 AOP2.gsoh Week BRBR_Survival
<dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 20 C 2 AOP2_AOP2.… 0 0 1 1 0 1
2 2 20 C 4 Poly 1 1 1 1 0 1
3 3 20 C 2 gsm1_AOP2.… 0 1 0 1 0 1
4 4 20 C 1 AOP2 0 0 1 0 0 1
5 5 20 C 1 gsm1 0 1 0 0 0 1
6 6 20 C 2 Col_AOP2 1 0 1 0 0 1
7 7 20 C 4 Poly 1 1 1 1 0 1
8 8 20 C 2 Col_AOP2.g… 1 0 0 1 0 1
9 9 20 C 2 Col_gsm1 1 1 0 0 0 1
10 10 20 C 2 gsm1_AOP2.… 0 1 0 1 0 1
# … with 50 more rows, and 8 more variables: LYER_Survival <dbl>,
# Mummy_Ptoids_Survival <lgl>, BRBR_t <dbl>, LYER_t <dbl>, Ptoid_t <dbl>,
# Mummy_t_raw <lgl>, LYER_t_adj <dbl>, Ptoid_t_adj <dbl>
# create lagged (t-1) and leading variable at time t+1
timeseries_df <- bind_rows(init_timeseries_df, timeseries_df) %>%
arrange(Cage, Week) %>%
group_by(Cage) %>%
# leading
mutate(BRBR_Survival_t1 = lead(BRBR_Survival, n = 1, order_by = Week),
BRBR_t1 = lead(BRBR_t, n = 1, order_by = Week),
LYER_Survival_t1 = lead(LYER_Survival, n = 1, order_by = Week),
LYER_t1 = lead(LYER_t, n = 1, order_by = Week),
LYER_t1_adj = lead(LYER_t_adj, n = 1, order_by = Week),
Mummy_Ptoids_Survival_t1 = lead(Mummy_Ptoids_Survival, n = 1, order_by = Week),
Ptoid_t1 = lead(Ptoid_t, n = 1, order_by = Week),
Ptoid_t1_adj = lead(Ptoid_t_adj, n = 1, order_by = Week)) %>%
# lagged. set default = 1 to include in extrapolations for log analysis
mutate(BRBR_t0 = lag(BRBR_t, n = 1, order_by = Week, default = 1),
LYER_t0 = lag(LYER_t, n = 1, order_by = Week, default = 1),
#LYER_t0_adj = lag(LYER_t_adj, n = 1, order_by = Week),
Ptoid_t0 = lag(Ptoid_t, n = 1, order_by = Week, default = 1))#,
#Ptoid_t0_adj = lag(Ptoid_t_adj, n = 1, order_by = Week))
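# toy illustration with made-up counts: lead() looks one week ahead, lag() one week back,
# and `default` fills the first lagged position
# dplyr::lead(c(4, 10, 25), n = 1)               # 10 25 NA
# dplyr::lag(c(4, 10, 25), n = 1, default = 1)   #  1  4 10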
# add two parasitoid females to experiment after counts on week 2 and 3
timeseries_df$Ptoid_t[which(timeseries_df$Week == 2)] <- 2
timeseries_df$Ptoid_t[which(timeseries_df$Week == 3)] <- timeseries_df$Ptoid_t[which(timeseries_df$Week == 3)] + 2
# repeat for adjusted
timeseries_df$Ptoid_t_adj[which(timeseries_df$Week == 2)] <- 2
timeseries_df$Ptoid_t_adj[which(timeseries_df$Week == 3)] <- timeseries_df$Ptoid_t_adj[which(timeseries_df$Week == 3)] + 2
# and for lagged values
timeseries_df$Ptoid_t0[which(timeseries_df$Week == 2)] <- 1 # actual zero, but setting to 1 for log() analysis later.
timeseries_df$Ptoid_t0[which(timeseries_df$Week == 3)] <- 2 # 2 parasitoids added in previous week
timeseries_df$Ptoid_t0[which(timeseries_df$Week == 4)] <- timeseries_df$Ptoid_t0[which(timeseries_df$Week == 4)] + 2 # adding 2 parasitoids to count from previous week (often zero)
# and for lagged adjusted values
#timeseries_df$Ptoid_t0_adj[which(timeseries_df$Week == 2)] <- 0 # maintaining at zero, because it truly is zero; will adjust by adding 1 for log() analysis later
#timeseries_df$Ptoid_t0_adj[which(timeseries_df$Week == 3)] <- 2 # 2 parasitoids added in previous week
#timeseries_df$Ptoid_t0_adj[which(timeseries_df$Week == 4)] <- timeseries_df$Ptoid_t0_adj[which(timeseries_df$Week == 4)] + 2 # adding 2 parasitoids to count from previous week (often zero)
# previously we ignored the initial increase of parasitoids from week 2 to week 3, since they appear as zeros in the dataset even though I added 2 individuals after the week 2 and week 3 counts. The lines above incorporate this information.
# note that Ptoid_t1 will not exactly match Ptoid_t, but that is intentional since those data are derived from counts
# adjust Mummy_Ptoids_Survival so it starts at Week 2 rather than Week 3
timeseries_df$Mummy_Ptoids_Survival[which(timeseries_df$Week == 2)] <- 1
# predictions for the next week are all NA at week 17, which makes sense because that's when the experiment ended, so we remove this week
filter(timeseries_df, Week == 17) %>%
select(BRBR_t1, LYER_t1, LYER_t1_adj, Ptoid_t1, Ptoid_t1_adj) %>%
data.frame()
Cage BRBR_t1 LYER_t1 LYER_t1_adj Ptoid_t1 Ptoid_t1_adj
1 1 NA NA NA NA NA
2 2 NA NA NA NA NA
3 3 NA NA NA NA NA
4 4 NA NA NA NA NA
5 5 NA NA NA NA NA
6 6 NA NA NA NA NA
7 7 NA NA NA NA NA
8 8 NA NA NA NA NA
9 9 NA NA NA NA NA
10 10 NA NA NA NA NA
11 11 NA NA NA NA NA
12 12 NA NA NA NA NA
13 13 NA NA NA NA NA
14 14 NA NA NA NA NA
15 15 NA NA NA NA NA
16 16 NA NA NA NA NA
17 17 NA NA NA NA NA
18 18 NA NA NA NA NA
19 19 NA NA NA NA NA
20 20 NA NA NA NA NA
21 21 NA NA NA NA NA
22 22 NA NA NA NA NA
23 23 NA NA NA NA NA
24 24 NA NA NA NA NA
25 25 NA NA NA NA NA
26 26 NA NA NA NA NA
27 27 NA NA NA NA NA
28 28 NA NA NA NA NA
29 29 NA NA NA NA NA
30 30 NA NA NA NA NA
31 31 NA NA NA NA NA
32 32 NA NA NA NA NA
33 33 NA NA NA NA NA
34 34 NA NA NA NA NA
35 35 NA NA NA NA NA
36 36 NA NA NA NA NA
37 37 NA NA NA NA NA
38 38 NA NA NA NA NA
39 39 NA NA NA NA NA
40 40 NA NA NA NA NA
41 41 NA NA NA NA NA
42 42 NA NA NA NA NA
43 43 NA NA NA NA NA
44 44 NA NA NA NA NA
45 45 NA NA NA NA NA
46 46 NA NA NA NA NA
47 47 NA NA NA NA NA
48 48 NA NA NA NA NA
49 49 NA NA NA NA NA
50 50 NA NA NA NA NA
51 51 NA NA NA NA NA
52 52 NA NA NA NA NA
53 53 NA NA NA NA NA
54 54 NA NA NA NA NA
55 55 NA NA NA NA NA
56 56 NA NA NA NA NA
57 57 NA NA NA NA NA
58 58 NA NA NA NA NA
59 59 NA NA NA NA NA
60 60 NA NA NA NA NA
timeseries_df <- filter(timeseries_df, Week != 17)
# still 17 weeks of data per cage because we include the initial Week = 0
dim(timeseries_df)[1] / length(unique(timeseries_df$Cage))
[1] 17
# add new predictors
timeseries_df <- timeseries_df %>%
mutate(temp = ifelse(temp == "20 C", 0, 3), # recode temperature as deg C above the 20 C baseline
aop2_vs_AOP2 = Col + gsm1 - AOP2 - AOP2.gsoh,
aop2_genotypes = Col + gsm1,
AOP2_genotypes = AOP2 + AOP2.gsoh)
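# e.g., using the design columns printed earlier: a Col_gsm1 cage gives aop2_vs_AOP2 = 1 + 1 - 0 - 0 = 2,
# an AOP2_AOP2.gsoh cage gives -2, and the four-genotype polyculture gives 1 + 1 - 1 - 1 = 0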
# load and manage plant biomass time-series data
tmp.biomass <- read_csv("data/ExperimentPlantBiomass.csv") %>%
select(Cage, Week, No_Plants, Biomass_g) # No_Plants not necessary because we're using Biomass_g as a covariate rather than a response
# note that there is no biomass measurement for the first week, because plants were left in for 2 weeks
# we interpolate week-1 biomass based on the plants' growth at week 2 of the experiment
# get biomass at t+1
get_biomass_t1 <- tmp.biomass %>%
mutate(Week = Week - 1) %>%
select(Cage, Week, Biomass_g_t1 = Biomass_g)
# merge biomass at time t and t+1
merge.biomass.df <- left_join(get_biomass_t1, tmp.biomass) %>%
# assuming a linear relationship in plant growth between weeks, since
# they had not hit the saturation point on their growth curve
mutate(Biomass_g = ifelse(Week == 1, Biomass_g_t1 - Biomass_g_t1/6, Biomass_g),
No_Plants = ifelse(Week == 1, 8, No_Plants)) %>%
arrange(Cage, Week)
# infer biomass data at week 0
biomass.0.df <- merge.biomass.df %>%
mutate(Week = Week - 1) %>%
select(Cage, Week, Biomass_g_t1 = Biomass_g) %>%
filter(Week == 0) %>%
mutate(Biomass_g = Biomass_g_t1 - Biomass_g_t1/5,
No_Plants = 8)
# merge week 0 and rest of biomass data
biomass.df <- bind_rows(filter(merge.biomass.df, Week > 0), biomass.0.df) %>%
arrange(Cage, Week)
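# worked example with a made-up value: if week-2 biomass is 6 g, week 1 is interpolated
# as 6 - 6/6 = 5 g and week 0 as 5 - 5/5 = 4 g, i.e. roughly linear growth of 1 g per
# week back from the first measurement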
timeseries_df <- left_join(timeseries_df, biomass.df) %>% # add plant biomass to timeseries data
# turn logged predictors into named variables for the measurement error model
mutate(logBRBR_t0 = log(BRBR_t0),
logBRBR_t = log(BRBR_t),
logLYER_t0 = log(LYER_t0),
logLYER_t = log(LYER_t),
logLYER_t_adj = log(LYER_t_adj),
logPtoid_t0 = log(Ptoid_t0),
logPtoid_t = log(Ptoid_t),
logPtoid_t_adj = log(Ptoid_t_adj)) %>%
# add measurement error: if the species was alive, use the median observed measurement error; if extinct, set it to a small value (0.01) so the measurement error model can still extrapolate
mutate(se_logBRBRt = ifelse(BRBR_Survival == 0 | is.na(BRBR_Survival) == T, 0.01,
ifelse(BRBR_Survival == 1, median(BRBR_me, na.rm = T), NA)),
se_logLYERt = ifelse(LYER_Survival == 0 | is.na(LYER_Survival) == T, 0.01,
ifelse(LYER_Survival == 1, median(LYER_me, na.rm = T), NA)),
se_logPtoidt = ifelse(Mummy_Ptoids_Survival == 0 | is.na(Mummy_Ptoids_Survival) == T, 0.01,
ifelse(Mummy_Ptoids_Survival == 1, median(Mummy_Ptoids_me, na.rm = T), NA)))
# filter first two weeks of experiment
aphids_only_df <- filter(timeseries_df, Week %in% c(0,1))
I only retain data for which all species had positive abundances at the previous time step, and only after all species had been added.
# filter data where all species survived to time t
full_df <- filter(timeseries_df, BRBR_Survival == 1, LYER_Survival == 1, Mummy_Ptoids_Survival == 1)
# tabulate variation in timeseries length for each cage
with(full_df, table(Week, Cage))
Cage
Week 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0
7 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0
8 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cage
Week 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1
5 1 1 1 0 0 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 0 1
6 0 1 1 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1
7 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cage
Week 54 55 56 57 58 59 60
2 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1
4 0 1 1 1 1 1 1
5 0 1 1 1 0 0 0
6 0 1 0 0 0 0 0
7 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0
# filter data where BRBR is absent, but LYER and Ptoid survived to time t
LP_df <- filter(timeseries_df, BRBR_Survival == 0 | is.na(BRBR_Survival) == TRUE, LYER_Survival == 1, Mummy_Ptoids_Survival == 1)
# tabulate variation in timeseries length for each cage
with(LP_df, table(Week, Cage))
Cage
Week 4 5 6 7 8 9 12 13 14 15 17 18 20 22 23 25 26 27 28 29 30 31 32 33 34 35 36
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1
6 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 1 1 1
7 1 0 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1
8 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 1
9 0 1 1 0 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 0 1
10 0 1 1 0 0 1 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1
11 0 1 1 0 0 1 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1
12 0 1 1 0 0 1 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1
13 0 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 0 0 0 1
14 0 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1
15 0 1 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cage
Week 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
4 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
5 0 1 0 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 1 1 1
6 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1
7 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 1 0
8 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 0 1 1 0
9 1 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 1 0 0 0 1 1 0
10 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 1 1 0 0 0 1 1 0
11 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0
12 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0
13 1 1 0 1 0 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0
14 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0
15 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0
16 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0
Includes cage-time combinations where only the aphid Lipaphis erysimi persisted.
# filter data so only LYER persists
L_df <- filter(timeseries_df, BRBR_Survival == 0 | is.na(BRBR_Survival) == TRUE, LYER_Survival == 1, Mummy_Ptoids_Survival == 0 | is.na(Mummy_Ptoids_Survival) == TRUE)
# tabulate variation in timeseries length for each cage
with(L_df, table(Week, Cage))
Cage
Week 4 5 7 9 14 17 20 22 23 30 31 32 33 34 35 38 39 40 41 43 46 48 49 52 59
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
8 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
9 1 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 0 1 1 0 0 0
10 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 1 0 1 1 0 0 0
11 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 1 0 1 1 0 1 0
12 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 1 1 1 1 0 1 1
13 1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 1 0 1 1 1 1 1 1 1
14 1 0 0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1
15 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
16 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Includes cage-time combinations where both aphids went extinct and the parasitoid Diaeretiella rapae survived for 1 to 2 more sampling periods.
# filter data where only ptoid was transiently present in absence of other species
P_df <- filter(timeseries_df, BRBR_Survival == 0 | is.na(BRBR_Survival) == TRUE, LYER_Survival == 0 | is.na(LYER_Survival) == TRUE, Mummy_Ptoids_Survival == 1)
# tabulate variation in timeseries length for each cage
with(P_df, table(Week, Cage))
Cage
Week 1 2 3 10 11 15 16 18 19 21 24 25 26 27 28 44 51 54 55 56 57 60
7 1 0 1 1 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1
8 0 0 0 0 0 0 1 0 1 0 1 1 1 1 1 0 0 0 1 0 1 0
9 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0
10 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
16 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# all looks good, no NA or zero values
summary(select(timeseries_df, BRBR_t0, BRBR_t, BRBR_t1))
Cage BRBR_t0 BRBR_t BRBR_t1
Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.:15.75 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 1.00
Median :30.50 Median : 1.00 Median : 1.00 Median : 1.00
Mean :30.50 Mean : 43.74 Mean : 43.74 Mean : 43.56
3rd Qu.:45.25 3rd Qu.: 21.00 3rd Qu.: 21.00 3rd Qu.: 21.00
Max. :60.00 Max. :835.00 Max. :835.00 Max. :835.00
summary(select(timeseries_df, LYER_t0, LYER_t, LYER_t1, LYER_t_adj, LYER_t1_adj))
Cage LYER_t0 LYER_t LYER_t1
Min. : 1.00 Min. : 1.0 Min. : 1.0 Min. : 1.0
1st Qu.:15.75 1st Qu.: 1.0 1st Qu.: 4.0 1st Qu.: 3.0
Median :30.50 Median : 10.0 Median : 12.5 Median : 17.5
Mean :30.50 Mean : 124.5 Mean : 126.6 Mean : 131.0
3rd Qu.:45.25 3rd Qu.: 155.0 3rd Qu.: 155.0 3rd Qu.: 170.0
Max. :60.00 Max. :1150.0 Max. :1150.0 Max. :1150.0
LYER_t_adj LYER_t1_adj
Min. : 1.0 Min. : 1
1st Qu.: 4.0 1st Qu.: 30
Median : 80.0 Median : 92
Mean : 211.9 Mean : 220
3rd Qu.: 301.0 3rd Qu.: 321
Max. :1423.0 Max. :1423
summary(select(timeseries_df, Ptoid_t0, Ptoid_t, Ptoid_t1, Ptoid_t_adj, Ptoid_t1_adj))
Cage Ptoid_t0 Ptoid_t Ptoid_t1
Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.:15.75 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 1.00
Median :30.50 Median : 1.00 Median : 1.00 Median : 1.00
Mean :30.50 Mean : 15.34 Mean : 15.34 Mean : 15.18
3rd Qu.:45.25 3rd Qu.: 11.00 3rd Qu.: 11.00 3rd Qu.: 11.25
Max. :60.00 Max. :423.00 Max. :423.00 Max. :423.00
Ptoid_t_adj Ptoid_t1_adj
Min. : 1.00 Min. : 1.00
1st Qu.: 1.00 1st Qu.: 1.00
Median : 4.00 Median : 4.00
Mean : 19.25 Mean : 19.12
3rd Qu.: 18.25 3rd Qu.: 18.25
Max. :449.00 Max. :449.00
# saved on Sep 20, 2021. Only slight modifications to LP_df, which don't affect any of the structural stability analyses; they only affect the Bayesian R2 values and Figs. S6-S7 (and barely at that)
# save(aphids_only_df, full_df, LP_df, L_df, P_df, file = "output/time-series-data.RData")
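# (objects saved this way can later be restored with load("output/time-series-data.RData"))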
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.37 cowplot_1.1.1 forcats_0.5.1 stringr_1.4.0
[5] dplyr_1.0.7 purrr_0.3.4 readr_2.1.1 tidyr_1.1.4
[9] tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1 MASS_7.3-54
[13] workflowr_1.6.2
loaded via a namespace (and not attached):
[1] httr_1.4.2 sass_0.4.0 bit64_4.0.5 vroom_1.5.7
[5] jsonlite_1.7.2 splines_4.1.2 modelr_0.1.8 bslib_0.3.1
[9] assertthat_0.2.1 highr_0.9 cellranger_1.1.0 yaml_2.2.1
[13] pillar_1.6.4 backports_1.4.1 lattice_0.20-45 glue_1.6.0
[17] digest_0.6.29 promises_1.2.0.1 rvest_1.0.2 colorspace_2.0-2
[21] htmltools_0.5.2 httpuv_1.6.5 Matrix_1.4-0 pkgconfig_2.0.3
[25] broom_0.7.11 haven_2.4.3 scales_1.1.1 whisker_0.4
[29] later_1.3.0 tzdb_0.2.0 git2r_0.28.0 mgcv_1.8-38
[33] farver_2.1.0 generics_0.1.1 ellipsis_0.3.2 withr_2.4.3
[37] cli_3.1.0 magrittr_2.0.1 crayon_1.4.2 readxl_1.3.1
[41] evaluate_0.14 fs_1.5.2 fansi_1.0.0 nlme_3.1-152
[45] xml2_1.3.3 tools_4.1.2 hms_1.1.1 lifecycle_1.0.1
[49] munsell_0.5.0 reprex_2.0.1 compiler_4.1.2 jquerylib_0.1.4
[53] rlang_0.4.12 grid_4.1.2 rstudioapi_0.13 labeling_0.4.2
[57] rmarkdown_2.11 gtable_0.3.0 DBI_1.1.2 R6_2.5.1
[61] lubridate_1.8.0 fastmap_1.1.0 bit_4.0.4 utf8_1.2.2
[65] rprojroot_2.0.2 stringi_1.7.3 parallel_4.1.2 Rcpp_1.0.7
[69] vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1 xfun_0.29