Chapter 5 Use case 3: joint_model() with multiple traditional gear types

This third use case will show how to fit and interpret the joint model with paired eDNA and traditional survey data when multiple traditional gear types have been used. These different gear types may have different expected catch rates, μ, represented by gear-specific scaling coefficients q.



The data used in this example comes from a study by Keller et al. (2022) about invasive European green crab (Carcinus maenas) in Washington state. Environmental DNA samples were collected at 20 sites, along with paired baited trap sampling. The eDNA data is detection/non-detection data generated through quantitative polymerase chain reaction (qPCR).

5.1 Prepare the data

Similar to the goby data, the green crab data is still a list of matrices. Now, instead of data on site-level covariates, site_cov, there is data representing the gear type for each of the traditional samples, count_type.

library(eDNAjoint)
data(green_crab_data)
names(green_crab_data)
## [1] "pcr_n"      "pcr_k"      "count"      "count_type"

Again, all matrices should have the same number of rows (n=20), and rows across all four matrices should correspond to the same sites.

Let’s look at the count. This data is from baited trap sampling for green crab. Each integer refers to the catch of each trap (i.e., catch per unit effort, when effort = 1). The rows correspond to sites, and the columns refer to the replicated trap samples (secondary sample units) at each site, with a maximum of 420 samples.

dim(green_crab_data$count)
## [1]  20 420

Blank spaces are filled with NA at sites where fewer trap samples were collected than the maximum. In this example, the count data are integers, but continuous values can be used in the model (see Eq. 1.3 in the model description).

green_crab_data$count[1:6, 1:20]
##      1 2 3 4 5 6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
## [1,] 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [2,] 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [3,] 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [4,] 1 0 1 2 0 0  0  0  0  0  0  0  0  0  0  2  0  0  0  1
## [5,] 0 0 0 0 0 0  0  0 NA NA NA NA NA NA NA NA NA NA NA NA
## [6,] 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Next, let’s look at count_type, which consists of integer indicators of gear type for each trap sample. Here, 1 refers to the Fukui gear type, and 2 refers to the Minnow gear type.

green_crab_data$count_type[1:6, 1:20]
##      1 2 3 4 5 6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
## [1,] 1 1 1 1 2 1  2  2  2  2  2  2  1  1  1  1  2  1  1  2
## [2,] 2 2 2 2 2 2  2  2  2  1  1  1  1  1  1  1  2  2  2  2
## [3,] 1 1 1 2 1 2  1  1  1  2  1  2  2  1  2  2  1  2  1  2
## [4,] 2 2 2 2 2 2  2  2  1  2  2  2  2  2  2  2  2  2  2  2
## [5,] 1 2 1 2 1 2  1  2 NA NA NA NA NA NA NA NA NA NA NA NA
## [6,] 2 1 2 1 2 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Note that the locations of the NAs in this matrix match count.

For more data formatting guidance, see section 2.1.1.

5.2 Fit the model

Now that we understand our data, let’s fit the joint model. The key arguments of this function include:

  1. data: list of pcr_k, pcr_n, count, and count_type matrices
  2. cov: no site-level covariates are included in this model
  3. family: probability distribution used to model the trap count data. A negative binomial distribution is chosen here.
  4. p10_priors: Beta distribution parameters for the prior on the probability of false positive eDNA detection, p10. c(1, 20) is the default specification.
  5. q: logical value indicating the presence of multiple traditional gear types.

More parameters exist to further customize the MCMC sampling, but we’ll stick with the defaults.

# run the joint model with gear scaling coefficients
greencrab_fit_q_negbin <- joint_model(data = green_crab_data, cov = NULL,
                                      family = "negbin", p10_priors = c(1, 20),
                                      q = TRUE)

5.3 Model selection

We previously made a choice to model the green crab count data with a negative binomial distribution. Perhaps we want to test how that model specification compares to a model specification where count data is modeled with a poisson distribution.

# run the joint model with poisson distribution
greencrab_fit_q_pois <- joint_model(data = green_crab_data, cov = NULL,
                                    family = "poisson", p10_priors = c(1, 20),
                                    q = TRUE)

Let’s also fit some models where we assume that both gear types have the same catchability. We set q = FALSE to not estimate gear scaling coefficients.

# run the joint model with four covariates
greencrab_fit_negbin <- joint_model(data = green_crab_data, cov = NULL,
                                    family = "negbin", p10_priors = c(1, 20),
                                    q = FALSE)
greencrab_fit_pois <- joint_model(data = green_crab_data, cov = NULL,
                                  family = "poisson", p10_priors = c(1, 20),
                                  q = FALSE)

Now let’s perform model selection using leave-one-out cross validation.

# perform model selection
joint_select(
  model_fits = list(
    # include gear scaling coefficient, model count data with negative binomial
    greencrab_fit_q_negbin$model,
    # include gear scaling coefficient, model count data with poisson
    greencrab_fit_q_pois$model,
    # include gear scaling coefficient, model count data with negative binomial
    greencrab_fit_negbin$model,
    # include gear scaling coefficient, model count data with poisson
    greencrab_fit_pois$model
  )
)
##        elpd_diff se_diff
## model3    0.0       0.0 
## model1   -1.0       1.9 
## model4 -165.8      38.4 
## model2 -168.5      37.7

These results tell us that models one and three (models with and without gear scaling coefficients for the gear types) that use a negative binomial distribution for count data have similar Bayesian LOO estimates of the expected log pointwise predictive density (elpd_loo). Notably, a negative binomial distribution represents the data-generating process for our count data much better than a poisson distribution.

5.4 Interpret the output

5.4.1 Summarize posterior distributions

For the sake of illustration, let’s interpret the results of the model fit with gear scaling coefficients. Use joint_summarize() to see the posterior summaries of the model parameters.

joint_summarize(greencrab_fit_q_negbin$model, par = c("p10", "alpha", "q"))
##           mean se_mean    sd  2.5% 97.5%     n_eff Rhat
## p10      0.018   0.000 0.010 0.005 0.042 10531.838    1
## alpha[1] 1.262   0.003 0.244 0.790 1.739  8570.529    1
## q[1]     0.792   0.001 0.102 0.608 1.010  6997.257    1

This summarizes the mean, standard deviation (sd), and quantiles of the posterior estimates of p10, α, and q, as well as the effective sample size (n_eff) and R^ (Rhat) for the parameters. More informative about effective sample size and R^ can be found in the algorithm convergence section, but briefly, the R^ value is the frequently used statistic for assessing model convergence. We typically want R^ to be less than 1.05, so it looks like our model converged.

The mean estimated probability of a false positive eDNA detection is ~0.01, and the 2.5% and 97.5% quantiles show the bounds of the 95% credibility interval (the equal tailed credibility interval, to be specific.)

q[1] represents the gear scaling coefficient of gear type 2, which scales the catch rate of gear type 2 relative to gear type 1. α are regression coefficients that scale the sensitivity of eDNA relative to traditional sampling. Since there were no site-level covariates in this model, α is just a scalar, intercept term (see Eq. 4 in model description).

The parameter βi represents the site-specific sensitivity of eDNA relative to traditional sampling. We use i to index the sites. Since we don’t have any site-level covariates here, β is equal at all sites and equal to α1. For example, here are the first few βi:

head(joint_summarize(greencrab_fit_q_negbin$model, par = c("alpha", "beta")))
##           mean se_mean    sd 2.5% 97.5%    n_eff Rhat
## alpha[1] 1.262   0.003 0.244 0.79 1.739 8570.529    1
## beta[1]  1.262   0.003 0.244 0.79 1.739 8570.529    1
## beta[2]  1.262   0.003 0.244 0.79 1.739 8570.529    1
## beta[3]  1.262   0.003 0.244 0.79 1.739 8570.529    1
## beta[4]  1.262   0.003 0.244 0.79 1.739 8570.529    1
## beta[5]  1.262   0.003 0.244 0.79 1.739 8570.529    1

Now let’s look at the summary of mu.

joint_summarize(greencrab_fit_q_negbin$model, par = "mu")
##           mean se_mean    sd  2.5%  97.5%     n_eff Rhat
## mu[1,1]  0.106   0.000 0.027 0.060  0.165 10910.923    1
## mu[1,2]  0.084   0.000 0.022 0.046  0.132 11073.886    1
## mu[2,1]  0.032   0.000 0.033 0.001  0.123 12919.391    1
## mu[2,2]  0.025   0.000 0.026 0.001  0.096 13439.183    1
## mu[3,1]  0.017   0.000 0.018 0.000  0.065 13409.769    1
## mu[3,2]  0.013   0.000 0.014 0.000  0.051 13261.616    1
## mu[4,1]  0.679   0.001 0.107 0.492  0.910 10470.256    1
## mu[4,2]  0.534   0.001 0.089 0.379  0.729 11948.888    1
## mu[5,1]  0.101   0.001 0.114 0.002  0.413 12646.420    1
## mu[5,2]  0.079   0.001 0.089 0.002  0.324 12616.246    1
## mu[6,1]  0.117   0.001 0.127 0.003  0.470 12628.696    1
## mu[6,2]  0.092   0.001 0.100 0.002  0.369 12549.002    1
## mu[7,1]  0.013   0.000 0.013 0.000  0.048 13265.966    1
## mu[7,2]  0.010   0.000 0.010 0.000  0.037 13416.008    1
## mu[8,1]  0.302   0.003 0.278 0.013  1.027 11911.071    1
## mu[8,2]  0.238   0.002 0.219 0.010  0.810 12215.314    1
## mu[9,1]  0.034   0.000 0.033 0.001  0.121 13313.703    1
## mu[9,2]  0.026   0.000 0.026 0.001  0.095 13649.484    1
## mu[10,1] 1.049   0.002 0.247 0.646  1.615  9992.163    1
## mu[10,2] 0.824   0.002 0.192 0.510  1.258 12047.877    1
## mu[11,1] 0.301   0.002 0.281 0.012  1.040 12946.747    1
## mu[11,2] 0.237   0.002 0.224 0.010  0.823 13037.693    1
## mu[12,1] 0.021   0.000 0.022 0.000  0.078 12480.435    1
## mu[12,2] 0.017   0.000 0.017 0.000  0.062 11892.167    1
## mu[13,1] 7.685   0.013 1.314 5.496 10.632  9510.879    1
## mu[13,2] 6.031   0.008 0.991 4.372  8.224 13642.773    1
## mu[14,1] 0.119   0.000 0.020 0.083  0.162 10535.911    1
## mu[14,2] 0.094   0.000 0.016 0.066  0.127 12291.448    1
## mu[15,1] 0.769   0.005 0.559 0.112  2.196 11492.684    1
## mu[15,2] 0.605   0.004 0.440 0.087  1.729 11917.613    1
## mu[16,1] 3.807   0.007 0.652 2.712  5.253  8551.255    1
## mu[16,2] 2.984   0.004 0.463 2.174  4.008 14196.258    1
## mu[17,1] 0.161   0.002 0.177 0.004  0.634 12575.938    1
## mu[17,2] 0.127   0.001 0.140 0.003  0.510 13117.881    1
## mu[18,1] 3.333   0.011 1.093 1.759  5.940  9445.817    1
## mu[18,2] 2.617   0.008 0.848 1.389  4.681 10815.693    1
## mu[19,1] 3.952   0.007 0.729 2.768  5.639  9474.595    1
## mu[19,2] 3.106   0.005 0.581 2.164  4.423 12343.091    1
## mu[20,1] 0.119   0.001 0.069 0.027  0.289 14186.960    1
## mu[20,2] 0.093   0.000 0.054 0.021  0.227 14194.034    1

mu[1,1] corresponds to the expected catch rate at site 1 with gear type 1. mu[1,2] corresponds to the expected catch rate at site 1 with gear type 2. mu[2,1] corresponds to the expected catch rate at site 2 with gear type 1.

We can also use functions from the bayesplot package to examine the posterior distributions and chain convergence.

First let’s look at the posterior distribution for p10.

library(bayesplot)

# plot posterior distribution, highlighting median and 80% credibility interval
mcmc_areas(as.matrix(greencrab_fit_q_negbin$model), pars = "p10", prob = 0.8)

Next let’s look at chain convergence for p10 and α1.

# this will plot the MCMC chains for p10 and alpha
mcmc_trace(rstan::extract(greencrab_fit_q_negbin$model, permuted = FALSE),
           pars = c("p10", "alpha[1]"))

These trace plots show that the algorithm has converged. The chains are overlapping and stationary (i.e., are moving around the same mean and have a constant variance). See more about trace plots in the algorithm convergence section.

5.4.2 Effort necessary to detect presence

To further highlight these results, we can use detection_calculate() to find the units of survey effort necessary to detect presence of the species. This function is finding the median number of survey units necessary to detect species presence if the expected catch rate, μ is 0.1, 0.5, or 1. μ now represents the expected catch rate of gear type 1.

detection_calculate(greencrab_fit_q_negbin$model, mu = c(0.1, 0.5, 1),
                    probability = 0.9)
##       mu n_traditional_1 n_traditional_2 n_eDNA
## [1,] 0.1              25              31     28
## [2,] 0.5               6               8      6
## [3,] 1.0               4               5      4

We can see that it takes 27 eDNA samples, 25 trap samples (gear type 1), and 31 trap samples (gear type 2) to detect green crab presence with 0.9 probability if the expected catch rate with gear type 1 is 0.1.

We can also plot these comparisons. mu_min and mu_max define the x-axis in the plot and represent the expected catch rate of gear type 1.

detection_plot(greencrab_fit_q_negbin$model, mu_min = 0.1,
               mu_max = 1, probability = 0.9)

5.4.3 Calculate μcritical

Now let’s calculate μcritical, which is the value of μ where the probability of a false positive eDNA detection equals the probability of a true positive eDNA detection.

mu_critical(greencrab_fit_q_negbin$model, cov_val = NULL, ci = 0.9)
##               gear_1     gear_2
## median   0.059355501 0.04701782
## lower_ci 0.009451541 0.00781090
## upper_ci 0.133214323 0.10589779

This function calculates μcritical using the entire posterior distributions of parameters from the model, and ‘HDI’ corresponds to the 90% credibility interval calculated using the highest density interval. The first column corresponds to μcritical if gear type 1 is used, and the second columns corresponds to μcritical if gear type 2 is used.

5.5 traditional_model()

In some circumstances, it may be helpful to model just the traditional survey data without eDNA data for comparison. Use traditional_model here, which requires the following parameters:

  1. data: list of count and (optionally) count_type matrices
  2. family: probability distribution used to model the trap count data. A negative binomial distribution is chosen here.
  3. q: logical value indicating the presence of multiple traditional gear types.

More parameters exist to further customize the MCMC sampling, but we’ll stick with the defaults.

# run the traditional model model with gear scaling coefficients
greencrab_traditional <- traditional_model(data = green_crab_data,
                                           family = "negbin", q = TRUE)

5.6 Initial values

By default, eDNAjoint will provide initial values for parameters estimated by the model, but you can provide your own initial values if you prefer. Here is an example of providing initial values for parameters, mu,p10, alpha, and q, as an input in joint_model().

# set number of chains
n_chain <- 4

# number of sites
nsites <- dim(green_crab_data$count)[1]

# number of gear types
ngear <- sum(!is.na(unique(as.vector(green_crab_data$count_type))))

# initial values should be a list of named lists
inits <- list()
for (i in 1:n_chain) {
  inits[[i]] <- list(
    # length should equal the number of sites for each chain
    mu = stats::runif(nsites, 0.01, 5),
    # length should equal 1 for each chain
    p10 = stats::runif(1, 0.0001, 0.08),
    # length should equal 1 for each chain,
    # since there are no site-level covariates
    alpha = stats::runif(1, 0.05, 0.2),
    # length should equal the number of gear types - 1
    q = stats::runif(ngear - 1, 0.5, 1.5)
  )
}
# now fit the model
fit_inits <- joint_model(data = green_crab_data,
                         q = TRUE, initial_values = inits,
                         multicore = TRUE)
# check to see the initial values that were used
fit_inits$inits
## $chain1
## $chain1$mu_trad
##  [1] 3.5993016 0.3711284 0.4749727 4.8282951 1.3102138 4.4676456 2.0706991 3.6319775 2.5466618 2.8652517
## [11] 3.6070533 2.5641322 3.0316232 4.0130035 0.4624270 4.6359910 1.7468696 1.4669216 2.4635462 1.3310425
## 
## $chain1$mu
##  [1] 3.5993016 0.3711284 0.4749727 4.8282951 1.3102138 4.4676456 2.0706991 3.6319775 2.5466618 2.8652517
## [11] 3.6070533 2.5641322 3.0316232 4.0130035 0.4624270 4.6359910 1.7468696 1.4669216 2.4635462 1.3310425
## 
## $chain1$log_p10
## [1] -3.255889
## 
## $chain1$alpha
## [1] 0.0636451
## 
## $chain1$q_trans
##   initial_values[[i]]$q
## 1              1.250539
## 
## $chain1$p_dna
## numeric(0)
## 
## $chain1$p11_dna
## numeric(0)
## 
## 
## $chain2
## $chain2$mu_trad
##  [1] 4.0339780 0.6914057 0.6578297 4.0141040 2.3883363 0.1383832 2.6047312 4.3586066 4.9878596 1.8331313
## [11] 4.9427130 4.5567498 1.3418741 4.0673328 3.3246091 3.9121833 2.4374282 3.0095397 3.6541699 1.2886826
## 
## $chain2$mu
##  [1] 4.0339780 0.6914057 0.6578297 4.0141040 2.3883363 0.1383832 2.6047312 4.3586066 4.9878596 1.8331313
## [11] 4.9427130 4.5567498 1.3418741 4.0673328 3.3246091 3.9121833 2.4374282 3.0095397 3.6541699 1.2886826
## 
## $chain2$log_p10
## [1] -2.585617
## 
## $chain2$alpha
## [1] 0.09681332
## 
## $chain2$q_trans
##   initial_values[[i]]$q
## 1             0.5415609
## 
## $chain2$p_dna
## numeric(0)
## 
## $chain2$p11_dna
## numeric(0)
## 
## 
## $chain3
## $chain3$mu_trad
##  [1] 0.9664933 3.1962245 0.2110647 4.1554488 3.6885941 4.4177556 0.7893869 2.7534623 0.8480531 4.9144696
## [11] 1.3467712 3.5011805 2.3130947 4.5969188 0.7569342 4.8850712 4.5684928 3.8771573 1.1270949 0.3006237
## 
## $chain3$mu
##  [1] 0.9664933 3.1962245 0.2110647 4.1554488 3.6885941 4.4177556 0.7893869 2.7534623 0.8480531 4.9144696
## [11] 1.3467712 3.5011805 2.3130947 4.5969188 0.7569342 4.8850712 4.5684928 3.8771573 1.1270949 0.3006237
## 
## $chain3$log_p10
## [1] -6.967624
## 
## $chain3$alpha
## [1] 0.1819627
## 
## $chain3$q_trans
##   initial_values[[i]]$q
## 1             0.8937312
## 
## $chain3$p_dna
## numeric(0)
## 
## $chain3$p11_dna
## numeric(0)
## 
## 
## $chain4
## $chain4$mu_trad
##  [1] 1.9939702 4.9083254 2.8738762 0.1094674 4.8678392 2.6997925 4.7045889 3.4571059 3.4573439 3.0630853
## [11] 4.1054891 3.7769878 4.4161367 2.6822065 3.1673480 3.5565983 3.0899655 3.7062565 0.6801803 2.7016387
## 
## $chain4$mu
##  [1] 1.9939702 4.9083254 2.8738762 0.1094674 4.8678392 2.6997925 4.7045889 3.4571059 3.4573439 3.0630853
## [11] 4.1054891 3.7769878 4.4161367 2.6822065 3.1673480 3.5565983 3.0899655 3.7062565 0.6801803 2.7016387
## 
## $chain4$log_p10
## [1] -2.97214
## 
## $chain4$alpha
## [1] 0.0695414
## 
## $chain4$q_trans
##   initial_values[[i]]$q
## 1             0.6966426
## 
## $chain4$p_dna
## numeric(0)
## 
## $chain4$p11_dna
## numeric(0)