Chapter 6 Use case 4: `jointModel()` with semi-paired data

This final use case shows how to fit and interpret the joint model with semi-paired eDNA and traditional survey data.

In many cases, you may have some sites where only eDNA samples were collected, while other sites have paired eDNA and traditional samples (i.e., semi-paired data). We can fit the joint model to this semi-paired data and use the sites with paired data to make predictions about the sites with only un-paired eDNA data.

Let’s use the same tidewater goby data from uses cases 1 and 2, except let’s pretend we did not have traditional seine sampling at two of the 39 sites.

6.1 Simulate semi-paired data

Both eDNA and traditional survey data should have a hierarchical structure:

Sites (primary sample units) within a study area
eDNA and traditional samples (secondary sample units) collected from each site
eDNA subsamples (replicate observations) taken from each eDNA sample

First replace the traditional count data at sites 4 and 34 with NA to simulate a scenario where we did not have traditional samples at these sites.

library(eDNAjoint)
data(gobyData)

# create new dataset of semi-paired goby data
gobyData_semipaired <- gobyData
gobyData_semipaired$count[4,] <- NA
gobyData_semipaired$count[34,] <- NA

We can now see that we have no data at site 4:

gobyData_semipaired$count[4,]

##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
## NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

And site 34:

gobyData_semipaired$count[34,]

##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
## NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

It’s important to ensure that we replace the data with NA so that we have an empty row indicating missing data.

6.2 Prepare the data

Now let’s look at the format of the data. You can see that the data is a list of four matrices all with the same number of rows that represent each site (n=39). Across all matrices, rows in the data should correspond to the same sites (i.e., row one in all matrices corresponds to site one, and row 31 in all matrices corresponds to site 31).

qPCR.K, qPCR.N, and count are required for all implementations of jointModel(), and site.cov is optional and is used in use case 2.

Let’s look first at qPCR.K. These are the total number of positive qPCR detections for each site (row) and eDNA sample (column). The number of columns should equal the maximum number of eDNA samples collected at any site. Blank spaces are filled with NA at sites where fewer eDNA samples were collected than the maximum.

For example, at site one, 11 eDNA samples were collected, and at site five, 20 eDNA samples were collected.

head(gobyData_semipaired$qPCR.K)

##      1 2 3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
## [1,] 0 0 0  0  0  0  0  0  0  0  0 NA NA NA NA NA NA NA NA NA NA NA
## [2,] 0 0 0  0  0  0  0  0  0  0  0 NA NA NA NA NA NA NA NA NA NA NA
## [3,] 0 0 0  0  0  0  0  0  0  0  0 NA NA NA NA NA NA NA NA NA NA NA
## [4,] 0 6 6  4  6  5  4  6  5  3 NA NA NA NA NA NA NA NA NA NA NA NA
## [5,] 6 6 4  6  6  6  5  4  2  2  0  6  5  5  6  6  6  5  5  4 NA NA
## [6,] 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Now let’s look at qPCR.N. These are the total number of qPCR eDNA subsamples (replicate observations) collected for each site (row) and eDNA sample (column). In this data, six qPCR replicate observations were collected for each eDNA sample. Notice that the locations of the NAs in the matrix match qPCR.K.

head(gobyData_semipaired$qPCR.N)

##      1 2 3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
## [1,] 6 6 6  6  6  6  6  6  6  6  6 NA NA NA NA NA NA NA NA NA NA NA
## [2,] 6 6 6  6  6  6  6  6  6  6  6 NA NA NA NA NA NA NA NA NA NA NA
## [3,] 6 6 6  6  6  6  6  6  6  6  6 NA NA NA NA NA NA NA NA NA NA NA
## [4,] 6 6 6  6  6  6  6  6  6  6 NA NA NA NA NA NA NA NA NA NA NA NA
## [5,] 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6 NA NA
## [6,] 6 6 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Next, let’s look at count. This data is from seine sampling for tidewater gobies. Each integer refers to the catch of each seine sample (i.e., catch per unit effort, when effort = 1). Again, the rows correspond to sites, and the columns refer to replicated seine samples (secondary sample units) at each site, with a maximum of 22 samples. Blank spaces are filled with NA at sites where fewer seine samples were collected than the maximum, as well as at sites with no seine samples. Notice that the number of rows (i.e., number of sites) in the count data still equals the number of rows in the qPCR data, even though some of these sites have no count data.

dim(gobyData_semipaired$count)[1] == dim(gobyData_semipaired$qPCR.K)[1]

## [1] TRUE

For more data formatting guidance, see section 2.1.1.

6.3 Fit the model

Now that we understand our data, let’s fit the joint model. The key arguments of this function include:

data: list of qPCR.K, qPCR.N, and count matrices
family: probability distribution used to model the seine count data. A poisson distribution is chosen here.
p10priors: Beta distribution parameters for the prior on the probability of false positive eDNA detection, $p_{10}$. c(1,20) is the default specification. More on this later.
q: logical value indicating the presence of multiple traditional gear types. Here, we’re only using data from one traditional method.

More parameters exist to further customize the MCMC sampling, but we’ll stick with the defaults.

# run the joint model 
goby.fit.semi <- jointModel(data = gobyData_semipaired, family = 'poisson', 
                            p10priors = c(1,20), q=FALSE)

goby.fit.semi is a list containing:

model fit (goby.fit.semi$model) of the class ‘stanfit’ and can be accessed and interpreted using all functions in the rstan package.
initial values used for each chain in MCMC (goby.fit.semi$inits)

6.4 Interpret the output

6.4.1 Summarize posterior distributions

Let’s interpret goby.fit.semi. Use jointSummarize() to see the posterior summaries of the model parameters. Let’s look at the estimates of the expected catch rate at each site, $\mu$.

jointSummarize(goby.fit.semi$model, par = 'mu')

##           mean se_mean    sd    2.5%   97.5%    n_eff Rhat
## mu[1]    0.021   0.000 0.021   0.001   0.078 14169.92    1
## mu[2]    0.021   0.000 0.021   0.001   0.079 14839.58    1
## mu[3]    0.021   0.000 0.022   0.000   0.080 16158.98    1
## mu[4]    5.435   0.014 1.767   2.894   9.612 16029.53    1
## mu[5]    2.431   0.002 0.295   1.897   3.057 16191.08    1
## mu[6]    0.081   0.001 0.085   0.002   0.307 13810.17    1
## mu[7]    0.339   0.001 0.105   0.165   0.579 19116.71    1
## mu[8]    0.276   0.002 0.305   0.007   1.109 15092.18    1
## mu[9]    0.048   0.000 0.048   0.001   0.177 16823.79    1
## mu[10]   1.516   0.002 0.207   1.141   1.947 17108.83    1
## mu[11]  26.957   0.008 1.085  24.872  29.111 20751.52    1
## mu[12]   0.021   0.000 0.021   0.001   0.077 16743.22    1
## mu[13]   0.021   0.000 0.021   0.000   0.079 15696.63    1
## mu[14]   0.033   0.000 0.033   0.001   0.121 15917.98    1
## mu[15]   0.102   0.000 0.062   0.019   0.254 16244.43    1
## mu[16]   0.066   0.000 0.049   0.006   0.189 16123.91    1
## mu[17]   0.026   0.000 0.026   0.001   0.097 15807.08    1
## mu[18]   0.019   0.000 0.019   0.000   0.072 17119.49    1
## mu[19]   0.048   0.000 0.049   0.001   0.183 14610.46    1
## mu[20]   0.224   0.001 0.081   0.095   0.410 20860.72    1
## mu[21]   0.081   0.001 0.083   0.002   0.306 15697.69    1
## mu[22]   0.254   0.001 0.114   0.087   0.528 18800.63    1
## mu[23]  90.565   0.021 3.117  84.648  96.754 22667.21    1
## mu[24]   1.052   0.002 0.212   0.685   1.510 18046.54    1
## mu[25]  21.039   0.010 1.629  17.918  24.362 24528.63    1
## mu[26]   1.287   0.002 0.218   0.910   1.755 19215.62    1
## mu[27]   2.719   0.003 0.456   1.908   3.685 19551.71    1
## mu[28]   8.151   0.005 0.757   6.749   9.713 20893.12    1
## mu[29]   3.227   0.003 0.462   2.396   4.208 18148.44    1
## mu[30]  13.687   0.014 2.134   9.874  18.181 24813.12    1
## mu[31]  94.518   0.029 4.347  86.192 103.218 21759.18    1
## mu[32]   0.023   0.000 0.023   0.001   0.085 15312.38    1
## mu[33]   0.082   0.001 0.086   0.002   0.316 14540.62    1
## mu[34]   0.261   0.001 0.143   0.067   0.609 18715.51    1
## mu[35]   0.127   0.001 0.137   0.003   0.503 15735.29    1
## mu[36]   0.021   0.000 0.021   0.001   0.078 14314.00    1
## mu[37]   0.023   0.000 0.023   0.001   0.085 16350.40    1
## mu[38] 168.973   0.039 5.809 157.732 180.547 21881.17    1
## mu[39]   0.023   0.000 0.023   0.001   0.088 16771.04    1

This summarizes the mean, sd, and quantiles of the posterior estimates of $\mu$, as well as the effective sample size (n_eff) and Rhat for the parameters.

The model estimates $\beta$ using data from sites with paired samples and uses this estimate to make predictions for $\mu_{i=4}$ and $\mu_{i=34}$ where no traditional seine samples were collected.

We can also use functions from the bayesplot package to examine the posterior distributions.

First let’s look at the posterior distributions for $\mu_{i=4}$ and $\mu_{i=29}$. Traditional seine samples were collected at site 29 but not at site 4.

library(bayesplot)
# plot posterior distribution, highlighting median and 80% credibility interval
mcmc_areas(as.matrix(goby.fit.semi$model), 
           pars = c('mu[4]', 'mu[29]'), prob = 0.8)

As you could expect, the credibility interval for the expected catch rate at site four is much wider than the credibility interval at site 29, since no traditional samples were collected at site four.

Let’s do the same for $\mu_{i=7}$ and $\mu_{i=34}$. Traditional seine samples were collected at site 7 but not at site 34.

# plot posterior distribution, highlighting median and 80% credibility interval
mcmc_areas(as.matrix(goby.fit.semi$model), 
           pars = c('mu[7]', 'mu[34]'), prob = 0.8)

Again, the credibility interval for the expected catch rate at site 34 is wider with a longer tail than the credibility interval at site 7, since no traditional samples were collected at site 34.

Note: It’s important to consider that fitting the joint model with semi-paired data will be more successful if there is paired data at most sites. Since the data at paired sites is leveraged to make predictions at the un-paired sites, you want to maximize the amount of data to leverage.

6.5 Initial values

By default, eDNAjoint will provide initial values for parameters estimated by the model, but you can provide your own initial values if you prefer. Here is an example of providing initial values for parameters, mu,p10, beta, and q, as an input in jointModel().

# set number of chains
n.chain <- 4

# initial values should be a list of named lists
inits <- list()
for(i in 1:n.chain){
  inits[[i]] <- list(
    # length should equal the number of sites (dim(gobyData_semipaired$count)[1]) for each chain
    mu = stats::runif(dim(gobyData_semipaired$count)[1], 0.01, 5), 
    # length should equal 1 for each chain 
    p10 = stats::runif(1,0.0001,0.08),
    # length should equal 1 for each chain 
    beta = stats::runif(1,0.05,0.2)
    )
}

# now fit the model
fit.w.inits <- jointModel(data = gobyData_semipaired, initial_values = inits)

## 
## SAMPLING FOR MODEL 'joint_binary_pois' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000513 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 5.13 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 1: Iteration:  500 / 3000 [ 16%]  (Warmup)
## Chain 1: Iteration:  501 / 3000 [ 16%]  (Sampling)
## Chain 1: Iteration: 1000 / 3000 [ 33%]  (Sampling)
## Chain 1: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 1: Iteration: 2000 / 3000 [ 66%]  (Sampling)
## Chain 1: Iteration: 2500 / 3000 [ 83%]  (Sampling)
## Chain 1: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 2.146 seconds (Warm-up)
## Chain 1:                3.909 seconds (Sampling)
## Chain 1:                6.055 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'joint_binary_pois' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000117 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.17 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 2: Iteration:  500 / 3000 [ 16%]  (Warmup)
## Chain 2: Iteration:  501 / 3000 [ 16%]  (Sampling)
## Chain 2: Iteration: 1000 / 3000 [ 33%]  (Sampling)
## Chain 2: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 2: Iteration: 2000 / 3000 [ 66%]  (Sampling)
## Chain 2: Iteration: 2500 / 3000 [ 83%]  (Sampling)
## Chain 2: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 2.155 seconds (Warm-up)
## Chain 2:                5.147 seconds (Sampling)
## Chain 2:                7.302 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'joint_binary_pois' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0.000273 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.73 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 3: Iteration:  500 / 3000 [ 16%]  (Warmup)
## Chain 3: Iteration:  501 / 3000 [ 16%]  (Sampling)
## Chain 3: Iteration: 1000 / 3000 [ 33%]  (Sampling)
## Chain 3: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 3: Iteration: 2000 / 3000 [ 66%]  (Sampling)
## Chain 3: Iteration: 2500 / 3000 [ 83%]  (Sampling)
## Chain 3: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 2.593 seconds (Warm-up)
## Chain 3:                4.993 seconds (Sampling)
## Chain 3:                7.586 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'joint_binary_pois' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0.00015 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.5 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 4: Iteration:  500 / 3000 [ 16%]  (Warmup)
## Chain 4: Iteration:  501 / 3000 [ 16%]  (Sampling)
## Chain 4: Iteration: 1000 / 3000 [ 33%]  (Sampling)
## Chain 4: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 4: Iteration: 2000 / 3000 [ 66%]  (Sampling)
## Chain 4: Iteration: 2500 / 3000 [ 83%]  (Sampling)
## Chain 4: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 2.195 seconds (Warm-up)
## Chain 4:                4.07 seconds (Sampling)
## Chain 4:                6.265 seconds (Total)
## Chain 4: 
## Refer to the eDNAjoint guide for visualization tips:  https://bookdown.org/abigailkeller/eDNAjoint_vignette/tips.html#visualization-tips

# check to see the initial values that were used
fit.w.inits$inits

## $chain1
## $chain1$mu_trad
##  [1] 3.80569741 3.58333259 4.02588139 0.88034608 2.39102144 2.85321301 3.19553752
##  [8] 4.11689159 4.21357816 0.58820852 0.81208180 4.59726978 2.76241101 1.09808337
## [15] 1.80897540 0.08415946 0.22912673 3.88901987 1.83084809 4.09662924 3.70403968
## [22] 2.75512944 0.41795349 3.65420388 0.79756336 1.68574393 1.42799124 0.68656417
## [29] 4.21855390 1.29522717 3.86613597 1.70403476 3.93140205 3.74998951 4.91692884
## [36] 2.44362165 1.58280683
## 
## $chain1$mu
##  [1] 3.80569741 3.58333259 4.02588139 1.50470175 0.88034608 2.39102144 2.85321301
##  [8] 3.19553752 4.11689159 4.21357816 0.58820852 0.81208180 4.59726978 2.76241101
## [15] 1.09808337 1.80897540 0.08415946 0.22912673 3.88901987 1.83084809 4.09662924
## [22] 3.70403968 2.75512944 0.41795349 3.65420388 0.79756336 1.68574393 1.42799124
## [29] 0.68656417 4.21855390 1.29522717 3.86613597 1.70403476 2.94467392 3.93140205
## [36] 3.74998951 4.91692884 2.44362165 1.58280683
## 
## $chain1$log_p10
## [1] -3.343339
## 
## $chain1$beta
## [1] 0.1872188
## 
## 
## $chain2
## $chain2$mu_trad
##  [1] 1.5229404 2.5261706 4.5393737 2.5046819 1.7516272 3.2172138 3.2646758 1.6111284
##  [9] 0.1889255 4.0137314 1.8750728 1.7111294 3.2156054 0.1472553 2.9552092 3.8507409
## [17] 1.9921167 2.2295859 0.8147131 0.7711415 2.5138951 1.6955384 3.4065907 0.1856529
## [25] 1.3677171 3.8655028 2.6103950 2.6938047 4.7740216 4.0100165 4.3669264 3.7612010
## [33] 4.0589469 0.8913079 2.0014217 3.5477226 4.4054730
## 
## $chain2$mu
##  [1] 1.5229404 2.5261706 4.5393737 1.9139701 2.5046819 1.7516272 3.2172138 3.2646758
##  [9] 1.6111284 0.1889255 4.0137314 1.8750728 1.7111294 3.2156054 0.1472553 2.9552092
## [17] 3.8507409 1.9921167 2.2295859 0.8147131 0.7711415 2.5138951 1.6955384 3.4065907
## [25] 0.1856529 1.3677171 3.8655028 2.6103950 2.6938047 4.7740216 4.0100165 4.3669264
## [33] 3.7612010 3.0359823 4.0589469 0.8913079 2.0014217 3.5477226 4.4054730
## 
## $chain2$log_p10
## [1] -2.626563
## 
## $chain2$beta
## [1] 0.1768646
## 
## 
## $chain3
## $chain3$mu_trad
##  [1] 3.04875278 1.54196188 1.79788661 1.42357885 3.08428369 3.48288864 1.05738581
##  [8] 1.19936314 2.79433858 2.13807436 1.44278498 0.91723819 2.76409730 2.62652365
## [15] 0.08165406 0.26244407 4.80832827 2.71507669 1.22793599 0.43198037 3.17015670
## [22] 2.56976175 1.90110157 4.78092426 1.04669546 2.24532204 1.10847388 1.59673621
## [29] 4.16267811 0.74007828 2.81778891 3.42864284 4.37678259 0.18425663 3.45051921
## [36] 0.93785434 1.27161283
## 
## $chain3$mu
##  [1] 3.04875278 1.54196188 1.79788661 2.59615310 1.42357885 3.08428369 3.48288864
##  [8] 1.05738581 1.19936314 2.79433858 2.13807436 1.44278498 0.91723819 2.76409730
## [15] 2.62652365 0.08165406 0.26244407 4.80832827 2.71507669 1.22793599 0.43198037
## [22] 3.17015670 2.56976175 1.90110157 4.78092426 1.04669546 2.24532204 1.10847388
## [29] 1.59673621 4.16267811 0.74007828 2.81778891 3.42864284 3.60057392 4.37678259
## [36] 0.18425663 3.45051921 0.93785434 1.27161283
## 
## $chain3$log_p10
## [1] -2.600622
## 
## $chain3$beta
## [1] 0.1923878
## 
## 
## $chain4
## $chain4$mu_trad
##  [1] 2.46700214 4.65893475 1.12654137 3.31605048 0.05295857 2.56674085 2.23751851
##  [8] 4.77927916 2.89327762 0.26356331 2.23214813 2.47137927 3.76330791 1.74409390
## [15] 1.26386410 3.25034435 0.71165630 4.76585013 0.90427947 1.10725542 4.81844148
## [22] 4.18821661 2.00013974 3.53148543 3.80450293 1.13242752 2.74157841 1.48478310
## [29] 4.69588359 2.76480247 3.84184013 4.53443982 3.45898723 2.94703450 1.54455403
## [36] 1.56655813 0.27269710
## 
## $chain4$mu
##  [1] 2.46700214 4.65893475 1.12654137 1.08209782 3.31605048 0.05295857 2.56674085
##  [8] 2.23751851 4.77927916 2.89327762 0.26356331 2.23214813 2.47137927 3.76330791
## [15] 1.74409390 1.26386410 3.25034435 0.71165630 4.76585013 0.90427947 1.10725542
## [22] 4.81844148 4.18821661 2.00013974 3.53148543 3.80450293 1.13242752 2.74157841
## [29] 1.48478310 4.69588359 2.76480247 3.84184013 4.53443982 1.34631350 3.45898723
## [36] 2.94703450 1.54455403 1.56655813 0.27269710
## 
## $chain4$log_p10
## [1] -3.167847
## 
## $chain4$beta
## [1] 0.179758

Chapter 6 Use case 4: jointModel() with semi-paired data