The next steps in the Ler dispersal analysis are to confirm statistical support for among-rep heterogeneity and see if the kernel parameters are correlated with the fraction dispersing.
Housekeeping
Added the functions for the truncated distributions, and for fitting the models, to lib/helpers.R.
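For reference, here is a minimal sketch of the kind of truncated-distribution fit those helpers perform. Everything in it is illustrative: the function name trunc_lnorm_nll, the starting values, and the placeholder data vector are my assumptions, not the actual contents of lib/helpers.R (the truncated normal fit is analogous).
# Sketch (assumed, not the actual helper): negative log-likelihood of a
# lognormal truncated below at `cutoff`, renormalized by the mass above it
trunc_lnorm_nll <- function(par, x, cutoff) {
  meanlog <- par[1]
  sdlog   <- exp(par[2])   # optimize log(sdlog) to keep it positive
  -sum(dlnorm(x, meanlog, sdlog, log = TRUE) -
         log(1 - plnorm(cutoff, meanlog, sdlog)))
}
# `seed_distances` is a placeholder vector of dispersal distances beyond the cutoff
fit <- optim(c(1, 0), trunc_lnorm_nll, x = seed_distances, cutoff = 4,
             hessian = TRUE)
aic <- 2 * fit$value + 2 * length(fit$par)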
Kernel heterogeneity
The total AIC for the rep-specific fits was 58230.28 and 58366.17 for the normal and lognormal models, respectively.
I think that if I pass the whole data set to the analysis function, it will pool all of the reps. Let's check:
sum(disperseLer$Seedlings[disperseLer$Distance > 4])
[1] 11928
data_loc <- subset(disperseLer, Distance > 4)
data_vec <- rep(data_loc$Distance, data_loc$Seedlings)
length(data_vec)
[1] 11928
That’s a match!
fit_dispersal_models(disperseLer)
ID AICnorm AIClnorm mu sd mulog sdlog se_mu
mean 73_0 59772.17 59799.07 -0.7149118 7.407026 1.905282 0.50168 0.5657672
se_sd se_mulog se_sdlog
mean 0.1837072 0.006796452 0.005105066
Ignore the ID on the graphical and text output.
Overall the fits are good. Again, the normal is somewhat better than the lognormal (\(\Delta_{\text{AIC}} = 26.9\)). But the rep-specific fits have AICs that are more than 1400 units lower!
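To make that comparison explicit, here is the arithmetic on the totals reported above (pooled AIC minus the summed rep-specific AICs):
# Pooled fit vs. sum of rep-specific fits, values from above
59772.17 - 58230.28   # normal:    1541.89
59799.07 - 58366.17   # lognormal: 1432.90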
Correlations with fraction dispersing
Regenerate the datasets, without the plots (may want to put this into a munge script):
all_Ler_fits <- ddply(disperseLer, "ID", fit_dispersal_models, plot.it = FALSE)
Warning in sqrt(diag(varcovar)): NaNs produced
Warning in sqrt(1/diag(V)): NaNs produced
Warning in cov2cor(varcovar): diag(.) had 0 or NA entries; non-finite
result is doubtful
Ler_dispersal_stats <- calc_dispersal_stats(disperseLer)
Ler_dispersal_stats <- merge(Ler_dispersal_stats, all_Ler_fits, by = "ID")
head(Ler_dispersal_stats)
ID Density Siliques Total_seeds Home_seeds Dispersing_seeds
1 100_0 1 126 131 73 58
2 101_0 1 177 336 194 142
3 104_0 390 NA 1384 628 756
4 105_0 1227 NA 1180 762 418
5 106_0 107 NA 389 257 132
6 107_0 199 NA 946 773 173
Dispersal_fraction Mean_dispersal_all RMS_dispersal_all Mean_dispersal
1 0.4427481 2.580153 4.034205 5.827586
2 0.4226190 2.875000 4.663690 6.802817
3 0.5462428 4.593931 7.099011 8.410053
4 0.3542373 2.363559 4.197760 6.672249
5 0.3393316 2.611825 5.267156 7.696970
6 0.1828753 1.286469 3.335535 7.034682
RMS_dispersal AICnorm AIClnorm mu sd mulog
1 6.062889 210.1962 209.4645 4.098267 2.598637 1.662753
2 7.173896 602.7757 592.9470 5.101474 3.277311 1.839458
3 9.605168 3917.2841 3922.7279 -66.020580 19.643467 1.781924
4 7.052944 1749.4258 1745.8809 4.371632 3.528681 1.788268
5 9.041990 646.6945 643.1798 -207.715199 30.384323 1.307769
6 7.799881 785.4472 789.4764 -69.438135 16.731328 1.359045
sdlog se_mu se_sd se_mulog se_sdlog
1 0.3158626 1.330679703 0.634838130 0.06392084 0.04807686
2 0.3314258 0.810572636 0.441435119 0.03321615 0.02656491
3 0.6310585 NaN NaN 0.04842916 0.03103006
4 0.3682102 0.648281310 0.312409282 0.02473200 0.01922509
5 0.7884596 0.007112144 0.004535006 0.30807962 0.13543948
6 0.6932002 21.032531014 2.279795670 0.21971090 0.10060063
Now look at some plots:
qplot(Dispersal_fraction, mulog, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, sdlog, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, mu, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, sd, data = Ler_dispersal_stats)
There is a clear pattern with mulog, but not with any of the other parameters. This is another reason to favor the lognormal distribution, as it indicates a mechanistic interconnection between the two components of the dispersal kernel. Here's the actual relationship:
cor(Ler_dispersal_stats$mulog, Ler_dispersal_stats$Dispersal_fraction)
[1] 0.5604027
summary(lm(mulog ~ Dispersal_fraction, data = Ler_dispersal_stats))
Call:
lm(formula = mulog ~ Dispersal_fraction, data = Ler_dispersal_stats)
Residuals:
Min 1Q Median 3Q Max
-0.46413 -0.07753 0.00646 0.09439 0.49780
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.54775 0.07454 20.76 < 2e-16 ***
Dispersal_fraction 0.66056 0.16271 4.06 0.000253 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1928 on 36 degrees of freedom
Multiple R-squared: 0.3141, Adjusted R-squared: 0.295
F-statistic: 16.48 on 1 and 36 DF, p-value: 0.0002531
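As a consistency check, the squared correlation, \(0.5604^2 \approx 0.314\), matches the multiple R-squared from the regression.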
Implications for modeling
A simple-minded approach would be to draw a random dispersal fraction, and then use the regression (including the residuals) to get a value for mulog. But really, we should be doing draws from a multivariate distribution. The challenge, then, is taking into account the (presumably) beta distribution for the dispersal fraction. We can use Morris & Doak’s approach of simulating multivariate normals and then matching quantiles to get the beta distribution; since the latter is pretty symmetric (looks like the mean is around 0.4) it should recover the correlation coefficient pretty closely.
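Here is a minimal sketch of that quantile-matching approach. The values are placeholders throughout: the sample size, means, VC matrix, and beta shape parameters below are illustrative, not the fitted estimates.
library(MASS)   # for mvrnorm

# Placeholder parameters (not the fitted values)
n      <- 1000
mu_vec <- c(mulog = 1.8, frac = 0.4)
vc     <- matrix(c(0.040, 0.020,
                   0.020, 0.038), 2, 2)

z <- mvrnorm(n, mu_vec, vc)
# Keep the mulog draws as-is; convert the dispersal-fraction margin to a
# beta by matching quantiles: normal CDF -> uniform -> beta quantile function
u          <- pnorm(z[, "frac"], mean = mu_vec["frac"], sd = sqrt(vc[2, 2]))
frac_draws <- qbeta(u, shape1 = 10, shape2 = 15)   # placeholder beta, mean 0.4
cor(z[, "mulog"], frac_draws)   # should stay close to the target correlation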
A bigger conceptual issue is that the points in the scatterplots above are each measured with error, in both directions (binomial sampling for the dispersal fraction, MLE standard errors in mulog). This is another reason to avoid OLS regression; but does it introduce bias into the estimate of the correlation? There may be a solution to this at https://www.rasch.org/rmt/rmt101g.htm
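For reference, the standard (Spearman) disattenuation correction divides the observed correlation by the square root of the product of the two reliabilities, \(r_{XY}^{\text{true}} = r_{XY}^{\text{obs}} / \sqrt{r_{XX}\, r_{YY}}\), where each reliability is the ratio of true-score variance to observed variance.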
Third, is sdlog independent of mulog (we have already seen that it’s independent of dispersal fraction)? What about its relationship to the major and minor axes of the mulog-dispersal fraction distribution?
Finally, when we are working with the populations, do we assume that all pots in a replicate at a particular generation experience the same dispersal kernel? Or that each pot gets an independent sample of the kernel? If most of the spread is driven by seeds coming from the front pot, then it won’t matter much.
Disattenuated correlation coefficient
Actually, I’ve convinced myself that I don’t need to do this. We will be using the covariance rather than the correlation, and some calculations reveal that the covariance is not biased by measurement error. The disattenuation calculation is to correct for the inflated variances caused by measurement error that are used in calculating the correlation from the covariance.
Of course, I will need those corrected variances to populate the VC matrix used in the simulations. The way to do this is to subtract the mean squared standard error from the raw variance estimates.
I’ve modified the kernel fitting routine to return the standard errors of the parameter estimates. So now for the corrections:
var(Ler_dispersal_stats$mulog)
[1] 0.05273925
var(Ler_dispersal_stats$mulog) - mean((Ler_dispersal_stats$se_mulog)^2)
[1] 0.03825842
Ler_dispersal_stats <- mutate(Ler_dispersal_stats,
binom_var = Dispersal_fraction * (1 - Dispersal_fraction) /
Total_seeds)
var(Ler_dispersal_stats$Dispersal_fraction)
[1] 0.03795841
var(Ler_dispersal_stats$Dispersal_fraction) - mean(Ler_dispersal_stats$binom_var)
[1] 0.03740287
The first correction is substantial, the second one minor.
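Here is a sketch of how those corrections could be assembled into the VC matrix for the simulations; the object name vc_corrected is mine, and the covariances (and, for now, the sdlog variance) are left as estimated.
# Raw variance-covariance matrix of the three quantities
vc_raw <- cov(subset(Ler_dispersal_stats,
                     select = c(mulog, sdlog, Dispersal_fraction)))
vc_corrected <- vc_raw
# Subtract the mean squared standard error from the diagonal entries
# (covariances are unchanged; sdlog could be corrected the same way
# using se_sdlog)
vc_corrected["mulog", "mulog"] <-
  vc_raw["mulog", "mulog"] - mean(Ler_dispersal_stats$se_mulog^2)
vc_corrected["Dispersal_fraction", "Dispersal_fraction"] <-
  vc_raw["Dispersal_fraction", "Dispersal_fraction"] -
  mean(Ler_dispersal_stats$binom_var)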
Patterns of sdlog
qplot(mulog, sdlog, data = Ler_dispersal_stats)
disp_pca <- princomp(~ mulog + Dispersal_fraction, data = Ler_dispersal_stats)
screeplot(disp_pca)
biplot(disp_pca)
qplot(disp_pca$scores[,1], Ler_dispersal_stats$sdlog)
qplot(disp_pca$scores[,2], Ler_dispersal_stats$sdlog)
summary(lm(Ler_dispersal_stats$sdlog ~ disp_pca$scores[,2]))
Call:
lm(formula = Ler_dispersal_stats$sdlog ~ disp_pca$scores[, 2])
Residuals:
Min 1Q Median 3Q Max
-0.21719 -0.07239 -0.01247 0.06634 0.26884
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.45126 0.01783 25.309 <2e-16 ***
disp_pca$scores[, 2] -0.27682 0.13037 -2.123 0.0407 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1099 on 36 degrees of freedom
Multiple R-squared: 0.1113, Adjusted R-squared: 0.08661
F-statistic: 4.508 on 1 and 36 DF, p-value: 0.04067
cov(subset(Ler_dispersal_stats, select = c(mulog, sdlog, Dispersal_fraction)))
mulog sdlog Dispersal_fraction
mulog 0.052739247 -0.0086961783 0.0250738618
sdlog -0.008696178 0.0132258602 0.0001367011
Dispersal_fraction 0.025073862 0.0001367011 0.0379584103
cor(subset(Ler_dispersal_stats, select = c(mulog, sdlog, Dispersal_fraction)))
mulog sdlog Dispersal_fraction
mulog 1.0000000 -0.329267986 0.560402749
sdlog -0.3292680 1.000000000 0.006101068
Dispersal_fraction 0.5604027 0.006101068 1.000000000
It looks like there is heteroskedasticity (sdlog is larger for high values of PC1, which correspond to low values of mulog), and a weak negative association with PC2.