I’ve sent a bug report to the maintainer of fitdistrplus. Here’s the MRE I provided:
# Generate some data
mydata <- rnorm(20, 0, 10)
mydata_cens <- data.frame(left = floor(mydata), right = ceiling(mydata))
# Confirm that they can be fit by "norm"
library(fitdistrplus)
fitdist(mydata, "norm")
Fitting of the distribution ' norm ' by maximum likelihood
Parameters:
estimate Std. Error
mean 0.9049781 2.634855
sd 11.7834301 1.863124
fitdistcens(mydata_cens, "norm")
Fitting of the distribution ' norm ' on censored data by maximum likelihood
Parameters:
estimate
mean 0.9529095
sd 11.6716267
# A simple distribution function without defaults
dmynorm <- function(x, mean, sd, log = FALSE) dnorm(x, mean, sd, log)
pmynorm <- function(q, mean, sd, lower.tail = TRUE, log.p = FALSE) pnorm(q, mean, sd, lower.tail, log.p)
# Fitting without start values generates errors, as expected
try(fitdist(mydata, "mynorm"))
Error in computing default starting values.
try(fitdistcens(mydata_cens, "mynorm"))
Error in computing default starting values.
# fitdist works with mynorm when start values provided
fitdist(mydata, "mynorm", start = list(mean = 0, sd = 1))
Fitting of the distribution ' mynorm ' by maximum likelihood
Parameters:
estimate Std. Error
mean 0.9206415 2.633970
sd 11.7794494 1.861552
# fitdistcens returns an error with mynorm when start values provided
try(fitdistcens(mydata_cens, "mynorm", start = list(mean = 0, sd = 1)), outFile = stdout())
<simpleError in optim(par = vstart, fn = fnobjcens, fix.arg = fix.arg, gr = gradient, rcens = rcens, lcens = lcens, icens = icens, ncens = ncens, ddistnam = ddistname, pdistnam = pdistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdistcens(mydata_cens, "mynorm", start = list(mean = 0, sd = 1)) :
the function mle failed to estimate the parameters,
with the error code 100
# Second bug: fitdist gives different results when given start values equal to the distribution default
fitdist(mydata, "norm")
Fitting of the distribution ' norm ' by maximum likelihood
Parameters:
estimate Std. Error
mean 0.9049781 2.634855
sd 11.7834301 1.863124
fitdist(mydata, "norm", start = list(mean = 0, sd = 1))
Fitting of the distribution ' norm ' by maximum likelihood
Parameters:
estimate Std. Error
mean 0.9206415 2.633970
sd 11.7794494 1.861552
The discrepancy in the two MLE estimates is somewhat troubling too!
Note that the error shown in the script above is different from what is shown in an interactive session!
Temporary fix
I tried putting a version of fitdistcens()
with my proposed fix in lib/myfitdistrplus.R
. Scripts in lib/
get loaded after the packages are loaded, so this masks the version in the package. I thought this would be more transparent than using my modified version of the whole package (fitdistrplus_bk). However, because fitdistrplus doesn’t export all its utility functions, the function fails because it’s not in the package namespace. If assign it to the package namespace (using environment(fitdistcens) <- asNamespace("fitdistrplus")
; see https://stackoverflow.com/questions/3094232/add-objects-to-package-namespace), then it gets masked by the version that’s already in the package!
I could give it a new name, but that would be harder to update if/when a new version of fitdistrplus is released. So for now I will use the munged version of the package that I created (available from devtools::install_github("BruceKendall/fitdistrplus_bk")
)—that can be updated by changing one line in global.dcf
.
So now we can do:
detach(package:fitdistrplus) # Since we loaded it above
fitdistcens(mydata_cens, "mynorm", start = list(mean = 0, sd = 10))
Fitting of the distribution ' mynorm ' on censored data by maximum likelihood
Parameters:
estimate
mean 1.499655
sd 7.376152