diff --git a/DESCRIPTION b/DESCRIPTION index 8f203cb..18f3533 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: spmodel Title: Spatial Statistical Modeling and Prediction -Version: 0.11.1 +Version: 0.12.0 Authors@R: c( person(given = "Michael", family = "Dumelle", diff --git a/NAMESPACE b/NAMESPACE index be4ce1d..f504a1a 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -188,6 +188,7 @@ S3method(model.matrix,spautor) S3method(model.matrix,spgautor) S3method(model.matrix,spglm) S3method(model.matrix,splm) +S3method(plot,eacf) S3method(plot,esv) S3method(plot,spautor) S3method(plot,spgautor) @@ -359,6 +360,7 @@ export(augment) export(covmatrix) export(dispersion_initial) export(dispersion_params) +export(eacf) export(esv) export(glance) export(glances) diff --git a/NEWS.md b/NEWS.md index 02cfd15..2258d52 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,20 @@ +# spmodel 0.12.0 + +## Major Updates + +* Added `eacf()` to compute the empirical autocovariance function. + +## Minor Updates + +* Changed the default `size` argument to the local argument in `predict(..., block = TRUE)` and `augment(..., block = TRUE)` from 1000 to 4000. This enhances the block prediction (i.e., Kriging) approximation's accuracy but can slightly increase computational complexity. +* Enhanced efficiency of block prediction (i.e., Kriging) for large `newdata` objects. +* Added a warning message that clarifies `xcoord` and `ycoord` are ignored when `data` is an `sf` object. +* Minor documentation updates. + +## Bug Fixes + +* Fixed a bug in `predict(object, newdata)` and `augment(object, newdata)` that could cause `NA` values when the spatial covariance was `"matern"` and a location in `newdata` was the exact same as a location in `data` (the `data` argument used to fit `object`). + # spmodel 0.11.1 ## Minor Updates diff --git a/R/eacf.R b/R/eacf.R new file mode 100644 index 0000000..63c5ae6 --- /dev/null +++ b/R/eacf.R @@ -0,0 +1,204 @@ +#' Compute the empirical autocovariance +#' +#' @description Compute the empirical autocovariance (i.e., empirical covariance) +#' for varying bin sizes and cutoff values. +#' +#' @param formula A formula describing the fixed effect structure. +#' @param data A data frame or \code{sf} object containing the variables in \code{formula} +#' and geographic information. +#' @param xcoord Name of the variable in \code{data} representing the x-coordinate. +#' Can be quoted or unquoted. Not required if \code{data} is an \code{sf} object. +#' @param ycoord Name of the variable in \code{data} representing the y-coordinate. +#' Can be quoted or unquoted. Not required if \code{data} is an \code{sf} object. +#' @param cloud A logical indicating whether the empirical autocovariance should +#' be summarized by distance class or not. When \code{cloud = FALSE} (the default), pairwise autocovariances +#' are binned and averaged within distance classes. When \code{cloud} = TRUE, +#' all pairwise autocovariances and distances are returned (this is known as +#' the "cloud" autocovariance). +#' @param bins The number of equally spaced bins. The default is 15. Ignored if +#' \code{cloud = TRUE}. +#' @param cutoff The maximum distance considered. +#' The default is half the diagonal of the bounding box from the coordinates. +#' @param dist_matrix A distance matrix to be used instead of providing coordinate names. +#' @param partition_factor An optional formula specifying the partition factor. +#' If specified, autocovariances are only computed for observations sharing the +#' same level of the partition factor. +#' +#' @details The empirical autocovariance (i.e., empirical covariance) is a tool used to visualize and model +#' spatial dependence by estimating the semivariance of a process at varying distances. +#' For a constant-mean process, the +#' autocovariance at distance \eqn{h} is denoted \eqn{Cov(h)} and defined as +#' \eqn{Cov(z1, z2)}. Under second-order stationarity, +#' \eqn{Cov(h) = Cov(0) - \gamma(h)}, where \eqn{gamma(h)} is the semivariance function at distance \code{h}. Typically the residuals from an ordinary +#' least squares fit defined by \code{formula} are second-order stationary with +#' mean zero. These residuals are used to compute the empirical autocovariance +#' At a distance \code{h}, the empirical autocovariance is +#' \eqn{1/N(h) \sum (r1 \times r2)}, where \eqn{N(h)} is the number of (unique) +#' pairs in the set of observations whose distance separation is \code{h} and +#' \code{r1} and \code{r2} are residuals corresponding to observations whose +#' distance separation is \code{h}. In spmodel, these distance bins actually +#' contain observations whose distance separation is \code{h +- c}, +#' where \code{c} is a constant determined implicitly by \code{bins}. Typically, +#' only observations whose distance separation is below some cutoff are used +#' to compute the empirical semivariogram (this cutoff is determined by \code{cutoff}). +#' +#' @name eacf +#' +#' @return If \code{cloud = FALSE}, a tibble (data.frame) with distance bins +#' (\code{bins}), the average distance (\code{dist}), the average autocovariance (\code{acov}), and the +#' number of (unique) pairs (\code{np}). If \code{cloud = TRUE}, a tibble +#' (data.frame) with distance (\code{dist}) and autocovariance (\code{acov}) +#' for each unique pair. +#' +#' @export +#' +#' @examples +#' eacf(sulfate ~ 1, sulfate) +#' plot(eacf(sulfate ~ 1, sulfate)) +eacf <- function(formula, data, xcoord, ycoord, cloud = FALSE, bins = 15, cutoff, dist_matrix, partition_factor) { + + # filter out missing response values + na_index <- is.na(data[[all.vars(formula)[1]]]) + data <- data[!na_index, , drop = FALSE] + # finding model frame + data_model_frame <- model.frame(formula, data, drop.unused.levels = TRUE, na.action = na.pass) + # model matrix with potential NA + ob_predictors <- complete.cases(model.matrix(formula, data_model_frame)) + if (any(!ob_predictors)) { + stop("Cannot have NA values in predictors.", call. = FALSE) + } + + # covert sp to sf + attr_sp <- attr(class(data), "package") + if (!is.null(attr_sp) && length(attr_sp) == 1 && attr_sp == "sp") { + stop("sf objects must be used instead of sp objects. To convert your sp object into an sf object, run sf::st_as_sf().", call. = FALSE) + } + + ## convert sf to data frame (point geometry) (1d objects obsolete) + ### see if data has sf class + if (inherits(data, "sf")) { + data <- suppressWarnings(sf::st_centroid(data)) + data <- sf_to_df(data) + ### name xcoord ".xcoord" to be used later + xcoord <- ".xcoord" + ### name ycoord ".ycoord" to be used later + ycoord <- ".ycoord" + } + + # compute spatial distances + if (missing(dist_matrix)) { + # non standard evaluation for the x and y coordinates + xcoord <- substitute(xcoord) + ycoord <- substitute(ycoord) + + if (missing(xcoord)) { + stop("The xcoord argument must be specified.", call. = FALSE) + } + + if (!missing(xcoord)) { + if (!as.character(xcoord) %in% colnames(data)) { + stop("The xcoord argument must match the name of a variable in data.", call. = FALSE) + } + } + + if (missing(ycoord)) { + dist_matrix <- spdist(data, xcoord) + } else { + if (!as.character(ycoord) %in% colnames(data)) { + stop("The ycoord argument must match the name of a variable in data.", call. = FALSE) + } + dist_matrix <- spdist(data, xcoord, ycoord) + } + } + + + dist_matrix <- as.matrix(dist_matrix) + dist_matrix <- dist_matrix[upper.tri(dist_matrix)] + + if (any(dist_matrix == 0)) { + warning("Zero distances observed between at least one pair. Ignoring pairs. If using splm(), consider a different estimation method.", call. = FALSE) + } + + if (missing(partition_factor)) { + partition_factor <- NULL + } + + if (!is.null(partition_factor)) { + # partition_matrix_val <- triu(partition_matrix(partition_factor, data = data), k = 1) + partition_matrix_val <- as.matrix(partition_matrix(partition_factor, data = data)) + partition_matrix_val <- partition_matrix_val[upper.tri(partition_matrix_val)] + dist_matrix <- dist_matrix * partition_matrix_val + } + + if (missing(cutoff)) { + cutoff <- NULL + } + if (is.null(cutoff)) { + cutoff <- max(dist_matrix) / 2 + } + + dist_vector <- dist_matrix + if (any(dist_vector == 0)) { + dist_index <- dist_vector > 0 & dist_vector <= cutoff + } else { + dist_index <- dist_vector <= cutoff + } + + dist_vector <- dist_vector[dist_index] + + # compute squared differences in the residuals + lmod <- lm(formula = formula, data = data) + residuals <- residuals(lmod) + residual_matrix <- outer(residuals, residuals) # ybar is the fitted mean + residual_matrix <- residual_matrix[upper.tri(residual_matrix)] + if (!is.null(partition_factor)) { + residual_matrix <- residual_matrix * partition_matrix_val + } + + residual_vector <- residual_matrix + residual_vector <- residual_vector[dist_index] + residual_vector2 <- residual_vector # just to align with esv() names + + + if (cloud) { + eacf_out <- get_eacf_cloud(residual_vector2, dist_vector) + } else { + eacf_out <- get_eacf(residual_vector2, dist_vector, bins, cutoff) + } + + + + + # remove NA + # eacf_out <- na.omit(eacf_out) + eacf_out <- structure(eacf_out, class = c("eacf", class(eacf_out)), call = match.call(), cloud = cloud) + eacf_out +} + +#' @rdname eacf +#' @method plot eacf +#' @param x An object from \code{eacf()}. +#' @param ... Other arguments passed to other methods. +#' @export +plot.eacf <- function(x, ...) { + + cal <- attr(x, "call") + if (!is.na(m.f <- match("formula", names(cal)))) { + cal <- cal[c(1, m.f)] + names(cal)[2L] <- "" + } + cc <- deparse(cal, 80) + nc <- nchar(cc[1L], "c") + abbr <- length(cc) > 1 || nc > 75 + sub.caption <- if (abbr) { + paste(substr(cc[1L], 1L, min(75L, nc)), "...") + } else { + cc[1L] + } + + dotlist <- list(...) + dotlist <- get_eacf_dotlist_defaults(x, dotlist, cloud = attr(x, "cloud")) + + do.call("plot", c(list(x = x$dist, y = x$acov), dotlist)) + title(sub = sub.caption) +} diff --git a/R/esv.R b/R/esv.R index b392dd9..71a0306 100644 --- a/R/esv.R +++ b/R/esv.R @@ -170,7 +170,7 @@ esv <- function(formula, data, xcoord, ycoord, cloud = FALSE, robust = FALSE, bi residual_vector2 <- residual_vector^2 if (cloud) { - esv_out <- get_esv_cloud(residual_vector2, dist_vector, formula) + esv_out <- get_esv_cloud(residual_vector2, dist_vector) } else { if (robust) { residual_vector12 <- sqrt(residual_vector) diff --git a/R/get_data_object.R b/R/get_data_object.R index 8d319de..4f7107b 100644 --- a/R/get_data_object.R +++ b/R/get_data_object.R @@ -23,6 +23,9 @@ get_data_object_splm <- function(formula, data, spcov_initial, xcoord, ycoord, e data_sf <- suppressWarnings(sf::st_centroid(data)) # store as data frame data <- sf_to_df(data_sf) + if (!missing(xcoord) || !missing(ycoord)) { + warning("data is an sf object. Ignoring xcoord and ycoord arguments.", call. = FALSE) + } ## name xcoord ".xcoord" to be used later xcoord <- ".xcoord" ## name ycoord ".ycoord" to be used later diff --git a/R/get_data_object_glm.R b/R/get_data_object_glm.R index f8b405f..65d1704 100644 --- a/R/get_data_object_glm.R +++ b/R/get_data_object_glm.R @@ -24,6 +24,9 @@ get_data_object_spglm <- function(formula, family, data, spcov_initial, xcoord, data_sf <- suppressWarnings(sf::st_centroid(data)) # store as data frame data <- sf_to_df(data_sf) + if (!missing(xcoord) || !missing(ycoord)) { + warning("data is an sf object. Ignoring xcoord and ycoord arguments.", call. = FALSE) + } ## name xcoord ".xcoord" to be used later xcoord <- ".xcoord" ## name ycoord ".ycoord" to be used later diff --git a/R/get_eacf.R b/R/get_eacf.R new file mode 100644 index 0000000..e10bff8 --- /dev/null +++ b/R/get_eacf.R @@ -0,0 +1,41 @@ +get_eacf <- function(residual_vector2, dist_vector, bins, cutoff, formula) { + + # compute semivariogram classes + dist_classes <- cut(dist_vector, breaks = seq(0, cutoff, length.out = bins + 1)) + + # compute squared differences within each class + acov <- tapply(residual_vector2, dist_classes, function(x) mean(x)) + + # compute pairs within each class + np <- tapply(residual_vector2, dist_classes, length) + + # set as zero if necessary + np <- ifelse(is.na(np), 0, np) + + # compute average distance within each class + dist <- tapply(dist_vector, dist_classes, mean) + + # return output + eacf_out <- tibble::tibble( + bins = factor(levels(dist_classes), levels = levels(dist_classes)), + dist = as.numeric(dist), + acov = as.numeric(acov), + np = as.numeric(np) + ) + + # set row names to NULL + # row.names(eacf_out) <- NULL + + eacf_out +} + + +get_eacf_cloud <- function(residual_vector2, dist_vector) { + + eacf_out <- tibble::tibble(dist = dist_vector, acov = residual_vector2) + + # set row names to NULL + # row.names(eacf_out) <- NULL + + eacf_out +} diff --git a/R/get_eacf_dotlist.R b/R/get_eacf_dotlist.R new file mode 100644 index 0000000..6e051a9 --- /dev/null +++ b/R/get_eacf_dotlist.R @@ -0,0 +1,68 @@ +#' Get empirical autocovariance dotlist +#' +#' @param ... Additional arguments to \code{eacf()} +#' @param max_halfdist +#' +#' @return An eacf dotlist +#' +#' @noRd +get_eacf_dotlist <- function(..., max_halfdist) { + # storing dotlist and setting defaults for eacf + dotlist <- list(...) + + if (!("bins" %in% names(dotlist))) { + dotlist$bins <- 15 + } + + if (!("cutoff" %in% names(dotlist))) { + dotlist$cutoff <- max_halfdist + } + + # make dotlist eacf + dotlist_eacf <- list(bins = dotlist$bins, cutoff = dotlist$cutoff) +} + + +get_eacf_dotlist_defaults <- function(x, dotlist, cloud) { + + names_dotlist <- names(dotlist) + + # set defaults + if (!"main" %in% names_dotlist) { + dotlist$main <- "Empirical Autocovariance" + if (cloud) dotlist$main <- paste0(dotlist$main, " (Cloud)") + } + + if (!"xlab" %in% names_dotlist) { + dotlist$xlab <- "Distance" + } + + if (!"ylab" %in% names_dotlist) { + dotlist$ylab <- "Autocovariance" + } + + if (!cloud && !"pch" %in% names_dotlist) { + dotlist$pch <- 19 + } + + if (!cloud && !"cex" %in% names_dotlist) { + dotlist$cex <- (x$np - min(x$np)) / (max(x$np) - min(x$np)) * 2 + 1 + } + + if (!"ylim" %in% names_dotlist) { + + # include zero if not in limits + + ## all greater than zero (positive) + if (all(x$acov > 0)) { + dotlist$ylim <- c(0, 1.1 * max(x$acov)) + } + + ## all less than zero (negative) + if (all(x$acov < 0)) { + dotlist$ylim <- c(1.1 * min(x$acov), 0) + } + } + + dotlist +} diff --git a/R/get_esv.R b/R/get_esv.R index 6ad4f78..0933fd7 100644 --- a/R/get_esv.R +++ b/R/get_esv.R @@ -62,7 +62,7 @@ get_esv_robust <- function(residual_vector12, dist_vector, bins, cutoff, formula esv_out } -get_esv_cloud <- function(residual_vector2, dist_vector, formula) { +get_esv_cloud <- function(residual_vector2, dist_vector) { esv_out <- tibble::tibble(dist = dist_vector, gamma = residual_vector2 / 2) diff --git a/R/get_local_list.R b/R/get_local_list.R index 052e83b..fbd8578 100644 --- a/R/get_local_list.R +++ b/R/get_local_list.R @@ -168,7 +168,7 @@ get_local_list_prediction_block <- function(local) { if (is.logical(local)) { if (local) { - local <- list(method = "covariance", size = 1000) + local <- list(method = "covariance", size = 4000) } else { local <- list(method = "all") } @@ -190,10 +190,19 @@ get_local_list_prediction_block <- function(local) { } if (local$method %in% c("distance", "covariance") && !"size" %in% names_local) { - local$size <- 1000 + local$size <- 4000 } - # no parallel required + if (!"parallel" %in% names_local) { + local$parallel <- FALSE + local$ncores <- NULL + } + + if (local$parallel) { + if (!"ncores" %in% names_local) { + local$ncores <- parallel::detectCores() + } + } local } diff --git a/R/predict.R b/R/predict.R index d011a0c..fd1170b 100644 --- a/R/predict.R +++ b/R/predict.R @@ -73,7 +73,7 @@ #' If \code{block} is \code{TRUE}, \code{local} accepts \code{method} and \code{size}, #' and \code{method} takes values of \code{"all"}, \code{"covariance"}, #' and \code{"distance"}, similar as when \code{block} is \code{FALSE}. -#' The default \code{method} is \code{"distance"} with size \code{1000}. This default \code{size} is +#' The default \code{method} is \code{"covariance"} with size \code{4000}. This default \code{size} is #' much larger than when \code{block} is \code{FALSE}. This is because when \code{block} is \code{TRUE}, #' covariances and explanatory variables are averaged before prediction, which greatly #' reduces computational burden, only requiring the Cholesky decomposition diff --git a/R/predict_block.R b/R/predict_block.R index bf458f4..24c87cf 100644 --- a/R/predict_block.R +++ b/R/predict_block.R @@ -143,7 +143,11 @@ predict_block_splm <- function(object, newdata, se.fit, scale, df, interval, lev c0 <- colMeans(covmatrix(object, newdata = newdata, cov_type = "pred.obs")) Sig <- covmatrix(object) - s0 <- mean(covmatrix(object, newdata = newdata, cov_type = "pred.pred")) + if (NROW(newdata) > 1e4) { + s0 <- get_bk_var(object, newdata, local) + } else { + s0 <- mean(covmatrix(object, newdata = newdata, cov_type = "pred.pred")) + } Xmat <- model.matrix(object) if (local$method == "all") { @@ -234,3 +238,23 @@ predict_block_splm <- function(object, newdata, se.fit, scale, df, interval, lev stop("Interval must be none, confidence, or prediction") } } + +get_bk_var <- function(object, newdata, local) { + index <- seq(1, NROW(newdata)) + object$obdata <- newdata + if (local$parallel) { + cl <- parallel::makeCluster(local$ncores) + val <- parallel::parLapply(cl, index, get_each_bk_meancov, object, newdata) + cl <- parallel::stopCluster(cl) + } else { + val <- lapply(index, get_each_bk_meancov, object, newdata) + } + mean(unlist(val)) +} + +get_each_bk_meancov <- function(index, object, newdata) { + newdata <- newdata[index, , drop = FALSE] + val <- as.vector(spmodel::covmatrix(object, newdata = newdata, cov_type = "obs.pred")) + val[index] <- val[index] + object$coefficients$spcov[["ie"]] # this is to add ie variance to diagonal (which is omitted with newdata) + mean(val) +} diff --git a/R/spcov_matrix.R b/R/spcov_matrix.R index 4e7dbd5..aaa563a 100644 --- a/R/spcov_matrix.R +++ b/R/spcov_matrix.R @@ -160,7 +160,7 @@ spcov_matrix.magnetic <- function(spcov_params, dist_matrix, diagtol = 0, ...) { spcov_matrix.matern <- function(spcov_params, dist_matrix, diagtol = 0, ...) { eta <- sqrt(2 * spcov_params[["extra"]]) * (dist_matrix / spcov_params[["range"]]) spcov_matrix_val <- spcov_params[["de"]] * 2^(1 - spcov_params[["extra"]]) / gamma(spcov_params[["extra"]]) * eta^spcov_params[["extra"]] * besselK(as.matrix(eta), nu = spcov_params[["extra"]]) # eta as sparse matrix causes error - dist_matrix_zero <- which(dist_matrix == 0) + dist_matrix_zero <- which(dist_matrix == 0) # consider epsilon threshold instead of exactly zero? spcov_matrix_val[dist_matrix_zero] <- spcov_params[["de"]] spcov_params[["ie"]] <- max(spcov_params[["ie"]], 1e-4 * spcov_params[["de"]], diagtol) diag(spcov_matrix_val) <- spcov_params[["de"]] + spcov_params[["ie"]] diff --git a/R/spcov_vector.R b/R/spcov_vector.R index 9eb941a..1b327f3 100644 --- a/R/spcov_vector.R +++ b/R/spcov_vector.R @@ -133,6 +133,8 @@ spcov_vector.magnetic <- function(spcov_params, dist_vector) { spcov_vector.matern <- function(spcov_params, dist_vector) { eta <- sqrt(2 * spcov_params[["extra"]]) * (dist_vector / spcov_params[["range"]]) spcov_vector_val <- spcov_params[["de"]] * 2^(1 - spcov_params[["extra"]]) / gamma(spcov_params[["extra"]]) * eta^spcov_params[["extra"]] * besselK(as.matrix(eta), nu = spcov_params[["extra"]]) + dist_vector_zero <- which(dist_vector == 0) # consider epsilon threshold instead of exactly zero? + spcov_vector_val[dist_vector_zero] <- spcov_params[["de"]] spcov_vector_val } diff --git a/docs/404.html b/docs/404.html index 75f8099..e49b0c4 100644 --- a/docs/404.html +++ b/docs/404.html @@ -39,7 +39,7 @@ diff --git a/docs/LICENSE.html b/docs/LICENSE.html index 7b9ff1d..a4a1ae8 100644 --- a/docs/LICENSE.html +++ b/docs/LICENSE.html @@ -17,7 +17,7 @@ diff --git a/docs/articles/SPGLMs.html b/docs/articles/SPGLMs.html index d2ebaea..ed777b2 100644 --- a/docs/articles/SPGLMs.html +++ b/docs/articles/SPGLMs.html @@ -40,7 +40,7 @@ diff --git a/docs/articles/block.html b/docs/articles/block.html index 75a3637..0ff3f7b 100644 --- a/docs/articles/block.html +++ b/docs/articles/block.html @@ -40,7 +40,7 @@ diff --git a/docs/articles/emmeans.html b/docs/articles/emmeans.html index a1a5b73..596b52f 100644 --- a/docs/articles/emmeans.html +++ b/docs/articles/emmeans.html @@ -40,7 +40,7 @@ diff --git a/docs/articles/guide.html b/docs/articles/guide.html index 73d2589..2833809 100644 --- a/docs/articles/guide.html +++ b/docs/articles/guide.html @@ -40,7 +40,7 @@ diff --git a/docs/articles/index.html b/docs/articles/index.html index 1691d7c..e3ff4bf 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -17,7 +17,7 @@ diff --git a/docs/articles/introduction.html b/docs/articles/introduction.html index fa5eb64..37a1007 100644 --- a/docs/articles/introduction.html +++ b/docs/articles/introduction.html @@ -40,7 +40,7 @@ @@ -147,12 +147,18 @@
There are three more spmodel vignettes available on our
+
There are five more spmodel vignettes available on our
website at https://usepa.github.io/spmodel/:
spmodel
spmodel
+emmeans to Estimate Marginal Means of
+spmodel Objectsspmodel
+Additionally, there are two workbooks that have accompanied recent diff --git a/docs/articles/technical.html b/docs/articles/technical.html index 71d0404..89d8f97 100644 --- a/docs/articles/technical.html +++ b/docs/articles/technical.html @@ -40,7 +40,7 @@ @@ -3461,13 +3461,15 @@
esv())
+esv()
+and eacf())
The empirical semivariogram is a moment-based estimate of the
theoretical semivariogram. The empirical semivariogram quantifies half
of the average squared difference in the response among observations in
-several distance classes. More formally, the empirical semivariogram is
-defined as \[\begin{equation}\label{eq:esv}
+several distance classes. More formally, the empirical semivariogram,
+\(\hat{\gamma}(h)\), is defined as
+\[\begin{equation}\label{eq:esv}
\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{N(h)} (y_i - y_j)^2,
\end{equation}\] where \(N(h)\)
is the set of observations in \(\mathbf{y}\) that are \(h\) distance apart (distance classes) and
@@ -3481,11 +3483,29 @@ The Empirical Semivariogram (
because there tend to be
few pairs with large distances. The default in esv())\(\hat{\gamma}(h)\)spmodel is
to use a cutoff of half the maximum distance (hypotenuse) of the
-domain’s bounding box.
The main purpose of the empirical semivariogram is its use in semivariogram weighted least squares estimation for spatial linear models, though it can also be used as a visual diagnostic to assess the fit of a spatial covariance function.
+The empirical semivariogram is closely related to the empirical
+autocovariance (Zimmerman and Ver Hoef
+2024). The empirical autocovariance quantifies the average
+covariance among observations in several distance classes. Using similar
+notation as for the empirical semivariogram, \[\begin{equation}\label{eq:eacf}
+ \hat{eacf}(h) = \frac{1}{|N(h)|} \sum_{N(h)} (y_i - \hat{y})(y_j -
+\hat{y}),
+\end{equation}\] where \(\hat{y} =
+\bar{y}\), the sample mean of all the observations. Like with
+esv(), the default is to use 15 equally spaced bins with a
+distance cutoff of half the maximum distance (hypotenuse) of the
+domain’s bounding box. Because we are typically interested in
+characterizing residual autocovariance, \(y_i
+- \hat{y}\) and \(y_j -
+\hat{y}\) tend to represent residuals from an ordinary least
+squares fit given some fixed effects design matrix, \(\mathbf{X}\), and fixed effects, \(\hat{\boldsymbol{\beta}}\).
NEWS.md
+ eacf() to compute the empirical autocovariance function.size argument to the local argument in predict(..., block = TRUE) and augment(..., block = TRUE) from 1000 to 4000. This enhances the block prediction (i.e., Kriging) approximation’s accuracy but can slightly increase computational complexity.newdata objects.xcoord and ycoord are ignored when data is an sf object.Compute the empirical autocovariance (i.e., empirical covariance) +for varying bin sizes and cutoff values.
+eacf(
+ formula,
+ data,
+ xcoord,
+ ycoord,
+ cloud = FALSE,
+ bins = 15,
+ cutoff,
+ dist_matrix,
+ partition_factor
+)
+
+# S3 method for eacf
+plot(x, ...)A formula describing the fixed effect structure.
A data frame or sf object containing the variables in formula
+and geographic information.
Name of the variable in data representing the x-coordinate.
+Can be quoted or unquoted. Not required if data is an sf object.
Name of the variable in data representing the y-coordinate.
+Can be quoted or unquoted. Not required if data is an sf object.
A logical indicating whether the empirical autocovariance should
+be summarized by distance class or not. When cloud = FALSE (the default), pairwise autocovariances
+are binned and averaged within distance classes. When cloud = TRUE,
+all pairwise autocovariances and distances are returned (this is known as
+the "cloud" autocovariance).
The number of equally spaced bins. The default is 15. Ignored if
+cloud = TRUE.
The maximum distance considered. +The default is half the diagonal of the bounding box from the coordinates.
A distance matrix to be used instead of providing coordinate names.
An optional formula specifying the partition factor. +If specified, autocovariances are only computed for observations sharing the +same level of the partition factor.
An object from eacf().
Other arguments passed to other methods.
If cloud = FALSE, a tibble (data.frame) with distance bins
+(bins), the average distance (dist), the average autocovariance (acov), and the
+number of (unique) pairs (np). If cloud = TRUE, a tibble
+(data.frame) with distance (dist) and autocovariance (acov)
+for each unique pair.
The empirical autocovariance (i.e., empirical covariance) is a tool used to visualize and model
+spatial dependence by estimating the semivariance of a process at varying distances.
+For a constant-mean process, the
+autocovariance at distance \(h\) is denoted \(Cov(h)\) and defined as
+\(Cov(z1, z2)\). Under second-order stationarity,
+\(Cov(h) = Cov(0) - \gamma(h)\), where \(gamma(h)\) is the semivariance function at distance h. Typically the residuals from an ordinary
+least squares fit defined by formula are second-order stationary with
+mean zero. These residuals are used to compute the empirical autocovariance
+At a distance h, the empirical autocovariance is
+\(1/N(h) \sum (r1 \times r2)\), where \(N(h)\) is the number of (unique)
+pairs in the set of observations whose distance separation is h and
+r1 and r2 are residuals corresponding to observations whose
+distance separation is h. In spmodel, these distance bins actually
+contain observations whose distance separation is h +- c,
+where c is a constant determined implicitly by bins. Typically,
+only observations whose distance separation is below some cutoff are used
+to compute the empirical semivariogram (this cutoff is determined by cutoff).
eacf(sulfate ~ 1, sulfate)
+#> # A tibble: 15 × 4
+#> bins dist acov np
+#> * <fct> <dbl> <dbl> <dbl>
+#> 1 (0,1.5e+05] 103340. 93.2 149
+#> 2 (1.5e+05,3.01e+05] 232014. 86.4 456
+#> 3 (3.01e+05,4.51e+05] 379255. 75.7 749
+#> 4 (4.51e+05,6.02e+05] 529543. 70.0 887
+#> 5 (6.02e+05,7.52e+05] 677949. 62.3 918
+#> 6 (7.52e+05,9.03e+05] 826917. 54.8 1113
+#> 7 (9.03e+05,1.05e+06] 978773. 47.6 1161
+#> 8 (1.05e+06,1.2e+06] 1127232. 35.0 1230
+#> 9 (1.2e+06,1.35e+06] 1275415. 24.8 1239
+#> 10 (1.35e+06,1.5e+06] 1429184. 8.00 1236
+#> 11 (1.5e+06,1.65e+06] 1577636. -2.08 1139
+#> 12 (1.65e+06,1.81e+06] 1729098. -15.3 1047
+#> 13 (1.81e+06,1.96e+06] 1879679. -22.2 934
+#> 14 (1.96e+06,2.11e+06] 2029566. -32.9 842
+#> 15 (2.11e+06,2.26e+06] 2181337. -39.8 788
+plot(eacf(sulfate ~ 1, sulfate))
+
+dispersion_params()
Create a dispersion parameter object
Compute the empirical autocovariance
If block is TRUE, local accepts method and size,
and method takes values of "all", "covariance",
and "distance", similar as when block is FALSE.
-The default method is "distance" with size 1000. This default size is
+The default method is "covariance" with size 4000. This default size is
much larger than when block is FALSE. This is because when block is TRUE,
covariances and explanatory variables are averaged before prediction, which greatly
reduces computational burden, only requiring the Cholesky decomposition
@@ -418,6 +418,7 @@