3214

pending completion

Build # 3214

Build Type

push

travis-ci-com

Committed by

chschan

Commit Message

Untracked: add dependency for car@3.1-0

Run Details

1039 of 1117 relevant lines covered (93.02%)

479.93 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

95.0

/R/onewayanova.R

#' \code{OneWayANOVA}
#'
#' Computes a one-way analysis of variance with post hoc tests.
#' @param outcome The outcome variable.
#' @param predictor The factor representing the groups.
#' @param subset An optional vector specifying a subset of observations to be
#'   used in the fitting process, or, the name of a variable in \code{data}. It
#'   may not be an expression. \code{subset} may not
#' @param weights An optional vector of sampling weights, or, the name or, the
#'   name of a variable in \code{data}. It may not be an expression.
#' @param compare One of \code{"To mean", "Pairwise", "To first"} (which implement's Dunnett's C, when
#' combined with 'correction' == 'Tukey Range'), or \code{"All"}
#' @param correction The multiple comparison adjustment method: \code{"Tukey Range", "None",
#' "False Discovery Rate", "Benjamini & Yekutieli", "Bonferroni",
#' "Free Combinations"} (Westfall et al. 1999), \code{"Hochberg", "Holm",
#' "Hommel", "Single-step"} (Bretz et al. 2010) \code{"Shaffer"}, and \code{"Westfall"}.
#' @param alternative The alternative hypothesis: "Two sided", "Greater", or "Less". The main application of this is when
#' Compare us set 'To first' (e.g., if testing a new product, where the purpose is to work out of the new product is superior
#' to an existing product, "Greater" would be chosen).
#' @param robust.se If \code{TRUE}, computes standard errors that are robust to violations of
#'   the assumption of constant variance for linear and Poisson models, using the HC3 modification of White's (1980) estimator
#'   (Long and Ervin, 2000). This parameter is ignored if weights are applied (as weights already
#'   employ a sandwich estimator). Other options are \code{FALSE} and \code{"FALSE"No}, which do the same
#'   thing, and \code{"hc0"}, \code{"hc1"}, \code{"hc2"}, \code{"hc4"}.
#' @param missing How missing data is to be treated in the ANOVA. Options:
#'   \code{"Error if missing data"}.
#'   \code{"Exclude cases with missing data"}, and
#'   \code{"Imputation (replace missing values with estimates)"}.
#' @param show.labels Shows the variable labels, as opposed to the labels, in the outputs, where a
#' variables label is an attribute (e.g., attr(foo, "label")).
#' @param outcome.name The name of the outcome  variable. Only used when \code{show.labels} is FALSE. Defaults to
#' the actual variable name, which does not work so well if the function is being called by another function.
#' @param predictor.name The name of the predictor variable. Only used when \code{show.labels} is FALSE. Defaults to
#' the actual variable name, which does not work so well if the function is being called by another function.
#' @param p.cutoff The alpha level to be used in testing.
#' @param seed The random number seed used when evaluating the multivariate t-distribution.
#' @param return.all If \code{TRUE}, returns all the internal computations in the output object. If \code{FALSE},
#' returns just the information required to print the output.
#' @param ... Other parameters to be passed to wrapped functions.
#' @details When 'Tukey Range' is selected, p-values are computed using t'tests, with a correction for the family-wise
#' error rate such that the p-values are correct for the largest range of values being compared (i.e.,
#' the biggest difference between the smallest and largest means). This is a single-step test. The method
#' of calculation is valid for both balanced and unbalanced samples (Bretz et al. 2011), and consequently the results
#' may differ for unbalanced samples to those that appear in most software and books (which instead employee an approximation
#' when the samples are unbalanced).
#'
#' When \code{missing = "Imputation (replace missing values with estimates)"}, all selected
#' outcome and predictor variables are included in the imputation, along with
#' all \code{auxiliary.data}, excluding cases that are excluded via subset or
#' have invalid weights, but including cases with missing values of the outcome variable.
#' Then, cases with missing values in the outcome variable are excluded from
#' the analysis (von Hippel 2007). See \code{\link[flipImputation]{Imputation}}.
#'
#' @references
#' Bretz,Frank, Torsten Hothorn and Peter Westfall (2011), Multiple Comparisons Using R, CRC Press, Boca Raton.
#' Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.
#' Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.
#' Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.
#' Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-803.
#' Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.
#' Hothorn, Torsten, Frank Bretz and Peter Westfall (2008), Simultaneous Inference in General Parametric Models. Biometrical Journal, 50(3), 346-363.
#' Long, J. S. and Ervin, L. H. (2000). Using
#' heteroscedasticity consistent standard errors in the linear regression
#' model. The American Statistician, 54(3): 217-224.
#' Shaffer, Juliet P. (1986), Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.
#' Shaffer, Juliet P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561-576.
#' Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.
#' Sarkar, S., and Chang, C. K. (1997). Simes' method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92, 1601-1608.
#' Tukey, John (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics. 5 (2): 99-114.
#' Peter H. Westfall (1997), Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association, 92, 299-306.
#' P. H. Westfall, R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg (1999). Multiple Comparisons and Multiple Tests Using the SAS System. Cary, NC: SAS Institute Inc.
#' von Hippel, Paul T. 2007. "Regression With Missing Y's: An
#' Improved Strategy for Analyzing Multiply Imputed Data." Sociological
#' Methodology 37:83-117.
#' Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1013.
#' White, H. (1980), A heteroskedastic-consistent covariance matrix estimator and a direct test of heteroskedasticity.
#' Econometrica, 48, 817-838.
#' @importFrom flipData Observed
#' @importFrom flipFormat Labels FormatAsReal FormatAsPValue RegressionTable OriginalName
#' @import flipRegression
# #Regression GrandMean vcov.Regression
#' @importFrom flipStatistics Frequency
#' @importFrom flipTransformations AsNumeric Factor ProcessQVariables
#' @importFrom flipRegression PValueAdjustFDR
#' @importFrom multcomp glht mcp adjusted
#' @importFrom survey regTermTest
#' @importFrom stats aov pf vcov ptukey
#' @importFrom verbs Sum
#' @export
OneWayANOVA <- function(outcome,
                        predictor,
                        subset = NULL,
                        weights = NULL,
                        compare = "Pairwise",
                        correction = "Tukey Range",
                        alternative = "Two-sided",
                        robust.se = FALSE,
                        missing = "Exclude cases with missing data",
                        show.labels = TRUE,
                        outcome.name = NULL,
                        predictor.name = NULL,
                        p.cutoff = 0.05,
                        seed = 1223,
                        return.all = FALSE,
                        ...)
{
    outcome <- ProcessQVariables(outcome)
    predictor <- ProcessQVariables(predictor)
    ## DS-2353 Ensure vector of unity weights treated same as NULL weights
    if (!is.null(weights) && all(weights == 1))
        weights <- NULL

    if (robust.se == "No")
        robust.se <- FALSE
    .multcompSummary <- function(comparisons, correct, robust.se)
    {
        if (correct == "Tukey Range")
        {
            if (compare == "Pairwise" && comparisons$alternative == "two.sided" && !robust.se)
            {
                res <- summary(comparisons, test = adjusted(type = "none"))

                # Overwrite pvalues with adjusted
                compare <- attr(comparisons$linfct, "type")
                MSE <- Sum(comparisons$model$residuals^2, remove.missing = FALSE) / comparisons$model$df.residual
                means <- comparisons$model$coefficients
                means[-1] <- means[-1] + means[1]
                counts <- table(comparisons$model$model$predictor)

                center <- outer(means, means, "-")
                keep <- lower.tri(center)
                center <- center[keep]
                est <- center / (sqrt((MSE/2) * outer(1/counts, 1/counts, "+"))[keep])
                pvals <- ptukey(abs(est), length(means), comparisons$model$df.residual, lower.tail = FALSE)
                res$test$pvalues <- pvals
                return(res)
            }
            res <- tryCatch({summary(comparisons)},
                            warning = function(w) {warning("Numerical precision of p-value calcuations exceeds threshold. ",
                                                           "Treat p-values with caution.")
                                suppressWarnings(summary(comparisons))})
            return(res)
        }
        if (correct == "False Discovery Rate" || correct == "fdr")
        {
            res <- summary(comparisons, test = adjusted(type = "none"))
            res$test$pvalues <- PValueAdjustFDR(res$test$pvalues)
            return(res)
        }
        return(summary(comparisons, test = adjusted(type = correction)))
    }
    if (is.null(outcome.name))
        outcome.name <- OriginalName(outcome)
    if (is.null(predictor.name))
        predictor.name <- OriginalName(predictor)
    subset.description <- try(deparse(substitute(subset)), silent = TRUE) #We don't know whether subset is a variable in the environment or in data.
    correct <- correction
    no.correction <- correction == "None"
    alt <- alternative
    alternative <- switch(alternative, "Two-sided" = "two.sided", "Greater" = "greater", "Less" = "less")
    correction <- switch(correction, "Holm" = "holm", "Hochberg" = "hochberg", "Hommel" = "hommel", "Bonferroni" = "bonferroni",
                         "Benjamini & Yekutieli" = "BY","False Discovery Rate" = "fdr", "None" = "none", "Single-step" = "single-step",
                         "Shaffer" = "Shaffer", "Westfall" = "Westfall", "Free Combinations" = "free",
                         "Tukey Range" = "none", "Dunnett" = "none")
    predictor.label <- if (show.labels) Labels(predictor) else predictor.name
    predictor <- tidyFactor(predictor)
    #outcome.label = if (show.labels & !is.null(attr(outcome, "label"))) attr(outcome, "label") else outcome.name
    outcome.label = if (show.labels) Labels(outcome) else outcome.name
    # Should be removed when hooking up GLM
    outcome <- if (is.factor(outcome)) AsNumeric(outcome, binary = FALSE) else outcome
    regression <- Regression(outcome ~ predictor,
                             subset = subset,
                             missing = missing,
                             weights = weights,
                             type = if (is.null(list(...)$type)) "Linear" else list(...)$type,
                             robust.se = robust.se,
                             internal = TRUE)
    model <- regression$original
    robust.se <- regression$robust.se
    contrasts <- mcp(predictor = switch(compare, "Pairwise" = "Tukey", "To mean" = "GrandMean", "To first" = "Dunnett"))
    vcov <- try(vcov(regression, robust.se), silent = TRUE)
    if (!tryError(vcov))
    {
        comparisons <- glht(model, linfct = contrasts, alternative = alternative, vcov = vcov)
        set.seed(seed)
        mcomp <- try(.multcompSummary(comparisons, correct, robust.se), silent = TRUE)
    }
    if (tryError(vcov) || tryError(mcomp))
    {
        if(robust.se)
        {
            warning("Robust standard errors have not been implemented for this model.")
            mcomp <- .multcompSummary(glht(model, linfct = contrasts, alternative = alternative), correct, FALSE)
        }
        else if (!is.null(weights))
            stop("Weights cannot be used with this model. Remove weights to compute the unweighted model.")
    }
    f.test <- FTest(regression)
    sub <- if (is.null(subset)) rep(TRUE, length(predictor)) else subset
    if (!is.null(weights))
        sub <- sub & weights > 0
    predictor.n <- Frequency(predictor, sub)
    predictor.n <- predictor.n[predictor.n > 0]
    result <- c(f.test,
                list(original = mcomp,
                     robust.se = robust.se,
                     grand.mean = GrandMean(regression),
                     correction = correct,
                     outcome.label = outcome.label,
                     r.squared = rSquared(regression),
                     p.cutoff = p.cutoff,
                     compare = compare,
                     column.names = names(predictor.n),
                     n = predictor.n))
    # Headers, subtitles, footers
    mc.correction <- paste0("; multiple comparisons correction: ", correct)
    alpha <- paste0("null hypotheses: ", tolower(alt))
    robust.se.text <- if (robust.se == FALSE) "" else
        paste0("; heteroscedasticity-robust standard errors (", if(robust.se == TRUE) "hc3" else robust.se, ");")
    result$posthoc <- paste0(alpha, mc.correction, robust.se.text)
    result$subtitle <- if (is.na(f.test$p)) "Error computing p-value" else paste0(if (f.test$p <= p.cutoff) "Significant" else "Not significant",
                                                                                  ": F: ", FormatAsReal(f.test$Ftest, 4),
                                                                                  " on ", f.test$df, " and ", f.test$ddf, " degrees-of-freedom; p: ", FormatAsPValue(f.test$p),
                                                                                  "; R-squared: ", FormatAsReal(result$r.squared, 4))
    result$title <- paste0("One-way ANOVA: ", outcome.label, " by ", predictor.label)
    result$footer <- paste0(regression$sample.description, result$posthoc)
    r <- result$original$test
    if (no.correction & compare == "To first")
    {   # Using more precise results from Regression rather than glht
        rcoefs <- summary(model)$coef[-1, , drop = FALSE]
        k <- nrow(rcoefs)
        r$coefficients[1:k] <- rcoefs[1:k, 1] # Retaining the labels.
        r$sigma <- rcoefs[, 2]
        if (Sum(abs(r$tstat - rcoefs[, 3]) > .1, remove.missing = FALSE))
        {
            print(rbind(svyglm = r$tstat, multcomp = rcoefs[, 3]))
            stop("Unreliable inference in one-way ANOVA.")
        }
        r$tstat <- rcoefs[, 3]
        r$pvalues <- rcoefs[, 4]
    }
    coefs <- r$coefficients + if (compare == "To mean") result$grand.mean else 0
    #coef.names <- gsub("predictor", "", names(model$coef)[-1])
    coefs <- cbind(coefs, r$sigma, r$tstat, r$pvalues)
    colnames(coefs) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|))")
    if (compare == "To mean")
    {
        tbl <- table(predictor[regression$subset])
        nms <- names(tbl)[tbl > 0]
        rownames(coefs) <- nms
        colnames(coefs)[1] <- "Mean"
    }
    else
        colnames(coefs)[1] <- "Difference"
    if (!no.correction)
        colnames(coefs)[4] <- "Corrected p"
    result$coefs <- coefs
    # Creating the final outputs.
    is.t <- result$original$df > 0
    estimate.name <- if (result$compare == "To mean") "Estimate" else "Difference"
    p.name <- if(result$correction == "NONE")
        "<span style='font-style:italic;'>p</span>"
    else
        "Corrected <span style='font-style:italic;'>p</span>"
    result$table <- RegressionTable(result$coefs,
                                    title = result$title,
                                    subtitle = result$subtitle,
                                    footer = result$footer,
                                    estimate.name = estimate.name,
                                    se.name = if (result$robust.se) "Robust SE" else "Standard Error",
                                    p.name = p.name,
                                    p.cutoff = result$p.cutoff)
    coefs <- result$coefs
    if (!return.all)
        result <- list(table = result$table)
    attr(result, "ChartData") <- coefs
    class(result) <- "OneWayANOVA"
    result
}

#' rSquared
#'
#' @param  object A \link{FitRegression} object.
#' @import flipRegression
#' @importFrom flipData Observed
#' @importFrom flipStatistics Correlation
#' @importFrom stats predict
rSquared <- function(object)
{
    predicted <- predict(object$original)
    observed <- Observed(object)
    cor <- Correlation(predicted, observed, object$weights)
    cor * cor
}


tryError <- function(x)
    inherits(x, "try-error")

#' politeWeightedFTest
#'
#' Computes \link{regTermTest}, but fails savely when an error occurs.
#' @param fit A FitRegression object.
#' @importFrom flipU OutcomeName
#' @importFrom survey regTermTest
politeWeightedFTest <- function(fit)
{
    test <- suppressWarnings(try(regTermTest(fit$original, "predictor"), silent = TRUE))
    if(tryError(test))
    {
        warning("Unable to compute weighted F-test due to a technical problem with the underlying computations. This error may disappear if you remove the weight.")
        return(list(Ftest = NA, df = NA, ddf = NA, p = NA))
    }
    test
}

#' \code{FTest}
#'
#' An F-Test from a \code{lm} or linear \code{svyglm} object.
#' @param fit An object created by \link{FitRegression}.
#' @importFrom stats aov
#' @export
FTest <- function(fit)
{
    if (!is.null(fit$weights))
        return(politeWeightedFTest(fit))
    aovFTest(fit$original)
}

aovFTest <- function(model)
{
    f.test <- summary(aov(model))[[1]]
    list(Ftest = f.test[1, 4],
         df = f.test[1, 1],
         ddf = f.test[2, 1],
         p = f.test[1, 5])
}

#' @export
print.OneWayANOVA <- function(x, ...)
{
    print(x$table)
}


1	#' \code{OneWayANOVA}
2	#'
3	#' Computes a one-way analysis of variance with post hoc tests.
4	#' @param outcome The outcome variable.
5	#' @param predictor The factor representing the groups.
6	#' @param subset An optional vector specifying a subset of observations to be
7	#' used in the fitting process, or, the name of a variable in \code{data}. It
8	#' may not be an expression. \code{subset} may not
9	#' @param weights An optional vector of sampling weights, or, the name or, the
10	#' name of a variable in \code{data}. It may not be an expression.
11	#' @param compare One of \code{"To mean", "Pairwise", "To first"} (which implement's Dunnett's C, when
12	#' combined with 'correction' == 'Tukey Range'), or \code{"All"}
13	#' @param correction The multiple comparison adjustment method: \code{"Tukey Range", "None",
14	#' "False Discovery Rate", "Benjamini & Yekutieli", "Bonferroni",
15	#' "Free Combinations"} (Westfall et al. 1999), \code{"Hochberg", "Holm",
16	#' "Hommel", "Single-step"} (Bretz et al. 2010) \code{"Shaffer"}, and \code{"Westfall"}.
17	#' @param alternative The alternative hypothesis: "Two sided", "Greater", or "Less". The main application of this is when
18	#' Compare us set 'To first' (e.g., if testing a new product, where the purpose is to work out of the new product is superior
19	#' to an existing product, "Greater" would be chosen).
20	#' @param robust.se If \code{TRUE}, computes standard errors that are robust to violations of
21	#' the assumption of constant variance for linear and Poisson models, using the HC3 modification of White's (1980) estimator
22	#' (Long and Ervin, 2000). This parameter is ignored if weights are applied (as weights already
23	#' employ a sandwich estimator). Other options are \code{FALSE} and \code{"FALSE"No}, which do the same
24	#' thing, and \code{"hc0"}, \code{"hc1"}, \code{"hc2"}, \code{"hc4"}.
25	#' @param missing How missing data is to be treated in the ANOVA. Options:
26	#' \code{"Error if missing data"}.
27	#' \code{"Exclude cases with missing data"}, and
28	#' \code{"Imputation (replace missing values with estimates)"}.
29	#' @param show.labels Shows the variable labels, as opposed to the labels, in the outputs, where a
30	#' variables label is an attribute (e.g., attr(foo, "label")).
31	#' @param outcome.name The name of the outcome variable. Only used when \code{show.labels} is FALSE. Defaults to
32	#' the actual variable name, which does not work so well if the function is being called by another function.
33	#' @param predictor.name The name of the predictor variable. Only used when \code{show.labels} is FALSE. Defaults to
34	#' the actual variable name, which does not work so well if the function is being called by another function.
35	#' @param p.cutoff The alpha level to be used in testing.
36	#' @param seed The random number seed used when evaluating the multivariate t-distribution.
37	#' @param return.all If \code{TRUE}, returns all the internal computations in the output object. If \code{FALSE},
38	#' returns just the information required to print the output.
39	#' @param ... Other parameters to be passed to wrapped functions.
40	#' @details When 'Tukey Range' is selected, p-values are computed using t'tests, with a correction for the family-wise
41	#' error rate such that the p-values are correct for the largest range of values being compared (i.e.,
42	#' the biggest difference between the smallest and largest means). This is a single-step test. The method
43	#' of calculation is valid for both balanced and unbalanced samples (Bretz et al. 2011), and consequently the results
44	#' may differ for unbalanced samples to those that appear in most software and books (which instead employee an approximation
45	#' when the samples are unbalanced).
46	#'
47	#' When \code{missing = "Imputation (replace missing values with estimates)"}, all selected
48	#' outcome and predictor variables are included in the imputation, along with
49	#' all \code{auxiliary.data}, excluding cases that are excluded via subset or
50	#' have invalid weights, but including cases with missing values of the outcome variable.
51	#' Then, cases with missing values in the outcome variable are excluded from
52	#' the analysis (von Hippel 2007). See \code{\link[flipImputation]{Imputation}}.
53	#'
54	#' @references
55	#' Bretz,Frank, Torsten Hothorn and Peter Westfall (2011), Multiple Comparisons Using R, CRC Press, Boca Raton.
56	#' Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.
57	#' Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.
58	#' Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.
59	#' Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-803.
60	#' Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.
61	#' Hothorn, Torsten, Frank Bretz and Peter Westfall (2008), Simultaneous Inference in General Parametric Models. Biometrical Journal, 50(3), 346-363.
62	#' Long, J. S. and Ervin, L. H. (2000). Using
63	#' heteroscedasticity consistent standard errors in the linear regression
64	#' model. The American Statistician, 54(3): 217-224.
65	#' Shaffer, Juliet P. (1986), Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.
66	#' Shaffer, Juliet P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561-576.
67	#' Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.
68	#' Sarkar, S., and Chang, C. K. (1997). Simes' method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92, 1601-1608.
69	#' Tukey, John (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics. 5 (2): 99-114.
70	#' Peter H. Westfall (1997), Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association, 92, 299-306.
71	#' P. H. Westfall, R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg (1999). Multiple Comparisons and Multiple Tests Using the SAS System. Cary, NC: SAS Institute Inc.
72	#' von Hippel, Paul T. 2007. "Regression With Missing Y's: An
73	#' Improved Strategy for Analyzing Multiply Imputed Data." Sociological
74	#' Methodology 37:83-117.
75	#' Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1013.
76	#' White, H. (1980), A heteroskedastic-consistent covariance matrix estimator and a direct test of heteroskedasticity.
77	#' Econometrica, 48, 817-838.
78	#' @importFrom flipData Observed
79	#' @importFrom flipFormat Labels FormatAsReal FormatAsPValue RegressionTable OriginalName
80	#' @import flipRegression
81	# #Regression GrandMean vcov.Regression
82	#' @importFrom flipStatistics Frequency
83	#' @importFrom flipTransformations AsNumeric Factor ProcessQVariables
84	#' @importFrom flipRegression PValueAdjustFDR
85	#' @importFrom multcomp glht mcp adjusted
86	#' @importFrom survey regTermTest
87	#' @importFrom stats aov pf vcov ptukey
88	#' @importFrom verbs Sum
89	#' @export
90	OneWayANOVA <- function(outcome,
91	predictor,
92	subset = NULL,
93	weights = NULL,
94	compare = "Pairwise",
95	correction = "Tukey Range",
96	alternative = "Two-sided",
97	robust.se = FALSE,
98	missing = "Exclude cases with missing data",
99	show.labels = TRUE,
100	outcome.name = NULL,
101	predictor.name = NULL,
102	p.cutoff = 0.05,
103	seed = 1223,
104	return.all = FALSE,
105	...)
106	{
107	outcome <- ProcessQVariables(outcome)	214✔
108	predictor <- ProcessQVariables(predictor)	214✔
109	## DS-2353 Ensure vector of unity weights treated same as NULL weights
110	if (!is.null(weights) && all(weights == 1))	214✔
111	weights <- NULL	9✔
112
113	if (robust.se == "No")	214✔
114	robust.se <- FALSE	×
115	.multcompSummary <- function(comparisons, correct, robust.se)	214✔
116	{
117	if (correct == "Tukey Range")	211✔
118	{
119	if (compare == "Pairwise" && comparisons$alternative == "two.sided" && !robust.se)	48✔
120	{
121	res <- summary(comparisons, test = adjusted(type = "none"))	9✔
122
123	# Overwrite pvalues with adjusted
124	compare <- attr(comparisons$linfct, "type")	9✔
125	MSE <- Sum(comparisons$model$residuals^2, remove.missing = FALSE) / comparisons$model$df.residual	9✔
126	means <- comparisons$model$coefficients	9✔
127	means[-1] <- means[-1] + means[1]	9✔
128	counts <- table(comparisons$model$model$predictor)	9✔
129
130	center <- outer(means, means, "-")	9✔
131	keep <- lower.tri(center)	9✔
132	center <- center[keep]	9✔
133	est <- center / (sqrt((MSE/2) * outer(1/counts, 1/counts, "+"))[keep])	9✔
134	pvals <- ptukey(abs(est), length(means), comparisons$model$df.residual, lower.tail = FALSE)	9✔
135	res$test$pvalues <- pvals	9✔
136	return(res)	9✔
137	}
138	res <- tryCatch({summary(comparisons)},	39✔
139	warning = function(w) {warning("Numerical precision of p-value calcuations exceeds threshold. ",	4✔
140	"Treat p-values with caution.")	4✔
141	suppressWarnings(summary(comparisons))})	4✔
142	return(res)	38✔
143	}
144	if (correct == "False Discovery Rate" \|\| correct == "fdr")	163✔
145	{
146	res <- summary(comparisons, test = adjusted(type = "none"))	1✔
147	res$test$pvalues <- PValueAdjustFDR(res$test$pvalues)	1✔
148	return(res)	1✔
149	}
150	return(summary(comparisons, test = adjusted(type = correction)))	162✔
151	}
152	if (is.null(outcome.name))	214✔
153	outcome.name <- OriginalName(outcome)	212✔
154	if (is.null(predictor.name))	214✔
155	predictor.name <- OriginalName(predictor)	212✔
156	subset.description <- try(deparse(substitute(subset)), silent = TRUE) #We don't know whether subset is a variable in the environment or in data.	214✔
157	correct <- correction	214✔
158	no.correction <- correction == "None"	214✔
159	alt <- alternative	214✔
160	alternative <- switch(alternative, "Two-sided" = "two.sided", "Greater" = "greater", "Less" = "less")	214✔
161	correction <- switch(correction, "Holm" = "holm", "Hochberg" = "hochberg", "Hommel" = "hommel", "Bonferroni" = "bonferroni",	214✔
162	"Benjamini & Yekutieli" = "BY","False Discovery Rate" = "fdr", "None" = "none", "Single-step" = "single-step",	214✔
163	"Shaffer" = "Shaffer", "Westfall" = "Westfall", "Free Combinations" = "free",	214✔
164	"Tukey Range" = "none", "Dunnett" = "none")	214✔
165	predictor.label <- if (show.labels) Labels(predictor) else predictor.name	214✔
166	predictor <- tidyFactor(predictor)	214✔
167	#outcome.label = if (show.labels & !is.null(attr(outcome, "label"))) attr(outcome, "label") else outcome.name
168	outcome.label = if (show.labels) Labels(outcome) else outcome.name	214✔
169	# Should be removed when hooking up GLM
170	outcome <- if (is.factor(outcome)) AsNumeric(outcome, binary = FALSE) else outcome	214✔
171	regression <- Regression(outcome ~ predictor,	214✔
172	subset = subset,	214✔
173	missing = missing,	214✔
174	weights = weights,	214✔
175	type = if (is.null(list(...)$type)) "Linear" else list(...)$type,	214✔
176	robust.se = robust.se,	214✔
177	internal = TRUE)	214✔
178	model <- regression$original	210✔
179	robust.se <- regression$robust.se	210✔
180	contrasts <- mcp(predictor = switch(compare, "Pairwise" = "Tukey", "To mean" = "GrandMean", "To first" = "Dunnett"))	210✔
181	vcov <- try(vcov(regression, robust.se), silent = TRUE)	210✔
182	if (!tryError(vcov))	210✔
183	{
184	comparisons <- glht(model, linfct = contrasts, alternative = alternative, vcov = vcov)	210✔
185	set.seed(seed)	210✔
186	mcomp <- try(.multcompSummary(comparisons, correct, robust.se), silent = TRUE)	210✔
187	}
188	if (tryError(vcov) \|\| tryError(mcomp))	210✔
189	{
190	if(robust.se)	1✔
191	{
192	warning("Robust standard errors have not been implemented for this model.")	1✔
193	mcomp <- .multcompSummary(glht(model, linfct = contrasts, alternative = alternative), correct, FALSE)	1✔
194	}
195	else if (!is.null(weights))	×
196	stop("Weights cannot be used with this model. Remove weights to compute the unweighted model.")	×
197	}
198	f.test <- FTest(regression)	210✔
199	sub <- if (is.null(subset)) rep(TRUE, length(predictor)) else subset	210✔
200	if (!is.null(weights))	210✔
201	sub <- sub & weights > 0	10✔
202	predictor.n <- Frequency(predictor, sub)	210✔
203	predictor.n <- predictor.n[predictor.n > 0]	210✔
204	result <- c(f.test,	210✔
205	list(original = mcomp,	210✔
206	robust.se = robust.se,	210✔
207	grand.mean = GrandMean(regression),	210✔
208	correction = correct,	210✔
209	outcome.label = outcome.label,	210✔
210	r.squared = rSquared(regression),	210✔
211	p.cutoff = p.cutoff,	210✔
212	compare = compare,	210✔
213	column.names = names(predictor.n),	210✔
214	n = predictor.n))	210✔
215	# Headers, subtitles, footers
216	mc.correction <- paste0("; multiple comparisons correction: ", correct)	210✔
217	alpha <- paste0("null hypotheses: ", tolower(alt))	210✔
218	robust.se.text <- if (robust.se == FALSE) "" else	210✔
219	paste0("; heteroscedasticity-robust standard errors (", if(robust.se == TRUE) "hc3" else robust.se, ");")	210✔
220	result$posthoc <- paste0(alpha, mc.correction, robust.se.text)	210✔
221	result$subtitle <- if (is.na(f.test$p)) "Error computing p-value" else paste0(if (f.test$p <= p.cutoff) "Significant" else "Not significant",	210✔
222	": F: ", FormatAsReal(f.test$Ftest, 4),	210✔
223	" on ", f.test$df, " and ", f.test$ddf, " degrees-of-freedom; p: ", FormatAsPValue(f.test$p),	210✔
224	"; R-squared: ", FormatAsReal(result$r.squared, 4))	210✔
225	result$title <- paste0("One-way ANOVA: ", outcome.label, " by ", predictor.label)	210✔
226	result$footer <- paste0(regression$sample.description, result$posthoc)	210✔
227	r <- result$original$test	210✔
228	if (no.correction & compare == "To first")	210✔
229	{ # Using more precise results from Regression rather than glht
230	rcoefs <- summary(model)$coef[-1, , drop = FALSE]	5✔
231	k <- nrow(rcoefs)	5✔
232	r$coefficients[1:k] <- rcoefs[1:k, 1] # Retaining the labels.	5✔
233	r$sigma <- rcoefs[, 2]	5✔
234	if (Sum(abs(r$tstat - rcoefs[, 3]) > .1, remove.missing = FALSE))	5✔
235	{
236	print(rbind(svyglm = r$tstat, multcomp = rcoefs[, 3]))	×
237	stop("Unreliable inference in one-way ANOVA.")	×
238	}
239	r$tstat <- rcoefs[, 3]	5✔
240	r$pvalues <- rcoefs[, 4]	5✔
241	}
242	coefs <- r$coefficients + if (compare == "To mean") result$grand.mean else 0	210✔
243	#coef.names <- gsub("predictor", "", names(model$coef)[-1])
244	coefs <- cbind(coefs, r$sigma, r$tstat, r$pvalues)	210✔
245	colnames(coefs) <- c("Estimate", "Std. Error", "t value", "Pr(>\|t\|))")	210✔
246	if (compare == "To mean")	210✔
247	{
248	tbl <- table(predictor[regression$subset])	186✔
249	nms <- names(tbl)[tbl > 0]	186✔
250	rownames(coefs) <- nms	186✔
251	colnames(coefs)[1] <- "Mean"	186✔
252	}
253	else
254	colnames(coefs)[1] <- "Difference"	24✔
255	if (!no.correction)	210✔
256	colnames(coefs)[4] <- "Corrected p"	49✔
257	result$coefs <- coefs	210✔
258	# Creating the final outputs.
259	is.t <- result$original$df > 0	210✔
260	estimate.name <- if (result$compare == "To mean") "Estimate" else "Difference"	210✔
261	p.name <- if(result$correction == "NONE")	210✔
262	"<span style='font-style:italic;'>p</span>"	210✔
263	else
264	"Corrected <span style='font-style:italic;'>p</span>"	210✔
265	result$table <- RegressionTable(result$coefs,	210✔
266	title = result$title,	210✔
267	subtitle = result$subtitle,	210✔
268	footer = result$footer,	210✔
269	estimate.name = estimate.name,	210✔
270	se.name = if (result$robust.se) "Robust SE" else "Standard Error",	210✔
271	p.name = p.name,	210✔
272	p.cutoff = result$p.cutoff)	210✔
273	coefs <- result$coefs	210✔
274	if (!return.all)	210✔
275	result <- list(table = result$table)	16✔
276	attr(result, "ChartData") <- coefs	210✔
277	class(result) <- "OneWayANOVA"	210✔
278	result	210✔
279	}
280
281	#' rSquared
282	#'
283	#' @param object A \link{FitRegression} object.
284	#' @import flipRegression
285	#' @importFrom flipData Observed
286	#' @importFrom flipStatistics Correlation
287	#' @importFrom stats predict
288	rSquared <- function(object)
289	{
290	predicted <- predict(object$original)	210✔
291	observed <- Observed(object)	210✔
292	cor <- Correlation(predicted, observed, object$weights)	210✔
293	cor * cor	210✔
294	}
295
296
297	tryError <- function(x)	784✔
298	inherits(x, "try-error")	784✔
299
300	#' politeWeightedFTest
301	#'
302	#' Computes \link{regTermTest}, but fails savely when an error occurs.
303	#' @param fit A FitRegression object.
304	#' @importFrom flipU OutcomeName
305	#' @importFrom survey regTermTest
306	politeWeightedFTest <- function(fit)
307	{
308	test <- suppressWarnings(try(regTermTest(fit$original, "predictor"), silent = TRUE))	10✔
309	if(tryError(test))	10✔
310	{
311	warning("Unable to compute weighted F-test due to a technical problem with the underlying computations. This error may disappear if you remove the weight.")	×
312	return(list(Ftest = NA, df = NA, ddf = NA, p = NA))	×
313	}
314	test	10✔
315	}
316
317	#' \code{FTest}
318	#'
319	#' An F-Test from a \code{lm} or linear \code{svyglm} object.
320	#' @param fit An object created by \link{FitRegression}.
321	#' @importFrom stats aov
322	#' @export
323	FTest <- function(fit)
324	{
325	if (!is.null(fit$weights))	210✔
326	return(politeWeightedFTest(fit))	10✔
327	aovFTest(fit$original)	200✔
328	}
329
330	aovFTest <- function(model)
331	{
332	f.test <- summary(aov(model))[[1]]	200✔
333	list(Ftest = f.test[1, 4],	200✔
334	df = f.test[1, 1],	200✔
335	ddf = f.test[2, 1],	200✔
336	p = f.test[1, 5])	200✔
337	}
338
339	#' @export
340	print.OneWayANOVA <- function(x, ...)
341	{
342	print(x$table)	×
343	}
344

Displayr / flipAnalysisOfVariance / 3214

Source File Press 'n' to go to next uncovered line, 'b' for previous

Source File
Press 'n' to go to next uncovered line, 'b' for previous