• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

Displayr / flipAnalysisOfVariance / 3214

pending completion
3214

push

travis-ci-com

chschan
Untracked: add dependency for car@3.1-0

1039 of 1117 relevant lines covered (93.02%)

479.93 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

95.0
/R/onewayanova.R
1
#' \code{OneWayANOVA}
2
#'
3
#' Computes a one-way analysis of variance with post hoc tests.
4
#' @param outcome The outcome variable.
5
#' @param predictor The factor representing the groups.
6
#' @param subset An optional vector specifying a subset of observations to be
7
#'   used in the fitting process, or, the name of a variable in \code{data}. It
8
#'   may not be an expression. \code{subset} may not
9
#' @param weights An optional vector of sampling weights, or, the name or, the
10
#'   name of a variable in \code{data}. It may not be an expression.
11
#' @param compare One of \code{"To mean", "Pairwise", "To first"} (which implement's Dunnett's C, when
12
#' combined with 'correction' == 'Tukey Range'), or \code{"All"}
13
#' @param correction The multiple comparison adjustment method: \code{"Tukey Range", "None",
14
#' "False Discovery Rate", "Benjamini & Yekutieli", "Bonferroni",
15
#' "Free Combinations"} (Westfall et al. 1999), \code{"Hochberg", "Holm",
16
#' "Hommel", "Single-step"} (Bretz et al. 2010) \code{"Shaffer"}, and \code{"Westfall"}.
17
#' @param alternative The alternative hypothesis: "Two sided", "Greater", or "Less". The main application of this is when
18
#' Compare us set 'To first' (e.g., if testing a new product, where the purpose is to work out of the new product is superior
19
#' to an existing product, "Greater" would be chosen).
20
#' @param robust.se If \code{TRUE}, computes standard errors that are robust to violations of
21
#'   the assumption of constant variance for linear and Poisson models, using the HC3 modification of White's (1980) estimator
22
#'   (Long and Ervin, 2000). This parameter is ignored if weights are applied (as weights already
23
#'   employ a sandwich estimator). Other options are \code{FALSE} and \code{"FALSE"No}, which do the same
24
#'   thing, and \code{"hc0"}, \code{"hc1"}, \code{"hc2"}, \code{"hc4"}.
25
#' @param missing How missing data is to be treated in the ANOVA. Options:
26
#'   \code{"Error if missing data"}.
27
#'   \code{"Exclude cases with missing data"}, and
28
#'   \code{"Imputation (replace missing values with estimates)"}.
29
#' @param show.labels Shows the variable labels, as opposed to the labels, in the outputs, where a
30
#' variables label is an attribute (e.g., attr(foo, "label")).
31
#' @param outcome.name The name of the outcome  variable. Only used when \code{show.labels} is FALSE. Defaults to
32
#' the actual variable name, which does not work so well if the function is being called by another function.
33
#' @param predictor.name The name of the predictor variable. Only used when \code{show.labels} is FALSE. Defaults to
34
#' the actual variable name, which does not work so well if the function is being called by another function.
35
#' @param p.cutoff The alpha level to be used in testing.
36
#' @param seed The random number seed used when evaluating the multivariate t-distribution.
37
#' @param return.all If \code{TRUE}, returns all the internal computations in the output object. If \code{FALSE},
38
#' returns just the information required to print the output.
39
#' @param ... Other parameters to be passed to wrapped functions.
40
#' @details When 'Tukey Range' is selected, p-values are computed using t'tests, with a correction for the family-wise
41
#' error rate such that the p-values are correct for the largest range of values being compared (i.e.,
42
#' the biggest difference between the smallest and largest means). This is a single-step test. The method
43
#' of calculation is valid for both balanced and unbalanced samples (Bretz et al. 2011), and consequently the results
44
#' may differ for unbalanced samples to those that appear in most software and books (which instead employee an approximation
45
#' when the samples are unbalanced).
46
#'
47
#' When \code{missing = "Imputation (replace missing values with estimates)"}, all selected
48
#' outcome and predictor variables are included in the imputation, along with
49
#' all \code{auxiliary.data}, excluding cases that are excluded via subset or
50
#' have invalid weights, but including cases with missing values of the outcome variable.
51
#' Then, cases with missing values in the outcome variable are excluded from
52
#' the analysis (von Hippel 2007). See \code{\link[flipImputation]{Imputation}}.
53
#'
54
#' @references
55
#' Bretz,Frank, Torsten Hothorn and Peter Westfall (2011), Multiple Comparisons Using R, CRC Press, Boca Raton.
56
#' Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.
57
#' Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.
58
#' Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.
59
#' Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-803.
60
#' Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.
61
#' Hothorn, Torsten, Frank Bretz and Peter Westfall (2008), Simultaneous Inference in General Parametric Models. Biometrical Journal, 50(3), 346-363.
62
#' Long, J. S. and Ervin, L. H. (2000). Using
63
#' heteroscedasticity consistent standard errors in the linear regression
64
#' model. The American Statistician, 54(3): 217-224.
65
#' Shaffer, Juliet P. (1986), Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.
66
#' Shaffer, Juliet P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561-576.
67
#' Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.
68
#' Sarkar, S., and Chang, C. K. (1997). Simes' method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92, 1601-1608.
69
#' Tukey, John (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics. 5 (2): 99-114.
70
#' Peter H. Westfall (1997), Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association, 92, 299-306.
71
#' P. H. Westfall, R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg (1999). Multiple Comparisons and Multiple Tests Using the SAS System. Cary, NC: SAS Institute Inc.
72
#' von Hippel, Paul T. 2007. "Regression With Missing Y's: An
73
#' Improved Strategy for Analyzing Multiply Imputed Data." Sociological
74
#' Methodology 37:83-117.
75
#' Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1013.
76
#' White, H. (1980), A heteroskedastic-consistent covariance matrix estimator and a direct test of heteroskedasticity.
77
#' Econometrica, 48, 817-838.
78
#' @importFrom flipData Observed
79
#' @importFrom flipFormat Labels FormatAsReal FormatAsPValue RegressionTable OriginalName
80
#' @import flipRegression
81
# #Regression GrandMean vcov.Regression
82
#' @importFrom flipStatistics Frequency
83
#' @importFrom flipTransformations AsNumeric Factor ProcessQVariables
84
#' @importFrom flipRegression PValueAdjustFDR
85
#' @importFrom multcomp glht mcp adjusted
86
#' @importFrom survey regTermTest
87
#' @importFrom stats aov pf vcov ptukey
88
#' @importFrom verbs Sum
89
#' @export
90
OneWayANOVA <- function(outcome,
91
                        predictor,
92
                        subset = NULL,
93
                        weights = NULL,
94
                        compare = "Pairwise",
95
                        correction = "Tukey Range",
96
                        alternative = "Two-sided",
97
                        robust.se = FALSE,
98
                        missing = "Exclude cases with missing data",
99
                        show.labels = TRUE,
100
                        outcome.name = NULL,
101
                        predictor.name = NULL,
102
                        p.cutoff = 0.05,
103
                        seed = 1223,
104
                        return.all = FALSE,
105
                        ...)
106
{
107
    outcome <- ProcessQVariables(outcome)
214✔
108
    predictor <- ProcessQVariables(predictor)
214✔
109
    ## DS-2353 Ensure vector of unity weights treated same as NULL weights
110
    if (!is.null(weights) && all(weights == 1))
214✔
111
        weights <- NULL
9✔
112

113
    if (robust.se == "No")
214✔
114
        robust.se <- FALSE
×
115
    .multcompSummary <- function(comparisons, correct, robust.se)
214✔
116
    {
117
        if (correct == "Tukey Range")
211✔
118
        {
119
            if (compare == "Pairwise" && comparisons$alternative == "two.sided" && !robust.se)
48✔
120
            {
121
                res <- summary(comparisons, test = adjusted(type = "none"))
9✔
122

123
                # Overwrite pvalues with adjusted
124
                compare <- attr(comparisons$linfct, "type")
9✔
125
                MSE <- Sum(comparisons$model$residuals^2, remove.missing = FALSE) / comparisons$model$df.residual
9✔
126
                means <- comparisons$model$coefficients
9✔
127
                means[-1] <- means[-1] + means[1]
9✔
128
                counts <- table(comparisons$model$model$predictor)
9✔
129

130
                center <- outer(means, means, "-")
9✔
131
                keep <- lower.tri(center)
9✔
132
                center <- center[keep]
9✔
133
                est <- center / (sqrt((MSE/2) * outer(1/counts, 1/counts, "+"))[keep])
9✔
134
                pvals <- ptukey(abs(est), length(means), comparisons$model$df.residual, lower.tail = FALSE)
9✔
135
                res$test$pvalues <- pvals
9✔
136
                return(res)
9✔
137
            }
138
            res <- tryCatch({summary(comparisons)},
39✔
139
                            warning = function(w) {warning("Numerical precision of p-value calcuations exceeds threshold. ",
4✔
140
                                                           "Treat p-values with caution.")
4✔
141
                                suppressWarnings(summary(comparisons))})
4✔
142
            return(res)
38✔
143
        }
144
        if (correct == "False Discovery Rate" || correct == "fdr")
163✔
145
        {
146
            res <- summary(comparisons, test = adjusted(type = "none"))
1✔
147
            res$test$pvalues <- PValueAdjustFDR(res$test$pvalues)
1✔
148
            return(res)
1✔
149
        }
150
        return(summary(comparisons, test = adjusted(type = correction)))
162✔
151
    }
152
    if (is.null(outcome.name))
214✔
153
        outcome.name <- OriginalName(outcome)
212✔
154
    if (is.null(predictor.name))
214✔
155
        predictor.name <- OriginalName(predictor)
212✔
156
    subset.description <- try(deparse(substitute(subset)), silent = TRUE) #We don't know whether subset is a variable in the environment or in data.
214✔
157
    correct <- correction
214✔
158
    no.correction <- correction == "None"
214✔
159
    alt <- alternative
214✔
160
    alternative <- switch(alternative, "Two-sided" = "two.sided", "Greater" = "greater", "Less" = "less")
214✔
161
    correction <- switch(correction, "Holm" = "holm", "Hochberg" = "hochberg", "Hommel" = "hommel", "Bonferroni" = "bonferroni",
214✔
162
                         "Benjamini & Yekutieli" = "BY","False Discovery Rate" = "fdr", "None" = "none", "Single-step" = "single-step",
214✔
163
                         "Shaffer" = "Shaffer", "Westfall" = "Westfall", "Free Combinations" = "free",
214✔
164
                         "Tukey Range" = "none", "Dunnett" = "none")
214✔
165
    predictor.label <- if (show.labels) Labels(predictor) else predictor.name
214✔
166
    predictor <- tidyFactor(predictor)
214✔
167
    #outcome.label = if (show.labels & !is.null(attr(outcome, "label"))) attr(outcome, "label") else outcome.name
168
    outcome.label = if (show.labels) Labels(outcome) else outcome.name
214✔
169
    # Should be removed when hooking up GLM
170
    outcome <- if (is.factor(outcome)) AsNumeric(outcome, binary = FALSE) else outcome
214✔
171
    regression <- Regression(outcome ~ predictor,
214✔
172
                             subset = subset,
214✔
173
                             missing = missing,
214✔
174
                             weights = weights,
214✔
175
                             type = if (is.null(list(...)$type)) "Linear" else list(...)$type,
214✔
176
                             robust.se = robust.se,
214✔
177
                             internal = TRUE)
214✔
178
    model <- regression$original
210✔
179
    robust.se <- regression$robust.se
210✔
180
    contrasts <- mcp(predictor = switch(compare, "Pairwise" = "Tukey", "To mean" = "GrandMean", "To first" = "Dunnett"))
210✔
181
    vcov <- try(vcov(regression, robust.se), silent = TRUE)
210✔
182
    if (!tryError(vcov))
210✔
183
    {
184
        comparisons <- glht(model, linfct = contrasts, alternative = alternative, vcov = vcov)
210✔
185
        set.seed(seed)
210✔
186
        mcomp <- try(.multcompSummary(comparisons, correct, robust.se), silent = TRUE)
210✔
187
    }
188
    if (tryError(vcov) || tryError(mcomp))
210✔
189
    {
190
        if(robust.se)
1✔
191
        {
192
            warning("Robust standard errors have not been implemented for this model.")
1✔
193
            mcomp <- .multcompSummary(glht(model, linfct = contrasts, alternative = alternative), correct, FALSE)
1✔
194
        }
195
        else if (!is.null(weights))
×
196
            stop("Weights cannot be used with this model. Remove weights to compute the unweighted model.")
×
197
    }
198
    f.test <- FTest(regression)
210✔
199
    sub <- if (is.null(subset)) rep(TRUE, length(predictor)) else subset
210✔
200
    if (!is.null(weights))
210✔
201
        sub <- sub & weights > 0
10✔
202
    predictor.n <- Frequency(predictor, sub)
210✔
203
    predictor.n <- predictor.n[predictor.n > 0]
210✔
204
    result <- c(f.test,
210✔
205
                list(original = mcomp,
210✔
206
                     robust.se = robust.se,
210✔
207
                     grand.mean = GrandMean(regression),
210✔
208
                     correction = correct,
210✔
209
                     outcome.label = outcome.label,
210✔
210
                     r.squared = rSquared(regression),
210✔
211
                     p.cutoff = p.cutoff,
210✔
212
                     compare = compare,
210✔
213
                     column.names = names(predictor.n),
210✔
214
                     n = predictor.n))
210✔
215
    # Headers, subtitles, footers
216
    mc.correction <- paste0("; multiple comparisons correction: ", correct)
210✔
217
    alpha <- paste0("null hypotheses: ", tolower(alt))
210✔
218
    robust.se.text <- if (robust.se == FALSE) "" else
210✔
219
        paste0("; heteroscedasticity-robust standard errors (", if(robust.se == TRUE) "hc3" else robust.se, ");")
210✔
220
    result$posthoc <- paste0(alpha, mc.correction, robust.se.text)
210✔
221
    result$subtitle <- if (is.na(f.test$p)) "Error computing p-value" else paste0(if (f.test$p <= p.cutoff) "Significant" else "Not significant",
210✔
222
                                                                                  ": F: ", FormatAsReal(f.test$Ftest, 4),
210✔
223
                                                                                  " on ", f.test$df, " and ", f.test$ddf, " degrees-of-freedom; p: ", FormatAsPValue(f.test$p),
210✔
224
                                                                                  "; R-squared: ", FormatAsReal(result$r.squared, 4))
210✔
225
    result$title <- paste0("One-way ANOVA: ", outcome.label, " by ", predictor.label)
210✔
226
    result$footer <- paste0(regression$sample.description, result$posthoc)
210✔
227
    r <- result$original$test
210✔
228
    if (no.correction & compare == "To first")
210✔
229
    {   # Using more precise results from Regression rather than glht
230
        rcoefs <- summary(model)$coef[-1, , drop = FALSE]
5✔
231
        k <- nrow(rcoefs)
5✔
232
        r$coefficients[1:k] <- rcoefs[1:k, 1] # Retaining the labels.
5✔
233
        r$sigma <- rcoefs[, 2]
5✔
234
        if (Sum(abs(r$tstat - rcoefs[, 3]) > .1, remove.missing = FALSE))
5✔
235
        {
236
            print(rbind(svyglm = r$tstat, multcomp = rcoefs[, 3]))
×
237
            stop("Unreliable inference in one-way ANOVA.")
×
238
        }
239
        r$tstat <- rcoefs[, 3]
5✔
240
        r$pvalues <- rcoefs[, 4]
5✔
241
    }
242
    coefs <- r$coefficients + if (compare == "To mean") result$grand.mean else 0
210✔
243
    #coef.names <- gsub("predictor", "", names(model$coef)[-1])
244
    coefs <- cbind(coefs, r$sigma, r$tstat, r$pvalues)
210✔
245
    colnames(coefs) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|))")
210✔
246
    if (compare == "To mean")
210✔
247
    {
248
        tbl <- table(predictor[regression$subset])
186✔
249
        nms <- names(tbl)[tbl > 0]
186✔
250
        rownames(coefs) <- nms
186✔
251
        colnames(coefs)[1] <- "Mean"
186✔
252
    }
253
    else
254
        colnames(coefs)[1] <- "Difference"
24✔
255
    if (!no.correction)
210✔
256
        colnames(coefs)[4] <- "Corrected p"
49✔
257
    result$coefs <- coefs
210✔
258
    # Creating the final outputs.
259
    is.t <- result$original$df > 0
210✔
260
    estimate.name <- if (result$compare == "To mean") "Estimate" else "Difference"
210✔
261
    p.name <- if(result$correction == "NONE")
210✔
262
        "<span style='font-style:italic;'>p</span>"
210✔
263
    else
264
        "Corrected <span style='font-style:italic;'>p</span>"
210✔
265
    result$table <- RegressionTable(result$coefs,
210✔
266
                                    title = result$title,
210✔
267
                                    subtitle = result$subtitle,
210✔
268
                                    footer = result$footer,
210✔
269
                                    estimate.name = estimate.name,
210✔
270
                                    se.name = if (result$robust.se) "Robust SE" else "Standard Error",
210✔
271
                                    p.name = p.name,
210✔
272
                                    p.cutoff = result$p.cutoff)
210✔
273
    coefs <- result$coefs
210✔
274
    if (!return.all)
210✔
275
        result <- list(table = result$table)
16✔
276
    attr(result, "ChartData") <- coefs
210✔
277
    class(result) <- "OneWayANOVA"
210✔
278
    result
210✔
279
}
280

281
#' rSquared
282
#'
283
#' @param  object A \link{FitRegression} object.
284
#' @import flipRegression
285
#' @importFrom flipData Observed
286
#' @importFrom flipStatistics Correlation
287
#' @importFrom stats predict
288
rSquared <- function(object)
289
{
290
    predicted <- predict(object$original)
210✔
291
    observed <- Observed(object)
210✔
292
    cor <- Correlation(predicted, observed, object$weights)
210✔
293
    cor * cor
210✔
294
}
295

296

297
tryError <- function(x)
784✔
298
    inherits(x, "try-error")
784✔
299

300
#' politeWeightedFTest
301
#'
302
#' Computes \link{regTermTest}, but fails savely when an error occurs.
303
#' @param fit A FitRegression object.
304
#' @importFrom flipU OutcomeName
305
#' @importFrom survey regTermTest
306
politeWeightedFTest <- function(fit)
307
{
308
    test <- suppressWarnings(try(regTermTest(fit$original, "predictor"), silent = TRUE))
10✔
309
    if(tryError(test))
10✔
310
    {
311
        warning("Unable to compute weighted F-test due to a technical problem with the underlying computations. This error may disappear if you remove the weight.")
×
312
        return(list(Ftest = NA, df = NA, ddf = NA, p = NA))
×
313
    }
314
    test
10✔
315
}
316

317
#' \code{FTest}
318
#'
319
#' An F-Test from a \code{lm} or linear \code{svyglm} object.
320
#' @param fit An object created by \link{FitRegression}.
321
#' @importFrom stats aov
322
#' @export
323
FTest <- function(fit)
324
{
325
    if (!is.null(fit$weights))
210✔
326
        return(politeWeightedFTest(fit))
10✔
327
    aovFTest(fit$original)
200✔
328
}
329

330
aovFTest <- function(model)
331
{
332
    f.test <- summary(aov(model))[[1]]
200✔
333
    list(Ftest = f.test[1, 4],
200✔
334
         df = f.test[1, 1],
200✔
335
         ddf = f.test[2, 1],
200✔
336
         p = f.test[1, 5])
200✔
337
}
338

339
#' @export
340
print.OneWayANOVA <- function(x, ...)
341
{
342
    print(x$table)
×
343
}
344

STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2025 Coveralls, Inc