Title: | Analysis of Means |
---|---|
Description: | Analysis of means (ANOM) as used in technometrical computing. The package takes results from multiple comparisons with the grand mean (obtained with 'multcomp', 'SimComp', 'nparcomp', or 'MCPAN') or corresponding simultaneous confidence intervals as input and produces ANOM decision charts that illustrate which group means deviate significantly from the grand mean. |
Authors: | Philip Pallmann |
Maintainer: | Philip Pallmann <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5 |
Built: | 2024-11-03 04:06:33 UTC |
Source: | https://github.com/philippallmann/anom |
Compute an analysis of means (i.e., a multiple contrast test involving comparisons of each group versus the grand mean) and draw a decision chart as commonly used in technometrics.
ANOM(mc, xlabel=NULL, ylabel=NULL, printn=TRUE, printp=TRUE, stdep=NULL, stind=NULL, pst=NULL, pbin=NULL, bg="white", bgrid=TRUE, axlsize=18, axtsize=25, npsize=5, psize=5, lwidth=1, dlstyle="dashed", fillcol="darkgray")
mc |
An object of class glht, SimCi, mctp, or binomRDci. |
xlabel |
An optional character string specifying the label of the horizontal axis. |
ylabel |
An optional character string specifying the label of the vertical axis. |
printn |
A logical. Should per-group sample sizes be included in the chart? Default is TRUE. |
printp |
A logical. Should simultaneous p-values be included in the chart? Default is TRUE. |
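stdep |
A numerical vector giving the values of the dependent variable. Only required if an object of class SimCi is inserted for mc. |
stind |
A factor specifying the levels of the independent variable. Only required if an object of class SimCi is inserted for mc. |
pst |
An object created with function SimTestDiff or SimTestRat. Only required if an object of class SimCi is inserted for mc and simultaneous p-values are to be printed (printp=TRUE). |
pbin |
An object of class binomRDtest. Only required if an object of class binomRDci is inserted for mc and simultaneous p-values are to be printed (printp=TRUE). |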
bg |
A character string specifying the plot's background. Default is "white". |
bgrid |
A logical. Should background grid lines be plotted? Default is TRUE. |
axlsize |
A numerical value specifying the font size of the axis labels. Default is 18. |
axtsize |
A numerical value specifying the font size of the axis titles. Default is 25. |
npsize |
A numerical value specifying the font size of the sample sizes and p-values (if printed). Default is 5. |
psize |
A numerical value specifying the size of the points (group means). Default is 5. |
lwidth |
A numerical value specifying the width of the lines (grand mean, decision limits, vertical connections). Default is 1. |
dlstyle |
A character string specifying the style of the decision limits. Default is "dashed". |
fillcol |
A character string specifying the color of the area of no significant deviation from the grand mean. Default is "darkgray". |
The 'standard' version of ANOM is invoked by inserting a glht object (created with function glht from package multcomp using a contrast matrix of type GrandMean) for mc. The glht object must be based on one of the model types aov, lm, glm, gls, lme, or lmer. That is, ANOM is feasible not only for simple linear (ANOVA) models with Gaussian data, but for a broad range of (semi-)parametric models, e.g., with Poisson or binomial data, hierarchical and clustered structures, and more (Hothorn et al. 2008). If the underlying model is a Poisson or binomial GLM (involving log and logit link functions, respectively), the effects are automatically transformed back to counts (Poisson) or proportions (binomial).
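To make the idea of a grand-mean-type contrast concrete, the following base R sketch builds such a matrix by hand. The data are hypothetical, and the sample-size weighting of the grand mean is an assumption made for illustration; inspect the matrix that contrMat actually returns for your data rather than relying on this sketch.

```r
# Hypothetical toy data: three groups with unequal sample sizes
y <- c(10, 12, 11, 15, 17, 9, 8, 10, 9)
g <- factor(rep(c("A", "B", "C"), times = c(3, 2, 4)))

groups <- split(y, g)
n      <- lengths(groups)                    # per-group sample sizes
means  <- vapply(groups, mean, numeric(1))   # per-group means
k      <- length(n)

# Row i compares group i with the sample-size-weighted grand mean:
# coefficient 1 - n_i/N on group i and -n_j/N on every other group j
C <- diag(k) - matrix(n / sum(n), nrow = k, ncol = k, byrow = TRUE)
dimnames(C) <- list(names(n), names(n))

grand_mean <- sum(n * means) / sum(n)        # equals mean(y)
estimates  <- drop(C %*% means)              # group mean minus grand mean
```

Each row of C sums to zero, so every contrast measures one group's deviation from the overall level rather than from a single reference group.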
When analyzing binomial data, we need to distinguish between two data formats. As long as replicated observations of proportions are available for each group (i.e., a within-group variance can be estimated), we can fit a binomial GLM. However, if there is only one proportion per group (i.e., we have a 2-by-k data table), we need to fall back on a simpler procedure than a GLM. In the case of ANOM, we insert a binomRDci object (created with function binomRDci from package MCPAN) for mc, with the contrasts being of type GrandMean. If printing simultaneous p-values with the ANOM decision chart is desired (printp=TRUE), a binomRDtest object (generated with function binomRDtest) must be inserted for pbin (see examples). Notice that printn=TRUE prints the total sample size per group when inserting an object of class binomRDci for mc; by contrast, inserting a glht object based on a binomial GLM makes printn=TRUE print the number of independent observations of proportions per group.
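The 2-by-k data format can be illustrated with a few lines of base R. The counts below are hypothetical, and the pooled proportion used as "grand mean" is an assumption for illustration; this is a sketch of the quantities being compared, not of the interval construction in MCPAN.

```r
# Hypothetical 2-by-k table: one proportion per group, no replication
x <- c(18, 25, 9, 30)     # number of successes per group
n <- c(40, 50, 30, 45)    # total sample size per group

p_hat     <- x / n             # observed proportion per group
p_grand   <- sum(x) / sum(n)   # pooled (sample-size-weighted) proportion
risk_diff <- p_hat - p_grand   # each group's difference to the grand mean
```

With only one proportion per group there is no within-group variance to estimate, which is why the variance has to come from the binomial model itself.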
There are two options for coping with heterogeneous variances. The 'standard' version using glht objects is basically designed for homoscedastic data but can also cope with unequal variances by employing sandwich covariance estimates (Herberich et al. 2010). All you have to do is set the option vcov=vcovHC in the glht call (see examples).
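For readers who want to see what a sandwich covariance estimate amounts to, here is a hand-computed heteroscedasticity-consistent estimate of type HC3 in base R. The simulated data are hypothetical, and the choice of HC3 is an assumption (it is, to our knowledge, the default type of vcovHC); in practice you would simply pass vcov=vcovHC as described above.

```r
set.seed(1)
# Hypothetical one-way layout with clearly unequal variances
g   <- factor(rep(c("A", "B", "C"), each = 5))
y   <- rnorm(15, mean = c(10, 12, 11)[g], sd = c(1, 3, 6)[g])
fit <- lm(y ~ g)

X <- model.matrix(fit)   # design matrix
e <- residuals(fit)      # ordinary residuals
h <- hatvalues(fit)      # leverages

bread  <- solve(crossprod(X))            # (X'X)^{-1}
meat   <- crossprod(X * (e / (1 - h)))   # X' diag(e^2 / (1 - h)^2) X
vc_hc3 <- bread %*% meat %*% bread       # sandwich estimate

# Compare with the usual homoscedastic estimate
vc_ols <- vcov(fit)
```

The 'bread' depends only on the design, whereas the 'meat' lets every observation contribute its own squared residual instead of a pooled variance.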
A different heterogeneity adjustment using multiple Satterthwaite degrees of freedom (Hasler and Hothorn 2008) is invoked by inserting a SimCi object (created with function SimCiDiff from package SimComp with the setting covar.equal=FALSE) for mc. Notice that covar.equal=TRUE would compute a 'standard' multiple contrast test assuming equal variances, which is just the opposite of what we want.
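The notion of contrast-specific Satterthwaite degrees of freedom can be sketched in base R as follows. The function name satterthwaite_df and the numbers are hypothetical, and this is the textbook Satterthwaite formula applied to one contrast, not necessarily the exact computation inside SimComp.

```r
# Satterthwaite degrees of freedom for one contrast with coefficients cvec,
# given per-group sample variances s2 and sample sizes n
satterthwaite_df <- function(cvec, s2, n) {
  v <- cvec^2 * s2 / n            # variance contribution of each group
  sum(v)^2 / sum(v^2 / (n - 1))
}

# Sanity check: two groups, equal variances and sizes -> 2 * (10 - 1) = 18
df_equal <- satterthwaite_df(c(1, -1), s2 = c(4, 4), n = c(10, 10))

# Hypothetical heteroscedastic case: group 1 versus the weighted grand mean
n    <- c(3, 3, 2)
s2   <- c(1, 25, 4)
cvec <- c(1, 0, 0) - n / sum(n)
df_1 <- satterthwaite_df(cvec, s2, n)
```

Because every contrast weights the group variances differently, each comparison gets its own degrees of freedom, hence "multiple" degrees of freedom.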
Another instance when the SimComp package proves useful is ANOM with ratios, i.e., we assess each group's percentage change in comparison to the grand mean (which is always 100%). Again, we insert an object of class SimCi for mc, but now we generate it with function SimCiRat instead of SimCiDiff. Setting covar.equal to TRUE gives the homoscedastic version, whereas FALSE calls the heteroscedastic variant using multiple degrees of freedom.
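A small base R sketch shows what "percentage change in comparison to the grand mean" means. The group means are hypothetical, and taking the unweighted average of the group means as grand mean is an assumption for illustration; the package may weight the groups differently.

```r
# Hypothetical group means (not taken from any dataset in this package)
means <- c(A = 80, B = 100, C = 120)

grand_mean <- mean(means)                  # unweighted average (assumption)
ratio_pct  <- 100 * means / grand_mean     # grand mean corresponds to 100%
change_pct <- ratio_pct - 100              # percentage change per group
```

On this scale the reference line of the decision chart sits at 100%, and a group at 80% lies 20% below the grand mean.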
All ANOM functionality based on objects from SimComp requires you to submit the data (dependent and independent variable via stdep and stind, respectively) separately (see examples). Moreover, if you wish to print simultaneous p-values with your ANOM decision chart, you may compute them using function SimTestDiff or SimTestRat and insert the resulting object for pst. Again, don't forget to set the option covar.equal=FALSE if you want to account for heteroscedasticity. Make sure in all cases to perform comparisons of type GrandMean.
Nonparametric ANOM is performed by inserting an object of class mctp (created with function mctp from package nparcomp) for mc. It is based on estimation of relative effects via global pseudo-rankings as proposed by Konietschke et al. (2012). This procedure can naturally cope with heteroscedasticity in the data.
The relative effect of two independent random variables X_1 and X_2 following some distributions F_1 and F_2, respectively, is generally defined as
$$p = P(X_1 < X_2) + \tfrac{1}{2} P(X_1 = X_2).$$
Loosely speaking, p is the probability that X_1 takes smaller values than X_2 (plus half the probability of taking equal values). Hence when p<0.5, X_1 tends to take larger values than X_2, and vice versa for p>0.5.
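The verbal definition of the relative effect translates directly into a plug-in estimate over all pairs of observations. The helper relative_effect below is a hypothetical name and a minimal sketch for two samples; it is not the pseudo-rank-based estimator that mctp uses for several groups.

```r
# Plug-in estimate of P(X1 < X2) + 0.5 * P(X1 = X2) from two samples
relative_effect <- function(x1, x2) {
  less  <- mean(outer(x1, x2, "<"))    # share of pairs with x1 <  x2
  equal <- mean(outer(x1, x2, "=="))   # share of pairs with x1 == x2
  less + 0.5 * equal
}

# Hypothetical samples: 6 of 9 pairs have x1 < x2 and 2 are ties
p <- relative_effect(c(1, 2, 3), c(2, 3, 4))   # (6 + 0.5 * 2) / 9 = 7/9
```

Here p exceeds 0.5, so the second sample tends to take larger values than the first.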
Creating the mctp object requires specifying a grand-mean-type contrast matrix by hand (see examples) as it is not among the options provided by the package nparcomp. For the test statistics you may choose one out of three asymptotic approximation methods:
a multivariate t approximation with Satterthwaite degrees of freedom (asy.method="mult.t"), which works reasonably well most of the time,
a multivariate normal approximation (asy.method="normal"), which is unfavorable with small sample sizes,
a Fisher transform (asy.method="fisher"), which ensures that the decision limits preserve the range of [-1, 1].
Make sure that the argument correlation in the function mctp is set to TRUE. Do not use the function nparcomp instead of mctp since it does not involve global ranking and is inoperative with ANOM-type contrast matrices.
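Why a Fisher transform preserves the range can be seen from a short base R sketch. The function fisher_ci is a hypothetical helper that uses the delta method and a pointwise normal quantile purely for illustration; mctp uses multiplicity-adjusted quantiles, so its limits will differ.

```r
# Range-preserving interval for an estimate in (-1, 1) with standard error se
fisher_ci <- function(est, se, level = 0.95) {
  z      <- qnorm(1 - (1 - level) / 2)
  center <- atanh(est)                 # Fisher transform of the estimate
  half   <- z * se / (1 - est^2)       # delta-method SE on the atanh scale
  tanh(center + c(lower = -half, upper = half))   # back-transform
}

# Hypothetical estimate close to the boundary
ci_fisher <- fisher_ci(est = 0.9, se = 0.2)
ci_plain  <- 0.9 + c(lower = -1, upper = 1) * qnorm(0.975) * 0.2
```

In this example the untransformed upper bound exceeds 1, whereas the back-transformed interval cannot leave [-1, 1] because tanh maps the real line into that range.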
An ANOM decision chart.
Notice that some of the more sophisticated ANOM variants (ratios, nonparametric, heteroscedastic with multiple degrees of freedom) are limited to one-way layouts without covariates.
Philip Pallmann
Djira, G. D., Hothorn, L. A. (2009) Detecting relative changes in multiple comparisons with an overall mean. Journal of Quality Technology 41(1), 60-65.
Hasler, M. and Hothorn, L. A. (2008) Multiple contrast tests in the presence of heteroscedasticity. Biometrical Journal 50(5), 793–800.
Herberich, E., Sikorski, J., Hothorn, T. (2010) A robust procedure for comparing multiple means under heteroscedasticity in unbalanced designs. PLoS One 5(3), e9788.
Hothorn, T., Bretz, F., Westfall, P. (2008) Simultaneous inference in general parametric models. Biometrical Journal 50(3), 346–363.
Konietschke, F., Hothorn, L. A., Brunner, E. (2012) Rank-based multiple test procedures and simultaneous confidence intervals. Electronic Journal of Statistics 6, 738–759.
Pallmann, P. and Hothorn, L. A. (2016) Analysis of means (ANOM): A generalized approach using R. Journal of Applied Statistics, 43(8), 1541–1560.
###############################################
### Standard ANOM (Gaussian, homoscedastic) ###
###############################################

### Devices of which brand filter bacteria significantly worse?
head(waterfilter)
str(waterfilter)

library(multcomp)
model <- lm(colonies ~ brand, waterfilter)
hom <- glht(model, mcp(brand="GrandMean"), alternative="less")
ANOM(hom)

############################
### Heteroscedastic ANOM ###
############################

## With sandwich covariance matrix estimate (Herberich et al. 2010)
library(multcomp)
library(sandwich)
het1 <- glht(model, mcp(brand="GrandMean"), alternative="less", vcov=vcovHC)
ANOM(het1)

## With multiple degrees of freedom (Hasler and Hothorn 2008)
library(SimComp)
het2 <- SimCiDiff(data=waterfilter, grp="brand", resp="colonies",
                  type="GrandMean", alternative="less", covar.equal=FALSE)
het2p <- SimTestDiff(data=waterfilter, grp="brand", resp="colonies",
                     type="GrandMean", alternative="less", covar.equal=FALSE)
ANOM(het2, stdep=waterfilter$colonies, stind=waterfilter$brand, pst=het2p)

#######################
### ANOM for ratios ###
#######################

## Homoscedastic
library(SimComp)
rel <- SimCiRat(data=waterfilter, grp="brand", resp="colonies",
                type="GrandMean", alternative="less", covar.equal=TRUE)
relp <- SimTestRat(data=waterfilter, grp="brand", resp="colonies",
                   type="GrandMean", alternative="less", covar.equal=TRUE)
ANOM(rel, stdep=waterfilter$colonies, stind=waterfilter$brand, pst=relp)

## Heteroscedastic (with multiple degrees of freedom)
library(SimComp)
relh <- SimCiRat(data=waterfilter, grp="brand", resp="colonies",
                 type="GrandMean", alternative="less", covar.equal=FALSE)
relhp <- SimTestRat(data=waterfilter, grp="brand", resp="colonies",
                    type="GrandMean", alternative="less", covar.equal=FALSE)
ANOM(relh, stdep=waterfilter$colonies, stind=waterfilter$brand, pst=relhp)

##########################
### Nonparametric ANOM ###
##########################

# Compute sample sizes per group
ss <- tapply(waterfilter$colonies, waterfilter$brand, length)

# Build a grand-mean-type contrast matrix
library(multcomp)
Mat <- contrMat(ss, "GrandMean")

## Using a multivariate t approximation
library(nparcomp)
mult <- mctp(colonies ~ brand, data=waterfilter, type="UserDefined",
             contrast.matrix=Mat, alternative="less", info=FALSE,
             correlation=TRUE, asy.method="mult.t")
ANOM(mult)

## Using a range-preserving Fisher transform
library(nparcomp)
fish <- mctp(colonies ~ brand, data=waterfilter, type="UserDefined",
             contrast.matrix=Mat, alternative="less", info=FALSE,
             correlation=TRUE, asy.method="fisher")
ANOM(fish)

#####################################
### ANOM for binomial proportions ###
#####################################

### Which schools' math achievements differ from the grand mean?
head(math)
str(math)

## Based on Wald-type confidence intervals
library(MCPAN)
wald <- binomRDci(n=math$enrolled, x=math$proficient, names=math$school,
                  alternative="two.sided", method="Wald", type="GrandMean")
waldp <- binomRDtest(n=math$enrolled, x=math$proficient, names=math$school,
                     alternative="two.sided", method="Wald", type="GrandMean")
ANOM(wald, pbin=waldp)

## Based on add-2 confidence intervals
library(MCPAN)
add2 <- binomRDci(n=math$enrolled, x=math$proficient, names=math$school,
                  alternative="two.sided", method="ADD2", type="GrandMean")
add2p <- binomRDtest(n=math$enrolled, x=math$proficient, names=math$school,
                     alternative="two.sided", method="ADD2", type="GrandMean")
ANOM(add2, pbin=add2p)

##########################
### ANOM for variances ###
##########################

### Springs of which brand are significantly more variable?
head(spring)
str(spring)

# Compute the median weight per brand
spring$median <- tapply(spring$weight, spring$brand, median)[spring$brand]

# Compute the absolute deviations from the median (robust Levene residuals)
spring$absdev <- with(spring, abs(weight - median))

library(multcomp)
mod <- lm(absdev ~ brand, spring)
test <- glht(mod, mcp(brand="GrandMean"))
ANOM(test)
Graphical representation of the analysis of means: convert simultaneous confidence intervals (that were computed with ANY method) into ANOM decision limits and draw a decision chart as commonly used in technometrics.
ANOMgen(mu, n=NULL, gm=NULL, lo, up, names, alternative="two.sided", xlabel="Group", ylabel="Endpoint", printn=TRUE, p=NULL, bg="white", bgrid=TRUE, axlsize=18, axtsize=25, npsize=5, psize=5, lwidth=1, dlstyle="dashed", fillcol="darkgray")
mu |
A numeric vector of group means. |
n |
A numeric vector of sample sizes per group. Either n or gm must be specified. |
gm |
A single numeric value giving the grand mean of all groups. Either n or gm must be specified. |
lo |
A numeric vector of lower (simultaneous) confidence interval bounds for comparisons to the grand mean. |
up |
A numeric vector of upper (simultaneous) confidence interval bounds for comparisons to the grand mean. |
names |
An (optional) vector of characters specifying the groups' names. |
alternative |
A character string indicating the direction of the alternative hypothesis. Default is "two.sided". |
xlabel |
A character string specifying the label of the horizontal axis. |
ylabel |
A character string specifying the label of the vertical axis. |
printn |
A logical. Should per-group sample sizes be included in the chart? Default is TRUE. |
p |
An (optional) numeric vector of (simultaneous) p-values to be printed. |
bg |
A character string specifying the plot's background. Default is "white". |
bgrid |
A logical. Should background grid lines be plotted? Default is TRUE. |
axlsize |
A numerical value specifying the font size of the axis labels. Default is 18. |
axtsize |
A numerical value specifying the font size of the axis titles. Default is 25. |
npsize |
A numerical value specifying the font size of the sample sizes and p-values (if printed). Default is 5. |
psize |
A numerical value specifying the size of the points (group means). Default is 5. |
lwidth |
A numerical value specifying the width of the lines (grand mean, decision limits, vertical connections). Default is 1. |
dlstyle |
A character string specifying the style of the decision limits. Default is "dashed". |
fillcol |
A character string specifying the color of the area of no significant deviation from the grand mean. Default is "darkgray". |
This is a generic tool that translates (simultaneous) confidence intervals into ANOM decision limits.
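One way to see how confidence intervals for differences to the grand mean translate into decision limits is the following base R sketch for the two-sided case. It reuses the numbers of the first toy example; the variable names udl and ldl are hypothetical, and the algebra is our own illustration, not necessarily the internal implementation of ANOMgen.

```r
mu  <- c(2.8, 2.3, 3.4, 5.6)      # group means
n   <- c(5, 5, 10, 5)             # sample sizes
low <- c(-1.2, -1.7, -0.4, 1.6)   # lower CI bounds for mu_i - grand mean
upp <- c(-0.2, -0.7, 0.2, 2.6)    # upper CI bounds for mu_i - grand mean

gm <- weighted.mean(mu, n)        # grand mean, here 3.5

# With d = mu - gm, the margins are d - low and upp - d, so
# upper limit = gm + (d - low) = mu - low
# lower limit = gm - (upp - d) = mu - upp
udl <- mu - low
ldl <- mu - upp

# A group mean outside its limits is equivalent to a CI that excludes zero
outside          <- mu > udl | mu < ldl
ci_excludes_zero <- low > 0 | upp < 0
```

In this example the decision limits lie at 3.5 plus or minus 0.5 for the groups of size 5 and at 3.5 plus or minus 0.3 for the group of size 10, and groups 1, 2, and 4 fall outside them.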
An ANOM decision chart.
The confidence intervals must arise from comparisons to the grand mean; otherwise the ANOM chart is meaningless!
Philip Pallmann
Pallmann, P. and Hothorn, L. A. (2016) Analysis of means (ANOM): A generalized approach using R. Journal of Applied Statistics, 43(8), 1541–1560.
### A toy example (n given, two-sided)
groupmeans <- c(2.8, 2.3, 3.4, 5.6)
samplesizes <- c(5, 5, 10, 5)
low <- c(-1.2, -1.7, -0.4, 1.6)
upp <- c(-0.2, -0.7, 0.2, 2.6)
names <- c("1st", "2nd", "3rd", "4th")
ANOMgen(mu=groupmeans, n=samplesizes, lo=low, up=upp, names=names,
        alternative="two.sided")

### Another toy example (gm given, one-sided, with p-values)
groupmeans <- c(2.8, 2.3, 3.4, 5.6)
gm <- 3.5
low <- rep(-Inf, 4)
upp <- c(-0.2, -0.7, 0.2, 2.6)
names <- c("1st", "2nd", "3rd", "4th")
pvalues <- c(0.01, 0.003, 0.8, 1)
ANOMgen(mu=groupmeans, gm=gm, lo=low, up=upp, names=names,
        alternative="less", p=pvalues)
Hemoglobin levels of 30 male cancer patients treated with radiation or chemotherapy and one of three drugs.
data(hemoglobin)
A data frame with 30 observations on the following 3 variables.
therapy
A factor with 2 levels giving the types of therapy.
drug
A factor with 3 levels giving the drugs administered.
level
A numeric vector giving the patients' hemoglobin levels.
This is a complete balanced two-way layout. 15 male cancer patients received radiation, and another 15 underwent chemotherapy. In addition, the patients were treated with either drug 1, 2, or 3. The endpoint of interest was the level of hemoglobin (in grams per deciliter of blood).
Nelson, P. R., Wludyka, P. S., Copeland, K. A. F. (2005) The Analysis of Means: A Graphical Method for Comparing Means, Rates, and Proportions. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, and American Statistical Association (ASA), Alexandria, VA, pp. 71 ff.
data(hemoglobin)
str(hemoglobin)
Proportion of fifth graders with proficient math test scores in 10 elementary schools.
data(math)
A data frame with 10 observations on the following 3 variables.
school
A factor with 10 levels giving the ID of the school.
enrolled
A numeric vector giving the number of students taking part in the math test.
proficient
A numeric vector giving the number of students with proficient math test scores.
A study compared math achievements of students from 10 elementary schools in a U.S. district; 6 of them were conventional neighborhood schools (N1–N6) and 4 were alternative schools (A1–A4). 563 fifth graders took standardized math tests, and each school's proportion of students who scored proficient was recorded.
Nelson, P. R., Wludyka, P. S., Copeland, K. A. F. (2005) The Analysis of Means: A Graphical Method for Comparing Means, Rates, and Proportions. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, and American Statistical Association (ASA), Alexandria, VA, pp. 42–43.
data(math)
str(math)
Weights required to stretch springs of four brands by 0.1 inches.
data(spring)
A data frame with 24 observations on the following 2 variables.
brand
A factor with 4 levels giving the brands of springs.
weight
A numeric vector giving the weight required to extend the spring by 0.1 inches.
Nelson, P. R., Wludyka, P. S., Copeland, K. A. F. (2005) The Analysis of Means: A Graphical Method for Comparing Means, Rates, and Proportions. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, and American Statistical Association (ASA), Alexandria, VA, p. 53.
data(spring)
str(spring)
Filtering performances of seven brands of water filters, measured as the number of bacterial colonies growing on each device.
data(waterfilter)
A data frame with 19 observations on the following 2 variables.
brand
A factor with 7 levels giving the brands of water filters.
colonies
A numeric vector giving the number of bacterial colonies found on each filter.
A high number of bacterial colonies on a filter corresponds to good performance of this particular device. Note that the dataset is unbalanced (n=2 for brands 4 and 7, n=3 for all other brands).
Hsu, J. C. (1984) Ranking and selection and multiple comparisons with the best. In: Santner, T. J. and Tamhane, A. C. (Editors) Design of Experiments: Ranking and Selection (Essays in Honor of Robert E. Bechhofer). Marcel Dekker, New York, NY, pp. 23–33.
Westfall, P. H., Tobias, R. D., Wolfinger, R. D. (2011) Multiple Comparisons and Multiple Tests Using SAS, Second Edition. SAS Institute Inc., Cary, NC, pp. 592–593.
data(waterfilter)
str(waterfilter)