adjOutlyingness {robustbase}    R Documentation

Compute Skewness-adjusted Multivariate Outlyingness

Description

For an n * p data matrix (or data frame) x, compute the “outlyingness” of all n observations. Outlyingness here is a generalization of the Donoho-Stahel outlyingness measure, where skewness is taken into account via the medcouple, mc().
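
For orientation, the classical Donoho-Stahel outlyingness of an observation x_i is usually written as in the first display below. The second display is only a hedged sketch of how a skewness adjustment can enter. The symbols w_1(a) and w_2(a) are placeholders introduced here (they do not appear on this page) for medcouple-dependent lower and upper bounds of the projected data. The exact definition is in the reference, and in practice the supremum is approximated by a maximum over the ndir searched directions.

```latex
\mathrm{SDO}(x_i) = \sup_{a \in \mathbb{R}^p}
  \frac{\lvert a^\top x_i - \operatorname{med}_j(a^\top x_j) \rvert}
       {\operatorname{MAD}_j(a^\top x_j)}

\mathrm{AO}(x_i) = \sup_{a \in \mathbb{R}^p}
\begin{cases}
  \dfrac{a^\top x_i - m_a}{w_2(a) - m_a}, & a^\top x_i > m_a, \\[1ex]
  \dfrac{m_a - a^\top x_i}{m_a - w_1(a)}, & a^\top x_i \le m_a,
\end{cases}
\qquad m_a = \operatorname{med}_j(a^\top x_j)
```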

Usage

adjOutlyingness(x, ndir = 250, clower = 3, cupper = 4,
                alpha.cutoff = 0.75, coef = 1.5, qr.tol = 1e-12)

Arguments

x a numeric matrix or data.frame.
ndir positive integer specifying the number of directions that should be searched.
clower, cupper the constants to be used for the lower and upper tails, respectively, in order to transform the data towards symmetry.
alpha.cutoff number in (0,1) specifying the quantiles (α, 1-α) which determine the “outlier” cutoff.
coef positive number specifying the factor with which the interquartile range (IQR) is multiplied to determine ‘boxplot hinges’-like upper and lower bounds.
qr.tol positive tolerance passed to qr and solve.qr when determining the ndir directions, each of which is defined by a random sample of p (out of n) observations.
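
The description of ndir and qr.tol suggests that each direction comes from p randomly sampled observations. The following is a minimal sketch of one plausible way to do this in base R, not the package's actual code: random_direction and its max.tries argument are hypothetical names, and taking the direction normal to the hyperplane through the p sampled points is an assumption.

```r
## Hypothetical helper (not part of robustbase): draw one direction from
## p randomly chosen rows of x, skipping rank-deficient samples.
random_direction <- function(x, qr.tol = 1e-12, max.tries = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  for (k in seq_len(max.tries)) {
    P <- x[sample.int(n, p), , drop = FALSE]  # p x p matrix of sampled rows
    qrP <- qr(P, tol = qr.tol)                # qr.tol decides numerical rank
    if (qrP$rank == p) {
      ## a solves P %*% a = 1, so it is normal to the hyperplane
      ## passing through the p sampled observations (assumed construction)
      a <- solve.qr(qrP, rep(1, p))
      return(a / sqrt(sum(a^2)))              # rescale to unit length
    }
  }
  stop("no full-rank sample found in ", max.tries, " tries")
}

set.seed(1)                                   # the direction is random
x <- matrix(rnorm(40), nrow = 20, ncol = 2)
a <- random_direction(x)
proj <- drop(x %*% a)  # the n observations projected onto the direction
```

A larger qr.tol makes the rank check stricter, so more nearly collinear samples are rejected and redrawn. The univariate outlyingness would then be computed on proj for each of the ndir directions.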

Details

Details are given in the comments of the original Matlab code and in the reference(s).

The method as described can be useful as preprocessing in FASTICA (http://www.cis.hut.fi/projects/ica/fastica/).

Value

a list with components

adjout numeric vector of length n giving the adjusted outlyingness of each observation.
cutoff cutoff for “outlier” with respect to the adjusted outlyingnesses, and depending on alpha.cutoff.
nonOut logical vector of length n, TRUE when the corresponding observation is non-outlying with respect to the cutoff and the adjusted outlyingnesses.

Author(s)

Guy Brys; help page and improvements by Martin Maechler

References

Brys, G., Hubert, M., and Rousseeuw, P.J. (2005) A Robustification of Independent Component Analysis; Journal of Chemometrics, 19, 1–12.

For the up-to-date reference, please consult http://wis.kuleuven.be/stat/robust.html

See Also

the adjusted boxplot, adjbox and the medcouple, mc.

Examples

## An Example with bad condition number and "border case" outliers

if(FALSE) {## Not yet ok, because of bug in adjOutl
  dim(longley)
  set.seed(1) ## result is random 
  ao1 <- adjOutlyingness(longley)
  ## which are not outlying ?
  table(ao1$nonOut)  ## all of them
  stopifnot(all(ao1$nonOut))
}

## An Example with outliers :

dim(hbk)
set.seed(1)
ao.hbk <- adjOutlyingness(hbk)
str(ao.hbk)
hist(ao.hbk$adjout)   ## really two groups
table(ao.hbk$nonOut)  ## 14 outliers, 61 non-outliers
## outliers are :
which(!ao.hbk$nonOut) # 1 .. 14   --- but not for all random seeds!

## here, they are the same as found by (much faster) MCD:
cc <- covMcd(hbk)
stopifnot(all(cc$mcd.wt == ao.hbk$nonOut))

## This is revealing (about 1--2 cases, where outliers are *not* == 1:14),
##  but needs almost 1 [sec] per call:
if(interactive()) {
  for(i in 1:30) {
    print(system.time(ao.hbk <- adjOutlyingness(hbk)))
    if(!identical(iout <- which(!ao.hbk$nonOut), 1:14)) {
         cat("Outliers:\n"); print(iout)
    }
  }
}


[Package robustbase version 0.4-3 Index]