multinomial {VGAM} | R Documentation |
Fits a multinomial logit model to a (preferably unordered) factor response.
multinomial(zero = NULL, parallel = FALSE, nointercept = NULL, refLevel = "last")
In the following, the response Y is assumed to be a factor with unordered values 1,2,...,M+1, so that M is the number of linear/additive predictors eta_j.
zero |
An integer-valued vector specifying which
linear/additive predictors are modelled as intercepts only.
Any values must be from the set {1,2,...,M}.
The default value means none are modelled as intercept-only terms.
|
parallel |
A logical, or formula specifying which terms have
equal/unequal coefficients.
|
nointercept |
An integer-valued vector specifying which
linear/additive predictors have no intercepts.
Any values must be from the set {1,2,...,M}.
|
refLevel |
Either a single positive integer or a value of the factor.
If an integer then it specifies which
column of the response matrix is the reference or baseline level.
The default is the last one (the (M+1)th one).
If used, this argument will be often assigned the value 1 .
If inputted as a value of a factor then beware of missing values
of certain levels of the factor
(drop.unused.levels=TRUE or
drop.unused.levels=FALSE ).
See the example below.
|
The default model can be written
eta_j = log(P[Y=j]/ P[Y=M+1])
where eta_j is the jth linear/additive predictor.
Here, j=1,...,M, and eta_{M+1} is 0 by
definition. That is, the last level of the factor, or last column of
the response matrix, is taken as the reference level or baseline—this
is for identifiability of the parameters.
The reference or baseline level can be changed with the
refLevel
argument.
In almost all the literature, the constraint matrices associated
with this family of models are known. For example, setting
parallel=TRUE
will make all constraint matrices (except for
the intercept) equal to a vector of M 1's. If the constraint
matrices are unknown and to be estimated, then this can be achieved
by fitting the model as a reduced-rank vector generalized linear model
(RR-VGLM; see rrvglm
). In particular, a multinomial logit
model with unknown constraint matrices is known as a stereotype model
(Anderson, 1984), and can be fitted with rrvglm
.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
rrvglm
and vgam
.
No check is made to verify that the response is nominal.
The arguments zero
and nointercept
can be inputted
with values that fail. For example, multinomial(zero=2,
nointercept=1:3)
means the second linear/additive predictor is
identically zero, which will cause a failure.
Be careful about the use of other potentially contradictory constraints,
e.g., multinomial(zero=2, parallel = TRUE ~ x3)
. If in doubt,
apply constraints()
to the fitted object to check.
The response should be either a matrix of counts (with row sums that are
all positive), or a factor. In both cases, the y
slot returned
by vglm
/vgam
/rrvglm
is the
matrix of sample proportions.
The multinomial logit model is more appropriate for a nominal
(unordered) factor response than for an ordinal (ordered) factor
response.
Models more suited for the latter include those based on cumulative
probabilities, e.g., cumulative
.
multinomial
is prone to numerical difficulties if the groups
are separable and/or the fitted probabilities are close to 0 or 1.
The fitted values returned are estimates of the probabilities
P[Y=j] for j=1,...,M+1.
Here is an example of the usage of the parallel
argument.
If there are covariates x2
, x3
and x4
, then
parallel = TRUE ~ x2 + x3 - 1
and
parallel = FALSE ~ x4
are equivalent. This would constrain
the regression coefficients for x2
and x3
to be equal;
those of the intercepts and x4
would be different.
In Example 4 below, a conditional logit model is fitted to an artificial
data set that explores how cost and travel time affect people's
decision about how to travel to work. Walking is the baseline group.
The variable Cost.car
is the difference between the cost of
travel to work by car and walking, etc. The variable Time.car
is the difference between the travel duration/time to work by car and
walking, etc. For other details about the xij
argument see
vglm.control
and fill
.
The multinom
function in the nnet package
uses the first level of the factor as baseline, whereas the last
level of the factor is used here. Consequently the estimated
regression coefficients differ.
Thomas W. Yee
Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London: Chapman & Hall.
Agresti, A. (2002) Categorical Data Analysis, 2nd ed. New York: Wiley.
Hastie, T. J., Tibshirani, R. J. and Friedman, J. H. (2009) The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York: Springer-Verlag.
Simonoff, J. S. (2003) Analyzing Categorical Data, New York: Springer-Verlag.
Anderson, J. A. (1984) Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1–30.
Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.
margeff
,
cumulative
,
acat
,
cratio
,
sratio
,
dirichlet
,
dirmultinomial
,
rrvglm
,
fill1
,
Multinomial
,
iris
.
The author's homepage has further documentation about
categorical data analysis using VGAM.
# Example 1: fit a multinomial logit model to Edgar Anderson's iris data data(iris) ## Not run: fit = vglm(Species ~ ., multinomial, iris) coef(fit, matrix=TRUE) ## End(Not run) # Example 2a: a simple example ymat = t(rmultinom(10, size = 20, prob=c(0.1,0.2,0.8))) # Counts fit = vglm(ymat ~ 1, multinomial) head(fitted(fit)) # Proportions fit@prior.weights # Not recommended for extraction of prior weights weights(fit, type="prior", matrix=FALSE) # The better method fit@y # Sample proportions constraints(fit) # Constraint matrices # Example 2b: Different reference level used as the baseline fit2 = vglm(ymat ~ 1, multinomial(refLevel=2)) coef(fit2, matrix=TRUE) coef(fit , matrix=TRUE) # Easy to reconcile this output with fit2 # Example 2c: Different input to Example 2a but same result w = apply(ymat, 1, sum) # Prior weights yprop = ymat / w # Sample proportions fitprop = vglm(yprop ~ 1, multinomial, weights=w) head(fitted(fitprop)) # Proportions weights(fitprop, type="prior", matrix=FALSE) fitprop@y # Same as the input # Example 3: The response is a factor. nn = 10 dframe3 = data.frame(yfactor = gl(3, nn, labels=c("Control","Trt1","Trt2")), x = runif(3 * nn)) myrefLevel = with(dframe3, yfactor[12]) fit3a = vglm(yfactor ~ x, multinomial(refLevel=myrefLevel), data=dframe3) fit3b = vglm(yfactor ~ x, multinomial(refLevel=2), data=dframe3) coef(fit3a, matrix=TRUE) # "Treatment1" is the reference level coef(fit3b, matrix=TRUE) # "Treatment1" is the reference level margeff(fit3b) # Example 4: Fit a rank-1 stereotype model data(car.all) fit4 = rrvglm(Country ~ Width + Height + HP, multinomial, car.all, Rank=1) coef(fit4) # Contains the C matrix constraints(fit4)$HP # The A matrix coef(fit4, matrix=TRUE) # The B matrix Coef(fit4)@C # The C matrix ccoef(fit4) # Better to get the C matrix this way Coef(fit4)@A # The A matrix svd(coef(fit4, matrix=TRUE)[-1,])$d # This has rank 1; = C # Example 5: The use of the xij argument (aka conditional logit model) set.seed(111) nn = 100 # Number of people who travel to work M = 3 # There are M+1 models of transport to go to work ymat = matrix(0, nn, M+1) ymat[cbind(1:nn, sample(x=M+1, size=nn, replace=TRUE))] = 1 dimnames(ymat) = list(NULL, c("bus","train","car","walk")) gotowork = data.frame(cost.bus = runif(nn), time.bus = runif(nn), cost.train= runif(nn), time.train= runif(nn), cost.car = runif(nn), time.car = runif(nn), cost.walk = runif(nn), time.walk = runif(nn)) gotowork = round(gotowork, dig=2) # For convenience gotowork = transform(gotowork, Cost.bus = cost.bus - cost.walk, Cost.car = cost.car - cost.walk, Cost.train = cost.train - cost.walk, Cost = cost.train - cost.walk, # for labelling Time.bus = time.bus - time.walk, Time.car = time.car - time.walk, Time.train = time.train - time.walk, Time = time.train - time.walk) # for labelling fit = vglm(ymat ~ Cost + Time, multinomial(parall=TRUE ~ Cost + Time - 1), xij = list(Cost ~ Cost.bus + Cost.train + Cost.car, Time ~ Time.bus + Time.train + Time.car), form2 = ~ Cost + Cost.bus + Cost.train + Cost.car + Time + Time.bus + Time.train + Time.car, data=gotowork, trace=TRUE) head(model.matrix(fit, type="lm")) # LM model matrix head(model.matrix(fit, type="vlm")) # Big VLM model matrix coef(fit) coef(fit, matrix=TRUE) constraints(fit) summary(fit) max(abs(predict(fit)-predict(fit, new=gotowork))) # Should be 0