
lclogit: latent-class logit model tutorial

Updated: 2023-07-01 00:05:43

The Stata Journal (2013) 13, Number 3, pp. 625–639

lclogit: A Stata command for fitting latent-class conditional logit models via the expectation-maximization algorithm

Daniele Pacifico
Italian Department of the Treasury
Rome, Italy
daniele.pacifico@tesoro.it

Hong il Yoo
Durham University
Durham, UK
@durham.ac.uk

Abstract. In this article, we describe lclogit, a Stata command for fitting a discrete-mixture or latent-class logit model via the expectation-maximization algorithm.

Keywords: st0312, lclogit, lclogitpr, lclogitcov, lclogitml, latent-class model, expectation-maximization algorithm, mixed logit

1 Introduction

Mixed logit or random parameter logit is used in many empirical applications to capture more realistic substitution patterns than traditional conditional logit. The random parameters are usually assumed to follow a normal distribution, and the resulting model is fit through simulated maximum likelihood, as in Hole's (2007) Stata command mixlogit. Several recent studies, however, note potential gains from specifying a discrete instead of normal mixing distribution, including the ability to approximate the true parameter distribution more flexibly at lower computational costs.¹

Pacifico (2012) implements the expectation-maximization (EM) algorithm for fitting a discrete-mixture logit model, also known as a latent-class logit (LCL) model, in Stata. As Bhat (1997) and Train (2008) emphasize, the EM algorithm is an attractive alternative to the usual (quasi-)Newton methods in the present context because it guarantees numerical stability and convergence to a local maximum even when the number of latent classes is large. In contrast, the usual optimization procedures often fail to achieve convergence because inversion of the (approximate) Hessian becomes numerically difficult.

With this contribution, we aim to generalize Pacifico's (2012) code with a Stata command that introduces a series of important functionalities and provides improved performance in terms of run time and stability.

1. For example, see Hess et al. (2011), Shen (2009), and Greene and Hensher (2003).

© 2013 StataCorp LP, st0312

2 EM algorithm for LCL

This section recapitulates the EM algorithm for fitting an LCL model.² Suppose that each of $N$ agents faces, for notational simplicity, $J$ alternatives in each of $T$ choice scenarios.³ Let $y_{njt}$ denote a binary variable that equals 1 if agent $n$ chooses alternative $j$ in scenario $t$ and equals 0 otherwise. Each alternative is described by alternative-specific characteristics $x_{njt}$ and each agent by agent-specific characteristics, including a constant, $z_n$.

LCL assumes that there are $C$ distinct sets (or classes) of taste parameters, $\beta = (\beta_1, \beta_2, \ldots, \beta_C)$. If agent $n$ is in class $c$, the probability of observing his or her sequence of choices is a product of conditional logit formulas:

$$
P_n(\beta_c) = \prod_{t=1}^{T} \prod_{j=1}^{J} \left( \frac{\exp(\beta_c x_{njt})}{\sum_{k=1}^{J} \exp(\beta_c x_{nkt})} \right)^{y_{njt}} \tag{1}
$$
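As a minimal numerical sketch of (1), the function below multiplies the conditional logit probabilities of the chosen alternatives across scenarios. The function name, the nested-list data layout, and the toy numbers are assumptions made for illustration; they are not part of lclogit.

```python
import math


def sequence_probability(beta, x, y):
    """Evaluate P_n(beta_c) in (1) for one agent.

    beta : list of taste coefficients for one class (hypothetical layout)
    x    : x[t][j] is the attribute vector of alternative j in scenario t
    y    : y[t] is the index of the alternative chosen in scenario t
    """
    prob = 1.0
    for x_t, chosen in zip(x, y):
        utilities = [sum(b * v for b, v in zip(beta, x_tj)) for x_tj in x_t]
        # Subtract the maximum before exponentiating to avoid overflow.
        top = max(utilities)
        denom = sum(math.exp(u - top) for u in utilities)
        prob *= math.exp(utilities[chosen] - top) / denom
    return prob


# Toy data: T = 2 scenarios, J = 2 alternatives, 2 attributes.
x = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
y = [0, 1]
beta_c = [0.5, -0.2]
p = sequence_probability(beta_c, x, y)
```

Because only the chosen alternative has $y_{njt} = 1$, the inner product over $j$ in (1) reduces to the probability of the chosen alternative, which is what the loop computes.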

Because the class membership status is unknown, the researcher needs to specify the unconditional likelihood of agent $n$'s choices, which equals the weighted average of (1) over classes. The weight for class $c$, $\pi_{cn}(\theta)$, is the population share of that class and is usually modeled as fractional multinomial logit,

$$
\pi_{cn}(\theta) = \frac{\exp(\theta_c z_n)}{1 + \sum_{l=1}^{C-1} \exp(\theta_l z_n)} \tag{2}
$$

where $\theta = (\theta_1, \theta_2, \ldots, \theta_{C-1})$ are class membership model parameters; note that $\theta_C$ has been normalized to 0 for identification.
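The class shares in (2) can be sketched numerically as follows. The function name and the example values are hypothetical; the only substantive ingredient is the normalization of the last class's parameters to zero, which contributes the term equal to 1 in the denominator.

```python
import math


def class_shares(theta, z):
    """Evaluate pi_cn(theta) in (2) for one agent.

    theta : list of C-1 coefficient vectors (class C is normalized to zero)
    z     : agent-specific characteristics, including the constant
    """
    scores = [sum(t * v for t, v in zip(theta_c, z)) for theta_c in theta]
    scores.append(0.0)  # normalized class C: theta_C z_n = 0
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example with C = 3 classes; z_n holds a constant and one characteristic.
theta = [[0.2, 0.5], [-0.1, 0.3]]
z_n = [1.0, 2.0]
shares = class_shares(theta, z_n)
```

The returned list has one entry per class and sums to one, so it can serve directly as the mixing weights of the unconditional likelihood.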

The sample log likelihood is then obtained by summing each agent's log unconditional likelihood:

$$
\ln L(\beta, \theta) = \sum_{n=1}^{N} \ln \sum_{c=1}^{C} \pi_{cn}(\theta) P_n(\beta_c) \tag{3}
$$
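A short sketch of (3), assuming the class shares and the class-specific sequence probabilities have already been computed and stored as nested lists (a hypothetical layout chosen for illustration):

```python
import math


def log_likelihood(shares, seq_probs):
    """Evaluate ln L in (3).

    shares[n][c]    : pi_cn(theta), class share of class c for agent n
    seq_probs[n][c] : P_n(beta_c), sequence probability of agent n under class c
    """
    return sum(
        math.log(sum(pi * p for pi, p in zip(pi_n, p_n)))
        for pi_n, p_n in zip(shares, seq_probs)
    )


# Two agents, two classes, illustrative numbers only.
shares = [[0.6, 0.4], [0.6, 0.4]]
seq_probs = [[0.30, 0.05], [0.02, 0.25]]
ll = log_likelihood(shares, seq_probs)
```

The logarithm is taken after mixing over classes, which is why the log likelihood does not separate into class-specific pieces and direct maximization can be awkward.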

Bhat (1997) and Train (2008) note numerical difficulties associated with maximizing (3) directly. They show that $\beta$ and $\theta$ can be more conveniently estimated via a well-known EM algorithm for likelihood maximization in the presence of incomplete data, treating each agent's class membership status as the missing information. Let superscript $s$ denote the estimates obtained at the $s$th iteration of this algorithm. Then at iteration $s+1$, the estimates are updated as

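$$
\beta^{s+1} = \operatorname*{argmax}_{\beta} \sum_{n=1}^{N} \sum_{c=1}^{C} \eta_{cn}(\beta^{s}, \theta^{s}) \ln P_n(\beta_c)
$$

$$
\theta^{s+1} = \operatorname*{argmax}_{\theta} \sum_{n=1}^{N} \sum_{c=1}^{C} \eta_{cn}(\beta^{s}, \theta^{s}) \ln \pi_{cn}(\theta)
$$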

2. Further details are available in Bhat (1997) and Train (2008).

3. lclogit is also applicable when the number of scenarios varies across agents, and the number of alternatives varies both across agents and over scenarios.

where $\eta_{cn}(\beta^{s}, \theta^{s})$ is the posterior probability that agent $n$ is in class $c$ evaluated at the $s$th estimates:

$$
\eta_{cn}(\beta^{s}, \theta^{s}) = \frac{\pi_{cn}(\theta^{s}) P_n(\beta_c^{s})}{\sum_{l=1}^{C} \pi_{ln}(\theta^{s}) P_n(\beta_l^{s})} \tag{4}
$$
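The posterior in (4) is Bayes' rule applied to the class shares and the sequence probabilities. A minimal sketch for a single agent, with a hypothetical function name and illustrative inputs:

```python
def posterior_membership(shares_n, seq_probs_n):
    """Evaluate eta_cn in (4) for one agent.

    shares_n[c]    : pi_cn(theta^s), prior class share
    seq_probs_n[c] : P_n(beta_c^s), sequence probability under class c
    """
    joint = [pi * p for pi, p in zip(shares_n, seq_probs_n)]
    total = sum(joint)  # the agent's unconditional likelihood
    return [j / total for j in joint]


# An agent whose choices are far more likely under class 1 than under class 2.
eta = posterior_membership([0.6, 0.4], [0.30, 0.05])
```

In this example the prior share of class 1 is 0.6, and observing the agent's choices raises it to a posterior of 0.9.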

The updating procedure can be implemented easily in Stata, exploiting clogit and fmlogit routines as follows.⁴ $\beta^{s+1}$ is computed by fitting a conditional logit model (clogit) $C$ times, each time using $\eta_{cn}(\beta^{s}, \theta^{s})$ for a particular $c$ to weight observations on each $n$. $\theta^{s+1}$ is obtained by fitting a fractional multinomial logit model (fmlogit) that takes $\eta_{1n}(\beta^{s}, \theta^{s}), \eta_{2n}(\beta^{s}, \theta^{s}), \ldots, \eta_{Cn}(\beta^{s}, \theta^{s})$ as dependent variables. When $z_n$ only includes the constant term so that each class share is the same for all agents, that is, when $\pi_{cn}(\theta) = \pi_c(\theta)$, each class share can be directly updated by using the following analytical solution without fitting the fractional multinomial logit model:

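$$
\pi_c(\theta^{s+1}) = \frac{\sum_{n=1}^{N} \eta_{cn}(\beta^{s}, \theta^{s})}{\sum_{l=1}^{C} \sum_{n=1}^{N} \eta_{ln}(\beta^{s}, \theta^{s})} \tag{5}
$$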

With a suitable selection of starting values, the updating procedure can be repeated until changes in the estimates and improvement in the log likelihood between iterations are small enough.
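To illustrate the iteration, the sketch below runs the EM updates for the constant-only case, alternating the posterior in (4) with the share update in (5). It rests on strong simplifying assumptions that are not lclogit's actual procedure: the sequence probabilities $P_n(\beta_c)$ are treated as known and held fixed (so the clogit step that updates $\beta$ is omitted), and the stopping rule is a simple relative change in the log likelihood between two consecutive iterations. All names and numbers are hypothetical.

```python
import math


def em_class_shares(seq_probs, n_classes, tol=1e-8, max_iter=500):
    """EM for the class shares only, holding P_n(beta_c) fixed (assumption).

    seq_probs[n][c] : P_n(beta_c) for agent n and class c
    Returns (shares, log likelihood at the last E step, iterations used).
    """
    n_agents = len(seq_probs)
    shares = [1.0 / n_classes] * n_classes  # equal starting shares
    prev_ll = None
    for iteration in range(1, max_iter + 1):
        # E step: posterior membership probabilities, as in (4).
        eta = []
        ll = 0.0
        for p_n in seq_probs:
            joint = [pi * p for pi, p in zip(shares, p_n)]
            total = sum(joint)
            ll += math.log(total)
            eta.append([j / total for j in joint])
        # M step: analytical share update, as in (5). The denominator of (5)
        # equals N because each agent's posteriors sum to one.
        shares = [sum(e[c] for e in eta) / n_agents for c in range(n_classes)]
        # Illustrative stopping rule: relative change in the log likelihood.
        if prev_ll is not None and abs(ll - prev_ll) <= tol * abs(prev_ll):
            break
        prev_ll = ll
    return shares, ll, iteration


seq_probs = [[0.30, 0.05], [0.02, 0.25], [0.40, 0.10]]
shares, ll, n_iter = em_class_shares(seq_probs, n_classes=2)
```

Each pass cannot decrease the log likelihood, which is the stability property that motivates the EM approach in this setting.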

An often-highlighted feature of LCL is its ability to accommodate unobserved interpersonal taste variation without restricting the shape of the underlying taste distribution. Hess et al. (2011) have recently emphasized that LCL also provides a convenient means to account for observed interpersonal heterogeneity in correlations among tastes for different attributes. For example, let $\beta_q$ and $\beta_h$ denote taste coefficients on the $q$th and $h$th attributes, respectively. Each coefficient may take one of $C$ distinct values and is a random parameter from the researcher's perspective. Their covariance is given by

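$$
\operatorname{cov}_n(\beta_q, \beta_h) = \sum_{c=1}^{C} \pi_{cn}(\theta) \beta_{c,q} \beta_{c,h} - \left( \sum_{c=1}^{C} \pi_{cn}(\theta) \beta_{c,q} \right) \left( \sum_{c=1}^{C} \pi_{cn}(\theta) \beta_{c,h} \right) \tag{6}
$$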

where $\beta_{c,q}$ is the value of $\beta_q$ when agent $n$ is in class $c$, and $\beta_{c,h}$ is defined similarly. As long as $z_n$ in (2) includes a nonconstant variable, this covariance will vary across agents with different observed characteristics through the variation in $\pi_{cn}(\theta)$.
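A small worked sketch of (6) shows how the same class-specific coefficients imply different covariances for agents with different class shares. The function name and all numbers are made up for illustration.

```python
def taste_covariance(shares_n, beta_q, beta_h):
    """Evaluate cov_n(beta_q, beta_h) in (6) for one agent.

    shares_n[c] : pi_cn(theta)
    beta_q[c]   : coefficient on attribute q in class c
    beta_h[c]   : coefficient on attribute h in class c
    """
    mean_q = sum(pi * b for pi, b in zip(shares_n, beta_q))
    mean_h = sum(pi * b for pi, b in zip(shares_n, beta_h))
    cross = sum(pi * bq * bh for pi, bq, bh in zip(shares_n, beta_q, beta_h))
    return cross - mean_q * mean_h


# Two classes with fixed coefficients; two agents with different class shares.
beta_q = [1.0, -1.0]
beta_h = [2.0, 0.0]
cov_a = taste_covariance([0.5, 0.5], beta_q, beta_h)  # approximately 1.0
cov_b = taste_covariance([0.9, 0.1], beta_q, beta_h)  # approximately 0.36
```

The coefficients are identical for both agents, so the difference between the two covariances comes entirely from the class shares.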

3 The lclogit command

lclogit is a Stata command that implements the EM iterative scheme outlined in the previous section. This command generalizes Pacifico's (2012) step-by-step procedure and introduces an improved internal loop along with other important functionalities. The overall effect is to make the estimation process more convenient, significantly faster, and more stable numerically.

4. fmlogit is a user-written program. See footnote 5 for a further description.

For example, the internal code of lclogit executes fewer algebraic operations per iteration to update the estimates; uses the standard generate command to perform tasks that were previously executed with slightly slower egen functions; and, when possible, works with log probabilities instead of probabilities. All of these changes substantially reduce the estimation run time, especially in the presence of a large number of parameters and observations. If we take the 8-class model fit by Pacifico (2012) as an example, lclogit produces the same results as the step-by-step procedure while taking less than one-half of the run time.

The data setup for lclogit is identical to that required by clogit.

3.1 Syntax

The generic syntax for lclogit is

    lclogit depvar [indepvars] [if], group(varname) id(varname) nclass(#)
        [membership(varlist) convergence(#) iterate(#) seed(#)
        constraints(Class# numlist: Class# numlist: ...) nolog]

3.2 Options

group(varname) specifies a numeric identifier variable for the choice scenarios. group() is required.

id(varname) specifies a numeric identifier variable for the choice makers or agents. With cross-section data, users should specify the same variable for both the group() and the id() options. id() is required.

nclass(#) specifies the number of latent classes used in the estimation. A minimum of two latent classes is required. nclass() is required.

membership(varlist) specifies independent variables to enter the fractional multinomial logit model of class membership, that is, the variables included in the vector $z_n$ of (2). The variables must be constant within the same agent as identified by id().⁵ When this option is not specified, the class shares are updated algebraically following (5).

convergence(#) specifies the tolerance for the log likelihood. When the proportional increase in the log likelihood over the last five iterations is less than the specified criterion, lclogit declares convergence. The default is convergence(0.00001).

5. Pacifico (2012) specified an ml program with the method lf to fit the class membership model. lclogit uses another user-written program from Buis (2008), fmlogit, which performs the same estimation with the significantly faster and more accurate d2 method. lclogit is downloaded with a modified version of the prediction command of fmlogit, fmlogit_pr, because we had to modify this command to obtain double-precision class shares.

iterate(#) specifies the maximum number of iterations. If convergence is not achieved after the selected number of iterations, lclogit stops the recursion and notes this fact before displaying the estimation results. The default is iterate(150).

seed(#) sets the seed for pseudouniform random numbers. The default is the creturn value c(seed).

The starting values for taste parameters are obtained by splitting the sample into nclass() different subsamples and fitting a clogit model for each of them. During this process, a pseudouniform random number is generated for each agent to assign the agent into a particular subsample.⁶ As for the starting values for the class shares, lclogit uses equal shares, that is, 1/nclass().

constraints(Class# numlist: Class# numlist: ...) specifies the constraints that are imposed on the taste parameters of the designated classes, that is, $\beta_c$ in (1). For instance, suppose that x1 and x2 are alternative-specific characteristics included in indepvars for lclogit and that the user wishes to restrict the coefficient on x1 to 0 for Class1 and Class4 and the coefficient on x2 to 2 for Class4. Then the relevant series of commands would look like this:

    constraint 1 x1 = 0
    constraint 2 x2 = 2
    lclogit depvar indepvars, group(varname) id(varname) ///
        nclass(8) constraints(Class1 1: Class4 1 2)

nolog suppresses the display of the iteration log.
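The starting-value step described under seed() can be pictured with the sketch below: each agent receives one pseudouniform draw, the unit interval is cut into nclass() equal parts, and the part containing the draw determines the agent's subsample. This is an illustration only. The function name is hypothetical, and Python's random number generator differs from Stata's, so the sketch does not reproduce the draws lclogit would make for a given seed.

```python
import random


def assign_start_subsamples(agent_ids, n_classes, seed):
    """Assign each agent to one of n_classes subsamples using a uniform draw.

    A private generator is used so that the global random state is untouched.
    """
    rng = random.Random(seed)
    assignment = {}
    for agent in agent_ids:
        draw = rng.random()  # uniform on [0, 1)
        # A draw in the c-th of n_classes equal parts maps to subsample c.
        subsample = min(int(draw * n_classes), n_classes - 1) + 1
        assignment[agent] = subsample
    return assignment


agents = list(range(1, 11))
start_groups = assign_start_subsamples(agents, n_classes=3, seed=12345)
```

A conditional logit model would then be fit separately on each subsample to obtain that class's initial taste parameters, with the class shares started at equal values.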

4 Postestimation command: lclogitpr

lclogitpr predicts the probabilities of choosing each alternative in a choice situation (choice probabilities hereafter), the class shares or prior probabilities of class membership, and the posterior probabilities of class membership. The predicted probabilities are stored in a variable named stubname#, where # refers to the relevant class number; the only exception is the unconditional choice probability, which is stored in a variable named stubname.

4.1 Syntax

The syntax for lclogitpr is

    lclogitpr stubname [if] [in] [, class(numlist) pr0 pr up cp]

6. More specifically, the unit interval is divided into nclass() equal parts, and if the agent's pseudorandom draw is in the $c$th part, the agent is allocated to the subsample whose clogit results serve as the initial estimates of class $c$'s taste parameters. Note that lclogit is identical to asmprobit in that the current seed, as at the beginning of the command's execution, is restored once all necessary pseudorandom draws have been made.
