Mode Choice Modeling with R

June 14th, 2013

I started this post (and the work to go with it) as a companion to A Self Instructing Course in Mode Choice Modeling by Bhat and Koppelman.  That's because I could reproduce the work in the book in R and can (now) reproduce in R.

To continue with this, please get the CD files from my last blog post.  You'll specifically need "SF MTC Work MC Data.sav", which is in SPSS format.

The first part:


The items above simply load the libraries.  If any of these are not found, go to Packages (on the menu bar) - Install Packages... and select your closest mirror and select the missing package (either foreign or mlogit).

Next, read in the data and we'll add a field, too, as there is no unique id in this dataset.


The first line reads in the SPSS file (it asks you for the file).  The second adds a "HHPerID" field, which is unique to each case.

The next part is to format the data for mlogit.  This is quite a challenge because it has to be JUST RIGHT or there will be errors.


The first parts of this are pretty obvious (inTab is the input table, choice="chosen" is the choice field).  Shape="long" indicates that the data is multiple records per case.  "Wide" would indicate each record is on its own line.  chid.var is the case id variable.  alt.var is the alternatives.  drop.index drops the index field out of the resulting table.

Finally, we'll run a simple multinomial logit estimate on this.


For such a short piece of code, there is a lot going on here.  The formula is (simply) chosen=cost+tvtt+hhinc, BUT hhinc is alternative specific and cost and travel time are not.  So the utilities for this would be something like:








The result is this:


mlogit(formula = chosen ~ cost + tvtt | hhinc, data = mc, method = "nr",
print.level = 0)

Frequencies of alternatives:
1 2 3 4 5 6
0.7232054 0.1028037 0.0320143 0.0990257 0.0099423 0.0330086

nr method
6 iterations, 0h:0m:6s
g'(-H)^-1g = 5.25E-05
successive function values within tolerance limits

Coefficients :
Estimate Std. Error t-value Pr(>|t|)
2:(intercept) -2.17804077 0.10463797 -20.8150 < 2.2e-16 ***
3:(intercept) -3.72512379 0.17769193 -20.9639 < 2.2e-16 ***
4:(intercept) -0.67094862 0.13259058 -5.0603 4.186e-07 ***
5:(intercept) -2.37634141 0.30450385 -7.8040 5.995e-15 ***
6:(intercept) -0.20681660 0.19410013 -1.0655 0.286643
cost -0.00492042 0.00023890 -20.5965 < 2.2e-16 ***
tvtt -0.05134065 0.00309940 -16.5647 < 2.2e-16 ***
2:hhinc -0.00216998 0.00155329 -1.3970 0.162406
3:hhinc 0.00035756 0.00253773 0.1409 0.887952
4:hhinc -0.00528636 0.00182881 -2.8906 0.003845 **
5:hhinc -0.01280827 0.00532413 -2.4057 0.016141 *
6:hhinc -0.00968627 0.00303306 -3.1936 0.001405 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -3626.2
McFadden R^2: 0.25344
Likelihood ratio test : chisq = 2462 (p.value = < 2.22e-16)

And this matches the self-instructing course manual, page 76 (under "Base Model").

Nested Logit

R can do simple nested logit calculations, but unfortunately they have to be *very* simple (which is uncharacteristic for R).  The best thing to do is get a copy of Biogeme and read the next post in this series.

A Self Instructing Course in Mode Choice Modeling

May 20th, 2013

One thing to ensure you understand how your software of choice works is to compare it to known outcomes.  For example, while learning Biogeme, I converted and ran some of the scenarios in A Self Instructing Course in Mode Choice Modeling in Biogeme and found interesting issues where certain values coming out of Biogeme were the reciprocal of those in the manual.  Neither is wrong, but when applying the data to a model, you have to know these things.

I've decided to do the same thing in R, and I had a lot of problems getting the CD.  I luckily found one on my hard drive.  It is here.

For the sake of making life easier on anyone that gets here looking for the manual, it's here.