Logsum Issues

February 11th, 2014

I've been working through distribution in the model, and I was having a little bit of trouble.  As I looked into things, I found one place where QC is necessary to verify that things are working right.

The Logsums.

I didn't like the shape of the curve from the friction factors I was getting, so I started looking into a variety of inputs to the mode choice model.  Like time and distance by car:

This is a comparison of distance.  The red line is the new model, the blue and green are two different years of the old model.

This is a comparison of distance. The red line is the new model, the blue and green are two different years of the old model.

This is a comparison of zone-to-zone times.  The red line is the new model, the blue and green are different years of the old model.

This is a comparison of zone-to-zone times. The red line is the new model, the blue and green are different years of the old model.

In both cases, these are as expected.  Since there are more (smaller) zones in the new model, there are more shorter times and distances.

The problem that crept up was the logsums coming from mode choice model for use in distribution:

These are the logsums from the old model.  Notice that the curve allows for some variation.

These are the logsums from the old model. Notice that the curve allows for some variation.

These are the logsums in the new model.  This is a problem because of that 'spike'.

These are the logsums in the new model. This is a problem because of that 'spike'.

I put all the logsums on this, notice how the curve for the old model is dwarfed by the spike in the new model.  This is bad.

I put all the logsums on this, notice how the curve for the old model is dwarfed by the spike in the new model. This is bad.

So the question remains, what went wrong?

I believe the ultimate problem was that there was no limit on Bike and Pedestrian trips in the model, so it was generating some extreme values and somewhere and an infinity was happening in those modes causing the curve shown above.  I tested this by limiting the pedestrian trips to 5 miles (a fairly extreme value) and bike trips to 15 miles and re-running.  The logsums looked very different (again, the new model is the red line):

This is a comparison between the two model versions with fixed bicycle and pedestrian utility equations.

This is a comparison between the two model versions with fixed bicycle and pedestrian utility equations.

Note that the X axis range went from 650 (in the above plots) to 1000.  I'm not too concerned that the logsums in the new model have a larger range.  In fact, as long as those ranges are in the right place distribution may be better.  This is not final data, as I am still looking at a few other things to debug.

Cube Voyager XCHOICE: The Missing Help Doc

October 22nd, 2013

First note: I sent this to the support people at Citilabs recently, so maybe on the next update they'll include this in the help document, as I think it is sorely missing.  Or maybe I'm just crazy (you kinda have to be crazy to do transportation modeling*) and have my own way of thinking.

In the OKI model, we have a nested logit mode choice structure that's pretty common for MPOs our size - it's had a history in several other MPOs.  The nesting structure looks like this:

 

OKI Mode Choice

 

The part that I think is missing in the Voyager Help File for XCHOICE is this:

XCHOICE ALTERNATIVES=LBW, LBP, LBK, EBW, EBP, EBK, LRW, LRP, LRK, WALK, BIKE, DA, SR2, SR3,
UTILITIESMW=141,142,143,144,145,146,147,148,149,110,111,112,113,114,
DEMANDMW=5,
ODEMANDMW=412,413,414,415,416,417,418,419,420,421,422,405,407,408,
SPLIT=LB 1 LBW 1 LBP 1 LBK,
SPLIT=EB 1 EBW 1 EBP 1 EBK,
SPLIT=LR 1 LRW 1 LRP 1 LRK,
SPLIT=TRAN 0.516 LB 0.516 EB 0.516 LR,
SPLIT=SR 1 SR2 1 SR3,
SPLIT=AUTO 0.516 SR 0.516 DA,
SPLIT=NONM 0.516 WALK 0.516 BIKE,
SPLIT=TOTAL 0.477 AUTO 0.477 TRAN 0.477 NONM,
SPLITCOMP=801,
STARTMW=900

More importantly, WHY this is important:

XCHOICE

All those green, blue, red, and yellow marks are pointing things out - like what connects to what and so on.  Once these connections are made, you can get answers without a lot of effort.  It's quite nice, really.  However, I haven't figured out where to put upper-level coefficients, but in my latest estimation runs, those are out.

More Stuff

One of the more important things to get out of the mode choice model is the logsum values that would be used as the impedance for a gravity model distribution.  Once you have the above, it's pretty easy.

First off, make your demand matrix have 100 trips per IJ pair:

MW[2]=100

Then, get the exponential of the SPLITCOMP matrix to get a usable logsum value.

MW[700]=EXP(MW[801])

Note that in the OKI model (and in other models that use walk access markets), this needs to be done multiple times and accumulated to another matrix:

MW[701]=MW[701]+(MW[5]*MW[700])

And then there is some post-processing that needs to be done at the end of the ILOOP:

JLOOP
IF(MW[2]>0)
MW[702]=MW[701]/MW[2] ;Denom
ELSE
MW[702]=0.00000000001
ENDIF

MW[704]=(LN(MW[702])/(COEFF[1]*CLSPRM*CLSSUB))+LOGADD
ENDJLOOP

704 is the output matrix.

--

* I have other forms of craziness, too.  I've recently announced - and made it "Facebook official" to my close friends and family - that I'm going to run a half marathon next spring.

Mode Choice Modeling with R

June 14th, 2013

I started this post (and the work to go with it) as a companion to A Self Instructing Course in Mode Choice Modeling by Bhat and Koppelman.  That's because I could reproduce the work in the book in R and can (now) reproduce in R.

To continue with this, please get the CD files from my last blog post.  You'll specifically need "SF MTC Work MC Data.sav", which is in SPSS format.

The first part:

library(foreign)
library(mlogit)

The items above simply load the libraries.  If any of these are not found, go to Packages (on the menu bar) - Install Packages... and select your closest mirror and select the missing package (either foreign or mlogit).

Next, read in the data and we'll add a field, too, as there is no unique id in this dataset.

inTab<-read.spss(file.choose(),to.data.frame=T,use.value.labels=F)
inTab$HHPerID=inTab$hhid*100+inTab$perid

The first line reads in the SPSS file (it asks you for the file).  The second adds a "HHPerID" field, which is unique to each case.

The next part is to format the data for mlogit.  This is quite a challenge because it has to be JUST RIGHT or there will be errors.

mc<-mlogit.data(inTab,choice="chosen",shape="long",chid.var="HHPerID",alt.var="altnum",drop.index=T)

The first parts of this are pretty obvious (inTab is the input table, choice="chosen" is the choice field).  Shape="long" indicates that the data is multiple records per case.  "Wide" would indicate each record is on its own line.  chid.var is the case id variable.  alt.var is the alternatives.  drop.index drops the index field out of the resulting table.

Finally, we'll run a simple multinomial logit estimate on this.

nlb<-mlogit(chosen~cost+tvtt|hhinc,mc)

For such a short piece of code, there is a lot going on here.  The formula is (simply) chosen=cost+tvtt+hhinc, BUT hhinc is alternative specific and cost and travel time are not.  So the utilities for this would be something like:

U_{da}=\beta_{cost}*cost+\beta_{tt}*tvtt

U_{sr2}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,sr2}*hhinc+K_{sr2}

U_{sr3}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,sr3}*hhinc+K_{sr3}

U_{transit}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,tranist}*hhinc+K_{transit}

U_{walk}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,walk}*hhinc+K_{walk}

U_{bike}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,bike}*hhinc+K_{bike}

 

The result is this:

>summary(nlb)

Call:
mlogit(formula = chosen ~ cost + tvtt | hhinc, data = mc, method = "nr",
print.level = 0)

Frequencies of alternatives:
1 2 3 4 5 6
0.7232054 0.1028037 0.0320143 0.0990257 0.0099423 0.0330086

nr method
6 iterations, 0h:0m:6s
g'(-H)^-1g = 5.25E-05
successive function values within tolerance limits

Coefficients :
Estimate Std. Error t-value Pr(>|t|)
2:(intercept) -2.17804077 0.10463797 -20.8150 < 2.2e-16 ***
3:(intercept) -3.72512379 0.17769193 -20.9639 < 2.2e-16 ***
4:(intercept) -0.67094862 0.13259058 -5.0603 4.186e-07 ***
5:(intercept) -2.37634141 0.30450385 -7.8040 5.995e-15 ***
6:(intercept) -0.20681660 0.19410013 -1.0655 0.286643
cost -0.00492042 0.00023890 -20.5965 < 2.2e-16 ***
tvtt -0.05134065 0.00309940 -16.5647 < 2.2e-16 ***
2:hhinc -0.00216998 0.00155329 -1.3970 0.162406
3:hhinc 0.00035756 0.00253773 0.1409 0.887952
4:hhinc -0.00528636 0.00182881 -2.8906 0.003845 **
5:hhinc -0.01280827 0.00532413 -2.4057 0.016141 *
6:hhinc -0.00968627 0.00303306 -3.1936 0.001405 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -3626.2
McFadden R^2: 0.25344
Likelihood ratio test : chisq = 2462 (p.value = < 2.22e-16)

And this matches the self-instructing course manual, page 76 (under "Base Model").

Nested Logit

R can do simple nested logit calculations, but unfortunately they have to be *very* simple (which is uncharacteristic for R).  The best thing to do is get a copy of Biogeme and read the next post in this series.

A Self Instructing Course in Mode Choice Modeling

May 20th, 2013

One thing to ensure you understand how your software of choice works is to compare it to known outcomes.  For example, while learning Biogeme, I converted and ran some of the scenarios in A Self Instructing Course in Mode Choice Modeling in Biogeme and found interesting issues where certain values coming out of Biogeme were the reciprocal of those in the manual.  Neither is wrong, but when applying the data to a model, you have to know these things.

I've decided to do the same thing in R, and I had a lot of problems getting the CD.  I luckily found one on my hard drive.  It is here.

For the sake of making life easier on anyone that gets here looking for the manual, it's here.

Using a Class Object to Help Read a Control File

December 5th, 2010

One thing we're used to in travel modeling is control files.  It seems to harken back to the days of TranPlan where everything had a control file to control the steps.

In my case, I have a control file for my nested logit mode choice program, and because of the age of the mode choice program, I want to redesign it.  The first part of this is reading the control file, and I did a little trick to help with reading each control file line.  With C++, there is no way to read variables in from a file (like there is with FORTRAN).

The first part of the code reads the control file, and you will see that once I open and read the control file, I section it out (the control file has sections for files ($FILES), operation parameters ($PARAMS), operation options ($OPTIONS), and mode choice parameters ($PARMS). Each section ends with an end tag ($END). This adds the flexibility of being able to re-use variables in different locations.

After the section, the next portion of the code reads the line and checks to see if FPERIN is found. If it is, a ControlFileEntry object is created. This object is a class that is used to return the filename held in the object. This makes it easy to reduce code.

int readControlFile(char *controlFileName){
	cout << "Reading " << controlFileName << endl;
	//Read the control file
	string line;
	bool inFiles=false, inParams=false, inOptions=false, inParms=false;
	ifstream controlFile(controlFileName);
	if(!controlFile.good()){
		cout << "PROBLEMS READING CONTROL FILE" << endl;
		return 1;
	}
	while(controlFile.good()){
		getline(controlFile,line);
		//check the vars sections
		if(line.find("$FILES")!=string::npos)
			inFiles=true;
		if(line.find("$PARAMS")!=string::npos)
			inParams=true;
		if(line.find("$OPTIONS")!=string::npos)
			inOptions=true;
		if(line.find("$PARMS")!=string::npos)
			inParms=true;
		if(line.find("$END")!=string::npos){
			inFiles=false;
			inParams=false;
			inOptions=false;
			inParms=false;
		}
		if(inFiles){
			cout << "Checking files" << endl;
			if(line.find("FPERIN")!=string::npos){
				controlFileEntry cfe(line);
				PerTrpIn=cfe.filename;
			}
//SNIP!!!
	return 0;
}

The controlFileEntry is code is below.  This is used at the top of the code, just below the preprocessor directives (the #include stuff).

class controlFileEntry{
public:
	string filename;
	controlFileEntry(string Entry){
		beg=(int)Entry.find("=")+2;
		end=(int)Entry.rfind("\'")-beg;
		filename=Entry.substr(beg,end);
	}
	~controlFileEntry(){
		beg=0;
		end=0;
		filename="";
	}
private:
	string Entry;
	int beg;
	int end;
};

The class has one public member, filename, which is what is read in the code where it is used. There are two public functions. The first is the constructor (controlFileEntry) which is used when creating the object. The second is the de-constructor (~controlFileEntry), which sets the beg, end, and filename variables to zero and blank.  The beg, end (misnomer), and the line sent to it are private and cannot be used in code.

This can be extended, as the file entry type is fine when there are quotes around the item (it is setup for that, note the -2 in beg).  I wrote a different one for integers, floating point, and boolean values.

class controlParamEntry{
public:
	int ivalue;
	bool bvalue;
	double dvalue;
	controlParamEntry(string Entry){
		beg=(int)Entry.find("=")+1;
		end=(int)Entry.rfind(",")-beg;
		ivalue=0;
		dvalue=0;
		if(Entry.substr(beg,end)=="T"){
			bvalue=true;
			ivalue=1;
			dvalue=1;
		}else if(Entry.substr(beg,end)=="F"){
			bvalue=false;
			ivalue=0;
			dvalue=0;
		}
		if(ivalue==0){
			ivalue=atoi(Entry.substr(beg,end).c_str());
		}
		if(dvalue==0){
			dvalue=atof(Entry.substr(beg,end).c_str());
		}
	}
	~controlParamEntry(){
		beg=0;
		end=0;
		ivalue=0;
		dvalue=0;
		Entry="";
	}
private:
	string Entry;
	int beg;
	int end;
};

As you can see above, there are return values for floating point (dvalue), integer (ivalue), and boolean (bvalue).

Tune in next week to see more of the code.