R + OMX County Flows

October 8th, 2015

In travel modeling, we use matrices to do things like zone-to-zone trip flows.  The matrix is like a table, but with an equal number of rows and columns, each representing a location in the model (a traffic analysis zone, like a Census Block Group).  In the OKI region, we have 2,299 zones, which means there are 5,285,401 cells.  Too many to look at, and we don't have reliable data at that level.  However, we do have semi-reliable data at the county level.

The Open Matrix Format (OMX) is used to get these matrix files out of a proprietary format and into something that can be read by many programs.  We use this at OKI to move data out of Cube (a proprietary software product) and into R (an open source statistical programming package).

Summarizing a matrix to a county flow table in R is actually pretty easy:

This doesn't take very long, either.  Anyone familiar with R can see where the code can be revised to summarize to districts.

This is what the data looks like. Note that this is not verified data (please do not use it!).

This is what the data looks like. Note that this is not verified data (please do not use it!).

Note: the reason Hamilton, Campbell, and Dearborn county names are cut off is related to a bug in Cube.  They (Citilabs) are aware of the bug, but it has not been fixed yet.

Lookups in R: The REAL Right Way!

September 9th, 2014

After waiting forever enough to get things to run, I stepped into a better way to do lookups.

mapply on matrix objects.

Basically, I do this:

TSkimLBWPk<-read.dbf("data/TSPKLBW.DBF") #Read the local bus walk skim

TSKimLBWPK_IWAIT=(acast(TSkimLBWPk,I~J,value.var="V1",drop=FALSE,fill=0)) #build a matrix from the data

TSKimLBWPK.IWAIT<-function(i,j) {
if(i<=nrow(TSKimLBWPK_IWAIT) && j<=ncol(TSKimLBWPK_IWAIT))
return(TSKimLBWPK_IWAIT[i,j])
else return(0)
} #build a function to lookup, returning 0 if it is looking for a zone not found

TripsAllPk$LBW.IWAIT=mapply(TSKimLBWPK.IWAIT,TripsAllPk$PTAZ,TripsAllPk$ATAZ) #do the lookup

That's it. This takes the input DBF (which has I, J, V1, V2, etc. fields), converts to a matrix for a quick lookup, and then applies it.

It runs in about 3 minutes.

Lookups in R: The Wrong Way and the Right Way

August 28th, 2014

I recently wrote a script that takes DBF exports of Cube matrices and prepares them for Biogeme.  The main... well, only reason I did this in R was because I was considering using mlogit for model estimation.  I ultimately decided to 'go with what I know' and changed course to use Biogeme. Mind you, the part of Stairway to Heaven applies: "There are two paths you can go by, but in the long run / There's still time to change the road you're on."

The Wrong Way

I've changed my code already, so pardon that this is from memory.  Also, these are snippets - I have a lot more code than this.

HSkimPk<-read.dbf("data/HSKIM_PK3.dbf")

for(rn in 1:nrow(TripsAll)){
HSkimPkRow<-subset(HSkimPk,I==TripsAll[rn,"PTAZ"] & J==TripsAll[rn,"ATAZ")
TripsAll$DA.IVT<-HSkimPkRow[,"V1"]
...
}

This took no less than 17 hours to complete for around 23,000 trip records and for values from 5 different tables * 2 time periods.

The Right Way

I (obviously) wanted something that wouldn't take forever, especially as I was working in Biogeme and seeing things that made me think that I wanted to change ONE LITTLE THING.  This seems to always happen.

I took a different approach that by my calculations should be much quicker.

HSkimPk<-read.dbf("data/HSKIM_PK3.dbf")
HSkimPkD<-acast(HSkimPk,I ~ J,value.var="V2",drop=FALSE,fill=0)
HSkimPkT<-acast(HSkimPk,I ~ J,value.var="V1",drop=FALSE,fill=0)

for(rn in 1:nrow(TripsAll)){
if(I<=nrow(HSkimPkT) & J<=nrow(HSkimPkT)){
TripsAll[rn,"DA.IVT"]<-HSkimPkT[I,J]
}
}

Since this is currently running, my only metrics are to look at the time per 50 rows (in my real code, I have a line that outputs a timestamp every 50 rows), and it is taking about 0.27 seconds per record, compared to somewhere around 4.5 seconds per record.  While not perfect, I'll take an estimated completion of 1.75 hours compared to 17 (update: 2 hours).  However, I will say that Cube is faster in this regard and that I may not have the fastest R solution.

Quick Cube Voyager Trick: Arrays

December 24th, 2013

While injecting bugs into a script, I found a quick trick in Cube related to arrays.  I didn't test it to see if it works on a single-dimension array, but it DOES work on a multi-dimension array.

ARRAY Test=8,8

Test=1

This sets everything in Test to 1, so if you write:

LOOP _x=1,8

LOOP _y=1,8

PRINT LIST=Test[_x][_y]

ENDLOOP

ENDLOOP

The response will be '1' 64 times.