I started this post (and the work to go with it) as a companion to A Self Instructing Course in Mode Choice Modeling by Bhat and Koppelman.  That's because I can (now) reproduce the work in the book in R.

To continue with this, please get the CD files from my last blog post.  You'll specifically need "SF MTC Work MC Data.sav", which is in SPSS format.

The first part:

library(foreign)
library(mlogit)

The items above simply load the libraries.  If any of these are not found, go to Packages (on the menu bar) - Install Packages... and select your closest mirror and select the missing package (either foreign or mlogit).

Next, read in the data and we'll add a field, too, as there is no unique id in this dataset.

inTab<-read.spss(file.choose(),to.data.frame=T,use.value.labels=F)
inTab$HHPerID=inTab$hhid*100+inTab$perid

The first line reads in the SPSS file (it asks you for the file). The second adds a "HHPerID" field, which is unique to each case.

The next part is to format the data for mlogit. This is quite a challenge because it has to be JUST RIGHT or there will be errors.

mc<-mlogit.data(inTab,choice="chosen",shape="long",chid.var="HHPerID",alt.var="altnum",drop.index=T)

The first parts of this are pretty obvious (inTab is the input table, choice="chosen" is the choice field). shape="long" indicates that the data has multiple records per case, one for each alternative. "wide" would indicate that each case is on its own line. chid.var is the case id variable. alt.var is the alternatives. drop.index drops the index field out of the resulting table.

Finally, we'll run a simple multinomial logit estimate on this.

nlb<-mlogit(chosen~cost+tvtt|hhinc,mc)

For such a short piece of code, there is a lot going on here. The formula is (simply) chosen=cost+tvtt+hhinc, BUT hhinc gets an alternative-specific coefficient and cost and travel time do not. So the utilities for this would be something like:

$U_{da}=\beta_{cost}*cost+\beta_{tt}*tvtt$

$U_{sr2}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,sr2}*hhinc+K_{sr2}$

$U_{sr3}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,sr3}*hhinc+K_{sr3}$

$U_{transit}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,transit}*hhinc+K_{transit}$

$U_{walk}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,walk}*hhinc+K_{walk}$

$U_{bike}=\beta_{cost}*cost+\beta_{tt}*tvtt+\beta_{inc,bike}*hhinc+K_{bike}$

The result is this:

> summary(nlb)

Call:
mlogit(formula = chosen ~ cost + tvtt | hhinc, data = mc, method = "nr", print.level = 0)

Frequencies of alternatives:
        1         2         3         4         5         6
0.7232054 0.1028037 0.0320143 0.0990257 0.0099423 0.0330086

nr method
6 iterations, 0h:0m:6s
g'(-H)^-1g = 5.25E-05
successive function values within tolerance limits

Coefficients :
                 Estimate  Std. Error  t-value  Pr(>|t|)
2:(intercept) -2.17804077  0.10463797 -20.8150 < 2.2e-16 ***
3:(intercept) -3.72512379  0.17769193 -20.9639 < 2.2e-16 ***
4:(intercept) -0.67094862  0.13259058  -5.0603 4.186e-07 ***
5:(intercept) -2.37634141  0.30450385  -7.8040 5.995e-15 ***
6:(intercept) -0.20681660  0.19410013  -1.0655  0.286643
cost          -0.00492042  0.00023890 -20.5965 < 2.2e-16 ***
tvtt          -0.05134065  0.00309940 -16.5647 < 2.2e-16 ***
2:hhinc       -0.00216998  0.00155329  -1.3970  0.162406
3:hhinc        0.00035756  0.00253773   0.1409  0.887952
4:hhinc       -0.00528636  0.00182881  -2.8906  0.003845 **
5:hhinc       -0.01280827  0.00532413  -2.4057  0.016141 *
6:hhinc       -0.00968627  0.00303306  -3.1936  0.001405 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -3626.2
McFadden R^2: 0.25344
Likelihood ratio test : chisq = 2462 (p.value = < 2.22e-16)

And this matches the self-instructing course manual, page 76 (under "Base Model").

# Nested Logit

R can do simple nested logit calculations, but unfortunately they have to be *very* simple (which is uncharacteristic for R). The best thing to do is get a copy of Biogeme and read the next post in this series.

## Linear and Nonlinear Models in R

June 7th, 2013

This post will talk about building linear and non-linear models of trip rates in R. If you haven't read the first part of this series, please do so, partly because this builds on it.

# Simple Linear Models

Simple linear models are, well, simple in R. An example of a fairly easy linear model with two factors is:

inTab.hbsh<-subset(inTab,TP_Text=='HBSh')
hbsh<-ddply(inTab.hbsh,.(HHID,HHSize6,Workers4,HomeAT),summarise,N=length(HHID))
hbsh.lm.W_H<-lm(N~Workers4+HHSize6,data=hbsh)

This creates a simple linear home-based-shopping trip generation model based on workers and household size.
Once the estimation completes (it should take less than a second), the summary should show the following data:

> summary(hbsh.lm.W_H)

Call:
lm(formula = N ~ Workers4 + HHSize6, data = hbsh)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2434 -1.1896 -0.2749  0.7251 11.2946

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.79064    0.10409  17.203  < 2e-16 ***
Workers4    -0.02690    0.05848  -0.460    0.646
HHSize6      0.24213    0.04365   5.547 3.58e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.649 on 1196 degrees of freedom
Multiple R-squared: 0.03228, Adjusted R-squared: 0.03066
F-statistic: 19.95 on 2 and 1196 DF, p-value: 3.008e-09

What all this means is:

Trips = -0.0269*workers+0.24213*HHSize+1.79064

The important things to note here are that the intercept is very significant (that's bad) and the adjusted R2 is 0.03066 (that's horrible). There's more here, but it's more details.

# Non-Linear Least Squares

When doing a non-linear model, the nls function is the way to go. The two lines below create a trips data frame, and then run a non-linear least-squares model estimation on it (note that the first line is long and wraps to the second line).

trips<-ddply(inTab,.(HHID,HHSize6,Workers4,HHVEH4,INCOME,WealthClass),AreaType=min(HomeAT,3),summarise,T.HBSH=min(sum(TP_Text=='HBSh'),6),T.HBSC=sum(TP_Text=='HBS'),T.HBSR=sum(TP_Text=='HBSoc'),T.HBO=sum(TP_Text=='HBO'))
trips.hbo.nls.at3p<-nls(T.HBO~a*log(HHSize6+b),data=subset(trips,AreaType>=3),start=c(a=1,b=1),trace=TRUE)

The second line does the actual non-linear least-squares estimation. The input formula is T=a*ln(HHSize+b) (log() in R is the natural log). In this type of model, starting values for a and b have to be given to the model.

The summary of this model is a little different:

> summary(trips.hbo.nls.at3p)

Formula: T.HBO ~ a * log(HHSize6 + b)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
a   1.8672     0.1692  11.034  < 2e-16 ***
b   1.2366     0.2905   4.257 2.58e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.095 on 402 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 1.476e-07

It doesn't perform R2 on this because it can't directly. However, we can because we know the actual values and the model predicts the values. So, one thing that can be done is a plot:

> plot(c(0,10),c(0,10),type='l',xlab='Observed Trips',ylab='Predicted Trips')
> points(subset(trips,AreaType>=3)$T.HBO,fitted(trips.hbo.nls.at3p),col='red')


The resulting graph looks like this. Not particularly good, but there is also no scale as to the frequency along the 45° line.

R2 is still a good measure here. There's probably an easier way to do this, but this way is pretty simple.

testTable<-data.frame(cbind(subset(trips,AreaType>=3)$T.HBO,fitted(trips.hbo.nls.at3p)))
cor(testTable$X1,testTable$X2)

Since I didn't correct the column names when I created the data frame, R used X1 and X2, as evidenced by checking the summary of testTable:

> summary(testTable)
       X1               X2
 Min.   : 0.000   Min.   :1.503
 1st Qu.: 1.000   1st Qu.:1.503
 Median : 2.000   Median :2.193
 Mean   : 2.072   Mean   :2.070
 3rd Qu.: 3.000   3rd Qu.:2.193
 Max.   :23.000   Max.   :3.696

So the fit is pretty bad...

> cor(testTable$X1,testTable$X2)
[1] 0.2755101

Note that cor() returns the correlation coefficient r, so the R2 is its square, about 0.076. It's better than some of the others, after all, this is semirandom human behavior.

That's it for now. My next post will be... MORE R!

Also, I have a quick shout-out to Jeremy Raw at FHWA, who helped me through some issues via email. Parts of his email informed parts of this post.

## New Project In the Works

May 31st, 2013

I haven't worked in R in several days because I've been working on a new project that will assist with getting good transit speed curves from highway data. The project is in Java and is on my Github page. I'm working on making part of it multi-threaded, which is new to me.

A second project that is still in my mind (and I'm not sure if this will become part of this one or another separate project) will be to use transit GPS traces to get good trip length frequencies on transit.

Stay tuned!

## Getting Started in R

May 24th, 2013

# Setting Up

Download R from http://www.r-project.org. Install it normally (on Windows)... Double-click, next, next, next, etc.

Create a project folder with your data and with a shortcut to R (shout-out to Brian Gregor at Oregon DOT for this little trick). Also copy/move the data CSV there.

# Inputting and Looking at Data

The data is in CSV, which read.csv() in base R handles without loading any extra library, so we'll just load the data. I'm not a fan of typing in long filepaths, so I use the file.choose() function to browse for the data.
Note that in many cases the

inTab<-read.csv(file.choose())
summary(inTab)

In the code above, we've loaded the CSV into the inTab data frame (a data object in R) and got a summary of it. There are a few tricks to see parts of the data.

inTab$HHID (only the HHID values)
inTab[1:2] (only the first two fields)
inTab[1:10,] (only the first 10 rows)
inTab[1:10,1] (only the first field of the first 10 rows)

Data can be charted in R as well. A simple histogram is very simple to do in R.

hist(inTab$HHSize)

Sometimes data needs to be summarized. There is a function to do that, but first you'll probably have to download a package. To download the package, go to Packages - Install Packages. From the list, find plyr and install it.

Once plyr is installed (it shouldn't take long), you can load the package and use ddply to summarize data.

library(plyr)
inTab.Per<-ddply(inTab,.(HHID,HHSize6,Workers4,HHVEH4,INCOME,WealthClass),AreaType=min(HomeAT,3),summarise,T.HBSH=min(sum(TP_Text=='HBSh'),6),T.HBSC=sum(TP_Text=='HBS'),T.HBSR=sum(TP_Text=='HBSoc'),T.HBO=sum(TP_Text=='HBO'))

Where inTab is the input table, .(HHID,HHSize6,Workers4,HHVEH4,INCOME,WealthClass) are input fields to summarize by, AreaType=min(HomeAT,3) is a calculated field to summarize by, and everything following 'summarise' are the summaries.

# Conclusion

This is a crash course in R, and in the last steps, you basically computed average trip rates. Next week's post will be to run linear and non-linear models on this data.

## A Self Instructing Course in Mode Choice Modeling

May 20th, 2013

One way to ensure you understand how your software of choice works is to compare it to known outcomes. For example, while learning Biogeme, I converted and ran some of the scenarios in A Self Instructing Course in Mode Choice Modeling in Biogeme and found interesting issues where certain values coming out of Biogeme were the reciprocal of those in the manual. Neither is wrong, but when applying the data to a model, you have to know these things.

I've decided to do the same thing in R, and I had a lot of problems getting the CD. I luckily found one on my hard drive. It is here.

For the sake of making life easier on anyone that gets here looking for the manual, it's here.

## New Series on R in Transportation Modeling [Updated 21 May 2013]

May 17th, 2013

I've been doing a lot of statistical stuff over the past several weeks, and I think it is worth some value to the Interwebs if I try and post some of it.
I'm considering making it a course of some sort with some scrubbed HHTS data (no, I can't post real people's locations and names, I think I might get in a little bit of trouble for that).

The "syllabus" is roughly something like this (last update: 21 May 2013):

1. Intro to R: getting data in, making summaries
2. Trip rates - Averages
3. Trip rates - Linear and Non-linear modeling
4. Mode Choice Estimation in R
5. Complex Mode Choice Estimation in Biogeme
6. Distribution Friction Factors
7. Distribution K Factors
8. Outputs and Graphics

I can't guarantee that these will be the next eight weeks' worth of posts - there will probably be some weeks with a different post, since I don't know if I can get all this stuff done in eight weeks, even with the head start I have.

In other news...

I've been disabling comments on these short posts that really don't warrant any sort of responses. I've been getting a lot of spam on this blog, so I'm cutting it down where I can. This is not a global thing, I've been turning them off on certain posts, but the default is on.

## T-Test Trivia

May 13th, 2013

Any statistical test that uses the t-distribution can be called a t-test. One of the most common is Student's t-test, named after "Student," the pseudonym that William Gosset used to hide his employment by the Guinness brewery in the early 1900s. They didn't want their competitors to know that they were making better beer with statistics.

## TRB Applications Conference Mobile Website

May 3rd, 2013

For those going to the TRB Transportation Planning Applications Conference in Columbus, Ohio next week (May 5-9), I've released a very simple mobile website for it.

I have part of an API designed into the site, and I intend to continue that with the next Applications Conference, as I want to see a mobile/tablet app happen. I can make some Android platform stuff happen, but I have no iPhone development experience nor do I have an iDevice to do that on.
In addition, I'd love to see people who tweet during the conference use the hashtag #TRBAppCon. I will be tweeting (sometimes) and taking some pictures during the conference. My twitter handle is @okiAndrew.

Next up...

The day I'm writing this (I generally schedule posts anywhere from a day to 2 weeks in advance), I read the sad news that Astrid was acquired by Yahoo!.

I didn't have anything big to write on this blog this week, but there are a few things out there worth a look.

In my other life, I'm an amateur radio operator.  I wrote a piece over on my other blog about the global economy and parts, as I have been buying parts from eBay at dirt-cheap prices.   This has continued implications on freight in this country.  It's likely to get worse, as the makers-turned-entrepreneurs are (in droves) sending things off to China for fabrication.  Designed in the USA, made in China.

Mike Spack over on his blog mentioned that the one feature every traffic counter must have is identification.  He's 100% correct.  I've seen a video of the Boston bomb squad blasting a traffic counter with a water cannon many years ago, and that's what happens when you don't put some sort of ID on your counters.   The original video of the counter's demise has long since disappeared from the Internet, but you can still see the reference on Boing Boing.

## Prepping my Computer (for a conference, but that part doesn’t matter)

April 12th, 2013

Note: I thought I posted this last January, but it appears I didn't.

This post could be re-titled "Why I Love Linux" because it requires Linux.

Like many other transportation geeks, I'm getting ready to go to this little conference in Washington, DC.  I've been getting things together because I found out a few years ago that being stuck in DC with problematic technology (like a bad cell phone battery) is no fun.  And to top it all off, my laptop feels like it has a failing hard drive.

So I booted into Ubuntu and used Disk Utility to check the SMART status, which claims everything is fine.

Still, though, I didn't receive any disk with my laptop (it instead has a rescue partition) and my intuition disagrees with what my disk drive thinks of itself, so I decided the smart thing to do would be to arm myself with a few good USB flash drives.

The first USB flash drive is a live image of Ubuntu.

The second is my rescue partition image that can be restored to a new drive.  I got this by:

1. Getting an image file using the ntfsclone command:

sudo ntfsclone -o rescue.img /dev/sda4

Where /dev/sda4 is the Lenovo rescue partition (as indicated in Disk Utility)

2. Compress the rescue image

gzip rescue.img

3. Split the image into 1 GB bits

split -b 1024m rescue.img.gz

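(note: steps 2 and 3 can be combined with gzip -c rescue.img | split -b 1024m - rescue.img.gz. where the -c sends the compressed data to the pipe and the trailing argument is the prefix for the pieces)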

I then copied these to a USB flash drive.
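Getting the image back means undoing steps 3 and 2 in that order: join the pieces, then decompress. Here is a minimal Python sketch of that, assuming the pieces still carry split's default names (xaa, xab, and so on) so that sorting by name gives the right order. The function name and paths are placeholders of mine, not part of the original procedure, and writing the image back to a partition is not covered here.

```python
import glob
import gzip
import shutil


def reassemble(pieces_pattern, output_path):
    """Concatenate split pieces in name order, then gunzip the result."""
    pieces = sorted(glob.glob(pieces_pattern))
    if not pieces:
        raise FileNotFoundError("no pieces match " + pieces_pattern)
    joined = output_path + ".gz"
    # Undo `split`: append the pieces back to back in sorted order.
    with open(joined, "wb") as out:
        for piece in pieces:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, out)
    # Undo `gzip`: stream the joined file through the decompressor.
    with gzip.open(joined, "rb") as src, open(output_path, "wb") as out:
        shutil.copyfileobj(src, out)
    return pieces
```

Calling reassemble("/media/usb/x*", "rescue.img") (a made-up mount point) would leave both rescue.img.gz and rescue.img next to each other, so make sure there is room for both.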

I'm writing this as I'm working on a presentation for the TRB Applications Conference.  I'm working on a presentation I can present, and my delusions of grandeur are such that I THINK I can present Open Source Tools to QC Transit Survey Data as well as Steve Jobs could present a new iPhone, but without the reality distortion field.

I've been to quite a few conferences of varying groups, and I would call these "semi-academic".  Sometimes they are presenting research, but in many cases they are presenting an application of research.  There's no selling, and the audience is generally captive.

In places where you aren't required to post a paper, do so anyway.  Include the detail there.  Don't include tables full of numbers in a presentation, highlight one or two important numbers (trends, alternative analyses, etc) and note conclusions.  Include the big tables in the paper.

If you don't include a paper, upload a second presentation with more detail and/or use copious "slide notes".  Seriously.

The last resort - go to WordPress.com or Blogger.com or something, build a blog, and post it there.  Or hang it on your agency's website.  Or something else along those lines.

2. Don't Include tables full of numbers

Even though I mention it above, it bears repeating.  Normally, we can't read them in the audience.  Focus on one number.  For example, if you're showing that a mode choice model works better when using transfers as part of the transit utility, show us the log-likelihood and/or the correlation coefficient for ONLY the best case without transfers and the best case with transfers.  Keep it simple.  If I want the standard error of individual values, I'll look for them, and if I ask at the end of the presentation, direct me to the paper.

3. Just because you can read it on screen while authoring a presentation does not mean that your audience can read it on the projector

24 point font is a minimum.  Yes, I know PowerPoint's list box goes down to 8.  That does not mean you should ever go down there.  Some people have sight problems, and those problems can be exacerbated by trying to see around people's heads.

A second part of this has to do with being able to read the slides while you're presenting.  Just because you can read your slides on your 19"+ monitors at the office when you're 18" away does NOT mean that you'll be able to read them on a laptop with a 14" or 15" screen (or 17" widescreen, which is about as small due to the scaling) from a few feet away.

4. Use pictures and talk about them

If your presentation has no pictures, you're doing it wrong.  If you want to get your concept/idea/solution/innovation/etc (pick one) across, throw in a few pictures that illustrate a point (or something like that).  For example, in a presentation I'm working on now, I have a workplace location that is noted by Dilbert's office building and him waving.  I think it gets the idea of "workplace" across to people, and most people know Dilbert.

More importantly, half my presentation is maps that I will talk about.  No text.  I have 7 slides with bullets, 2 or 3 with numbered lists, and that's out of 30.  That's about right.

5. Reduce, but do not remove bullets

There is a big push in many circles to remove bullets from presentations.  In an academic presentation, that's damn near impossible.  Don't give in to the hate, but try to reduce bullets as much as practical.

6. Expect there to be dissenting opinions

I've seen a fair number of people get "blasted" by industry professionals.  Don't get mad about it.  They are normally not there to make you feel bad, and don't feel bad about it.  A session moderator can recognize when someone is asking a real question as opposed to someone that has an ax to grind, and a moderator WILL step in if someone asking questions is out of line.

7. Do not use the Microsoft PowerPoint (etc.) templates

Rare is it that a Built-in Template works for a presentation.  Normally an agency or company has some nicer and more appropriate templates to use.  Use them.

This guideline does not apply if your presentation is short (e.g. 5 minutes) or it is a presentation in a non-professional setting (e.g. a hobby).

I can read quite well and so can the rest of the audience.  If you're just going to read the slides, hand out your presentation (as good 'ol tree-killin' paper) and sit back down.  Don't load your presentation on the laptop, don't talk, and tell the session moderator to just skip you.

This is probably the biggest reason many people want to remove bullets.  No bullets means that you might have to (gasp!) TALK ABOUT your content!

9. Use Animations Sparingly

Do NOT use animations to simply put bullets on the screen.  However, there are times when animations are important for the point of illustrating an idea, showing a process, or just pure entertainment.

10. Do NOT use numbers for alternatives

I will forget about the numbers as soon as you change slides.  Give them names.  And for those that have used "Alternative 1" and "Alternative 1A", there is a special place in Hell for you.

11. Have delusions of grandeur similar to mine

Find a person you think is a damn good presenter. Learn from them.  Try to present as effectively as they do.

I stumbled on a problem that seems to have no easy answer.  Working on the count stations layer here at the office, I found that we had a small number of points that weren't located in the GIS feature class, although we DO have X and Y coordinates for them.

Since searching on Google turned up nothing, I wrote my own solution.  Since I already had some Java code to look for selected features and get to the actual features, I copied that code into a new project and made a few modifications.  Those modifications are posted on Github.  Even better, I actually used a few comments in this one!

In a prior post, I link to some code that outputs a path file.  I've done something a tad different because I needed some select link analysis and reading the path file in Cube was taking far too long to do it the normal way.

So, I took that program on Github and extended it to perform a selected link:

And this outputs a few GB of paths in CSV format.  I went from 42 GB of paths in the AM to 3.4 GB of CSV paths.  Still not good enough. The next thing I did was use GAWK to get just the Origin and Destination

This returns a CSV file of just the origin and destination (which can be linked to the vehicle trip matrix).
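The same column-picking step can be sketched in Python for anyone without GAWK handy. This is only an illustration: the column positions (origin first, destination second) and the sample rows are assumptions of mine, since the real path CSV layout isn't shown here. It reads one row at a time, so a multi-gigabyte file never has to fit in memory.

```python
import csv
import io


def origin_destination(rows, origin_col=0, dest_col=1):
    """Yield (origin, destination) pairs from an iterable of CSV rows."""
    for row in rows:
        # Skip rows too short to hold both columns.
        if len(row) > max(origin_col, dest_col):
            yield row[origin_col], row[dest_col]


# Tiny in-memory stand-in for the path CSV (hypothetical layout:
# origin, destination, then the nodes on the path).
sample = io.StringIO("1,25,1001,1002\n1,30,1001,1005\n2,25,1003\n")
pairs = list(origin_destination(csv.reader(sample)))
print(pairs)
```

For a real file, swap the StringIO for open(path, newline="") and write the pairs out with csv.writer instead of collecting them in a list.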

Part 2 will discuss how to link to a vehicle trip matrix and if this approach actually works!

I've started up a new blog that will hopefully be more maintained than this one: www.opencivichardware.org.  The idea of civic hardware came about from a presenter from Transportation Camp DC 2013.  Civic hardware are things created to help with a city (or state, or region).  It could be things like traffic counters, data loggers, tools to help with public involvement, or infrastructure.

The idea of this site is similar in nature to Hack-A-Day, but with a focus on civic hardware.  There will probably be a lot of things that can be cross-posted to both.  Additionally, look for things on this blog to be cross-posted there.

As a follow-up to my prior post, this is how to use the Cube Voyager API to read a path file.  I highly recommend you read the other article first, as it talks more about what is going on here.

### The Interface

The interface for the path reader is larger because of the return structure.  The code below includes the interfaces to the DLL calls and the structure for the path data returned by some of them.  Note that I didn't do PathReaderReadDirect.  It doesn't seem to work (or I'm not trying hard enough).

### The Code

Once the interface is in place, the code is reasonably simple.  However, I'm seeing a few interesting things in the costs and volumes in both C++ and in Java, so I wouldn't use those values.  I guess if you need to determine the costs, you should save the costs with the loaded highway network to a DBF file and read that into an array that can be used to store and get the values.

### The Final Word... For Now

Java is a great programming language.  Using these DLLs can help you do some interesting stuff.  However, it seems that there are very few people using the API, which is concerning.  I personally would like to see an interface for reading .NET files and writing matrices.  But I can't expect Citilabs to put time in on that when it seems there are so few people using it.

I've begun to really enjoy Java.  Its hot, black exterior exposes a sweet bitterness that matches few other things in this world.  Oh, wait, this is supposed to be about the other Java - the programming language!

The "Holy Grail" of API programming with Cube Voyager to me has been using the API in Java.  I can program in C++ quite well, but I have a staff that can't.  We're likely going to be going to a Java based modeling structure in the next few years, so  it makes sense to write everything in Java and keep the model down to two languages - Cube Voyager and Java.

### Setting up the Java Environment

There are three things to do to set up the Java environment to make this work.  The first is to place the Cube DLL in the right location.  The second is to get JNA and locate the libraries to where you need them.  The final is to set up the Java execution environment.

First, copy the VoyagerFileAccess.dll file (and probably its associated lib file) to C:\Windows.  It should work.  I'm using a Windows 7-64 bit machine, so if it doesn't work, try C:\Windows\System32 and C:\Windows\System.

Second, get JNA.  This allows the Java compiler to connect to the DLL.  The latest version can be downloaded from Github (go down to "Downloads" under the Readme.md... just scroll down 'till you see it, and get both platform.jar and jna.jar).

If you're on a 64-bit computer, the third thing to do is to set your jdk environment to use a 32-bit compiler.  I use Eclipse as my IDE, so this is done through the project properties.  One location is the Java Build Path - on the Libraries tab, make sure the JRE System Library is set to a 32-bit compiler.  In the Java Build Path screenshot below, you can see that all the locations are in C:\Program Files (x86) - this is an easy (although not foolproof) way to show that this is a 32-bit compiler.

While you're setting up stuff in this window, make sure the jna.jar and platform.jar are linked here as well (click "Add External JARs..." and locate those two files).

Another place to check in Eclipse is the Java Compiler settings, which should have "Use Compliance from execution environment..." checked.

### The Programming

The thing that makes this work is this part of the programming.  You can see in this that I create an interface to the Voyager DLL file by loading the DLL, and then set up some pointer objects to hold the memory pointer variable (the "state" variable in all of these) and set up the functions to read from the matrix.

The next part that makes this work is the actual programming. In the code below, the first thing I do is define vdll as an instance of the voyagerDLL interface.  Then, I open a matrix file (yes, it is hard-coded, but this is an example!), get the number of matrices, zones, the names, and I start reading the matrix (in the for loops).  I only print every 100th value, as printing each one slows this down a bit. The actual reading is quite fast.  Finally, I close the matrix and the program terminates.

### Issues

The big issue I noticed is that if the matrix is not found, the Pointer variable returned by MatReaderOpen will be null, but nothing will be in the error value.  I've tried redefining the error value to be a string in the interface, but it does the same thing.  However, I don't recall if it did anything in C++.  At any rate, there needs to be some error checking after the matrix is opened to ensure that it actually has opened, else the program will crash (and it doesn't do a normal crash).

### Next Up

The next thing I'm going to do is the path files.

Just posted on Github: Path2CSV

This is a tool that will read a Cube Voyager Path file and output the contents by node to a CSV file.  The code is written in C++ and available under the GPL3 license.

I don't know about anyone else, but I do a lot of calculation prototyping in Excel before applying that in scripts.  One of the most recent was to do a script to add expansion zones (also known as "dummy zones", although they aren't really dumb, just undeveloped!).

The problem I had was related to the following equation:

R=INT((819-N)/22)+1   Where N={820..906}

In Excel, the results are as below (click on it if it is too small to see):

In Cube, I got the result of (click on it to expand, and I only took it into Excel to move stuff around and make it easier to see):

Note the sheer number of zeroes in the Cube version and all the numbers are 'off'.

The reason, as I looked into things was because of how INT() works differently in the two platforms.  In Cube, INT simply removes everything to the right of the decimal, so INT(-0.05) = 0, and INT(-1.05)=-1.  In Excel, INT rounds down to the nearest integer.  This means that negative values will be different between the two platforms.  Note the table below.

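| Value | Excel | Cube |
|---|---|---|
| 3.4 | 3 | 3 |
| 2.3 | 2 | 2 |
| 1.1 | 1 | 1 |
| 0.5 | 0 | 0 |
| 0 | 0 | 0 |
| -0.5 | -1 | 0 |
| -1.1 | -2 | -1 |
| -2.3 | -3 | -2 |
| -3.4 | -4 | -3 |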

While neither software is truly wrong in its approach (there is no standard spec for INT()), it is important to know why things may not work as expected.
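The two behaviors are easy to reproduce side by side. Below is a small Python sketch, with math.floor standing in for Excel's INT and math.trunc standing in for Cube's INT as described above. The function names are mine, and this illustrates the two rounding rules, not either product's actual implementation.

```python
import math


def excel_int(x):
    """Excel-style INT: round down toward negative infinity."""
    return math.floor(x)


def cube_int(x):
    """Cube-style INT as described above: drop the fractional part."""
    return math.trunc(x)


def r_value(n, int_func):
    """Evaluate R = INT((819 - N) / 22) + 1 with the chosen INT."""
    return int_func((819 - n) / 22) + 1


# Compare the two versions at a few N values from the 820..906 range.
for n in (820, 841, 842, 906):
    print(n, r_value(n, excel_int), r_value(n, cube_int))
```

At N=820 the inner value is about -0.045, so the floor version gives R=0 while the truncating version gives R=1. The two only agree where (819-N)/22 happens to be a whole number, such as N=841.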

## What Have I Been Up To Lately?

First, I've done a few conversion tools for converting Tranplan/INET to Voyager PT and back again.  These are open-source tools that are meant to help, but they may not be perfect (and I don't have the time to make sure they are).  If anyone wants to upload fixes, you'll get credit for it (but you have to let me know, as I think I have to allow that in Github).

Next, I've been heavily working on QC of my transit on-board survey.  This has resulted in some more work being uploaded to Github.  I've written some to assist in trying to figure out what I need to actually look at and what is probably okay enough to ignore.

I've seen some stuff come out of the Census related to an API, and I did post some example code to the CTPP listserve to help.  Since I didn't want to bog down some people with my code, I put it in a Gist (which is below).

This code will get Census data using their API and chart it.  Note that you have to install PyGTK All-In-One to make it work.  Of course, mind the items that Krishnan Viswanathan posted to the Listserve - they help make sense of the data!

I'm also working on an ArcMap add-in that will help with QC-ing data that has multiple elements.  It is on Github, but currently unfinished.  This is something for advanced users.

I will have a few tips coming for some Cube things I've done recently, but those will be for another blog post.  With that, I will leave with the first publicly-available video I've ever posted to YouTube.  Of a traffic signal malfunction.  I'm sure Hollywood will start calling me to direct the next big movie any day now...

Sometimes I do things that don't really have a point... yet. One of them was pulling some information from GetSatisfaction (GSFN) to a Google Docs Spreadsheet (GDS). GSFN has an API that returns everything in JSON, so writing a script in a GDS to pull in that information is quite easy.

The first step is to create a spreadsheet in Google Docs.  This will act as a container for the data.

The second step is to create a script to parse the JSON output and put it in the spreadsheet.  The example below is a script I used to get only the topic, date, and type of topic (question, idea, problem, or praise).  It's simple, and it can be expanded on.  But for the sake of example, here it is:

function fillGSFN() {
//Assumes the script is attached to the spreadsheet created in the first step
var ss=SpreadsheetApp.getActiveSpreadsheet();
var sheet=ss.getSheets()[0];
var r=1; //next row to write
for(var page=89;page<200;page++){
var jsondata = UrlFetchApp.fetch("http://api.getsatisfaction.com/companies/{COMPANY}/topics.json?page="+page);
//JSON.parse replaces the deprecated Utilities.jsonParse
var object = JSON.parse(jsondata.getContentText());

for(var i in object.data){
sheet.getRange(r,1).setValue(object.data[i].subject);
sheet.getRange(r,2).setValue(object.data[i].created_at);
sheet.getRange(r,3).setValue(object.data[i].style);
r++;
}
if(i!="14") return 1; //This was not a full page
}
}


This script is still a work in progress, and there are better ways to consume a JSON feed, but for what I was doing, this was a nice quick-and-simple way to do it.
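
For comparison, the same paging logic can be sketched in Python. This is only a sketch under stated assumptions: `fetch_page` is a hypothetical callable standing in for the HTTP request, the page size of 15 is inferred from the `i!="14"` check in the script above, and the field names are the three that script reads.

```python
import json

PAGE_SIZE = 15  # assumption: inferred from the `i != "14"` check in the script above


def rows_from_page(json_text):
    """Return (subject, created_at, style) tuples from one page of topics JSON."""
    payload = json.loads(json_text)
    return [(t.get("subject"), t.get("created_at"), t.get("style"))
            for t in payload.get("data", [])]


def collect_topics(fetch_page, first_page=1, max_pages=200):
    """Collect rows page by page until a page comes back short.

    fetch_page is a hypothetical callable: page number -> JSON text.
    """
    rows = []
    for page in range(first_page, first_page + max_pages):
        page_rows = rows_from_page(fetch_page(page))
        rows.extend(page_rows)
        if len(page_rows) < PAGE_SIZE:  # not a full page, so this was the last one
            break
    return rows
```

Counting the rows on each page, rather than looking at the last loop index, also copes with a page that comes back empty.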

One thing I've missed from the old TranPlan days was the reporting group.  We've used that for many years to compare our transit loadings by major corridor.  Unfortunately, that functionality was lost going to PT.  I still need it, though. Enter awk.

The script below looks at the transit line file and outputs ONLY the line code, comma-separated.  It uses a loop to check each field for ' NAME=' and 'USERN2', which is where we now store our reporting group codes.

BEGIN{
FS=","
RS="LINE"
}
{
for (i=1;i<20;i++)
{
if($i~/ NAME=/) { printf "%s,",substr($i,8,length($i)-8) }
if($i~/USERN2/)
{
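
The same extraction can be sketched in Python. This is a minimal sketch, not the script above: it assumes each record in the line file starts with the keyword LINE and holds comma-separated KEY=VALUE pairs with the name in double quotes (which is what the `substr` offsets above suggest), and the function name is made up for illustration.

```python
import re

# Assumption: records look like
#   LINE NAME="10A", MODE=1, USERN2=3, N=1,2,3
NAME_RE = re.compile(r'\bNAME\s*=\s*"([^"]*)"')
GROUP_RE = re.compile(r'\bUSERN2\s*=\s*(\d+)')


def reporting_groups(text):
    """Yield (line name, USERN2 value) for every LINE record in text."""
    # Split on the LINE keyword at the start of a line; drop whatever precedes it.
    for record in re.split(r'^\s*LINE\b', text, flags=re.M)[1:]:
        name = NAME_RE.search(record)
        group = GROUP_RE.search(record)
        if name:
            yield (name.group(1), group.group(1) if group else "")
```

Lines without a USERN2 entry come back with an empty group code, which makes them easy to spot.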


Note that TCPTR00A.PRN is the transit assignment step print file, and UnassignedTransitTrips.PRN is the destination file. The {print $2,$5,$7} tells gawk to print the second, fifth, and seventh columns. Gawk figures out the columns itself based on spaces in the lines. The >UnassignedTransitTrips.PRN directs the output to that file, instead of listing it on the screen.

The UnassignedTransitTrips.PRN file should include something like:

1 I=3 J=285,
1 I=3 J=289,
1 I=3 J=292,
1 I=6 J=227,
1 I=7 J=1275,

The first column is the number of unassigned trips, the second column is the I zone, and the last column is the J zone.

This file can then be brought into two Matrix steps to move it to a matrix. The first step should include the following code:

RUN PGM=MATRIX PRNFILE="S:\USER\ROHNE\PROJECTS\TRANSIT OB SURVEY\TRAVELMODEL\MODEL\TCMAT00A.PRN"
FILEO RECO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF",
FIELDS=IZ,JZ,V
FILEI RECI = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\UnassignedTransitTrips.PRN"
RO.V=RECI.NFIELD[1]
RO.IZ=SUBSTR(RECI.CFIELD[2],3,STRLEN(RECI.CFIELD[2])-2)
RO.JZ=SUBSTR(RECI.CFIELD[3],3,STRLEN(RECI.CFIELD[3])-2)
WRITE RECO=1
ENDRUN

This first step parses the I=, J=, and comma out of the file and inserts the I, J, and number of trips into a DBF file. This is naturally sorted by I then J because of the way PT works and because I am only using one user class in this case.

The second Matrix step is below:

RUN PGM=MATRIX
FILEO MATO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.MAT" MO=1
FILEI MATI[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF" PATTERN=IJM:V FIELDS=IZ,JZ,0,V
PAR ZONES=2425
MW[1]=MI.1.1
ENDRUN

This step simply reads the DBF file and puts it into a matrix. At this point, you can easily draw desire lines to show the unassigned survey trips. Hopefully it looks better than mine!

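
For a quick check of the extracted file outside of Cube, the parsing that the first Matrix step does can be sketched in Python. This is a sketch only: it assumes every useful line has exactly the three fields shown in the sample above (trips, I=zone, J=zone with a trailing comma), and the function name is made up for illustration.

```python
def parse_unassigned(lines):
    """Turn lines like '1 I=3 J=285,' into (i_zone, j_zone, trips) tuples."""
    records = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        trips, i_field, j_field = fields
        # Drop the leading "I=" / "J=" before converting the zone numbers.
        records.append((int(i_field[2:]), int(j_field[2:]), float(trips)))
    return records
```

Summing the third element of each tuple gives the total number of unassigned trips, which is a handy number to compare against the survey total.
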
## Getting the 2nd Line through the Last Line of a File

June 24th, 2011

One recent work task involved compiling 244 CSV traffic count files and analyzing the data. I didn't want to write any sort of program to import the data into Access or FoxPro, and I didn't want to mess with it (since it would be big) in Excel or Notepad++.

So, I took the first of the 244 files and named it CountData.csv. The remaining files all begin with 'fifteen_min' and they are isolated in their own folder with no subfolders.

Enter Windows PowerShell really powered up with GNUWin. One command:

awk 'NR==2,NR<2' .\f*.csv >> CountData.csv

awk is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports (source: Wikipedia).

The first argument, NR==2, means start on record #2, or the second line in the file. The second argument, NR<2, means end on the record less than 2. In this case, it always returns false, and thus the remainder of the input is output. Note that NR counts records across all of the input files, so this range skips only the first line of the first file matched. awk's FNR variable restarts at 1 for each file, so something like awk 'FNR>1' is what drops the first line of every file. The .\f*.csv means any file in this folder where the first letter is f and the last 4 letters are .csv (and anything goes between them). The '>> CountData.csv' means to append to CountData.csv.

Once I started this process, it ran for a good 45 minutes and created a really big file (about 420 MB). After all this, I saw a bunch of "NUL" characters in Notepad++, roughly one every-other-letter, and it looked like the data was there (just separated by "NUL" characters). I had to find and replace "\x00" with blank (searching as Regular Expression). That took a while.

Acknowledgements:

The Linux Commando. His post ultimately helped me put two and two together to do what I needed to do.

Security 102. The NUL thing.

## Emailing an alert that a model run is complete in Cube Voyager

March 6th, 2011

When you are doing many model runs, it makes life easier to know if the model run is complete.
The code is below.

SENDMAIL,
SMTPSERVER='MAILSERVER HERE',
FROM='from@somewhere.com',
TO='to@somewhere.com',
SUBJECT='Subject Line Here',
MESSAGE='Message Goes Here',
USERNAME='username',
PASSWORD='password'

The things you replace here are pretty obvious. If you have questions about the SMTPSERVER parameter, ask your IT person. Also, for Windows domains, the USERNAME parameter should be 'DOMAIN\USERNAME' (you may be able to use your email address, depending on your email setup).

## Adding a Search Engine in Chrome to Track UPS Shipments

December 22nd, 2010

One of the cool features of the Google Chrome Browser is the ability to add search engines and search them from the address bar. This tip builds on that capability to track UPS shipments based on their UPS Tracking Number.

The first step is to go to the options menu by clicking on the wrench icon and going to Options.

The second step is to go to the Basics tab (or on Mac, click on the Basics icon).

The third step is to add the search engine. On Windows, click Add, and then fill out the resulting form. On OS X, click the '+' button and do the same.

The following are the items for the form:

Name: UPS
Keyword: UPS
URL: http://wwwapps.ups.com/WebTracking/processInputRequest?sort_by=status&tracknums_displayed=1&TypeOfInquiryNumber=T&loc=en_US&InquiryNumber1=%s&track.x=0&track.y=0

NOTE: The entire URL above should be one line with no spaces!

Click OK on everything (or in some cases, the red circle on OS X).

To use this, open Chrome, type 'ups' in the address bar and press Tab and enter the tracking number (copy-paste works well for this). Once you press Enter, you will immediately go to the UPS website showing your tracking information. In this case, my shipment won't make it by Christmas. Oh well.

## Python and Cube

December 19th, 2010

One of the advantages of the ESRI ArcGIS Framework is that you can write Python scripts that do GIS things and run them from Cube.
Even better, Cube uses the Geodatabase format, so you can store and retrieve things from there.

The first thing that is needed is a Python script. The below is an example that we're not using at the moment, but it merges multiple transit line files together.

import arcgisscripting, sys, os
gp=arcgisscripting.create()
gp.AddToolbox("C:/Program Files/ArcGIS/ArcToolBox/Toolboxes/Data Management Tools.tbx")
print sys.argv
input1=sys.argv[1]
input2=sys.argv[2]
output=sys.argv[3]
in1=input1+input1[input1.rfind("\\"):]+"_PTLine"
in2=input2+input2[input2.rfind("\\"):]+"_PTLine"
input=in1+';'+in2
input=input.replace("\\","/")
output=output.replace("\\","/")
print input
print output
if gp.Exists(output): gp.delete_management(output)
#input=input1+"_PTLine" +';'+input2 +"_PTLine"
gp.Merge_management(input,output)
print gp.GetMessages()
del gp

To call this, we add the following in a Pilot script:

*merge.py {CATALOG_DIR}\Inputs\Transit.mdb\Routes1 {CATALOG_DIR}\Inputs\Transit.mdb\Routes2 {CATALOG_DIR}\Inputs\Transit.mdb\RoutesCombined

This makes it easy to create geoprocessing steps in ArcMap, export them to Python, and call them from the model.

## Top 6 Resources for a Travel Modeler to Work From Home

December 16th, 2010

It's the most wonderful time of the year, isn't it? Nothing says "winter" like 6 inches of snow that keeps you from going to the office!

Over the years, I've amassed a set of utilities, many of them free, to make my life easier. Some of them take the place of things that I would normally use in the office; others sync to the "cloud" and I use them both in the office and at home.

1. Dropbox

I don't care too much for USB "thumb" drives, and I've had my fair share of leaving them at home or at work and needing them at the opposite location. Dropbox clears up this mess, as there are no USB drives to lose or leave in the wrong place. NOTE: the link that I have IS a referral link.
Clicking on that and creating an account results in both of us getting an extra 250 MB of space with the free account (starts at 2 GB, max for free is 8 GB).

2. Evernote

I take a lot of notes, both on the road at conferences and at the office. Evernote is what I use to keep them organized.

Unless you want to spring for Microsoft Office at home, Google Docs is the way to go. There are several others including Zoho and Office Online, but I haven't used them. Google Docs has great collaboration features, document versioning, and it's free. Just make sure to back it up! The only problem: no DBF file support.

This is perhaps the greatest text editor. It understands and does some context highlighting (etc) for many programming languages. Even better, Colby from Citilabs uploaded his language definition file for Cube Voyager to the user group!

The Express Edition tools have become our go-to tools for new development, particularly MS Visual C++ EE and MS Visual Basic EE. Since they're free, you can have copies both at home and work.

6. Eclipse

This one's almost optional, but for those working with Java models, this is the standard IDE, and it is open source.

Any tools to add? Add them in the comments below.

## Using a Class Object to Help Read a Control File

December 5th, 2010

One thing we're used to in travel modeling is control files. It seems to harken back to the days of TranPlan where everything had a control file to control the steps. In my case, I have a control file for my nested logit mode choice program, and because of the age of the mode choice program, I want to redesign it. The first part of this is reading the control file, and I did a little trick to help with reading each control file line.

With C++, there is no way to read variables in from a file (like there is with FORTRAN).
The first part of the code reads the control file, and you will see that once I open and read the control file, I section it out. The control file has sections for files ($FILES), operation parameters ($PARAMS), operation options ($OPTIONS), and mode choice parameters ($PARMS). Each section ends with an end tag ($END). This adds the flexibility of being able to re-use variables in different locations.

After the section, the next portion of the code reads the line and checks to see if FPERIN is found. If it is, a ControlFileEntry object is created. This object is a class that is used to return the filename held in the object. This makes it easy to reduce code.

int readControlFile(char *controlFileName){
cout << "Reading " << controlFileName << endl;
string line;
bool inFiles=false, inParams=false, inOptions=false, inParms=false;
ifstream controlFile(controlFileName);
if(!controlFile.good()){
cout << "PROBLEMS READING CONTROL FILE" << endl;
return 1;
}
//testing getline() itself stops the loop cleanly at the end of the file
while(getline(controlFile,line)){
//check the vars sections
if(line.find("$FILES")!=string::npos) inFiles=true;
if(line.find("$PARAMS")!=string::npos) inParams=true;
if(line.find("$OPTIONS")!=string::npos) inOptions=true;
if(line.find("$PARMS")!=string::npos) inParms=true;
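
The section-tracking idea can also be sketched compactly in Python. This is not the program's actual code: it assumes each section tag sits alone on its own line and that entries inside a section look like KEY=value, neither of which is spelled out here, and the function name is made up for illustration.

```python
SECTIONS = ("$FILES", "$PARAMS", "$OPTIONS", "$PARMS")


def read_control_file(lines):
    """Group KEY=value entries by the section they appear in."""
    entries = {name: {} for name in SECTIONS}
    current = None  # section we are inside of, if any
    for raw in lines:
        line = raw.strip()
        if line in SECTIONS:
            current = line
        elif line == "$END":
            current = None
        elif current and "=" in line:
            key, value = line.split("=", 1)
            # Assumption: values may be wrapped in quotes, which are stripped here.
            entries[current][key.strip()] = value.strip().strip("'\"")
    return entries
```

Because entries are stored per section, the same variable name can appear in two sections without one overwriting the other, which is the flexibility described above.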

I can't type all this without bringing up another big issue that CAN negate the above: General Planning Consultants and General Engineering Consultants.  The GPC and GEC contracts are always put up for RFQ, and handing a scope to a GEC or GPC consultant is NOT the same as sole-source.  This method is perfectly legal (it is open to public review and open to all consultants to submit statements of qualifications) and is a great way to get smaller (less than $100k, perhaps) jobs to consultants without them spending a lot of money trying to get smaller jobs. They have to spend their marketing money up-front, but over the 3-5 year span, they have plenty of opportunity to make it back on smaller jobs that have very small marketing requirements.

RFPs Only to Certain Consultants?

Again, 2c, conflict of interest - public agencies cannot perform the work of the public good using the fewest tax dollars without having an open bid process. Also, it is pretty likely that every state requires RFPs and RFQs to be advertised.

That being said, what's the point? You're going to send the RFP to 2 or 3 consultants but post it on your website (and for us, the newspaper, state DOT website, and various other locations as required by law and our policy) for all to see? Sounds like a pretty ineffective way to only target a few consultants. If you only want certain consultants to respond, find a way to do it, legally, without denying other consultants the opportunity to compete for it.

## Separating Intent and Unintended Effects

March 21st, 2010

On March 7, 2010 at Atlanta Motor Speedway (AMS), an interesting crash happened in the larger context of NASCAR. Carl Edwards intentionally got into the side of Brad Keselowski, causing Keselowski to spin around, become airborne, and land on his side with momentum sending Keselowski's car into the wall (video).
This was almost the inverse of the Talladega spring race in 2009, where Edwards unintentionally came down on Keselowski, spun around, became airborne, got hit by another car in the process, and hit the safety fence that separates the track from the stands (video).

The big difference between these two scenarios was intention. Earlier in the race at AMS, Keselowski got into the side of Edwards, causing Edwards a long repair and a poor finish.

NASCAR handed down a three-race probation to Edwards after parking him for the remainder of the race at AMS. The debate as to whether that was the most appropriate disciplinary action has been swirling around NASCAR for weeks (and still is at the time of writing). This post is not about whether NASCAR made the right or wrong decision, but rather how it relates to management.

You have to understand the history behind the wing. If you've watched the videos above, you've seen two of three. The other piece of history is at this video. The scenario at AMS is the third time that a car has become airborne after being turned around.

The probation that Edwards faces (and no suspension or fine, mind you) was because Edwards's intent was to mess up Keselowski's 6th place finish with a spin to the infield. Edwards didn't intend for the vehicle to flip, and the vehicle should not have flipped. In fact, the severe crash was likely caused more by the wing on the back of the car (which has now been replaced with a spoiler), not by Edwards's intentionally spinning Keselowski.

This is quite a conundrum for NASCAR. They control the design of the car very strictly. They also said that the drivers could use a little less restraint after feeling a lot of criticism over the 2009 season, where they made rules that limited the drivers' actions. Drivers and teams are not allowed to make decisions as to whether they use the wing or not. They have to use it.
The important thing here is, as a manager, make the decision looking at all pieces of information and all parts of history. Look at what you've told your employees. Look at what has happened in the past that your employees should have been aware of. Look at what you would have done in that situation, particularly if you weren't a manager. Discuss the issue with the employees involved. Do not make rash decisions and do not let emotions be the only thing that guides your decisions.

## Romanian street sign warns drivers of 'drunk pedestrians' - Telegraph

March 15th, 2010

In what is perhaps an accidental approach to reducing pedestrian crashes using the first step of "the three Es" (education, enforcement, engineering), Pecica, Romania has installed signs that warn of drunk pedestrians ahead.

While a little odd, I applaud the mayor for experimenting with a low-cost, low-impact way to handle the problem. I hope it works.

## Former DOT Secretary weighs in on Transportation Bill

October 15th, 2009

I agree with the thoughts of increased tolling and more fees other than the gas tax. I also agree with \$1B per year for technology, but it has to be managed right.

I'm also glad that the performance measures are measurable:

• Congestion (we can measure that - it is the percent of a region's network that is operating with a demand greater than its capacity)
• Costs (we can measure that, although we have to watch how we do it, as we don't want to have a system be considered bad if gas prices hit \$4/gallon)
• Safety (we DO measure this - it is the number of injuries and deaths on the road)
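
As a small illustration of why the congestion measure is measurable, the percentage can be computed directly from link-level model output. This is a sketch under assumptions: the network is given as (length, volume, capacity) tuples, and links are weighted by length, which is one reasonable reading of "percent of a region's network" but not the only one.

```python
def percent_congested(links):
    """Percent of network length where demand (volume) exceeds capacity.

    links: iterable of (length, volume, capacity) tuples.
    """
    links = list(links)
    total = sum(length for length, _, _ in links)
    if total == 0:
        return 0.0
    over = sum(length for length, volume, capacity in links if volume > capacity)
    return 100.0 * over / total
```

For example, a 1-mile link carrying 1,200 vehicles against a capacity of 1,000, plus a 3-mile link carrying 500 against 1,000, gives 25 percent of the network over capacity.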

## What are those little green boxes???

Traffic Counter on side of road

First off: how these things work

Those that have been around for 30 or more years may remember when some gas stations had a hose that rang a bell to call a station attendant to pump your fuel. Those that don't should watch Back to the Future. This is the same basic concept for most traffic counters. There are hoses that go across the road, and based on what the sensors feel and the time between them, these little green (or sometimes gray) boxes calculate the number of axles, distance between them (which can be used to derive the type of vehicle), and the speed.
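
The arithmetic behind this is simple enough to sketch in Python. This is an illustration, not any manufacturer's algorithm: it assumes two hoses laid a known distance apart, a list of hit times (in seconds) for each hose, and a single vehicle whose axles are matched one-to-one between the hoses.

```python
def speed_and_axle_spacings(hose_a_times, hose_b_times, hose_gap_ft):
    """Estimate speed (ft/s) and axle spacings (ft) for one vehicle.

    hose_a_times / hose_b_times: times in seconds at which each axle hit the
    first and second hose. hose_gap_ft: distance between the two hoses.
    """
    if len(hose_a_times) != len(hose_b_times) or not hose_a_times:
        raise ValueError("need the same, non-zero number of hits on both hoses")
    # Speed: the first axle covers the gap between the hoses in (t_b - t_a) seconds.
    speed = hose_gap_ft / (hose_b_times[0] - hose_a_times[0])
    # Spacing: time between successive axle hits on one hose, times the speed.
    spacings = [speed * (later - earlier)
                for earlier, later in zip(hose_a_times, hose_a_times[1:])]
    return speed, spacings


FT_PER_S_TO_MPH = 3600.0 / 5280.0  # multiply ft/s by this to get MPH
```

With hoses 5 ft apart and the first axle hitting them 0.125 seconds apart, the speed works out to 40 ft/s (about 27 MPH). A second axle arriving 0.25 seconds after the first then implies axles 10 ft apart, which is how the box can guess at the type of vehicle.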

I know that speed is a big issue with a lot of people. After all, some of these counters are being used for speed studies to see if they want to put a cop on a road at a certain time. This does happen, despite my wishes that cops and others would use less-detectable methods for enforcement. There are two other ways that counts, with speed, can be taken. One is by RADAR (the same thing they use for active speed enforcement). Mind you, for speed sampling, RADAR is pretty useful when installed correctly, and the boxes can be hidden quite well. The other is using magnetic loops. There are portable models of these that sit in the lane and are difficult to see (and also susceptible to theft). There are also permanent models that can be completely hidden from view.

One thing I can say with ALL hose counters: WE CANNOT USE THEM FOR SPEED ENFORCEMENT! The units do not have any cameras (etc), so if you speed while going over them, we know you did, but we don't know who you are!

Second off: How We Use The Data We Get From These Things

This one differs by jurisdiction, but most use it for traffic studies. Speed, count, and vehicle type are very useful for roadway improvement design. Another use is for travel model validation. We (specifically me, since it is my job) use this to ensure that the region's travel model is accurate so when we use it to plan billions of dollars in improvements, we know we're not just guessing, which would be a waste of money.

Law enforcement will use the number of speeders per unit of time to plan when to run patrols. As I indicated, I wish they wouldn't use hose counters for this, but they do, and the data they get is accurate. However, hoses are pretty conspicuous, which is why I wish they wouldn't use them.

We cannot use the data in court. You cannot be detected to be going 45 MPH in a 25 MPH zone based on a traffic counter. The counters do not have cameras in them, and none that I know of can connect to a camera. A camera would be required to prove who was speeding. Without the connection, it would be difficult to prove, since the times would have to be the same, the counter has to be operating perfectly, and the hoses have to be measured very precisely. Some states also forbid the use of cameras for passive law enforcement (a cop can actively use a RADAR+camera, but not mount one on a pole and get every car that is speeding).

The War Stories

I have two, both given to me by a salesperson for Jamar Tech, one of the leading traffic counter manufacturers.

City of Boston Thinks a Counter is a Bomb. This one is proof that some cops don't use hose counters, else they would have known what this unit is.

Counter burned, likely by an accelerant. PDF from Jamar, which the salesperson sent me just after I bought 8 counters from him.

Don't Mess With Them!

It amazes me that 1 month into the season, I've had to replace several hoses because they were cut or stolen. These are your tax dollars at work. The more hoses we have to replace, the less money we have to improve the roads.

It occurred to me that many people likely do not understand all of the terminology of travel demand models.  Because of this, I felt the need to list many of them here.

I've occasionally seen some road nicknames that are particularly good.  A few that I've heard:

• Malfunction Junction (I-275 and I-4, Tampa, FL)
• The Riddle in the Middle (Alaskan Way, Seattle, WA)
• Spaghetti Junction (I-85 and I-285, Atlanta, GA)

I've also started calling a stretch of Columbia Parkway (Cincinnati, OH) "The Suicide Side".  It is a 45 MPH arterial where everyone drives 60 MPH.  The divider is a double-yellow line... only.

Trip generation is likely one of the easiest parts of the four step process.  Normally, the most difficult part of dealing with trip generation is getting the input socioeconomic (population and employment) data correct.  This post explains how trip generation is calculated in the model.

The center of most travel demand models is the "Four Step Model".  This model was created in the 1950s to determine the demand on roadways.  The four steps include:

1. Trip Generation
2. Trip Distribution
3. Mode Choice
4. Trip Assignment