Iterating Through DBFs - R Style!

March 6th, 2014

Anyone familiar with transportation modeling is familiar with processes that iterate through data.  Gravity models iterate, feedback loops iterate, assignment processes iterate (well, normally), model estimation processes iterate, gravity model calibration steps, shadow cost loops iterate... the list goes on.

Sometimes it's good to see what is going on during those iterations, especially with calibration.  For example, in calibrating friction factors in a gravity model, I've frequently run around 10 iterations.  However, as an experiment I set the iterations on a step to 100 and looked at the result:

This is the mean absolute error in percentage of observed trips to modeled trips in distribution.

This is the mean absolute error in percentage of observed trips to modeled trips in distribution.  Note the oscillation that starts around iteration 25 - this was not useful nor detected.  Note also that the best point was very early in the iteration process - at iteration 8.

After telling these files to save after each iteration (an easy process), I faced the issue of trying to quickly read 99 files and get some summary statistics.  Writing that in R was not only the path of least resistance, it was so fast to run that it was probably the fastest solution.  The code I used is below, with comments:

Mapping with R Markdown

November 20th, 2013

A natural extension of my last blog post about using R Markdown is to take it a step further and make maps with it.  RMarkdown is a great tool to not only create the map, but to automate creating the map AND putting the map in a document.

This is a work-in-progress.  This is a day late (I've been trying to keep to a Tuesday posting schedule to this blog, and well, it's Wednesday afternoon when I'm finally typing this). I'm still not happily done, but I'm also ANGRY at the the output of the distribution model (now that I've changed a lot of stuff in the model), so I'm having to take a step back and redo friction factors, so rather than post a finished product, I decided to post the work in progress, which is probably as helpful as the finished product.

If you haven't read last week's post on using RMarkdown, do so now.  I'll wait.  Back?  Good.

The code:

The first part, above "Manually create the lines" is basic library-loading and data input (and some manipulation/subsetting)

The second part is creating the desire lines.  These are created by first creating a list with two coordinate pairs in it (l1 - l7).  Those objects are then created into individual lines (Sl1-Sl7).  The individual lines are packaged into one Lines obeject (basically, a list of lines) (DL).  Finally, that object is prepared for the map (deslines<-list...).

The third part is the text.  There are two functions here, one is to get the mid-point of the line, the other is to get the angle of the line.  There was a lot of trial and error that went into this.  In the lines after the functions, the txt1-txt7 and mpt1-mpt7 objects, the text is formatted and map objects are created for them.

The fourth part is the counties for the map.  The col=... and cpl<-... lines handle the colors for the counties.

The last part is drawing the map.  The spplot() function handles all of that.  The primary map is made out of the counties, and the lines and text is added in the sp.layout=... portion of the command.

That's it!  It really isn't difficult as long as you remember trigonometry (and of course, you don't even have to do that since it is in my code above.  I also included some references at the bottom of the most useful resources when I was doing this, as there are many, many, MANY more ways to do this, options to play with, etc.

Example Report.  NOTE: THE DATA IS INCORRECT.

Travel Model Reports with R and knitr

November 12th, 2013

I've had my share of complaining about the various "reporting" platforms available to those of us that do travel modeling.  I've looked at a few options, and nothing has stuck as "the one". Until now.

In the meantime, I've noticed that a lot of groups have adopted Markdown.  It's found it's way onto Github via Jekyll.  Jeckyll's found it's way into my life as a quick blogging and site-building solution.  Then, I stumbled upon RStudio RMarkdown.  This is becoming a goldmine because RStudio is a great platform for developing things in R (including presentations and R Markdown).  Even better, the RMarkdown documents can be run via R (in a batch wrapper).  The only missing link is the ability to read matrix files directly.  I guess we can't have everything, but I have a solution for that, too.

What Is This Markdown Thing And Why Should I Care?

Markdown is a pretty easy thing to grasp.  It's open and fairly flexible.  It's a text markup format that is easy to read when not rendered.  Easy to read means easy to write.  The open-ness means that you can do things with it.  In the case of RMarkdown, you can add R code blocks and LaTeX equations.  I will admit that LaTeX equations are not legible until rendered, but when you start adding R in the equation, the focus shifts less on reading the unrendered RMarkdown and more on reading the rendered output.

The link to Github (above) goes to their Markdown cheat sheet.  That alternates between Markdown and HTML output and it's pretty easy to see how things work.

Getting Model Run Results into RMarkdown and into Rendered Output

There's a number of things that need to happen to get model run results into R/RMarkdown and then to Output:

  1. Output data to a format R understands
  2. Write RMarkdown document
  3. Write RScript to render RMarkdown to HTML
  4. Write Windows Batch File to automate the RScript

Output data to a format R understands

In the case of zonal data, R can understand CSV out of the box, and with the appropriate library, can understand DBF.  With matrix files, Voyager will export them to DBF with a simple script file:

This script simply reads a bunch of matrix files and outputs them to two DBF files, one for the peak-period distribution and one for the off-peak-period distribution.

One important thing to note in this is that I didn't put paths in this.  I run this from the command line in the model folder and it picks up the files in that folder and outputs the DBFs into that folder.  This is something that would have to be testing when placed into a model run file.

Resist the urge to do this in two separate steps.  The join process in R takes forever, and reading the data into memory may take a while, too.

Write RMarkdown document

The RMarkdown document is where the magic happens.  Fortunately, Knitr (the R Package that does all this magic) does a few things to split code blocks.  If you want to build my file, add all these together into one file and name it something.rmd

There are three code blocks that do this.  They are importing, summarizing, and graphing.

Importing Data

This block does three things:

  1. Loads libraries.  The foreign library is used to read DBF files.  The plyr library is used to join and summarize the data frames (from the DBF inputs).  The ggplot2 library is used for plots.
  2. Sets a few variables.  Since the OKI model is actually two MPO models, we do three reports of everything - one for the entire model, one for the OKI (Cincinnati) region, and one for the MVRPC (Dayton) region.  zones_oki and zones_mv are used to control which report is which.
  3. Imports the DBF files.  Those read.dbf lines are where that magic happens.  Again, since this is run in the model folder, no paths are used.

Summarizing Data

This block does three things:

  1. It rounds the logsum values to provide some grouping to the values
  2. It gets a subset of the model (for OKI)
  3. It summarizes the rounded values to prepare for charting them

Charting Data

This block does one thing: it draws the chart using the ggplot tool.  This is pretty advanced (although not bad, and the online help is good).  However, for this I'm going to hold to my recommendation to get a copy of The R Graphics Cookbook (where I learned how to use ggplot).  The concepts and examples in the book are far greater than what I will post here.

One point that should not be lost is that text elements (either Markdown headings, etc., or just text, or formatted text) can be added into this outside of the ```...``` blocks. This way, reports can actually look good!

Once this part is complete, the hardest stuff is over.

Write RScript to render RMarkdown to HTML

The RScript to render the RMarkdown file to HTML is pretty simple:

This writes the .Rmd file out to the same filename as .html. You can have as many knit2html lines as needed

There are ways to write the files out to PDF (I haven't looked into them... perhaps that would be a future blog topic?).

Write Windows Batch File to automate the RScript

The last step is to write a batch file "wrapper" that can be run as part of the travel demand model run.  This is really quite easy:

The first line sets the path to include R (on my system it isn't in the path, and my path statement is already 9 miles long). The second line runs the R script file (ReportR.R) in R.

 

That's It! It seems like a lot of work goes into this, but it isn't as difficult as some of the other reporting platforms out there.

PS: Stay tuned for some example reports

Update:

Example Report (generated from RMarkdown file)

Example RMarkdown File