Travel Model Reports with R and knitr
I've had my share of complaining about the various "reporting" platforms available to those of us who do travel modeling. I've looked at a few options, and nothing has stuck as "the one". Until now.
In the meantime, I've noticed that a lot of groups have adopted Markdown. It's found its way onto GitHub via Jekyll. Jekyll's found its way into my life as a quick blogging and site-building solution. Then I stumbled upon RStudio's RMarkdown. This is becoming a goldmine because RStudio is a great platform for developing things in R (including presentations and RMarkdown). Even better, the RMarkdown documents can be run via R (in a batch wrapper). The only missing link is the ability to read matrix files directly. I guess we can't have everything, but I have a solution for that, too.
What Is This Markdown Thing And Why Should I Care?
Markdown is a pretty easy thing to grasp. It's open and fairly flexible. It's a text markup format that is easy to read when not rendered. Easy to read means easy to write. The openness means that you can do things with it. In the case of RMarkdown, you can add R code blocks and LaTeX equations. I will admit that LaTeX equations are not legible until rendered, but once you start adding R into the mix, the focus shifts away from reading the unrendered RMarkdown and toward reading the rendered output.
The link to GitHub (above) goes to their Markdown cheat sheet. It alternates between Markdown and the HTML output, so it's pretty easy to see how things work.
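To make the mix of Markdown, LaTeX, and R concrete, here is a minimal sketch that builds a tiny RMarkdown document from R and writes it to a temporary file. The heading, text, equation, chunk contents, and file name are all made up for illustration; only the general layout (plain text, `$...$` equations, and fenced `{r}` chunks) reflects how RMarkdown works.

```r
# Sketch: build a tiny RMarkdown document from R and write it to a temp file.
# Heading, text, equation, and chunk contents are invented for illustration.
fence <- strrep("`", 3)  # the three-backtick fence that delimits an R chunk

rmd_lines <- c(
  "# Trip Length Summary",
  "",
  "Plain Markdown text, with *emphasis* where needed.",
  "",
  "A LaTeX equation: $t = d / s$",
  "",
  paste0(fence, "{r}"),
  "summary(cars)  # any R code; cars is a dataset that ships with R",
  fence
)

rmd_file <- file.path(tempdir(), "example.Rmd")  # hypothetical file name
writeLines(rmd_lines, rmd_file)

# Show the unrendered document: still readable as plain text.
cat(rmd_lines, sep = "\n")
```

Everything outside the fenced chunk is ordinary Markdown, so the unrendered file stays easy to read even with code in it.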
Getting Model Run Results into RMarkdown and into Rendered Output
There are a number of things that need to happen to get model run results into R/RMarkdown and then to output:
- Output data to a format R understands
- Write RMarkdown document
- Write RScript to render RMarkdown to HTML
- Write Windows Batch File to automate the RScript
Output data to a format R understands
In the case of zonal data, R can understand CSV out of the box, and with the appropriate library, can understand DBF. With matrix files, Voyager will export them to DBF with a simple script file:
This script simply reads a bunch of matrix files and outputs them to two DBF files, one for the peak-period distribution and one for the off-peak-period distribution.
One important thing to note is that I didn't put paths in this. I run this from the command line in the model folder, and it picks up the files in that folder and outputs the DBFs into that folder. This is something that would have to be tested when placed into a model run file.
Resist the urge to do this in two separate steps. The join process in R takes forever, and reading the data into memory may take a while, too.
Write RMarkdown document
The RMarkdown document is where the magic happens. Fortunately, knitr (the R package that does all this magic) does a few things to split code blocks. If you want to build my file, add all these together into one file and name it something.Rmd
There are three code blocks that do this. They are importing, summarizing, and graphing.
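As a stand-in for the importing block (not the original code), here is a minimal sketch. The zone ranges, the DBF file name, and the column names `I`, `J`, and `LOGSUM` are assumptions for illustration. To run without model outputs it writes a small fake DBF to a temporary folder first; in a real model run, `read.dbf` would simply get a bare file name.

```r
# Stand-in for the importing block. Zone ranges, file name, and column
# names are assumptions for illustration, not the real OKI/MVRPC values.
library(foreign)  # read.dbf() / write.dbf(); ships with standard R installs

# plyr and ggplot2 are used by the later blocks; load them if installed.
for (pkg in c("plyr", "ggplot2")) {
  if (requireNamespace(pkg, quietly = TRUE)) library(pkg, character.only = TRUE)
}

zones_oki <- 1:6   # hypothetical OKI (Cincinnati) zone numbers
zones_mv  <- 7:10  # hypothetical MVRPC (Dayton) zone numbers

# Write a small fake matrix export so the sketch runs on its own.
set.seed(1)
fake <- data.frame(I = rep(1:10, each = 10), J = rep(1:10, times = 10))
fake$LOGSUM <- runif(nrow(fake), min = -8, max = 0)
pk_file <- file.path(tempdir(), "pk_dist.dbf")  # hypothetical file name
write.dbf(fake, pk_file)

# In a model run this would be a bare file name read from the model folder.
pk <- read.dbf(pk_file)
str(pk)
```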
This block does three things:
- Loads libraries. The foreign library is used to read DBF files. The plyr library is used to join and summarize the data frames (from the DBF inputs). The ggplot2 library is used for plots.
- Sets a few variables. Since the OKI model is actually two MPO models, we do three reports of everything - one for the entire model, one for the OKI (Cincinnati) region, and one for the MVRPC (Dayton) region. zones_oki and zones_mv are used to control which report is which.
- Imports the DBF files. Those read.dbf lines are where that magic happens. Again, since this is run in the model folder, no paths are used.
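A stand-in sketch for the summarizing block follows. It builds its own sample data so it runs by itself; the `TRIPS` column, the zone range, and the rounding to whole numbers are assumptions. It uses plyr's `ddply` when plyr is installed and falls back to base R's `aggregate` otherwise.

```r
# Stand-in for the summarizing block. Sample data, column names, and the
# zone range are assumptions for illustration.
zones_oki <- 1:6  # hypothetical OKI zone numbers

set.seed(42)
pk <- data.frame(I = rep(1:10, each = 10), J = rep(1:10, times = 10))
pk$LOGSUM <- runif(nrow(pk), min = -8, max = 0)
pk$TRIPS  <- round(runif(nrow(pk), min = 0, max = 50))

# Round the logsums so nearby values fall into the same group.
pk$LSUM_RND <- round(pk$LOGSUM, 0)

# Keep only trips with both ends inside the (hypothetical) OKI zones.
pk_oki <- subset(pk, I %in% zones_oki & J %in% zones_oki)

# Total trips per rounded logsum value, ready for charting.
if (requireNamespace("plyr", quietly = TRUE)) {
  oki_sum <- plyr::ddply(pk_oki, "LSUM_RND", plyr::summarise, TRIPS = sum(TRIPS))
} else {
  oki_sum <- aggregate(TRIPS ~ LSUM_RND, data = pk_oki, FUN = sum)
}
print(oki_sum)
```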
This block does three things:
- It rounds the logsum values to provide some grouping to the values
- It gets a subset of the model (for OKI)
- It summarizes the rounded values to prepare for charting them
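A stand-in sketch for the graphing block follows. The numbers are made up, and the choice of a line chart with points is an assumption, since the chart type isn't stated. The plot is only built if ggplot2 is installed, and only drawn in an interactive session or while knitr is running.

```r
# Stand-in for the graphing block. The summary values are made-up numbers.
oki_sum <- data.frame(
  LSUM_RND = -8:0,
  TRIPS    = c(4, 15, 35, 60, 80, 55, 30, 12, 5)
)

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(oki_sum, aes(x = LSUM_RND, y = TRIPS)) +
    geom_line() +
    geom_point() +
    labs(x = "Rounded logsum", y = "Trips",
         title = "OKI peak-period distribution (sample data)")
}

# Draw the chart when knitting the report or working interactively.
if (!is.null(p) && (interactive() || isTRUE(getOption("knitr.in.progress")))) {
  print(p)
}
```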
This block does one thing: it draws the chart using the ggplot tool. This is pretty advanced (although not bad, and the online help is good). However, for this I'm going to hold to my recommendation to get a copy of The R Graphics Cookbook (where I learned how to use ggplot). The concepts and examples in the book are far greater than what I will post here.
One point that should not be lost is that text elements (either Markdown headings, etc., or just text, or formatted text) can be added into this outside of the ```...``` blocks. This way, reports can actually look good!
Once this part is complete, the hardest stuff is over.
Write RScript to render RMarkdown to HTML
The RScript to render the RMarkdown file to HTML is pretty simple:
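A minimal sketch of such a script, assuming a placeholder report name (`something.Rmd`). The `html_name` and `render_report` helpers are hypothetical conveniences, not part of knitr; the only knitr call is `knit2html`. Newer knitr releases may stop on a plain .Rmd file and suggest `rmarkdown::render()` instead, so treat this as a starting point.

```r
# Sketch of a render script. File names are placeholders; html_name() and
# render_report() are hypothetical helpers, not knitr functions.
html_name <- function(rmd) sub("\\.[Rr]md$", ".html", rmd)

render_report <- function(rmd) {
  if (!file.exists(rmd)) {
    message("Skipping missing file: ", rmd)
    return(NA_character_)
  }
  if (!requireNamespace("knitr", quietly = TRUE)) {
    message("knitr is not installed; cannot render ", rmd)
    return(NA_character_)
  }
  knitr::knit2html(rmd)  # writes an .html file named after the input
  html_name(rmd)
}

# One entry per report; add more file names as needed.
reports <- c("something.Rmd")
outputs <- vapply(reports, render_report, character(1))
```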
This writes the .Rmd file out to an .html file with the same name. You can have as many knit2html lines as needed.
There are ways to write the files out to PDF (I haven't looked into them... perhaps that would be a future blog topic?).
Write Windows Batch File to automate the RScript
The last step is to write a batch file "wrapper" that can be run as part of the travel demand model run. This is really quite easy:
The first line sets the path to include R (on my system it isn't in the path, and my path statement is already 9 miles long). The second line runs the R script file (ReportR.R) in R.
That's it! It seems like a lot of work goes into this, but it isn't as difficult as some of the other reporting platforms out there.
PS: Stay tuned for some example reports
Example Report (generated from RMarkdown file)