## Finding Stuff in Big CSV Files

October 16th, 2017

If you have an activity-based model OR big(ish) data, from time to time you need to find something: one record, possibly one in half a million or one in a million. You need the GnuWin32 tools for these if you're on Windows.

# Getting the First Line

Getting the first line is pretty easy with the head command:

>head -n 1 file.csv

>head -n 1 jointParticipantResults.csv
id,tourid,hhid,hhsize,purpose,partytype,participantNo,pNum,personType,HhJoint

If you want the last record, replace 'head' with 'tail'.

# Getting the Number of Rows

This is a pretty simple awk script that returns the number of rows:

>awk 'END {print NR}' jointParticipantResults.csv

# Getting a Specific Record

This is a simple awk script that returns the row where the  third field is 158568.  Looking at the first script above, the third field is the hhid field:

>awk '$3 == 158568 {print$0}' FS="," jointParticipantResults.csv

Note the FS part - that tells awk that the field separator is a comma.
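For anyone without awk handy, the same three lookups can be sketched in Python. This is only an illustration: the header row comes from the output above, but the data rows and the function names are made up, and the comparison is done on text rather than numerically as awk does.

```python
import csv
import io

# Hypothetical sample standing in for jointParticipantResults.csv.
# Only the header row comes from the post; the data rows are made up.
SAMPLE = """id,tourid,hhid,hhsize,purpose,partytype,participantNo,pNum,personType,HhJoint
1,10,158567,2,shop,adults,1,1,1,1
2,11,158568,3,eat,mixed,1,2,4,1
3,11,158568,3,eat,mixed,2,3,6,1
"""


def first_line(handle):
    """Return the first line, like `head -n 1`."""
    return handle.readline().rstrip("\n")


def count_rows(handle):
    """Count every line including the header, like awk 'END {print NR}'."""
    return sum(1 for _ in handle)


def find_records(handle, column, value):
    """Return the rows whose `column` equals `value` (compared as text)."""
    return [row for row in csv.DictReader(handle) if row[column] == value]


print(first_line(io.StringIO(SAMPLE)))
print(count_rows(io.StringIO(SAMPLE)))
for row in find_records(io.StringIO(SAMPLE), "hhid", "158568"):
    print(row["id"], row["hhid"])
```

With a real file, swap `io.StringIO(SAMPLE)` for `open("jointParticipantResults.csv", newline="")`.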

## GnuWin32 Trick: Quickly Finding Text in File

June 1st, 2017

The downside of not managing a good library of scripts is forgetting where some code is written.  Case in point: I wrote a nice RMSE script, but I forgot where it was.

So I found it with the following command line:

grep "rmse" $(ls -R ./*/*.R ./*/*.r)

The first part of the script - grep "rmse" - tells the grep command to look for rmse. The second part - $(ls -R ./*/*.R ./*/*.r) - tells what files to look through (that command is list recursive looking for *.R and *.r files).

## Updating Fields In One Table From Another in FoxPro

March 20th, 2017

With Microsoft products dropping support for DBF files, I've been using FoxPro 7 much more now.

One of the more annoying things in FoxPro (and maybe other systems) is when you update a DBF field from another, a simple UPDATE query does not work.  You need to use:

SET RELATION TO table.field INTO otherTable
REPLACE destField WITH otherTable.field FOR table.joinField = otherTable.joinField

This is a fairly annoying way to do it, but it works.  Both statements are required.

Cheers!

## ArcMap Showing Negative Values?

October 6th, 2016

This post is an actual exaggerated account of events in the office.  Names of the innocent have been protected.

I'm sitting in my office working on a way to streamline checking data from a large and terrible organization and I click on one of our links and see the worst possible thing: a negative AADT (Annual Average Daily Traffic, basically a traffic count!).

AADT's can't be negative. They'd cause a rift in the space-time continuum.

After screaming across the aisle to the poor soul that has to fix things like this (who replied with an exasperated "WHAT?" when I said "Hey, why are the AADT's negative on the Big Mac Bridge?"), we both checked the model output and saw that, indeed, the model output claims that the AADT is positive.  So I started looking further into the problem.

I started with FoxPro (I may be the last person with FoxPro installed on their computer, but IT can uninstall it over my cold dead body AND after removal of my ghost) and found that the AADT should be 44,043 (in one direction).

Then I remembered that we've had this problem before, but I thought that bug was fixed (silly me for thinking multiple-year-old bugs get fixed).  I looked into it on the web and found a statement from one of ESRI's own indicating that the problem is with ArcWhatever* converting 5-width numbers in DBFs to be short integers (which range from -32,768 to 32,767).  44,043? Too big, so the number gets displayed incorrectly.

The fix is surprisingly simple: change the width of the field in FoxPro from 5 to 6.

That 5, it's bad! Make it a 6.

Once the fix is made, the data is displayed as a 32-bit integer.

Fixed!
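To see why a too-big number can come out negative, here is a small Python sketch that reinterprets a value as a signed 16-bit integer. It assumes the value simply wraps around (two's complement); I don't know exactly what ArcMap does internally, so the specific negative number is illustrative and `as_short` is just a helper name I made up.

```python
import ctypes


def as_short(value):
    """Reinterpret an integer as a signed 16-bit value (two's complement)."""
    return ctypes.c_int16(value).value


# 32,767 is the largest value a signed 16-bit integer can hold.
print(as_short(32767))

# 44,043 does not fit, so under the wrap-around assumption it turns negative.
print(as_short(44043))
```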

Notes:

• I first saw this with a data layer served by ArcServer. Or ArcSDE. Heck, I don't know the difference, I just ask our GIS department for a REST endpoint and they take care of me and that's where I actually saw this.

## Quickie: WMV Screen Captures and Animated GIFs

August 11th, 2016

Occasionally, I do screen captures.  I use the Microsoft Expression Encoder.  It's not pretty, but I think it's free.  The output, by default, is a WMV (Windows Media Video) file.

YouTube is perfectly okay with WMV files.  However, for an animation to show up in Twitter, it has to be an animated GIF with a limit of 5 MB.

Enter ImageMagick:

convert -resize 50% -deconstruct -layers optimize inputFile.wmv outputFile.gif

This took a 14-20 MB animated GIF (uncompressed) to 1.36 MB.  Which is nice, because then I can show gems like this:

Cheers!

## Announcing The Ruby OMX API

October 23rd, 2015

I am happy to announce that there is now a Ruby API for OMX.  This is a read-only API that supports a few ways of reading a matrix, returning an array of J for a given I, an array of I for a given J, and returning the value at a matrix address.

More documentation is available on Github (including the all-important install instructions).

Let me know if you have any questions.  Post issues and bugs to the Github issues tracker.

The motivation behind yet another API: Ruby seems (operative word!!!) to handle being a web-based API better than a lot of other languages.  I’ve built a few just to test things out - for example, I built a versioned API that responds with random quotes from Yogi Berra... And please don’t build that in to anything, I have the free Heroku plan, so that may disappear at a random time!  Aside from the time that it took to adapt my mess of Voyager+Java+C+++Python(GRRR!)+Basic syntax to Ruby, it wasn’t at all difficult and it is incredibly easy to add another API version.  I would like to have a semi-live map of skims that I can click on a zone and see colors for the selected attribute/matrix (e.g. travel time).

## Standard Deviation Differences Between Excel and R (and my code in Cube Voyager)

July 24th, 2015

I had a need to get the correlation of count to assignment in Cube Voyager.  I don't know how to do this off the top of my head, and I'm instantly mistrusting of doing things I'd normally trust to R or Excel.  So I looked on Wikipedia for the Pearson product-moment correlation coefficient and ended up at Standard Deviation.  I didn't make it that far down on the page and used the first, which generally made Voyager Code like this:

I left the print statements in, because the output is important.

Avg AADT_TRK = 1121.77
Avg VOLUME = 822.03
n = 230.00

sdx1 = 1588160175
sdy1 = 1196330474
n = 230.00
sd AADT_TRK = 2627.75
sd Volume = 2280.67
r2 = 155.06

Note the standard deviations above.  Ignore the R2 because it's most certainly not correct!

Again, mistrusting my own calculations, I imported the DBF into R and looked at the standard deviations:

> sd(trkIn$AADT_TRK)
[1] 2633.476
> sd(trkIn$V_1)
[1] 2285.64

Now Standard Deviation is pretty easy to compute.  So WHY ARE THESE DIFFERENT?

Just for fun, I did the same in Excel:

WTF? Am I right or wrong???

So I started looking into it and recalled something about n vs. n-1 in the RMSE equation and discussion in the latest Model Validation and Reasonableness Checking Manual.  So I decided to manually code the standard deviation in Excel and use sqrt(sum((x-xavg)^2)/(n-1)) instead of Excel's function:

Looky there! Matching Numbers!

It's not that Excel is incorrect, it's not using Bessel's Correction.  R is.
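The n vs. n-1 difference is easy to check numerically. The sketch below uses Python's statistics module, where pstdev divides by n and stdev divides by n-1 (Bessel's Correction). The list of numbers is made up; the last two lines assume the Voyager code above divided by n and rescale its printed standard deviations using n = 230.

```python
import math
import statistics

counts = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers, not the AADT data

population_sd = statistics.pstdev(counts)  # divides by n
sample_sd = statistics.stdev(counts)       # divides by n - 1 (Bessel's Correction)


def population_to_sample(sd, n):
    """Rescale a divide-by-n standard deviation to the divide-by-(n-1) form."""
    return sd * math.sqrt(n / (n - 1))


print(population_sd, sample_sd)

# Voyager's printed values with n = 230, rescaled for comparison with R.
print(round(population_to_sample(2627.75, 230), 2))
print(round(population_to_sample(2280.67, 230), 2))
```

The rescaled values land within about 0.01 of what R reported, which is consistent with the whole difference being the n-1 in the denominator.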

/A

## Running Python in Atom.io in Windows

June 25th, 2015

I already hate Python, but their "IDE" makes it worse.  Fortunately, Atom.io can fix the IDE problem.

Atom.io is a Github product that whips the IDLE Python's ass.  It's not actually an IDE, it's  a text editor, but it's a text editor on steroids.

I stumbled upon a plugin for Atom to run various languages right in the window, including Python.  The problem is that it doesn't work right out of the box in Windows.  Fixing this is easy:

1. Go to File - Settings (you can also press CTRL+,)
2. Select Install
3. In the search box, type "script" and it should come up after a few seconds
4. Click install

Once it is installed (which should take less than a minute on a modern Internet connection), you will need to update the startup script to fix the path.  To do that:

1.  Go to File - Open Your Init Script
2. Add the following line

process.env.PATH = ["C:\\Python27\\ArcGIS10.2",process.env.PATH].join(";")

NOTE: I'm using the ArcGIS-bundled Python - you may need to fix that path!

Once the init script is updated, close and re-open Atom, and you should be able to select Packages - Script - Run Script (or press CTRL+SHIFT+B) to run a Python script.

/A

(my hate for Python is well-known!)

## R Quick-Take: Reading a Ton of Files in a Few Lines

December 1st, 2014

I just downloaded 2,159 traffic count files over the Internet. I'm going to have to work with these in various ways.

So the following quick snippet of code reads all of them into one data frame:
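The R snippet is not reproduced here. As a rough stand-in, this Python sketch shows the same idea: loop over every file that matches a pattern and stack the rows into one table. The column names (hour, volume), the file names, and the `read_all` helper are all hypothetical.

```python
import csv
import glob
import os
import tempfile


def read_all(folder, pattern="*.csv"):
    """Read every matching CSV in `folder` into one list of row dicts."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                # Remember which file each row came from.
                row["source_file"] = os.path.basename(path)
                rows.append(row)
    return rows


# Demo with two tiny made-up count files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name, volume in [("site1.csv", 120), ("site2.csv", 95)]:
        with open(os.path.join(folder, name), "w", newline="") as handle:
            handle.write("hour,volume\n0,%d\n" % volume)
    combined = read_all(folder)

print(len(combined), combined[0])
```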

## Using R with Phant

November 3rd, 2014

Last week on another blog, I showed a way to connect a temperature and humidity sensor to a Beaglebone Black and read it using some Python code that ultimately started with Adafruit.

So to be able to play (a little) AND after complaining about my office temperature last week, I decided to plug this thing in at work and set it up with Phant, the IOT data service from Sparkfun.  Then I wrote a quick R script to get the JSON file and plot the temperature.

Plot of Temperature

The code is below:

It's pretty simple, although the plot could use me spending a bit more time on it... and perhaps limiting the data to the last hour or day or something.  All of that can be done within R pretty easily.  Also, I did make a file available for anyone that wishes to play (the file is NOT 'live', the Phant server is on a different network).

/A

## Quick R Trick: Lists and Data Frames

October 24th, 2014

I didn't know something better to call this, but you can hold multiple field names in a variable (technically a character vector, not a list) and use that variable to pick columns out of a data frame:

MIN=c('NAICS21','NAICS22','NAICS23')

 temp=maz[,MIN] #temp is now a data frame of all rows with just NAICS21, NAICS22, and NAICS23 

temp=maz[,c("TAZ",MIN)] #temp is now a data frame of all rows with just TAZ, NAICS21, NAICS22, and NAICS23

This can be incredibly useful in many situations (moving ES202 data to model employment categories is one of many).

## DOS Commands You Should Know: FINDSTR

February 4th, 2014

The last time I talked about DOS, it was FIND.  Find is great for certain uses, but not for others... like when you need to search for a string through a lot of files in many subfolders.

In my case, I wanted to look for where I've used DELIMITER in a Cube script.  I tried Microsoft's example, and it doesn't work (and their comment box doesn't work with Chrome, so there's that, too).

This is a two step process.  The first is easy, and it uses a very basic DOS command: dir.

dir *.s /a/b/s >filelist

This creates a list of files to search in the current folder and its subfolders.  The /s switch is what makes the list include the full path.

The second command is actually three-in-one:

echo off & for /F "tokens=*" %A in (filelist) do findstr /i /m "DELIMITER" "%A"

The first part of this is "echo off".  This stops the command prompt from echoing each command as it runs (else, you'll see every findstr command).

The second part is the for... do loop.  This basically says "for each line in the file" and stores it (temporarily) as %A.

The third part is the findstr command.  The i switch turns off case sensitivity, and the m switch prints ONLY files that match.  I'm searching for DELIMITER (not case sensitive, of course).  The "%A" is the file to search, being passed along from the for...do loop.  This is in quotes because there are spaces in some of my path names, and without the quotes, the command would fail when a space is encountered because it would think it is the end of input.

This is useful if you're like me and have 1,563,169 lines of script file in your model folder!

BONUS TIP!

I found the number of lines using gawk wrapped in the same process:

echo off & for /F "tokens=*" %A in (filelist) do gawk 'END{print NR}' "%A" >> filelen

This gave me a long list of numbers that I brought into Excel to get the sum.

In the gawk command, 'END{print NR}' means to print the number of records (by default, lines) at the end of looking through the file.  "%A" is the file to check (just like in the findstr command).  The >>filelen APPENDS the output to a file called filelen.  It is important to use the append here because the command runs on each loop.  If a single > is used, only the final number of lines is placed in the file.
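If the for...do loop is more DOS than you want, the search and the line count can be sketched together in Python. This is just an illustration with made-up file contents; `search_scripts` is a hypothetical helper, and it matches the .s extension and the search text without regard to case.

```python
import os
import tempfile


def search_scripts(root, needle, extension=".s"):
    """Return (matching file paths, total line count) for scripts under root."""
    needle = needle.lower()
    matches, total_lines = [], 0
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            if not name.lower().endswith(extension):
                continue
            path = os.path.join(folder, name)
            with open(path, errors="replace") as handle:
                lines = handle.readlines()
            total_lines += len(lines)
            # Case-insensitive match, like findstr /i.
            if any(needle in line.lower() for line in lines):
                matches.append(path)
    return matches, total_lines


# Demo: made-up files, including a subfolder with a space in its name.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub folder"))
    samples = {
        os.path.join(root, "a.s"): "line one\nDELIMITER=','\n",
        os.path.join(root, "sub folder", "b.S"): "nothing here\n",
        os.path.join(root, "c.txt"): "delimiter\n",
    }
    for path, text in samples.items():
        with open(path, "w") as handle:
            handle.write(text)
    found, line_count = search_scripts(root, "delimiter")
    found_names = [os.path.basename(p) for p in found]

print(found_names, line_count)
```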

## Blogging in Transportation

January 28th, 2014

A few people have approached me about starting a blog.  Sometimes it is about transportation, sometimes it isn't.  There are already a lot of guides out there, although many seem to assume you want to be a full-time blogger.  The people I talk to do not want to be full-time bloggers, and I think some approaches may be a little different.

This post is in three parts - Getting Started, Moving Forward, and Final Words.  The Getting Started part discusses the software and extensions used on my blogs (I run two blogs).  Moving Forward discusses a variety of things to keep your blog interesting and also about maintaining the blog.  The Final Words section is where I discuss many of the little things (and probably where the full-time bloggers will differ from me).

This is not intended to be a how-to.  I've tried to write this in a way that someone could get familiar with WordPress and use this as a guide without the "click here, click here" stuff.  If you can't find something, search the Internet.  If you still can't find it, either drop a comment, tweet me (@okiAndrew), contact me via Google+, use the contact form, or drop me an email if you have one of my email addresses.

# Getting Started

I firmly advocate using WordPress, but my position is because I've used it for several years with ZERO issues.  I'm fairly certain that people using Blogger (Google's Blog Engine) and TypePad can say the same thing.  WordPress comes in two "flavors" - wordpress.org is the open source blogging software for use on your own server.  In my case, my blog is hosted via BlueHost (and I pay for this service).  WordPress.com is a hosted, free (I think) version of WordPress.  I don't know what compromises you have to make for free, but nothing is truly free (however, allowing ads on a wordpress.com site may be acceptable).

Setting up WordPress on your own server or for a hosting service is simple.

Once installed, the first things I would do:

1. Rename the admin account from "admin" or "administrator" to something less common.  This reduces your likelihood for an attack from those that would like to turn your blog into a spam center (trust me, they exist).  Additionally, make sure your passwords are long, have numbers, letters (capital and lowercase) and have a symbol (or a few).
2. Set up Plugins:
   1. Set up Akismet (anti-spam).
   2. Set up Jetpack add-ons
   3. Add and set up Login Security Solution
   4. Add and set up WordPress Database Backup.  I have backups emailed to me on a daily basis.
   5. Add WordPress Editorial Calendar (you'll see why later)
   6. Add and set up WP Super Cache (this speeds up your site considerably)
   7. Add Shadowbox JS (and if wanted, Shadowbox JS - Use Title from Image) - this has images come up on the page as opposed to separate "pages" when clicked.
   8. Optional: add Syntax Highligher and Code Prettifier Plugin for WordPress - this is a code tool.  I actually no longer use it because I've been using Github Gists.
   9. Optional: add Embed Github Gist - since I use Github Gists, this makes the experience much easier!  If you're not going to show code on your site, you can ignore this.
   10. Optional: add LaTeX for WordPress - I've used LaTeX only once, but it makes equations so much nicer.  If you're not going to show equations, ignore this.
   11. Optional: Set up Google Analytics and add Google Analyticator - this is one of many places where you can track 'hits'.
   12. Don't set up social sharing yet!  You'll see why.
3. Either trash the first test post and comment and write your first post and page, or revise the first test post to something more substantial.
   1. In WordPress, posts are the weekly (or daily, monthly, etc.) items.  Pages are items that generally don't change.  I have four pages, and one of them is hidden.
      • If you look somewhere on my site, you can see a heading for Pages, and under it are links to Contact Me, Travel Demand Modeling 101, and Welcome!.  My feelings are that the Welcome and Contact pages are pretty important, and an about page is pretty important as well (maybe one day I'll write one).
      • The posts on my site are the front-and-center content
      • I've seen a lot of bloggers claim that you should have a privacy policy and a comment policy.  Truthfully, you don't need them unless you have a lot of visits (over a few hundred, at least), and if you're in that arena, you should probably be looking for professional blogger help (I am absolutely serious about that).
   2. You definitely want at least one post and one page before moving on to the next item.  The post can be just a test post, but I would do more words than the basic test post that comes with WordPress.
4. Find a better theme!
   • In the Appearance area, you can find more themes.  Many can be customized, or you can make your own.
   • Find a theme that suits you.  Many themes can be adjusted, and I would encourage you to tweak it a lot and make sure everything looks good before settling on one
   • Making a custom theme is not the easiest thing to do.  I've done it once (for this blog) and I'm tempted to do it again (for my other blog), but there is so much that goes into it that I don't really advocate it.
5. Categories
   1. Set up some categories.  This is important in the permalinks structure, but can have some importance elsewhere.
6. Fix Permanent Links
   1. In the WordPress Dashboard - Settings - Permalinks, change the default (e.g. www.siliconcreek.net/?p=123) to a custom structure of /%category%/%postname% .  This works best for search engine optimization (SEO).  This is one of the few things I do for SEO.
7. Set up Social Sharing
   1. If you're going to be a blogger, you generally should have a few things already:
      • LinkedIn (IF your blog is professional related)
      • Facebook (IF your blog is personal)
   2. You want to set up sharing to automatically post new posts to the correct platforms.  Keep in mind that the "other" content of your social media should relate to the blog and vice-versa... What I mean by this is that I don't post amateur radio stuff to my @okiAndrew twitter feed or to LinkedIn and likewise, I don't post transportation stuff to my @KE8P twitter account.  Different accounts for different uses.  I don't advocate mixing too much (there are links between the two, but I expect users that are interested in the "two me's" to deal with the social media side on their end).
   3. If you think the social media side isn't important, think again.  Over the past 6 months, most of the tracked referrers were from LinkedIn and Twitter (the t.co referrer) - both around equal shares.  They were ~4 times the third-place referrer.

Once these items are completed, you're ready to move forward!

# Moving Forward

So moving forward obviously means "write content!".  And that's something you should do.

I advocate publishing on a weekly basis.  This generally doesn't mean that you have to write weekly.  For example, I'm writing this a week before it is scheduled to publish.  Also, I'm not God - you can post daily, monthly, or irregularly.  It APPEARS (key word!) that regular posts are better than irregular, but for niche blogs (like transportation modeling and amateur radio), it doesn't matter as much.

"Marketing" your blog is important.  If you're like me (read: not a pro), marketing is about professional clout as opposed to money.  It occasionally gets help (if you want to see an example, there is a LinkedIn thread where Roger Witte sent me to some pretty useful information).  Generally, social sharing is the best marketing you can do without spending lots of time marketing.  No, it isn't perfect (look how many DOTs block Twitter and LinkedIn).  I don't advocate posting to listservs, either (unless it is relevant to answer a question or the post is a how-to to help people do something that is difficult).

Make sure you use tags in your posts.  It helps to be able to send someone to www.siliconcreek.net/tags/cube-voyager-c++ as opposed to a list of links.  It also helps find posts (in the post listing, you can click on the individual tags and see all posts with that tag).

Be wary of where you link to your blog, particularly in other blogs' comments. I occasionally comment on other blogs, but my rule of thumb is that if you're going to be a troll, you probably don't want to link to your blog.  OTOH, if you post a comment that enhances value (is constructive, positive, a question, etc), then linking to your blog is a good thing.  If you do blogging of a controversial nature, I would be a lot more cautious, as the rules can be very blurry.

# Final Words

There will be a lot of things you'll start to see.  One is occasional marketing for SEO firms (you'll see this both as spam email and blog comments).  In certain worlds, it may make sense to use these services.  Truthfully, in my world it does not.  The most SEO you can do is setting up good categories, using social media, providing an RSS feed, and occasionally (and appropriately) pushing your blog via other (generally social) means.

Proofread your posts.  Don't ask how many times I didn't do that only to find a typo a few days later.  This post was written over a week early and I came back to it a few days later to proofread and clarify things.

Preview your posts, and check them when they go live.

Don't obsess over numbers.  Obsess over content (if your content is numbers, it is okay to ignore that part about obsessing over numbers).  And mostly, go with your gut instincts.  The reason? Story time!

I recently got an email from WordPress that has my "year in review".  It wasn't the best, according to them.  Three of the top 5 posts were from years past.  However, a few people had mentioned my blog in passing before that, and I had received some interaction from people based on my blog.  Later, at TRB, a handful of people that I respect A LOT mentioned my blog.  I've had more comments via LinkedIn than... ever (that might have something to do with starting to post to LinkedIn this year 🙂 ).  My gut instinct was that the blog was better this year, and those friends at TRB confirmed that.

Always update WordPress when it wants to.  I'm going to use Michel Bierlaire's quote: "In all non-trivial software, there is at least one bug".  Software development is hard work, and it is really hard when the software can be used (and abused) by anyone.  When WordPress finds security issues (and bugs, too), they fix them and issue updates.  YOU WANT THESE UPDATES.

Finally, keep your best year ahead of you.

## Gravity Model Calibration in R (Example)

December 3rd, 2013

Calibrating a gravity model for the first time is difficult.  I stumbled upon a webpage from a professor at Ohio State that really helps.  One thing I like to do is actually do the examples because I can ensure that my code works and it gives me some ideas.

The code to replicate Dr. Viton's work is below.

Obviously this runs really quick, since it is only three zones.  It is nice that R has some matrix tricks that help make it easy.  One thing to note is that running this for a large matrix in R takes forever (at least the way this is written).  It is possible to run it parallel across all processors, but it still takes forever.
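For a feel of what the core loop looks like, here is a minimal Python sketch of a three-zone gravity model with attraction balancing. This is NOT Dr. Viton's example or my R code: the productions, attractions, travel times, and the exp(-0.1 * time) friction factors are all made up, and a real calibration would wrap this in another loop that adjusts the friction factors until the trip lengths look right.

```python
import math

# Hypothetical three-zone inputs (total productions = total attractions).
productions = [100.0, 200.0, 300.0]
attractions = [150.0, 250.0, 200.0]
times = [[2.0, 5.0, 8.0],
         [5.0, 2.0, 6.0],
         [8.0, 6.0, 2.0]]

# Assumed friction factor form: F = exp(-0.1 * travel time).
friction = [[math.exp(-0.1 * t) for t in row] for row in times]


def gravity_model(productions, attractions, friction, iterations=100):
    """Distribute trips so row totals equal productions, then repeatedly
    scale the attractions so column totals approach the targets."""
    n = len(productions)
    adjusted = list(attractions)
    trips = []
    for _ in range(iterations):
        trips = []
        for i in range(n):
            denom = sum(adjusted[j] * friction[i][j] for j in range(n))
            trips.append([productions[i] * adjusted[j] * friction[i][j] / denom
                          for j in range(n)])
        totals = [sum(trips[i][j] for i in range(n)) for j in range(n)]
        adjusted = [adjusted[j] * attractions[j] / totals[j] for j in range(n)]
    return trips


trips = gravity_model(productions, attractions, friction)
for row in trips:
    print([round(value, 1) for value in row])
```

The nested loops over every i-j pair hint at why the run time grows quickly with the number of zones.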

## DOS Commands You Should Know: FIND

November 26th, 2013

Recently, I stumbled upon a problem in my new mode choice and distribution code - I was setting unavailable modes to -9999 to ensure that there was no chance of the model to choose an unavailable mode.  I found later that using that value was a bit extreme and I should be using something like -15 (and the difference causes wild logsum values).

After changing these values in 10 scripts, I wanted to ensure that ALL were changed so I didn't end up running them and finding that I had to wait another 15 minutes after finding an error (or worse, not immediately finding the error!).

So, I used the FIND command in DOS.

All of my distribution files begin with 25 and end with .S, so I used:

find "=-9999" 25*.S

Missed a few in these files. The filename is listed there so I can go to it and fix it.

Missed a bunch in this file. This is why I checked 🙂

## #TRBAM Twitter Data Mining Project

November 5th, 2013

I have been interested in playing around with twitter as a data mining resource.  Today, I happened to stumble upon an article in Getting Genetics Done that talks about just that (just with a different conference).

I looked into their script, and it points to a twitter command line program called t.

That and a little bit of shell scripting gave me something I could run to get the tweets in the last 10 minutes:

What this means is that I can get tweets in CSV for the last 10 minutes.  This can easily be run via cron:

*/10 * * * * sh /root/trbam_tweets/searchTweets.sh >/var/www/tstat.txt 2>&1

I have the output redirected to somewhere I'll be able to see from DC, as I don't know how my access will be or how much I'll be able to do prior to then.  I will make the data available to other researchers since it is all public tweets... That being said, if I (@okiAndrew) follow you on twitter and you've made your timeline private, contact me if you're concerned (or don't use "#trbam").  I don't specifically know if protected tweets would show up in the search - I DO have to be authenticated with Twitter, though.

# Duplicates and Misses

I am going to write some code (whenever I get some spare time) to import the CSV files into mySQL or couchDB or something.  This will allow me to use the twitter ID as a way to test for and remove (or not import) duplicates.
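Until the database exists, the duplicate check can be sketched in a few lines of Python. This assumes each CSV has an ID column holding the tweet ID; the column name, the sample rows, and the `merge_unique` helper are assumptions for illustration, not the actual output of the script.

```python
import csv
import io


def merge_unique(csv_texts, id_column="ID"):
    """Combine CSV batches, keeping only the first row seen for each ID."""
    seen = set()
    unique = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if row[id_column] not in seen:
                seen.add(row[id_column])
                unique.append(row)
    return unique


# Two made-up 10-minute batches that overlap on tweet 102.
batch_1 = "ID,Text\n101,first #trbam tweet\n102,second #trbam tweet\n"
batch_2 = "ID,Text\n102,second #trbam tweet\n103,third #trbam tweet\n"

tweets = merge_unique([batch_1, batch_2])
print([tweet["ID"] for tweet in tweets])
```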

As far as misses are concerned, that's just life.  This script is being fired off every 10 minutes - there are 144 files from each day, there's 71 days left until the annual meeting starts at the time of me typing this, and TRBAM lasts for 5 days... so that's about 11,000 files (plus more because people will still talk about it afterwards).  I'm not sure anyone has a count of how many tweets from last year (and I'm not going looking), and Twitter's API may decide to hate me during this.
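The file count is simple arithmetic, shown here as a quick Python check using only the numbers in the paragraph above:

```python
# One run every 10 minutes, all day.
runs_per_day = 24 * 60 // 10

# 71 days until the meeting plus the 5 days of the meeting itself.
days = 71 + 5

total_files = runs_per_day * days
print(runs_per_day, total_files)
```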

# Where is this Going?

Many of the charts in the first referenced article are great charts that can easily be done in R.  I'll have a few more to add, I'm sure, and as soon as others get their hands on the data, there will be many more.  I also will possibly use Hadoop (or something) to do some text analysis.

Another place this will be going is #ESRIUC.  I've submitted an abstract for their conference.  I don't know if I'm going, but whether I do or not is a moot point - there's usually some good stuff there.

## NFC Tag Differences Part Deux: Nexus 7 (2013)

August 2nd, 2013

I posted a while back about the differences in two NFC tags that I have.  I've since got a Nexus 7 and started wanting to use some of my tags with it.  I've run into a few other differences.  I was aware of the issue, but didn't think it mattered unless I was doing something wild.  Apparently not.

The first tag is a Tags for Droid tag.  In the second picture, a screen shot from my N7, there is an error: that tag type is not supported.

If you've read the other post, you can already see where this is going.

The second tag is from Tagstand (full disclosure: Tagstand sent me two free tags in a promo a while back - this had nothing to do with my blog, I just asked for them, like many others).

These tags are different and they work.  They're also thinner.

Final word: the thick tags DO work on my Galaxy Nexus phone, so there still is a use.  They're not bad, just not compatible with my Nexus 7.

## Fixing Asterisks in Number Fields in a DBF

July 22nd, 2013

Somehow, I have a table with 77,000 records and in some cases some of the data in number fields came out to be asterisks. I've tried all manner of selecting these records to change the data to 0 (which would be an indicator that there is no valid data for that field), but nothing seems to work.

Note the asterisks in several of the fields, including dep_time, arv_time, trip_dur, O_Longtitude, and O_latitude.

So I tried a few things.  One thing that works on SOME fields is VAL(STR(Field)).  Note the image below.

Code: SELECT dep_time, STR(dep_time), ISDIGIT(STR(dep_time)),VAL(STR(dep_time)) FROM trip

Note the departure times. They don't change across the fields, but the ISDIGIT function is useless for this.

I tried that with a decimal field and it didn't work off the bat (it truncated the decimals completely...or maybe it didn't, but it looks like it did).  So I changed the string functions to "STR(O_Latitude,12,8)" (which matches the field spec).  It gave me two decimal places, but I want more, so I found the SET DECIMALS TO command that fixed it.

Code: SELECT O_Latitude, STR(O_Latitude) as str_fn, ISDIGIT(STR(O_Latitude)) as dig_str_fn,VAL(STR(O_Latitude)) as val_str FROM trip

Ummm.... Where are my decimals!?

Code: SELECT O_Latitude, STR(O_Latitude,12,8) as str_fn, ISDIGIT(STR(O_Latitude,12,8)) as dig_str_fn,VAL(STR(O_Latitude,12,8)) as val_str FROM trip

Two decimals!  Progress!

Code:
SET DECIMALS TO 8
SELECT O_Latitude, STR(O_Latitude,12,8) as str_fn, ISDIGIT(STR(O_Latitude,12,8)) as dig_str_fn,VAL(STR(O_Latitude,12,8)) as val_str FROM trip

Finally!

From this I was able to write an update SQL query to fix the asterisk problem.

## New Project In the Works

May 31st, 2013

I haven't worked in R in several days because I've been working on a new project that will assist with getting good transit speed curves from highway data.  The project is in Java and is on my Github page.

I'm working on making part of it multi-threaded, which is new to me.

A second project that is still in my mind (and I'm not sure if this will become part of this one or another separate project) will be to use transit GPS traces to get good trip length frequencies on transit.

Stay tuned!

## New Series on R in Transportation Modeling [Updated 10 October 2013]

May 17th, 2013

I've been doing a lot of statistical stuff over the past several weeks, and I think it is worth some value to the Interwebs if I try and post some of it.  I'm considering making it a course of some sort with some scrubbed HHTS data (no, I can't post real people's locations and names, I think I might get in a little bit of trouble for that).

The "syllabus" is roughly something like this (last update: 10 October 2013):

1. Intro to R: getting data in, making summaries
2. Trip rates - Linear and Non-linear modeling 6/7/13
3. Mode Choice Estimation in R 6/14/13
4. Trip rates - Averages 9/13/13
5. Complex Mode Choice Estimation in Biogeme <-Coming in two weeks or less!
6. Distribution Friction Factors
7. Distribution K Factors
8. Outputs and Graphics

I can't guarantee that these will be the next eight weeks worth of posts - there will probably be some weeks with a different post, since I don't know if I can get all this stuff done in six weeks, even with the head start I have.

In other news...

I've been disabling comments on these short posts that really don't warrant any sort of responses.  I've been getting a lot of spam on this blog, so I'm cutting it down where I can.  This is not a global thing, I've been turning them off on certain posts, but the default is on.

## TRB Applications Conference Mobile Website

May 3rd, 2013

For those going to the TRB Transportation Planning Applications Conference in Columbus, Ohio next week (May 5-9), I've released a very simple mobile website for it.  I have part of an API designed into the site, and I intend to continue that with the next Applications Conference, as I want to see a mobile/tablet app happen.  I can make some Android platform stuff happen, but I have no iPhone development experience nor do I have an iDevice to do that on.

In addition, I'd love to see people who tweet during the conference use the hashtag #TRBAppCon.  I will be tweeting (sometimes) and taking some pictures during the conference.  My twitter handle is @okiAndrew.

Next up...

The day I'm writing this (I generally schedule posts anywhere from a day to 2 weeks in advance), I read the sad news that Astrid was acquired by Yahoo!.  I'm no fan of Yahoo!, in fact, I'm quite shocked they're still in business.  I see this as the end of Astrid, so my next post will likely be about my solution to this problem.

## Prepping my Computer (for a conference, but that part doesn’t matter)

April 12th, 2013

Update July 24, 2014: I'm using these exact directions with Linux Mint, which is my current preferred Linux Distro.

Note: I thought I posted this last January, but it appears I didn't.

This post could be re-titled "Why I Love Linux" because it requires Linux.

Like many other transportation geeks, I'm getting ready to go to this little conference in Washington, DC.  I've been getting things together because I found out a few years ago that being stuck in DC with problematic technology (like a bad cell phone battery) is no fun.  And to top it all off, my laptop feels like it has a failing hard drive.

So I booted into Ubuntu and used Disk Utility to check the SMART status, which claims everything is fine.

Still, though, I didn't receive any disk with my laptop (it instead has a rescue partition) and my intuition disagrees with what my disk drive thinks of itself, so I decided the smart thing to do would be to arm myself with a few good USB flash drives.

The first USB flash drive is a live image of Ubuntu or Mint (or many other distros).

The second is my rescue partition image that can be restored to a new drive.  I got this by:

1. Get an image file using the ntfsclone command:

sudo ntfsclone -o rescue.img /dev/sda4

Where /dev/sda4 is the Lenovo rescue partition (as indicated in Disk Utility)

2. Compress the rescue image

gzip rescue.img

3. Split the image into 1 GB bits

split -b 1024m rescue.img.gz

(note: steps 2 and 3 can be combined with gzip -c rescue.img | split -b 1024m -)

I then copied these to a USB flash drive.
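Getting the image back means undoing steps 2 and 3: join the pieces in name order, then decompress. The following is only a minimal Python sketch of that idea. It assumes the pieces kept split's default names (xaa, xab, ...) and all sit in one folder; the function name and the paths in the usage comment are placeholders, and writing the rebuilt image back onto a partition is outside the sketch.

```python
import glob
import gzip
import shutil


def rebuild_image(pieces_glob, output_path):
    """Join split pieces in name order, then gunzip them into output_path."""
    # split names its pieces xaa, xab, ... so a plain sort restores the order
    pieces = sorted(glob.glob(pieces_glob))
    if not pieces:
        raise FileNotFoundError(f"no pieces match {pieces_glob}")

    # The joined archive is left next to the image (assumed acceptable here)
    joined_path = output_path + ".gz"
    with open(joined_path, "wb") as joined:
        for piece in pieces:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, joined)

    # Stream the decompression so a multi-GB image never sits in memory
    with gzip.open(joined_path, "rb") as src, open(output_path, "wb") as out:
        shutil.copyfileobj(src, out)
    return pieces


# Hypothetical usage, with a made-up mount point:
# rebuild_image("/media/usb/x??", "rescue.img")
```

Copying in chunks keeps memory use flat no matter how big the rescue partition is.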

## New Open Source ArcMap Tool Posted: Point Location Fixer

March 29th, 2013

I stumbled on a problem that seems to have no easy answer.  Working on the count stations layer here at the office, I found that we had a small number of points that weren't located in the GIS feature class, although we DO have X and Y coordinates for them.

Since searching on Google turned up nothing, I wrote my own solution.  Since I already had some Java code to look for selected features and get to the actual features, I copied that code into a new project and made a few modifications.  Those modifications are posted on Github.  Even better, I actually used a few comments in this one! 🙂

## New Website on Open Civic Hardware

January 23rd, 2013

I've started up a new blog that will hopefully be more maintained than this one: www.opencivichardware.org.  The idea of civic hardware came from a presenter at Transportation Camp DC 2013.  Civic hardware is anything created to help a city (or state, or region).  It could be things like traffic counters, data loggers, tools to help with public involvement, or infrastructure.

The idea of this site is similar in nature to Hack-A-Day, but with a focus on civic hardware.  There will probably be a lot of things that can be cross-posted to both.  Additionally, look for things on this blog to be cross-posted there.

## NFC Tag Differences

January 16th, 2013

I've been playing around with NFC tags a lot lately.  I have one with my contact info ready to go to a conference with me, I have one on my gym bag to open Endomondo and Google Play Music.  I have one on my keychain that opens a note for me of things I deem important if I'm going somewhere (the note is in Evernote, so I can change it pretty easily).

I originally bought a pack of tags and a keychain from tagsfordroid.com through Amazon.  These tags are pretty beefy.  In using NFC Task Launcher, I posted a twitter update that ultimately earned me two free tags from tagstand.  I noticed theirs seems much thinner.

The differences are substantial, as illustrated in the image below.

The tagstand sticker is a normal sticker thickness.  The tagsfordroid.com sticker is much thicker.

The image below shows the entire group - the two tags from tagstand and a stack of tags from tagsfordroid and a set of a dozen decals to apply to the tags so you know what your tags do.

Disclaimers:

While the tags provided by tagstand were free, they do this for anyone that downloads the NFC Task Launcher app and posts a twitter update using the application.  They aren't aware I'm writing this, the tags were not provided to help write this, and I've not been offered any compensation for writing this.

I am not trying to show that one is better than the other.  Both tags work.  There are times one may want a thicker tag, and there are times that one may want a thinner tag.  The purpose of this post is to illustrate a difference between the two.

## Reloaded Kindle Fire with AOKP... fixed navbar issue

January 9th, 2013

I loaded my old Kindle Fire with AOKP.  This is awesome!

But...

So I was poking around in the ROM Settings and ultimately stumbled on a solution. The solution is to add a fourth button to the navbar, set it as the menu, and leave well enough alone.

As illustrated in these screenshots, the problem is solved:

That's it!

## Interesting INT() Issue Between Cube and Excel

July 24th, 2012

I don't know about anyone else, but I do a lot of calculation prototyping in Excel before applying that in scripts.  One of the most recent was to do a script to add expansion zones (also known as "dummy zones", although they aren't really dumb, just undeveloped!).

The problem I had was related to the following equation:

R=INT((819-N)/22)+1   Where N={820..906}

In Excel, the results are as below:

In Cube, I got the result below (I only took it into Excel to move stuff around and make it easier to see):

Note the sheer number of zeroes in the Cube version and all the numbers are 'off'.

The reason, as I looked into things, is that INT() works differently on the two platforms.  In Cube, INT simply removes everything to the right of the decimal, so INT(-0.05) = 0 and INT(-1.05) = -1.  In Excel, INT rounds down to the nearest integer, so INT(-0.05) = -1 and INT(-1.05) = -2.  This means that negative non-integer values will be different between the two platforms.  Note the table below.

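| Value | Excel | Cube |
|---|---|---|
| 3.4 | 3 | 3 |
| 2.3 | 2 | 2 |
| 1.1 | 1 | 1 |
| 0.5 | 0 | 0 |
| 0 | 0 | 0 |
| -0.5 | -1 | 0 |
| -1.1 | -2 | -1 |
| -2.3 | -3 | -2 |
| -3.4 | -4 | -3 |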

While neither software is truly wrong in its approach (there is no standard spec for INT()), it is important to know why things may not work as expected.
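The two behaviors are easy to reproduce outside of either program. This is a small Python sketch, not Cube or Excel code: math.trunc stands in for the truncation described for Cube, math.floor stands in for Excel's round-down, and the function names are made up for the illustration.

```python
import math


def cube_int(x):
    """Drop everything right of the decimal (truncate toward zero)."""
    return math.trunc(x)


def excel_int(x):
    """Round down to the nearest integer (floor)."""
    return math.floor(x)


def r_value(n, int_func):
    """The expansion-zone equation R = INT((819 - N) / 22) + 1."""
    return int_func((819 - n) / 22) + 1


# The two only disagree on negative, non-integer inputs
for value in (3.4, 0.5, 0, -0.5, -1.1, -3.4):
    print(value, excel_int(value), cube_int(value))

# Every N in 820..906 makes (819 - N) negative, so the equation is affected
for n in (820, 841, 906):
    print(n, r_value(n, excel_int), r_value(n, cube_int))
```

A prototype that only ever feeds INT() positive numbers will never show the difference, which is why it is easy to miss.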

## What Have I Been Up To Lately?

July 23rd, 2012

I've been up to a few things that haven't made it to this blog.

First, I've done a few conversion tools for converting Tranplan/INET to Voyager PT and back again.  These are open-source tools that are meant to help, but they may not be perfect (and I don't have the time to make sure they are).  If anyone wants to upload fixes, you'll get credit for it (but you have to let me know, as I think I have to allow that in Github).

Next, I've been heavily working on QC of my transit on-board survey.  This has resulted in some more work being uploaded to Github.  I've written some code to assist in trying to figure out what I need to actually look at and what is probably okay enough to ignore.

I've seen some stuff come out of the Census related to an API, and I did post some example code to the CTPP listserve to help.  Since I didn't want to bog down some people with my code, I put it in a Gist (which is below).

This code will get Census data using their API and chart it.  Note that you have to install PyGTK All-In-One to make it work.  Of course, mind the items that Krishnan Viswanathan posted to the Listserve - they help make sense of the data!

I'm also working on an ArcMap add-in that will help with QC-ing data that has multiple elements.  It is on Github, but currently unfinished.  This is something for advanced users.

I will have a few tips coming for some Cube things I've done recently, but those will be for another blog post.  With that, I will leave with the first publicly-available video I've ever posted to YouTube.  Of a traffic signal malfunction.  I'm sure Hollywood will start calling me to direct the next big movie any day now... 🙂

## Playing with Google Docs Scripts and Get Satisfaction

March 15th, 2012

Sometimes I do things that don't really have a point... yet. One of them was pulling some information from GetSatisfaction (GSFN) to a Google Docs Spreadsheet (GDS). GSFN has an API that returns everything in JSON, so writing a script in a GDS to pull in that information is quite easy.

The first step is to create a spreadsheet in Google Docs.  This will act as a container for the data.

The second step is to create a script to parse the JSON output and put it in the spreadsheet.  An example of this is the script below, which I used to get only the topic, date, and type of topic (question, idea, problem, or praise).  It's simple, and it can be expanded on.  But for the sake of example, here it is:

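function fillGSFN() {
  // Assumes the script is bound to the spreadsheet that will hold the data
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheets()[0];
  var r = 1; // next spreadsheet row to write
  for (var page = 89; page < 200; page++) {
    var jsondata = UrlFetchApp.fetch("http://api.getsatisfaction.com/companies/{COMPANY}/topics.json?page=" + page);
    var object = JSON.parse(jsondata.getContentText());

    // One row per topic: subject, creation date, type of topic
    for (var i in object.data) {
      sheet.getRange(r, 1).setValue(object.data[i].subject);
      sheet.getRange(r, 2).setValue(object.data[i].created_at);
      sheet.getRange(r, 3).setValue(object.data[i].style);
      r++;
    }
    if (i != "14") return 1; // This was not a full page (a full page ends at index 14)
  }
}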


This script is still a work in progress, and there are better ways to consume a JSON feed, but for what I was doing, this was a nice quick-and-simple way to do it.
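The paging logic is the part worth keeping: pull a page, write out its rows, and stop as soon as a page comes back short. Here is that idea as a minimal Python sketch. The fetch_page callable is hypothetical (anything that returns the JSON text for a page number), and the 15-topic page size is an assumption inferred from the check against index 14 in the script above.

```python
import json


def collect_topics(fetch_page, first_page=1, last_page=200, page_size=15):
    """Gather (subject, created_at, style) rows until a short page shows up."""
    rows = []
    for page in range(first_page, last_page):
        payload = json.loads(fetch_page(page))
        topics = payload.get("data", [])
        for topic in topics:
            rows.append((topic.get("subject"),
                         topic.get("created_at"),
                         topic.get("style")))
        # A page with fewer than page_size topics is taken to be the last one
        if len(topics) < page_size:
            break
    return rows


def demo_fetch(page):
    """Made-up stand-in for the real API call: two full pages, then 4 topics."""
    count = 15 if page < 3 else 4
    return json.dumps({"data": [
        {"subject": f"topic {page}-{k}", "created_at": "2012-03-15",
         "style": "question"}
        for k in range(count)
    ]})


print(len(collect_topics(demo_fetch)))
```

Passing the fetcher in as an argument keeps the network call separate from the paging rule, so the rule can be tried out on canned data first.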

## Arduino Based Bluetooth Scanners

September 30th, 2011

This is a post about a work in progress...

If you're in the transportation field, you've likely heard of the Bluetooth Scanners that cost around $4,000 each. These devices scan MAC (Media Access Control) addresses and log them (with the time of the scan) and use that for travel time studies or for origin-destination studies.

My question is, can we build something good-enough with an Arduino for much less money? Something like the concept below?

There are reasons for everything:

### Arduino

Controls it all and brings it together. Turns on the GPS, Bluetooth, listens to the stream of data from both, writes to the memory card.

### GPS

The Arduino has no real-time clock (meaning that unless you tell it what time it is, it doesn't know!). The GPS signal includes time. It also includes position, which would be pretty useful.

### Bluetooth

If we're going to scan for Bluetooth MAC addresses, something to receive them might come in handy...

### Something to Write To

Scanning the addresses would be pretty pointless without storing the data.

## Initial Design

/*
Bluetooth Tracker
Written by Andrew Rohne (arohne@oki.org)
www.oki.org
*/

#include
#include <NewSoftSerial.h>

NewSoftSerial ol(10,11);
char inByte;
boolean ext=false;

void setup(){
String btreturn;
Serial.begin(115200);
delay(1500);
Serial.print("$$$"); // put the Bluetooth module into command mode
delay(1000);

}

void loop(){
byte incomingByte=-1;
byte index=0;

while(Serial.available()>0){
index=0;
Serial.println("IN15");
delay(16500);
while(incomingByte>-1 && index<160){
index++;
}
Serial.end();
Serial.begin(115200);
}
}
if(Serial.available()<=0){
delay(1000);
Serial.begin(115200);
}

}

void writelog(String line)
{
ol.begin(9600);
ol.print(line);
ol.end();
}


## The Results

The program wrote about 5 KB of text to the file before dying after 489986 milliseconds (8 minutes). I had left it on a windowsill overnight (the windowsill is literally about 15 feet from Fort Washington Way in Cincinnati, which is 6 lanes; see below for the range centered on roughly where the setup was located).

There were 9 unique Bluetooth MAC addresses scanned. During the 8 minutes, there were 25 groups of MAC addresses written to the file. 5 MAC addresses appeared in multiple groups, with 3 of the MAC addresses appearing in 24 of the groups (and they may have appeared in the last group, it appears to have been cut off). Those same 4 have been seen in earlier tests, too, so I don't know what's going on there.

## The Problems to Fix

Well, first there's the problem that I had let it run all night, and it only had 8 minutes of data. Something is causing the Arduino to stop writing or the OpenLog to stop operating.

In the output file, there are a few issues. First, some processing needs to be done, and second, it appears I am reading past the end of the serial buffer (if you look in the image below, you can see a lot of characters that look like a y with an umlaut).

In the code above, the IN15 command is sent to the Bluetooth Mate Gold, which tells it to inquire for 15 seconds, and then I delay for 16.5 seconds. This is because I THINK there is a delay after the scan finishes. I don't know how long that delay is. A vehicle traveling by at 65 MPH covers 95.333 feet per second. Assuming I can get the Bluetooth device very close to the road, that 1.5 second gap SHOULD be okay, but if I have to go longer it could be a problem. The range of a Class 1 Bluetooth device is 313 feet, so a device can be scanned anytime within a 626 foot stretch (up to 313 feet before the Bluetooth station and up to 313 feet after it), and a vehicle would be in range for about 6.6 seconds. However, the Bluetooth signal is at 2.4 - 2.485 GHz and is susceptible to some interference from the vehicle, driver, passengers, etc., so speed is key.
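The arithmetic behind those numbers is short enough to check in a few lines. This Python sketch only restates the figures from the paragraph above (65 MPH, a 313 foot range, a 1.5 second gap); the variable and function names are made up for the illustration.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def feet_per_second(mph):
    """Convert a speed in miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


speed_fps = feet_per_second(65)            # vehicle speed
scan_window_ft = 2 * 313                   # range before plus range after the station
seconds_in_range = scan_window_ft / speed_fps
gap_distance_ft = 1.5 * speed_fps          # ground covered during the 1.5 second gap

print(f"{speed_fps:.3f} ft/s")
print(f"{seconds_in_range:.1f} s in range")
print(f"{gap_distance_ft:.0f} ft traveled during the gap")
```

Put another way, the gap between scans costs roughly 143 feet of the 626 foot window for a vehicle moving at 65 MPH.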

## Conclusion

I'm on the fence as to whether or not the Bluetooth Mate Gold is the right way to do this. I will still be doing some research to see if I can get better speed out of it, or if I need to look into a different receiver that can listen on the 2.4 GHz band, look for MAC addresses, and stream them to the Arduino.

I also need to get the GPS up and running. That is a different story altogether, as I have been trying on that and have not been successful (despite using code that works for my personal Arduino and GPS, although the model of GPS 'chip' is different).

## More Voyager PT + AWK Goodness

September 20th, 2011

One thing I've missed from the old TranPlan days was the reporting group.  We've used that for many years to compare our transit loadings by major corridor.  Unfortunately, that functionality was lost going to PT.  I still need it, though, so enter awk.

The script below looks at the transit line file and outputs ONLY the line name and its reporting group, comma-separated.  It uses a loop to check each field for ' NAME=' and 'USERN2'; USERN2 is where we now store our reporting group codes.

BEGIN{
FS=","
RS="LINE"
}
{
for (i=1;i<20;i++)
{
if($i~/ NAME=/)
{
printf "%s,",substr($i,8,length($i)-8)
}
if($i~/USERN2/)
{
printf "%s\n",substr($i,9)
}
}
}

The contents of the above need to be saved to a .awk file - I used trn.awk. To call this, I use a Pilot script to call awk and pass the input and get the output.

*awk -f {CATALOG_DIR}/INPUTS/trn.awk {CATALOG_DIR}/INPUTS/OKIROUTES.LIN >{CATALOG_DIR}/OKIROUTES.CSV

The output of this is a simple two-column comma-separated-value file of the route ID and the reporting group.

## Using Gawk to get a Simple Transit Loadings Table from Cube PT

September 19th, 2011

One thing that I don't like about Cube is the transit loadings report is stuck in the big program print report. To pull this out, the following code works pretty well:

gawk /'^REPORT LINES UserClass=Total'/,/'^Total '/ 63PTR00A.PRN >outputfile.txt

Where 63PTR00A.PRN is the print file. Note the spaces after ^Total. For whatever reason, using the caret (the '^') isn't working to find 'Total' as the first thing on the line. So, I added the spaces so it gets everything. Outputfile.txt is where this will go. It will just be the table.

NOTE: You need GNUWin32 installed to do this.

## Using GAWK to Get Through CTPP Data

August 18th, 2011

The 3-year CTPP website lacks a little in usability (just try getting a county-county matrix out of it). One of the CTPP staff pointed me to the downloads, which are a double-edged sword. On one hand, you have a lot of data without an interface in the way. On the other hand, you have a lot of data.

I found it was easiest to use GAWK to get through the data, and it was pretty easy:

gawk '/.*COUNTY_CODE.*/' *.csv >Filename.txt

Where COUNTY_CODE is the code from Pn-Labels-xx.txt where n is the part number (1, 2, or 3) and xx is the state abbreviation.

NOTE: Look up the county code EACH TIME. It changes among parts 1, 2, and 3.

This command will go through all .csv files and output any line with the county code to the new file.

### UPDATE

I have multiple counties to deal with.
There's an easy way to start on getting a matrix:

gawk '/C4300US.*(21037|21015|21117).*32100.*/' *.csv >TotalFlowsNKY.csv

This results in a CSV table of only the total flows from three Northern Kentucky counties (21037, 21015, 21117; Campbell, Boone, and Kenton counties, respectively). For simplicity's sake, I didn't include all 11 that I used.

### Finishing Up

Then, I did a little Excel magic to build a matrix for all 11 counties and externals. The formula is shown. I have an additional sheet which is basically a cross reference of the county FIPS codes to the name abbreviations I'm using.

See the image below (click for a larger version).

After this, I built a matrix in Excel. The matrix uses array summation (when you build this formula, you press CTRL+SHIFT+Enter to set it up right, else the returned value will be 0).

Using these techniques, I was able to get a journey to work matrix fairly quickly and without a lot of manual labor.

NOTE: You need to have GNUWin32 installed to use gawk.

## Using gawk to Get PT Unassigned Trips Output into a Matrix

July 15th, 2011

In the process of quality-control checking a transit on-board survey, one task that has been routinely mentioned on things like TMIP webinars is to assign your transit trip-table from your transit on-board survey. This serves two purposes - to check the survey and to check the transit network.

PT (and TranPlan's LOAD TRANSIT NETWORK, and probably TRNBUILD, too) will attempt to assign all trips. Trips that are not assigned are output into the print file. PT (what this post will focus on) will output a line similar to this when a transit path is not found:

W(742): 1 Trips for I=211 to J=277, but no path for UserClass 1.

With a transit on-board survey, there may be a lot of these. Therefore, the less time spent writing code to parse them, the better.
To get this to a file that is easier to parse, start with your transit script, and add the following line near the top:

GLOBAL PAGEHEIGHT=32767

This removes the page headers. I had originally tried this with page headers in the print file, but it created problems. Really, you probably won't print this anyway, so removing the page headers is probably a Godsend to you!

Then, open a command line, and type the following:

gawk '/(W.*)\./ {print $2,$5,$7}' TCPTR00A.PRN >UnassignedTransitTrips.PRN


Note that TCPTR00A.PRN is the transit assignment step print file, and UnassignedTransitTrips.PRN is the destination file. The {print $2,$5,$7} tells gawk to print the second, fifth, and seventh columns. Gawk figures out the columns itself based on spaces in the lines. The >UnassignedTransitTrips.PRN directs the output to that file, instead of listing it on the screen.

The UnassignedTransitTrips.PRN file should include something like:

1 I=3 J=285,
1 I=3 J=289,
1 I=3 J=292,
1 I=6 J=227,
1 I=7 J=1275,

The first column is the number of unassigned trips, the second column is the I zone, and the last column is the J zone.
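For anyone who would rather skip the intermediate file, the same extraction can be sketched in Python straight from the print file lines. This is only an illustration: it assumes the warning lines look exactly like the W(742) example in this post and that trip counts are whole numbers, and the function name is made up.

```python
import re

# Matches lines such as:
#   W(742): 1 Trips for I=211 to J=277, but no path for UserClass 1.
UNASSIGNED = re.compile(r"W\(\d+\):\s+(\d+)\s+Trips for I=(\d+) to J=(\d+),")


def parse_unassigned(lines):
    """Return (i_zone, j_zone, trips) for every unassigned-trip warning."""
    records = []
    for line in lines:
        match = UNASSIGNED.search(line)
        if match:
            trips, i_zone, j_zone = (int(group) for group in match.groups())
            records.append((i_zone, j_zone, trips))
    return records


sample = [
    "W(742): 1 Trips for I=211 to J=277, but no path for UserClass 1.",
    "some other line from the print file",
]
print(parse_unassigned(sample))
```

Capturing the zone numbers with the regular expression removes the need to strip the I=, J=, and trailing comma in a later step.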

This file can then be brought into two Matrix steps to move it to a matrix. The first step should include the following code:

RUN PGM=MATRIX PRNFILE="S:\USER\ROHNE\PROJECTS\TRANSIT OB SURVEY\TRAVELMODEL\MODEL\TCMAT00A.PRN"
FILEO RECO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF",
FIELDS=IZ,JZ,V
FILEI RECI = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\UnassignedTransitTrips.PRN"

RO.V=RECI.NFIELD[1]
RO.IZ=SUBSTR(RECI.CFIELD[2],3,STRLEN(RECI.CFIELD[2])-2)
RO.JZ=SUBSTR(RECI.CFIELD[3],3,STRLEN(RECI.CFIELD[3])-2)
WRITE RECO=1

ENDRUN


This first step parses the I=, J=, and comma out of the file and inserts the I, J, and number of trips into a DBF file. This is naturally sorted by I then J because of the way PT works and because I am only using one user class in this case.

The second Matrix step is below:

RUN PGM=MATRIX
FILEO MATO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.MAT" MO=1
FILEI MATI[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF" PATTERN=IJM:V FIELDS=IZ,JZ,0,V

PAR ZONES=2425

MW[1]=MI.1.1
ENDRUN


This step simply reads the DBF file and puts it into a matrix.

At this point, you can easily draw desire lines to show the unassigned survey trips. Hopefully it looks better than mine!

## Getting the 2nd Line through the Last Line of a File

June 24th, 2011

One recent work task involved compiling 244 CSV traffic count files and analyzing the data.

I didn't want to write any sort of program to import the data into Access or FoxPro, and I didn't want to mess with it (since it would be big) in Excel or Notepad++.

So, I took the first of the 244 files and named it CountData.csv. The remaining files all begin with 'fifteen_min' and they are isolated in their own folder with no subfolders.

Enter Windows PowerShell really powered up with GNUWin.

One command:
awk 'NR==2,NR<2' .\f*.csv >> CountData.csv

awk is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports (source: Wikipedia).

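The first argument, NR==2, means start on record #2, or the second line in the file.
The second argument, NR<2, means end on the record less than 2. In this case, it always returns false, and thus the remainder of the file is output. Note that NR keeps counting across all of the input files, so only the very first line awk reads is dropped; FNR, which restarts at 1 for each file, is the variable to use if every file's header row should be skipped. The .\f*.csv means any file in this folder where the first letter is f and the last 4 letters are .csv (and anything goes between them). The '>> CountData.csv' means to append to CountData.csv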
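For comparison, the same job can be sketched in Python. This is an alternative, not what was run for this post: it drops the first line of each matching file (treating it as a header row) and appends the rest to the combined file. The function name is made up, and the sketch assumes the combined file already ends with a newline.

```python
import glob


def append_without_headers(pattern, combined_path):
    """Append every line except the first of each matching file."""
    appended = 0
    # Binary mode copies bytes untouched, so no encoding is guessed at
    with open(combined_path, "ab") as combined:
        for path in sorted(glob.glob(pattern)):
            with open(path, "rb") as src:
                next(src, None)  # skip this file's header row
                for line in src:
                    combined.write(line)
                    appended += 1
    return appended


# Hypothetical usage from the folder holding the count files:
# append_without_headers("f*.csv", "CountData.csv")
```

Working in binary mode also means the output is byte-for-byte what was in the source files.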

Once I started this process, it ran for a good 45 minutes and created a really big file (about 420 MB).

After all this, I saw a bunch of "NUL" characters in Notepad++, roughly one every-other-letter, and it looked like the data was there (just separated by "NUL" characters).  I had to find and replace "\x00" with blank (searching as Regular Expression).  That took a while.
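A likely cause, though it is an assumption here, is that PowerShell's >> redirection wrote the appended text as UTF-16, which puts a zero byte after every plain ASCII character. Whatever the cause, the find-and-replace can be done without opening a 420 MB file in an editor. A minimal Python sketch, with a made-up function name and placeholder file names:

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, dropping every NUL (0x00) byte."""
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed


# Hypothetical usage:
# strip_nul_bytes("CountData.csv", "CountData_clean.csv")
```

This is only safe when the data is plain ASCII, as count data usually is; any character outside that range would be damaged by simply dropping bytes.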

Acknowledgements:

The Linux Commando.  His post ultimately helped me put two and two together to do what I needed to do.

Security 102.  The NUL thing.

## Adding a Search Engine in Chrome to Track UPS Shipments

December 22nd, 2010

One of the cool features of the Google Chrome Browser is the ability to add search engines and search them from the address bar. This tip builds on that capability to track UPS shipments based on their UPS Tracking Number.

The first step is to go to the options menu by clicking on the wrench icon and going to Options:

The second step is to go to the Basics tab (or on Mac, click on the Basics icon)

The third step is to add the search engine.  On Windows, click Add, and then fill out the resulting form, on OS X, click the '+' button and do the same.

Windows Form:

The following are the items for the form:

Name: UPS

Keyword: UPS

URL: http://wwwapps.ups.com/WebTracking/processInputRequest?sort_by=status&tracknums_displayed=1&TypeOfInquiryNumber=T&loc=en_US&InquiryNumber1=%s&track.x=0&track.y=0

NOTE: The entire URL above should be one line with no spaces!

Click OK on everything (or in some cases, the red circle on OS X).  To use this, open Chrome, type 'ups' in the address bar and press Tab and enter the tracking number (copy-paste works well for this).

Once you press Enter, you will immediately go to the UPS website showing your tracking information.  In this case, my shipment won't make it by Christmas.  Oh well.
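What Chrome does with the %s is plain substitution: whatever is typed after the keyword gets dropped into the URL in place of %s. A rough Python sketch of that idea follows; the function name is made up, the tracking number in the example is a placeholder rather than a real shipment, and the escaping step is an assumption about roughly what the browser does.

```python
from urllib.parse import quote

UPS_TEMPLATE = (
    "http://wwwapps.ups.com/WebTracking/processInputRequest"
    "?sort_by=status&tracknums_displayed=1&TypeOfInquiryNumber=T&loc=en_US"
    "&InquiryNumber1=%s&track.x=0&track.y=0"
)


def tracking_url(tracking_number, template=UPS_TEMPLATE):
    """Fill the %s placeholder with a URL-escaped tracking number."""
    return template.replace("%s", quote(tracking_number.strip(), safe=""))


# Placeholder tracking number, not a real shipment
print(tracking_url("1Z999AA10123456784"))
```

The same pattern works for any site that takes its search term in the URL, which is why the search engine trick is not limited to UPS.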

October 20th, 2010

While looking into backing up my Google Docs, I realized that GoogleCL is not backing up drawings.

The first fix is in the try block on line 51.
was:

from gdata.docs.data import DOCUMENT_LABEL, SPREADSHEET_LABEL, \
PRESENTATION_LABEL, FOLDER_LABEL, PDF_LABEL

To:

from gdata.docs.data import DOCUMENT_LABEL, SPREADSHEET_LABEL, \
PRESENTATION_LABEL, FOLDER_LABEL, PDF_LABEL, DRAWING_LABEL

Then, beginning on 52 (the except ImportError block), it should include DRAWING_LABEL = 'drawing' as below:

except ImportError:
    DOCUMENT_LABEL = 'document'
    PRESENTATION_LABEL = 'presentation'
    DRAWING_LABEL = 'drawing'
    FOLDER_LABEL = 'folder'
    PDF_LABEL = 'pdf'

Then, on line 371, the following needs to be added before the 'else':

elif doctype_label == DRAWING_LABEL:
    return googlecl.CONFIG.get(SECTION_HEADER, 'drawing_format')

Finally, in your .googlecl file (mine is under my "profile drive" because of our network settings, your mileage likely will vary, so you'll have to search for it), open config in any text editor and add the following in the [DOCS] section:

drawing_format = png

Note: while you're at it, you might want to change document_format = txt to document_format = doc

That's it. Now if you run 'google docs get .* ./backup', you get the drawings as well.