R Quick-Take: Reading a Ton of Files in a Few Lines

December 1st, 2014

I just downloaded 2,159 traffic count files over the Internet. I’m going to have to work with these in various ways.

So the following quick snippet of code reads all of them into one data frame:

Loop Detectors!

August 16th, 2013

Since I haven’t done much of anything on this blog, I figured I’d add a few pictures of a loop detector site.  The Ohio Department of Transportation installed the loops, cabinet, and solar panel as part of a road project (thanks ODOT!), and I just installed the loop detector counter inside.

2013-08-15 17.05.17

This is the loop detector cabinet. The wires on the upper-right are the loop lead-in wires, the big grey box below them is a battery, the small black box in the upper-left is a solar voltage regulator, and the circuit boards below that are mystery boards.

2013-08-15 17.09.45

These are a mystery to me. There is two, one is powered, one is not.

Getting the 2nd Line through the Last Line of a File

June 24th, 2011

One recent work task involved compiling 244 CSV traffic count files and analyzing the data.

I didn’t want to write any sort of program to import the data into Access or FoxPro, and I didn’t want to mess with it (since it would be big) in Excel or Notepad++.

So, I took the first of the 244 files and named it CountData.csv. The remaining files all begin with ‘fifteen_min’ and they are isolated in their own folder with no subfolders.

Enter Windows PowerShell really powered up with GNUWin.

One command:
awk 'NR==2,NR<2' .\f*.csv >> CountData.csv

awk is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports (source: Wikipedia).

The first argument, NR==2 means start on record #2, or the second line in the file.
The second argument, NR<2, means end on the record less than 2. In this case, it always returns false, and thus the remainder of the file is output. The .\f*.csv means any file in this folder where the first letter is f and the last 4 letters are .csv (and anything goes between them). The ‘>> CountData.csv’ means to append to CountData.csv

Once I started this process, it ran for a good 45 minutes and created a really big file (about 420 MB).

After all this, I saw a bunch of “NUL” characters in Notepad++, roughly one every-other-letter, and it looked like the data was there (just separated by “NUL” characters).  I had to find and replace “\x00” with blank (searching as Regular Expression).  That took a while.

Acknowledgements:

The Linux Commando.  His post ultimately helped me put two and two together to do what I needed to do.

Security 102.  The NUL thing.