DOS Commands You Should Know: FINDSTR

February 4th, 2014

The last time I talked about DOS, it was FIND.  Find is great for certain uses, but not for others... like when you need to search for a string through a lot of files in many subfolders.

In my case, I wanted to look for where I've used DELIMTER in a Cube script.  I tried Microsoft's example, and it doesn't work (and their comment box doesn't work with Chrome, so there's that, too).

This is a two step process.  The first is easy, and it uses a very basic DOS command: dir.

dir *.s /a/b >filelist

This creates a list of files to search in the current folder.  The list will include the full path.

The second command is actually three-in-one:

echo off & for /F "tokens=*" %A in (filelist) do findstr /i /m "DELIMITER" "%A"

The first part of this is "echo off".  This turns off the command prompt every time (else, you'll see every findstr command).

The second part is the for... do loop.  This basically says "for each line in the file" and stores it (temporarily) as %A.

The third part is the findstr command.  The i switch turns off case sensitivity, and the m switch prints ONLY files that match.  I'm searching for DELIMITER (not case sensitive, of course).  The "%A" is the file to search, being passed along from the for...do loop.  This is in quotes because there are spaces in some of my path names, and without the quotes, the command would fail when a space is encountered because it would think it is the end of input.

This is useful if you're like me and have 1,563,169 lines of script file in your model folder!

BONUS TIP!

I found the number of lines using gawk wrapped in the same process:

echo off & for /F "tokens=*" %A in (filelist) do gawk 'END{print NR}' "%A" >> filelen

This gave me a long list of numbers that I brought into Excel to get the sum.

In the gawk command, 'END{print NR}' means to print the number of records (by default, lines) at the end of looking through the file.  "%A" is the file to check (just like in the findstr command).  The >>filelen APPENDS the output to a file called filelen.  It is important to use the append here because the command runs on each loop.  If a single > is used, only the final number of lines is placed in the file.

Taking CSV Exported Cube Voyager Path Files to A New Level Using GAWK (part 1)

January 30th, 2013

In a prior post, I link to some code that outputs a path file.  I've done something a tad different because I needed some select link analysis and reading the path file in Cube was taking far too long to do it the normal way.

So, I took that program on Github and extended it to perform a selected link:

And this outputs a few GB of paths in CSV format.  I went from 42 GB of paths in the AM to 3.4 GB of CSV paths.  Still not good enough. The next thing I did was use GAWK to get just the Origin and Destination

This returns a CSV file of just the origin and destination (which can be linked to the vehicle trip matrix).

Part 2 will discuss how to link to a vehicle trip matrix and if this approach actually works!

More Voyager PT + AWK Goodness

September 20th, 2011

One thing I've missed from the old TranPlan days was the reporting group.  We've used that for many years to compare our transit loadings by major corridor.  Unfortunately, that functionality was lost going to PT.  I still need it, though, and enter awk.

The script below looks at the transit line file and outputs ONLY the line code, comma-separated.  It uses a loop to check each field for ' NAME=' and 'USERN2', which is where we now store our reporting group codes.

BEGIN{
FS=","
RS="LINE"
}
{
	for (i=1;i<20;i++)
	{
		if($i~/ NAME=/)
		{
			printf "%s,",substr($i,8,length($i)-8)
		}
		if($i~/USERN2/)
		{
			printf "%s\n",substr($i,9)
		}
	}
}

The contents of the above need to be saved to a .awk file - I used trn.awk.

To call this, I use a Pilot script to call awk and pass the input and get the output.

*awk -f {CATALOG_DIR}/INPUTS/trn.awk {CATALOG_DIR}/INPUTS/OKIROUTES.LIN >{CATALOG_DIR}/OKIROUTES.CSV

The output of this is a simple two-column comma-separated-value file of the route ID and the reporting group.

Using Gawk to get a SimpleTransit Loadings Table from Cube PT

September 19th, 2011

One thing that I don't like about Cube is the transit loadings report is stuck in the big program print report.  To pull this out, the following code works pretty well:

gawk /'^REPORT LINES  UserClass=Total'/,/'^Total     '/ 63PTR00A.PRN >outputfile.txt

Where 63PTR00A.PRN is the print file. Note the spaces after ^Total. For whatever reason, using the karat (the '^') isn't working to find 'Total' as the first thing on the line. So, I added the spaces so it gets everything. Outputfile.txt is where this will go. It will just be the table.

NOTE: You need GNUWin32 installed to do this.

Using GAWK to Get Through CTPP Data

August 18th, 2011

The 3-year CTPP website lacks a little in usability (just try getting a county-county matrix out of it).

One of the CTPP staff pointed me to the downloads, which are a double-edge sword. On one hand, you have a lot of data without an interface in the way. On the other hand, you have a lot of data.

I found it was easiest to use GAWK to get through the data, and it was pretty easy:

gawk '/.*COUNTY_CODE.*/' *.csv >Filename.txt

Where COUNTY_CODE is the code from Pn-Labels-xx.txt where n is the part number (1,2, or 3) and xx is the state abbreviation.

NOTE: Look up the county code EACH TIME.  It changes among parts 1, 2, and 3.

This command will go through all .csv files and output any line with the county code to the new file.

UPDATE

I have multiple counties to deal with.  There's an easy way to start on getting a matrix:

gawk '/C4300US.*(21037|21015|21117).*32100.*/' *.csv >TotalFlowsNKY.csv

This results in a CSV table of only the total flows from three Northern Kentucky counties (21037, 21015, 21117; Campbell, Boone, and Kenton county, respectfully).  For simplicity's sake, I didn't include all 11 that I used.

Finishing Up

Then, I did a little Excel magic to build a matrix for all 11 counties and externals.  The formula is shown.  I have an additional sheet which is basically a cross reference of the county FIPS codes to the name abbreviations I'm using.  See the image below (click for a larger version).

After this, I built a matrix in Excel.  The matrix uses array summation (when you build this formula, you press CTRL+Enter to set it up right, else the returned value will be 0).

Using these techniques, I was able to get a journey to work matrix fairly quickly and without a lot of manual labor.

NOTE

You need to have GNUWin32 installed to use gawk.

 

 

 

Using gawk to Get PT Unassigned Trips Output into a Matrix

July 15th, 2011

In the process of quality-control checking a transit on-board survey, one task that has been routinely mentioned on things like TMIP webinars is to assign your transit trip-table from your transit on-board survey.  This serves two purposes - to check the survey and to check the transit network.

PT (and TranPlan's LOAD TRANSIT NETWORK, and probably TRNBUILD, too) will attempt to assign all trips.  Trips that are not assigned are output into the print file.  In PT (what this post will focus on), will output a line similar to this:


W(742): 1 Trips for I=211 to J=277, but no path for UserClass 1.

When a transit path is not found.  With a transit on-board survey, there may be a lot of these.  Therefore, less time spent writing code to parse them, the better.

To get this to a file that is easier to parse, start with your transit script, and add the following line near the top:


GLOBAL PAGEHEIGHT=32767

This removes the page headers. I had originally tried this with page headers in the print file, but it created problems. Really, you probably won't print this anyway, so removing the page headers is probably a Godsend to you!

Then, open a command line, and type the following:

gawk '/(W742.*)\./ {print $2,$5,$7}' TCPTR00A.PRN >UnassignedTransitTrips.PRN

Note that TCPTR00A.PRN is the transit assignment step print file, and UnassignedTransitTrips.PRN is the destination file. The {print $2,$5,$7} tells gawk to print the second, fifth, and seventh columns. Gawk figures out the columns itself based on spaces in the lines. The >UnassignedTransitTrips.PRN directs the output to that file, instead of listing it on the screen.

The UnassignedTransitTrips.PRN file should include something like:


1 I=3 J=285,
1 I=3 J=289,
1 I=3 J=292,
1 I=6 J=227,
1 I=7 J=1275,

The first column is the number of unassigned trips, the second column is the I zone, and the last column is the J zone.

This file can then be brought into two Matrix steps to move it to a matrix. The first step should include the following code:

RUN PGM=MATRIX PRNFILE="S:\USER\ROHNE\PROJECTS\TRANSIT OB SURVEY\TRAVELMODEL\MODEL\TCMAT00A.PRN"
FILEO RECO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF",
 FIELDS=IZ,JZ,V
FILEI RECI = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\UnassignedTransitTrips.PRN"

RO.V=RECI.NFIELD[1]
RO.IZ=SUBSTR(RECI.CFIELD[2],3,STRLEN(RECI.CFIELD[2])-2)
RO.JZ=SUBSTR(RECI.CFIELD[3],3,STRLEN(RECI.CFIELD[3])-2)
WRITE RECO=1

ENDRUN

This first step parses the I=, J=, and comma out of the file and inserts the I, J, and number of trips into a DBF file. This is naturally sorted by I then J because of the way PT works and because I am only using one user class in this case.

The second Matrix step is below:

RUN PGM=MATRIX
FILEO MATO[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.MAT" MO=1
FILEI MATI[1] = "S:\User\Rohne\Projects\Transit OB Survey\TravelModel\Model\Outputs\UnassignedAM.DBF" PATTERN=IJM:V FIELDS=IZ,JZ,0,V

PAR ZONES=2425

MW[1]=MI.1.1
ENDRUN

This step simply reads the DBF file and puts it into a matrix.

At this point, you can easily draw desire lines to show the unassigned survey trips. Hopefully it looks better than mine!