DOS Commands You Should Know: FINDSTR

The last time I talked about DOS, the command was FIND.  FIND is great for certain uses, but not for others... like when you need to search for a string across a lot of files in many subfolders.

In my case, I wanted to find everywhere I've used DELIMITER in a Cube script.  I tried Microsoft's example, and it doesn't work (and their comment box doesn't work with Chrome, so there's that, too).

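This is a two-step process.  The first step is easy, and it uses a very basic DOS command: dir.

dir *.s /a/b/s >filelist

This creates a list of the .s files to search in the current folder and its subfolders.  The a switch includes hidden and system files, the b switch prints bare names with no header or summary, and the s switch recurses into subfolders.  It is the s switch that makes each line in the list a full path; with b alone, you only get file names.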
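If you happen to have Python installed, the same kind of file list can be built outside of DOS.  This is only a rough sketch and not part of my process above: the function name list_script_files and the ".s" default are made up for illustration, and it assumes you want every matching file under the starting folder, with full paths.

```python
import os

def list_script_files(root, suffix=".s"):
    """Return the full path of every file under root whose name ends with suffix.

    Hypothetical helper for illustration only.
    """
    matches = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            # Windows file names are not case sensitive, so compare in lower case.
            if name.lower().endswith(suffix.lower()):
                matches.append(os.path.abspath(os.path.join(folder, name)))
    return sorted(matches)
```

Calling list_script_files(".") returns the paths as a Python list; writing them to a file one per line gives you something equivalent to filelist.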

The second command is actually three-in-one:

echo off & for /F "tokens=*" %A in (filelist) do findstr /i /m "DELIMITER" "%A"

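The first part of this is "echo off".  This turns off command echoing, so the prompt and the findstr command are not printed on every pass through the loop (else, you'll see every findstr command).  Type "echo on" afterwards to get your prompt back.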

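The second part is the for... do loop.  The /F switch reads filelist one line at a time, and "tokens=*" keeps the whole line together instead of splitting it at spaces.  This basically says "for each line in the file" and stores it (temporarily) as %A.  If you put this in a batch file instead of typing it at the prompt, use %%A instead of %A.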

The third part is the findstr command.  The i switch turns off case sensitivity, and the m switch prints ONLY the names of the files that contain a match (not the matching lines themselves).  I'm searching for DELIMITER (not case sensitive, of course).  The "%A" is the file to search, being passed along from the for...do loop.  This is in quotes because there are spaces in some of my path names.  Without the quotes, the command would fail at the first space, because findstr would treat the space as the end of the file name.
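For comparison, here is a rough Python sketch of what the loop plus findstr with the i and m switches is doing.  It is not the DOS method itself, just the same logic spelled out: read the list of files, look for the text without regard to case, and report each matching file once.  The names read_file_list and files_containing are made up for this example.

```python
def read_file_list(list_path):
    """Read one path per line from a list file, skipping blank lines."""
    with open(list_path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]

def files_containing(paths, needle):
    """Return the paths whose contents include needle, ignoring case."""
    needle = needle.lower()
    hits = []
    for path in paths:
        # errors="replace" keeps odd bytes from stopping the search.
        with open(path, "r", errors="replace") as handle:
            if any(needle in line.lower() for line in handle):
                hits.append(path)  # report the file once, not every matching line
    return hits
```

Something like files_containing(read_file_list("filelist"), "DELIMITER") would then give the list of matching script files.  Paths with spaces need no special handling here because each line of the list is passed along whole.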

This is useful if you're like me and have 1,563,169 lines of script file in your model folder!

BONUS TIP!

I found the number of lines using gawk wrapped in the same process:

echo off & for /F "tokens=*" %A in (filelist) do gawk 'END{print NR}' "%A" >> filelen

This gave me a long list of numbers that I brought into Excel to get the sum.

In the gawk command, 'END{print NR}' means to print the number of records (by default, lines) after gawk has read through the whole file.  "%A" is the file to check (just like in the findstr command).  The >>filelen APPENDS the output to a file called filelen.  It is important to use the append here because the command runs once per file in the loop.  If a single > is used, filelen is overwritten on every pass, so only the line count of the last file is left in it.
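If you'd rather skip the trip through Excel, the counting and the summing can be done in one go.  The following is a small Python sketch under the assumption that Python is available; count_lines and total_lines are names I made up, and this is an alternative to the gawk approach, not a description of it.

```python
def count_lines(path):
    """Count the lines in one file, reading bytes so the encoding does not matter."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)

def total_lines(paths):
    """Add up the line counts of every file in paths."""
    return sum(count_lines(path) for path in paths)
```

Pass total_lines the same paths that are listed in filelist and it returns a single number.  Like gawk's NR, a last line with no trailing newline still counts as a line.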

Visitor Comments

  1. dave o'brien Says:

    If you're going to install gawk, you might as well install Cygwin, and then you can do proper unix scripting. Then, searching for specific text in all your script files becomes:

    grep -ir DELIMITER /my/script/folder/*

    (-r is a recursive search, -i is a case insensitive search.)

    This will give a bunch of results like:


    planning_data/PREPAR00.S: delimiter=',', zone=1, rurallife=2, resident=3, transport=4, education=5,
    planning_data/PREPAR00.S: delimiter=',', z=1, households=2, dwellings=3, white=4, blue=5, presch=6, prisch=7, secsch=8, tertedu=9,
    planning_data/PREPAR00.S: delimiter=',', zone=1, football=2, cricket=3, golf=4, mixed=5, racing=6, motors=7,
    planning_data/parse_zonal_data.s: delimiter=',', z=1, rurallife=2, resident=3, transport=4, education=5,

    Another (non-free) option is to buy ZtreeWin (http://www.ztree.com/html/ztreewin.htm), which will let you search for strings in files, and tag them for you. If you ever used Xtree back in the day, this is the same thing.

  2. Krishnan Says:

    Agree that grep is an awesome utility. Does a lot more than just find strings. Here is an example to delete blank lines from a dataset (Assuming you are running CMD) :

    grep -v "^$" inputfile > outputfile

    AWK is a gift for reading large datasets easily, and I just love it. Here is another way to use AWK to find the # of lines, in this case the data rows of a CSV, not counting the header line:

    gawk -F, 'NR>1 {rows++}END{print "numrows=", rows}' input.csv > output.csv