I am pleased to announce the first release of grin. Colleagues and a few other interested parties have been pleasantly using the tool for a while now, but this is the first time it has been announced properly.
grin does exactly what I want 99% of the time with the least amount of thinking: recursive grep that skips the crud I'm almost never interested in. For example, many of the projects I work on use Subversion for source control. Copies of the unmodified files are stored in the directory .svn/text-base/. When doing a typical recursive grep, say for a particular import I'm trying to remove from my code, grep will search these files too and find all kinds of false positives.
Now, one could construct a find command that would do the equivalent and store it away as a little one-liner shell script, but that's boring. I'm a coder, and this is an excuse to code up something fun and useful. If I'm going to use a tool 50 times a day, I might as well make it do exactly what I want. This also gives me an opportunity to make a library module of a featureful grep-alike that can be repurposed to make any number of small, specific tools. But before we get to that, let's take a look at some typical examples of using grin at the command line.
To recursively search the current directory for a regex:
$ grin some_regex
To search an explicit set of files:
$ grin some_regex file1.txt path/to/file2.txt
To search data piped to stdin:
$ cat somefile | grin some_regex -
To only search Python .py files:
$ grin -I "*.py" some_regex
To just show the names of the files that contain matches, rather than the matches themselves:
$ grin -l some_regex
Match highlighting was a feature added by my colleague Peter Wang. He took care to only use ANSI color escapes when they would be interpreted correctly. If you are piping the output to a file or a pager, you typically don’t want the ANSI escape sequences in there. But if you need to explicitly suppress the use of color highlighting:
$ grin --no-color some_regex
To force the use of color highlighting when piping the output to something that is capable of understanding ANSI color escapes:
$ grin --force-color some_regex | less -R
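The decision about when to colorize usually comes down to checking whether stdout is an interactive terminal. Here is a minimal sketch of that general technique; the function name and keyword arguments are illustrative only, not grin's actual code:

```python
import sys

def should_colorize(force=False, suppress=False):
    """Decide whether to emit ANSI color escapes.

    Color is appropriate only when writing to an interactive
    terminal; pipes and redirected files should get plain text
    unless the user explicitly forces color (as with --force-color).
    This is a sketch of the general idea, not grin's implementation.
    """
    if suppress:   # e.g. --no-color
        return False
    if force:      # e.g. --force-color, for `| less -R`
        return True
    # Pipes and files are not TTYs, so isatty() returns False there.
    return hasattr(sys.stdout, "isatty") and sys.stdout.isatty()
```

Note that suppression wins over forcing here, which keeps an explicit "no color" request authoritative.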
To avoid recursing into directories named either CVS or RCS:
$ grin -d CVS,RCS some_regex
By default grin skips a large number of files. To suppress all of this behavior and search everything:
$ grin -sbSDE some_regex
To use another program to determine the list of files to search, you have two options. The most common is to pass it on the command line using your shell's backtick mechanism:
$ grin some_regex `find . -newer some_file.txt`
But this sometimes fails. Shells typically break up arguments on whitespace, so if you have directories with spaces in their names, grin will get busted paths. xargs helps, but I can never remember how to use it without reading the man page. Instead, grin can read the list of paths to search from a file, one path per line.
$ find . -newer some_file.txt | grin -f - some_regex
If you have embedded newlines in your directory names…you have my sympathy. Fortunately, you also have an option to use embedded NULs as the separator. But mostly sympathy.
$ find . -newer some_file.txt -print0 | grin -0 -f - some_regex
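Reading such a list is simple to implement. Here is a sketch of the technique, with a hypothetical function name; grin's actual implementation may differ. One path per line is the default, while NUL separation lets paths containing newlines survive intact:

```python
import io

def read_path_list(fp, nul_separated=False):
    """Return the list of paths stored in a file-like object.

    By default, paths are one per line (as grin -f expects); with
    nul_separated=True they are NUL-delimited, as produced by
    `find -print0` (as grin -0 -f expects). Hypothetical helper,
    not grin's actual code.
    """
    sep = "\0" if nul_separated else "\n"
    # Drop empty entries caused by a trailing separator.
    return [p for p in fp.read().split(sep) if p]

# Paths with spaces are fine line-by-line...
print(read_path_list(io.StringIO("dir with spaces/a.txt\nb.txt\n")))
# ...and NUL separation even preserves embedded newlines.
print(read_path_list(io.StringIO("odd\nname.txt\0b.txt\0"),
                     nul_separated=True))
```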
Now let's talk about libraries. I tried to write grin with a clean design so that people could reuse pieces of it and shim in custom behavior in useful places. Want to search text files inside ZIP files? Just replace the generator that recurses through the paths with one that recurses into ZIP files, and provide a function that will make a file-like object for the ZIPped files. We can grep through texty-but-not-actual-plaintext files with the appropriate converters, too. For example, pyPdf is a nice library for reading and munging PDF files. It can extract the plain text from many PDF files. If you write a function that will do this and return a StringIO with the extracted text, grin will search through the plain text of PDFs.
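As a sketch of the ZIP idea, here is the kind of generator you might swap in, written with only the standard library. The name walk_zip and the (name, file-like object) pairing are my illustration of the technique, not grin's actual hook API:

```python
import io
import zipfile

def walk_zip(archive):
    """Yield (member_name, file-like object) pairs for each file
    inside a ZIP archive, mimicking the pairs a recursive
    directory walker would produce. Hypothetical example, not
    grin's actual interface.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Decode to text so a line-oriented grep can consume it.
            yield name, io.StringIO(zf.read(name).decode("utf-8"))

# Build a small archive in memory to demonstrate (made-up contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "from foo import bar\n")
    zf.writestr("b.txt", "nothing here\n")

for name, fp in walk_zip(buf):
    for lineno, line in enumerate(fp, 1):
        if "foo" in line:
            print("%s:%d:%s" % (name, lineno, line), end="")
```

A PDF converter would follow the same shape: a function that extracts the text and hands back a StringIO for grin to search.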
I am considering more exotic uses, too. For example, let's say that I want to find all instances of a particular import in my Python code. There are a number of variations on the syntax for importing a package:
import foo.bar
from foo import bar
from foo import baz, bar
from foo import \
    bar
from foo import (baz, bar)
# Plus comments that tell you to do "from foo import bar"
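The normalization idea can be sketched in a few lines. The script described below uses the compiler module; this hypothetical version uses the modern ast module instead, and the function name is mine, not the script's:

```python
import ast

def normalize_imports(source):
    """Yield one canonical 'import x' / 'from x import y' line per
    imported name, so a simple regex can match every syntactic
    variation. A sketch using the stdlib ast module; the script
    shipped with grin uses the older compiler module.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    yield "import %s as %s" % (alias.name, alias.asname)
                else:
                    yield "import %s" % alias.name
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.asname:
                    target = "%s as %s" % (alias.name, alias.asname)
                else:
                    target = alias.name
                yield "from %s import %s" % (node.module, target)

source = "import foo.bar\nfrom foo import baz, bar\n"
print("\n".join(normalize_imports(source)))
```

Because each imported name comes out on its own canonical line, the regex search afterward no longer has to worry about line continuations, parentheses, or comma-separated name lists.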
As an example included in the source tarball, I wrote up a little script that will use the compiler module to parse Python files, extract just the import statements, and normalize them so they are easily searchable. Let's say I have the following Python file:
$ cat example.py
import foo
import foo.baz as blah
from foo import bar, baz as bat

def somefunction():
    "Do something to foo.baz"
    import from_inside.function
We can look at just the normalization by searching for “import” since that will show up on every line:
$ grinimports.py import example.py
example.py:
    1 : import foo
    2 : import foo.baz as blah
    3 : from foo import bar
    4 : from foo import baz as bat
    5 : import from_inside.function
To look for just the imports of foo.baz, a fairly simple regex can find all of them in the normalized output:
$ grinimports.py "import foo\.baz|from foo import baz" example.py
example.py:
    2 : import foo.baz as blah
    4 : from foo import baz as bat
The script that implements this is small. Outside of the code that implements the normalization and the help text, it just recapitulates grin's main() function with small modifications. There are more possibilities along these lines, of course. A grin that just searches docstrings and comments? One that reformats paragraphs onto a single line so you can search for phrases that happen to be broken by a newline? Or one that searches through the wiki pages in Trac's relational database? I'd like to hear about what you think you can do with grin. Leave a comment here or send me an email.
grin can be downloaded from its PyPI page or checked out from Subversion:
$ svn co https://svn.enthought.com/svn/sandbox/grin/trunk grin
If you have setuptools, you may also easy_install it:
$ easy_install grin
The Enthought Python Distribution currently has version 1.0 of grin, which is only missing one or two features compared to 1.1.