I am pleased to announce the first release of grin. Colleagues and a few other interested parties have been pleasantly using the tool for a while now, but this is the first time it has been announced properly.
I wrote grin to help me search directories full of source code. The venerable
GNU grep and find are great tools, but they fall just a little short for my
normal use cases.
grin does exactly what I want 99% of the time with the least amount of thinking: recursive grep that skips crud I’m almost never interested in. For example, many of the projects I work on use Subversion as source control. Copies of the unmodified files are stored in the directory .svn/text-base/. When doing a typical recursive grep, say for a particular import I’m trying to remove from my code, grep will search these files too and find all kinds of false positives.
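The core idea, a recursive grep that never descends into directories like .svn, can be sketched in a few lines of Python. This is a minimal illustration and not grin's actual implementation. The names iter_files, recursive_grep and SKIP_DIRS are hypothetical, and the skip list is a small assumed sample rather than grin's real defaults.

```python
import os
import re

# Hypothetical skip list for illustration; grin's real defaults cover more cases.
SKIP_DIRS = {".svn", "CVS", "RCS", ".git", ".hg"}


def iter_files(root, skip_dirs=SKIP_DIRS):
    """Yield every file path under root without entering skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into the removed entries.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def recursive_grep(regex, root, skip_dirs=SKIP_DIRS):
    """Yield (path, line_number, line) for each line that matches regex."""
    pattern = re.compile(regex)
    for path in iter_files(root, skip_dirs):
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if pattern.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Pruning the directory list during the walk is what keeps the pristine copies under .svn/text-base/ from ever being opened, so they cannot produce false positives.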
Now, one could construct a find command that would do the equivalent and store it away as a little one-liner shell script, but that’s boring. I’m a coder, and this is an excuse to code up something fun and useful. If I’m going to use a tool 50 times a day, I might as well make it do exactly what I want. This also gives me an opportunity to make a library module of a featureful grepalike that can be repurposed to make any number of small, specific tools. But before we get to that, let’s take a look at some typical examples of using grin at the command line.
To recursively search the current directory for a regex:
$ grin some_regex
To search an explicit set of files:
$ grin some_regex file1.txt path/to/file2.txt
To search data piped to stdin:
$ cat somefile | grin some_regex -
To only search Python .py files:
$ grin -I "*.py" some_regex
To just show the names of the files that contain matches rather than the matches
themselves:
$ grin -l some_regex
Match highlighting was a feature added by my colleague Peter Wang. He took care to only use ANSI color escapes when they would be interpreted correctly. If you are piping the output to a file or a pager, you typically don’t want the ANSI escape sequences in there. But if you need to explicitly suppress the use of color highlighting:
$ grin --no-color some_regex
To force the use of color highlighting when piping the output to something that
is capable of understanding ANSI color escapes:
$ grin --force-color some_regex | less -R
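The usual way to decide whether ANSI escapes are safe is to ask whether the output stream is a terminal. The sketch below shows that logic under stated assumptions: use_color and highlight are hypothetical helpers (not grin's API), the bold-red escape is an arbitrary choice, and letting --no-color win over --force-color when both are given is a design choice made here, not documented grin behavior.

```python
import re

# Standard ANSI SGR sequences: bold red on, then reset all attributes.
HIGHLIGHT_ON = "\x1b[1;31m"
HIGHLIGHT_OFF = "\x1b[0m"


def use_color(stream, force_color=False, no_color=False):
    """Decide whether to emit ANSI escapes for this output stream."""
    if no_color:
        return False
    if force_color:
        return True
    # Only a real terminal will interpret the escapes; files and pipes get plain text.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def highlight(line, pattern, color):
    """Wrap every match of the compiled pattern in ANSI escapes when color is enabled."""
    if not color:
        return line
    return pattern.sub(lambda m: HIGHLIGHT_ON + m.group(0) + HIGHLIGHT_OFF, line)
```

Piping to less -R makes isatty() return False, which is exactly why an explicit force option is needed in that case.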
To avoid recursing into directories named either CVS or RCS:
$ grin -d CVS,RCS some_regex
By default grin skips a large number of files. To suppress all of this behavior
and search everything:
$ grin -sbSDE some_regex
To use another program to determine the list of files to search, you have two options. The most common is to pass the list on the command line using your shell’s backtick mechanism:
$ grin some_regex `find . -newer some_file.txt`
But this fails, sometimes. My shells typically break up arguments on whitespace, so if you have directories with spaces in them, grin will get busted paths. xargs helps, but I can never remember how to use it without reading the man page. Instead, grin can read the list of paths to search from a file, one path per line.
$ find . -newer some_file.txt | grin -f - some_regex
If you have embedded newlines in your directory names…you have my sympathy. Fortunately, you also have an option to use embedded NULs as the separator. But mostly sympathy.
$ find . -newer some_file.txt -print0 | grin -0 -f - some_regex
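Reading the path list is just a matter of splitting on the right separator. A minimal sketch, where read_paths is a hypothetical helper rather than grin's own code:

```python
def read_paths(stream, null_separated=False):
    """Split a stream of file names into paths, one per line or one per NUL."""
    data = stream.read()
    sep = "\0" if null_separated else "\n"
    # Drop empty entries, such as the one after the trailing separator find emits.
    return [p for p in data.split(sep) if p]
```

Because nothing is split on spaces, a path like "my docs/notes.txt" survives intact, and with NUL separators even a name containing a newline comes through as a single path.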
Now let’s talk about libraries. I tried to write grin with a clean design so that people could reuse pieces of it and shim in custom behavior in useful places. Want to search text files inside ZIP files? Just replace the generator that recurses through the paths with one that recurses into ZIP files, and provide a function that will make a file-like object for the zipped files. We can grep through texty-but-not-actual-plaintext files with the appropriate converters, too. For example, pyPdf is a nice library for reading and munging PDF files. It can extract the plain text from many PDF files. If you write a function that does this and returns a StringIO with the extracted text, grin will search through the plain text of PDFs.
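To make the ZIP idea concrete, here is a standalone sketch of the two pieces described above: a generator that walks into an archive, and a file-like text object for each member. It uses only the standard zipfile and io modules. The names iter_zip_members and grep_zip are hypothetical, and the sketch does not plug into grin's real hook points, whose names and signatures may differ.

```python
import io
import re
import zipfile


def iter_zip_members(zip_path):
    """Yield (display_name, text file-like object) for each file inside a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            raw = archive.open(info)  # binary file-like object for this member
            # Wrap it so callers can iterate over decoded lines like a normal text file.
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            yield f"{zip_path}:{info.filename}", text


def grep_zip(regex, zip_path):
    """Yield (member_name, line_number, line) for matching lines in every member."""
    pattern = re.compile(regex)
    for name, fileobj in iter_zip_members(zip_path):
        with fileobj:
            for lineno, line in enumerate(fileobj, 1):
                if pattern.search(line):
                    yield name, lineno, line.rstrip("\n")
```

Each member is fully consumed and closed before the generator advances, so the archive itself stays open exactly as long as it is needed.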
I am also considering more exotic uses. For example, let’s say that I want to find all instances of a particular import in my Python code. There are a number of variations on the syntax for importing a package.
import foo.bar
from foo import bar
from foo import baz, bar
from foo import \
    bar
from foo import (baz,
                 bar)
# Plus comments that tell you to do "from foo import bar"
As an example included in the source tarball, I wrote up a little script that will use the compiler module to parse Python files, extract just the import statements, and normalize them so they are easily searchable. Let’s say I have the following Python file:
$ cat example.py
import foo
import foo.baz as blah
from foo import bar, baz as bat
def somefunction():
    "Do something to foo.baz"
    import from_inside.function
We can look at just the normalization by searching for “import” since that will show up on every line:
$ grinimports.py import example.py
example.py:
1 : import foo
2 : import foo.baz as blah
3 : from foo import bar
4 : from foo import baz as bat
5 : import from_inside.function
To look for just the imports of foo.baz, a fairly simple regex can find all of them in the normalized output:
$ grinimports.py "import foo\.baz|from foo import baz" example.py
example.py:
2 : import foo.baz as blah
4 : from foo import baz as bat
The script that implements this is small. Outside of the code that implements the normalization and help text, it just recapitulates grin’s main() function with small modifications. There are more possibilities along these lines, of course. A grin that just searches docstrings and comments? Or reformats paragraphs to be on a single line so you can search for phrases that happen to be broken by a newline? Or search through the wiki pages in Trac’s relational database? I’d like to hear about what you think you can do with grin. Leave a comment here or send me email.
grin can be downloaded from its PyPI page or checked out from Subversion:
$ svn co https://svn.enthought.com/svn/sandbox/grin/trunk grin
If you have setuptools, you may also easy_install it:
$ easy_install grin
The Enthought Python Distribution currently has version 1.0 of grin, which is only missing one or two features compared to 1.1.
So, er, you are familiar with , aren’t you? grin sounds quite a bit like it, although I have no idea how easy it is to extend ack to deal with zips, pdfs etc.
Apart from the installation methods listed on the homepage, it’s also available in MacPorts and Ubuntu, at least.
Do you know about ack? Other than being written in Perl it seems to do the same job as grin.
I installed from grin-1.1.tar.gz.
grin --version still displays “grin 1.0”.
Thanks for creating a wonderful tool.
I can already think of customized search for different filetypes using this very useful backend.
I use the Enthought distribution of Python instead of the standard one because of all its niceties (IPython).
Very happy to find Grin as a welcome addition.
I’m fairly certain that grep does everything above on its own. Look at options -e -n --color -P (if you have it compiled in) -R. grep is the king of all tools.
You just built ack(1). http://search.cpan.org/~petdance/ack/ack
What does this achieve that Andy Lester’s ack program doesn’t?
very nice!
I use ack, sounds like grin is pretty similar:
http://petdance.com/ack/
Grin is Great.
I’ve had to write this sort of thing in both perl and python (and even java).
Reading man pages for find, grep, and xargs got old pretty fast.
and don’t get me started on windows find.
grep does the same thing.
ack does the same thing.
Windows search does the same thing. Just install Vista and you get search out-of-the-box!
It’s open source, too. If you have a hex editor.
But some of us wanted an extensible python solution.
I’m sure it was fun to write, but a few lines worth of shell scripting, to stitch find + grep together, would give you the same thing. In fact, I already did that, so here you go, just in case:
$ cat ~/bin/findgrep
#!/bin/bash
FILETYPE="*.php"
MODE="match"
GREPCOLOR="--color=auto"
OPTS="Ei"

print_usage()
{
    printf "Usage: %s: [OPTIONS] pattern\n" "$(basename "$0")" >&2
    echo
    echo "-f EXT  Search in files named *.EXT (default: $FILETYPE)"
    echo "-l      print only names of files containing matches"
    echo "-c      use markers to highlight the matching strings (default)"
    echo "-C      do NOT highlight matching strings"
    echo "-i      run grep pattern case-insensitive (default)"
    echo "-I      run grep pattern case-sensitive"
    exit 2
}

while getopts 'f:lCciI' OPTION
do
    case $OPTION in
        f) FILETYPE="*.$OPTARG"
           ;;
        l) MODE="filename"
           ;;
        c) # color is already the default
           ;;
        C) GREPCOLOR=""
           ;;
        i) # case-insensitive is already the default
           ;;
        I) OPTS="E"
           ;;
        ?) print_usage
           ;;
    esac
done
shift $(($OPTIND - 1))

if [ $# -ne 1 ]
then
    print_usage
fi

case $MODE in
    match)
        OPTS="nH$OPTS"
        ;;
    filename)
        OPTS="lH$OPTS"
        ;;
    *)
        print_usage
        ;;
esac

# $GREPCOLOR is left unquoted on purpose so that an empty value disappears.
find . -type f -name "$FILETYPE" -print0 | xargs -0 grep -$OPTS $GREPCOLOR "$1"
On ack: yup, I used to use ack for this purpose. I got used to -C context lines, though, and at the time I wrote grin, at least, this feature was unimplemented. Being a Python programmer, I thought it would be more fun to write a new tool instead of hacking my way through Perl. The context handling code is not the most straightforward, and I shudder to think about what I would have to do in Perl to get it right.
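For the curious, the heart of -C style context handling can be sketched with a bounded deque: keep the last N unprinted lines, flush them when a match arrives, then count down N trailing lines. This is a minimal illustration, not grin’s actual context code. grep_with_context is a hypothetical name, and it omits the "--" separators that grep prints between non-adjacent groups.

```python
from collections import deque


def grep_with_context(lines, pattern, context=2):
    """Yield (line_number, line, is_match) for each match plus up to `context`
    lines before and after it, never yielding the same line twice."""
    before = deque(maxlen=context)  # most recent lines not yet yielded
    after_remaining = 0             # trailing context lines still owed
    for lineno, line in enumerate(lines, 1):
        if pattern.search(line):
            # Flush the buffered leading context, then the match itself.
            while before:
                yield before.popleft() + (False,)
            yield (lineno, line, True)
            after_remaining = context
        elif after_remaining > 0:
            yield (lineno, line, False)
            after_remaining -= 1
        else:
            before.append((lineno, line))
```

Lines already yielded as trailing context never enter the buffer, which is what keeps overlapping context windows from printing duplicates.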
Uwe: typo in grin.py. I updated the version in setup.py but not in the file itself.
William: I don’t see where grep skips the directories I want to skip. Variations on “--exclude=.svn” certainly don’t seem to accomplish it. Can you give me an example?
Hi,
I tried to use grin after installing it via “easy_install grin”.
I am getting the following error.
I am using python 2.5.1.
regards,
raja
——————————————————
Traceback (most recent call last):
  File "d:\python24\Scripts\grin-script.py", line 7, in ?
    sys.exit(
  File "d:\python24\lib\site-packages\setuptools-0.6c1-py2.4.egg\pkg_resources.py", line 236, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "d:\python24\lib\site-packages\setuptools-0.6c1-py2.4.egg\pkg_resources.py", line 2097, in load_entry_point
    return ep.load()
  File "d:\python24\lib\site-packages\setuptools-0.6c1-py2.4.egg\pkg_resources.py", line 1830, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "d:\python24\lib\site-packages\grin-1.1-py2.4.egg\grin.py", line 423
    finally:
    ^
SyntaxError: invalid syntax
@rkern So let’s say you have a handy dandy project and you want to see in which python files you’ve imported datetime. It is several directories deep and it’s svn controlled. You don’t want to match anything in the .svn directories.
grep -R "import datetime" *.py --exclude=*.svn* --color
raja: I had some Python 2.5 syntax in there. This is fixed in SVN and will be pushed as 1.1.1 next week (I’m waiting for more bug reports). In the meantime, run easy_install -U "grin==dev" or grab it from SVN at the URL given above.
William: How do I exclude more than one directory name? Like both “build” and “.svn”? As far as I can tell, the glob syntax used there does not allow {.svn,build}.
Thanks for writing grin, I really like it.
About ack: Its defaults suck. It skips files whose types it doesn’t understand. I’m sure you can reconfigure this, but why worry about re-configuring it correctly on every machine I use, when I can just “easy_install grin” and have it work sanely by default?
Also, a small feature request: grep’s -w option. I think you could basically just tack \b onto the beginning and end of the specified regex and it’d work like -w. It’s an option I find really useful.
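One detail matters when wrapping a user’s regex in \b: without a group, r"\bfoo|bar\b" would anchor only the start of "foo" and the end of "bar". A minimal sketch with a non-capturing group, where word_regex is a hypothetical helper and the result only approximates grep -w:

```python
import re


def word_regex(pattern, flags=0):
    """Compile pattern so it only matches whole words, roughly like grep -w.

    The non-capturing group makes both \\b anchors apply to every alternative.
    """
    return re.compile(r"\b(?:" + pattern + r")\b", flags)
```

With this, word_regex("foo|bar") matches "a foo b" and "bar." but not "foobar" or "xbar".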