grin 1.1

I am pleased to announce the first release of grin. Colleagues and a few other interested parties have been pleasantly using the tool for a while now, but this is the first time it has been announced properly.

I wrote grin to help me search directories full of source code. The venerable
GNU grep and find are great tools, but they fall just a little short for my
normal use cases.

grin does exactly what I want 99% of the time with the least amount of thinking: recursive grep that skips crud I’m almost never interested in. For example, many of the projects I work on use Subversion for source control. Copies of the unmodified files are stored in the directory .svn/text-base/. When doing a typical recursive grep, say for a particular import I’m trying to remove from my code, grep will search these files too and find all kinds of false positives.
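As a rough illustration of the idea (not grin’s actual code), here is a minimal Python 3 sketch of a recursive grep that prunes version-control directories. The SKIP_DIRS set and the names walk_files and recursive_grep are hypothetical. The key trick is that os.walk will not descend into any directory removed from dirnames in place.

```python
import os
import re

# Directory names to prune: an assumed default list, not grin's real one.
SKIP_DIRS = {".svn", "CVS", "RCS", ".git", ".hg", ".bzr"}


def walk_files(root, skip_dirs=SKIP_DIRS):
    """Yield paths of files under root, never descending into skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to recurse into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)


def recursive_grep(regex, root="."):
    """Yield (path, line_number, line) for every line matching regex."""
    pattern = re.compile(regex)
    for path in walk_files(root):
        try:
            # errors="replace" keeps odd bytes from aborting the search.
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if pattern.search(line):
                        yield path, lineno, line.rstrip("\n")
        except OSError:
            continue  # unreadable file: skip it
```

For example, list(recursive_grep(r"import foo", ".")) would report hits in the working copy but nothing from .svn/text-base/.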

Now, one could construct a find command that would do the equivalent and store it away as a little one-liner shell script, but that’s boring. I’m a coder, and this is an excuse to code up something fun and useful. If I’m going to use a tool 50 times a day, I might as well make it do exactly what I want. This also gives me an opportunity to make a library module of a featureful grep-alike that can be repurposed to make any number of small, specific tools. But before we get to that, let’s take a look at some typical examples of using grin at the command line.

To recursively search the current directory for a regex:

$ grin some_regex

To search an explicit set of files:

$ grin some_regex file1.txt path/to/file2.txt

To search data piped to stdin:

$ cat somefile | grin some_regex -

To only search Python .py files:

$ grin -I "*.py" some_regex

To just show the names of the files that contain matches rather than the matches
themselves:

$ grin -l some_regex

Match highlighting was a feature added by my colleague Peter Wang. He took care to only use ANSI color escapes when they would be interpreted correctly. If you are piping the output to a file or a pager, you typically don’t want the ANSI escape sequences in there. But if you need to explicitly suppress the use of color highlighting:

$ grin --no-color some_regex

To force the use of color highlighting when piping the output to something that
is capable of understanding ANSI color escapes:

$ grin --force-color some_regex | less -R
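One way to express that decision in Python, as a sketch under assumed semantics: explicit flags win, --no-color beats --force-color if both are given, and otherwise color is used only when the output stream is a terminal. The names use_color and highlight are hypothetical, and the bold-red escape is an arbitrary choice.

```python
import sys


def use_color(stream=None, no_color=False, force_color=False):
    """Decide whether to emit ANSI color escapes on stream (default stdout)."""
    if stream is None:
        stream = sys.stdout
    if no_color:
        return False
    if force_color:
        return True
    # Pipes and files report isatty() == False, so they get plain text.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def highlight(line, start, end, color=True):
    """Wrap line[start:end] in a bold-red ANSI escape when color is on."""
    if not color:
        return line
    return line[:start] + "\x1b[1;31m" + line[start:end] + "\x1b[0m" + line[end:]
```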

To avoid recursing into directories named either CVS or RCS:

$ grin -d CVS,RCS some_regex
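A sketch of how the -I and -d values might be interpreted, assuming the -I value is a shell-style glob matched against each file’s base name and the -d value is a comma-separated list of directory names. The helpers parse_dir_list and wanted are hypothetical; both rely only on the standard library (note that fnmatch.fnmatch is case-insensitive on Windows).

```python
import fnmatch
import os


def parse_dir_list(arg):
    """Split a comma-separated value like "CVS,RCS" into a set of names."""
    return {name.strip() for name in arg.split(",") if name.strip()}


def wanted(path, include_glob=None):
    """True if the file's base name matches include_glob (e.g. "*.py").

    With no glob given, every file is wanted.
    """
    if include_glob is None:
        return True
    return fnmatch.fnmatch(os.path.basename(path), include_glob)
```

The resulting set is simply the collection of directory names to skip while recursing.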

By default grin skips a large number of files. To suppress all of this behavior
and search everything:

$ grin -sbSDE some_regex

To use another program to determine the list of files to search, you have two options. The most common is to pass the list on the command line using your shell’s backtick mechanism:

$ grin some_regex `find . -newer some_file.txt`

But this sometimes fails. My shell breaks up arguments on whitespace, so if you have directories with spaces in their names, grin will get busted paths. xargs helps, but I can never remember how to use it without reading the man page. Instead, grin can read the list of paths to search from a file, one path per line:

$ find . -newer some_file.txt | grin -f - some_regex

If you have embedded newlines in your directory names… you have my sympathy. Fortunately, you also have an option to use NUL characters as the separator. But mostly sympathy.

$ find . -newer some_file.txt -print0 | grin -0 -f - some_regex
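Reading such a list is simple once the separator is fixed. A minimal sketch, assuming the whole list fits in memory; read_paths is a hypothetical name, and for -f - the stream would be sys.stdin.

```python
def read_paths(stream, null_separated=False):
    """Read paths from a text stream.

    Newline mode: one path per line, so spaces inside paths survive.
    NUL mode: paths separated by "\\0", so even embedded newlines survive.
    """
    data = stream.read()
    sep = "\0" if null_separated else "\n"
    # Drop empty entries, e.g. the one after find's trailing separator.
    return [p for p in data.split(sep) if p]
```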

Now let’s talk about libraries. I tried to write grin with a clean design so that people could reuse pieces of it and shim in custom behavior in useful places. Want to search text files inside ZIP files? Just replace the generator that recurses through the paths with one that recurses into ZIP files, and provide a function that makes a file-like object for each ZIPped file. We can grep through texty-but-not-actual-plaintext files with the appropriate converters, too. For example, pyPdf is a nice library for reading and munging PDF files, and it can extract the plain text from many of them. If you write a function that does this and returns a StringIO with the extracted text, grin will search through the plain text of PDFs.
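To make the ZIP idea concrete, here is a hypothetical generator, written against Python 3’s zipfile module rather than grin’s own hook names, that yields a display name and a file-like object for every file inside a set of archives. Decoding each member into an io.StringIO mirrors the StringIO approach described for PDFs. It assumes the members are UTF-8 text.

```python
import io
import zipfile


def iter_zip_members(zip_paths, encoding="utf-8"):
    """Yield (display_name, file_like) for each regular member of each archive."""
    for zpath in zip_paths:
        with zipfile.ZipFile(zpath) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # directory entries have no content
                text = zf.read(info).decode(encoding, errors="replace")
                # "archive.zip!member.py" keeps both names visible in output.
                yield f"{zpath}!{info.filename}", io.StringIO(text)
```

Any line-oriented searcher that iterates over an opened text file can consume these objects unchanged.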

I am also considering more exotic uses. For example, let’s say that I want to find all instances of a particular import in my Python code. There are a number of variations on the syntax for importing a package:

import foo.bar
from foo import bar
from foo import baz, bar
from foo import \
    bar
from foo import (baz,
    bar)
# Plus comments that tell you to do "from foo import bar"

As an example included in the source tarball, I wrote up a little script that uses the compiler module to parse Python files, extract just the import statements, and normalize them so they are easily searchable. Let’s say I have the following Python file:

$ cat example.py
import foo
import foo.baz as blah
from foo import bar, baz as bat

def somefunction():
    "Do something to foo.baz"
    import from_inside.function

We can look at just the normalization by searching for “import”, since that will show up on every line:

$ grinimports.py import example.py
example.py:
    1 : import foo
    2 : import foo.baz as blah
    3 : from foo import bar
    4 : from foo import baz as bat
    5 : import from_inside.function

To look for just the imports of foo.baz, a fairly simple regex can find all of them in the normalized output:

$ grinimports.py "import foo\.baz|from foo import baz" example.py
example.py:
    2 : import foo.baz as blah
    4 : from foo import baz as bat

The script that implements this is small. Outside of the code that implements the normalization and the help text, it just recapitulates grin’s main() function with small modifications. There are more possibilities along these lines, of course. A grin that searches only docstrings and comments? One that reformats paragraphs onto a single line so you can search for phrases that happen to be broken by a newline? Or one that searches the wiki pages in Trac’s relational database? I’d like to hear what you think you can do with grin. Leave a comment here or send me email.
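The paragraph-unwrapping idea could start from something as small as this sketch, assuming paragraphs are separated by blank lines; unwrap_paragraphs is a hypothetical helper. The trade-off is that a match can no longer be reported against a single original line number.

```python
import re


def unwrap_paragraphs(text):
    """Join each paragraph onto one line so phrases split by a newline can match.

    Paragraphs are separated by blank lines; any run of whitespace inside a
    paragraph collapses to a single space.
    """
    paragraphs = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in paragraphs if p.strip()]
```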

grin can be downloaded from its PyPI page or checked out from Subversion:

$ svn co https://svn.enthought.com/svn/sandbox/grin/trunk grin

If you have setuptools, you may also easy_install it:

$ easy_install grin

The Enthought Python Distribution currently has version 1.0 of grin, which is only missing one or two features compared to 1.1.

17 thoughts on “grin 1.1”

  1. Sushant Srivastava

    Thanks for creating a wonderful tool.
    I can already think of customized search for different filetypes using this very useful backend.
    I use the Enthought version of Python instead of the standard one because of all its niceties (IPython).
    Very happy to find Grin as a welcome addition.

  2. Gregory Matous

    Grin is Great.

    I’ve had to write this sort of thing in both Perl and Python (and even Java).

    Reading man pages for find, grep, and xargs got old pretty fast.

    And don’t get me started on Windows find.

  3. Gregory Matous

    grep does the same thing.
    ack does the same thing.

    Windows search does the same thing. Just install Vista and you get search out-of-the-box!

    It’s open source, too. If you have a hex editor.

    But some of us wanted an extensible python solution.

  4. troelskn

    I’m sure it was fun to write, but a few lines’ worth of shell scripting to stitch find + grep together would give you the same thing. In fact, I already did that, so here you go, just in case:

    $ cat ~/bin/findgrep

    #!/bin/bash
    # findgrep: recursive grep restricted to files with one extension
    FILETYPE="*.php"
    MODE="match"
    GREPCOLOR="--color=auto"
    OPTS="Ei"

    print_usage()
    {
        {
            printf "Usage: %s: [OPTIONS] pattern\n" "$(basename "$0")"
            echo
            echo "-f EXT  Search only files named *.EXT (default: $FILETYPE)"
            echo "-l      Print only names of files containing matches"
            echo "-c      Highlight the matching strings (default)"
            echo "-C      Do NOT highlight matching strings"
            echo "-i      Match the pattern case-insensitively (default)"
            echo "-I      Match the pattern case-sensitively"
        } >&2
        exit 2
    }

    while getopts 'f:lCciI' OPTION
    do
        case $OPTION in
            f) FILETYPE="*.$OPTARG" ;;
            l) MODE="filename" ;;
            c) ;;                    # highlighting is already the default
            C) GREPCOLOR="" ;;
            i) ;;                    # case-insensitive is already the default
            I) OPTS="E" ;;
            \?) print_usage ;;       # getopts sets OPTION to "?" on a bad flag
        esac
    done

    shift $((OPTIND - 1))
    if [ $# -ne 1 ]
    then
        print_usage
    fi

    case $MODE in
        match)    OPTS="nH$OPTS" ;;  # file name and line number for each match
        filename) OPTS="lH$OPTS" ;;  # only the names of matching files
    esac

    # -print0 / xargs -0 keep paths containing spaces intact;
    # $GREPCOLOR stays unquoted so an empty value adds no argument.
    find . -type f -name "$FILETYPE" -print0 | xargs -0 grep -"$OPTS" $GREPCOLOR -- "$1"

  5. rkern (post author)

    On ack: yup, I used to use ack for this purpose. I got used to -C context lines, though, and at the time I wrote grin, at least, this feature was unimplemented. Being a Python programmer, I thought it would be more fun to write a new tool instead of hacking my way through Perl. The context handling code is not the most straightforward, and I shudder to think about what I would have to do in Perl to get it right.

    Uwe: typo in grin.py. I updated the version in setup.py but not in the file itself.

    William: I don’t see where grep skips the directories I want to skip. Variations on “--exclude=.svn” certainly don’t seem to accomplish it. Can you give me an example?

  6. raja

    Hi,
    Tried to use it after installing via “easy_install grin”.
    I am getting the following error,
    using Python 2.5.1.

    regards,
    raja
    ——————————————————

    Traceback (most recent call last):
      File "d:\python24\Scripts\grin-script.py", line 7, in ?
        sys.exit(
      File "d:\python24\lib\site-packages\setuptools-0.6c1-py2.4.egg\pkg_resources.py", line 236, in load_entry_point
        return get_distribution(dist).load_entry_point(group, name)
      File "d:\python24\lib\site-packages\setuptools-0.6c1-py2.4.egg\pkg_resources.py", line 2097, in load_entry_point
        return ep.load()
      File "d:\python24\lib\site-packages\setuptools-0.6c1-py2.4.egg\pkg_resources.py", line 1830, in load
        entry = __import__(self.module_name, globals(),globals(), ['__name__'])
      File "d:\python24\lib\site-packages\grin-1.1-py2.4.egg\grin.py", line 423
        finally:
        ^
    SyntaxError: invalid syntax

  7. William Stearns

    @rkern So let’s say you have a handy dandy project and you want to see in which Python files you’ve imported datetime. It’s several directories deep and it’s svn-controlled. You don’t want to match anything in the .svn directories.


    grep -R "import datetime" *.py --exclude=*.svn* --color

  8. rkern (post author)

    raja: I had some Python 2.5 syntax in there. This is fixed in SVN and will be pushed as 1.1.1 next week (I’m waiting for more bug reports). In the meantime, use easy_install -U "grin==dev" or grab it from SVN at the URL given above.

    William: How do I exclude more than one directory name? Like both “build” and “.svn”? As far as I can tell, the glob syntax used there does not allow {.svn,build}.

  9. Nick

    Thanks for writing grin, I really like it.

    About ack: Its defaults suck. It skips files whose types it doesn’t understand. I’m sure you can reconfigure this, but why worry about re-configuring it correctly on every machine I use, when I can just “easy_install grin” and have it work sanely by default?

    Also, a small feature request: grep’s -w option. I think you could basically just tack \b onto the beginning and end of the specified regex and it’d work like -w. It’s an option I find really useful.
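A minimal Python sketch of that suggestion; word_regex is a hypothetical name. Note the non-capturing group: tacking \b onto the ends of foo|bar without it yields \bfoo|bar\b, which anchors only one side of each alternative. This approximates grep -w rather than matching it exactly, since patterns that begin or end with non-word characters behave differently under \b.

```python
import re


def word_regex(pattern, flags=0):
    r"""Approximate grep -w by requiring word boundaries around the whole pattern.

    The (?:...) group keeps an alternation like "foo|bar" from binding
    each \b to only one alternative.
    """
    return re.compile(r"\b(?:" + pattern + r")\b", flags)
```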

