Re: Excluding Files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Mar 25 2002 - 18:37:23 GMT
At 09:16 AM 03/25/02 -0800, Una Cullen wrote:
>
>Is there a way to exclude the swish-e parser from indexing a file with a
>specified piece of text residing in it?

Not directly, but as with most things in swish, there's a way. In fact,
there are a few options:

When using the HTML2 parser, swish can honor either robots.txt or the robots
<meta> tag exclusion. Swish can also skip HTML documents whose title matches
a given pattern.

If you want to ignore files based on some text, you could write a very
simple filter that uses grep and cat. Grep for the text, and if it's not
found, cat the file back to swish. Otherwise, return a title that tells
swish to skip the file.

In the config, do something like:

  FileFilterMatch ./f.sh %p /./
  FileRules title is skip

Then use something like:

> cat f.sh
#!/bin/sh

if grep -q foo "$1"
then
   echo "<title>skip</title>"
else
   cat "$1"
fi

Myself, I would use -S prog: read in each file, look for the text, and
pass on to swish only the files I want indexed. That avoids having swish
fork and run a shell script for every document, but it's more work.
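A -S prog version might look something like the minimal shell sketch
below. It assumes swish reads, for each document, a "Path-Name:" header, a
"Content-Length:" header, a blank line, and then exactly that many bytes of
body. The script name skip.sh, the function emit_docs, and taking the
directory as the first argument are all hypothetical. It runs grep once
over the whole tree with -r and -L (supported by GNU and BSD grep, though
not required by POSIX), so swish never forks a filter per document; the
loop still runs wc and cat for each file that is kept.

```shell
#!/bin/sh
# skip.sh (hypothetical name): a sketch of a swish -S prog that skips
# files containing "foo". Assumes swish's prog input format is
# Path-Name / Content-Length headers, a blank line, then the body.

# emit_docs PATTERN DIR: print every file under DIR that does NOT
# contain PATTERN, each wrapped in the headers described above.
emit_docs() {
    # -L lists files with no match; -r recurses into DIR.
    # A single grep run covers every file in the tree.
    grep -rL -- "$1" "$2" | while IFS= read -r file
    do
        # Body size in bytes; tr strips padding some wc versions add.
        size=$(wc -c < "$file" | tr -d ' ')
        printf 'Path-Name: %s\n' "$file"
        printf 'Content-Length: %s\n\n' "$size"
        cat "$file"
    done
}

# Only emit documents when a directory is given as the first argument.
if [ "$#" -gt 0 ]; then
    emit_docs foo "$1"
fi
```

You would then run something like "swish-e -c swish.conf -S prog -i
./skip.sh", but check how your swish version passes arguments to a prog
program before relying on the directory argument. Note that file names
containing newlines would confuse the read loop.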

Does that help?

-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Mar 25 18:41:35 2002