Re:

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 23 2006 - 17:00:52 GMT
On Mon, Oct 23, 2006 at 08:42:59AM -0700, Jones, David H wrote:
> Can you be more explicit about how to tell the spider to index the
> image/svg+xml or .svg content type?

>  http://swish-e.org/docs/spider.html#configuration_options

Did you look at the examples under "test_response" in the docs?

> In the latest release of Swish-e I
> don't see any example of SwishSpiderConfig.pl or some similar config
> file.

It's in the source dir and it's also installed (under my install prefix
here):

bmoseley@willie:~/swish/swish_release_build/latest_swish_build$ find | grep SwishSpider
/swish-e-2.4.4/prog-bin/SwishSpiderConfig.pl
/install/share/doc/swish-e/examples/prog-bin/SwishSpiderConfig.pl



When you use your own spider config file (instead of using "default")
the spider will index every file it finds.  You probably don't want to
index images, for example, so you can either block them in "test_url" by
file name (file extension), or check the content-type of the response in
a "test_response" function.
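
As a rough sketch of how the two callbacks might be combined for the SVG
case, something like the fragment below could go in your own spider config
file.  The base_url and email values are placeholders I made up, and the
list of accepted content types is only an assumption about what you want
indexed; check the callback details against the spider docs for your
release.

```perl
# Hypothetical SwishSpiderConfig.pl fragment (a sketch, not the shipped file).
# base_url and email are placeholder values, not taken from this thread.
@servers = (
    {
        base_url => 'http://www.example.com/',
        email    => 'webmaster@example.com',

        # test_url runs before a URL is requested; the first argument is a
        # URI object.  Returning false skips the URL, so raster images are
        # dropped here by file extension without being fetched.
        test_url => sub {
            my ($uri) = @_;
            return $uri->path !~ /\.(?:gif|jpe?g|png)$/i;
        },

        # test_response runs once the server's response is available; the
        # third argument is an HTTP::Response object.  Returning true keeps
        # the document, so only these content types get indexed.
        test_response => sub {
            my ($uri, $server, $response) = @_;
            my $type = $response->content_type;
            return $type =~ m{^(?:text/html|text/plain|image/svg\+xml)$};
        },
    },
);

1;
```

Note that .svg is deliberately not in the test_url extension list, so those
URLs still get requested and are then accepted or rejected by content type.
Swish-e itself would also need to know how to parse that content, which this
sketch does not cover.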

The advantage of test_url, of course, is that the spider doesn't have to
actually request the document (although there's no guarantee that
foo.jpg is really image/jpeg and not text/plain until you fetch it).

Post your spider config file if you still can't get it to work -- but
there should be lots of examples in the docs to work from.


-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Mon Oct 23 10:00:55 2006