
Re: Does the <!-- Swishcommand noindex --> work when spidering?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jun 12 2003 - 23:34:42 GMT
On Thu, Jun 12, 2003 at 01:43:40PM -0700, Jody Cleveland wrote:
> Hello,
> 
> I've got a site I spider using swish-e. There are certain portions of
> their pages they don't want spidered. For a site I've got local on that
> machine, I just add a <!-- Swishcommand noindex --> before the chunk I
> don't want indexed. Then I pick up again with <!-- Swishcommand index
> -->. That doesn't seem to work when spidering. This person is putting
> those tags before and after certain links in pages they don't want
> spidered. Is there a different line I should have her put in there?

When in doubt... test!

moseley@bumby:~/apache$ GET http://localhost/apache/noindex.html
<html>
<head><title>noindex</title></head>
<body>
indexthisword
<!-- Swishcommand noindex -->
butnotthisword
<!-- Swishcommand index -->
thisisok
</body>
</html>

moseley@bumby:~/apache$ swish-e -S http -i http://localhost/apache/noindex.html -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'noindex'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'indexthisword'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'thisisok'   Pos:6  Stuct:0x9 ( BODY FILE )


moseley@bumby:~/apache$ /usr/local/lib/swish-e/spider.pl default http://localhost/apache/noindex.html | swish-e -S prog -i stdin -T indexed_words -v0
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'

Summary for: http://localhost/apache/noindex.html
Total Bytes: 163  (163.0/sec)
 Total Docs:   1  (1.0/sec)
Unique URLs:   1  (1.0/sec)
    Adding:[1:swishdefault(1)]   'noindex'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'indexthisword'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'thisisok'   Pos:6  Stuct:0x9 ( BODY FILE )


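Hmm -- so the comments are honored either way: 'butnotthisword' is skipped
by both the http method and spider.pl.  But I think the word position needs
to be incremented at the comment.  As it stands 'indexthisword' (Pos:5) and
'thisisok' (Pos:6) are adjacent, so you could get a phrase match across
that comment....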
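
To make the phrase-match concern concrete, here is a rough Python sketch. It is not swish-e's parser: `index_words`, `phrase_matches` and the `bump` parameter are made-up names for illustration, and positions simply start at 1 rather than reproducing the Pos values in the output above. It assigns word positions while skipping noindex regions, then checks whether two words would look adjacent to a phrase search.

```python
import re

# Matches the two comment forms used in the test page above.
SWISH_COMMENT = re.compile(r"<!--\s*Swishcommand\s+(noindex|index)\s*-->", re.I)
TAG = re.compile(r"<[^>]+>")


def index_words(html, bump=0):
    """Return [(word, position)] for words outside noindex regions.

    `bump` is a hypothetical gap added to the position counter each time
    indexing resumes after a skipped region.
    """
    words, pos, indexing, cursor = [], 0, True, 0
    # A trailing None stands for "end of document".
    for m in list(SWISH_COMMENT.finditer(html)) + [None]:
        end = m.start() if m else len(html)
        if indexing:
            for word in TAG.sub(" ", html[cursor:end]).split():
                pos += 1
                words.append((word, pos))
        if m is None:
            break
        resume = m.group(1).lower() == "index"
        if resume and not indexing:
            pos += bump  # leave a hole so phrases cannot span the gap
        indexing = resume
        cursor = m.end()
    return words


def phrase_matches(words, first, second):
    """True if `second` sits at the position right after `first`."""
    entries = set(words)
    return any((second, p + 1) in entries for w, p in words if w == first)


PAGE = """<body>
indexthisword
<!-- Swishcommand noindex -->
butnotthisword
<!-- Swishcommand index -->
thisisok
</body>"""

if __name__ == "__main__":
    for bump in (0, 1):
        words = index_words(PAGE, bump=bump)
        print(bump, words, phrase_matches(words, "indexthisword", "thisisok"))
```

With `bump=0` the two surviving words get consecutive positions and the phrase check succeeds; with `bump=1` there is a hole at the comment and it fails, which is the behavior suggested above.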


-- 
Bill Moseley
moseley@hank.org
Received on Thu Jun 12 23:35:28 2003