Re: Does the <!-- Swishcommand noindex --> work when spidering?

From: Carlos Rocha <caroch00(at)not-real.hotmail.com>
Date: Fri Jun 13 2003 - 02:08:00 GMT

Hi,
I have been using just <!-- index --> and <!-- noindex -->, and that seems to be
working OK.
Is it necessary to write "Swishcommand noindex"?

Carlos


>From: Bill Moseley <moseley@hank.org>
>Reply-To: moseley@hank.org
>To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
>Subject: [SWISH-E] Re: Does the <!-- Swishcommand noindex --> work when 
>spidering?
>Date: Thu, 12 Jun 2003 16:34:38 -0700 (PDT)
>
>On Thu, Jun 12, 2003 at 01:43:40PM -0700, Jody Cleveland wrote:
> > Hello,
> >
> > I've got a site I spider using swish-e. There are certain portions of
> > their pages they don't want spidered. For a site I've got local on that
> > machine, I just add a <!-- Swishcommand noindex --> before the chunk I
> > don't want indexed. Then I pick up again with <!-- Swishcommand index
> > -->. That doesn't seem to work when spidering. This person is putting
> > those tags before and after certain links in pages they don't want
> > spidered. Is there a different line I should have her put in there?
>
>When in doubt... test!
>
>moseley@bumby:~/apache$ GET http://localhost/apache/noindex.html
><html>
><head><title>noindex</title></head>
><body>
>indexthisword
><!-- Swishcommand noindex -->
>butnotthisword
><!-- Swishcommand index -->
>thisisok
></body>
></html>
>
>moseley@bumby:~/apache$ swish-e -S http -i http://localhost/apache/noindex.html -T indexed_words -v0
>     Adding:[1:swishdefault(1)]   'noindex'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
>     Adding:[1:swishdefault(1)]   'indexthisword'   Pos:5  Stuct:0x9 ( BODY FILE )
>     Adding:[1:swishdefault(1)]   'thisisok'   Pos:6  Stuct:0x9 ( BODY FILE )
>
>
>moseley@bumby:~/apache$ /usr/local/lib/swish-e/spider.pl default http://localhost/apache/noindex.html | swish-e -S prog -i stdin -T indexed_words -v0
>/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
>
>Summary for: http://localhost/apache/noindex.html
>Total Bytes: 163  (163.0/sec)
>  Total Docs:   1  (1.0/sec)
>Unique URLs:   1  (1.0/sec)
>     Adding:[1:swishdefault(1)]   'noindex'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
>     Adding:[1:swishdefault(1)]   'indexthisword'   Pos:5  Stuct:0x9 ( BODY FILE )
>     Adding:[1:swishdefault(1)]   'thisisok'   Pos:6  Stuct:0x9 ( BODY FILE )
>
>
>Hmm. I think the word position needs to be incremented across the skipped
>section. As it stands, 'indexthisword' (Pos:5) and 'thisisok' (Pos:6) are
>adjacent, so you could get a phrase match across that comment....
>
>
>--
>Bill Moseley
>moseley@hank.org
>
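Bill's point about word positions can be illustrated with a small sketch. This is not swish-e's parser and it does not reproduce swish-e's position numbering; it is a simplified Python model, written only to show why a position bump at a noindex comment prevents a phrase match from spanning the skipped text. The function names, the `bump` parameter, and the decision to accept the short `<!-- noindex -->` form alongside `<!-- Swishcommand noindex -->` are all assumptions made for the illustration (the short form is taken from Carlos's report above, not from the swish-e documentation).

```python
import re

# Matches both "<!-- Swishcommand noindex -->" and the short "<!-- noindex -->"
# (and the matching "index" forms). Accepting the short form is an assumption.
DIRECTIVE = re.compile(
    r"<!--\s*(?:swishcommand\s+)?(noindex|index)\s*-->", re.IGNORECASE
)
TAG = re.compile(r"<[^>]*>")   # crude tag stripper, good enough for a sketch
WORD = re.compile(r"\w+")


def _emit(chunk, position, words):
    """Append (word, position) pairs for a chunk of HTML; return next position."""
    text = TAG.sub(" ", chunk)
    for word in WORD.findall(text):
        words.append((word.lower(), position))
        position += 1
    return position


def indexed_words(html, bump=1):
    """Return (word, position) pairs, skipping noindex ... index sections.

    `bump` is a hypothetical gap added to the position counter whenever
    indexing is switched off, so words on either side are never adjacent.
    """
    words = []
    position = 1
    indexing = True
    cursor = 0
    for match in DIRECTIVE.finditer(html):
        if indexing:
            position = _emit(html[cursor:match.start()], position, words)
        turn_on = match.group(1).lower() == "index"
        if indexing and not turn_on:
            position += bump   # break phrase adjacency across the gap
        indexing = turn_on
        cursor = match.end()
    if indexing:
        _emit(html[cursor:], position, words)
    return words


if __name__ == "__main__":
    sample = """<html>
<head><title>noindex</title></head>
<body>
indexthisword
<!-- Swishcommand noindex -->
butnotthisword
<!-- Swishcommand index -->
thisisok
</body>
</html>"""
    for word, pos in indexed_words(sample):
        print(pos, word)
```

With `bump=0` the words on either side of the skipped section get consecutive positions, which is the situation Bill describes. With `bump=1` there is a gap between them, so a phrase search for "indexthisword thisisok" would no longer line up in this model.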

Received on Fri Jun 13 02:11:54 2003