Hi,
I have been using just <!-- index --> and <!-- noindex -->, and it seems to be
working OK.
Is it necessary to put "Swishcommand noindex"?
Carlos
>From: Bill Moseley <moseley@hank.org>
>Reply-To: moseley@hank.org
>To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
>Subject: [SWISH-E] Re: Does the <!-- Swishcommand noindex --> work when
>spidering?
>Date: Thu, 12 Jun 2003 16:34:38 -0700 (PDT)
>
>On Thu, Jun 12, 2003 at 01:43:40PM -0700, Jody Cleveland wrote:
> > Hello,
> >
> > I've got a site I spider using swish-e. There are certain portions of
> > their pages they don't want spidered. For a site I've got local on that
> > machine, I just add a <!-- Swishcommand noindex --> before the chunk I
> > don't want indexed. Then I pick up again with <!-- Swishcommand index
> > -->. That doesn't seem to work when spidering. This person is putting
> > those tags before and after certain links in pages they don't want
> > spidered. Is there a different line I should have her put in there?
>
>When in doubt... test!
>
>moseley@bumby:~/apache$ GET http://localhost/apache/noindex.html
><html>
><head><title>noindex</title></head>
><body>
>indexthisword
><!-- Swishcommand noindex -->
>butnotthisword
><!-- Swishcommand index -->
>thisisok
></body>
></html>
>
>moseley@bumby:~/apache$ swish-e -S http -i http://localhost/apache/noindex.html -T indexed_words -v0
> Adding:[1:swishdefault(1)] 'noindex' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
> Adding:[1:swishdefault(1)] 'indexthisword' Pos:5 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'thisisok' Pos:6 Stuct:0x9 ( BODY FILE )
>
>
>moseley@bumby:~/apache$ /usr/local/lib/swish-e/spider.pl default http://localhost/apache/noindex.html | swish-e -S prog -i stdin -T indexed_words -v0
>/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
>
>Summary for: http://localhost/apache/noindex.html
>Total Bytes: 163 (163.0/sec)
> Total Docs: 1 (1.0/sec)
>Unique URLs: 1 (1.0/sec)
> Adding:[1:swishdefault(1)] 'noindex' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
> Adding:[1:swishdefault(1)] 'indexthisword' Pos:5 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'thisisok' Pos:6 Stuct:0x9 ( BODY FILE )
>
>
>Hmm -- I think the word position needs to be incremented. Otherwise
>you could get a phrase match across that comment...
>
>
>--
>Bill Moseley
>moseley@hank.org
>
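The word-position remark in the quoted message can be illustrated with a small model. This is only a sketch in Python, not swish-e's parser: the function names, the `bump` parameter, and the position numbering (which starts at 1 for the body text rather than reproducing the Pos values in the transcript) are all assumptions made for illustration. It skips the text between the two comments and shows that, without a gap, the words on either side end up at adjacent positions.

```python
import re

# Comment markers exactly as they appear in the test page quoted above.
NOINDEX = "<!-- Swishcommand noindex -->"
INDEX = "<!-- Swishcommand index -->"


def body_positions(body, bump=0):
    """Assign word positions to body text, skipping noindex regions.

    Illustrative model only. `bump` is a hypothetical extra gap added to
    the position counter when indexing resumes after a skipped region.
    """
    positions = {}
    pos = 0
    indexing = True
    # The capturing group keeps the comment markers in the split result.
    for chunk in re.split(r"(<!--.*?-->)", body):
        if chunk == NOINDEX:
            indexing = False
        elif chunk == INDEX:
            if not indexing:
                pos += bump
            indexing = True
        elif indexing:
            for word in chunk.split():
                pos += 1
                positions[word] = pos
    return positions


def is_phrase(positions, first, second):
    """True when `second` sits at the position right after `first`."""
    return positions[second] == positions[first] + 1


BODY = """indexthisword
<!-- Swishcommand noindex -->
butnotthisword
<!-- Swishcommand index -->
thisisok"""

no_gap = body_positions(BODY)
with_gap = body_positions(BODY, bump=1)

# Without a gap the two indexed words are adjacent, so a phrase search
# for "indexthisword thisisok" would match across the skipped text.
print(is_phrase(no_gap, "indexthisword", "thisisok"))
# With a one-position gap the phrase no longer matches.
print(is_phrase(with_gap, "indexthisword", "thisisok"))
```

In this model the skipped word never receives a position in either case; the only difference is whether the surrounding words look contiguous to a phrase search.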
Received on Fri Jun 13 02:11:54 2003