# Re: Swish-E indexing

From: Chris Humphries <ChrisJMH(at)not-real.vermilion99.freeserve.co.uk>
Date: Fri Feb 18 2000 - 11:18:34 GMT
```
Ron, I tried this with the url mentioned below, and got these results:

.contents file had the contents of the html file (it was 6,426 bytes)

.links file was not created (the file does have links - could this be a
problem?)

.response file contained just the text "200" (another file - which indexed
correctly - returned "200
text/html, text/html; charset=iso-8859-1" in the response file. Could
*this* be a/the problem?)

Thanks for suggesting this - I did not know the spider could be used in
this way.

Chris Humphries

P.S.
Note for PC users - usage is:
swishspider var\tmp\data <url>
This could be confusing because at other times (for example, in the Swish-E
.config file and in Perl files) local paths are written using forward
slashes.

-----Original Message-----
From:	Ron Samuel Klatchko [SMTP:rsk@corpmail.brightmail.com]
Sent:	Thursday, February 17, 2000 9:45 PM
To:	Multiple recipients of list
Subject:	[SWISH-E] Re: Swish-E indexing

Chris Humphries wrote:
> I tried indexing this file
>
> IndexDir http://www.ifac.org/StandardsAndGuidance/FMAC/IMAP1.html
>
> using the HTTP method.
>
> It indexed just 2 words.
>
> The file did not look unusual, so out of curiosity I tried using "get()" in
> a Perl program, saving the string out as a .htm file, and then passing this
> file to Swish-E - which then indexed 718 words.
>
> Can anyone help explain to me why this should be so?

I'm not sure, but an interesting experiment to do is to use swishspider
to retrieve the URL.  You can use a command line like:

/path/to/swishspider /var/tmp/data http://www.ifac.org/StandardsAndGuidance/FMAC/IMAP1.html

Which would then generate files:

/var/tmp/data.contents