Re: spider indexes only filenames

From: <jmruiz(at)not-real.boe.es>
Date: Mon Sep 25 2000 - 10:09:15 GMT
Hi Bryan,

Sorry, it was my fault!!

Just change line 251 of http.c:
 
From:

fgets(buffer, sizeof(buffer), fp);

To:

fgets(buffer, lenbuffer, fp);

Explanation:
I changed buffer from an array local to the function into a char pointer
that is allocated dynamically. Now sizeof(buffer) is always 4 (the size of
a pointer on a 32-bit system) instead of the real length of the buffer, so
fgets() read at most 3 characters per call. For this reason the Content-Type
was never read properly and the contents were not indexed.

Sorry for the inconvenience. Let me know if this fixes the problem.

cu
Jose

On 22 Sep 2000, at 16:30, Bryan Heidorn wrote:

> 
> I have the unhappy behavior in the spider that it indexes only the file
> names of the files it finds. So if I index a file
> http://somewhere.com/~someone/index.html I get the words "index" and
> "html" in the index and no other words that are in index.html. The spider
> does follow the links out of index.html to other files, and then it
> indexes those files' names and not their contents. I have tried several
> different start points on two different servers with the same behavior.
> It seems like a problem with the pipe between the spider and swish.
> 
Received on Mon Sep 25 10:09:43 2000