

Re: Alrighty then (RE: RE: No title being returned for version 1.3.2)

From: <john.leth(at)not-real.gulfaero.com>
Date: Mon May 31 1999 - 14:07:10 GMT
This may sound like a stupid reply if you have already checked this, but I was
experiencing a similar problem and found the solution in the list archives.
A compile-time option sets how many lines deep SWISH searches for the TITLE,
and it must be at least as deep as the line your <TITLE> tag is on.

in config.h:

#define TITLETOPLINES 30

/* This is how many lines deep SWISH will look into an HTML file to
** attempt to find a <TITLE> tag.
*/


The default is something like 7. Raising it may slow indexing down a little,
but 30 seems to be a good number.
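To make the option concrete, here is a minimal C sketch of a line-limited title search. It assumes the behavior described above: only the first TITLETOPLINES lines are scanned, and if no complete <TITLE>...</TITLE> pair turns up there, the file name is used instead. `find_title` and `find_ci` are hypothetical helpers for illustration, not SWISH's actual parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the config.h value suggested above (hypothetical here; a real
   build takes it from SWISH's own config.h). */
#define TITLETOPLINES 30

/* Case-insensitive search for needle within the first hlen bytes of h.
   Returns a pointer to the match, or NULL. */
static const char *find_ci(const char *h, size_t hlen, const char *needle)
{
    size_t n = strlen(needle);
    if (n > hlen)
        return NULL;
    for (size_t s = 0; s + n <= hlen; s++) {
        size_t i = 0;
        while (i < n && tolower((unsigned char)h[s + i]) ==
                            tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return h + s;
    }
    return NULL;
}

/* Look for <TITLE>...</TITLE> within the first max_lines lines of doc.
   On success, copy the title into out and return 1; otherwise copy
   fallback (e.g. the file name) into out and return 0. */
int find_title(const char *doc, int max_lines, const char *fallback,
               char *out, size_t outlen)
{
    assert(outlen > 0);

    /* End of the scan window: just past the max_lines-th newline. */
    size_t limit = 0;
    int lines = 0;
    while (doc[limit] != '\0' && lines < max_lines) {
        if (doc[limit] == '\n')
            lines++;
        limit++;
    }

    const char *open = find_ci(doc, limit, "<title>");
    if (open != NULL) {
        const char *start = open + strlen("<title>");
        const char *close =
            find_ci(start, limit - (size_t)(start - doc), "</title>");
        if (close != NULL) {
            size_t len = (size_t)(close - start);
            if (len >= outlen)
                len = outlen - 1; /* truncate overly long titles */
            memcpy(out, start, len);
            out[len] = '\0';
            return 1;
        }
    }
    snprintf(out, outlen, "%s", fallback);
    return 0;
}
```

Called as `find_title(page, TITLETOPLINES, "contact.html", buf, sizeof buf)`, a page whose <TITLE> sits below the scan window comes back with the file name as its title, which is the symptom discussed in this thread.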

----
John Leth-Nissen
Web Developer
Gulfstream Aerospace Corporation


David Norris wrote:

> OK, I think this makes some sense.
>
> If I index http://www.misma.org/contact.html using the spider the TITLE is
> set to "contact.html" in the swish index file.
> HTTP Headers:
>         HTTP/1.1 200 OK
>         Date: Mon, 31 May 1999 10:22:55 GMT
>         Server: Apache/1.2.5
>         X-Server-CGI: PHP/3.0.7
>         X-Resource-Indicator:
>         X-Resource-Modified: 923650015
>         Expires: Tue, 01 Jun 1999 10:22:55 GMT
>         Cache-Control: post-check=43200,pre-check=86400
>         Last-Modified: 1999-04-09T09:26:55Z
>         Connection: close
>         Content-Type: text/html; charset=iso-8859-1
>
> If I index http://localhost/test/contact.html using the spider the TITLE is
> set to "Contacts - MiSMA..."
> HTTP Headers:
>         HTTP/1.1 200 OK
>         Date: Mon, 31 May 1999 10:21:54 GMT
>         Server: Apache/1.3.6 (Win32)
>         Parser: PHP/3.0.6 (Win32)
>         Connection: close
>         Content-Type: text/html
>
> If I index /my_documents/test/contact.html using file system the TITLE is
> set to "Contacts - MiSMA..."
> No HTTP Header Equivalents.
>
> This is exactly the same file in all three cases.  Line feed is Unix LF in
> all three cases.  I sorta hacked my copy of the swishspider to force it to
> index text/html; charset=iso-8859-1.  That appears to be the only major
> difference which could have an effect on the parsing.  Something, somewhere
> doesn't recognize that it should be parsing that document with the HTML
> parser.  There is some other code somewhere that assumes anything not
> exactly text/html isn't HTML.  Forcing the spider to index the contents of
> text/html; charset=... isn't enough.
>
> So, to test this theory I changed my content-type header on the misma.org
> server.  Sure enough, the titles are now indexed correctly.  So, this
> appears to be the Content-Type 'feature' of that old PERL module.
>
> I don't know if this helps anyone else.  But, I can, at least, hack
> something to change my content-type header when swishspider visits a
> document until someone figures this out.
>
> ,David Norris
>
> World Wide Web - http://www.geocities.com/CapeCanaveral/Lab/1652/
> Home Computer - http://illusionary.tzo.cc/
> Page via mail - 412039@pager.mirabilis.com
> ICQ Universal Internet Number - 412039
> E-Mail - kg9ae@geocities.com
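David's diagnosis is that some code treats only an exact `text/html` as HTML, so a page served as `text/html; charset=iso-8859-1` never reaches the HTML parser. A minimal C sketch of the difference, assuming the check is a plain string comparison: `is_html_exact` mimics the suspected strict test, while `is_html_content_type` compares only the media type before the first `;`, ignoring case and surrounding whitespace (media type names are case-insensitive). Both functions are hypothetical; the real check lives in the Perl spider and module code and may work differently.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* The strict comparison suspected in the thread: only an exact
   "text/html" counts as HTML, so any charset parameter breaks it. */
int is_html_exact(const char *value)
{
    assert(value != NULL);
    return strcmp(value, "text/html") == 0;
}

/* Return 1 if a Content-Type value names the text/html media type,
   ignoring case, surrounding whitespace, and ";"-separated parameters
   such as charset=iso-8859-1. */
int is_html_content_type(const char *value)
{
    static const char want[] = "text/html";
    assert(value != NULL);

    const char *p = value;
    while (*p == ' ' || *p == '\t')      /* skip leading whitespace */
        p++;
    const char *end = p;
    while (*end != '\0' && *end != ';')  /* media type ends at first ';' */
        end++;
    while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
        end--;                           /* trim trailing whitespace */

    size_t len = (size_t)(end - p);
    if (len != sizeof want - 1)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)p[i]) != want[i])
            return 0;
    return 1;
}
```

With a parameter-tolerant check like this, both Apache servers in the quoted headers would be treated the same, and no server-side change to the Content-Type header would be needed.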
Received on Mon May 31 07:04:03 1999