
Not Saving Correct Title to the Index

From: Ken Schweigert <ken(at)not-real.byte-productions.com>
Date: Wed Aug 02 2006 - 20:19:27 GMT
I'm having trouble getting Swish-e to write the correct title to the  
index.  I use the "swishspider" to index the site because it is a  
dynamic site and uses mod_rewrite.

Here is some info about Swish-e and the output of a search:
-------------------------------------
[ken@www swish]$ uname -a
Linux my.web.server 2.4.9-e.68smp #1 SMP Thu Jan 19 18:14:54 EST 2006 i686 unknown
[ken@www swish]$ ./swish-e -V
SWISH-E 2.4.3
[ken@www swish]$ ./swish-e -f cedarhomes-cedar_homes.index -w oak
# SWISH format: 2.4.3
# Search words: oak
# Removed stopwords:
# Number of hits: 24
# Search time: 0.000 seconds
# Run time: 0.016 seconds
1000 http://www.cedarhomes.com/cedar_homes/featured_articles_1//?expand=44 "?expand=44" 11555
1000 http://www.cedarhomes.com/cedar_homes/featured_articles_1///?expand=44 "?expand=44" 11556
1000 http://www.cedarhomes.com/cedar_homes/featured_articles_1////?expand=44 "?expand=44" 11557
..
[ken@www swish]$ more cedarhomes-cedar_homes.swish.conf
IndexDir http://www.cedarhomes.com/[some hidden entry point just for swish-e]
IndexFile /[apache_root]/swish/cedarhomes-cedar_homes.index
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-
IgnoreFirstChar .-
IgnoreLastChar  .-
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789
MinWordLimit 2
DefaultContents HTML2
IndexContents TXT2 txt
StoreDescription HTML <body> 2000
StoreDescription HTML2 <body> 2000
IgnoreWords file: stopwords.txt
MetaNames search keywords description title
PropertyNames title search
MaxDepth 10
Delay 0
TmpDir .
SpiderDirectory /usr/local/lib/swish-e
-------------------------------------
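To see how many hits are affected, the search output could be piped through a small filter. This is only a sketch: it assumes the default result-line layout seen above (rank, URL, quoted title, size), and the script name `flag_titles.py` is hypothetical. It prints every hit whose stored title is nothing more than the text after the last "/" in its URL.

```python
import re
import sys

# Assumed layout of a default Swish-e result line, as in the output above:
#   rank URL "title" size
# Header lines start with "#" and do not match.
RESULT_LINE = re.compile(r'^(\d+) (\S+) "(.*)" (\d+)$')


def parse_hits(lines):
    """Return (rank, url, title, size) tuples for every result line."""
    hits = []
    for line in lines:
        m = RESULT_LINE.match(line.strip())
        if m:
            rank, url, title, size = m.groups()
            hits.append((int(rank), url, title, int(size)))
    return hits


def url_tail(url):
    """Text after the last '/' in the URL, e.g. '?expand=44'."""
    return url.rsplit("/", 1)[-1]


def suspicious(hits):
    """Hits whose title is just the trailing part of the URL."""
    return [h for h in hits if h[2] == url_tail(h[1])]


if __name__ == "__main__":
    for rank, url, title, size in suspicious(parse_hits(sys.stdin)):
        print(url)
```

Usage, assuming the script is saved as `flag_titles.py`: `./swish-e -f cedarhomes-cedar_homes.index -w oak | python3 flag_titles.py`.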

If you go to any of the links in the results, the page does have a
TITLE tag, but for some reason the trailing part of the URL is what
gets added as the title.
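The symptom looks as if the indexer found no title and fell back to the last segment of the URL. As a rough check under that assumption, the sketch below extracts the first `<title>` from a saved copy of a page with Python's standard `html.parser` and otherwise falls back to the URL tail. It is an illustration of the suspected fallback, not Swish-e's own parser (the HTML2 parser is libxml2-based), and the function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title = "".join(self._buf).strip()

    def handle_data(self, data):
        if self.in_title:
            self._buf.append(data)


def effective_title(html, url):
    """Title a parser would see, or the URL tail if none is found
    (the fallback is an assumption based on the search output)."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    if finder.title:
        return finder.title
    return url.rsplit("/", 1)[-1]
```

If this returns the real title for the page the spider actually fetched, the problem is more likely in how the content is handed to the parser than in the page itself.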

The really odd thing for me is that I have a very similar site, also
on the same server, and Swish-e indexes it without any problems.
I've spent hours looking at the config files and re-indexing, and
there must be something simple that I'm missing.

Can anyone give me a clue?
--
Ken Schweigert, Network Administrator
Byte Productions, LLC
Received on Wed Aug 2 13:19:31 2006