On Thu, Aug 26, 2004 at 04:07:04AM -0700, Nic Gibson wrote:
> I'm having an odd problem with swish-e 2.4.2. I have an index generated using
> spider.pl. Contrary to my expectations it appears to be indexing the href content
> of html anchors. I've attached the index configuration file to this message. The only
> odd thing I can think of about this particular website is that the URLs don't have
> file extensions (see http://pmr.corbas.co.uk/dynamic/). However, the content type
> is definitely correct.
To see what the parser is complaining about, you might set:
ParserWarnLevel 9
With that set, all I saw were some errors about HTML entities that
couldn't be mapped to ISO-8859-1.
Otherwise, can you show the text of the hrefs that is being indexed?
You will likely get better help if you can provide a working example.
I added a "/" to WordCharacters (along with a-z0-9) and used -T
indexed_words and didn't see anything that looked like a URL path.
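For reference, here is roughly what that test looked like. This is only a
sketch: the config file name (swish.conf) is made up, the rest of your
config is assumed unchanged, and "a-z0-9" above is shorthand for listing
the characters out.

    # added to the config for the test
    ParserWarnLevel 9
    WordCharacters abcdefghijklmnopqrstuvwxyz0123456789/

    # then index with the spider and dump the words as they are indexed
    swish-e -S prog -c swish.conf -T indexed_words

If href text were really ending up in the index, I'd expect path-like words
containing "/" to show up in that output.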
--
Bill Moseley
moseley@hank.org
Received on Thu Aug 26 08:17:15 2004