On Thu, Aug 26, 2004 at 04:07:04AM -0700, Nic Gibson wrote:
> I'm having an odd problem with swish-e 2.4.2. I have an index generated using
> spider.pl. Contrary to my expectations it appears to be indexing the href content
> of html anchors. I've attached the index configuration file to this message. The only
> odd thing I can think of about this particular website is that the URLs don't have
> file extensions (see http://pmr.corbas.co.uk/dynamic/). However, the content type
> is definitely correct.
You might set:
All I saw were some errors about HTML entities that couldn't be mapped.
Otherwise, can you show the text of the hrefs that is being indexed?
You will likely get better help if you can provide a working example.
I added a "/" to WordCharacters (along with a-z0-9) and used -T
indexed_words and didn't see anything that looked like a URL path.
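
For reference, a rough sketch of that test. The file name swish.conf is
only a placeholder, and the spider settings are assumptions, not the
configuration attached to the original message. WordCharacters is written
out in full because, as far as I know, the directive takes a literal list
of characters and not a range like a-z.

    # swish.conf (hypothetical test config, not the poster's)
    IndexDir spider.pl
    SwishProgParameters default http://pmr.corbas.co.uk/dynamic/
    # treat "/" as a word character so URL paths would show up whole
    WordCharacters abcdefghijklmnopqrstuvwxyz0123456789/

    # index through the spider and print each word as it is indexed
    swish-e -S prog -c swish.conf -T indexed_words

If href values were being indexed, words containing "/" should appear in
that debug output.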
Received on Thu Aug 26 08:17:15 2004