Re: [swish-e] swishtitle bug

From: Robert Nelson <rnelson(at)not-real.real.com>
Date: Tue Jul 20 2010 - 00:36:32 GMT
So I got off my lazy butt and actually debugged the problem. :-)

Turns out it was TITLETOPLINES set to 12 that was causing my problem. MHonArc puts a bunch of comments at the beginning of the file, so the title tag ended up below the first 12 lines. parser.c wasn't involved because I didn't have libxml2-devel installed. I installed it, rebuilt, and now everything works fine.
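
To illustrate the failure mode described above, here is a minimal Python sketch of a title search that only looks at the first N lines of a file. It is not swish-e's actual code: the function name `find_title`, the `top_lines` parameter, and the comment lines standing in for MHonArc's header are all assumptions made for the example.

```python
import re

# Matches a <title> element, case-insensitively, even if it spans lines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def find_title(html_text, top_lines=None):
    """Return the <title> text, or None if it is not found.

    If top_lines is given, only the first top_lines lines are searched
    (a hypothetical stand-in for a "title must be near the top" limit).
    """
    lines = html_text.splitlines()
    if top_lines is not None:
        lines = lines[:top_lines]
    match = TITLE_RE.search("\n".join(lines))
    return match.group(1).strip() if match else None


# Hypothetical page: 15 leading comment lines push <title> down to line 18.
comments = "\n".join(f"<!-- archive comment {i} -->" for i in range(15))
page = (
    comments
    + "\n<html>\n<head>\n<title>ima doc</title>\n</head>\n"
    + "<body>content here</body>\n</html>"
)

limited = find_title(page, top_lines=12)  # title lies beyond line 12
unlimited = find_title(page)              # whole document is searched

print(limited)
print(unlimited)
```

With the 12-line limit the search never reaches the title; without the limit it is found. Raising the limit or removing the leading comments would have the same effect in this sketch.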

-----Original Message-----
From: users-bounces@lists.swish-e.org [mailto:users-bounces@lists.swish-e.org] On Behalf Of Peter Karman
Sent: Monday, July 19, 2010 2:59 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] swishtitle bug

Robert Nelson wrote on 7/19/10 2:13 PM:

> 
> I finally tracked it down to the title tag appearing after the meta tags
> in the header.  If it appears after the meta tags it isn't found.  This
> is probably a bug in parser.c but I figured someone more familiar with
> it could find it faster than I can.
> 

hm. that seems suspicious to me:

[karpet@pekmac:~/tmp/swishtitle]$ cat doc.html
<html>
 <head>
  <meta name="foo" content="bar" />
  <title>ima doc</title>
 </head>
 <body>
  content here
 </body>
</html>

[karpet@pekmac:~/tmp/swishtitle]$ swish-e -w content
# SWISH format: 2.5.8
# Search words: content
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.008 seconds
1000 doc.html "ima doc" 126
.


I would be more inclined to guess that there were one or more *particular* meta
tags that were throwing off your indexing process.

Can you create a small, reproducible test case? That would help a lot.

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon Jul 19 20:36:38 2010