On Wed, Mar 10, 2004 at 02:38:31PM -0800, Kevin Lewandowski wrote:
> Hello, I'm testing Swish-e on three html files saved on my local disk.
> Using the following config:
>
> DefaultContents HTML2
> IndexDir /somedir/
> IndexFile /somedir/
> StoreDescription HTML2 <body> 20000
>
> Indexing and searching works okay but with the config above the
> swishdescription field is not stored. But if I change the document type
> to "HTML" (in lines 1 and 4), it now stores the swishdescription. But
> now I'm able to search against text which I've tried to prevent using the
> <!-- index --> and <!-- noindex --> tags (previously search would not
> find this when using the HTML2 type). Any ideas on what I'm doing wrong?
The index/noindex thing only works with the libxml2 parser (the HTML2
type), which is why the HTML type ignores those comments. I can't
explain why it's not storing the description, though. Post a complete
example and I can try it.
> Also, is it possible to store the swishdescription with the <!--
> noindex --> sections removed? Right now it stores the entire document
> text.
Like this?
moseley@bumby:~$ cat c
DefaultContents HTML2
StoreDescription HTML2 <body> 20000
moseley@bumby:~$ cat 1.html
<html>
<head><title>titleword</title></head>
<body>
top
<!-- noindex -->
dontindexthis
<!-- index -->
bottom
</body>
</html>
moseley@bumby:~$ swish-e -c c -i 1.html -T indexed_words properties -v0
Adding:[1:swishdefault(1)] 'titleword' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'top' Pos:5 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'bottom' Pos:6 Stuct:0x9 ( BODY FILE )
swishdocpath: 6 ( 6) S: "1.html"
swishtitle: 7 ( 9) S: "titleword"
swishdocsize: 8 ( 4) N: "126"
swishlastmodified: 9 ( 4) D: "2004-03-10 20:41:23 PST"
swishdescription:10 ( 10) S: "top bottom"
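To illustrate what the markers do to the text, here is a rough shell
sketch that drops everything from a noindex marker through the next
index marker. This is not how swish-e implements it; strip_noindex is a
made-up helper name, and it assumes each marker sits on its own line, as
in 1.html above.

```shell
#!/bin/sh
# Hypothetical helper, not part of swish-e: delete every line from a
# "<!-- noindex -->" marker through the next "<!-- index -->" marker.
# Assumption: each marker is on a line of its own (sed works line by line).
strip_noindex() {
    sed '/<!-- noindex -->/,/<!-- index -->/d' "$@"
}

# Example: print 1.html with the excluded section removed, if the file exists.
if [ -f 1.html ]; then
    strip_noindex 1.html
fi
```

Run against the 1.html shown above, this drops the dontindexthis line
and both marker lines, and keeps the top and bottom lines.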
--
Bill Moseley
moseley@hank.org
Received on Wed Mar 10 22:22:37 2004