Re: Problem swish-e not finding words present in index

From: <moseley(at)not-real.hank.org>
Date: Wed Sep 03 2003 - 18:33:06 GMT
On Wed, Sep 03, 2003 at 08:11:45AM -0700, John P. Rouillard wrote:

> Hmm, 10 is swishtitle. Weird. I wonder why it's not showing up under
> swishdefault, since swishtitle and swishdefault should mirror
> each other, right?

Yes, title words get indexed under the swishdefault metaname with the
TITLE flag set (and also under swishtitle when that metaname is defined
in the config):

moseley@bumby:~$ cat 1.html
<html>
<head><title>Title</title>
</head>
<body>
body
</body>
</html>

moseley@bumby:~$ cat c
metanames swishtitle

moseley@bumby:~$ swish-e -i 1.html -c c -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishtitle(10)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'body'   Pos:5  Stuct:0x9 ( BODY FILE )
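
On a larger run, the same check can be scripted instead of read by eye. The following is a minimal Python sketch, not part of swish-e: it assumes the debug lines look exactly like the "Adding:" lines shown above, and the function names are made up for illustration. It reports words that were indexed under some other metaname but never under swishdefault.

```python
import re
from collections import defaultdict

# Pattern for lines such as:
#   Adding:[1:swishdefault(1)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
# "Stuct" is spelled the way it appears in the debug output above.
ADDING = re.compile(
    r"Adding:\[(?P<file>\d+):(?P<meta>\w+)\((?P<id>\d+)\)\]\s+"
    r"'(?P<word>[^']*)'\s+Pos:(?P<pos>\d+)\s+"
    r"Stuct:(?P<struct>0x[0-9a-fA-F]+)"
)


def words_by_metaname(debug_output):
    """Map each metaname to the set of words indexed under it."""
    found = defaultdict(set)
    for line in debug_output.splitlines():
        match = ADDING.search(line)
        if match:
            found[match.group("meta")].add(match.group("word"))
    return dict(found)


def missing_from_default(debug_output, default="swishdefault"):
    """Return words indexed under other metanames but never under `default`."""
    found = words_by_metaname(debug_output)
    in_default = found.get(default, set())
    missing = set()
    for meta, words in found.items():
        if meta != default:
            missing |= words - in_default
    return missing


sample = """\
    Adding:[1:swishdefault(1)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishtitle(10)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'body'   Pos:5  Stuct:0x9 ( BODY FILE )
"""
print(missing_from_default(sample))
```

For the three lines above the result is empty, since 'title' appears under both metanames. A non-empty result on the full index run would point at the words worth chasing.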


> What is weird is that I am seeing this on two other indexes as
> well. In one case it's indexed under metaname id 11, which is also
> swishtitle. This is weird. I am spidering for the other two indexes,
> and the hypermail program is producing valid HTML, but it's not being
> indexed under swishdefault.

The trick is to find the single document where it's not working and then 
index that by itself and narrow down the document and the config until 
you see where it's breaking.

> I have tried:
> 
> % /tools/swish_e-2.4.0_pr1/share/doc/swish-e/examples/prog-bin/\
>   index_hypermail.pl  /data/www/mailing-lists/admin/0016.html > test.html
> 
> % /tools/swish_e-2.4.0_pr1/bin/swish-e -i test.html -T indexed_words

But you are not using your config file there.  If you specify the config
file, does it still work?

> 
>   Indexing Data Source: "File-System"
>   Indexing "test.html"
>   ...
>     Adding:[1:swishdefault(1)]   'guest'   Pos:172  Stuct:0x9 ( BODY FILE )
>     Adding:[1:swishdefault(1)]   'guest'   Pos:206  Stuct:0x9 ( BODY FILE )
>     Adding:[1:swishdefault(1)]   'guest'   Pos:235  Stuct:0x9 ( BODY FILE )
>     Adding:[1:swishdefault(1)]   'guest'   Pos:245  Stuct:0x9 ( BODY FILE )

> Which shows that guest is swishdefault.
> 
>   % /tools/swish_e-2.4.0_pr1/bin/swish-e -w guest
>   # SWISH format: 2.4.0-pr1
>   # Search words: guest
>   # Removed stopwords: 
>   # Number of hits: 1
>   # Search time: 0.001 seconds
>   # Run time: 0.023 seconds
>   1000 test.html "TWiki security setup." 3085
> 
> So the simple test case works. Doing a guest search on the entire
> directory tree returns no hits. My config file is:

So if you index the document by itself it works but as part of the full 
indexing it doesn't?

I'd do the brute force method: write all of index_hypermail.pl's
output to a file, index that, and confirm that it doesn't work.  Then
keep dividing that large file in half until I find the reason why it's
not working.
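
The halving can be scripted. This is a rough Python sketch under stated assumptions, not something shipped with swish-e: it assumes the saved output is a stream of documents in the "Path-Name:" / "Content-Length:" header format used by prog-bin programs (headers, a blank line, then exactly Content-Length bytes of content), and that a single document is responsible. The names split_prog_stream, bisect_docs and still_fails are hypothetical.

```python
import io


def split_prog_stream(data):
    """Split saved output (bytes) into a list of per-document chunks."""
    stream = io.BytesIO(data)
    docs = []
    while True:
        start = stream.tell()
        line = stream.readline()
        # Skip any stray blank lines between documents.
        while line and not line.strip():
            start = stream.tell()
            line = stream.readline()
        if not line:
            break
        headers = {}
        # Header lines run until the first blank line.
        while line and line.strip():
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
            line = stream.readline()
        # The body is exactly Content-Length bytes long.
        stream.read(int(headers["content-length"]))
        docs.append({
            "path": headers.get("path-name"),
            "raw": data[start:stream.tell()],
        })
    return docs


def bisect_docs(docs, still_fails):
    """Halve `docs` until the smallest failing subset is found.

    `still_fails(subset)` is supplied by the caller: it should index the
    subset and return True if the problem (e.g. no hits) still shows up.
    """
    if not still_fails(docs):
        raise ValueError("the full set does not reproduce the problem")
    while len(docs) > 1:
        mid = len(docs) // 2
        first, second = docs[:mid], docs[mid:]
        if still_fails(first):
            docs = first
        elif still_fails(second):
            docs = second
        else:
            break  # the problem needs documents from both halves
    return docs
```

In practice still_fails would write b"".join(d["raw"] for d in subset) to a temporary file, index it with the real config file, run the search, and report whether the expected hit is missing. If the loop stops with more than one document left, the problem depends on a combination of documents (or on the config) rather than on a single file.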

-- 
Bill Moseley
moseley@hank.org
Received on Wed Sep 3 18:33:27 2003