Re: diff'ing indexes

From: Peter Karman <karman(at)not-real.cray.com>
Date: Thu Oct 14 2004 - 17:48:14 GMT
hmm. very close. the only difference I can see (besides the fact that 
yours worked...damn ;) ) is that the content of the <body> tags is 
different in yours. mine were identical in <body> except that file2 had 
'some more content' and file1 had 'some content'.

can you see if changing the body content makes a difference for your test?
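
for reference, here is a rough sketch of my variant, so we are comparing the same thing. it assumes the same two config files and the same swish-e commands as in your transcript below; the write_doc helper is only something I made up for illustration, and the script is meant to be run in an empty scratch directory. the swish-e steps are skipped if the binary isn't on the PATH.

```shell
#!/bin/sh
# Sketch only: recreate the two test documents with the <body> text from
# my failing case, then repeat the index/merge/search steps from the
# quoted transcript. Run this in an empty scratch directory.
set -eu

# write_doc NAME BODY (hypothetical helper): writes NAME.html with the
# same <head> as the quoted test files and BODY as the only body text.
write_doc() {
    cat > "$1.html" <<EOF
<html>
<head>
<title>$1</title>
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>
</head>

<body>
$2
</body>
</html>
EOF
}

# The only intended difference from the quoted test: the body text.
write_doc file1 'some content'
write_doc file2 'some more content'

# Config files copied from the quoted transcript (c and c2 differ only
# in the order of the metanames and properties).
printf '%s\n' 'Metanames metaA metaB' 'PropertyNames metaA metaB' \
    'IgnoreTotalWordCountWhenRanking 0' > c
printf '%s\n' 'Metanames metaB metaA' 'PropertyNames metaB metaA' \
    'IgnoreTotalWordCountWhenRanking 0' > c2

if command -v swish-e >/dev/null 2>&1; then
    rm -f out.index
    swish-e -c c -i file?.html -v0                      # index both files
    swish-e -c c2 -i file1.html -f fileone.index -v0    # index file1 alone
    swish-e -M index.swish-e fileone.index out.index    # merge the two
    swish-e -w metaa=foo -f out.index                   # search the result
else
    echo "swish-e not found; wrote test files only" >&2
fi
```

if the final search behaves differently with these bodies than it did with yours, that would point at the body content as the trigger.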

Bill Moseley wrote on 10/14/2004 12:11 PM:
> Am I reproducing correctly?
> 
> 
> moseley@bumby:/tmp/peter$ cat file?.html
> <html>
> <head>
> <title>file1</title>
> <meta name='metaA' content='foo'>
> <meta name='metaB' content='bar'>
> </head>
> 
> <body>
> some1 content1
> </body>
> </html>
> 
> <html>
> <head>
> <title>file2</title>
> <meta name='metaA' content='foo'>
> <meta name='metaB' content='bar'>
> </head>
> 
> <body>
> some2 content2
> </body>
> </html>
> 
> moseley@bumby:/tmp/peter$ cat c
> Metanames metaA metaB
> PropertyNames metaA metaB
> IgnoreTotalWordCountWhenRanking 0
> 
> 
> moseley@bumby:/tmp/peter$ cat c2
> Metanames metaB metaA
> PropertyNames metaB metaA
> IgnoreTotalWordCountWhenRanking 0
> 
> 
> moseley@bumby:/tmp/peter$ rm out.index
> 
> moseley@bumby:/tmp/peter$ swish-e -c c -i file?.html -v0
> 
> moseley@bumby:/tmp/peter$ swish-e -c c2 -i file1.html -f fileone.index -v0
> 
> moseley@bumby:/tmp/peter$ swish-e -M index.swish-e fileone.index out.index
> Input index 'index.swish-e' has 2 files and 8 words
> Input index 'fileone.index' has 1 files and 5 words
> Replaced file 'file1.html 2004-10-14 09:55:57 PDT' with 'file1.html 2004-10-14 09:55:57 PDT'
> Getting words in index 'index.swish-e':      8 words
> Getting words in index 'fileone.index':      5 words
> Processing words in index 'out.index':      8 words
> Removed      0 words no longer present in docs for index 'out.index'
> Writing main index...
> Sorting words ...
> Sorting 8 words alphabetically
> Writing header ...
> Writing index entries ...
>   Writing word text: Complete
>   Writing word hash: Complete
>   Writing word data: Complete
> 8 unique words indexed.
> 6 properties sorted.
> 2 files indexed.  0 total bytes.  10 total words.
> Elapsed time: 00:00:00 CPU time: 00:00:00
> Indexing done!
> 
> 
> moseley@bumby:/tmp/peter$ swish-e -w metaa=foo -f out.index
> # SWISH format: 2.5.1
> # Search words: metaa=foo
> # Removed stopwords: 
> # Number of hits: 2
> # Search time: 0.005 seconds
> # Run time: 0.025 seconds
> 1000 file2.html "file2" 151
> 1000 file1.html "file1" 151
> .
> 

-- 
Peter Karman . http://www.cray.com/craydoc/ . karman(at)not-real.cray.com
"I love deadlines. I love the whooshing sound they make as they go by."
         - Douglas Adams
Received on Thu Oct 14 10:48:33 2004