Re: diff'ing indexes

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Oct 14 2004 - 17:11:58 GMT

Am I reproducing this correctly?


moseley@bumby:/tmp/peter$ cat file?.html
<html>
<head>
<title>file1</title>
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>
</head>

<body>
some1 content1
</body>
</html>

<html>
<head>
<title>file2</title>
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>
</head>

<body>
some2 content2
</body>
</html>

moseley@bumby:/tmp/peter$ cat c
Metanames metaA metaB
PropertyNames metaA metaB
IgnoreTotalWordCountWhenRanking 0


moseley@bumby:/tmp/peter$ cat c2
Metanames metaB metaA
PropertyNames metaB metaA
IgnoreTotalWordCountWhenRanking 0


moseley@bumby:/tmp/peter$ rm out.index

moseley@bumby:/tmp/peter$ swish-e -c c -i file?.html -v0

moseley@bumby:/tmp/peter$ swish-e -c c2 -i file1.html -f fileone.index -v0

moseley@bumby:/tmp/peter$ swish-e -M index.swish-e fileone.index out.index
Input index 'index.swish-e' has 2 files and 8 words
Input index 'fileone.index' has 1 files and 5 words
Replaced file 'file1.html 2004-10-14 09:55:57 PDT' with 'file1.html 2004-10-14 09:55:57 PDT'
Getting words in index 'index.swish-e':      8 words
Getting words in index 'fileone.index':      5 words
Processing words in index 'out.index':      8 words
Removed      0 words no longer present in docs for index 'out.index'
Writing main index...
Sorting words ...
Sorting 8 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
8 unique words indexed.
6 properties sorted.
2 files indexed.  0 total bytes.  10 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!


moseley@bumby:/tmp/peter$ swish-e -w metaa=foo -f out.index
# SWISH format: 2.5.1
# Search words: metaa=foo
# Removed stopwords: 
# Number of hits: 2
# Search time: 0.005 seconds
# Run time: 0.025 seconds
1000 file2.html "file2" 151
1000 file1.html "file1" 151
.
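
To check a merged index like this mechanically, the hit list can be reduced to just the file paths. This is only a minimal sketch: it assumes the default result layout shown above (rank, path, quoted title, size) and paths without spaces, and `hit_paths` is a hypothetical helper name, not part of swish-e.

```shell
# Hypothetical helper (not part of swish-e): read swish-e search output on
# stdin and print the path column of each result line, sorted.
# Assumes the default layout "rank path "title" size" and no spaces in paths.
hit_paths() {
    # Result lines start with a numeric rank; "#" header lines and the
    # terminating "." line do not match and are skipped.
    awk '/^[0-9]+ / { print $2 }' | sort
}
```

Piping the search above through it, as in `swish-e -w metaa=foo -f out.index | hit_paths`, should list file1.html and file2.html if the merge kept both documents searchable under metaA.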

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Thu Oct 14 10:12:24 2004