Bill Moseley wrote on 10/13/2004 11:05 PM:
> That is, if the order of the metanames in each index is different and
> that causes problems then that's a bug.
I was able to duplicate the behaviour I experienced. I still don't know
if it's a bug or I'm just missing something, but here it all is. I have
seen this under both 2.4.1 and the most recent CVS build, so I know it's
not just some recent change.
I haven't even looked at the source code yet. If I can get more time on
this, I will...
In sum:
I made two html files, file1.html and file2.html. Indexing them together
works just fine. Changing the order of the metanames in the config and
indexing one file also works just fine. But when I merge the two indexes,
odd things start to happen. If I instead index the one file with a config
identical to the one used for the original two, the merge works as
expected.
A -T index_all dump of the two merged indexes (one built with identical
configs, one with the reversed-order config) shows that in the
reversed-order version the same metaname ends up with different metaname
numbers for the two files. See the dumps below.
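To make the symptom concrete, here is a minimal Python sketch of what the
dumps seem to show. It is a toy model, not swish-e's actual merge code:
the dictionaries, the `merge` and `search` helpers, and the rule that the
first copy of a duplicate path wins are all assumptions made for
illustration. Only the meta ids are taken from the dumps in this message.

```python
# Toy model of the symptom; NOT swish-e's merge code.
# An index here is a metaname table (name -> id) plus postings
# keyed by (word, meta id).

def merge(first, second, remap):
    """Merge two toy indexes, keeping the first index's metaname table.

    remap=False copies meta ids unchanged; remap=True translates each id
    through its metaname. The first copy of a duplicate path wins
    (an assumption chosen to match the dump).
    """
    merged = {"metanames": dict(first["metanames"]), "postings": {}}
    seen = set()
    for index in (first, second):
        id_to_name = {i: name for name, i in index["metanames"].items()}
        added = set()
        for (word, meta_id), files in index["postings"].items():
            if remap:
                meta_id = merged["metanames"][id_to_name[meta_id]]
            for path in files:
                if path in seen:
                    continue  # already merged from an earlier index
                merged["postings"].setdefault((word, meta_id), []).append(path)
                added.add(path)
        seen |= added
    return merged

def search(index, metaname, word):
    """Return the files that have `word` under `metaname`."""
    meta_id = index["metanames"][metaname]
    return sorted(index["postings"].get((word, meta_id), []))

# fileone.index: built with the reversed config (metaB first)
fileone = {
    "metanames": {"metab": 10, "metaa": 11},
    "postings": {("bar", 10): ["file1.html"], ("foo", 11): ["file1.html"]},
}
# index.swish-e: built with the original config (metaA first)
both = {
    "metanames": {"metaa": 10, "metab": 11},
    "postings": {
        ("foo", 10): ["file1.html", "file2.html"],
        ("bar", 11): ["file1.html", "file2.html"],
    },
}

print(search(merge(fileone, both, remap=False), "metaa", "foo"))
# ['file1.html']
print(search(merge(fileone, both, remap=True), "metaa", "foo"))
# ['file1.html', 'file2.html']
```

With remap=False the toy reproduces the one-hit result; with remap=True
both files come back. Whether swish-e's merge is meant to remap ids like
this is exactly the open question.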
Details:
This is long. Sorry.
The files have identical meta content, but slightly different words
overall (just for comparison).
pubs@topaz08 170% cat file1.html
<html>
<head>
<title>file1</title>
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>
</head>
<body>
some content
</body>
</html>
pubs@topaz08 171% cat file2.html
<html>
<head>
<title>file2</title>
<meta name='metaA' content='foo'>
<meta name='metaB' content='bar'>
</head>
<body>
some more content
</body>
</html>
I created a simple config:
MetaNames metaA metaB
PropertyNames metaA metaB
IgnoreTotalWordCountWhenRanking 0
Then I indexed the two files. A search works as expected:
pubs@topaz08 172% swish-e -w 'metaa=foo'
# SWISH format: 2.4.1
# Search words: metaa=foo
# Removed stopwords:
# Number of hits: 2
# Search time: 0.068 seconds
# Run time: 0.112 seconds
1000 file2.html "file2" 153
1000 file1.html "file1" 148
.
Then I created a new config, identical but with the order of the
metanames and properties reversed:
pubs@topaz08 168% cat configafter
MetaNames metaB metaA
PropertyNames metaB metaA
IgnoreTotalWordCountWhenRanking 0
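Since the only difference between the two configs is the order of the
names, a quick check before merging can flag the mismatch. A minimal
sketch in Python, assuming the config text is available as a string.
`metaname_order` and `same_order` are hypothetical helpers, and treating
`#` as a comment marker and names as case-insensitive are assumptions;
this is not a full config parser.

```python
CONFIG = """\
MetaNames metaA metaB
PropertyNames metaA metaB
IgnoreTotalWordCountWhenRanking 0
"""

CONFIG_AFTER = """\
MetaNames metaB metaA
PropertyNames metaB metaA
IgnoreTotalWordCountWhenRanking 0
"""

def metaname_order(config_text, directive="MetaNames"):
    """Return the names listed after `directive`, lowercased, in file order."""
    names = []
    for line in config_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop trailing comments
        if parts and parts[0].lower() == directive.lower():
            names.extend(name.lower() for name in parts[1:])
    return names

def same_order(config_a, config_b, directive="MetaNames"):
    """True when both configs list the same names in the same order."""
    return metaname_order(config_a, directive) == metaname_order(config_b, directive)

for directive in ("MetaNames", "PropertyNames"):
    print(directive, same_order(CONFIG, CONFIG_AFTER, directive))
# MetaNames False
# PropertyNames False
```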
I indexed just file1.html and tested the search:
pubs@topaz08 173% swish-e -w 'metaa=foo' -f fileone.index
# SWISH format: 2.4.1
# Search words: metaa=foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.082 seconds
# Run time: 0.126 seconds
1000 file1.html "file1" 148
.
All good so far.
Then I merged fileone.index and index.swish-e into newmerge:
swish-e -M fileone.index index.swish-e newmerge
Now the search does not work as expected:
pubs@topaz08 174% swish-e -w 'metaa=foo' -f newmerge
# SWISH format: 2.4.1
# Search words: metaa=foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.092 seconds
# Run time: 0.135 seconds
1000 file1.html "file1" 148
.
file2.html is missing.
Here's the dump of the merged index built with the config order reversed.
NOTE that each file gets different meta numbers, even though the
metanames (metaA and metaB) are the same:
pubs@topaz08 165% swish-e -T index_all -f newmerge
...
-----> METANAMES for newmerge <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL
META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare)
SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
metab : id=10 type= 1 META_INDEX Rank Bias= 0
metaa : id=11 type= 1 META_INDEX Rank Bias= 0
metab : id=12 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
metaa : id=13 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
-----> WORD INFO in index newmerge <-----
bar
Meta:10 file1.html Freq:1 Pos/Struct:8/85
Meta:11 file2.html Freq:1 Pos/Struct:8/85
content
Meta:1 file1.html Freq:1 Pos/Struct:12/9
Meta:1 file2.html Freq:1 Pos/Struct:13/9
file1
Meta:1 file1.html Freq:1 Pos/Struct:2/7
file2
Meta:1 file2.html Freq:1 Pos/Struct:2/7
foo
Meta:10 file2.html Freq:1 Pos/Struct:5/85
Meta:11 file1.html Freq:1 Pos/Struct:5/85
more
Meta:1 file2.html Freq:1 Pos/Struct:12/9
some
Meta:1 file1.html Freq:1 Pos/Struct:11/9
Meta:1 file2.html Freq:1 Pos/Struct:11/9
-----> FILES in index newmerge <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metab:12 ( 3) S: "bar"
metaa:13 ( 3) S: "foo"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metab:12 ( 3) S: "bar"
metaa:13 ( 3) S: "foo"
Dumping File Properties for File Number: 2
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file2.html"
swishtitle: 7 ( 5) S: "file2"
swishdocsize: 8 ( 4) N: "153"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:34"
metab:12 ( 3) S: "bar"
metaa:13 ( 3) S: "foo"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file2.html"
swishtitle: 7 ( 5) S: "file2"
swishdocsize: 8 ( 4) N: "153"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:34"
metab:12 ( 3) S: "bar"
metaa:13 ( 3) S: "foo"
Number of File Entries: 2
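The mismatch in the dump above is easier to see when each posting's meta
id is translated back to a metaname. A rough Python sketch, assuming the
dump text looks like the lines pasted in this message; the regular
expressions and the `parse_dump` and `metaname_per_file` helpers are
hypothetical and only cover the METANAMES and WORD INFO sections.

```python
import re

# "metab : id=10 type= 1 META_INDEX ..." (searchable metanames only)
META_RE = re.compile(r"^\s*(\S+)\s*:\s*id=\s*(\d+)\s+type=\s*\d+\s+META_INDEX")
# "Meta:10 file1.html Freq:1 Pos/Struct:8/85"
POSTING_RE = re.compile(r"^\s*Meta:(\d+)\s+(\S+)\s+Freq:")

def parse_dump(text):
    """Return ({meta id: metaname}, [(word, meta id, file), ...])."""
    names, postings = {}, []
    word, in_words = None, False
    for line in text.splitlines():
        if line.startswith("----->"):
            in_words = "WORD INFO" in line  # section header
            continue
        if in_words:
            posting = POSTING_RE.match(line)
            if posting:
                postings.append((word, int(posting.group(1)), posting.group(2)))
            elif line.strip():
                word = line.strip()  # a bare line names the current word
        else:
            meta = META_RE.match(line)
            if meta:
                names[int(meta.group(2))] = meta.group(1)
    return names, postings

def metaname_per_file(names, postings, word):
    """Map each file to the metaname its posting for `word` is filed under."""
    return {path: names.get(meta_id, "?")
            for w, meta_id, path in postings if w == word}

SAMPLE = """\
-----> METANAMES for newmerge <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
metab : id=10 type= 1 META_INDEX Rank Bias= 0
metaa : id=11 type= 1 META_INDEX Rank Bias= 0
-----> WORD INFO in index newmerge <-----
bar
Meta:10 file1.html Freq:1 Pos/Struct:8/85
Meta:11 file2.html Freq:1 Pos/Struct:8/85
foo
Meta:10 file2.html Freq:1 Pos/Struct:5/85
Meta:11 file1.html Freq:1 Pos/Struct:5/85
-----> FILES in index newmerge <-----
"""

names, postings = parse_dump(SAMPLE)
print(metaname_per_file(names, postings, "foo"))
# {'file2.html': 'metab', 'file1.html': 'metaa'}
```

Read this way, "foo" for file2.html sits under metab in the merged index,
which would explain why a metaa=foo search returns only file1.html.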
pubs@topaz08 176% swish-e -T index_all -f index.swish-e
..
-----> METANAMES for index.swish-e <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL
META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare)
SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
metaa : id=10 type= 1 META_INDEX Rank Bias= 0
metab : id=11 type= 1 META_INDEX Rank Bias= 0
metaa : id=12 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
metab : id=13 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
-----> WORD INFO in index index.swish-e <-----
bar
Meta:11 file1.html Freq:1 Pos/Struct:8/85
Meta:11 file2.html Freq:1 Pos/Struct:8/85
content
Meta:1 file1.html Freq:1 Pos/Struct:12/9
Meta:1 file2.html Freq:1 Pos/Struct:13/9
file1
Meta:1 file1.html Freq:1 Pos/Struct:2/7
file2
Meta:1 file2.html Freq:1 Pos/Struct:2/7
foo
Meta:10 file1.html Freq:1 Pos/Struct:5/85
Meta:10 file2.html Freq:1 Pos/Struct:5/85
more
Meta:1 file2.html Freq:1 Pos/Struct:12/9
some
Meta:1 file1.html Freq:1 Pos/Struct:11/9
Meta:1 file2.html Freq:1 Pos/Struct:11/9
-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
Dumping File Properties for File Number: 2
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file2.html"
swishtitle: 7 ( 5) S: "file2"
swishdocsize: 8 ( 4) N: "153"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:34"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file2.html"
swishtitle: 7 ( 5) S: "file2"
swishdocsize: 8 ( 4) N: "153"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:34"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
Number of File Entries: 2
pubs@topaz08 177% swish-e -T index_all -f fileone.index
..
-----> METANAMES for fileone.index <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL
META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare)
SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
metab : id=10 type= 1 META_INDEX Rank Bias= 0
metaa : id=11 type= 1 META_INDEX Rank Bias= 0
metab : id=12 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
metaa : id=13 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
-----> WORD INFO in index fileone.index <-----
bar
Meta:10 file1.html Freq:1 Pos/Struct:8/85
content
Meta:1 file1.html Freq:1 Pos/Struct:12/9
file1
Meta:1 file1.html Freq:1 Pos/Struct:2/7
foo
Meta:11 file1.html Freq:1 Pos/Struct:5/85
some
Meta:1 file1.html Freq:1 Pos/Struct:11/9
-----> FILES in index fileone.index <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metab:12 ( 3) S: "bar"
metaa:13 ( 3) S: "foo"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metab:12 ( 3) S: "bar"
metaa:13 ( 3) S: "foo"
Number of File Entries: 1
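Comparing the METANAMES sections of the two source indexes dumped above
shows where the ids diverge. A small Python sketch of that comparison;
the `index_ids` and `conflicting` helpers and the regular expression are
hypothetical, and only the META_INDEX lines are considered.

```python
import re

META_INDEX_RE = re.compile(
    r"^\s*(\S+)\s*:\s*id=\s*(\d+)\s+type=\s*\d+\s+META_INDEX")

def index_ids(metanames_section):
    """Return {metaname: id} for the META_INDEX lines of a dump section."""
    ids = {}
    for line in metanames_section.splitlines():
        match = META_INDEX_RE.match(line)
        if match:
            ids[match.group(1)] = int(match.group(2))
    return ids

def conflicting(ids_a, ids_b):
    """Return {metaname: (id in a, id in b)} for names whose ids differ."""
    return {name: (ids_a[name], ids_b[name])
            for name in sorted(ids_a.keys() & ids_b.keys())
            if ids_a[name] != ids_b[name]}

INDEX_SWISH_E = """\
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
metaa : id=10 type= 1 META_INDEX Rank Bias= 0
metab : id=11 type= 1 META_INDEX Rank Bias= 0
"""

FILEONE_INDEX = """\
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
metab : id=10 type= 1 META_INDEX Rank Bias= 0
metaa : id=11 type= 1 META_INDEX Rank Bias= 0
"""

print(conflicting(index_ids(INDEX_SWISH_E), index_ids(FILEONE_INDEX)))
# {'metaa': (10, 11), 'metab': (11, 10)}
```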
Here's the -T index_all dump for newmerge when the same config was used
for both index.swish-e and fileone.index:
-----> METANAMES for newmerge <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL
META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare)
SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
metaa : id=10 type= 1 META_INDEX Rank Bias= 0
metab : id=11 type= 1 META_INDEX Rank Bias= 0
metaa : id=12 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
metab : id=13 type=70 META_PROP:STRING(case:ignore)
SortKeyLen: 100 *presorted*
-----> WORD INFO in index newmerge <-----
bar
Meta:11 file1.html Freq:1 Pos/Struct:8/85
Meta:11 file2.html Freq:1 Pos/Struct:8/85
content
Meta:1 file1.html Freq:1 Pos/Struct:12/9
Meta:1 file2.html Freq:1 Pos/Struct:13/9
file1
Meta:1 file1.html Freq:1 Pos/Struct:2/7
file2
Meta:1 file2.html Freq:1 Pos/Struct:2/7
foo
Meta:10 file1.html Freq:1 Pos/Struct:5/85
Meta:10 file2.html Freq:1 Pos/Struct:5/85
more
Meta:1 file2.html Freq:1 Pos/Struct:12/9
some
Meta:1 file1.html Freq:1 Pos/Struct:11/9
Meta:1 file2.html Freq:1 Pos/Struct:11/9
-----> FILES in index newmerge <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file1.html"
swishtitle: 7 ( 5) S: "file1"
swishdocsize: 8 ( 4) N: "148"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:12"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
Dumping File Properties for File Number: 2
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 10) S: "file2.html"
swishtitle: 7 ( 5) S: "file2"
swishdocsize: 8 ( 4) N: "153"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:34"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 10) S: "file2.html"
swishtitle: 7 ( 5) S: "file2"
swishdocsize: 8 ( 4) N: "153"
swishlastmodified: 9 ( 4) D: "2004-10-14 10:42:34"
metaa:12 ( 3) S: "foo"
metab:13 ( 3) S: "bar"
Number of File Entries: 2
--
Peter Karman . http://www.cray.com/craydoc/ . karman(at)not-real.cray.com
"I love deadlines. I love the whooshing sound they make as they go by."
- Douglas Adams
Received on Thu Oct 14 09:19:51 2004