Hi,
Just wanted to mention some weirdness that we have been seeing since we
upgraded from 2.4.3 to 2.4.4, and solicit any thoughts on how to go about
tracking it down.
Essentially, the issue is that certain searches seem to fail against our
full index, yet they work on smaller indexes using the exact same config
options.
here is my meandering attempt at showing the relevant pieces.
first, the searches,
- search against full index:
bwayipn02# swish-e -w "Corey Rich" -f indexes/master_ipn.index
# SWISH format: 2.4.4
# Search words: Corey Rich
# Removed stopwords:
err: no results
.
- search against smaller test index:
bwayipn02# swish-e -w "Corey Rich" -f test.index
# SWISH format: 2.4.4
# Search words: Corey Rich
# Removed stopwords:
# Number of hits: 54
# Search time: 0.001 seconds
# Run time: 0.009 seconds
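One way to narrow this down might be to run each word of the failing query on its own against both indexes, to see whether a single term is missing or only the combination fails. The sketch below just prints the commands to run; it uses only the -w and -f options shown above, and the helper names (isolation_queries, swish_command) are made up for illustration.

```python
import shlex


def isolation_queries(query):
    """Return the full query followed by each of its words, without duplicates."""
    queries = [query]
    for word in query.split():
        if word not in queries:
            queries.append(word)
    return queries


def swish_command(query, index_path, binary="swish-e"):
    """Build the argv list for one search, mirroring the commands shown above."""
    # The whole query is passed as a single argument, as in: -w "Corey Rich"
    return [binary, "-w", query, "-f", index_path]


if __name__ == "__main__":
    # Index paths are the two from this report; adjust as needed.
    for index in ("indexes/master_ipn.index", "test.index"):
        for q in isolation_queries("Corey Rich"):
            print(shlex.join(swish_command(q, index)))
```

If one of the single-word searches also returns nothing on the full index, that word would be the place to start looking.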
next, index stats,
- full index headers:
bwayipn02# swish-e -T INDEX_HEADER -f indexes/master_ipn.index
# Name:
# Saved as: master_ipn.index
# Total Words: 2455059
# Total Files: 1011839
# Removed Files: 0
# Total Word Pos: 200275309
# Removed Word Pos: 0
# Indexed on: 2006-11-08 08:37:40 EST
# Description:
# Pointer:
# Maintained by:
# MinWordLimit: 1
# MaxWordLimit: 40
# WordCharacters: -.0123456789_abcdefghijklmnopqrstuvwxyz
# BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar: -.
# IgnoreLastChar: -.
# StopWords: although their but than that an as at a be whether by should
had has where with do moreover them then they could during were there this
if in must these between is it about however here our was either also does
no which only of on more each into already therefore most because what
since other always for from to its thus whose within see have we though
after when the while having through all seen further those both been any
would will among such are
# BuzzWords:
# Stemming Applied: 1
# Soundex Applied: 0
# Fuzzy Mode: Stemming_en
# IgnoreTotalWordCountWhenRanking: 0
- small test index headers:
bwayipn02# swish-e -T INDEX_HEADER -f test.index
# Name:
# Saved as: test.index
# Total Words: 596
# Total Files: 54
# Removed Files: 0
# Total Word Pos: 12067
# Removed Word Pos: 0
# Indexed on: 2006-11-08 09:39:00 EST
# Description:
# Pointer:
# Maintained by:
# MinWordLimit: 1
# MaxWordLimit: 40
# WordCharacters: -.0123456789_abcdefghijklmnopqrstuvwxyz
# BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar: -.
# IgnoreLastChar: -.
# StopWords: although their but than that an as at a be whether by should
had has where with do moreover them then they could during were there this
if in must these between is it about however here our was either also does
no which only of on more each into already therefore most because what
since other always for from to its thus whose within see have we though
after when the while having through all seen further those both been any
would will among such are
# BuzzWords:
# Stemming Applied: 1
# Soundex Applied: 0
# Fuzzy Mode: Stemming_no
# IgnoreTotalWordCountWhenRanking: 0
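Since the two header dumps are long, a small script can compare them line by line rather than by eye. This is only a rough sketch: it assumes each dump (the output of swish-e -T INDEX_HEADER, without the shell prompt line) has been saved as text, and the function names are my own, not part of swish-e.

```python
def parse_headers(text):
    """Parse one header dump into a dict of header name -> value.

    Lines look like '# Name: value'. A line that does not start with '#'
    is treated as the wrapped continuation of the previous value, which
    is how the long StopWords line appears in the dumps above.
    """
    headers = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            if not sep:
                # A '#' line without a colon is not a header; skip it.
                last_key = None
                continue
            last_key = key.strip()
            headers[last_key] = value.strip()
        elif last_key is not None:
            headers[last_key] = (headers[last_key] + " " + line).strip()
    return headers


def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for every header that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Headers such as Saved as, the totals, and Indexed on are expected to differ between the two indexes; anything beyond those would be worth a closer look.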
next, conf files
- full index conf:
IncludeConfigFile /usr/local/indexing/conf/site.config
FuzzyIndexingMode Stemming_en2
IndexFile /usr/local/indexing/indexes/master_ipn.index
IndexDir /usr/local/indexing/IDX
ParserWarnLevel 0
- small test index conf:
IncludeConfigFile /usr/local/indexing/conf/site.config
FuzzyIndexingMode Stemming_en2
IndexFile /usr/local/indexing/test.index
IndexDir /usr/local/indexing/test
ParserWarnLevel 0
- shared site.config
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-_
IgnoreFirstChar .-
IgnoreLastChar .-
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters abcdefghijklmnopqrstuvwxyz0123456789
IndexReport 2
TmpDir /usr/tmp
IgnoreTotalWordCountWhenRanking no
IndexComments 0
BumpPositionCounterCharacters |.
DefaultContents XML*
IgnoreWords File: /usr/local/indexing/stopwords.txt
MetaNameAlias swishdefault searchable
MetaNames sphotogs rmrftype photographer sort_date qphotographer ipn_ignore_keys siteowner date_shot released crop profile keywords short_caption id orig_id subject altkeys location_state location_country location_city short_keys ipn_keys similar_to prime_to
UndefinedMetaTags index
PropertyNamesDate sort_date
PropertyNamesNumeric weight
PropertyNames id photographer subject released orig_id date_shot image_restrictions siteowner short_caption altkeys file_size profile hasvideo hasaudio rmrftype lookup agent_name adweight sportsweight newsweight travelweight celebrityweight scienceweight
PreSortedIndex id weight adweight sportsweight newsweight travelweight celebrityweight scienceweight orig_id date_shot sort_date profile
MetaNamesRank 10 subject
MetaNamesRank 10 ipn_keys
MetaNamesRank 5 keywords
finally, an example of an xml file that should be found:
http://tools.ipnstock.com/8284600079.xml
we are currently messing with downgrading to 2.4.3 and retesting, but it
will take a while to rebuild the full index, so in the meantime, any
advice is welcome.
regards,
Brad
---------------------
Brad Miele
VP Technology
IPNStock.com
866 476 7862 x902
bmiele@ipnstock.com
Received on Wed Nov 8 07:50:55 2006