some 2.4.3 -> 2.4.4 weirdness

From: brad miele <bmiele(at)not-real.ipnstock.com>
Date: Wed Nov 08 2006 - 15:50:54 GMT
Hi,

Just wanted to mention some weirdness that we have been seeing since we 
upgraded from 2.4.3 to 2.4.4 and solicit any thoughts on how to go about 
tracking it down.

Essentially, the issue is that certain searches seem to fail against our 
full index, yet they work on smaller indexes using the exact same config 
options.

here is my meandering attempt at showing the involved stuff.

first, the searches,
- search against full index:
bwayipn02# swish-e -w "Corey Rich" -f indexes/master_ipn.index
# SWISH format: 2.4.4
# Search words: Corey Rich
# Removed stopwords:
err: no results
.

- search against smaller test index:
bwayipn02# swish-e -w "Corey Rich" -f test.index
# SWISH format: 2.4.4
# Search words: Corey Rich
# Removed stopwords:
# Number of hits: 54
# Search time: 0.001 seconds
# Run time: 0.009 seconds
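
To repeat this comparison for more queries without reading each transcript by eye, something like the following could reduce one run's output to a hit count or an error message. This is only a sketch: the function name is made up for illustration, and it assumes the output looks like the two transcripts above (a "# Number of hits:" line on success, an "err:" line on failure).

```python
import re


def summarize_search_output(text):
    """Return (hits, error) for the text printed by one swish-e search.

    hits is an int when a '# Number of hits:' line is present, else None.
    error is the message after 'err:' when such a line is present, else None.
    """
    hits = None
    error = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"#\s*Number of hits:\s*(\d+)", line)
        if match:
            hits = int(match.group(1))
        elif line.startswith("err:"):
            error = line[len("err:"):].strip()
    return hits, error
```

The text passed in would be whatever was captured from a command such as the `swish-e -w ... -f ...` runs shown above.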


next, index stats,
- full index headers:

bwayipn02# swish-e -T INDEX_HEADER -f indexes/master_ipn.index
# Name:
# Saved as: master_ipn.index
# Total Words: 2455059
# Total Files: 1011839
# Removed Files: 0
# Total Word Pos: 200275309
# Removed Word Pos: 0
# Indexed on: 2006-11-08 08:37:40 EST
# Description:
# Pointer:
# Maintained by:
# MinWordLimit: 1
# MaxWordLimit: 40
# WordCharacters: -.0123456789_abcdefghijklmnopqrstuvwxyz
# BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar: -.
# IgnoreLastChar: -.
# StopWords: although their but than that an as at a be whether by should 
had has where with do moreover them then they could during were there this 
if in must these between is it about however here our was either also does 
no which only of on more each into already therefore most because what 
since other always for from to its thus whose within see have we though 
after when the while having through all seen further those both been any 
would will among such are
# BuzzWords:
# Stemming Applied: 1
# Soundex Applied: 0
# Fuzzy Mode: Stemming_en
# IgnoreTotalWordCountWhenRanking: 0

- small test index headers:

bwayipn02# swish-e -T INDEX_HEADER -f test.index
# Name:
# Saved as: test.index
# Total Words: 596
# Total Files: 54
# Removed Files: 0
# Total Word Pos: 12067
# Removed Word Pos: 0
# Indexed on: 2006-11-08 09:39:00 EST
# Description:
# Pointer:
# Maintained by:
# MinWordLimit: 1
# MaxWordLimit: 40
# WordCharacters: -.0123456789_abcdefghijklmnopqrstuvwxyz
# BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar: -.
# IgnoreLastChar: -.
# StopWords: although their but than that an as at a be whether by should 
had has where with do moreover them then they could during were there this 
if in must these between is it about however here our was either also does 
no which only of on more each into already therefore most because what 
since other always for from to its thus whose within see have we though 
after when the while having through all seen further those both been any 
would will among such are
# BuzzWords:
# Stemming Applied: 1
# Soundex Applied: 0
# Fuzzy Mode: Stemming_no
# IgnoreTotalWordCountWhenRanking: 0
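
Since the two header dumps are long, a small script can point out which fields differ once the fields that are bound to differ (file name, counts, timestamp) are set aside. This is a rough sketch, not part of swish-e: the function names and the ignore list are assumptions, and it expects the `-T INDEX_HEADER` output in the "# Key: value" form pasted above, with wrapped lines such as the StopWords list treated as continuations. In the two dumps as pasted, for example, the Fuzzy Mode lines do not read the same.

```python
# Fields that naturally differ between two separate indexes (assumed list).
EXPECTED_TO_DIFFER = {
    "Saved as", "Total Words", "Total Files", "Removed Files",
    "Total Word Pos", "Removed Word Pos", "Indexed on",
}


def parse_index_header(text):
    """Parse '# Key: value' lines into a dict.

    Lines that do not start with '#' are appended to the previous value,
    which handles long values wrapped over several lines.
    """
    fields = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#") and ":" in line:
            key, _, value = line[1:].partition(":")
            key = key.strip()
            fields[key] = value.strip()
        elif key is not None:
            fields[key] = (fields[key] + " " + line).strip()
    return fields


def header_differences(header_a, header_b, ignore=EXPECTED_TO_DIFFER):
    """Return {field: (value_a, value_b)} for fields whose values differ."""
    a = parse_index_header(header_a)
    b = parse_index_header(header_b)
    diffs = {}
    for field in sorted(set(a) | set(b)):
        if field in ignore:
            continue
        if a.get(field) != b.get(field):
            diffs[field] = (a.get(field), b.get(field))
    return diffs
```

Each argument is the saved text of one header dump; an empty result means the remaining fields match.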

next, conf files

- full index conf:
IncludeConfigFile /usr/local/indexing/conf/site.config
FuzzyIndexingMode Stemming_en2
IndexFile /usr/local/indexing/indexes/master_ipn.index
IndexDir /usr/local/indexing/IDX
ParserWarnLevel 0

- small test index conf:
IncludeConfigFile /usr/local/indexing/conf/site.config
FuzzyIndexingMode Stemming_en2
IndexFile /usr/local/indexing/test.index
IndexDir /usr/local/indexing/test
ParserWarnLevel 0

- shared site.config
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-_
IgnoreFirstChar .-
IgnoreLastChar  .-
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789
IndexReport 2
TmpDir /usr/tmp

IgnoreTotalWordCountWhenRanking no

IndexComments 0
BumpPositionCounterCharacters |.
DefaultContents XML*

IgnoreWords File: /usr/local/indexing/stopwords.txt
MetaNameAlias swishdefault searchable
MetaNames sphotogs rmrftype photographer sort_date qphotographer 
ipn_ignore_keys siteowner date_shot
released crop profile keywords short_caption id orig_id subject altkeys 
location_state location_country
location_city short_keys ipn_keys similar_to prime_to
UndefinedMetaTags index

PropertyNamesDate sort_date
PropertyNamesNumeric weight
PropertyNames id photographer subject released orig_id date_shot 
image_restrictions siteowner short_caption
altkeys file_size profile hasvideo hasaudio rmrftype lookup agent_name 
adweight sportsweight newsweight
travelweight celebrityweight scienceweight
PreSortedIndex id weight adweight sportsweight newsweight travelweight 
celebrityweight scienceweight
orig_id date_shot sort_date profile

MetaNamesRank 10 subject
MetaNamesRank 10 ipn_keys
MetaNamesRank 5 keywords

finally, an example of an xml file that should be found:

http://tools.ipnstock.com/8284600079.xml

we are currently messing with downgrading to 2.4.3 and retesting, but it 
will take a while to rebuild the full index, so in the meantime, any 
advice is welcome.

regards,

Brad
---------------------
Brad Miele
VP Technology
IPNStock.com
866 476 7862 x902
bmiele@ipnstock.com
Received on Wed Nov 8 07:50:55 2006