Re: [swish-e] Title and Description URL is wrong

From: Ronny Rahardjo
Date: Fri Sep 18 2009
Sorry for being unclear. We are running on Windows Server 2003, and our
swish-e install is version 2, dated July 2008.

Here is our problem:
A couple of days ago we made some changes to our website, one of which was
using JavaScript tabs to display information on a single page. We backed up
everything, including the swish index file (.idx). When we use
the old .idx file, the search works fine with the new website, except that
the links are obviously no longer correct (because we also changed some of
the directory structure).

However, when we use the latest .idx file (we run a scheduled job every day
to reindex the site), the search results show duplicates and some
of the URLs are simply wrong, such as: (with extra '/')

So I assume that the problem is in the indexing configuration.
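
To illustrate the symptom (not its cause), here is a minimal Python sketch of how results that differ only by an extra '/' in the path could be grouped and spotted. It is not part of our swish-e setup; the function names and the example.com URLs in the usage note are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Collapse runs of '/' in the path part of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Only the path is touched, so the '//' after the scheme is preserved.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


def find_duplicates(urls):
    """Group URLs by normalized form and keep groups with more than one entry."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

For example, passing "http://www.example.com/a/b.html" and "http://www.example.com//a/b.html" to find_duplicates() puts both under the single key "http://www.example.com/a/b.html", which is the kind of duplicate we are seeing in the search results.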


# Include our site-wide configuration settings:
IncludeConfigFile common.config

# Specify the program to run
#IndexDir output.txt
IndexFile d:/htdocs/www2/cgi-bin/indexdb.idx
SwishProgParameters default

ParserWarnLevel 1

# Tell swish how to parse the content
DefaultContents HTML
IndexContents HTML .htm .html .php
IndexContents TXT .txt .conf
StoreDescription HTML* <body> 1000


# These settings tell swish what defines a word.
# We only index words that include letters, numbers, a dash,
# or a period.  (Not very realistic)
# These are the characters that are allowed in a "word".
# i.e. words are split on any character NOT found in WordCharacters
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-

# We allow a period and a dash within words, but strip them
# from the beginning or end of a word.  This is done after
# WordCharacters above is used to split words.
IgnoreFirstChar .-
IgnoreLastChar  .-
# Finally, resulting words must begin/end with one
# of the characters listed here
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789

# Turn this on for a slight performance improvement
#FollowSymLinks yes

# This is how detailed you want reporting. You can specify numbers
# 0 to 3 - 0 is totally silent, 3 is the most verbose.
# 4 is debugging.  Can be overridden with -v on the command line
IndexReport 1

# Set the stopwords (words to ignore when searching and when indexing)
# Carefully think about this feature before using a list of stopwords
# You can list the words here:
#  IgnoreWords of or and the a to
# Or you can use the compiled in defaults:
#  IgnoreWords SwishDefault
# Or you can use a file that includes your own words:
IgnoreWords file: stopwords/english.txt

# Since we are using such a restrictive WordCharacters setting, we
# want to map eight-bit characters to ascii.
# For example, "resumé" will be indexed and searched as "resume".
# See docs for more info.
TranslateCharacters :ascii7:

# We don't want phrase searches to work across sentences, plus
# we use the pipe "|" to force a break in phrases when indexing.
BumpPositionCounterCharacters |.
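
As a rough illustration of what the word settings above do, here is a minimal Python sketch that follows the order described in the config comments: split on anything outside WordCharacters, strip IgnoreFirstChar/IgnoreLastChar, then check BeginCharacters/EndCharacters. It is not swish-e's actual code. The tokenize() function is hypothetical, and lowercasing the input first is an assumption.

```python
import re

# Values copied from the config above.
WORD_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789.-"
IGNORE_FIRST = ".-"
IGNORE_LAST = ".-"
BEGIN_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
END_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"


def tokenize(text):
    """Approximate the word-splitting rules described in the config comments."""
    # Assumption: text is lowercased before it is split into words.
    text = text.lower()
    # 1. Split on any run of characters NOT found in WordCharacters.
    splitter = "[^" + re.escape(WORD_CHARS) + "]+"
    words = []
    for raw in re.split(splitter, text):
        # 2. Strip ignored characters from the start and end of the word.
        word = raw.lstrip(IGNORE_FIRST).rstrip(IGNORE_LAST)
        if not word:
            continue
        # 3. Keep the word only if it begins and ends with allowed characters.
        if word[0] in BEGIN_CHARS and word[-1] in END_CHARS:
            words.append(word)
    return words
```

With these settings, tokenize("--Hello, index.html. a-b") returns ['hello', 'index.html', 'a-b']: the leading dashes and the trailing period are stripped, while the period and dash inside words are kept.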

For, we use the default file when we downloaded swish-e.


Received on Fri Sep 18 02:20:23 2009