help: inconsistent indexing on SWISH-E

From: David Richard <david(at)not-real.monkey.com>
Date: Wed Nov 25 1998 - 18:42:43 GMT
Not so much a bug report as a question of implementation...

We are using swish-e/wwwwais for indexing and searching on our web site on
Best.com.  It was quite easy to get the tools working (woohoo!), but we are
seeing some inconsistencies between two indices that we generated.  We have
two sites that are nearly identical in content but differ in format
(text-only vs. frames).  You can look at these versions at
www.monkey.com/lobby.htm (if this does not work, then we have gone live and
you can see them at www.monkey.com)...

When we search from the command line (over telnet), swish returns
different results for the two indices.  For example,

On our text site, TX1

shell5: {%25} ./swish -w "affordance*" -t Bthe -f source.swish
# SWISH format 1.1
# Search words: affordance*
# Name: Index of MONKEYmedia text web site
# Saved as: source.swish
# Counts: 2630 words, 34 files
# Indexed on: 24/11/98 15:35:28 PST
# Description: This is a full index of http://www.user.com/TX1/
# Pointer: http://www.user.com/
# Maintained by: David Richard (david@monkey.com)
1000 /home/user/public_html/TX1/reference/devchar/devchar.htm "MONKEYmedia - REFERENCE: Interaction Device Characteristics" 12440
1000 /home/user/public_html/TX1/technology/devchar/devchar.htm "MONKEYmedia - TECHNOLOGY: Interaction Device Characteristics" 12445
.

On our frames site FRC, however,

shell5: {%28} ./swish -w 'affordance*' -t Bthe -f source.swish
# SWISH format 1.1
# Search words: affordance*
# Name: Index of My web site
# Saved as: source.swish
# Counts: 2756 words, 58 files
# Indexed on: 24/11/98 15:34:49 PST
# Description: This is a full index of my web site.
# Pointer: http://www.user.com/
# Maintained by: David Richard (david@monkey.com)
1000 /home/user/public_html/FRC/technology/devchar/devchar_popup.htm "MONKEYmedia - TECHNOLOGY: Device Characteristics: Definitions" 12692
909 /home/user/public_html/FRC/reference/devchar/devchar_popup.htm "MONKEYmedia - REFERENCE: Device Characteristics: Definitions" 14698
39 /home/user/public_html/FRC/reference/styles/styles_text.htm "MONKEYmedia - REFERENCE: Five Styles of Interaction: Text" 16940
.

The content of the third hit in FRC (Five Styles of Interaction), which has
no counterpart in the TX1 results, also exists identically in TX1.  The only
difference between the two sites is that in FRC the page containing the word
'affordance' is inside a frameset.
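To see exactly which pages matched in one index but not the other, the two result lists can be compared mechanically. Below is a minimal Python sketch, assuming each hit is printed on one line as `rank path "title" size` (as in the output above, before mail wrapping), that header lines start with `#`, and that a lone `.` ends the list. The names `parse_hits` and `relative_paths` are hypothetical helpers, not part of swish-e.

```python
import re
from typing import NamedTuple

# Assumed result-line layout, taken from the (unwrapped) output above:
#   rank path "title" size
HIT_RE = re.compile(r'^(\d+) (\S+) "(.*)" (\d+)$')


class Hit(NamedTuple):
    rank: int
    path: str
    title: str
    size: int


def parse_hits(output: str) -> list[Hit]:
    """Collect result lines, skipping '#' header lines and the lone '.' terminator."""
    hits = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == ".":
            continue
        m = HIT_RE.match(line)
        if m:
            hits.append(Hit(int(m[1]), m[2], m[3], int(m[4])))
    return hits


def relative_paths(hits: list[Hit], site_root: str) -> set[str]:
    """Drop everything up to and including site_root (e.g. '/TX1/') from each path."""
    result = set()
    for h in hits:
        i = h.path.find(site_root)
        result.add(h.path[i + len(site_root):] if i >= 0 else h.path)
    return result
```

Calling `parse_hits` on each saved output and taking `relative_paths(frc, "/FRC/") - relative_paths(tx1, "/TX1/")` lists the pages that matched only on the frames site. Because some file names differ between the sites (devchar.htm vs. devchar_popup.htm), those pages also show up in the difference unless the names are mapped first.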

Why in the world does the same '-t Bthe' switch on (essentially) the same
content return different results?
BTW, the reason we chose to use the switch is that we have content in
the <meta> tags that we don't want returned in queries...
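The `-t` switch restricts a search to words found in particular HTML contexts. As we understand the swish-e documentation, each letter names one context: `H` for the HEAD section, `B` for BODY, `t` for TITLE, `h` for H1 to H6 headers, `e` for emphasized tags (B, I, EM, STRONG) and `c` for comments, so `Bthe` should skip text that appears only in HEAD, such as <meta> content. The toy Python model below illustrates that idea only; it is not swish-e's parser, and `ContextIndexer`, `index_page`, `search` and the tag table are hypothetical. It shows that whether a word matches depends on which tags surround it in each copy of a page.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to swish-e style context letters.
TAG_CONTEXT = {"head": "H", "body": "B", "title": "t",
               "b": "e", "strong": "e", "em": "e", "i": "e"}
TAG_CONTEXT.update({f"h{n}": "h" for n in range(1, 7)})

PUNCT = ".,;:!?\"'()[]"


class ContextIndexer(HTMLParser):
    """Toy indexer: records, for each word, every context letter it appeared under."""

    def __init__(self) -> None:
        super().__init__()
        self.open: list[str] = []  # context letters of currently open tags
        self.words: dict[str, set[str]] = {}

    def _add(self, text: str, letters: set[str]) -> None:
        for raw in text.lower().split():
            word = raw.strip(PUNCT)
            if word:
                self.words.setdefault(word, set()).update(letters)

    def handle_starttag(self, tag, attrs):
        if tag in TAG_CONTEXT:
            self.open.append(TAG_CONTEXT[tag])
        elif tag == "meta":
            # <meta content="..."> words inherit the enclosing context (normally H).
            self._add(dict(attrs).get("content") or "", set(self.open))

    def handle_endtag(self, tag):
        letter = TAG_CONTEXT.get(tag)
        if letter in self.open:
            # Close the innermost matching context and anything opened after it.
            last = len(self.open) - 1 - self.open[::-1].index(letter)
            del self.open[last:]

    def handle_data(self, data):
        self._add(data, set(self.open))

    def handle_comment(self, data):
        # Toy choice: comment text is tagged only with 'c'.
        self._add(data, {"c"})


def index_page(html: str) -> dict[str, set[str]]:
    parser = ContextIndexer()
    parser.feed(html)
    parser.close()
    return parser.words


def search(words: dict[str, set[str]], query: str, mask: str) -> bool:
    """True if the query (trailing * = prefix match) occurs under any letter in mask."""
    prefix = query.endswith("*")
    term = query.rstrip("*").lower()
    allowed = set(mask)
    return any(
        (w.startswith(term) if prefix else w == term) and bool(ctx & allowed)
        for w, ctx in words.items()
    )
```

For example, a page whose only 'affordance' sits in `<meta content="...">` is found with mask `H` but not with `Bthe`, while the same word inside `<body><em>...</em></body>` is found with `Bthe`.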

help -
david
_____________________________________________________
David S. Richard           <mailto:david@monkey.com>

Information Architect
MONKEYmedia - Austin, Texas
(512) 440-8000 x.14, 440-1050 fax
<http://www.monkey.com/>
Received on Wed Nov 25 10:44:27 1998