From: Gentile, Jeff <GentileJ(at)>
Date: Thu Feb 13 2003 - 14:42:30 GMT
[First time posting here]


I am using SWISH to search a knowledge base (read: text files) for my support department. The knowledge base has a CGI/Perl
front end, and all HTML is generated within the script. The main page is an image-mapped flow chart, each box leading to a
"leaf" page pertaining to that (sub)category. Each leaf has various description fields that are associated with the category
by filename. I also have each "tech note" associated with the applicable categories via a "header" that looks something like this:


Parent, Parent/Child, Parent/Child/GrandChild, Parent2/Child2


My script parses the header from each tech note whenever a category page is hit, and displays only those notes with a matching
category. This lets me keep each tech note's own filename and still associate it with categories. I have also added a feature
for tech note names entered through the form interface for edits/adds: normally the name is appended with a ".txt" extension,
but if the filename entered ends in "htm" or "html", I keep that extension and display the note as HTML. This allows folks
to have their tech notes as HTML (otherwise I display the regular text tech notes in a "read-only" text area).
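
To make the association scheme concrete, here is a rough sketch of the matching and extension rules described above. It is
written in Python for illustration only (the real front end is Perl), and the function names are hypothetical. It assumes the
header's category list is a single comma-separated line and that "ends in htm or html" means a ".htm" or ".html" extension.

```python
def parse_categories(header_line):
    """Split a header line such as 'Parent, Parent/Child' into category paths."""
    return [part.strip() for part in header_line.split(",") if part.strip()]


def note_matches(header_line, category):
    """True if the tech note's header lists exactly this category path."""
    return category in parse_categories(header_line)


def note_filename(name):
    """Keep .htm/.html names as entered; otherwise append .txt (assumed rule)."""
    if name.lower().endswith((".htm", ".html")):
        return name
    return name + ".txt"
```

A note headed "Parent, Parent/Child" would then show up on the "Parent" and "Parent/Child" pages, but not on a page for "Child" alone.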


I am trying to get SWISH to ignore the header (the first 5 lines) of the tech notes. However, even if there were a feature that
was the reverse of "TruncateDocSize" and let me skip the first 5 lines, that wouldn't work, because the "description" files
do not have this header (they are associated by name) and would lose their first 5 lines of real content.
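
What I am after is something like the following sketch: drop the header only when a file actually has one, and leave the
description files alone. This is Python for illustration only, not something SWISH provides. It assumes the header always
starts with a "#start_hdr" marker line and is exactly 5 lines long; the constant and function names are hypothetical, and
how such a step would be hooked into indexing is not shown.

```python
HEADER_MARKER = "#start_hdr"  # assumption: first line of every tech note header
HEADER_LINES = 5              # header length as described in this post


def strip_header(text):
    """Return text without its 5-line header, or unchanged if no header is present."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith(HEADER_MARKER):
        return "".join(lines[HEADER_LINES:])
    return text  # description files have no header, so pass them through
```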

I've tried using "IgnoreWords", but this doesn't appear to work for me. "start" is a word I want indexed, and when I use, say,
"#start_hdr" as an ignore word, it is not ignored, to say nothing of something like "Parent/Child". Ignoring the category
strings would have worked out fine, since I could have simply listed the finite set of categories in an "ignorewords" file,
and then, if a "child" was added, had the script update this file.

Does anyone have any ideas as to how I can get SWISH to ignore these headers in this specific subset of files?

This is running on Red Hat 8, and the tech notes are in their own directory structure.

I know that I could try to re-architect the tech note association method that I've devised, but I am trying to avoid that.


