Ok, I was just looking over the README file that is included with 2.1-dev.
The first section of the README is titled "What is Swish-e?"
I'd like to have a clear description of swish, something that helps people
decide if swish is the right program to use. Nothing I dislike more than
finding some software and after reading the introduction still not having a
clear idea what it really does.
Swish is not always the best choice, and I think it would be helpful to
explain this as best as we can, too.
If someone asked me, I'd say that swish-e is more of a tool than an
application. Besides being very fast at indexing and searching, I think
its strong point is the ability to control the input and output to a fine
degree. In other words, it's very customizable, and can be integrated into
an existing web site design.
For example, the soon-to-be-released new mod_perl site contains quite a bit
of documentation. The docs are quite long pages, so search results that
pointed only to a whole page were not that useful. By adding two small
chunks of code to the *config* file for the spider, we were able to make it
split the pages into sections and index them separately. So now search
results point to the specific section of a page, not just a page.
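To give a rough idea of what "split the pages into sections" means, here is
a minimal sketch in Python. It is not the code used on the mod_perl site
(that lives in the spider's config file); the function name, the choice to
split at heading tags, and the way anchors are built from the heading text
are all assumptions made for illustration.

```python
import re

# Matches <h1>..</h1> through <h6>..</h6>, including any attributes.
HEADING = re.compile(r'<h([1-6])[^>]*>(.*?)</h\1>', re.IGNORECASE | re.DOTALL)


def split_into_sections(url, html):
    """Split one HTML page into (section_url, title, body) tuples.

    Hypothetical helper: each heading starts a new section that runs up
    to the next heading (or the end of the page). Text before the first
    heading is ignored in this sketch.
    """
    matches = list(HEADING.finditer(html))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        # Strip inline tags from the heading to get a plain-text title.
        title = re.sub(r'<[^>]+>', '', m.group(2)).strip()
        # Assumed anchor scheme: lower-cased title with dashes.
        anchor = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
        body = html[m.end():end].strip()
        sections.append((f"{url}#{anchor}", title, body))
    return sections
```

Each tuple could then be handed to the indexer as its own document, so a
search hit carries the section's URL rather than the URL of the whole page.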
Swish-e's big weaknesses are lack of incremental indexing (often made up
for by indexing speed), a limit of eight-bit characters, and lack of a
turn-key setup (hence "tool" instead of "application").
Anyway, over the last couple of months or so a few people have written me
saying that they had selected swish after reviewing a number of options.
It might be helpful to others to describe what factors made swish stand
out. Or not.
I made a few minor changes in the README, so this is different from what's
currently on-line.
What is Swish-e?
Swish-e is Simple Web Indexing System for Humans -
Enhanced. Swish-e can quickly and easily index
directories of files or remote web sites and search the
generated indexes.
Swish-e is extremely fast in both indexing and searching,
highly configurable, and can be seamlessly integrated with
existing web sites to maintain a consistent design.
Swish-e can index web pages, but can just as easily index
text files, mailing list archives, or data stored in a
relational database.
Swish-e is an Open Source program supported by developers
and a large group of users. Please take time to join the
Swish-e discussion list at http://Swish-e.org.
Basically, what I'm asking is what *basic* information would have been
helpful to you when you were first learning about (or deciding to use)
swish. Feedback from people who decided not to use swish would be very
useful, too, but unfortunately I doubt there are many of them on the list.
I said *basic* because I'm not talking about a better description of some
config directive or how to implement the search script. Just basic,
initial info.
There's also a "Features" section of the README doc. I've included it
below. Please let me know if there's anything missing or you feel is
incorrect or misleading.
Thanks!
I hope people appreciate my use of "easily"...
Key features
* Quickly index a large number of documents in different formats
including text, HTML, and XML.
* Use "filters" to index other types of files such as PDF, gzip, or
PostScript.
* Includes a web spider for indexing remote documents over HTTP.
Follows Robots Exclusion Rules (including <META> tags).
* Use an external program to supply documents to Swish-e, such as an
advanced spider for your web server, or a program to read and format
records from a relational database management system (RDBMS).
* Document "properties" (some subset of the source document, usually
defined as META or XML elements) may be stored in the index and
returned with search results.
* Document summaries can be returned with each search
* Word stemming and soundex indexing
* Phrase searching and wildcard searching
* Limit searches to HTML links
* Use powerful Regular Expressions to select documents for indexing
* Easily limit searches to parts or all of your web site
* Results can be sorted by relevance or by any number of properties in
ascending or descending order
* Limit searches to parts of documents such as certain HTML tags
(META, TITLE, comments, etc.) or to XML elements.
* Can report structural errors in your XML and HTML documents
* Includes example search scripts
* Swish-e is fast.
* It's open source and FREE! You can customize Swish-e and you can
contribute your fancy new features to the project.
* Supported by on-line user and developer groups
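
As an illustration of the "external program" feature in the list above,
here is a small Python sketch of a program that feeds records to Swish-e.
It assumes the record layout described in the Swish-e documentation for the
"prog" input method (a "Path-Name:" line and a "Content-Length:" line,
then a blank line, then the document); check that against your version.
The function names and the sample records are made up for the example.

```python
import sys


def format_record(path, content):
    """Build one record: headers, a blank line, then the document body.

    Assumption: the indexer wants Content-Length in bytes. Latin-1 is
    used because of the eight-bit character limit; characters outside
    it are replaced rather than raising an error.
    """
    body = content.encode("latin-1", errors="replace")
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return header.encode("latin-1", errors="replace") + body


def write_records(records, out):
    """Write (path, content) pairs to a binary stream, one after another."""
    for path, content in records:
        out.write(format_record(path, content))


if __name__ == "__main__":
    # Hypothetical rows, standing in for the result of a database query.
    rows = [
        ("db/article/1", "<html><title>First</title><body>one</body></html>"),
        ("db/article/2", "<html><title>Second</title><body>two</body></html>"),
    ]
    write_records(rows, sys.stdout.buffer)
```

The point of the design is that Swish-e only sees a stream of documents, so
the same approach works for a database, a mail archive, or a custom spider.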
--
Bill Moseley
mailto:moseley@hank.org
Received on Sun Apr 14 23:08:31 2002