Re: swish

From: Philip Mak <pmak(at)not-real.aaanime.net>
Date: Thu May 31 2001 - 14:59:09 GMT

On Thu, 31 May 2001, Bill Moseley wrote:

> So to answer your question, yes, it will be easy to modify the script to
> read from an external file.  But implementing that will be, as they say,
> left to the reader.

Well, at least the files I'm searching are plain text, so I don't have to
worry about HTML tags.

> >This way, people can use your example script no matter how the original
> >document is stored (text file, Berkeley DB, MySQL, etc.).
>
> Philip, are you indexing files in a MySQL database using -S prog?  If so,
> I'd like to see the script you are using.  There isn't an example yet in
> the swish distribution for indexing from a RDBMS.

Here you go. The attached sql.pl file is the program I run with -S prog.
It calls two utility functions from db.pl: query() executes a MySQL query
and returns the statement handle, and encode() changes <, > and & to &lt;,
&gt; and &amp; so that the message text can't break the XML.
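
For readers without the attachment, the escaping that encode() is described
as doing can be sketched as follows. This is a Python illustration, not the
Perl code from db.pl; only the function name and the three replacements come
from the description above, the rest is an assumption.

```python
def encode(text: str) -> str:
    """Escape the three characters that would otherwise be read as XML markup."""
    # "&" has to be replaced first; otherwise the "&" in the "&lt;" and
    # "&gt;" produced by the later steps would be escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )


print(encode("Tom & Jerry <tj@example.com>"))
```

The order of the replacements is the one detail that is easy to get wrong.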

sql.pl executes a SELECT query on MySQL and outputs each row as a document
in XML format (the rows are e-mail messages being indexed):

<from_name>Philip Mak</from_name>
<from_email>pmak@aaanime.net</from_email>
<subject>Hello</subject>
<message>Hello, world!</message>

from_name, from_email, subject and message are separately searchable
metanames.

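Content-Length is the tricky part: it has to be the exact length of the
document that follows the headers. If it's off by even one, swish reads too
much or too little and the rest of the -S prog input gets out of sync.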
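
One way to end up off by one is to count characters when the length is
needed in bytes. A small Python sketch of the difference, assuming the
output is written as UTF-8 (the encoding is an assumption, nothing in
sql.pl is known to use it):

```python
def content_length(body: str, encoding: str = "utf-8") -> int:
    """Length of the body as it will actually be written, in bytes."""
    return len(body.encode(encoding))


body = "<subject>Café</subject>\n"

# len() counts characters; the encoded form is longer because the
# accented letter takes two bytes in UTF-8.
print(len(body), content_length(body))
```

The trailing newline is part of the body too, so it has to be counted.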

Last-Mtime is used to store the date of the e-mail message.

Path-Name is used to store the "num" of the e-mail message (the primary
key in the MySQL table for the message). When I use swish to search, I get
back a list of "num"s which I can then use to query the MySQL database to
retrieve the whole message.
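
Putting the three headers together, one record in the -S prog stream could
be produced roughly like this. It is a Python sketch under assumptions, not
the attached Perl script: format_record(), the sample values and the UTF-8
encoding are made up for illustration, and Last-Mtime is assumed to be a
Unix timestamp in seconds.

```python
import sys
from xml.sax.saxutils import escape  # standard library; escapes &, < and >


def format_record(num: int, mtime: int, fields: dict) -> bytes:
    """Build one document: three headers, a blank line, then the XML body."""
    body = "".join(
        f"<{name}>{escape(value)}</{name}>\n" for name, value in fields.items()
    )
    payload = body.encode("utf-8")
    header = (
        f"Path-Name: {num}\n"               # primary key of the message
        f"Content-Length: {len(payload)}\n"  # bytes in the body, not characters
        f"Last-Mtime: {mtime}\n"             # date of the message
        "\n"                                 # blank line ends the headers
    )
    return header.encode("ascii") + payload


# Hypothetical row; a real script would loop over the rows of the SELECT.
record = format_record(
    42,
    991321149,
    {
        "from_name": "Philip Mak",
        "from_email": "pmak@aaanime.net",
        "subject": "Hello",
        "message": "Hello, world!",
    },
)
sys.stdout.buffer.write(record)
```

Computing Content-Length from the encoded body, after escaping, keeps the
header and the bytes that follow it in agreement.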

-Philip Mak (pmak@aaanime.net)

Received on Thu May 31 15:03:03 2001