On Thu, 27 Aug 1998, Lars Kellogg-Stedman <lars@bu.edu> wrote:
>
> I have a fairly large collection of email (over 100MB in 20,000 messages),
> and finding a piece of information -- esp. one several months old -- can
> be challenging. In the past I've used glimpse to index the mail, but I
> wanted to give Swish-e a try (easier search syntax, and it looked as
> though it was under more active development).
>
> I had to give up, because after 18 hours it was still churning away. On
> this same collection, glimpse took approx. 3 hours. Is swish-e's
> performance *really* this poor? Are there any faster alternatives out
> there?

Lars,

I'm not sure what the problem is, but there is definitely a problem in
your setup. It sounds to me like it is looping infinitely somehow...

I archive one mailing list that roughly approximates what you are
doing. It has 19,044 messages since 1992, constituting 115MB of
data (prior to indexing). On an Alpha 1000/300 with Digital Unix, it
takes less than two minutes to index.

Our entire Web site is a couple of gigabytes of data. I index it in
logical pieces (about 20 separate indexes), and I regenerate each index
daily using a cron script in the wee hours of the morning.
Generating all 20 or so indexes takes less than 30 minutes, as near as
I can tell from the time stamps on the e-mail confirmations.
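
A nightly job of that kind could be sketched roughly as follows. This is
only an illustration, not the script actually in use here: the directory
layout, the ".conf" and ".index" naming, and the function names
build_command and rebuild_all are all assumptions. The only swish-e
options it relies on are -c (configuration file) and -f (index file).

```python
import subprocess
import time
from pathlib import Path


def build_command(config: Path, index_dir: Path, binary: str = "swish-e") -> list[str]:
    """Build one indexing command for a single config file.

    Assumed naming scheme: foo.conf produces foo.index in index_dir.
    """
    index_file = index_dir / (config.stem + ".index")
    return [binary, "-c", str(config), "-f", str(index_file)]


def rebuild_all(config_dir: Path, index_dir: Path, run=subprocess.run) -> list[tuple[str, float]]:
    """Rebuild every index in turn; return (config name, elapsed seconds) pairs."""
    timings = []
    for config in sorted(config_dir.glob("*.conf")):
        start = time.monotonic()
        # check=True raises if the indexer exits with a non-zero status
        run(build_command(config, index_dir), check=True)
        timings.append((config.name, time.monotonic() - start))
    return timings
```

Calling rebuild_all() from cron and mailing the returned timings gives a
per-index figure rather than one inferred from message time stamps, which
makes an index that never finishes much easier to spot.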
--
Craig A. Summerhill, Systems Coordinator and Program Officer
Coalition for Networked Information
21 Dupont Circle, N.W., Washington, D.C. 20036
Internet: craig@cni.org AT&Tnet (202) 296-5098
Received on Thu Aug 27 22:39:56 1998