
Re: Request for comments on new project

From: Michael Peters <mpeters(at)not-real.plusthree.com>
Date: Thu Jan 06 2005 - 15:46:23 GMT
Dave Seff wrote:
> I have a few reasons why I am doing this:
> 
> 1. A personal learning exercise. 

A very good reason in and of itself :)

> 2. I wanted to write it in C. I wanted to make it as fast as possible.
> Not to say that mod_perl/Apache is not, but I did not want the added
> overhead. My company warehouses hundreds of millions of documents for
> the insurance industry, and they need to be able to search through them
> quickly.

I didn't necessarily mean that it should be written in Perl. I'm
assuming you would need C for performance in an application like this
anyway. I just think it would solve a lot of issues if you built it as a
C module for Apache. Maybe mod_swishe?

> 3. If any of you are familiar with Verity K2 Server, my company is
> looking for an open-source replacement for it. That is, we have
> applications written for it and would like to rewrite as little of our
> main apps as possible. I need something that is completely transparent
> to the applications.

I'm not familiar with Verity K2 (other than what their website says), so
I may be out of line here, but I don't see why Apache would get in the
way of being transparent. With Apache2 you don't even need to use HTTP
as the transport protocol. There are people writing POP, IMAP, and SMTP
daemons with it. You pick/write the protocol, you pick the port... you
customize everything.

> 4. Load balancing isn't exactly the goal I am looking for here. While
> that is fine, I needed a way to take multiple responses from the swish
> results and collate them into a coherent order, whether by date or
> relevance, etc. For example, if you are searching 100 indices across
> 50 machines, the results would normally be ordered by the sequence in
> which your load balancer received them. That may be fine, but I wanted
> the results from the server to be transparent to the client rather than
> have the client figure out how to order them. This is where the
> cluster_mgr comes in. It doesn't do that yet, but it is a project goal.

I'm sorry, I misunderstood the intention of the cluster_mgr. Having it
merge the results from different servers into one ordered list is a
good idea.

I'm not trying to derail your development, just thought I would chime in 
with some other ideas. Apache is working hard to get people to realize 
the power of Apache2 since most people think it's just a web server. 
Just thought I would evangelize a little. It would make a really cool 
article though... 'Distributed Searching with mod_swishe' :)

-- 
Michael Peters
Developer
Plus Three, LP
Received on Thu Jan 6 07:46:23 2005