Re: Again: Indexing laaarge ammounts of data (more than 2GB index size)

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Sep 24 2003 - 17:26:08 GMT
On Wed, Sep 24, 2003 at 04:22:24AM -0700, Peter Asemann wrote:
> Hi there!
> 
> I've taken a look into the swish-e mailing-list archives and found that I
> already got on everybody's nerves in 2001 by asking whether I could have an
> index larger than 2 GB.
> I was told that I could have multiple <2GB index files as a work-around.
> But somehow I'm not feeling comfortable with that 2GB limit.
> I'm not sure whether something has changed in the meantime and swish-e can
> now index terabytes or something...
> but in case there's still that 2 GB barrier, I'd really like to know if and
> how it could be broken.
> Is that 2 GB limit so hard-coded that it's virtually impossible to remove?

I would not say impossible.  Swish is something like 7 or 8 years old 
with twice that number of people hacking on it over the years.  I 
suspect you would need to just go through the code line-by-line and 
clean things up as you go.  Swish-e builds on 64-bit Alphas, but I've 
never tested with large file support or anything that needed more than 
32 bits.  I suspect that signed integer overflow might be the tricky 
part.
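
To illustrate why signed overflow is the tricky part, here is a rough
sketch.  It is not taken from the Swish-e source, and the helper names
are made up for the example.  It assumes file offsets are kept in a
signed 32-bit integer, which tops out at 2^31 - 1 bytes (just under
2 GB).  Adding past that value is undefined behavior in C, so the check
has to happen before the addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the Swish-e source: advance a file
 * offset held in a signed 32-bit integer by n bytes.
 * Returns 0 on success, or -1 if n is negative or the result would
 * exceed INT32_MAX (2 GB - 1). */
int advance_offset32(int32_t *pos, int32_t n)
{
    assert(pos != NULL);
    /* Compare against the remaining headroom instead of adding first,
     * so the signed addition can never overflow. */
    if (n < 0 || *pos > INT32_MAX - n)
        return -1;
    *pos += n;
    return 0;
}

/* The same check with a 64-bit offset: the 2 GB ceiling is gone, but
 * every variable that holds the offset has to be widened to match. */
int advance_offset64(int64_t *pos, int64_t n)
{
    assert(pos != NULL);
    if (n < 0 || *pos > INT64_MAX - n)
        return -1;
    *pos += n;
    return 0;
}
```

With a 32-bit offset sitting at INT32_MAX, advancing by even one byte is
refused, while the 64-bit version accepts it.  On POSIX systems the
usual route is off_t together with fseeko()/ftello() rather than a
hand-rolled type, but any int or long that an offset passes through
along the way would still truncate it.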

While you are in there you could convert to UTF-8, too.

Might be easier to rewrite from scratch.


-- 
Bill Moseley
moseley@hank.org
Received on Wed Sep 24 17:26:45 2003