Re: maximum size of files

From: Dave Stevens <dstevens(at)>
Date: Wed Jan 07 2004 - 09:42:28 GMT
> All released versions of SWISH-E support index files up to 2 GB.
> The index file size will depend completely on what you choose to store
> in the index.  My only advice is to test SWISH-E with your data.

Indeed.  Though my current project is better suited to Nutch, I'm still
using SWISH-E for proof of concept and for early-adopter users.  I
have a couple (of six) indices of just over a million pages total that are
near the 2 GB limit, and I found out the hard way about that limit and
about how many files can be in an index.  ;-)

Basically, the more non-text or non-HTML docs (pdf, xl, doc) and the larger
the description text, the bigger the index file.  I worked around the file
size and crawl duration limitations (some crawls are 120 hrs plus) by
segmenting the indices by content type, sort of poor man's DMOZ style of
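
For anyone wanting to try the same workaround, here is a rough sketch of
the segmenting idea in Python.  It is not part of SWISH-E and not the
script I actually run: the extension-to-segment table, the function names
and the 90% warning threshold are all made up for illustration, and the
2 GB limit is assumed to mean 2 * 1024^3 bytes.

```python
import os
from collections import defaultdict

# Hypothetical extension-to-segment table; adjust to your own content.
SEGMENTS = {
    ".html": "html", ".htm": "html", ".txt": "text",
    ".pdf": "pdf", ".doc": "office", ".xls": "office",
}

# Assumption: the 2 GB index file limit is 2 * 1024**3 bytes.
INDEX_LIMIT_BYTES = 2 * 1024**3


def segment_by_type(paths):
    """Group document paths into one bucket per content type."""
    buckets = defaultdict(list)
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        buckets[SEGMENTS.get(ext, "other")].append(path)
    return dict(buckets)


def near_limit(index_size_bytes, threshold=0.9):
    """True when an index file has reached `threshold` of the size limit."""
    return index_size_bytes >= threshold * INDEX_LIMIT_BYTES
```

Each bucket would then be fed to its own indexing run, producing its own
index file, and near_limit() can be called on os.path.getsize() of each
resulting index to spot a segment that needs splitting further.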

SWISH-E works very well in its intended application.  I've got both the
current snap of Nutch and the 2.4.0 release of SWISH-E making the same
crawls for comparison's sake.  Two vastly different tools for different

Received on Wed Jan 7 09:43:00 2004