

Re: File-System optimization

From: Brad Miele <bmiele(at)>
Date: Wed Oct 18 2006 - 19:14:31 GMT
We have looked at SWISH::Prog::DBI, but we do it this way for a couple of
reasons. The first is that we have vendors who grab the XML files
directly. The other is that we massage the data so much (keyword
expansion, similars calculations, weighting, etc.) that it is just
easier to externalize it.

In any case, you are correct about ls; we are going to start breaking the
directory up into multiple subdirectories.
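For what it's worth, a common way to do this kind of fan-out is to hash each record ID into a fixed number of buckets, so files spread evenly and the mapping stays stable across runs. A minimal sketch in Python (the directory name "IDX" is from the thread; the fanout of 256 and the helper name are just illustrative assumptions):

```python
import hashlib
import os

def subdir_for(record_id, base="IDX", fanout=256):
    """Map a record id to one of `fanout` subdirectories via a stable hash.

    The same id always lands in the same bucket, so re-dumping a record
    overwrites its old file instead of creating a duplicate elsewhere.
    """
    digest = hashlib.md5(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:4], 16) % fanout
    return os.path.join(base, "%03d" % bucket)

# e.g. write each record's XML to subdir_for(record_id)/<record_id>.xml
path = subdir_for(12345)
```

With 900K files and a fanout of 256, each subdirectory holds roughly 3,500 files, which most filesystems (and `ls`) handle comfortably.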


Brad Miele
VP Technology
866 476 7862 x902

On Wed, 18 Oct 2006, Peter Karman wrote:

> Bill Moseley scribbled on 10/18/06 1:57 PM:
>> On Wed, Oct 18, 2006 at 11:03:23AM -0700, brad miele wrote:
>>> hi,
>>> we currently use fs to index our stuff, this is because we are already
>>> dumping xml for every record in our database.
>>> my question is, is it faster to break the directory up into sub
>>> directories? right now, we have one directory called IDX that has all
>>> 900K+ files in it. it takes a very long time for swish-e to move from the
>>> "Checking dir" phase to the actual indexing phase. "a very long time" is
>>> not really quantifiable right now since we generally don't see it
>>> happening and i am just noticing because i am running things manually
>>> today. it has been sitting at this stage for about 1.5 hours so far.
>>> so should i try breaking the directory up into sub directories?
>> Depends on your file system.  But I'd probably break it up into
>> smaller directories.
> besides, must be next to impossible to 'ls' in that dir. ;)
> check out SWISH::Prog::DBI on cpan too. That lets you index directly from db via
> DBI without the intermediate XML files.
> -- 
> Peter Karman  .  .  peter(at)
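To illustrate the idea Peter raises (indexing straight from the database rather than dumping intermediate XML files), here is a hedged sketch in Python with an in-memory SQLite table; this is not the SWISH::Prog::DBI API, and the table/column names are made up for the example:

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO records (title, body) VALUES (?, ?)",
    [("first record", "hello"), ("second record", "world")],
)

def stream_documents(conn):
    """Yield (id, text) pairs directly from the database.

    The indexer consumes rows as they stream, so no XML files
    ever touch the filesystem.
    """
    for rid, title, body in conn.execute("SELECT id, title, body FROM records"):
        yield rid, "%s\n%s" % (title, body)

docs = list(stream_documents(conn))
```

The trade-off matches what Brad describes above: streaming from the DB avoids the huge directory entirely, but it gives up the on-disk XML that vendors pull directly.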
Received on Wed Oct 18 12:14:32 2006