We have looked at SWISH::Prog::DBI, but we do it this way for a couple of
reasons. The first is that we have vendors who grab the XML files
directly. The other is that we massage the data so much (keyword
expansion, similars calculations, weighting, etc.) that it is just
easier to externalize it.
In any case, you are correct about ls. We are going to start breaking
the IDX directory up into multiple directories.
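
A minimal sketch of one way such a split could be done, assuming the files
are bucketed by a short hash prefix of the file name. The names bucket_for
and shard_directory, the IDX_sharded target directory, and the MD5-prefix
scheme are illustrative assumptions, not anything from the existing setup:

```python
import hashlib
import os
import shutil


def bucket_for(filename: str, width: int = 2) -> str:
    """Return a short hex bucket name derived from the file name."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return digest[:width]


def shard_directory(src: str, dst: str, width: int = 2) -> int:
    """Move every regular file in src into dst/<bucket>/ and return the count."""
    # Snapshot the names first so entries are not moved while being iterated.
    with os.scandir(src) as entries:
        names = [e.name for e in entries if e.is_file(follow_symlinks=False)]

    moved = 0
    for name in names:
        target_dir = os.path.join(dst, bucket_for(name, width))
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(target_dir, name))
        moved += 1
    return moved


if __name__ == "__main__":
    # Hypothetical paths: the flat directory and a new sharded tree.
    print(shard_directory("IDX", "IDX_sharded"))
```

With width=2 there are at most 256 buckets, so 900K files would land at
roughly 3,500 per directory. The bucket depends only on the file name, so
anyone fetching a given XML file can compute its path the same way.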
regards,
Brad
---------------------
Brad Miele
VP Technology
IPNStock.com
866 476 7862 x902
bmiele@ipnstock.com
On Wed, 18 Oct 2006, Peter Karman wrote:
>
>
> Bill Moseley scribbled on 10/18/06 1:57 PM:
>> On Wed, Oct 18, 2006 at 11:03:23AM -0700, brad miele wrote:
>>> hi,
>>>
>>> we currently use fs to index our stuff, this is because we are already
>>> dumping xml for every record in our database.
>>>
>>> my question is, is it faster to break the directory up into sub
>>> directories? right now, we have one directory called IDX that has all
>>> 900K+ files in it. it takes a very long time for swish-e to move from the
>>> "Checking dir" phase to the actual indexing phase. "a very long time" is
>>> not really quantifiable right now since we generally don't see it
>>> happening and i am just noticing because i am running things manually
>>> today. it has been sitting at this stage for about 1.5 hours so far.
>>>
>>> so should i try breaking the directory up into sub directories?
>>
>> Depends on your file system. But I'd probably break it up into
>> smaller directories.
>>
>
> besides, must be next to impossible to 'ls' in that dir. ;)
>
> check out SWISH::Prog::DBI on cpan too. That lets you index directly from db via
> DBI without the intermediate XML files.
>
> --
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
>
>
Received on Wed Oct 18 12:14:32 2006