
Re: Can Swish-e index 20 million urls? What is the

From: James Hemphill <hemphill(at)not-real.iotech.net>
Date: Fri Sep 30 2005 - 06:29:34 GMT
We are using the spider to index our records. We break the indexes apart
to increase the speed of searches. We have tried building larger indexes
with 8 million records per file, but the memory usage and return times
on searches were prohibitive. Indexing memory and speed both seem to be
fine; building an index doesn't seem to add much load or memory, just a
lot of disk I/O.
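
In case it helps anyone reproducing a split-index setup like this, below is a
rough sketch of what the build and search commands can look like with stock
swish-e. The config file names, chunk layout and query are only illustrative;
the standard swish-e pieces are the -S prog, -f and -w options and the
IndexFile, IndexDir and SwishProgParameters directives, with IndexDir pointing
at the bundled spider.pl when -S prog is used.

    # chunk-01.conf -- one config per slice of the collection (names made up)
    IndexFile            ./indexes/chunk-01.index
    IndexDir             ./spider.pl              # with -S prog, the program to run
    SwishProgParameters  spider-chunk-01.config   # spider config listing that slice's URLs

    # build each slice as its own index file
    swish-e -c chunk-01.conf -S prog
    swish-e -c chunk-02.conf -S prog
    # ...one run per slice...

    # query several of the smaller indexes at once by passing them all to -f
    swish-e -w 'used books' -m 20 \
        -f ./indexes/chunk-01.index ./indexes/chunk-02.index ./indexes/chunk-03.index

Keeping each index to a couple of million records and handing -f only the
indexes a given search actually needs is, presumably, what keeps per-search
memory closer to the numbers described above.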

On Thu, 29 Sep 2005, Peter Karman wrote:

> Thanks, James. Can you tell us whether that was speed/memory for indexing or
> searching? (I'm assuming indexing, but just wanted to make sure.) Also, are you
> indexing via filesystem or spider (-S prog)?
>
> James Hemphill scribbled on 9/28/05 10:59 PM:
>
>>
>> For what it's worth, at Biblio.com we have around 25 million records
>> indexed using swish-e. The only kludge involved is that we do have to
>> break the index into separate files. We found that putting 2 million
>> listings in each index file was the optimum speed/memory usage point
>> for swish.
>>
>> James Hemphill
>>
>> On Wed, 28 Sep 2005, Peter Karman wrote:
>>
>>> Don't know if you've received replies offlist...
>>>
>>>
>>>> I'm a new user. I want to index 20 million URLs from one server. Is it
>>>> possible with Swish-e?
>>>
>>>
>>> Swish-e is intended for collections of a million docs or less. That
>>> said, some folks on this list have successfully indexed many more.
>>>
>>> If you try Swish-e with 20 million docs, please let us know how it
>>> goes for you.
>>>
>>>
>
>
Received on Thu Sep 29 23:29:41 2005