Wow, I installed version 2.1 and ran the same command, i.e.
/export2/is/search/swish-e -i /export2/is/cerner -c /export2/is/search/user_cerner.config
and the index was built in 2 minutes :-)
I guess I was out of memory...
The -e option did not make it faster in my scenario.
thanks!
Greg
Bill Moseley wrote:
>
> At 07:44 PM 05/31/01 -0700, Greg Caulton wrote:
>
> > Large, well, compared to my other indexes :-)
> >
> > I wish to index a directory with 2800 Word docs, with a combined total
> > size of 720MB.
>
> I think Jose has indexed something like 600,000 docs. (Is that right, Jose?)
>
> > However, the indexing is getting slower and slower as the number of
> > documents indexed increases, and I believe it will run for several
> > hours before slowing to a crawl.
>
> Hard to tell without more information. Are you running out of memory when
> indexing?
>
> Swish 2.1 has a -e "economy" switch to use less memory, but it's currently
> unclear how much it helps. If it keeps you from swapping, then it's a
> big help.
>
> The other issue is with filters. If you are using a shell or (especially)
> a perl script with FileFilters then, yes, it can be very slow because it
> runs the script for every document.
>
> FileFilters are smarter now: for some filters you can skip the shell or perl
> script and run the filter program directly. This still uses popen for every
> document, so a shell is still started for every document, but it's much,
> much faster than running a perl script for every document.
>
> FileFilter .doc "/usr/local/bin/catdoc" "-s8859-1 -d8859-1 '%p'"
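To see where the per-document cost comes from, here is a minimal Python sketch of the pattern described above: build a command line from the FileFilter template, run it through a shell (as popen does), and collect the filter's stdout. The function names are hypothetical, and the assumption that %p expands to the document's path is taken from the config line above, not from swish-e's source.

```python
import subprocess


def run_file_filter(program: str, options: str, path: str) -> bytes:
    """Run one filter command on one document and return its stdout.

    Sketch of what a per-document FileFilter amounts to; not swish-e's code.
    Assumes "%p" in `options` is replaced by the document path. The template
    supplies its own quoting ('%p'), so a path containing a single quote
    would break the command.
    """
    command = f"{program} {options.replace('%p', path)}"
    # shell=True mirrors popen(): a /bin/sh process is started for every call.
    result = subprocess.run(command, shell=True, capture_output=True, check=True)
    return result.stdout


def filter_all(program: str, options: str, paths: list[str]) -> dict[str, bytes]:
    # One shell plus one filter process per document, so the
    # process-startup cost is paid len(paths) times.
    return {path: run_file_filter(program, options, path) for path in paths}
```

With 2800 documents, filter_all starts 2800 shells and 2800 filter processes. If the filter is itself a perl script, the perl interpreter is also started and the script compiled 2800 times, which is the overhead the prog method below avoids.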
>
> Swish 2.1 has a new input method called "prog" where an external program
> feeds documents to swish. So the external program can be a perl script
> that runs (compiles) only one time and stays running while indexing all
> documents.
>
> This can be a very significant increase in indexing speed if you *must* use
> a perl or shell script in your processing.
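A rough Python sketch of a prog-style feeder, assuming the input format documented for swish-e's prog method: for each document, a Path-Name header, a Content-Length header giving the body length in bytes, a blank line, then the body. Check the header names against the prog-bin examples in the 2.1-dev distribution before relying on them; the function names here are hypothetical.

```python
from typing import BinaryIO, Iterable


def prog_record(path: str, content: bytes) -> bytes:
    """Frame one document for a prog-style feeder (assumed header format)."""
    # Content-Length counts bytes, so the body stays as bytes throughout.
    header = f"Path-Name: {path}\nContent-Length: {len(content)}\n\n"
    return header.encode("utf-8") + content


def feed(paths: Iterable[str], out: BinaryIO) -> None:
    """Write every document to `out` (swish-e would read our stdout)."""
    # This loop runs inside one long-lived process: any expensive setup,
    # such as interpreter start-up or loading a converter, happens once.
    for path in paths:
        with open(path, "rb") as fh:
            # A real feeder for Word docs would convert the file to text here.
            out.write(prog_record(path, fh.read()))
    out.flush()
```

Called as feed(paths, sys.stdout.buffer), the whole corpus is streamed from a single process instead of one process per document.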
>
> If you cannot avoid a shell or perl script for filtering, then you should
> probably try using the prog method. There are examples in the prog-bin
> directory of the 2.1-dev distribution. But if you are just indexing Word
> docs, then try that FileFilter command first and let us know what happens.
>
> > Is it possible to merge separate smaller indexes?
>
> Yes, but that only helps if your problem is running low on memory. Otherwise
> it probably won't save you any time.
>
> But you must first find out if you are running out of memory while indexing.
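One way to check, sketched in Python under the assumption of a Linux host: run the indexer as a child process and compare its peak resident set size with the machine's physical memory. The helper names are hypothetical; the calls are standard-library functions, but the sysconf names and the kilobyte unit of ru_maxrss are Linux-specific.

```python
import os
import resource
import subprocess


def physical_memory_kb() -> int:
    # Linux/glibc sysconf names (assumption: a Linux host).
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // 1024


def run_and_report(cmd: list[str]) -> tuple[int, int]:
    """Run cmd to completion; return (exit code, peak RSS in KB).

    RUSAGE_CHILDREN reports the largest child this process has waited for,
    so call this from a fresh process to measure a single indexing run.
    ru_maxrss is in kilobytes on Linux (bytes on macOS).
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss
```

For example, pass the swish-e command from the top of this thread as a list of arguments. A peak close to physical_memory_kb() suggests the machine was swapping, which is the case where -e or merging smaller indexes is most likely to help.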
>
> Bill Moseley
> mailto:moseley@hank.org
Received on Sat Jun 2 04:40:06 2001