[Fwd: Re: Re: Re: Swish-E with incremental index building]

From: Peter Karman <peter(at)>
Date: Sun Dec 05 2004 - 02:41:52 GMT
forgot to cc: the list...

I did a test, just to prove to myself that I understood it. I agree that
the docs need to be updated. The big thing: you can't do both -r and -u
at the same time.

here's my experience:

to create the initial index:

swish-e -f index.idx -c swish-e.config

to update files that have changed:

swish-e -u -f index.idx -c swish-e.config

to remove specific files:

swish-e -r -i filetoremove -f index.idx -c swish-e.config

NOTE that I think the IndexDir directive will probably screw things up
when trying to remove files with -r, since that requires a specific
input file name. So you might want to specify your input dir with the -i
option rather than in the config file.
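
The three commands above can be wrapped so that only one mode runs per invocation. This is only a rough sketch: the `swish_cmd` function and its mode names are made up for illustration, `/some_dir/` is borrowed from the quoted config below, and passing the input dir with -i follows the suggestion in the note above. It prints the command line instead of running it, so it can be checked before anything touches the index.

```shell
# Dry-run helper (hypothetical, not part of swish-e): prints the swish-e
# command for exactly one mode, so -r and -u are never combined.
# Assumes a build done with "configure --enable-incremental".
INDEX="index.idx"
CONFIG="swish-e.config"
INPUT_DIR="/some_dir/"   # given via -i rather than IndexDir in the config

swish_cmd() {
    mode="$1"
    shift
    case "$mode" in
        create)
            echo "swish-e -i $INPUT_DIR -f $INDEX -c $CONFIG"
            ;;
        update)
            echo "swish-e -u -i $INPUT_DIR -f $INDEX -c $CONFIG"
            ;;
        remove)
            # -r needs specific file names to remove
            if [ "$#" -eq 0 ]; then
                echo "remove: need at least one file name" >&2
                return 1
            fi
            echo "swish-e -r -i $* -f $INDEX -c $CONFIG"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, `swish_cmd remove old.html` prints the -r command for that one file, and a later `swish_cmd update` prints the separate -u pass. File names containing spaces would need proper quoting before actually running the printed line.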

I did discover that the -T index_all dump option gives confusing results
on the ReadAllDocProperties vs ReadSingleDocPropertiesFromDisk routines.
You can see the difference if you do this:

swish-e -i file1.xml file2.xml
swish-e -T index_all
(now alter the contents of file1.xml)
swish-e -i file1.xml -u
swish-e -T index_all

the file numbers get incremented when a doc changes, even though the
filename is the same. also, the two read routines seem to be getting
their info in different ways.

wrote on 12/4/04 6:45 PM:

> Hi all,
>
> has anyone real-world knowledge about using the -r and -u switches on a
> build which was done with "configure --enable-incremental"? I really
> don't know how those switches affect the work of swish-e.
>
> My stripped down test configfile:
>
> IndexDir /some_dir/
> IndexOnly .txt .htm .html .doc .xls .pdf
> FileFilter .doc /usr/bin/catdoc "-s8859-1 -dcp1252 '%p'"
> FileFilter .pdf /usr/bin/pdftotext "-htmlmeta -nopgbrk '%p' -"
> IndexContents HTML .pdf
> IndexContents TXT  .doc
>
> This is the command we issue:
>
> swish-e -u -r -f index.idx -c swish-e.config
>
> Whenever we issue the command the index is rebuilt from scratch. Maybe
> I have just misunderstood what -u and -r should actually do?
>
> When I now try the following:
>
> swish-e -u -r -N index.idx -f index.idx -c swish-e.config
>
> As expected, the index isn't rebuilt because of the timestamp check.
> When I now manually remove a file and reindex, this file isn't removed
> from the index.
>
> So please give me a hint what we have done wrong because I'm lost right
> now ;-)
>
> Best Regards,
> Tilo
>
> Hi Peter,
>
> Peter Karman wrote:
>>I think if you do use it, you might consider yourself a "pilot
>>tester" and let us know what you discover. :)
>
> OK, we already have installed the latest build on one of our
> test-machines, so let's see how it really works in practice.
>
>>Even though the incremental feature has been available for a couple
>>release cycles (including the soon-to-be-announced 2.4.3), it really
>>needs more real-world exposure before the 'experimental' label is removed.
>>So try it out, stress it, see if it breaks. The more people who do that,
>>the closer we can collectively come to calling it 'stable'.
>
> I will report my findings back to the list.
>
> Regards,
> Tilo

Peter Karman  .  .  peter(at)

Received on Sat Dec 4 18:41:53 2004