[Fwd: Re: Re: Re: Swish-E with incremental index building]

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Sun Dec 05 2004 - 02:41:52 GMT
forgot to cc: the list...


I did a test, just to prove to myself that I understood it. I agree that
the docs need to be updated. The big thing: you can't do both -r and -u
at the same time.

here's my experience:

to create the initial index:

swish-e -f index.idx -c swish-e.config

to update files that have changed:

swish-e -u -f index.idx -c swish-e.config

to remove specific files:

swish-e -r -i filetoremove -f index.idx -c swish-e.config
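One way to avoid combining -r and -u by accident is to go through a small wrapper that only accepts one mode per call. This is just a sketch: the function name swish_cmd and the INDEX/CONFIG variables are made up for illustration, and it only prints the command line (using the same switches as above) rather than running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of swish-e: prints the command for exactly
# one mode, so -r and -u can never end up on the same command line.
INDEX=${INDEX:-index.idx}
CONFIG=${CONFIG:-swish-e.config}

swish_cmd() {
    mode=$1
    case "$mode" in
        create) echo "swish-e -f $INDEX -c $CONFIG" ;;
        update) echo "swish-e -u -f $INDEX -c $CONFIG" ;;
        remove)
            # -r needs a specific input file, passed with -i
            if [ -z "$2" ]; then
                echo "remove needs a file name" >&2
                return 1
            fi
            echo "swish-e -r -i $2 -f $INDEX -c $CONFIG" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}
```

For example, `swish_cmd remove old.html` prints the remove command for old.html; check the printed line before running it.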

NOTE that I think the IndexDir directive will probably screw things up
when trying to remove files with -r, since that requires a specific
input file name. So you might want to specify your input dir with the -i
option rather than in the config file.
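Since -r wants explicit file names, something outside swish-e has to work out which files have gone away. A rough sketch, assuming you keep your own list of indexed paths in a file (the manifest file and the stale_files function are my invention, not a swish-e feature):

```shell
#!/bin/sh
# Hypothetical helper: print every path listed in the manifest ($1, one path
# per line) that no longer exists on disk.
stale_files() {
    while IFS= read -r path || [ -n "$path" ]; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done < "$1"
}
```

Each path it prints could then be passed to a separate `swish-e -r -i <path> -f index.idx -c swish-e.config` run. I have not tested this against a large index.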

I did discover that the -T index_all dump option gives confusing results
on the ReadAllDocProperties vs ReadSingleDocPropertiesFromDisk routines.
You can see the difference if you do this:

swish-e -i file1.xml file2.xml
swish-e -T index_all
alter contents of file1.xml
swish-e -i file1.xml -u
swish-e -T index_all

the file numbers get incremented when a doc changes, even though the
filename is the same. also, the two read routines seem to be getting
their info in different ways.
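To make the comparison easier to repeat, the steps above can be wrapped so the two dumps are saved and diffed. This is only a sketch: the function names, the SWISH variable, and the before.dump/after.dump file names are made up, and it assumes the -T index_all output goes to standard output.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps above.
SWISH=${SWISH:-swish-e}   # override to point at a specific build

# Index the given files, then save the first dump.
dump_before() {
    $SWISH -i "$@" && $SWISH -T index_all > before.dump
}

# After altering the file by hand, update it with -u, save the second dump,
# and show what changed between the two dumps.
dump_after() {
    $SWISH -i "$1" -u && $SWISH -T index_all > after.dump
    diff before.dump after.dump
}
```

Run `dump_before file1.xml file2.xml`, alter the contents of file1.xml, then run `dump_after file1.xml`.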


tmuetze@alanti.net wrote on 12/4/04 6:45 PM:

> Hi all,
> does anyone have real-world knowledge of using the -r and -u switches on a
> build which was done with "configure --enable-incremental"? I really
> don't know how those switches actually affect the work of swish-e.
> 
> My stripped down test configfile:
> IndexDir /some_dir/
> IndexOnly .txt .htm .html .doc .xls .pdf
> FileFilter .doc /usr/bin/catdoc "-s8859-1 -dcp1252 '%p'"
> FileFilter .pdf /usr/bin/pdftotext "-htmlmeta -nopgbrk '%p' -"
> IndexContents HTML .pdf
> IndexContents TXT  .doc
> 
> This is the command we issue:
> swish-e -u -r -f index.idx -c swish-e.config
> 
> Whenever we issue the command, the index is rebuilt from scratch. Maybe
> I have just misunderstood what -u and -r should actually do?
> 
> When I now try the following:
> swish-e -u -r -N index.idx -f index.idx -c swish-e.config
> 
> As expected, the index isn't rebuilt because of the timestamp check.
> When I now manually remove a file and reindex, this file isn't removed
> from the index.
> 
> So please give me a hint about what we have done wrong because I'm lost right
> now ;-)
> 
> Best Regards,
> Tilo
> 
> Hi Peter,
> 
> Peter Karman wrote:
> 
>>I think if you do use it, you might consider yourself a "pilot 
>>tester" and let us know what you discover. :)
> 
> 
> OK, we already have installed the latest build on one of our
> test-machines, so let's see how it really works in practice.
> 
> 
>>Even though the incremental feature has been available for a couple 
>>release cycles (including the soon-to-be-announced 2.4.3), it really 
>>needs more real-world exposure before the 'experimental' label is removed.
>>
>>So try it out, stress it, see if it breaks. The more people who do that, 
>>the closer we can collectively come to calling it 'stable'.
> 
> 
> I will report my findings back to the list.
> 
> Regards,
> Tilo

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com



Received on Sat Dec 4 18:41:53 2004