Re: [swish-e] incremental mode: problems with Remove

From: Peter Karman <peter(at)>
Date: Wed Oct 31 2007 - 03:20:08 GMT
Judith Retief wrote on 10/30/07 7:59 AM:

> When I dump my XML data to files, and I use the file name for Path-Name
> instead of the GUID, both when I create and update the index and for the
> removal, the file is removed correctly.
> Does the removal functionality only work when files are indexed, as opposed
> to data passed to swish-e via a pipe? So you can't remove files from an
> index if you use -S prog stdin?

I believe you are describing a known issue with trying to remove a document that
does not exist on disk.
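For context, the `-S prog` input method reads documents from a program's output, each one preceded by a small header block: `Path-Name` (the name stored in the index) and `Content-Length` (the body length in bytes), followed by a blank line and the body. Below is a minimal Python feeder sketch, assuming the data arrives as (GUID, XML) pairs. The record data and the `format_prog_record` helper are hypothetical illustrations, not part of swish-e. Whatever string is written as `Path-Name` is the name the index keeps, so that is presumably the value any later removal would have to match.

```python
import sys


def format_prog_record(path_name: str, body: str) -> bytes:
    """Build one document record in the header format read by `swish-e -S prog`.

    Content-Length must be the length of the body in bytes, not characters,
    so the body is encoded before it is measured.
    """
    data = body.encode("utf-8")
    header = f"Path-Name: {path_name}\nContent-Length: {len(data)}\n\n"
    return header.encode("utf-8") + data


if __name__ == "__main__":
    # Hypothetical records: (GUID, XML body) pairs standing in for real data.
    records = [
        ("guid-0001", "<doc><title>First</title></doc>"),
        ("guid-0002", "<doc><title>Second</title></doc>"),
    ]
    out = sys.stdout.buffer
    for guid, xml in records:
        out.write(format_prog_record(guid, xml))
    out.flush()
```

Measuring the encoded bytes matters once the XML contains non-ASCII text, since a character count would then understate `Content-Length`.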

> Another niggle: even when the removal is working (i.e. when I'm
> indexing files), I see that the words appearing only in the removed file
> still show up when doing
>     swish-e -k
> Searching for those words doesn't bring back any results, so the file was
> removed. But I would have thought that the indexed words would also be
> removed if no files refer to them any more. Not that this makes a huge
> difference; I'm just worried about the index files growing too large over
> time.

Yes, I believe the doc and its words are simply flagged as deleted and not
actually removed from the index. I'm not sure whether they are actually removed
by a -M merge, but I seem to recall that being true.
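The behavior described above is a "tombstone" deletion: the document is marked dead, searches filter it out, but the word list is left untouched until something rewrites the index. The toy Python model below illustrates that idea only. It is not swish-e's code, the `TombstoneIndex` class and its methods are hypothetical, and the `compact` step merely models what a merge *might* do, which the post above leaves unconfirmed for `-M`.

```python
from collections import defaultdict


class TombstoneIndex:
    """Toy inverted index where removal only flags a document as deleted."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)  # word -> doc ids
        self.deleted: set[str] = set()  # ids flagged as removed

    def add(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def remove(self, doc_id: str) -> None:
        # Flag only: postings and the word list are left as they were.
        self.deleted.add(doc_id)

    def search(self, word: str) -> set[str]:
        # Deleted docs are filtered out at query time.
        return self.postings.get(word.lower(), set()) - self.deleted

    def words(self) -> list[str]:
        # Analogous to listing the index's words: still includes dead-only words.
        return sorted(self.postings)

    def compact(self) -> None:
        # Rewrite the index, dropping deleted docs and words left with no live docs.
        for word in list(self.postings):
            live = self.postings[word] - self.deleted
            if live:
                self.postings[word] = live
            else:
                del self.postings[word]
        self.deleted.clear()


idx = TombstoneIndex()
idx.add("a", "apple banana")
idx.add("b", "banana")
idx.remove("a")
print(idx.search("apple"), "apple" in idx.words())  # no hits, word still listed
idx.compact()
print("apple" in idx.words())  # word gone after the rewrite
```

This is why the index only shrinks when something rebuilds it: marking a document dead is cheap, while reclaiming its words means rewriting every posting list it touched.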

You are hitting all the known issues with the 2.4 incremental version, and
discovering why it is still labeled 'experimental'. :/

You may want to try out the 2.6 branch, as I suggested earlier, and see whether
the problems are fixed there. The chances that anyone will fix the 2.4 issues are
much lower now that the 2.6 version is in development, since it uses the more
stable Berkeley DB instead of the 2.4 home-grown index format.

Peter Karman  .  .  peter(at)
Received on Tue Oct 30 23:20:07 2007