On 09/14/2007 04:15 PM, William M Conlon wrote:
> The indexing process is not completing, hence the temp files.
> Take a look at the indexer output.
> On Sep 14, 2007, at 2:03 PM, Parker, Peter A CONTRACTOR WRAIR-Wash DC
>> I have recently completed installation of Swish-e on an apache server
>> machine with the following details:
>> Swish-e version: 2.4.5
>> Apache version: 2.0.52
>> I now have approximately 50 files in the directory indexed, including
>> Word, Excel and Powerpoint documents and PDFs. I have gone through the
>> steps outlined for indexing non-text files. Initially, when there were
>> only about 7 files in the html directory the indexing worked fine and
>> command line searches worked flawlessly. Now, after adding more files to
>> the directory (about 50 files), the indexing is not working as it was.
My guess is that one of the filter helper programs (pdftotext, catdoc, etc.)
is choking the indexer and not delivering all the content you expect. Encodings
are often an issue, but there are other possible causes.
>> FileFilter .pdf share/doc/swish-e/examples/filter-bin/_pdf2html.pl
Try running that pdf2html script by itself on some docs.
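
If you'd rather not test the files one by one, here is a rough sketch of a
loop that runs a filter over a list of files and reports the ones that fail
or produce nothing. The function name check_filter is made up for this
example, and it assumes the filter takes the file path as its first argument
and writes the converted text to stdout; check how your filter script is
actually invoked before relying on it.

```python
import subprocess


def check_filter(filter_cmd, files, timeout=60):
    """Run filter_cmd on each file and return (path, problem) pairs.

    Assumption: the filter accepts the file path as its last argument
    and writes the converted text to stdout.
    """
    problems = []
    for path in files:
        try:
            proc = subprocess.run(
                list(filter_cmd) + [str(path)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The helper hung on this document.
            problems.append((str(path), "timed out"))
            continue
        except OSError as exc:
            # The filter itself could not be started (missing, not executable).
            problems.append((str(path), f"could not start filter: {exc}"))
            continue
        if proc.returncode != 0:
            problems.append((str(path), f"exit status {proc.returncode}"))
        elif not proc.stdout.strip():
            # The filter ran but delivered no text for the indexer.
            problems.append((str(path), "no output"))
    return problems
```

Call it with the filter command as a list, e.g. ["perl", "<path to your
filter script>"], plus the list of documents to test. Any file that shows up
in the result is a good candidate for what is stopping the indexer.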
Also, I don't see any FileFilter lines for .doc, .ppt, etc. You might want to
try the DirTree.pl script instead, since it does all of its filtering through
SWISH::Filter rather than FileFilter config options.
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Received on Mon Sep 17 12:26:42 2007