On 09/14/2007 04:15 PM, William M Conlon wrote:
> The indexing process is not completing, hence the temp files.
>
> Take a look at the indexer output.
>
> Bill
>
>
> On Sep 14, 2007, at 2:03 PM, Parker, Peter A CONTRACTOR WRAIR-Wash DC
> wrote:
>
>> Greetings,
>> I have recently completed installation of Swish-e on an Apache server
>> machine with the following details:
>>
>> Swish-e version: 2.4.5
>> Apache version: 2.0.52
>>
>> I now have approximately 50 files in the indexed directory, including
>> Word, Excel and PowerPoint documents and PDFs. I have gone through the
>> steps outlined for indexing non-text files. Initially, when there were
>> only about 7 files in the html directory, the indexing worked fine and
>> command line searches worked flawlessly. Now, after adding more files to
>> the directory (about 50 files), the indexing is not working as it was.
>>
My guess is that one of the filter helper programs (pdftotext, catdoc, etc.)
is choking the indexer and not delivering all the content you expect. Encoding
problems are a common cause, though not the only one.
>> FileFilter .pdf share/doc/swish-e/examples/filter-bin/_pdf2html.pl
Try running that pdf2html script by itself on some docs.
Also, I don't see any FileFilter lines for .doc, .ppt, etc. You might want to
try the DirTree.pl script instead, since it does all of its filtering through
SWISH::Filter rather than FileFilter config options.
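
If running the filter by hand on 50 files is tedious, something like the
following might help narrow down which documents it chokes on. This is only a
rough sketch, not part of Swish-e: the script name (check_filter.py) and the
function names are made up for illustration, and it assumes the filter takes
the document path as its last argument and writes the converted text to
stdout. If your filter expects additional arguments, adjust the command
accordingly.

```python
import subprocess
import sys
from pathlib import Path


def check_filter(filter_cmd, doc_path, timeout=60):
    """Run one filter command on one document; return (ok, detail)."""
    try:
        # Assumption: the filter accepts the document path as its last argument.
        result = subprocess.run(
            list(filter_cmd) + [str(doc_path)],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    except OSError as exc:
        return False, f"could not start filter: {exc}"
    if result.returncode != 0:
        return False, f"exit status {result.returncode}"
    if not result.stdout.strip():
        # A filter that exits cleanly but prints nothing is also suspicious.
        return False, "no output"
    return True, f"{len(result.stdout)} bytes"


def check_directory(filter_cmd, directory, suffix):
    """Return (path, ok, detail) for every file in directory ending in suffix."""
    report = []
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        ok, detail = check_filter(filter_cmd, path)
        report.append((path, ok, detail))
    return report


# usage: python check_filter.py DIR SUFFIX FILTER [FILTER_ARGS...]
if __name__ == "__main__" and len(sys.argv) >= 4:
    directory, suffix, *filter_cmd = sys.argv[1:]
    for path, ok, detail in check_directory(filter_cmd, directory, suffix):
        print(f"{'ok  ' if ok else 'FAIL'} {path}: {detail}")
```

For example, assuming your documents live in the html directory and the filter
is run through perl:

  python check_filter.py html .pdf perl share/doc/swish-e/examples/filter-bin/_pdf2html.pl

Any file reported as FAIL is a good candidate to inspect by hand.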
--
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Received on Mon Sep 17 12:26:42 2007