DirTree works in pipe but not config file on PDF

From: Gertjan Hofman <gertjan_hofman(at)>
Date: Fri Jun 30 2006 - 05:00:48 GMT
Dear Swish users,

I am new to this indexing tool and still trying to
set it up. I ran into a snag that may be my
incompetence or a bug.
I want to use but noticed it was not
parsing PDFs. I have xpdf etc installed.  However, in
piping mode, it works fine.

I made a sample PDF file using soffice. pdftotext
parses it fine.  Using the pipe command:
 ./ swish_test.pdf | swish-e -i stdin -S prog

works like a charm (see complete output below). But
with my simple three-line config file, it fails:

# use spider for the web pages
IndexDir ./

SwishProgParameters ./swish_test.pdf

# end of the config file

See the complete output below: no keywords are found and
it complains about a damaged PDF. What is going on? I am
happy to provide more files, examples, etc.

Much appreciated

 Using swish 2.4.3 on Kubuntu 6.06

gertjan-laptop:~/tmp/swish_test> ./ swish_test.pdf | swish-e -i stdin -S prog

Indexing Data Source: "External-Program"
Indexing "stdin"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 33 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
33 unique words indexed.
4 properties sorted.
1 file indexed.  564 total bytes.  38 total words.
Elapsed time: 00:00:01 CPU time: 00:00:00
Indexing done!


gertjan-laptop:~/tmp/swish_test> swish-e -S prog -c
Indexing Data Source: "External-Program"
Indexing "./"
External Program found: ./
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!

Received on Thu Jun 29 22:00:59 2006