Dear Swish users,
I am new to this indexing tool and still trying to
set it up. I ran into a snag that may be my
incompetence or a bug.
I want to use DirTree.pl but noticed it was not
parsing PDFs. I have xpdf etc installed. However, in
piping mode, it works fine.
I made a sample PDF file using soffice. pdftotext
parses it fine. Using the pipe command:
./DirTree.pl swish_test.pdf | swish-e -i stdin -S prog
works like a charm (see complete output below). But
with my simple 3-line config file, it fails:
# use spider for the web pages
#
IndexDir ./DirTree.pl
SwishProgParameters ./swish_test.pdf
# end of the config file
See the complete output below: no keywords are found and it
complains about a damaged PDF. What is going on? I am
happy to provide more files, examples, etc.
Much appreciated
Gertjan
Using swish 2.4.3 on Kubuntu 6.06
TRY 2: USING PIPE
gertjan-laptop:~/tmp/swish_test> ./DirTree.pl swish_test.pdf | swish-e -i stdin -S prog
Indexing Data Source: "External-Program"
Indexing "stdin"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 33 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
33 unique words indexed.
4 properties sorted.
1 file indexed. 564 total bytes. 38 total words.
Elapsed time: 00:00:01 CPU time: 00:00:00
Indexing done!
TRY 1: USING CONFIG FILE
gertjan-laptop:~/tmp/swish_test> swish-e -S prog -c swish_file.conf
Indexing Data Source: "External-Program"
Indexing "./DirTree.pl"
External Program found: ./DirTree.pl
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
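
For reference, here is a minimal sketch of the stream that swish-e reads in the pipe run above. It assumes the -S prog document format as I understand it from the swish-e documentation: a Path-Name header, a Content-Length header giving the body size in bytes, a blank line, then the document body. The helper emit_doc and the file name swish_test.txt are hypothetical and are not part of swish-e or DirTree.pl.

```shell
#!/bin/sh
# emit_doc PATH BODY
# Hypothetical helper: writes one document to stdout in the
# header-plus-body form assumed for `swish-e -S prog`.
emit_doc() {
    path=$1
    body=$2
    # Content-Length must be the byte count of the body that follows.
    len=$(printf '%s' "$body" | wc -c | tr -d ' ')
    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %s\n' "$len"
    printf '\n'              # a blank line ends the headers
    printf '%s' "$body"
}

# Emit a single plain-text document.
emit_doc swish_test.txt 'hello swish'
```

Piping this script into swish-e -i stdin -S prog should index one small text document with no PDF filter involved, which may help separate a pdftotext problem from a config problem.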
Received on Thu Jun 29 22:00:59 2006