On Tue, Dec 14, 2004 at 02:51:41AM -0800, Andrea Pasquini wrote:
> I use swish-e_2.4.2 and I have a problem with PDF files.
> After launching $ ./swish-e -Sprog -c swish.conf these errors appear in the
> output and the crawler goes on:
> Error: Couldn't find cidToUnicode file for the 'Adobe-WinCharSetFFFF' collection
> Error: Unknown character collection 'Adobe-WinCharSetFFFF'
> Error: Unknown font tag 'R137'
> Error: May not be a PDF file (continuing anyway)
> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> Error: Couldn't find trailer dictionary
> Error: Couldn't read xref table
> http://www.di.unipi.it/sindacati/21set2004.pdf - Using HTML2 parser - (no
> words indexed)
Those messages are output from pdftotext, not from swish-e itself. Running it by hand, this is all I get:
$ pdftotext 21set2004.pdf out.txt
Error: Unknown character collection 'Adobe-WinCharSetFFFF'
Error: Unknown font tag 'R137'
It seems to have generated the output without any other problems.
You might try updating your version of xpdf (the package that provides pdftotext).
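
To check a converter outside of swish-e, something like the following might help. It is only a rough sketch: the function name extracts_text is made up for illustration, and it assumes the converter writes its text to stdout (pdftotext does this when the output file is given as "-") and exits non-zero when it fails outright. The "Error: ..." lines go to stderr and are ignored here, so a file that only produces warnings still counts as success.

```shell
# Hypothetical helper: run a text-extraction command and report whether it
# produced any text on stdout, regardless of what it printed on stderr.
extracts_text() {
    # Capture stdout only; stderr (the "Error: ..." lines) is left alone.
    out=$("$@") || return 2    # non-zero exit: the converter itself failed
    # Succeed only if the output has at least one non-whitespace character.
    printf '%s' "$out" | grep -q '[^[:space:]]'
}

# Example call (assumes pdftotext is installed and the file is in the cwd):
#   extracts_text pdftotext 21set2004.pdf - && echo "text extracted"
```

If this reports success for the file by hand but swish-e still indexes no words, the pdftotext that the spider's filter finds is probably a different (older) one than the one in your shell's PATH.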
Received on Tue Dec 14 07:08:10 2004