various problems on windows

From: Philippe A. <futhark77(at)not-real.gmail.com>
Date: Fri Sep 22 2006 - 23:47:29 GMT
I am having several small problems with Swish-e 2.4.3 on Windows, using
ActivePerl 5.8.8. Any assistance is most welcome. I apologize in advance if
I missed anything obvious.

Thanks!

==========
1. Accented characters do not get translated properly
==========

My cfg is as follows:

IndexFile hrtool.index
IndexDir ../docs
IndexOnly .doc
FileFilter .doc ./lib/swish-e/swish_filter.pl '"%p" "%P"'
TranslateCharacters :ascii7:

A word spelled "montréal" gets converted to "montrcal", as shown by -T
INDEXED_WORDS.
    Adding:[7:swishdefault(1)]   'montrcal'   Pos:2  Stuct:0x9 ( BODY FILE )
    Adding:[7:swishdefault(1)]   'montrcal'   Pos:3  Stuct:0x9 ( BODY FILE )


Other accented letters produce similar odd results.

I tried both of the following options; neither helps:

TranslateCharacters :ascii7:
#TranslateCharacters é e

If I omit TranslateCharacters, words get split at the position of each
accented letter. "Montréal" becomes two words: "montr" and "al".

I need to be able to parse English and French documents. I don't mind
"losing" accented letters during indexing; in fact, I was quite happy when I
read that swish could do that for me.
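
To make the expected behaviour concrete, here is a minimal Python sketch. It
is not how swish-e implements TranslateCharacters; the function names are
made up for illustration. The first function folds accented letters to their
unaccented base letters, which is the result hoped for above. The second
imitates a tokenizer that only accepts a-z as word characters, which
reproduces the "montr" / "al" split seen when no translation is applied.

```python
import re
import unicodedata


def fold_to_ascii(word):
    """Strip diacritics: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_on_non_ascii_letters(text):
    """Hypothetical tokenizer that treats only a-z as word characters."""
    return re.findall(r"[a-z]+", text.lower())


word = "montr\u00e9al"  # "montréal", written with an escape for portability

print(fold_to_ascii(word))               # montreal
print(split_on_non_ascii_letters(word))  # ['montr', 'al']
```

Note that the folding only works if the text is decoded with the right
character encoding first; a filter that emits a different encoding than the
indexer expects could plausibly produce odd substitutions.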

==========
2. Can't locate object method "filter" via package "SWISH::Filter"
==========

I am running the following command:
swish-e -c ..\hrtool.cfg -S prog

swish_filter reports an error and nothing gets indexed.
Can't locate object method "filter" via package "SWISH::Filter"

My cfg is as follows:

IndexFile hrtool.index
IndexDir swish_filter.pl
SwishProgParameters ../docs
TranslateCharacters :ascii7:

In 2.4.2, I have a different error:

Indexing Data Source: "External-Program"
Indexing "swish_filter.pl"
External Program found: C:\phil\pgms\swish\swish-2.4.2\lib\swish-e/swish_filter.pl
Use of uninitialized value in concatenation (.) or string at C:\phil\pgms\swish\swish-2.4.2\lib\swish-e\perl/SWISH/Filter.pm line 341.
Failed to set content type for file reference ''Use of uninitialized value in concatenation (.) or string at C:\phil\pgms\swish\swish-2.4.2\lib\swish-e\swish_filter.pl line 53.
 - Not filtered:  (../docs)
Use of uninitialized value in print at C:\phil\pgms\swish\swish-2.4.2\lib\swish-e\swish_filter.pl line 56.
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.

With a different config, swish_filter will work under 2.4.2 (but never under
2.4.3):

IndexFile hrtool.index
IndexDir ../docs
IndexOnly .doc
FileFilter .doc ./lib/swish-e/swish_filter.pl '"%p" "%P"'
TranslateCharacters :ascii7:

But needless to say, I'd prefer not to have to define individual filters.
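
For context on what -S prog expects: as far as I understand the swish-e
documentation, the external program must print, for each document, a small
header block (at least Path-Name and Content-Length), a blank line, and then
exactly Content-Length bytes of content. The Python sketch below only
illustrates that output format; format_for_swish is a hypothetical helper,
and the sample path and text are invented.

```python
import sys


def format_for_swish(path_name, content):
    """Build one document record in the -S prog output format.

    content must be bytes, because Content-Length counts bytes, not
    characters.
    """
    header = "Path-Name: {}\nContent-Length: {}\n\n".format(
        path_name, len(content)
    )
    return header.encode("utf-8") + content


if __name__ == "__main__":
    # Invented sample document, for illustration only.
    record = format_for_swish("../docs/example.txt", b"hello from montreal\n")
    sys.stdout.buffer.write(record)
```

A program that only converts a single file to text does not emit these
headers by itself, so it may not be usable directly as the IndexDir program
in this mode.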

==========
3. Systematic error on PDF files: "May not be a PDF file"
==========

The complete error is the following:
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table

I get this error with PDFs generated by OpenOffice or by a PDF printer on
Windows.

The options I use to parse them are the following:
IndexOnly .pdf
FileFilter .pdf ./lib/swish-e/pdftotext.exe '"%p" "%P"'
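
One way to narrow this down is to check whether the file handed to the
filter still starts with a PDF header. The Python sketch below assumes that
a valid PDF carries the marker "%PDF-" near the start of the file; the
1024-byte search window is my assumption about how lenient PDF readers are,
and looks_like_pdf is a name made up for this example.

```python
def looks_like_pdf(data, search_window=1024):
    """Return True if the PDF header marker appears near the start.

    search_window is an assumed limit, not a documented pdftotext value.
    """
    return b"%PDF-" in data[:search_window]


def file_looks_like_pdf(path):
    """Read only the start of the file and test it for the PDF marker."""
    with open(path, "rb") as handle:
        return looks_like_pdf(handle.read(1024))


print(looks_like_pdf(b"%PDF-1.4\n%...binary data..."))  # True
print(looks_like_pdf(b"Plain text, not a PDF"))         # False
```

Running file_looks_like_pdf on a document before and after an indexing run
would show whether the file on disk has been changed in between.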



Received on Fri Sep 22 16:47:34 2006