Hi,

I'm trying to configure swish-e on my Windows XP machine to index PDF files, and I would then like to use the CGI script to provide a web interface.

I have installed ActivePerl from the file ActivePerl-5.8.8.817-MSWin32-x86-257965.msi to C:\Perl,
and installed swish-e from the file swish-e-2.4.3-win32.exe to C:\Program Files\SWISH-E.

Therefore:
Swish-e version: 2.4.3
Operating System: Windows XP Version 2002 Service Pack 1

My swish.conf looks like:

IndexName "Hardware Datasheets"
IndexDescription "This is an index of hardware datasheets from external sources."
IndexPointer C:\"Program Files"\SWISH-E
IndexAdmin "Swish-e Configuration Admin (holly.caruso@tenix.com)"
IndexDir P:\\datasheets
IndexOnly .pdf
FileFilter .pdf C:\"Program Files"\\SWISH-E\\share\\doc\\swish-e\\filter-bin\\_pdf2html.pl
MetaNames title subject author swishdocpath
UndefinedMetaTags ignore
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-#,\/=+:
IndexReport 3
IgnoreWords of or and the a to i
TranslateCharacters :ascii7:
BumpPositionCounterCharacters |.
StoreDescription TXT* 10000
StoreDescription HTML* <body> 10000

The _pdf2html.pl looks like this:

#! /usr/bin/perl -w
use strict;

# -- Filter PDF to simple HTML for swish
# --
# -- 2000-05 rasc
#
=pod

This filter requires two programs "pdfinfo" and "pdftotext"...

$ENV{PATH} = C:\\"Program Files"\\SWISH-E\\lib\\swish-e\\

"pdfinfo" extracts...

=cut


my $file = shift || die "Usage: $0 <filename>\n";

#
# -- read pdf meta information
#

..
I have not changed anything else in this file.
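
The $ENV{PATH} line quoted above sits between =pod and =cut, so Perl treats it as documentation and never executes it; as written it is also not valid Perl. A minimal sketch of an executable version follows, placed near the top of the script after "use strict". It assumes pdfinfo.exe and pdftotext.exe really live in C:\Program Files\SWISH-E\lib\swish-e (an assumed location, taken from the POD note) and uses the core Config module for the PATH separator.

```perl
#!/usr/bin/perl -w
use strict;
use Config;   # core module; %Config includes path_sep (';' on Windows)

# Assumed location of pdfinfo.exe and pdftotext.exe (hypothetical;
# adjust to wherever the xpdf tools are actually installed).
my $xpdf_dir = 'C:\\Program Files\\SWISH-E\\lib\\swish-e';

# Prepend that directory to PATH. This must be real code, outside the
# =pod ... =cut block, because Perl skips POD sections entirely.
# Written without the // operator so it also runs on Perl 5.8.
my $old_path = defined $ENV{PATH} ? $ENV{PATH} : '';
$ENV{PATH} = length $old_path
    ? join($Config{path_sep}, $xpdf_dir, $old_path)
    : $xpdf_dir;
```

With PATH set this way, the rest of the filter should be able to find the two programs by name, provided they are actually in that directory.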

I have done what was suggested and run the indexer on a single file with the following command:
C:\Program Files\SWISH-E>swish-e -i AM29LV128.pdf -T indexed_words

I presume this command doesn't use swish.conf. Some of the output from it is as follows:

Adding:[1:swishdefault(1)] '00000' Pos:743 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'n' Pos:744 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '0000058207' Pos:745 Stuct:0x9 ( BODY FILE )

Adding:[1:swishdefault(1)] 'võj' Pos:800 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'nÞwi' Pos:801 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'ß' Pos:802 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'Úõ' Pos:803 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'mÝi' Pos:804 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'þjkÚ´' Pos:805 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'y' Pos:806 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] ' ' Pos:807 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'Ú' Pos:808 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '·¯' Pos:809 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'Úc' Pos:810 Stuct:0x9 ( BODY FILE )
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 306 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
306 unique words indexed.
4 properties sorted.
1 file indexed. 652,348 total bytes. 806 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

It looks like it isn't indexing the words properly, and I don't know how to fix the problem. Any help would be greatly appreciated, as I'm working on a deadline.

Thank you.

Received on Thu Jul 6 01:55:30 2006