Please check whether "pdf-filter.sh" has the correct $PATH available
(put set >/tmp/debug-env or something similar into the script).
In the worst case, put a hard-coded path to pdftotext into the filter
script.
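
A minimal sketch of such a filter script. It assumes that swish-e passes
the file to read as the first argument and indexes whatever the filter
prints on stdout, and that pdftotext is the binary /usr/local/pdftotext
(in the listing below it could also be a directory, so adjust as needed).
The function name pdf_filter and the variables PDFTOTEXT and DEBUG_ENV are
made up for illustration:

```shell
#!/bin/sh
# Sketch of a pdf-filter.sh that does not depend on the caller's $PATH.
# Assumptions: the first argument is the file to convert, and the
# extracted text must go to stdout.

pdf_filter() {
    # Hard-coded location of pdftotext (assumed; adjust to your install).
    pdftotext_bin="${PDFTOTEXT:-/usr/local/pdftotext}"
    # Where to dump the shell variables for debugging.
    debug_env="${DEBUG_ENV:-/tmp/debug-env}"

    # Record the variables (including PATH) the filter actually sees.
    set > "$debug_env"

    if [ "$#" -lt 1 ]; then
        echo "usage: pdf-filter.sh file.pdf" >&2
        return 2
    fi
    if [ ! -x "$pdftotext_bin" ] || [ -d "$pdftotext_bin" ]; then
        echo "pdf-filter.sh: cannot execute $pdftotext_bin" >&2
        return 1
    fi

    # "-" as the output file makes pdftotext write the text to stdout.
    "$pdftotext_bin" "$1" -
}

# Run the filter only when the script is called with arguments.
if [ "$#" -gt 0 ]; then
    pdf_filter "$@"
fi
```

Running the script by hand on one PDF and then comparing /tmp/debug-env
with the dump written during an indexing run should show whether PATH
differs between the two.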
cu - rainer
> -----Original Message-----
> From: Kemp Randy-W18971 [mailto:Randy.L.Kemp@motorola.com]
> Sent: Tuesday, July 31, 2001 6:58 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Indexing pdf files
>
>
> I can't for the life of me get my pdf files to index. My executables
> are in /usr/local/ and they work individually, including pdftotext.
> Could someone please help me with the filters on the Solaris sparc 5.6
> platform with swish-e 2.0.5?
>
> -----> executables for pdftotext and swish-e <-----
> ee110:/usr/local> ls
> bin conf lib perl swish-e
> cmold.d ecg pdftotext psionic swish-search
>
> I am running version 2.0.5 of swish.e
>
> -----> pdf file <-----
> My pdf file is located in ee110:/usr2/apache/htdocs/pdffiles> ls
> requirements.pdf
>
> -----------------------------> config file at
> ee110:/usr2/ecadtesting/swishe-index> ls
> index.swish search.log swisheconf.conf
>
> My config file is
>
> # Sample SWISH configuration file
>
> # Global Networks Technical Support, support@gobalnetworks.com, 5/10/96
>
>
>
> #IndexDir /usr/home/globalne/usr/local/etc/httpd/htdocs/
> IndexDir /usr2/apache/htdocs/pdffiles/
>
>
>
> # This is a space-separated list of files and
>
> # directories you want indexed. You can specify
>
> # more than one of these directives.
>
> # Be sure to change globalne to be your Server login name.
>
>
>
> IndexFile /usr2/ecadtesting/swishe-index/index.swish
>
> # This is what the generated index file will be.
>
>
>
> IndexName "PCS Web Page Index"
>
> IndexDescription "This is a full index of the PCS web site."
>
> IndexPointer "http://ee110.ecg.csg.mot.com:8000/cgi-bin/search.cgi"
>
> IndexAdmin "PCS Technical Support (Randy.L.Kemp@motorola.com)"
>
> # Extra information you can include in the index file.
>
> # You probably want to change the Global Networks references.
>
>
>
> IndexOnly .html .htm .txt .gif .xbm .au .mov .mpg
>
> # Only files with these suffixes will be indexed.
>
>
>
> IndexReport 3
>
> # This is how detailed you want reporting. You can specify numbers
>
> # 0 to 3 - 0 is totally silent, 3 is the most verbose.
>
>
>
> FollowSymLinks yes
>
> # Put "yes" to follow symbolic links in indexing, else "no".
>
>
>
> NoContents .gif .xbm .au .mov .mpg
>
> # Files with these suffixes will not have their contents indexed -
>
> # only their file names will be indexed.
>
>
>
> #ReplaceRules replace "/usr/home/globalne/usr/local/etc/httpd/htdocs" "http://www.globalnetworks.com"
> ReplaceRules replace "/usr2/apache/htdocs" "http://ee110.ecg.csg.mot.com:8000"
>
>
> # ReplaceRules allow you to make changes to file pathnames
>
> # before they're indexed.
>
> # Be sure to change globalne to be your Server login name.
>
>
>
> FileRules pathname contains admin testing demo trash construction confidential
>
> FileRules filename is index.html
>
> FileRules filename contains # % ~ .bak .orig .old old.
>
> FileRules title contains construction example pointers
>
> FileRules directory contains .htaccess
>
> # Files matching the above criteria will *not* be indexed.
>
>
>
> IgnoreLimit 50 100
>
> # This automatically omits words that appear too often in the files
>
> # (these words are called stopwords). Specify a whole percentage
>
> # and a number, such as "80 256". This omits words that occur in
>
> # over 80% of the files and appear in over 256 files. Comment out
>
> # to turn off auto-stopwording.
>
>
>
> IgnoreWords SwishDefault
>
> # The IgnoreWords option allows you to specify words to ignore.
>
> # Comment out for no stopwords; the word "SwishDefault" will
>
> # include a list of default stopwords. Words should be separated by spaces
>
> # and may span multiple directives.
>
> FilterDir /usr2/ecadtesting/shellscripts/
> FileFilter .pdf pdf-filter.sh
>
> -----> Test results with pdf (html docs in directory htdocs work ok) <-----
> My test results are:
>
> ee110:/usr2/ecadtesting/shellscripts> ls
> dailystats.sh ncftpput.sh rkgraph001.sh webalizer.sh
> http-analyze.sh pdf-filter.sh swishe.sh
> Checking dir "/usr2/apache/htdocs/pdffiles/"...
>
> Removing very common words...
> 336 words removed.
> 0 words removed not in common words array:
>
> Writing main index...
> Computing hash table ...
> Writing header ...
> Writing index entries ...
> Writing stopwords ...
> no unique words indexed.
> Writing file index...
> Writing file list ...
> Writing file offsets ...
> Writing MetaNames ...
> Writing offsets (2)...
> no files indexed.
> Running time: Less than a second.
> Indexing done!
> ee110:/usr2/ecadtesting/shellscripts>
>
> ee110:/usr2/ecadtesting/shellscripts> more swishe.sh
> /usr/local/swish-e -c /usr2/ecadtesting/swishe-index/swisheconf.conf
Received on Tue Jul 31 17:13:58 2001