Hello,
I have had swish-e running on a Unix box for years. I now have to
implement it on a Win2000 machine :(. The install was easy; I put it in
C:\tools. I can index and search fine, except that I need to use
FileFilters for pdf, etc. For now I'm just trying to get pdfs working,
and I'm failing.
C:\tools\SWISH-E>swish-e -V
SWISH-E 2.2.3
This is my config file, mostly copied from the Online Docs and other
people's postings:
# Filter Directory
FilterDir C:/tools/SWISH-E/filter-bin
# include all the available filters and mappings for files that we index
# I copied these from some news postings I read...
# I will eventually need all of these to work for me as well.
# They are copied here temporarily; I still need to get the files they reference from somewhere.
# Use the file filter to index pdf files
#FileFilter .pdf c:/tools/SWISH-E/filter-bin/_pdf2html.pl '"%p" -'
#FileFilter .pdf c:/tools/SWISH-E/filter-bin/pdftotext.exe '"%p" -'
#FileFilter .pdf /tools/swish-e/filter-bin/_pdf2html.pl
#FileFilter .PDF /tools/swish_e/filter-bin/_pdf2html.pl
#Filefilter .ppt /tools/xlhtml-0.5.1/bin/ppthtml "'%p'"
#FileFilter .doc /tools/catdoc-0.91.5/bin/catdoc "-a -s8859-1 -d8859-1 '%p'"
#FileFilter .xls /tools/xlhtml-0.5.1/bin/xlhtml "-nc '%p'"
# IndexContents .pdf .PDF
#IndexContents HTML2 .pdf .ppt .PDF .xls
#IndexContents TXT2 .doc .xls .exe .zip .ZIP .tar.Z .tar.gz .tgz .tar
#IndexContents TXT2 .gz .z .Z .ps .rtf
# Define *what* to index
# IndexDir can point to directories and/or files
IndexDir .
# only index files with these extensions
IndexOnly .pdf .asp
# Show basic info while indexing
IndexReport 1
Here are some runs below:
---------------------------------------
# This is a run with the FileFilter for pdf uncommented, as well as IndexContents
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -c swish-e.config
err: IndexContents: Unknown document type ".pdf"
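# For comparison with the error above: the commented-out IndexContents lines
# copied from the news postings put a document type (HTML2, TXT2) before the
# extensions, while the line I uncommented starts directly with .pdf. A
# minimal sketch of a matching pair for pdf, reusing the pdftotext.exe line
# from the config above. Treating pdftotext output as TXT2 is an assumption
# and is untested here.

```text
# Sketch only: the filter path and the TXT2 document type are assumptions.
FileFilter .pdf c:/tools/SWISH-E/filter-bin/pdftotext.exe '"%p" -'
# Document type first, then the extensions it applies to
IndexContents TXT2 .pdf .PDF
```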
# This is a run with all FileFilters/IndexContents commented out and IndexOnly set to .pdf
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -c swish-e.config
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 235 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
235 unique words indexed.
4 properties sorted.
1 file indexed. 712159 total bytes. 660 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
# cool, it indexed the one pdf in the directory
# so I run a search on that pdf, but nothing
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -w Bakajin
# SWISH format: 2.2.3
# Search words: Bakajin
err: no results
.
# This is a run with all FileFilters commented out and IndexOnly set to .pdf .asp
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -c swish-e.config
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 1606 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
1606 unique words indexed.
4 properties sorted.
62 files indexed. 855309 total bytes. 8617 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
# still no results on the pdf file
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -w Bakajin
# SWISH format: 2.2.3
# Search words: Bakajin
err: no results
.
# search on something in an asp file, no problem.
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -w DirList
# SWISH format: 2.2.3
# Search words: DirList
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.031 seconds
1000 ./DirList.asp "Directory Listing of " & strPath & "" 6702
.
What am I missing? I'm floundering a bit, reading all the readmes and such...
Sharon
Received on Tue Sep 30 21:01:53 2003