On Tue, Sep 30, 2003 at 10:47:39PM -0700, jchen@hdc.org.nz wrote:
> I tried to install swish-e on win2k server, there is no problem indexing
> html file, but I found swish-e can not indexing word document which has
> white space in the file name, like "Wordfile.doc" & "copy of Wordfile.doc"
That's probably due to how the filter is written.
> FileFilter .pdf c:/wwwroot/cgi-bin/xpdf/pdftotext.exe '"%p" -'
> FileFilter .doc C:/wwwroot/cgi-bin/catdoc/catdoc.exe "%p"
Try wrapping the double quotes in single quotes, as you already do for the
.pdf filter:
FileFilter .doc C:/wwwroot/cgi-bin/catdoc/catdoc.exe '"%p"'
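That will then include the double quotes as part of the command passed to
your Windows shell. The way you had it, those double quotes were only seen
by swish when it read the config file, so they never reached the shell and
the file name was split at the spaces.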
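To illustrate the two layers of quoting, here is a rough Python sketch. It
is not swish-e's actual config parser: strip_outer_quotes() is a hypothetical
stand-in that uses shlex to mimic a parser that removes one layer of quotes,
build_command() and the path C:/docs/... are made up for the example, and
shlex follows POSIX rules rather than the Windows shell's. It only shows why
the inner double quotes need the outer single quotes to survive.

```python
import shlex


def strip_outer_quotes(config_value: str) -> str:
    """Mimic a config parser that uses quotes for grouping and drops them.

    Assumption: POSIX-style quoting via shlex, not swish-e's real parser.
    """
    return " ".join(shlex.split(config_value))


def build_command(program: str, options_template: str, path: str) -> str:
    """Substitute %p after the config layer has removed its own quotes."""
    options = strip_outer_quotes(options_template)
    return f"{program} {options.replace('%p', path)}"


# Hypothetical file path containing spaces.
path = "C:/docs/copy of Wordfile.doc"

# Config value written as "%p": the only quotes are eaten by the config layer.
unprotected = build_command("catdoc.exe", '"%p"', path)

# Config value written as '"%p"': the single quotes are eaten, the double
# quotes remain and are handed on to the shell.
protected = build_command("catdoc.exe", "'\"%p\"'", path)

print(unprotected)
print(protected)

# What a shell-like splitter would then see as separate arguments.
print(shlex.split(unprotected))
print(shlex.split(protected))
```

In the first case the path ends up as three separate arguments; in the
second it stays a single argument.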
I can't test on Windows right now, but I can try 2.4.0 on my laptop. That's
a different version of swish and a different OS, so it doesn't really
apply... ;)
moseley@laptop:~$ cat c
SwishProgParameters default http://localhost/apache/index.html
IndexDir spider.pl
moseley@laptop:~$ swish-e -c c -S prog -v0
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
Summary for: http://localhost/apache/index.html
Connection: Keep-Alive: 4 (4.0/sec)
Total Bytes: 17,156 (17156.0/sec)
Total Docs: 5 (5.0/sec)
Unique URLs: 5 (5.0/sec)
moseley@laptop:~$ swish-e -w not dkdkd -H0
1000 http://localhost/apache/test.txt "test.txt" 12
1000 http://localhost/apache/doc with spaces.doc "doc with spaces.doc" 2172
1000 http://localhost/apache/test.doc "test.doc" 2172
1000 http://localhost/apache/test.pdf "test.pdf" 12593
1000 http://localhost/apache/index.html "title" 207
--
Bill Moseley
moseley@hank.org
Received on Wed Oct 1 06:31:56 2003