Re: Unable to retrieve documents

From: Bill Moseley <moseley(at)>
Date: Tue Jan 06 2004 - 21:55:18 GMT
On Tue, Jan 06, 2004 at 10:17:06AM -0800, Kaplan, Andrew H. wrote:
> I have set up our webserver such that the swish.cgi page comes up when
> a person wants to retrieve a document.  When the text is entered the
> results screen does appear with the appropriate links to the documents
> in question.  However, users are unable to access the documents.

Seems like if they can't be accessed, then they are not appropriate.

> The results screen does show the names of the files with their
> extensions, i.e. pdf, doc, etc. Immediately under the files the word
> NULL appears in parentheses.

That NULL is covered in the FAQ; see the swish.cgi docs.

> The information about the file, including its modification date, size,
> and path, also appears. Clicking on the file causes the error screen
>
>     Not Found -- The requested url was not found on this server
>
> to appear.

Well, that's just a web server issue -- you have to make sure the paths
point to the right locations.
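One way to check the paths is to take a link from the results page and work out which file the web server would look for. Here is a minimal Python sketch, assuming a plain static mapping where URLs under a prefix (the hypothetical `/apache`) correspond one-to-one to files under a document root (the hypothetical `/srv/www/apache`), with no Alias or rewrite rules in between. `local_file_for_href` is a made-up helper, not part of swish-e.

```python
import os
from urllib.parse import unquote, urljoin, urlparse


def local_file_for_href(results_page_url, href, url_prefix, document_root):
    """Return the file a simple static web server would serve for href.

    Hypothetical helper: assumes URLs under url_prefix map directly onto
    files under document_root (no Alias/rewrite rules). Returns None when
    the link points outside url_prefix.
    """
    # Resolve relative links the way a browser would.
    absolute = urljoin(results_page_url, href)
    # Undo URL escaping so "%20" becomes a real space in the file name.
    path = unquote(urlparse(absolute).path)
    prefix = url_prefix.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None
    return os.path.join(document_root, path[len(prefix):])


if __name__ == "__main__":
    candidate = local_file_for_href(
        "http://localhost/apache/swish.cgi?query=hello",
        "file%20with%20space.txt",
        "/apache",
        "/srv/www/apache",  # hypothetical document root
    )
    print(candidate, "exists" if candidate and os.path.isfile(candidate) else "MISSING")
```

If the file it reports is missing, the path stored in the index probably does not match the server's layout, which would explain the Not Found error.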

You can rewrite the path when indexing (in the swish-e config file)
with ReplaceRules, and you can also prepend text to each path with a
setting in the swish.cgi config file.
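As a rough illustration of that kind of rewrite, the Python sketch below swaps a filesystem prefix for a URL prefix and then escapes the result for use in a link. In the swish-e config file the rewrite itself would be a line along the lines of `ReplaceRules replace "/var/www/docs/" "http://www.example.com/docs/"`; both paths and the helper `indexed_path_to_url` are hypothetical, and stripping only a leading prefix is a simplification of what ReplaceRules does.

```python
from urllib.parse import quote


def indexed_path_to_url(path, fs_prefix, url_prefix):
    """Turn an indexed file path into a link URL.

    Hypothetical helper: strips fs_prefix from the front of path, prepends
    url_prefix, and URL-escapes the remainder (keeping "/" separators).
    """
    if path.startswith(fs_prefix):
        path = path[len(fs_prefix):]
    return url_prefix + quote(path)


if __name__ == "__main__":
    print(indexed_path_to_url(
        "/var/www/docs/reports/q4 summary.pdf",
        "/var/www/docs/",
        "http://www.example.com/docs/",
    ))
    # prints http://www.example.com/docs/reports/q4%20summary.pdf
```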

> The files that are being indexed are either Adobe pdf, MS-Word doc,
> MS-Excel xls, and htm documents. They all have spaces between the words
> in their titles. The server itself has the catdoc, xls2csv, and xpdf
> programs installed.

Spaces between the words in their "titles"? Or do you mean file names? I suspect you
mean file names. You don't give many details, so I can't know for sure, but here's
an example of indexing a file with a space in its name:

Notice that the href is correct:

moseley@bumby:~/apache$ echo "hello" >  "file with space.txt"

moseley@bumby:~/apache$ swish-e -i "file with space.txt" -v0

moseley@bumby:~/apache$ GET http://localhost/apache/swish.cgi?query=hello | grep txt
        <dt>1 <a href="file%20with%20space.txt">file with space.txt</a> <small>-- rank: <b>1000</b></small></dt>
<tr><td><small>Document Path:</small></td><td><small> <b>file with space.txt</b></small></td></tr>
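Note the difference between the two lines: the href is URL-escaped (`%20` for each space), while the link text and the Document Path keep the real spaces. A minimal Python sketch of building such a link, assuming standard URL escaping for the href plus HTML escaping for the text; it mirrors the output above but is not swish.cgi's actual code (swish.cgi is Perl), and `result_link` is a hypothetical name.

```python
import html
from urllib.parse import quote


def result_link(path):
    """Build an <a> tag like the one in the results page above.

    The href is URL-escaped so spaces become %20; the link text is only
    HTML-escaped, so it still shows the real file name.
    """
    return '<a href="{}">{}</a>'.format(quote(path), html.escape(path))


if __name__ == "__main__":
    print(result_link("file with space.txt"))
    # prints <a href="file%20with%20space.txt">file with space.txt</a>
```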

> What do I need to do to correct this problem? Thanks.

Start by posting a few lines like the ones above that demonstrate the problem.

Here's another example with spidering:

moseley@bumby:~/apache$ cp test.pdf "test pdf with spaces.pdf"

moseley@bumby:~/apache$ /usr/local/lib/swish-e/ default http://localhost/apache/test%20pdf%20with%20spaces.pdf | swish-e -S prog -i stdin -v0
/usr/local/lib/swish-e/ Reading parameters from 'default'

Summary for: http://localhost/apache/test%20pdf%20with%20spaces.pdf
Total Bytes: 12,593  (12593.0/sec)
 Total Docs:      1  (1.0/sec)
Unique URLs:      1  (1.0/sec)

moseley@bumby:~/apache$ GET http://localhost/apache/swish.cgi?query=the | grep pdf
        <dt>1 <a href="http://localhost/apache/test%20pdf%20with%20spaces.pdf">http://localhost/apache/test pdf with spaces.pdf</a> <small>-- rank: <b>1000</b></small></dt>
<tr><td><small>Document Path:</small></td><td><small> <b>http://localhost/apache/test pdf with spaces.pdf</b></small></td></tr>
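Here the spider was given the already-escaped URL, and the results page again shows the escaped form in the href and the unescaped form as the Document Path. A quick Python check that the two forms are the same URL, assuming plain paths with no query string (`safe=":/"` keeps the scheme and path separators unescaped, but it would wrongly escape `?`, `=` or `&`):

```python
from urllib.parse import quote, unquote

href = "http://localhost/apache/test%20pdf%20with%20spaces.pdf"

# Unescape for display, as in the "Document Path" line.
display = unquote(href)
print(display)  # prints http://localhost/apache/test pdf with spaces.pdf

# Escaping the display form again gives back the original href.
print(quote(display, safe=":/") == href)  # prints True
```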

Bill Moseley
Received on Tue Jan 6 21:55:26 2004