Re: [swish-e] Not working: FileRules filename (regex or contains)

From: Dr Michael Daly
Date: Mon, 19 Mar 2012 23:53:02 +1100 (EST)
Hi Peter,
I went on a bit of a search with your clue and it's now solved!

Simply by adding:
AddDefaultCharset utf-8
below my <VirtualHost *:104> setting.

This one from the archives also helped:
http://swish-e.org/archive/2007-08/11559.html
Entitled:
The old encoding/length problem with spider.pl

I am not sure what I would do if it was an external server that I had no
control over.

Thanks again!

Michael

Dr Michael Daly wrote on 3/17/12 8:50 AM:

> ---- Response ---
> Status: 200 OK
> Date: Sat, 17 Mar 2012 13:32:15 GMT
> Accept-Ranges: bytes
> ETag: "3a40ada-303d5-4abc872fbe5f7"
> Server: Apache
> Content-Length: 197589
> Content-Type: application/pdf
> Last-Modified: Wed, 31 Aug 2011 07:55:17 GMT
> Client-Date: Sat, 17 Mar 2012 13:32:15 GMT
> Client-Peer: 127.0.0.1:104
> Client-Response-Num: 7
>
> ^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>>> +Fetched 4 Cnt: 802 GET
> http://localhost:104/Annette/Nehos%20bill%20may.pdf  200 OK
> application/pdf 197589 parent:http://localhost:104/Annette/ depth:4
> ?Testing 'filter_content' user supplied function #1
> 'http://localhost:104/Annette/Nehos%20bill%20may.pdf'
>
> Warning: Unknown header line: '/html>Path-Name:
> http://localhost:104/Annette/Nehos Bill_files/' from program spider.pl
> err: External program failed to return required headers Path-Name:
> .
>

That error indicates that swish-e read more (or fewer) bytes than expected,
usually because the document just before the one that triggered the error
reported the wrong content length. Often that is an encoding issue.
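To illustrate the failure mode described above, here is a minimal Python sketch. It is not swish-e's parser or spider.pl's code; read_records and make_record are hypothetical helpers, and the header handling is simplified to two headers. It assumes the length was counted in characters rather than UTF-8 bytes, which is one common way an encoding issue produces a wrong Content-Length. The leftover bytes of the first document then end up glued to the front of the next Path-Name header.

```python
# Simplified illustration only; this is NOT swish-e's actual parser.
KNOWN_HEADERS = {"Path-Name", "Content-Length"}


def read_records(stream: bytes):
    """Parse header/blank-line/content records, trusting Content-Length."""
    records, pos = [], 0
    while pos < len(stream):
        end = stream.index(b"\n\n", pos)  # a blank line ends the headers
        headers = {}
        for line in stream[pos:end].decode("utf-8").split("\n"):
            name, _, value = line.partition(": ")
            if name not in KNOWN_HEADERS:
                raise ValueError(f"Unknown header line: {line!r}")
            headers[name] = value
        start = end + 2
        length = int(headers["Content-Length"])
        records.append((headers["Path-Name"], stream[start:start + length]))
        pos = start + length  # a wrong length leaves the reader out of sync
    return records


def make_record(path: str, body: str, count_bytes: bool) -> bytes:
    """Build one record; count_bytes=False reproduces the assumed bug."""
    data = body.encode("utf-8")
    length = len(data) if count_bytes else len(body)  # characters != bytes
    header = f"Path-Name: {path}\nContent-Length: {length}\n\n"
    return header.encode("utf-8") + data


page = "<html>caf\u00e9</html>"  # 17 characters, 18 bytes in UTF-8
good = make_record("a.html", page, True) + make_record("b.pdf", "PDF", True)
bad = make_record("a.html", page, False) + make_record("b.pdf", "PDF", True)

print(read_records(good))
try:
    read_records(bad)
except ValueError as err:
    print(err)
```

In the bad stream the first record is one byte short, so the trailing ">" of "</html>" is read as the start of the next header line, which has the same shape as the '/html>Path-Name:' warning in the quoted output.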

Try spidering just the problem file and the one just before it, and make
sure they can both complete successfully. You might want to add one or
more of the swish-e -T debug options to see what headers are being read.


--
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users(at)not-real.lists.swish-e.org
http://lists.swish-e.org/listinfo/users


Received on Mon Mar 19 2012 - 13:03:18 GMT