Same result. Note that there are no 'MD5 Duplicates' in the summary;
there are a lot of 'Duplicates', but I don't think duplicates are the
problem. Maybe the problem is 'Content-Encoding: gzip' or
'Transfer-Encoding: chunked'? I've tried various sites; some are indexed
successfully and others are not, and the only difference I see is in
these HTTP response headers.
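
For reference, a spider.pl config with the MD5 check turned off looks
roughly like the sketch below. The base_url and email values are
placeholders, and the layout assumes the usual @servers list from the
sample SwishSpiderConfig.pl; it is not the exact config from this run.

```perl
# Sketch of a spider.pl config file (e.g. SwishSpiderConfig.pl).
# base_url and email are placeholder values, not the site tested above.
@servers = (
    {
        base_url   => 'http://www.example.com/',   # placeholder start URL
        email      => 'webmaster@example.com',     # placeholder contact address
        keep_alive => 1,   # reuse connections, as in the summary above
        use_md5    => 0,   # do not skip documents based on an MD5 of their content
    },
);

1;  # the config file must return a true value
```

With use_md5 set to 0 the spider no longer compares content digests, so
any remaining 'Duplicates' count should come from URLs it has already
seen rather than from identical page bodies.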
On Feb 4, 2008 4:03 PM, Peter Karman <firstname.lastname@example.org> wrote:
> On 02/04/2008 07:16 AM, Alexander Dolgarev wrote:
> > Yet again, I've reinstalled swish-e (version 2.4.5) and have the same
> > effect (or defect):
> > Summary for: <SOME_URL>
> > Connection: Close: 3 (0.0/sec)
> > Connection: Keep-Alive: 224 (1.2/sec)
> > Duplicates: 60 (0.3/sec)
> > Off-site links: 14 (0.1/sec)
> > Total Bytes: 74,442 (402.4/sec)
> > Total Docs: 226 (1.2/sec)
> > Unique URLs: 227 (1.2/sec)
> > text/html: 1 (0.0/sec)
> > spider.pl reports all files as duplicates. Note that this time I've
> > also tried a third-party site. Any suggestions?
> try setting 'use_md5' to false ?
> Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
> Users mailing list
Received on Mon Feb 4 09:20:10 2008