Re: FW: PDF indexing suddenly stopped working

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Dec 02 2005 - 18:35:34 GMT
On Fri, Dec 02, 2005 at 10:09:54AM -0800, Chad Day wrote:
> Sorry, should have provided more detail.. I was doing a swish-e presentation in 30 minutes and then this broke, hence the panicking.
> 
> It doesn't hang or anything, it just skips the PDFs when indexing via HTTP.

Are you using the -S http method?  Don't use that.

I was going to say that the -S http method doesn't filter by default,
but it looks like it does.  If you must use it, look at swishspider to
see what it's doing.

Why don't you use spider.pl instead?  Here's a quick test:

moseley@bumby:~$ cat spider.conf
@servers = ( {
    base_url => 'http://swish-e.org',
    max_files => 4,
    use_default_config => 1,
    email => 'moseley@hank.org',
});

moseley@bumby:~$ /usr/local/lib/swish-e/spider.pl spider.conf > out
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'spider.conf'
/usr/local/lib/swish-e/spider.pl: Max files Reached

Summary for: http://swish-e.org
     Connection: Close:      1  (1.0/sec)
Connection: Keep-Alive:      4  (4.0/sec)
            Duplicates:     71  (71.0/sec)
        Off-site links:     28  (28.0/sec)
           Total Bytes: 30,651  (30651.0/sec)
            Total Docs:      4  (4.0/sec)
           Unique URLs:      5  (5.0/sec)
             text/html:      4  (4.0/sec)


moseley@bumby:~$ swish-e -S prog -i stdin < out
Indexing Data Source: "External-Program"
Indexing "stdin"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 312 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
312 unique words indexed.
4 properties sorted.
4 files indexed.  30,651 total bytes.  1,022 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
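
The two steps above (spider to a file, then index that file on stdin)
can probably be collapsed into one run by letting swish-e start
spider.pl itself.  A minimal sketch, assuming a config file named
swish.conf (a hypothetical name, not from this thread) sitting next to
spider.conf.  As far as I know, with -S prog the IndexDir directive
names the program to run and SwishProgParameters supplies its
arguments:

```
# swish.conf: hypothetical file name, not part of the original message

# With -S prog, IndexDir is the external program that swish-e runs
IndexDir /usr/local/lib/swish-e/spider.pl

# Arguments handed to that program; here, the spider config shown earlier
SwishProgParameters spider.conf
```

This would be run as "swish-e -c swish.conf -S prog".  Check the -S prog
section of the swish-e docs for your version before relying on it.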

-- 
Bill Moseley
moseley@hank.org

Received on Fri Dec 2 10:35:35 2005