I'm trying to index a client site, www.e-caps.com. I'm using Swish-e 2.5.2
and have also tried 2.4.2, with the same results. Some pages index fine, but
one is confusing spider.pl. I get:
Parsing config file 'e-caps.conf'
Indexing Data Source: "External-Program"
External Program found: /usr/local/lib/swish-e/spider.pl
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
http://www.e-caps.com/za/ECP?PAGE=ABOUT_US - Using HTML2 parser - (470
http://www.e-caps.com/za/ECP?PAGE=HOME - Using HTML2 parser - (409 words)
http://www.e-caps.com/za/ECP?PAGE=PRODUCTS_MAIN - Using HTML2 parser - (140
http://www.e-caps.com/za/ECP?PAGE=KNOWLEDGE - Using HTML2 parser - (387
Warning: Unknown header line: 'tml>Path-Name:
from program spider.pl
err: External program failed to return required headers Path-Name:
The knowledge page passes HTML validation, at least structurally, yet for
some reason the spider is left with the extraneous 'tml>' string in front of
the next Path-Name header.
My config is:
# Configuration file for spidering the e-caps site
# Use the "spider.pl" program included with Swish-e
# Define what site to index
SwishProgParameters default http://www.e-caps.com/za/ECP?PAGE=ABOUT_US
and the command is:
swish-e -S prog -c e-caps.conf -v9
Other pages on the site, as the first few log lines show, index fine, but for
some reason the knowledge page makes it blow chunks. Does anyone have any
ideas? If I run with -S http, it works, but I need to use prog because we
have a bunch of PDF files that we want to index.
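
One way to narrow this down might be to check the stream that spider.pl hands to swish-e. The sketch below is only a guess at a diagnostic, not a confirmed explanation. It assumes the -S prog format is a block of "Name: value" headers (Path-Name and Content-Length among them), a blank line, and then exactly Content-Length bytes of document, with plain "\n" line endings. The names check_prog_stream and make_record, and the example.invalid URLs, are made up for illustration. The demo builds a record whose declared length is four bytes short and shows that the leftover 'tml>' then lands in front of the next Path-Name header, which looks like the warning above.

```python
def parse_headers(block: bytes) -> dict:
    """Split a header block into a {lowercased name: value} dict."""
    headers = {}
    for line in block.splitlines():
        name, sep, value = line.partition(b":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def check_prog_stream(data: bytes) -> list:
    """Walk a captured stream record by record (assumed format, see above).

    Returns a list of problem descriptions; an empty list means every
    record started with clean headers right where the previous body ended.
    """
    problems = []
    pos = 0
    last_path = None
    while pos < len(data):
        end = data.find(b"\n\n", pos)  # headers end at the first blank line
        if end == -1:
            problems.append(f"offset {pos}: headers not followed by a blank line")
            break
        block = data[pos:end]
        headers = parse_headers(block)
        if b"path-name" not in headers or b"content-length" not in headers:
            lines = block.splitlines()
            first = lines[0] if lines else b""
            problems.append(f"after {last_path!r}: bad header line {first!r}")
            break
        try:
            length = int(headers[b"content-length"])
        except ValueError:
            problems.append(f"offset {pos}: Content-Length is not a number")
            break
        last_path = headers[b"path-name"]
        pos = end + 2 + length  # skip the blank line and the declared body
    return problems


def make_record(path: str, body: bytes, declared=None) -> bytes:
    """Build one hypothetical record; 'declared' overrides the real length."""
    n = len(body) if declared is None else declared
    return f"Path-Name: {path}\nContent-Length: {n}\n\n".encode() + body


# Demo with synthetic data: the second record under-reports its length by 4.
body = b"<html>b</html>"
stream = (
    make_record("http://example.invalid/a", b"<html>a</html>")
    + make_record("http://example.invalid/b", body, declared=len(body) - 4)
    + make_record("http://example.invalid/c", b"<html>c</html>")
)
for problem in check_prog_stream(stream):
    print(problem)
```

To try this on the real site, the bytes that spider.pl writes to standard output for the same parameters would need to be captured to a file and passed to check_prog_stream. If it reports the knowledge page as the last clean Path-Name, that would suggest the length reported for that page is smaller than the number of bytes actually sent.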
| Mark Morgan
| Senior Programmer/Analyst
| T H E Z A N E R A Y G R O U P , I N C .
| 25 O'Brien Avenue
| Whitefish, MT 59937
Received on Tue Oct 5 09:21:16 2004