I have an intranet application with which I want to use swish-e.
This app uses some cookies for persistent authentication and state
management. I note that spider.pl already handles cookies.
It seems to me that the way to spider the site is to start indexing at
the login form, for example:
http://myintranet.org/login.php?_function=checkpw&username=swishe&password=spider
assuming username/password = swishe/spider for the spider.
This would cause swish-e to log in and receive its cookies. Thereafter it
would use the cookies to maneuver throughout the site.
But this bombs out with:
swish-e -S prog -c spider.config
Indexing Data Source: "External-Program"
Indexing "spider.pl"
sh: line 1: username=swishe: command not found
sh: line 1: .password=swishe: command not found
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
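The "command not found" lines look consistent with the shell treating each unquoted & in the URL as a command separator. A minimal sketch of that reading in Python, not a statement of how swish-e actually builds its command line: the "spider.pl default <url>" command string is an assumption made for illustration, and shell_commands is a hypothetical helper.

```python
import shlex

# URL from the post, joined onto one line.
URL = ("http://myintranet.org/login.php"
       "?_function=checkpw&username=swishe&password=spider")


def shell_commands(command_line):
    """Group tokens into the separate commands a POSIX shell would see.

    Hypothetical helper for illustration only. Needs Python 3.8+, where
    punctuation_chars and whitespace_split can be combined.
    """
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    commands, current = [], []
    for token in lexer:
        if token in ("&", "&&", "|", "||", ";"):
            # An unquoted operator ends the current command.
            if current:
                commands.append(current)
                current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


# Assumed command line: the program name, 'default', then the start URL.
unquoted = shell_commands("spider.pl default " + URL)
quoted = shell_commands("spider.pl default " + shlex.quote(URL))

for label, commands in (("unquoted", unquoted), ("quoted", quoted)):
    print(label)
    for command in commands:
        print("   ", command)
```

Unquoted, the line breaks into three commands, the last two being username=swishe and password=spider on their own. With the URL wrapped in single quotes it stays one argument. Whether quoting survives the way swish-e passes parameters to spider.pl is not something this sketch can show.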
I've tried some escape sequences, but from glancing through the beginning
of spider.pl, it looks like you start with a base URI and, I presume, only
allow search arguments after it.
So rather than mess with your script, it probably makes more sense to
have a back door for the spider, so we start at:
http://myintranet.org/spider.php
This page does the login for the spider, returning the necessary cookies.
Of course, an .htaccess restriction is needed on this page.
So, if I do it that way, will swish-e accept a 302 redirect to index.html,
or should spider.php return the contents of index.html itself? The latter
is a little more involved, since we don't want spider.php to appear in the
index.
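To make the back-door idea concrete, here is a small self-contained Python sketch of the flow: a throwaway local server stands in for the intranet, /spider.php sets a cookie and answers 302, and /index.html is only served when the cookie comes back. Everything here (the DemoSite handler, the cookie name "session", the 127.0.0.1 server) is made up for illustration. It shows what a generic cookie-aware HTTP client does with such a redirect, not what spider.pl does.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Stand-in site: /spider.php logs in, /index.html needs the cookie."""

    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        if self.path == "/spider.php":
            # Back-door login: hand out the cookie and redirect with a 302.
            self.send_response(302)
            self.send_header("Set-Cookie", "session=spider-ok; Path=/")
            self.send_header("Location", "/index.html")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/index.html" and "session=spider-ok" in cookie:
            body = b"<html><body>home</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(403)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_via_back_door():
    """Request /spider.php and report where the client ends up."""
    server = HTTPServer(("127.0.0.1", 0), DemoSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore any proxy settings
            urllib.request.HTTPCookieProcessor(jar),
        )
        base = "http://127.0.0.1:%d" % server.server_port
        with opener.open(base + "/spider.php") as response:
            return response.geturl(), response.status, [c.name for c in jar]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_via_back_door())
```

In this sketch the client stores the cookie from the 302 response, follows the redirect, and the page it finally receives is /index.html, so spider.php never supplies any content of its own.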
Any helpful hints?
Bill Conlon
To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306
office: 650.327.2175
fax: 650.329.8335
mobile: 650.906.9929
e-mail: mailto:bill@tothept.com
web: http://www.tothept.com
Received on Thu Aug 21 21:42:59 2003