On Tue, Aug 19, 2003 at 01:18:38AM -0700, Bucharow Leonard wrote:
>
> Hi Bill and Co.,
>
> First, I may not understand what you mean by:
> > How do humans without javascript follow those links?
> Anyway, unfortunately I can't get the people who create the links to use
> HTML/XML or anything other than the Java plug-in.
I see people use javascript for links where normal html links work fine.
I use one online (bill paying) service where they use javascript links
for a lot of their navigation. Half the time they don't work right and
my forward and back buttons don't work as expected. And I do turn off
javascript at times, and it always takes me a few minutes to figure out why
things are not working.
> Second, I have two other questions:
>
> Can SWISH-E write the index report to a file (e.g. when run from a cron job)?
> If yes, how?
My opinion is that cron jobs are better if they only report errors.
Otherwise you start ignoring the logs.
So I use
swish-e -c config -v0
Otherwise, pipe swish-e's output to grep or awk or perl and extract out
the data you want logged. Swish writes \r to overwrite the percentage
complete, so just writing that to a file might not look too good --
which is why I suggest piping to some program to filter out the data you
want to keep.
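As a rough sketch of that kind of filter, assuming Perl: the script below splits each line on \r, keeps only the last progress update (what a terminal would end up showing), and passes through only lines that match a few patterns. The script name swish-log-filter.pl and the patterns are hypothetical; check them against what your version of swish-e actually prints.

```perl
#!/usr/bin/perl
# swish-log-filter.pl (hypothetical name): keep only the interesting lines
# of swish-e's indexing output so they can be appended to a log file.
use strict;
use warnings;

# Assumed patterns; adjust them to the summary and error lines that
# your swish-e actually prints.
my @keep = (
    qr/files indexed/i,
    qr/elapsed time/i,
    qr/\b(?:err|warn)/i,
);

# Collapse the \r-separated progress updates on a line to the last one
# (what a terminal would end up showing), then return it if it matches
# one of the patterns above, or undef to drop the line.
sub filter_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;               # strip the line ending
    my @parts = split /\r/, $line;      # each \r starts a new progress update
    my $final = @parts ? $parts[-1] : '';
    return ( grep { $final =~ $_ } @keep ) ? $final : undef;
}

while ( my $line = <STDIN> ) {
    my $kept = filter_line($line);
    print "$kept\n" if defined $kept;
}
```

From cron it might be run as something like
swish-e -c config 2>&1 | perl swish-log-filter.pl >> swish-index.log
(the log path is just an example).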
>
> I'm trying to spider not the entire web server but only one web folder (e.g.
> I don't want to spider the Apache manual).
> In SwishSpiderConfig.pl I've set the option:
> base_url => http://host/intranet/
> But spider.pl indexes the entire web server! Am I doing something wrong?
> I've excluded the folder with robots.txt, but I don't understand why I
> can't just specify the folder to index.
The only limit spider.pl imposes by itself is that it indexes one server
(host name) at a time (per section of the spider config file). If you set
base_url => http://host/foo_directory
there's nothing to keep it from indexing any other directory on "host".
But you can use robots.txt to limit what is indexed. You can also set up
a "test_url()" callback function to limit it to, say, just the
"foo_directory" directory. See:
http://swish-e.org/dev/docs/spider.html#CALLBACK_FUNCTIONS
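A minimal sketch of such a callback in SwishSpiderConfig.pl, assuming the @servers layout of the example config that ships with spider.pl, and assuming test_url is called with a URI object as its first argument and that a false return makes spider.pl skip the link (confirm both in the callback docs above). The email address is a placeholder, and the /intranet/ prefix comes from the question.

```perl
# SwishSpiderConfig.pl (sketch only)
# Assumes spider.pl calls test_url with ( $uri, $server ), where $uri is a
# URI object, and skips any URL for which the callback returns false.
@servers = (
    {
        base_url => 'http://host/intranet/',
        email    => 'swish@host.example',    # placeholder contact address

        # Only follow links whose path starts with /intranet/
        test_url => sub {
            my ( $uri, $server ) = @_;
            return $uri->path =~ m{^/intranet/} ? 1 : 0;
        },
    },
);

1;    # conventional true value at the end of a Perl file loaded with do/require
```

Because the check is on the path only, links to other directories on "host" (such as the Apache manual) are rejected, while robots.txt rules can still be used on top of it.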
--
Bill Moseley
moseley@hank.org
Received on Tue Aug 19 12:39:46 2003