Dear Swish users,
I posted a message a while ago asking how to get NTLM
authentication to work with swish-e. It was pointed out
to me that there is a Perl module for this. It probably
works, but unless someone tells me step by step how to
use it, it's a time-consuming route for someone who
doesn't speak Perl. I tried Perl a few years ago, decided
there are many better scripting languages around, and
stopped learning and using it. I'm not trying to start a
flame war here; it's just an observation that if Swish
becomes very Perl-focused in its set-up, it might lose
some users.
Anyway, for anyone else who is Perl-ignorant and wants
to do this: I got around the NTLM authentication required
to reach our intranet by using the ntlmaps proxy server
(NTLM Authorization Proxy Server). I downloaded the
package from SourceForge. My edited config file is
attached below.
Test it by running:
/main.py &
then set the proxy environment variable (csh syntax):
setenv http_proxy http://localhost:5865
and use wget to fetch a file through the proxy:
wget --proxy=on myintranet_serve/index.htm
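For anyone on sh or bash rather than csh, the same check might look like the sketch below. It assumes the proxy is listening on port 5865 (the LISTEN_PORT in the config at the end of this message) and reuses the placeholder host name from above; replace it with your own intranet server. wget picks up http_proxy from the environment, so no extra proxy switch is used here.

```shell
# Bourne-shell (sh/bash) equivalent of the csh setenv line above.
# Port 5865 is the LISTEN_PORT from the proxy config in this message.
http_proxy=http://localhost:5865
export http_proxy

# Fetch one page through the proxy and throw the body away.
# The host name is the placeholder from the message, not a real server.
wget --tries=1 -O /dev/null http://myintranet_serve/index.htm \
  && echo "proxy OK" \
  || echo "fetch failed: check that main.py is running and the NTLM settings are right"
```

If this prints "proxy OK", the spider should be able to reach the same pages through the same proxy address.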
To use with swish-e, in the spider config file
(spider.conf.pl), switch on proxy-ing:
# start of spider.conf.pl
my ($filter_sub, $response_sub) = swish_filter();
my %main_site = (
    base_url       => 'http://hmx-bi35-s6/sitemap.html',
    email          => 'root@bi35-sensorinfo.iac.honeywell.com',
    keep_alive     => 1,            # Try to keep the connection open
    filter_content => $filter_sub,  # use SWISH filter
    test_url       => sub {
        my ($uri, $server) = @_;
        # enable proxy requests
        unless ($::proxy_set++) {
            my $ua = $server->{ua};
            $ua->proxy('http', 'http://localhost:5865');
        }
        # return true if not an image, otherwise false
        return $uri->path !~ /\.(gif|jpeg|png)$/;
    },
);
@servers = ( \%main_site);
# end of spider.conf.pl
And in the main config file I have my usual stuff
plus:
IndexDir spider.pl
SwishProgParameters spider.conf.pl
IndexFile "web.index"
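With those three lines in place, indexing is started through swish-e's "prog" access method, which runs spider.pl and indexes what it fetches. A minimal sketch, assuming the main config file is called swish.conf (a made-up name, use your own), that the ntlmaps proxy is already running, and with "sensor" as an arbitrary example search word:

```shell
# Hypothetical file name for the main swish-e config; use your own.
CONFIG=swish.conf
# Index file name taken from the IndexFile line above.
INDEX=web.index

if command -v swish-e >/dev/null 2>&1; then
  # -S prog tells swish-e to run the program named by IndexDir (spider.pl),
  # passing it the SwishProgParameters value (spider.conf.pl).
  swish-e -S prog -c "$CONFIG" \
    && swish-e -f "$INDEX" -w sensor   # quick search to confirm the index works
else
  echo "swish-e not found in PATH"
fi
```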
The formatting in the above text might be wonky
because I am using Yahoo's poor mail composer.
Cheers
Gertjan
(proxy server config file).
[GENERAL]
LISTEN_PORT:5865
PARENT_PROXY:
PARENT_PROXY_PORT:8080
PARENT_PROXY_TIMEOUT:15
ALLOW_EXTERNAL_CLIENTS:0
FRIENDLY_IPS:
URL_LOG:0
MAX_CONNECTION_BACKLOG:5
[CLIENT_HEADER]
[NTLM_AUTH]
NT_HOSTNAME:
NT_DOMAIN:honeywell
USER:e191564
PASSWORD:
LM_PART:1
NT_PART:0
NTLM_FLAGS: 06820000
NTLM_TO_BASIC:0
Received on Sun Aug 6 18:42:42 2006