
Getting link right in results . . .

From: John Almberg <jalmberg(at)not-real.identry.com>
Date: Thu Jan 23 2003 - 22:37:50 GMT
I am still having trouble getting the link right on the results page. 
I'm now spidering the pages, and the links shown for them look like this:

http://www.domain.comhttp://www.domain.com/index.html

That is, the base URL is being prepended to the page's full URL. I'm 
sure there's a simple config fix for this, but I can't figure it out.
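A minimal Perl sketch of one plausible cause follows. It assumes the results script builds each link by joining a fixed base URL onto the document's swishdocpath. When spider.pl supplies the documents, swishdocpath already holds the full URL, so plain concatenation reproduces the doubled link above. The helper result_link and its arguments are hypothetical. They are not part of swish-e or its search scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: build the link shown on a results page.
# $prefix  - base URL the results script was configured to prepend (assumed)
# $docpath - the document's swishdocpath value
sub result_link {
    my ($prefix, $docpath) = @_;
    # Spidered documents already carry a full URL (scheme://...), so
    # return them untouched; only relative paths get the prefix.
    return $docpath if $docpath =~ m{^[a-z][a-z0-9+.-]*://}i;
    return $prefix . $docpath;
}

my $base = 'http://www.domain.com';

# Naive concatenation reproduces the symptom from the post.
print $base . 'http://www.domain.com/index.html', "\n";

# The guarded helper gives the same link for both kinds of path.
print result_link($base, 'http://www.domain.com/index.html'), "\n";
print result_link($base, '/index.html'), "\n";
```

The check skips the prefix only for paths that already start with a URL scheme. An index built from the file system, which stores relative paths, therefore still gets its links prefixed as before.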


My swish.conf file:

IndexDir ./spider.pl ./MySQL.pl
SwishProgParameters spider.conf
DefaultContents HTML
StoreDescription HTML <body> 200000
IndexContents HTML .htm .html .phtml
MetaNames swishdocpath swishtitle


My spider.conf file:

@servers = (
    {
        base_url    => 'http://www.domainname.org/index23.phtml',
        same_hosts  => [ qw/domainname.org/ ],
        email       => 'jalmberg@identry.com',

        # limit to only .html-like files
        test_url    => sub { $_[0]->path =~ /\.(phtml|shtml|html|htm)$/ },

        delay_min   => .0001,   # Delay in minutes between requests
        max_time    => 10,      # Max time to spider in minutes
        max_files   => 1000,    # Max unique URLs to spider
        max_indexed => 1000,    # Max number of files to send to swish for indexing
        keep_alive  => 1,       # Enable keep-alive requests

        # debug => DEBUG_URL,
    },
);

# Must return true...
1;
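The test_url callback above decides which URLs the spider will fetch. The minimal sketch below applies the same pattern to plain path strings, so it runs without the URI module. In spider.pl the callback receives a URI object, and its path method returns only the path part, without the host or any query string. The helper name html_like_path is hypothetical.

```perl
use strict;
use warnings;

# Same regex as the test_url callback in spider.conf, applied to a plain
# path string instead of $uri->path. Returns 1 if the path would be
# fetched, 0 if the spider would skip it.
sub html_like_path {
    my ($path) = @_;
    return $path =~ /\.(phtml|shtml|html|htm)$/ ? 1 : 0;
}

for my $path ('/index23.phtml', '/docs/page.html', '/', '/images/logo.gif') {
    printf "%-20s %s\n", $path, html_like_path($path) ? 'fetched' : 'skipped';
}
```

A bare directory URL such as http://www.domainname.org/ has the path "/", which does not match the pattern. Links written in that form would therefore be skipped by this filter.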
Received on Thu Jan 23 22:38:08 2003