Re: Getting link right in results . . .

From: John Almberg <jalmberg(at)not-real.identry.com>
Date: Fri Jan 24 2003 - 03:48:22 GMT
Okay . . . figured it out. Commented out prependpath.
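
A likely explanation for why that fixes it: when spidering, spider.pl reports each document's path as its full URL, so any step that prepends a base path on top of that produces the doubled link quoted below. The Perl sketch below illustrates the effect. It assumes the prepend step is plain string concatenation. `join_prefix` and `join_prefix_safe` are hypothetical helpers for illustration, not swish-e code.

```perl
use strict;
use warnings;

# Hypothetical helper: blindly prepends a base path to a document path.
sub join_prefix {
    my ($prefix, $path) = @_;
    return $prefix . $path;
}

# Hypothetical guard: only prepend when the path is not already an absolute URL.
sub join_prefix_safe {
    my ($prefix, $path) = @_;
    return $path if $path =~ m{^[a-z][a-z0-9+.-]*://}i;
    return $prefix . $path;
}

my $prefix = 'http://www.domain.com';
my $path   = 'http://www.domain.com/index.html';    # spidered docs carry full URLs

print join_prefix($prefix, $path), "\n";            # http://www.domain.comhttp://www.domain.com/index.html
print join_prefix_safe($prefix, $path), "\n";       # http://www.domain.com/index.html
print join_prefix_safe($prefix, '/index.html'), "\n";  # http://www.domain.com/index.html
```

With spidered (full-URL) paths, simply not prepending anything, as in the fix above, gives the same result as the guarded version.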

John Almberg wrote:

>I am still having trouble getting the link right on the results page.
>I'm now spidering the pages, and the link shown for each page looks like:
>
>http://www.domain.comhttp://www.domain.com/index.html
>
>That is, the base URL is getting prepended to the actual page URL.
>I'm sure there's a simple config fix for this, but I can't figure it out.
>
>My swish.conf file:
>
>IndexDir ./spider.pl ./MySQL.pl
>SwishProgParameters spider.conf
>DefaultContents HTML
>StoreDescription HTML <body> 200000
>IndexContents HTML .htm .html .phtml
>MetaNames swishdocpath swishtitle
>
>My spider.conf file:
>
>@servers = (
>    {
>        base_url    => 'http://www.domainname.org/index23.phtml',
>        same_hosts  => [ qw/domainname.org/ ],
>        email       => 'jalmberg@identry.com',
>        # limit to only .html-like files
>        test_url    => sub { $_[0]->path =~ /\.(phtml|shtml|html|htm)$/ },
>        delay_min   => .0001,     # Delay in minutes between requests
>        max_time    => 10,        # Max time to spider in minutes
>        max_files   => 1000,      # Max unique URLs to spider
>        max_indexed => 1000,      # Max number of files to send to swish for indexing
>        keep_alive  => 1,         # Enable keep-alive requests
>        # debug => DEBUG_URL,
>    },
>);
>
># Must return true...
>1;
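
The `test_url` filter in the spider.conf above decides which links the spider follows: in spider.pl, `$_[0]` is a URI object and `->path` returns only the path part of the URL. The Perl sketch below applies the same regex to plain path strings to show which URLs pass. `looks_like_html` is a hypothetical name, and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Same pattern as test_url in spider.conf, applied to plain path strings
# instead of a URI object's ->path.
sub looks_like_html {
    my ($path) = @_;
    return $path =~ /\.(phtml|shtml|html|htm)$/ ? 1 : 0;
}

for my $path ('/index23.phtml', '/about.html', '/', '/images/logo.gif', '/page.php') {
    printf "%-20s %s\n", $path, looks_like_html($path) ? 'followed' : 'skipped';
}
```

A bare directory path such as `/` does not match the pattern, so links of the form `http://www.domainname.org/` would be skipped by this filter.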

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~
Identry, LLC
www.identry.com
jalmberg@identry.com
Received on Fri Jan 24 03:48:37 2003