Re: [swish-e] Title and Description URL is wrong

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Sep 18 2009 - 14:38:21 GMT
Ronny Rahardjo wrote on 09/18/2009 01:20 AM:

>  
> However, when we use the latest .idx file (we run a scheduled job 
> every day to reindex the site), the search results show duplicate 
> entries, and some of the URLs are simply wrong, such as:
>  
> http://www.domainname.com//news/article1.html (with an extra '/')
>  
> So I assume that the problem is in the indexing configuration.

safe assumption.

>  
> swish.config
> ===========
>  
> # Include our site-wide configuration settings:
> IncludeConfigFile common.config
>  
> # Specify the program to run
> #IndexDir output.txt
> IndexDir spider.pl
> IndexFile d:/htdocs/www2/cgi-bin/indexdb.idx
> SwishProgParameters default http://www.domainname.com/index.html
>  

Try turning on the debugging output in spider.pl (see the docs for how) 
to see how spider.pl parses the links it follows and which docs it 
emits. I suspect that will reveal something that changed in your 
site layout. You'll also want to read:

http://swish-e.org/docs/spider.html#broken_relative_links
http://swish-e.org/docs/spider.html#debug
http://swish-e.org/docs/spider.html#use_md5
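
To make the duplicate problem concrete, here is a rough sketch of why the extra '/' matters. It is not spider.pl's code (spider.pl is Perl; this is Python purely for illustration), and the function names and page bodies are made up for the example. It assumes the two URLs serve identical content. The first helper shows that the two URLs only become the same string once repeated slashes in the path are collapsed. The second shows the general idea behind skipping documents whose content digest has already been seen, which is roughly what the use_md5 section describes.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path only; the '//' after the scheme is untouched."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def dedupe_by_digest(pages):
    """Keep the first URL seen for each distinct body, keyed by an MD5 of the content."""
    seen = set()
    kept = []
    for url, body in pages:
        digest = hashlib.md5(body).hexdigest()
        if digest in seen:
            continue  # same content was already kept under another URL
        seen.add(digest)
        kept.append(url)
    return kept


# Hypothetical crawl output: same page reached through two different URL strings.
pages = [
    ("http://www.domainname.com/news/article1.html", b"<html>article 1</html>"),
    ("http://www.domainname.com//news/article1.html", b"<html>article 1</html>"),
]

print([collapse_slashes(url) for url, _ in pages])
print(dedupe_by_digest(pages))
```

Either approach hides the symptom. The debug output should still tell you which page is emitting the double-slash link in the first place.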

good luck.


-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri Sep 18 10:38:27 2009