Re: [swish-e] Indexing offsite links

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Mar 19 2010 - 14:07:10 GMT
Paras Fadte wrote on 03/19/2010 12:52 AM:
> Hi,
> 
> Is it possible to index hyperlinks present on a webpage which would be
> referring to some other hosts ? Following is the example
> 
> 
> Example:
> 
> http://mysite.com/index.html  has say 3 hyperlinks viz.
> http://a.com/a.html , http://b.com/b.html , http://c.com/c.html . So
> when I index "http://mysite.com/index.html" using spider.pl and use
> swish.cgi to do a search by using "b.html" in search field with
> metaname selected as "swishdocpath" it should show a clickable
> "http://b.com/b.html" link.
> 
> Is this possible in swish-e ?

Possible, but not exactly as you're describing it.

In your example you are not interested in the contents of 'b.html' but
only in being able to find its URL. You could instead just tell swish-e
to index the <a> href attribute values (the link URLs) under a metaname
with HTMLLinksMetaName:
http://swish-e.org/docs/swish-config.html#htmllinksmetaname
A match then returns mysite.com/index.html, the page containing the
link, rather than b.html itself.
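
A minimal sketch of that setup, assuming the stock spider.pl and a
metaname called "links" (the metaname and the config file name are
arbitrary choices for illustration, not anything swish-e requires):

```
# swish.conf (hypothetical file name)

# Run spider.pl as the input program and hand it the start URL.
IndexDir spider.pl
SwishProgParameters default http://mysite.com/index.html

# Index <a href="..."> values under the metaname "links".
# As far as I recall, the docs note this needs the libxml2 (HTML2) parser.
HTMLLinksMetaName links
```

You would then index with something like `swish-e -S prog -c swish.conf`
and search with `swish-e -w 'links=(b.com)'`. How a string such as
"b.html" is split into words depends on your WordCharacters settings, so
test the query against your own index.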

Otherwise, if you really want to index the contents of files mysite
links to, you could abuse the same_hosts feature:
http://swish-e.org/docs/spider.html#same_hosts

But same_hosts won't really do what you want since it will index the
link under mysite.com rather than b.com.
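
For reference, a sketch of what that same_hosts abuse might look like in
a spider config file, assuming the usual SwishSpiderConfig.pl layout (the
email value is a placeholder, and the host list just mirrors the example
above):

```perl
# SwishSpiderConfig.pl -- illustrative only
@servers = (
    {
        base_url   => 'http://mysite.com/index.html',
        email      => 'you@mysite.com',    # placeholder contact address
        # Declare the offsite hosts as aliases of mysite.com so the
        # spider follows links to them instead of skipping them.
        same_hosts => [ 'a.com', 'b.com', 'c.com' ],
    },
);

1;
```

With this, a link to http://b.com/b.html gets fetched, but it is recorded
as if it lived on mysite.com, which is why the result is not a clickable
b.com link.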

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri Mar 19 10:07:15 2010