
Re: Adding files from external site - suggestions?

From: Rob de Santos AFANA <rdesantos(at)not-real.afana.com>
Date: Tue Mar 09 2004 - 21:56:39 GMT
I thought I would give an update on where I am with this.
To recap, I want to include data from another site in my index. My
commission sales of the other site's products go through the other site,
but data on the available products doesn't show up in my index.

My current plan is as follows:

Use wget to mirror the section of the other site over to mine.  This
will give a set of files under
http://www.afana.com/www.othersite.com/afl/
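As a rough sketch of that layout, assuming wget's default behaviour of creating a top-level directory named after the remote host (i.e. no -nH option) and that the mirror root is served at http://www.afana.com/, a page on the other site would end up at a local URL like this. The helper name mirrored_url is hypothetical.

```python
from urllib.parse import urlsplit


def mirrored_url(remote_url: str, local_base: str = "http://www.afana.com/") -> str:
    """Return where a page from another site appears after mirroring.

    Hypothetical helper. Assumes wget keeps the remote host name as the
    top-level directory (its default unless -nH is given) and that the
    mirror root is served at local_base.
    """
    parts = urlsplit(remote_url)
    local = local_base + parts.netloc + parts.path
    if parts.query:
        # Keep the query string on the end, matching the example URL below.
        local += "?" + parts.query
    return local
```

For example, mirrored_url("http://www.othersite.com/afl/video_detail.asp?vid_id=338") gives the local URL used in the example below.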

Then run Swish-E against that. When results from the index are displayed,
I will need to transform the URLs, presumably with ReplaceRules? For example:

I will have a URL such as:
http://www.afana.com/www.othersite.com/afl/video_detail.asp?vid_id=338
and will have to transform it to:
http://www.othersite.com/cgi-bin/at.pl?a=195711&e=/afl/video_detail.asp?vid_id=338

I think this configuration rule should do it:

ReplaceRules replace "afana.com/www.othersite.com" "othersite.com/cgi-bin/at.pl?a=195711&e="
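A quick way to sanity-check the rule before indexing is to apply the same substitution to the example URL by hand. This Python sketch only mimics a single literal replace; it is not Swish-E's own code, and whether your Swish-E version treats the pattern as a literal string or a regular expression is worth checking in the ReplaceRules documentation (the unescaped dots in the pattern would still match here either way).

```python
# Hypothetical check: mimic the ReplaceRules substitution with str.replace.
PATTERN = "afana.com/www.othersite.com"
REPLACEMENT = "othersite.com/cgi-bin/at.pl?a=195711&e="


def apply_rule(indexed_url: str) -> str:
    """Apply the single replace rule (first occurrence only) to a URL."""
    return indexed_url.replace(PATTERN, REPLACEMENT, 1)


if __name__ == "__main__":
    before = "http://www.afana.com/www.othersite.com/afl/video_detail.asp?vid_id=338"
    print(apply_rule(before))
    # prints http://www.othersite.com/cgi-bin/at.pl?a=195711&e=/afl/video_detail.asp?vid_id=338
```

One thing this makes visible: the target page's own query string is carried unencoded inside e=, so a page URL with more than one &-separated parameter would have the extra parameters read as parameters of at.pl rather than of the target page.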

The portion /cgi-bin/at.pl...&e is the "affiliate" info. It stays the
same no matter which page I want included, and it ensures that a user
clicking on the link in my index credits my site as the referring
seller.

This should mean a search on my site will generate hits in Swish-E and
produce the right link to the other site for the user.

Sound reasonable?
-Rob 

> -----Original Message-----
> From: swish-e@sunsite.berkeley.edu 
> [mailto:swish-e@sunsite.berkeley.edu] On Behalf Of Rob de Santos AFANA
> Sent: Sunday, March 07, 2004 8:48 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: Adding files from external site - suggestions?
> 
> 
> Bill Moseley wrote: 
> > spider.pl just fetches web pages, indexes the content and extracts out
> > the links into a queue of other URLs to index.  Extracted links pointing
> > to other sites are just ignored, unless they are set up as "same_hosts",
> > although that's more for mapping www.foo.com and foo.com to the same
> > host name.
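Bill's description amounts to a breadth-first crawl restricted to one host, with same_hosts listing aliases that count as that host. The Python sketch below models only that link-filtering decision. It takes a hypothetical fetch_links callable instead of doing real HTTP, the names are illustrative rather than spider.pl's actual internals, and unlike spider.pl's same_hosts (as Bill describes it) it does not rewrite alias host names to the main one.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urljoin, urlsplit


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          same_hosts: Iterable[str] = ()) -> list:
    """Return URLs visited in breadth-first order, staying on one site.

    The start URL's host plus any same_hosts aliases count as "this site";
    links pointing at any other host are ignored.
    """
    allowed = {urlsplit(start_url).netloc.lower()}
    allowed.update(h.lower() for h in same_hosts)
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # a real spider would fetch and index the page here
        for link in fetch_links(url):
            absolute = urljoin(url, link)
            if urlsplit(absolute).netloc.lower() not in allowed:
                continue  # off-site link: dropped, as Bill describes
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

With a fake site where the home page links to /a.html, to www.othersite.com and to afana.com/b.html, the crawl keeps only www.afana.com pages unless "afana.com" is passed in same_hosts; the www.othersite.com link is never followed in either case.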
> 
> OK, understood.  Any reason why I couldn't map 
> www.othersite.com/video/ to my host?  
> Particularly if I set 
> up redirection in .htaccess on my site so that 
> www.afana.com/video/ sent users to the other 
> site's pages?
> 
> >
> > If what you want to do is insert the content of another page
> > into the page being indexed then I'd probably use
> > filter_content to scan for the links to the other site, fetch
> > that page or pages and extract the content and add it into
> > the current page being indexed.
> 
> No, not really what I had in mind, though it *might* work.
> I'm waiting to hear from the other site's web guru to see how
> his pages are structured.  If they are "dynamic", i.e.
> regenerated on request, that might complicate this.
>  
> > The extracted links are not available to the filter so you
> > would have to extract them yourself.
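Since the extracted links aren't handed to the filter, the first step of Bill's filter_content approach would be pulling out links to the other site yourself. A minimal Python sketch of that step using the standard library's HTMLParser; the function name links_to_host and matching on the exact host name are assumptions, and an actual filter_content callback for spider.pl would be written in Perl.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_host(html: str, page_url: str, host: str) -> list:
    """Return absolute links in html that point at host (hypothetical helper)."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlsplit(absolute).netloc.lower() == host.lower():
            result.append(absolute)
    return result
```

Each returned URL could then be fetched and its text appended to the page being indexed, which is the part Bill describes doing inside the filter.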
> 
> Shouldn't be that hard, if needed.  Redirection seems simpler 
> though. I'm satisfied if I can simply include the appropriate 
> subset of pages from the other site in my index at this stage. 
> 
> Regards, 
> 
> -Rob
> http://www.afana.com
> 
> 
> 
Received on Tue Mar 9 13:56:39 2004