Re: HTTP indexing: clarification of 'intranet'

From: Ron Samuel Klatchko <rsk(at)not-real.corpmail.brightmail.com>
Date: Wed May 31 2000 - 19:28:38 GMT
arajamani@excite.com wrote:
>    What I would like SWISH-E to therefore do is to index these internal
> sites. Most of the links on the main page of this company 'intranet' lead to
> other sites/pages within the 'intranet' and VERY FEW of the links lead to
> pages that are part of the World Wide Web (and that are not a part of the
> company 'intranet'). I would like SWISH-E to access the 'intranet'
> sites/pages and ignore the WWW sites.
>    I must at this point mention that, as a part of testing, when I ran the
> HTTP spidering on my own web-site (which IS a part of the WWW and NOT a part
> of the company intranet) it worked like a charm. From the company, we would
> like SWISH-E to do exactly the opposite.

As I stated previously, when you start spidering a page, SWISH-E will
only index other pages on the same site (where a site is defined as the
combination of the method and server:port portion of the URL*).  If you
want to index multiple sites, you need to either give multiple starting
pages or use the EquivalentServer setting.
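
To make the "same site" rule concrete, here is a rough Python sketch of the
comparison described above. It is only an illustration and not SWISH-E's
actual code: the names site_key and same_site, the equivalents parameter
(a stand-in for what EquivalentServer lets you declare), and the example
host names are all made up for this message. It also assumes the server:port
text is compared as written, apart from case.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Return (method, server:port), with server:port lowercased.

    Assumption: the server:port text is compared as written apart from
    case, so "host" and "host:80" are treated as different sites here.
    """
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc.lower())


def same_site(start_url, link_url, equivalents=()):
    """Decide whether link_url belongs to the same site as start_url.

    equivalents is a hypothetical list of groups of base URLs that should
    be treated as one site, loosely modelled on the EquivalentServer idea.
    """
    start, link = site_key(start_url), site_key(link_url)
    if start == link:
        return True
    for group in equivalents:
        keys = {site_key(u) for u in group}
        if start in keys and link in keys:
            return True
    return False


if __name__ == "__main__":
    start = "http://intranet.example.com/index.html"
    for link in ("http://INTRANET.example.com/hr/",
                 "http://www.example.org/",
                 "http://intranet.example.com:8080/"):
        print(link, same_site(start, link))
```

Note that nothing in the sketch asks whether a host is "internal" or
"external". A link is followed only if its method and server:port match the
starting page, or if the two servers have been declared equivalent.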

From the point of view of a web client (which is what SWISH-E is in spider
mode), there is absolutely no difference between an intranet and the WWW.
In either case you get pages from an HTTP server.  A server's access
control can be configured to treat clients differently, but that's not
something that you deal with on the SWISH-E side of things.

Anyway, it sounds to me like you haven't even tried to see if things
work correctly.  Please do that and if you have a problem, please post
your issue then.

* The server:port portion is matched with a case-insensitive compare.
This matching was kept purposely simplistic because of the complications
of virtual hosting, where different DNS names that map to the same
physical machine might still map to different logical HTTP servers.

moo
------------------------------------------------------------
        Ron Samuel Klatchko - Senior Software Jester
            Brightmail Inc - rsk@brightmail.com
Received on Wed May 31 15:31:20 2000