
Re: Web Page listing software necessary when

From: AusAqua <ausaqua(at)not-real.yp-connect.net>
Date: Wed Jun 13 2001 - 11:45:14 GMT
Hi,

Thank you for your considered response to my questions, Bill.

> From: Bill Moseley <moseley@hank.org>
> Subject: [SWISH-E] Re: Web Page listing software necessary when
> 
> Just a comment.  Chris is using a version of a CGI script that I wrote that
> required a bunch of modules.  After my dogma was run over by other
> developers, there's now a "lite" version that doesn't require as much work
> to install.

What is the address from which I could download the "lite" version, please?
A pointer to any associated information on this "lite" version would also be
very helpful.
 
> To run the spider.pl you will need to install modules, true, but most of
> the same ones you would need for -S http method.
> 
>> "Do the modules need to be in the same directory as PERL (or Swish), in
>> order to be accessible to PERL & Swish, or can they in fact be loaded
>> anywhere within my web site and still be accessed as required by the web
>> host's Perl program?"
> 
> No.  First, if you have shell access then you should be able to install
> modules locally in your own directory (but not in "web space").  The perl
> modules do not need to be located anywhere special.  You install the
> modules using a PREFIX, and then add a 'use' line in the perl program.  So,
> yes, swish and perl can be located anyplace as long as they can be seen by
> a process running as the web server runs.
> 
OK.  The prospective ISP says I would have shell access, so installation of
the modules seems "do-able".  However, I'm still waiting for a response as to
whether the ISP would add the 'use' lines to their perl program.
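If I understand the PREFIX approach correctly, I'd end up with something like
the sketch below.  The path, the username and the choice of LWP::UserAgent are
just placeholders of mine, and the exact lib directory depends on the perl
version used for the install, so please treat it as illustrative only:

    #!/usr/bin/perl
    # Placeholder example: modules built with something like
    #   perl Makefile.PL PREFIX=/home/andrew/perl
    #   make && make install
    # land outside perl's default search path, so the script adds that path.
    use strict;
    use warnings;

    use lib '/home/andrew/perl/lib/perl5/site_perl';

    use LWP::UserAgent;   # now found in the private library above

    print "LWP loaded from $INC{'LWP/UserAgent.pm'}\n";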


>> Another approach on which I'd appreciate comment, might be to load swish,
>> Perl, the modules described above, and a copy of my Web Site onto a Linux
>> platform hard drive, then do the indexing on that drive, before FTPing the
>> indexes and swish-e into appropriate directories on the hosted web site
>> (also Linux platform).
> 
> The index can be copied around, but not swish-e unless you are sure you
> share the same platform as the target system.

OK.  So is it enough to have both my local computer (containing the Web Site
copy, and on which I would build swish-e) and the ISP's server on a Linux
platform, or do I further need to ensure that the same version of Linux (say
5.6.1) and the same server software are running on both my local computer and
the ISP's server?
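In other words, my local workflow would presumably be something like the
following sketch, with made-up file names and on the assumption that swish-e
and a working config already exist on the local machine:

    #!/usr/bin/perl
    # Sketch of the "index locally, upload only the index" idea.
    use strict;
    use warnings;

    # Index the local copy of the site with the filesystem method (no spidering).
    system( 'swish-e', '-c', 'site.conf', '-S', 'fs' ) == 0
        or die "local indexing failed: $?";

    # Only the generated index files (e.g. index.swish-e and index.swish-e.prop)
    # need to be FTPed to the ISP; the swish-e binary itself is platform
    # dependent and should be built on, or for, the target system.
    print "Index built; upload the index files to the hosted site.\n";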
> 
>> Now to my question to the group:
>> 
>> "Are the modules (described above) needed only at the time of swish's
>> indexing of the web site, or are they also needed at such times as someone
>> accesses the Web Page to do a "search" of the indexes on the site ?"
> 
> Most of those are for spidering.  And most are part of the LWP bundle of
> modules.  LWP is widely used, and thus is commonly installed on machines,
> so you may not need to install any modules.  It's needed for both the -S
> http and -S prog (with spider.pl) methods.  The MD5 module is not really
> needed.  The HTML:: parsing modules are needed to parse links from
> documents.  None of this is needed if you are not spidering your web site,
> and can use the -S fs method.
> 
> My experience is that although ISPs offer perl and many perl modules, it's
> less likely to find an ISP that keeps things up to date.  I tend to keep my
> own perl library, regardless of what the ISP offers.
> 
> Does that answer your question?

Almost.  I think the issue might be: "Do I need to use spidering to achieve
my aims?"  If I can answer this, then I might be able to determine whether
what the prospective ISP offers will be sufficient for me to run swish-e.
I'm still not quite clear whether the spidering function is needed for what I
want to do (owing mainly to a lack of knowledge about how spidering works and
how exactly it interacts with Swish), and am therefore not clear whether I
will need to install the modules.  My imagination suggests a likely role but
is not to be relied upon.

Perhaps if I illustrate just what I want to do, by way of example and with
the help of a simple .gif diagram (DIRECTORIES.gif is attached and should be
visible at the bottom of this e-mail, without opening it), it may be easier
to comment on whether my aims can be achieved without spidering (and
therefore without the Perl modules).

With reference to the sample figure DIRECTORIES.gif, I would want to
generate (and be able to search) independent indices & dictionaries at the
following directory levels:
1. DirO: inclusive of words within the Title or MetaTag of files x.html,
y.html, z.html, 1.html, 2.html, 3.html, a.html, b.html, c.html.
2. SubDir1: inclusive of words within the Title or MetaTag of files y.html,
1.html, 2.html, 3.html.
3. SubDir2: inclusive of words within the Title or MetaTag of files z.html,
a.html, b.html, c.html.

NB: in the above example, the lack of a hyperlink to 1.html, 2.html, 3.html
& b.html is intentional.

So the specific question I'm asking is: "Is spidering required to create and
search the indices and dictionaries at the various directory levels shown
above?"
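If the -S fs method Bill mentions is enough, I imagine the directory-level
indexes could be produced with something like the rough sketch below.  The
layout (SubDir1 and SubDir2 sitting under DirO), the config file names and
the directives shown are my own guesses, so this is illustrative rather than
tested:

    #!/usr/bin/perl
    # Rough sketch: one filesystem (-S fs) index per directory level,
    # no spidering and no extra Perl modules.
    use strict;
    use warnings;

    my %levels = (
        'index-diro'    => 'DirO',            # x, y, z, 1, 2, 3, a, b, c
        'index-subdir1' => 'DirO/SubDir1',    # y, 1, 2, 3
        'index-subdir2' => 'DirO/SubDir2',    # z, a, b, c
    );

    while ( my ( $index, $dir ) = each %levels ) {
        open my $conf, '>', "$index.conf" or die "cannot write $index.conf: $!";
        print $conf "IndexDir $dir\n",
                    "IndexFile $index.swish-e\n",
                    "IndexOnly .html .htm\n";
        close $conf;
        system( 'swish-e', '-c', "$index.conf", '-S', 'fs' ) == 0
            or die "indexing $dir failed";
    }

    # Searching one of the indexes later would just be, for example:
    #   swish-e -f index-subdir1.swish-e -w keyword

Restricting the indexes to words in the Title or MetaTags would presumably
need further swish-e configuration that I haven't shown here.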

If my question is unclear, perhaps someone could give a pointer to a web
page that fully describes the function of swishspider: whether it is active
only when indexing is performed, or whether it is also active when a search
is initiated through the Web browser.


Andrew L.


Received on Wed Jun 13 12:05:52 2001