Hi,
I want to share my notes and hope they will help other users of this great
search engine. The result of this setup can be seen at
http://search.statmaster.sdu.dk. Thanks for your help (that must be Bill :-) )!
Best,
Peter
-------------------------------------------------------------------------
INSTALLING SWISH-E, APACHE AND MOD_PERL FOR WINDOWS 2000
1. PERL
- Must be installed before Swish-e, since the Swish-e installer will install
some Perl modules according to the current Perl version.
- Install into 'C:\Perl'
2. SWISH-E
- Install into 'C:\SWISH-E'
- To create the DB, use the web spider. It is run from the command line.
The text file 'swish.cfg' defines how the DB is created.
- Create the directory 'C:\SWISH-E\web_index', put the following text file
in it and name it 'swish.cfg'
############################################################################
#
# run this cfg with: "swish-e -S http -c swish.cfg"
# to see which metanames your index is using:
# swish-e -f index.swish-e -T INDEX_METANAMES
IndexFile myindex.tmp
IndexName myindex.tmp
IndexDir http://www.statmaster.sdu.dk
IndexOnly .html .htm
IndexReport 1
Delay 1
IndexContents HTML* .html .htm
StoreDescription HTML* <body> 10000
# create metaname for the docpath to search using 'select_by_meta'
MetaNames course
ExtractPath course regex !^.*/courses/([^/]+)/.*$!$1!
ExtractPathDefault course other
# create a metaname for headings to search in
MetaNames headings
MetaNameAlias headings h1 h2
MetaNames title description swishdocpath
PropertyNames title headings
MetaNamesRank 10 title
MetaNamesRank 8 headings
MetaNamesRank -5 wrongwords
############################################################################
#
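As a rough illustration of the ExtractPath rule above, here is a small Python sketch of how each spidered URL ends up with a 'course' value. Swish-e applies the regex itself; Python is not part of this setup, and the function name extract_course and the URLs in the comments are made up for illustration.

```python
import re

# Same pattern as in: ExtractPath course regex !^.*/courses/([^/]+)/.*$!$1!
COURSE_PATTERN = re.compile(r"^.*/courses/([^/]+)/.*$")


def extract_course(url, default="other"):
    """Return the path segment right after /courses/, else the default.

    The default plays the role of 'ExtractPathDefault course other'.
    """
    match = COURSE_PATTERN.match(url)
    return match.group(1) if match else default


# Hypothetical URLs, only to show the mapping:
#   http://www.statmaster.sdu.dk/courses/st101/index.html -> course "st101"
#   http://www.statmaster.sdu.dk/index.html               -> course "other"
```

Note that a URL ending directly in the course name, with no trailing slash, does not match the pattern and falls back to the default.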
3. WINDOWS 2000 SCHEDULER SERVICE
- To update the DB daily, use the Windows 2000 Scheduler service.
It is recommended to use the graphical version of the Scheduler found at
Start->Settings->Control Panel->Scheduled Tasks
- Put the following text file into the 'C:\SWISH-E\web_index' dir, name it
'updatedb.cmd' and then schedule it!
############################################################################
#
REM the cp and mv commands require Cygwin to be installed
REM change to the directory of this script so relative file names resolve
cd /d "%~dp0"
REM run the swish-e spider using the config file
swish-e -S http -c swish.cfg
REM copy current db to old
cp -f myindex myindex.old
cp -f myindex.prop myindex.old.prop
REM move newly created to current
mv -f myindex.tmp myindex
mv -f myindex.tmp.prop myindex.prop
############################################################################
#
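If Cygwin is not available, the cp and mv steps of 'updatedb.cmd' could be done by a small script instead. The following is only a sketch in Python, which is an assumption (it is not part of the setup described here); the function name rotate_index is made up. It also checks that the spider really produced both new files before touching the live index.

```python
import os
import shutil


def rotate_index(index_dir, name="myindex"):
    """Back up the live index files, then promote the freshly built ones."""
    suffixes = ("", ".prop")  # the index file plus its .prop companion

    # Refuse to touch anything unless both new files exist.
    for suffix in suffixes:
        new = os.path.join(index_dir, name + ".tmp" + suffix)
        if not os.path.isfile(new):
            raise FileNotFoundError(new)

    for suffix in suffixes:
        current = os.path.join(index_dir, name + suffix)
        new = os.path.join(index_dir, name + ".tmp" + suffix)
        backup = os.path.join(index_dir, name + ".old" + suffix)
        if os.path.isfile(current):
            shutil.copy2(current, backup)  # like: cp -f myindex myindex.old
        os.replace(new, current)           # like: mv -f myindex.tmp myindex
```

It could be called as rotate_index(r'C:\SWISH-E\web_index') right after the swish-e run has finished.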
4. TEMPLATE-TOOLKIT
- It is recommended to use the Template-Toolkit to define the HTML output
created by 'swish.cgi'. Obtain it from http://www.template-toolkit.org/ and
use the Perl package manager (ppm) to install it.
- Edit the file 'search.tt' located in
'C:\SWISH-E\share\doc\swish-e\example'
to define your HTML output. The config file '.searchcgi.conf' (see section 5)
must point to where it is located!
- Here is my version of 'search.tt'; the result can be seen at
http://search.statmaster.sdu.dk
############################################################################
#
[% WRAPPER page %]
[% PROCESS swish_header %]
[% title = PROCESS title %]
[% IF ! search.results %]
[% PROCESS search_form %]
[% PROCESS show_message %]
[% PROCESS swish_footer %]
[% ELSE %]
[% PROCESS search_form %]
[% PROCESS nav_bar %]
[% PROCESS nav_bar_pages %]
[% PROCESS results_list %]
[% PROCESS nav_bar_pages %]
[% PROCESS swish_footer %]
[% END %]
[% END %]
[% # This is just an example -- you would want your own "page" to wrap around "swish" %]
[% BLOCK page %]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
[% title %]
</title>
<style type="text/css">
BODY { background: white fixed no-repeat left top;
font-family: Helvetica, Arial, sans-serif;
margin: 0;
color: black;
padding : 0px;
margin-left:0px;
margin-right:0px;
margin-top:0px;
margin-bottom:0px;
}
A:link {
font-family: Helvetica, Arial, sans-serif;
font-size: 100%;
text-decoration: none;
background: transparent;
color:#00319c;
}
A:visited {
font-family: Helvetica, Arial, sans-serif;
font-size: 100%;
text-decoration: none;
background: transparent;
color:#cc3399;
}
A:link:active {
font-family: Helvetica, Arial, sans-serif;
font-size: 100%;
text-decoration: none;
color:#00319c;
}
A:link:hover {
font-family: Helvetica, Arial, sans-serif;
text-decoration: underline;
color: #00319c;
}
.smalllink {
color: #00319c;
text-decoration: underline;
font-family: Helvetica, Arial, sans-serif;
font-size: 10pt;
}
.smalltext {
font-family: Helvetica, Arial, sans-serif;
font-size: 10pt;
}
.tinylink {
color: #00319c;
text-decoration: underline;
font-family: Helvetica, Arial, sans-serif;
font-size: 10px;
}
.tinytext {
font-family: Helvetica, Arial, sans-serif;
font-size: 10px;
}
body,td,a,p,.h{font-family:Helvetica, Arial, sans-serif;}
.h{font-size: 20px;}
.q{color:#0000cc;}
</style>
</head>
<body bgcolor=white alink=red link=blue vlink=purple onload="if (document.forms[0]) {document.forms[0].elements[0].focus();}">
[% content %]
</body>
</html>
[% END %]
[% BLOCK title %]
[% IF ! search.results %]
[% IF ! search.query_simple %]
Search the Master of Applied Statistics Web Pages
[% ELSE %]
Search: [% search.query_simple | html %]
[% END %]
[% ELSE %]
Search: [% search.query_simple | html %]
[% END %]
[% END %]
[% BLOCK swish_header %]
<!-- Start of topmenu -->
<table width="100%" height="30" cellspacing=0 cellpadding=0 bgcolor="#003399">
<tr>
<td >
</td>
</tr>
</table>
<!-- End of topmenu -->
<br>
<center>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<h1>Search the Master of Applied Statistics Web Pages</h1>
</td>
</tr>
</table>
</center>
<br>
[% END %]
[% BLOCK swish_footer %]
</td>
<td width="8"> </td>
</tr>
</table>
<!-- End of search page -->
<BR><BR>
<!-- Start of bottom line -->
<TABLE WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR>
<td width="8"> </td>
<TD colspan="2">
<HR size="1">
<a href="http://statmaster.sdu.dk"><span
class="smalllink">HOME</span></a><span class="smalltext"> | </span><a
href="javascript:self.history.go(-1)"><span
class="smalllink">Back</span></a>
<P>
<span class="tinytext">Last modified May 6, 2004, <a
href=""><u>Webmaster</u></a></span>
</TD>
<td width="8"> </td>
</TR>
</TABLE>
<!-- End of bottom line -->
[% END %]
[% BLOCK show_message %]
[% IF search.errstr %]
Your search - <b>[% search.query_simple | html %]</b> - did not
match any documents.<br>
No pages were found containing <b>"[% search.query_simple |
html %]"</b>.
[% END %]
[% END %]
[% BLOCK search_form %]
<center>
<table cellspacing="1" border="0" cellpadding="0">
<tr>
<td valign=top colspan=4>
[% CGI.start_form( '-action' => CGI.script_name, '-method' => 'GET' ) %]
[% CGI.textfield( {
name => 'query',
size => 40,
maxlength => 200,
} ) %]
[% CGI.submit('submit',' Search ') %]
</td>
</tr>
<tr>
<td valign=top>
Limit search to:
</td>
<td valign=top colspan=3 >
[% search.get_meta_name_limits %]
</td>
</tr>
<tr>
<td valign=top>
Sort by:
</td>
<td colspan=4 valign=top>
[% search.get_sort_select_list %]
</td>
</tr>
<tr>
<td valign=top>
Within:
</td>
<td colspan=4 valign=top>
[% search.get_limit_select %]
[% CGI.end_form.join('') %]
</td>
</tr>
<tr>
<td colspan=4 valign=top>
<br>
<center>
<a href="/docs/help"><font size="-1"><u>Search Help</u></font></a><font
size="-1"> - </font><a
href="http://search.statmaster.sdu.dk"><font size="-1"><u>Search
Home</u></font></a><font size="-1"> - </font><a
href="/docs/about"><font size="-1"><u>About</u></font></a>
</center>
</td>
</tr>
</table>
</center>
<!-- Start of search page -->
<TABLE WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR>
<td width="8"> </td>
<TD colspan="2">
<BR><BR>
[% END %]
[% BLOCK nav_bar %]
[% search.stopwords_removed %]
<table cellpadding=0 cellspacing=0 border=0 width="100%">
<tr>
<td height=20 >
<b>[% search.navigation('from') %] to [% search.navigation('to') %] of
[% search.navigation('hits') %] matches on search for
"[% search.query_simple | html %]"</b>
</td>
<td align=right >
<font size="-1" color="#ffffff" face="Geneva, Arial, Helvetica, sans-serif">
</font>
</td>
</tr>
</table>
[% END %]
[% BLOCK nav_bar_pages %]
[% IF search.navigation('hits') > 15 %]
<b>Result pages: </b>
[% END %]
[% IF search.navigation('prev_count') %]
<font size="-1" face="Arial, Helvetica, sans-serif">
<a style="text-decoration:none" href="[% search.query_href %];start=[% search.navigation('prev') %]"><u>[<b>&lt;&lt; Prev</b>]</u></a>
</font>
[% END %]
[% FOR page = search.navigation('page_array') %]
[% IF page.cur_page %]
<font size="-1" face="Arial, Helvetica, sans-serif">
<b>[% page.page_number %]</b>
</font>
[% ELSE %]
<font size="-1" face="Arial, Helvetica, sans-serif">
<a style="text-decoration:none" href="[% search.query_href %];start=[% page.page_start %]"><u>[% page.page_number %]</u></a>
</font>
[% END %]
[% END %]
[% IF search.navigation('next_count') %]
<font size="-1" face="Arial, Helvetica, sans-serif">
<a style="text-decoration:none" href="[% search.query_href %];start=[% search.navigation('next') %]"><u>[<b>Next &gt;&gt;</b>]</u></a>
</font>
[% END %]
[% END %]
[% BLOCK results_list %]
[% FOREACH item = search.results %]
<dl>
<dt>
<font face="Arial, sans-serif">
<a href="[% item.swishdocpath_href %]"><u>
[% ( item.swishtitle || item.swishdocpath ) %]
</u></a>
</font>
</dt>
<dt>
<font size="-1" face="Arial, sans-serif">
[% item.swishdescription %]
</font>
<br>
<font size="-1" face="Arial, sans-serif" color="#008000">
<i>
[% item.swishdocpath %]
- [% item.swishdocsize div 1000 %]k
</i>
</font>
</dt>
</dl>
[% END %]
[% END %]
############################################################################
#
5. SWISH-E CONFIG FILE '.searchcgi.conf'
- In 'swish.cgi', set the default config file to load. Using the full path
is important!
- This is the line: my $DEFAULT_CONFIG_FILE = 'C:\FULL-PATH\.searchcgi.conf';
- The config file itself contains the following:
############################################################################
#
use lib 'C:\SWISH-E\lib\swish-e\perl';
return {
title => 'Search',
swish_index => 'C:\SWISH-E\web_index\myindex',
template => {
package => 'SWISH::TemplateToolkit',
file => 'search.tt',
options => {
INCLUDE_PATH => 'C:\SWISH-E\share\doc\swish-e\example',
},
},
};
############################################################################
#
6. SWISH.CGI
- the following is edited in the user config section of 'swish.cgi'
- The HTTP interface to the DB is run through 'swish.cgi'
- remember the full path to your DEFAULT_CONFIG_FILE
- comment out # use CGI ();
- you must give the full path to your DB
swish_index => 'C:\SWISH-E\web_index\myindex',
for some reason this is not loaded from the config file??
- edit
sorts => [qw/swishrank swishlastmodified swishtitle headings/],
if you want another sort order. Note that, apart from the built-in swish*
properties, these must be defined as PropertyNames when creating the DB
using swish.cfg and the web spider
- edit
metanames => [qw/ swishdefault title headings /],
if you want another 'limit search to' order. Note that these must be defined
as MetaNames when creating the DB using swish.cfg and the web spider
- edit: name_labels => {
headings => 'Headings',
title => 'Title',
these will be shown as the labels in the 'limit search to'
- edit: select_by_meta => {
the value: metaname => 'course',
must match the metaname defined in 'swish.cfg' when creating the DB, e.g.
---
MetaNames course
ExtractPath course regex !^.*/courses/([^/]+)/.*$!$1!
ExtractPathDefault course other
---
- select_by_meta lets you limit the search to parts of the URL path that
you have spidered!
values => [qw/PATH1 PATH2 ... PATHN/],
- the values here must be values that the regular expression of 'course'
extracts from the path!
- remember to set 'use_library => 1' to use SWISH::API
- locate the Perl function 'sub handler {' (the mod_perl entry point) and
change the following:
#return Apache::Constants::OK();
return Apache::OK;
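Since the values listed in select_by_meta have to be values that the 'course' regex can actually produce, a quick sanity check before editing 'swish.cgi' might look like the sketch below. It is written in Python only for illustration (not part of this setup); the function name unmatched_values is made up, and the URL list would come from your own site.

```python
import re

# Same pattern as the ExtractPath line in swish.cfg
COURSE_PATTERN = re.compile(r"^.*/courses/([^/]+)/.*$")


def unmatched_values(urls, values, default="other"):
    """Return the configured values that no URL yields as its course."""
    found = set()
    for url in urls:
        match = COURSE_PATTERN.match(url)
        # URLs outside /courses/ get the ExtractPathDefault value
        found.add(match.group(1) if match else default)
    return [value for value in values if value not in found]
```

Any value returned here would show up in the 'Within:' menu but could never match a document.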
7. EDIT TEMPLATETOOLKIT.PM
- If you are using select_by_meta, you should edit the following in
TemplateToolkit.pm, found where you installed your Swish-e Perl modules, e.g.
'C:\SWISH-E\lib\swish-e\perl\SWISH'
- The problem is that when using the 'popup_menu' method in select_by_meta
you'll need a default value that means: search the whole DB.
- Edit the following function:
############################################################################
#
sub get_limit_select {
my ( $results ) = @_;
my $q = $results->CGI;
my $limit = $results->config('select_by_meta');
return '' unless ref $limit eq 'HASH';
my $method = $limit->{method} || 'checkbox_group';
my $labels = $limit->{labels} || {}; # new
$labels->{''} = 'All'; # new
my @options = (
-name => 'sbm',
-values => [ '', @{$limit->{values}}], # new
-labels => $labels, # new
# -name => 'sbm',
# -values => $limit->{values},
# -labels => $limit->{labels} || {},
);
push @options, ( -columns=> $limit->{columns} ) if $limit->{columns};
return join "\n",
#'<br>',
#( $limit->{description} || 'Select: '),
$q->$method( @options );
}
############################################################################
#
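Restated outside Perl, the change above boils down to putting an empty value labelled 'All' in front of the configured values before the menu is built. A minimal Python sketch of just that step follows; the function name build_limit_options and the example course names are made up. Unlike the Perl version, which writes the 'All' label into the labels hash from the config, this sketch works on a copy.

```python
def build_limit_options(values, labels=None):
    """Return (values, labels) with an empty 'All' choice put first."""
    merged = dict(labels or {})  # copy, so the caller's dict is untouched
    merged[""] = "All"           # label shown for the empty value
    return [""] + list(values), merged


# Hypothetical example:
#   build_limit_options(["st101", "st202"])
#   gives the values ["", "st101", "st202"] and the labels {"": "All"}
```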
8. APACHE
- Obtain the win32 webserver from http://httpd.apache.org/
- Install into 'C:\Apache2'
9. MOD_PERL
- mod_perl is a module for Apache that lets you load Perl and Perl scripts
persistently
- Obtain it from http://perl.apache.org/ and follow the instructions for
win32 using the Perl package manager 'ppm'.
10. EDIT HTTPD.CONF
- locate the text file httpd.conf in '\Apache2\conf' and add the following:
############################################################################
#
LoadFile "C:/Perl/bin/perl58.dll"
LoadModule perl_module modules/mod_perl.so
PerlRequire "C:/Apache2/conf/extra.pl"
<Perl>
use lib "C:/Apache2/cgi-bin";
use lib "C:/SWISH-E/lib/swish-e/perl";
require "C:/FULL-PATH/swish.cgi";
</Perl>
Alias /cgi-bin/ "C:/Apache2/cgi-bin/"
<Location /cgi-bin/>
PerlSetVar Swish_Conf_File "C:/FULL-PATH/.searchcgi.conf"
allow from all
SetHandler perl-script
PerlHandler SwishSearch
</Location>
############################################################################
#
- This makes Apache load Perl and swish.cgi persistently, running SwishSearch
from the location '/cgi-bin/'
- Changes in httpd.conf, swish.cgi, extra.pl, .searchcgi.conf etc. require
a restart of Apache.
- Check \Apache2\logs\error.log for errors at every restart
- The text file 'extra.pl' contains the following:
############################################################################
#
use Apache2 ();
use ModPerl::Util ();
use Apache::RequestRec ();
use Apache::RequestIO ();
use Apache::RequestUtil ();
use Apache::Server ();
use Apache::ServerUtil ();
use Apache::Connection ();
use Apache::Log ();
use Apache::Const -compile => ':common';
use APR::Const -compile => ':common';
use APR::Table ();
use Apache::compat ();
use ModPerl::Registry ();
use CGI ();
1;
############################################################################
#
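Once Apache restarts without errors, one way to check the whole chain is to request a search URL by hand. The sketch below only builds such a URL. The parameter names 'query', 'sbm' and 'start' are the ones used in 'search.tt' and TemplateToolkit.pm above; the script URL in the comment is an assumption and depends on your own server, and the function name build_search_url is made up.

```python
from urllib.parse import urlencode


def build_search_url(script_url, query, sbm=None, start=None):
    """Build a GET URL using the parameter names seen in search.tt."""
    params = {"query": query}
    if sbm:
        params["sbm"] = sbm      # value chosen in the select_by_meta menu
    if start is not None:
        params["start"] = start  # offset of the first result to show
    return script_url + "?" + urlencode(params)


# Hypothetical example:
#   build_search_url("http://localhost/cgi-bin/swish.cgi", "anova", start=15)
```

The resulting URL can be pasted into a browser; if the page does not come up, \Apache2\logs\error.log is the first place to look.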
Received on Wed May 12 03:26:52 2004