On Sunday 04 February 2001 11:57, Rainer.Scherg@rexroth.de wrote:
> Which version of swish did you try?
>
> If you tried the 1.3x version, please switch to the 2.0.x version.
Thanks! The new version also has a much better install process. I just got
Swish 2.04 up and running, searching a test index with a Perl script I found
in my hosting provider's free CGI library. It now works with the non-English
characters. Brilliant!
Only one more detail: I would like some advice on how to include a page's meta
description in the search results. Currently, this is what the results page
looks like when using the CGI script as it came:
--- example page ---
Search Results
Keywords: word1
1000 <link>"Page title"</link> (1321 bytes)
--- end of page ---
To enable the description, I added this to my Swish config file:
PropertyNames description
The spider then picks up the info from an HTML page:
<meta name="description" content="this is a page description..">
and I can now return it using the -p option on the command line.
But unfortunately I'm not much of a Perl wizard, so I can't quite figure out
how to add it to the Perl script. Could someone kindly have a look at the
script and let me know how to use -p with it, so that the description is
placed under the link of each search result?
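A minimal sketch of how the search command could request the description, written as a hypothetical Perl helper (`build_swish_args` is an illustrative name, not part of swish-e or the library script). It assumes `-p` takes the names of the properties to print, as it does on the command line, and it builds an argument list rather than one shell string, so characters typed into the form are never interpreted by a shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the swish-e argument list for one search,
# asking for the "description" property with -p (ASSUMPTION: -p takes
# the property names to print after each result).
sub build_swish_args {
    my ($swishexec, $index, $keywords, $maxresults) = @_;
    my @args = ($swishexec, '-f', $index, '-w', $keywords);
    # -m limits the number of results, as in search.pl; only accept digits
    push @args, '-m', $maxresults
        if defined $maxresults && $maxresults =~ /^\d+$/;
    # request the description property for every result line
    push @args, '-p', 'description';
    return @args;
}

my @cmd = build_swish_args('/path/to/swish-e', '/path/to/test.index',
                           'word1', '40');
print join(' ', @cmd), "\n";

# With Perl 5.8 or later, the list form of a pipe open bypasses the shell:
#   open(my $swish, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
```

On an older Perl without the list form of a pipe `open`, the same effect would come from appending ` -p description` to `$command` in search.pl.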
The script comes in three parts:
1) search.pl, the part I can't fix.
2) util.pl, header and footer stuff.
3) search.html, the HTML form.
But I guess only the first part, below, would need changing:
#!/usr/bin/perl
#
# search.pl
#
# simple interface to SWISH
#
require 'util.pl';
$| = 1; # unbuffer the data
# whereis swish
$swishexec = "/home/kati/public_html/swish/src/swish-e";
unless (-e $swishexec) {
&print_header_info("Cannot open $swishexec");
print <<ENDERROR;
<h2><u>Cannot open $swishexec</u></h2>
Cannot open \"$swishexec\". File not found or permission denied.
<p>
ENDERROR
&print_footer_info();
exit(0);
}
# get the form data
&parse_form_data(*array);
# required variable in the html form:
# --swishindex, keywords
# optional fields:
# --maxresults
if ($array{'swishindex'} eq "") {
# not happy crappy
&print_header_info("Swishindex Variable Not Specified");
print <<ENDERROR;
<h2><u>Form Incomplete</u></h2>
The form is incomplete.... no \"swishindex\" variable is
available. The \"swishindex\" variable specifies the
pathname to the swish index.
<p>
ENDERROR
&print_footer_info();
exit(0);
}
if ($array{'keywords'} eq "") {
# not happy crappy
&print_header_info("Data Incomplete");
print <<ENDERROR;
<h2><u>Data Incomplete</u></h2>
Your request to search has been
rejected due to insufficient information. To properly send
your search request, please provide one or more keywords.
<p>
search example 1: john and doe or jane<br>
search example 2: john and (doe or jane)<br>
search example 3: not (john or jane) and doe<br>
search example 4: j* and doe<br>
<p>
ENDERROR
&print_footer_info();
exit(0);
}
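# everything is happy, open up a pipe to the swish executable
# (wrap the form values in single quotes, escaping any embedded quote,
# so the shell can't interpret characters typed into the form)
($index = $array{'swishindex'}) =~ s/'/'\\''/g;
($keywords = $array{'keywords'}) =~ s/'/'\\''/g;
$command = "$swishexec -f '$index' -w '$keywords'";
if ($array{'maxresults'} =~ /^\d+$/) {
$command .= " -m $array{'maxresults'}";
}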
&print_header_info("Search Results");
print "<h2>Search Results</h2>\n";
print "Keywords: <b>$array{'keywords'}</b>\n<p>\n";
open(SWISH, "$command|") || &return_error("Cannot run $swishexec: $!");
while (<SWISH>) {
# results of swish can be-
# line beginning with "#"
# line beginning with "."
# line beginning with "err"
# line beginning with "search words:"
# line beginning with relevance rank [0-9]
if (/^\./) {
last;
}
elsif (/^err:/) {
print "$_";
last;
}
elsif (/^[0-9]/) {
chomp;
# can't simply split because spaces can exist in the title
$firstspace = index("$_", "\ ", 0);
if ($firstspace == -1) {
next;
}
$secondspace = index("$_", "\ ", ($firstspace+1));
if ($secondspace == -1) {
next;
}
$lastspace = rindex("$_", "\ ");
if ($lastspace == -1) {
next;
}
$rank = substr($_, 0, $firstspace);
$url = substr($_, ($firstspace+1), ($secondspace-$firstspace-1));
$title = substr($_, ($secondspace+1), ($lastspace-$secondspace-1));
$numbytes = substr($_, ($lastspace+1));
print "$rank <a href=\"$url\">$title</a> ($numbytes bytes)<br>\n";
}
}
close(SWISH);
if ($ENV{'PATH_INFO'} ne "") {
print <<RETURNURL;
<p>
<a href=\"$ENV{'PATH_INFO'}\">Back to search form</a>
<p>
RETURNURL
}
print "<p>\n";
&print_footer_info();
##############################################################################
# eof search.pl
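A sketch of how the result loop could pick up the description, assuming each line printed with `-p description` has the form `rank url "title" size "description"`. The quoted title matches the example page above; the quoted property after the size is an assumption worth checking against real swish-e output. `parse_swish_line` and `html_escape` are hypothetical helpers, not part of the library script. A regex is used because the original `rindex` approach treats everything after the last space as the byte count, which would break once a description follows it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one swish-e result line into its fields.
# ASSUMPTION: with "-p description" a result line looks like
#   1000 /path/page.html "Page title" 1321 "this is a page description.."
sub parse_swish_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^(\d+)\s+(\S+)\s+"(.*?)"\s+(\d+)(?:\s+"(.*)")?\s*$/;
    return {
        rank        => $1,
        url         => $2,
        title       => $3,
        numbytes    => $4,
        description => defined $5 ? $5 : '',   # empty if no property printed
    };
}

# Minimal HTML escaping so titles and descriptions can't break the page.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

my $line = qq{1000 /docs/page.html "Page title" 1321 "this is a page description.."\n};
if (my $r = parse_swish_line($line)) {
    # link on the first line, description underneath it
    printf qq{%s <a href="%s">%s</a> (%s bytes)<br>\n%s<br>\n},
        $r->{rank}, html_escape($r->{url}), html_escape($r->{title}),
        $r->{numbytes}, html_escape($r->{description});
}
```

In search.pl, the body of the `elsif (/^[0-9]/)` branch could call `parse_swish_line($_)` and print the same way, once `-p description` has been added to `$command`.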
--------------------------------------------------------------------
The part below just prints a header and footer. It isn't of much use, but the
script above doesn't run if I exclude it.
#
# util.pl
#
# utilities file with common subroutines
# used by pretty much all of the library CGI scripts
#
##############################################################################
# common subroutines
##############################################################################
################################################
# get the variables by calling parse_form_data
# for example, "&parse_form_data(*array)"
# thanks Stacey :)
sub parse_form_data
{
local (*FORM_DATA) = @_;
local ($request_method, $query_string, @key_value_pairs,
$key_value, $key, $value);
$request_method = $ENV{'REQUEST_METHOD'};
if ($request_method eq "GET") {
$query_string = $ENV{'QUERY_STRING'};
} elsif ($request_method eq "POST") {
read(STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
} else { # neither POST nor GET
$query_string = $ENV{'QUERY_STRING'};
}
@key_value_pairs = split(/&/, $query_string);
foreach $key_value (@key_value_pairs) {
($key, $value) = split(/=/, $key_value, 2);   # keep any "=" inside the value
$key =~ tr/+/ /;
$key =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
$value =~ tr/+/ /;
$value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
if (defined($FORM_DATA{$key})) {
$FORM_DATA{$key} = join("|", $FORM_DATA{$key}, $value);
} else {
$FORM_DATA{$key} = $value;
}
}
}
################################################
# print the footer information
sub print_footer_info
{
print "</td>\n";
print "</tr>\n";
print "</table>\n";
# print out the copyright footer
# NOTE TO RESELLERS/CLIENTS: this is a library specific file,
# delete or comment out for your own use
if (-e "/www/htdocs/includes/copyright.txt") {
open(COPYRIGHT, "/www/htdocs/includes/copyright.txt");
while (<COPYRIGHT>) {
print $_;
}
close(COPYRIGHT);
}
# print out the colorstrip footer
# NOTE TO RESELLERS/CLIENTS: this is a library specific file,
# delete or comment out for your own use
if (-e "/www/htdocs/includes/colorstrip.txt") {
open(COLORSTRIP, "/www/htdocs/includes/colorstrip.txt");
while (<COLORSTRIP>) {
print $_;
}
close(COLORSTRIP);
}
print "</body>\n";
# close it out
print "</html>\n";
}
################################################
# print the header information
sub print_header_info
{
local ($title) = @_;
print "Content-type: text/html\n\n";
# print out the title
print "<html>\n";
print "<head> \n";
print "<title>$title</title>\n";
if (-e "/www/htdocs/includes/javascript/main.js") {
print "<script Language=\"JavaScript\">\n";
print "<!--\n";
open(JS, "/www/htdocs/includes/javascript/main.js");
while (<JS>) {
print $_;
}
close(JS);
print "//-->\n";
print "</script>\n";
}
print "</head> \n";
# print out the header, which should include a <body> tag
if (-e "/www/htdocs/includes/body.txt") {
open(BODY, "/www/htdocs/includes/body.txt");
while (<BODY>) {
print $_;
}
close(BODY);
}
else {
print "<body bgcolor=\"#ffffff\">\n";
}
# print out the toolbar
# NOTE TO RESELLERS/CLIENTS: this is a library specific file,
# delete or comment out for your own use
if (-e "/www/htdocs/includes/toolstrip/support_sub.txt") {
open(TOOLBAR, "/www/htdocs/includes/toolstrip/support_sub.txt");
while (<TOOLBAR>) {
print $_;
}
close(TOOLBAR);
}
print <<ENDHEADER;
<table>
<tr>
<td width=600>
ENDHEADER
}
################################################
# print an error
sub return_error
{
local ($message) = @_;
print <<ENDERROR;
 <br>
<h2><u>Unknown Error</u></h2>
An unknown error has been encountered.
The error message is listed below:
<p>
<ul>
<b>$message</b>
</ul>
<p>
ENDERROR
&print_footer_info();
exit(1);
}
##############################################################################
# eof util.pl
1;
----------------------------------------
Below is the HTML form. It has the Swish index defined in one of its hidden
input fields, which is handy if I later want to change it with JavaScript to
use different indexes.
<html>
<head>
<title>Search Swish-E Index</title>
</head>
<body>
<h1>Search Swish-E Index</h1>
<form method="GET" action="cgi-bin/search.pl">
<input type="hidden" name="swishindex"
value="/home/kati/public_html/swish/test.index">
<b>Search for the following keywords:</b><br>
<input name="keywords" size=40 maxlength=512>
<p>
<b>Maximum number of results:</b><br>
<input name="maxresults" size=5 value=40 maxlength=64>
<p>
<input type="submit" value="Search"> <input type="reset" value="Reset">
<p>
__________________________________________<p>
search example 1: john and doe or jane<br>
search example 2: john and (doe or jane)<br>
search example 3: not (john or jane) and doe<br>
search example 4: j* and doe<br>
<p>
</form>
</body>
</html>
--------------------------
Also, does anyone have any CGIs that I could test for the form and
processing? I couldn't find much of that on the Swish-E site.
Thanks!
Kati
"I'm prepared for all emergencies but totally unprepared for everyday life."
>
>
> Development of 2.0.x is done at http://swishe.sourceforge.net (docs, etc).
> Download of latest source is also available at http://www.boe.es/swish-e/
> or via links from http://sunsite.berkeley.edu/SWISH-E/
>
>
> Bye... Rainer
>
> > -----Original Message-----
> > From: Kati Gäbler [mailto:katigaebler@topmail.de]
> > Sent: Sunday, February 04, 2001 10:25 AM
> > To: Multiple recipients of list
> > Subject: [SWISH-E] Q: Swish-E foreign language character support
> >
> >
> > Hello Swish-E users,
> >
> > I just set up Swish-E for the first time, and I got it working
> > successfully using the HTTP method, doing command-line or CGI-form
> > searching.
> >
> > The only feature I'm missing is support for some of the foreign-language
> > characters. For example, commonly used German characters are: üäöß. And
> > various Spanish, Italian, and French characters are: ùèìòéí¿ó ñ, etc.
> > They don't get returned in the search results, I guess because they're
> > not indexed; the Swish-E config file says to use only these:
> >
> > abcdefghijklmnopqrstuvwxyz0123456789_\|/-+=?!@$%^'"`~,.[]{}()
> >
> > In any case, I tested adding the characters üäö in the config file and in
> > the HTML files, and re-indexed, but as I expected, it didn't work.
> > However, I also tested using German characters (üöä) in the META tags
> > ("PropertyNames"), and notably in that situation they worked when I did
> > a command-line search with the -p option.
> >
> > I have a site in Spanish, Italian, French, German and English, so for my
> > purpose it's important to make these foreign characters work. Does anyone
> > know a fix for this, or would much of the Swish-E code need to be rebuilt
> > to make it work? If so, maybe it would be better for me to find another
> > search engine. Any advice on this foreign-character problem would be
> > appreciated!
> >
> > Regards,
> > Kati
> >
> > PS: I just joined the list, so I'm not sure if it's working. Please CC
> > any replies to me at katigaebler@topmail.de. Thanks.
> >
> > --
> >
> > Rules:
> > (1) The boss is always right.
> > (2) When the boss is wrong, refer to rule 1.
> >
Received on Mon Feb 5 00:46:20 2001