I have two separate web servers with identical information (PDF and HTML
documents) that I am trying to index and subsequently view through
swish.cgi.
The first server is a Mandrake 7.2 server WITHOUT libxml2 installed. It is
an internal server and was set up with swish 2.4.1 first, and everything
worked great. The indexes were correctly created and a .prop file showed a
reasonable size (20MB), and search-results listings showed the first
couple-hundred characters of the files (swishdescription). My conf file is:
--------------------------------------------
IndexDir /var/www/html/swish/dc-opinions-prog.pl
IndexFile /var/www/html/swish/dc-opinions.index
UseStemming yes
MetaNames swishtitle swishdocpath
ReplaceRules remove /var/www
IndexContents HTML .pdf
IndexContents HTML .html
StoreDescription HTML* <body> 200000
--------------------------------------------
and the dc-opinions-prog.pl file is:
--------------------------------------------
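#!/usr/bin/perl -w
use pdf2html;
my ($mtime, $size, $buffer);
# List everything under the document root (directories included).
my @files = `find /var/www/opinions -name '*' -print`;
for (@files) {
    chomp();
    if ($_ =~ /\.pdf$/) {
        # pdf2html() returns a reference to the record text to print
        my $html_record_ref = pdf2html($_);
        print $$html_record_ref;
    } elsif ($_ =~ /\.html$/) {
        # stat fields: 7 = size in bytes, 9 = last modification time
        ($size, $mtime) = (stat($_))[7, 9];
        # Headers for the -S prog input method, then a blank line
        print "Content-Length: $size\n";
        print "Last-Mtime: $mtime\n";
        print "Path-name: $_\n";
        print "\n";
        open(HTMLFILE, "<", $_) || die "Error opening $_: $!";
        # Print the HTML file in 16 KB chunks
        while (read(HTMLFILE, $buffer, 16384)) {
            print $buffer;
        }
        close(HTMLFILE);
    }
}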
--------------------------------------------
I am creating the indexes using the following command:
"/usr/local/bin/swish-e -c /var/www/html/swish/dc-opinions.conf -S prog"
When I tried to duplicate this setup with the /exact same data/ on a
Mandrake 8.2 server that DID have libxml2 installed, the .prop file never
got properly populated (only a couple-hundred KB in size) and of course the
swishdescription was not displayed in the swish.cgi search results.
My fix was to re-configure, re-compile and re-install swish on this server
using "./configure --without-libxml2". Now the output on this server matches
exactly that of our internal server.
So, is there something wrong with my config or program that I need to change
to use libxml2, or is this a feature/bug?
From the INSTALL file - "Libxml2 is very strongly recommended. It is used
for parsing both HTML and XML files. Swish-e can be built and installed
without libxml2, but the HTML parser built into swish-e is not as accurate
as libxml2" - so obviously I'd like to use libxml2 if possible.
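One difference that may matter: in the conf file above, IndexContents names
the HTML parser while StoreDescription names HTML*. If HTML* resolves to the
libxml2 parser (HTML2) when swish-e is built with libxml2, and to the
built-in HTML parser otherwise, then on the libxml2 build the description
rule would not apply to documents parsed with plain HTML. That reading of
HTML* is an assumption, and the variant below is an untested sketch that
simply keeps both directives on the same parser:
--------------------------------------------
# Assumption: HTML* selects HTML2 (libxml2) when available, else HTML
IndexContents HTML* .pdf
IndexContents HTML* .html
StoreDescription HTML* <body> 200000
--------------------------------------------
If that is the cause, the .prop file on the libxml2 build should grow back to
roughly the size seen on the internal server after re-indexing.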
Thanks,
---
Brent DeShazer
Manager of Systems Engineering
U.S. District Court, Kansas
785.295.2574
Received on Thu Jan 22 19:31:07 2004