using config.pl

From: Lung.Allen <Allen.Lung(at)not-real.ftb.ca.gov>
Date: Thu Apr 15 2004 - 18:50:21 GMT
I'm taking small steps forward in working with swish-e.  I have multiple configuration files (file1.conf, file2.conf), each of which creates its own index (index.file1, index.file1.prop, etc.).

This works fine, and searching with swish.cgi gives no problems.  The contents of file1.conf and file2.conf follow.

File1.conf
IndexDir /app/swish/lib/swish-e/spider.pl
SwishProgParameters default http://10.20.172.100/doc/redhat-config-bind-2.0.0/
IndexFile /var/www/index.file1
ParserWarnLevel 3
FileFilter .pdf pdf2html "'%p' -"
IndexOnly  HTML* .htm .html .asp
IndexContents HTML* .htm .html .shtml .pdf
IndexContents TXT* .txt .log .text
IndexContents XML* .xml
DefaultContents HTML

File2.conf
IndexDir /app/swish/lib/swish-e/spider.pl
SwishProgParameters default http://10.201.12.64
IndexFile /var/www/index.file2
Metanames swishtitle swishdocpath
StoreDescription TXT* 10000
StoreDescription HTML* <body> 10000
IndexContents HTML* .htm .html .asp
IndexContents TXT* .txt .log .text
IndexContents XML* .xml

My next step is to use swishspider.conf like this: 'swish-e -S prog -c swishspider.conf'.  The contents follow:

# Path to configuration file
SwishProgParameters /var/www/config.pl
# Path to spider.pl
IndexDir /app/swish/lib/swish-e/spider.pl
#
IndexOnly  HTML* .htm .html .asp
FileFilter .pdf pdf2html "'%p' -"
IndexContents HTML* .htm .html .shtml .pdf
#
IndexContents TXT* .txt .log .text
#
IndexContents XML* .xml
#
DefaultContents HTML

I then created a config.pl; the contents follow:
# use lib '/app/swish/prog-bin';
# use pdf2html;
# sub pdf {
#       my ( $uri, $server, $response, $content_ref ) = @_;
#       return 1 unless $response->content_type eq 'application/pdf';
#       $server->{counts}{'PDF transformed'}++;
#       $$content_ref = ${pdf2html( $content_ref, 'title' )};
#       $$content_ref =~ tr/ / /s;
#       return 1;
# }

        my %serverA = (
                base_url        => 'http://10.201.12.64/',
                email           => 'allen.lung@ftb.ca.gov',
                debug           => DEBUG_URL | DEBUG_FAILED | DEBUG_SKIPPED,
#               link_tags       => [qw/ a frame /],
#               test_url        => \&foo,
        );
        my %serverB = (
                base_url        => 'http://10.20.172.100/doc/redhat-config-bind-2.0.0/',
                email           => 'allen.lung@ftb.ca.gov',
#               link_tags       => [qw/ a frame /],
#               test_url        => \&foo,
        );
@servers = ( \%serverA, \%serverB, );

#               test_url        => sub {
#                       my $uri = shift;
#                       return 0 if $uri->path =~ /\.(gif|jpeg|png|doc|pdf)$/;
#                       return 1;
#               },

Is this the proper way to use config.pl?
As it stands, the spider is actually attempting to index .pdf and .doc files!
I do want to index .pdf, .doc and many other file types.  The first type I want to index beyond what I'm doing now is .pdf.  I hope I'm making sense here.  I started this process from the code that is commented out with #.  Is that the proper location for the callback subroutines?
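
To make the question concrete, here is a minimal sketch of what I think the layout should be. It assumes spider.pl's convention that callbacks such as test_url and filter_content are keys inside each server's hash, not free-standing code after @servers. The regex and the single-server list are only illustrative assumptions, not settings I have tested.

```perl
# Sketch of a spider.pl config file with the callback placed inside the
# server hash. Values other than base_url and email are assumptions.

my %serverA = (
    base_url => 'http://10.201.12.64/',
    email    => 'allen.lung@ftb.ca.gov',

    # test_url receives the URI object (and the server hash) before a
    # request is made; returning false skips that URL. Only image links
    # are skipped here, so .pdf and .doc links are still fetched.
    test_url => sub {
        my ( $uri, $server ) = @_;
        return 0 if $uri->path =~ /\.(gif|jpeg|png)$/i;
        return 1;
    },

    # Assumption: with the commented-out pdf sub enabled, it would be
    # registered as a content filter in the same place:
    # filter_content => \&pdf,
);

@servers = ( \%serverA );
```

If this is right, the commented-out test_url block at the bottom of my config.pl would have no effect even when uncommented, because it sits outside both server hashes.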
Received on Thu Apr 15 11:50:22 2004