Bill,
The "test_response" call-back subroutine that you suggested to find the
last-modified date worked perfectly for my setup. It was what I was going
to try to do myself, but since I am only a semi-competent Perl
programmer, it would have been a lot uglier.
Thanks again!
Paul J. Kissman
Library Information Systems Specialist
Massachusetts Board of Library Commissioners
648 Beacon St.
Boston, MA 02215
paul.kissman@state.ma.us
www.mlin.lib.ma.us or www.mlin.org
617-267-9400 / 800-952-7403 (in-state)
Fax: 617-421-9833
-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Wednesday, November 26, 2003 10:45 AM
To: Kissman, Paul (BLC)
Subject: Re: [SWISH-E]
On Wed, Nov 26, 2003 at 10:04:17AM -0500, Kissman, Paul (BLC) wrote:
> Unfortunately, the Xbithack doesn't seem to work for me. My web server
> is running iPlanet (Netscape Enterprise 4.1) and after a couple of tries
> and some digging around I find out that even with Netscape's version of
> Xbithack (ObjectType fn=shtml-hacktype exec-hack=true in the obj.conf
> file) the server treats SSI files as categorically dynamic. (Netscape
> Knowledge Base Article 2246).
Ah, I just assumed you were running Apache.
> Bill Conlon suggested that I simply put a date meta tag in my document
> <head> area. I have several thousand documents and may write a perl
> script to automate that process or do what I originally suggested -- go
> out and look for the file and grab its modification timestamp at
> indexing time.
It depends on how many documents you have and how fast you want indexing
to happen. I would probably just stat the files.
I have this for spidering apache.org in a "test_response" call-back
subroutine in the spider.pl config file:
    # Try and add dates if missing
    if ( ! $response->last_modified ) {
        my $path = $response->base;
        $path =~ s!http://!/www/!;
        if ( my $time = ( stat $path )[9] ) {
            $response->last_modified( $time );
        }
    }
All their "sites" are in the /www directory, so the substitution simply
replaces http:// with /www/ to get the local file path.
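
For a setup where the mapping is not quite that direct, the same idea can
be pulled out into small helpers and tried outside the spider. This is only
a rough sketch: the names url_to_local_path and local_mtime are made up for
illustration, the document root is whatever your server uses, and the
"index.html" directory index is an assumption that may not match your
configuration.

```perl
use strict;
use warnings;

# Hypothetical helper: map a URL onto a local file path by swapping the
# scheme for a document root, as the substitution above does.
sub url_to_local_path {
    my ( $url, $root ) = @_;
    ( my $path = $url ) =~ s{^https?://}{$root/};
    $path =~ s/[?#].*\z//;                       # drop query string or fragment
    $path .= 'index.html' if $path =~ m{/\z};    # ASSUMED directory index name
    return $path;
}

# Hypothetical helper: return the file's modification time in epoch
# seconds, or undef if the file cannot be stat'ed.
sub local_mtime {
    my ($path) = @_;
    my @st = stat $path or return;
    return $st[9];
}
```

Inside the call-back, the result of local_mtime would be passed to
$response->last_modified only when it is defined, as in the snippet above.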
--
Bill Moseley
moseley@hank.org
Received on Wed Nov 26 17:12:25 2003