RE:

From: Kissman, Paul (BLC) <Paul.Kissman(at)not-real.state.ma.us>
Date: Wed Nov 26 2003 - 17:12:08 GMT
Bill,

The "test_response" call-back subroutine that you suggested for finding the
last-modified date worked perfectly for my setup. It was what I was going
to try to do myself, but since I am only a semi-competent Perl
programmer, it would have been a lot uglier.

Thanks again!

Paul J. Kissman 
Library Information Systems Specialist
Massachusetts Board of Library Commissioners
648 Beacon St.
Boston, MA  02215
paul.kissman@state.ma.us
www.mlin.lib.ma.us or www.mlin.org
617-267-9400 / 800-952-7403 (in-state)
Fax: 617-421-9833


-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org] 
Sent: Wednesday, November 26, 2003 10:45 AM
To: Kissman, Paul (BLC)
Subject: Re: [SWISH-E]

On Wed, Nov 26, 2003 at 10:04:17AM -0500, Kissman, Paul (BLC) wrote:

> Unfortunately, the Xbithack doesn't seem to work for me. My web server
> is running iPlanet (Netscape Enterprise 4.1) and after a couple of tries
> and some digging around I find out that even with Netscape's version of
> Xbithack (ObjectType fn=shtml-hacktype exec-hack=true in the obj.conf
> file) the server treats SSI files as categorically dynamic.  (Netscape
> Knowledge Base Article 2246).

Ah, I just assumed you were running Apache.

> Bill Conlon suggested that I simply put a date meta tag in my document
> <head> area. I have several thousand documents and may write a perl
> script to automate that process or do what I originally suggested -- go
> out and look for the file and grab its modification timestamp at
> indexing time.

It depends on how many documents you have and how fast you want indexing
to happen.  I would probably just stat the files.

I have this for spidering apache.org in a "test_response" call-back 
subroutine in the spider.pl config file:


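    # Try to add a date if the server did not send Last-Modified
    if ( ! $response->last_modified ) {
        # Map the URL to its local file: http://host/path -> /www/host/path
        my $path = $response->base;
        $path =~ s!^http://!/www/!;
        # Element 9 of stat() is the file's mtime (epoch seconds)
        if ( my $time = ( stat $path )[9] ) {
            $response->last_modified( $time );
        }
    }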

All their "sites" are in the /www directory, so that simply replaces
http:// with /www/.
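
For a layout other than apache.org's, the same URL-to-path mapping could be
pulled out into a small helper. This is only a rough sketch, not part of
spider.pl: the names url_to_path and mtime_for_url are made up for
illustration, the default "/www" root is just the layout described above,
and accepting https:// and stripping query strings are extra assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: map a URL onto a local file path by replacing the
# scheme prefix with a document root (assumed layout: <root>/<host>/<path>).
sub url_to_path {
    my ( $url, $root ) = @_;
    $root = '/www' unless defined $root;

    my $path = "$url";           # stringify in case a URI object is passed
    $path =~ s/[?#].*\z//s;      # drop any query string or fragment

    # Give up (return undef) on anything that is not http:// or https://
    return unless $path =~ s{\Ahttps?://}{$root/};
    return $path;
}

# Return the mapped file's modification time in epoch seconds, or undef
# if the URL cannot be mapped or the file does not exist.
sub mtime_for_url {
    my ( $url, $root ) = @_;
    my $path = url_to_path( $url, $root );
    return unless defined $path;

    my $mtime = ( stat $path )[9];   # element 9 of stat() is the mtime
    return $mtime;
}
```

Inside the "test_response" call-back this would be used along the lines of
calling mtime_for_url( $response->base ) and passing any defined result to
$response->last_modified, exactly as the snippet above does inline.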


-- 
Bill Moseley
moseley@hank.org
Received on Wed Nov 26 17:12:25 2003