Re: Sorting by swishlastmodified...

From: <Rainer.Scherg(at)not-real.rexroth.de>
Date: Fri Apr 06 2001 - 17:44:55 GMT
We will have to check this (more likely Bill and Jose, because
I'm a bit tied up with my job right now...)


But I will make a quick change for DATE values (swishlastmodified),
so there is no confusion about whether  -x "<swishlastmodified fmt=/%d/>"
returns strftime "days" or printf "seconds since epoch".

I will change this to fmt=/%ld/.

Only this exact property format string will return "seconds since epoch"
for DATE type properties.

e.g.
  -x "<swishlastmodified fmt=/%d/>"    will return day in month (e.g. "6")
  -x "<swishlastmodified fmt=/%ld/>"   will return seconds since epoch
  "%ld"   will be a fixed string!
  which means    fmt="secs: %ld"   will _not_ return seconds since epoch.
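
To make the rule concrete, here is a minimal sketch of the dispatch in C.
It is not the actual swish-e code: the function name format_date_prop and
its signature are made up for illustration, and it uses gmtime() only so
the output does not depend on the local time zone.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not a swish-e function): format a DATE property.
 * Only the exact format string "%ld" yields seconds since epoch;
 * every other format string is handed to strftime().
 * Returns the number of characters written, or 0 on failure.
 */
static size_t format_date_prop(const char *fmt, time_t t,
                               char *out, size_t outlen)
{
    if (strcmp(fmt, "%ld") == 0) {
        /* Exact match: print the raw timestamp. */
        int n = snprintf(out, outlen, "%ld", (long)t);
        return (n < 0 || (size_t)n >= outlen) ? 0 : (size_t)n;
    }

    /* Anything else, including "secs: %ld", is a strftime format. */
    struct tm *tm = gmtime(&t);   /* assumption: UTC, for repeatable output */
    if (tm == NULL)
        return 0;
    return strftime(out, outlen, fmt, tm);
}
```

With t = 986579095 (Fri Apr 06 2001, 17:44:55 GMT), "%ld" gives
"986579095" and "%d" gives the day of the month. Standard strftime
zero-pads %d, so this sketch prints "06". A string such as "secs: %ld"
goes down the strftime path, where the result depends on the C library.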
  

cu - rainer


> -----Original Message-----
> From: David Wood [mailto:dwood@inter.nl.net]
> Sent: Friday, April 06, 2001 6:59 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: Sorting by swishlastmodified...
> 
> 
> Hi Bill and Rainer,
> 
> Bill, I did see your note about the new "prog" stuff and I'm certainly
> interested, but that new spider is more complex than the previous one, and
> we have some somewhat weird customisations to the previous one, and I just
> haven't had the chance to play around with the new one enough yet.
> 
> On the other hand, would the patch below fix the 'old' spider?  The idea is
> that if you get HTTP code 200 back in swishspider then you write the
> Last-Modified date into the .response file as well, and write it in seconds
> since epoch format to save the C code having to muck around with date
> formats, localisation, etc.
> 
> cheers,
> 
> David
> 
> 
> ---
> 
> 
> swishspider:
> 4a5
>  > use HTTP::Date;
> 32a34
>  >     print RESP str2time($response->header("last-modified")) . "\n";
> 
> 
> httpserver.c:
> 72a73
>  >     static time_t lastmodified=0;
> 149c150
> <               if (get(sw,contenttype, &server->lastretrieval, buffer) == 200) {
> ---
>  >               if (get(sw,contenttype, &lastmodified, &server->lastretrieval, buffer) == 200) {
> 
> 
> http.c:
> 205c205
> < int get(SWISH *sw, char *contenttype_or_redirect, time_t *plastretrieval, char *url)
> ---
>  > int get(SWISH *sw, char *contenttype_or_redirect, time_t *lastmodified, time_t *plastretrieval, char *url)
> 257a258,263
>  >       if (code == 200) {
>  >       /* read last-modified
>  >       **/
>  >       fgets(buffer, lenbuffer, fp);
>  >       *lastmodified = atol(buffer);
>  >       }
> 372a379
>  > static time_t lastmodified=0;
> 406c413
> <               if ((code = get(sw, contenttype, &server->lastretrieval, item->url)) == 200) {
> ---
>  >               if ((code = get(sw, contenttype, &lastmodified, &server->lastretrieval, item->url)) == 200) {
> 443c450
> <                       fprop->mtime = 0;  /* $$ see above */
> ---
>  >                       fprop->mtime = lastmodified;
> 
> 
> http.h:
> 13c13
> < int get (SWISH *sw, char *contenttype_or_redirect, time_t *plastretrieval, char *url);
> ---
>  > int get (SWISH *sw, char *contenttype_or_redirect, time_t *lastmodified, time_t *plastretrieval, char *url);
> 
> 
> 
> At 11:35 06-04-01, you wrote:
> 
> 
> >On Thu, 5 Apr 2001, David Wood wrote:
> >
> > > Hi folks,
> > >
> > > Using '... -s swishlastmodified desc' _almost_ works perfectly.  The only
> > > problem I've uncovered is that, if you've created your index via spidering,
> > > there's no swishlastmodified stored, I guess because the files aren't local
> > > and stat'able, and so all dates for spidered content come back as 31 Dec.
> > > 1969!  But don't nicely behaved web servers pass a Last Modified HTTP
> > > header to clients?  If so, might we be able to use that to set
> > > swishlastmodified when creating a spider-generated index?
> >
> >You might have missed my last post.  If you use the "prog" method with the
> >provided spider.pl you will get the last modified date, plus it will
> >probably spider faster.
> >
> 
Received on Fri Apr 6 17:46:23 2001