
Re: Sorting by swishlastmodified...

From: David Wood <dwood(at)not-real.inter.nl.net>
Date: Fri Apr 06 2001 - 16:59:50 GMT
Hi Bill and Rainer,

Bill, I did see your note about the new "prog" stuff and I'm certainly
interested. But the new spider is more complex than the previous one, we
have some rather odd customisations to the previous one, and I just haven't
had the chance to play around with the new one properly yet.

On the other hand, would the patch below fix the 'old' spider?  The idea is
that when swishspider gets HTTP code 200 back, it also writes the
Last-Modified date into the .response file, in seconds-since-epoch format,
so the C code doesn't have to muck around with date formats, localisation,
etc.
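On the C side, the patch reads that line back with fgets() and atol(). A slightly stricter reader might look like the sketch below. It is only a sketch under stated assumptions: the helper names parse_lastmodified and read_lastmodified are hypothetical (not swish-e functions), and it assumes the epoch seconds sit alone on the next line of the already-open .response stream. An empty, missing, negative or non-numeric line yields 0, the same value the unpatched code stores, so a server that sends no Last-Modified header (swishspider should then print an empty line) behaves as before.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of swish-e): parse one line holding a
 * Last-Modified timestamp as seconds since the epoch, the format the
 * patched swishspider is assumed to write.  Returns 0 for an empty,
 * negative, out-of-range or non-numeric line. */
static time_t parse_lastmodified(const char *line)
{
    char *end;
    long secs;

    errno = 0;
    secs = strtol(line, &end, 10);
    if (end == line || errno == ERANGE || secs < 0)
        return 0;
    /* allow only trailing whitespace, such as the "\n" kept by fgets */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;
    return (time_t) secs;
}

/* Hypothetical helper: read the next line of an open .response stream
 * and parse it; returns 0 at end of file or on a read error. */
static time_t read_lastmodified(FILE *fp)
{
    char buffer[64];

    if (fgets(buffer, sizeof buffer, fp) == NULL)
        return 0;
    return parse_lastmodified(buffer);
}
```

Using strtol() instead of atol() is the main design choice: it reports where parsing stopped, so trailing junk is rejected instead of being silently truncated to a number.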

cheers,

David


---


swishspider:
4a5
> use HTTP::Date;
32a34
>     print RESP str2time($response->header("last-modified")) . "\n";


httpserver.c:
72a73
>     static time_t lastmodified=0;
149c150
<               if (get(sw,contenttype, &server->lastretrieval, buffer) == 200) {
---
>               if (get(sw,contenttype, &lastmodified, &server->lastretrieval, buffer) == 200) {


http.c:
205c205
< int get(SWISH *sw, char *contenttype_or_redirect, time_t *plastretrieval, char *url)
---
> int get(SWISH *sw, char *contenttype_or_redirect, time_t *lastmodified, time_t *plastretrieval, char *url)
257a258,263
>       if (code == 200) {
>           /* read last-modified (seconds since epoch) */
>           *lastmodified = 0;
>           if (fgets(buffer, lenbuffer, fp) != NULL)
>               *lastmodified = atol(buffer);
>       }
372a379
> static time_t lastmodified=0;
406c413
<               if ((code = get(sw, contenttype, &server->lastretrieval, item->url)) == 200) {
---
>               if ((code = get(sw, contenttype, &lastmodified, &server->lastretrieval, item->url)) == 200) {
443c450
<                       fprop->mtime = 0;  /* $$ see above */
---
>                       fprop->mtime = lastmodified;


http.h:
13c13
< int get (SWISH *sw, char *contenttype_or_redirect, time_t *plastretrieval, char *url);
---
> int get (SWISH *sw, char *contenttype_or_redirect, time_t *lastmodified, time_t *plastretrieval, char *url);
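The 31 Dec. 1969 dates mentioned in the quoted message below are most likely the unpatched fprop->mtime = 0 being displayed: on POSIX systems a time_t of 0 is the epoch, 1970-01-01 00:00:00 UTC, and a clock anywhere west of Greenwich shows that instant as 31 Dec. 1969. The sketch below illustrates this under stated assumptions: date_at_offset is a hypothetical helper, it models a time zone as a fixed offset from UTC (no daylight-saving rules), and it relies on gmtime() accepting negative time_t values, which glibc and most POSIX systems do but ISO C does not promise.

```c
#include <time.h>

/* Hypothetical helper: the broken-down date and time that a clock at a
 * fixed offset from UTC (in seconds, negative = west of Greenwich) would
 * show for t.  Returns an all-zero struct if gmtime() fails. */
struct tm date_at_offset(time_t t, long utc_offset_secs)
{
    time_t shifted = t + (time_t) utc_offset_secs;
    struct tm result = {0};
    const struct tm *tm = gmtime(&shifted);

    if (tm != NULL)
        result = *tm;
    return result;
}
```

For example, date_at_offset(0, 0) gives 1970-01-01 00:00, while date_at_offset(0, -5L * 3600) gives 19:00 on 1969-12-31, which is how a stored mtime of 0 looks at UTC-5.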



At 11:35 06-04-01, you wrote:


>On Thu, 5 Apr 2001, David Wood wrote:
>
> > Hi folks,
> >
> > Using '... -s swishlastmodified desc' _almost_ works perfectly.  The only
> > problem I've uncovered is that, if you've created your index via spidering,
> > there's no swishlastmodified stored, I guess because the files aren't local
> > and stat'able, and so all dates for spidered content come back as 31 Dec.
> > 1969!  But don't nicely behaved web servers pass a Last Modified HTTP
> > header to clients?  If so, might we be able to use that to set
> > swishlastmodified when creating a spider-generated index?
>
>You might have missed my last post.  If you use the "prog" method with the
>provided spider.pl you will get the last modified date, plus it will
>probably spider faster.
>
Received on Fri Apr 6 17:02:01 2001