Re: Proposed changes to pp2html.pm and XLtoHTML.pm

From: Nick <newsgroups(at)not-real.2thebatcave.com>
Date: Tue May 10 2005 - 21:44:51 GMT
I was thinking that, but I didn't know how to do it right.  I'm not that
familiar with Perl regexes; what is this one doing to split it?  My concern
was that the filename might contain a '/' or a '\' character, and I didn't
know how to split it reliably.  So I thought it was good enough to let
swish-e do it for me when the title was blank.

But as long as that is sure to work reliably, I think your way would be
best.
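
For reference, here is a rough sketch of how the quoted substitution behaves
and how it might be made to cope with both separators.  The helper name
strip_title_path is made up for illustration and is not part of swish-e or
either filter module.  As far as I can tell, the non-greedy (.+?) stops at
the first '/' that follows at least one character, so it only yields the
bare file name for paths like /tmp/foo1234; a greedy prefix with a [/\\]
character class should reach the last separator of either kind.

```perl
use strict;
use warnings;

# Hypothetical helper (not from swish-e): keep only the last path
# component inside <title>...</title>, treating '/' and '\' alike.
sub strip_title_path {
    my ($html) = @_;
    # [^<]* is greedy, so it backtracks to the LAST separator in the title.
    $html =~ s{<title>[^<]*[/\\]([^<]+)</title>}{<title>$1</title>}i;
    return $html;
}

# The substitution as quoted below, for comparison.
my $quoted = '<title>/tmp/a/b.ppt</title>';
$quoted =~ s,<title>(.+?)/([^<]+)</title>,<title>$2</title>,i;
print "$quoted\n";    # <title>a/b.ppt</title>

print strip_title_path('<title>/tmp/a/b.ppt</title>'), "\n";       # <title>b.ppt</title>
print strip_title_path('<title>C:\docs\sheet.xls</title>'), "\n";  # <title>sheet.xls</title>
```

A title with no separator in it is left unchanged, since the pattern simply
does not match.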

> how about retaining at least the file name without the leading path?
>
>      my $content = $self->run_ppthtml( $doc->fetch_filename ) || return;
> +   $content =~ s,<title>(.+?)/([^<]+)</title>,<title>$2</title>,i;
>
>
>
>
> Nick scribbled on 5/10/05 9:12 AM:
>> These two modules create titles inconsistent with the other ones.  This
>> is due to the filtering programs using the full path as the title.
>>
>> Obviously it would be best to have a "real" document title, but if we
>> can't have that I think that it would be better to use only the name of
>> the file itself, not the full path.  This way it would be consistent
>> between all the modules.
>>
>> I see this comment in pp2html.pm so I don't think I'm too off base here:
>>
>> Currently produces document titles like /tmp/foo1234.  Need to alter
>> to pass actual document title.
>>
>>
>> Below are diffs for both modules.  I realize that this isn't best (it
>> would be nice to have a "real" title), but I think it is better than it
>> was before.
>>
>>
>> --- XLtoHTML.pm 2004-10-02 18:09:14.000000000 -0500
>> +++ XLtoHTML.pm.patched 2005-05-10 09:08:18.000000000 -0500
>> @@ -37,6 +37,9 @@
>>      # update the document's content type
>>      $doc->set_content_type( 'text/html' );
>>
>> +    # remove the full path in the title
>> +    $content_ref =~ s/<title>.*<\/title>/<title><\/title>/i;
>> +
>>      # If filtered must return either a reference to the doc or a pathname.
>>      return \$content_ref;
>>
>>
>> --- pp2html.pm  2005-03-23 23:55:06.000000000 -0600
>> +++ pp2html.pm.patched  2005-05-10 09:08:11.000000000 -0500
>> @@ -15,6 +15,10 @@
>>     my $content = $self->run_ppthtml( $doc->fetch_filename ) || return;
>>     # update the document's content type
>>     $doc->set_content_type( 'text/html' );
>> +
>> +   # remove the full path in the title
>> +   $content =~ s/<title>.*<\/title>/<title><\/title>/i;
>> +
>>     return \$content;
>>  }
>>
>
> --
> Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
>
Received on Tue May 10 14:44:51 2005