I was thinking that, but I didn't know how to do it right. I'm not that
familiar with Perl regexes; how does this one split the path? My concern
was that the filename might contain a '/' or a '\' character, and I didn't
know how to split it reliably. So I thought it was good enough to let
swish-e do it for me when the title was blank.
But as long as your approach is sure to work reliably, I think your way
would be best.
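For reference, the substitution quoted below works like this: `(.+?)` lazily matches everything up to the first '/' inside the title, and `([^<]+)` captures the rest, which is written back as the new title. Because `[^<]+` may itself contain '/', a deeper path such as /tmp/a/b.ppt would become a/b.ppt, and a path that uses only '\' separators would not match at all, so its title would stay unchanged. A minimal sketch that instead strips everything up to the *last* '/' or '\' is below; `strip_title_path` is a hypothetical helper name (not part of SWISH::Filters), and it assumes the title text contains no '<'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of SWISH::Filters): replace a <title> that
# holds a full path with just the file name. Assumes no '<' inside the title.
sub strip_title_path {
    my ($html) = @_;
    if ( $html =~ m{<title>([^<]*)</title>}i ) {
        # Greedy .* runs to the LAST '/' or '\'; /s lets it cross newlines.
        ( my $name = $1 ) =~ s{^.*[/\\]}{}s;
        $html =~ s{<title>[^<]*</title>}{<title>$name</title>}i;
    }
    return $html;
}

# Usage: Unix path, nested path, Windows path, and a bare file name.
for my $in ( '<title>/tmp/foo1234</title>',
             '<title>/tmp/a/b.ppt</title>',
             '<title>C:\tmp\foo.xls</title>',
             '<title>report.ppt</title>' ) {
    print strip_title_path($in), "\n";
}
```

One trade-off: on Unix a '\' is a legal file-name character, so splitting on it could cut a name that genuinely contains one. For these filters the title is a temporary path like /tmp/foo1234 (per the pp2html.pm comment quoted below), so that case should be rare.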
> how about retaining at least the file name without the leading path?
>
> my $content = $self->run_ppthtml( $doc->fetch_filename ) || return;
> + $content =~ s,<title>(.+?)/([^<]+)</title>,<title>$2</title>,i;
>
> Nick scribbled on 5/10/05 9:12 AM:
>> These two modules create titles that are inconsistent with those of the
>> other modules. This is due to the filtering programs using the full path
>> as the title.
>>
>> Obviously it would be best to have a "real" document title, but if we
>> can't have that I think that it would be better to use only the name of
>> the file itself, not the full path. This way it would be consistent
>> between all the modules.
>>
>> I see this comment in pp2html.pm, so I don't think I'm too far off base here:
>>
>> Currently produces document titles like /tmp/foo1234. Need to alter
>> to pass actual document title.
>>
>>
>> Below are diffs for both modules. I realize that this isn't ideal (it
>> would be nice to have a "real" title), but I think it is better than it
>> was before.
>>
>>
>> --- XLtoHTML.pm 2004-10-02 18:09:14.000000000 -0500
>> +++ XLtoHTML.pm.patched 2005-05-10 09:08:18.000000000 -0500
>> @@ -37,6 +37,9 @@
>> # update the document's content type
>> $doc->set_content_type( 'text/html' );
>>
>> + # remove the full path in the title
>> + $content_ref =~ s/<title>.*<\/title>/<title><\/title>/i;
>> +
>> # If filtered must return either a reference to the doc or a pathname.
>> return \$content_ref;
>>
>>
>> --- pp2html.pm 2005-03-23 23:55:06.000000000 -0600
>> +++ pp2html.pm.patched 2005-05-10 09:08:11.000000000 -0500
>> @@ -15,6 +15,10 @@
>> my $content = $self->run_ppthtml( $doc->fetch_filename ) || return;
>> # update the document's content type
>> $doc->set_content_type( 'text/html' );
>> +
>> + # remove the full path in the title
>> + $content =~ s/<title>.*<\/title>/<title><\/title>/i;
>> +
>> return \$content;
>> }
>>
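Both patches use `s/<title>.*<\/title>/<title><\/title>/i`. Without the `/s` modifier, `.` does not match a newline, so a title the converter happened to wrap across lines would be left alone, and the greedy `.*` would run to the last `</title>` on the line. A slightly more defensive variant is sketched below, assuming the converters emit a single `<title>` element; `blank_title` is a hypothetical name. As noted earlier in the thread, an empty title lets swish-e supply its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: variant of the substitution in both patches that
# empties the <title> element so swish-e falls back to its own title.
sub blank_title {
    my ($html) = @_;
    # .*? stops at the first </title>; /s lets the match cross newlines.
    $html =~ s{<title>.*?</title>}{<title></title>}is;
    return $html;
}

print blank_title("<head><title>/tmp/foo1234</title></head>"), "\n";
```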
>
> --
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
>
Received on Tue May 10 14:44:51 2005