Should the (.+) be a (.*)? What if you had a file in the root dir like so:
/my document.doc
That would be rare but possible, and in this case I don't think it would
strip the /.
What do you think about the possibility of a '/' or a '\' in the name of
the file? That seems highly unlikely (I don't even think Windows will let
you do it), but as far as I know it is possible under *nix.
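
Here is a rough sketch of how both cases could be checked without the real filter. It assumes the filter emits the path verbatim inside <title>...</title>. The strip_title helper and the sample titles are made up for illustration, and nothing here has been run against actual ppthtml or xlhtml output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the filter modules): apply the
# proposed substitution, using either '+' or '*' for the leading path part.
sub strip_title {
    my ( $content, $quant ) = @_;
    my $re = $quant eq '*'
        ? qr{<title>(.*)[\\/]([^<]+)</title>}i
        : qr{<title>(.+)[\\/]([^<]+)</title>}i;
    $content =~ s,$re,<title>$2</title>,;
    return $content;
}

# Made-up sample titles: a file in the root dir, and a *nix file whose
# name contains a backslash (single quotes keep the '\' literal).
my $root  = '<title>/my document.doc</title>';
my $slash = '<title>/tmp/foo\bar.doc</title>';

print strip_title( $root,  '+' ), "\n";   # (.+) needs a char before the '/'
print strip_title( $root,  '*' ), "\n";   # (.*) may match the empty string
print strip_title( $slash, '+' ), "\n";   # splits on the '\' inside the name
```

If this behaves the way I expect, the first line comes back unchanged, the second is reduced to "my document.doc", and the third is cut down to "bar.doc", which would lose the "foo\" part of the file name.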
>
>
> Nick scribbled on 5/10/05 4:43 PM:
>> I was thinking that, but I didn't know how to do it right. I'm not that
>> familiar with Perl regexes; what is this doing to split it? My concern
>> was that the filename might contain a '/' or a '\' char in it and I
>> didn't know how to reliably split it.
>
> I see in a test that my original doesn't work for long path names.
>
> I don't have a Windows box to test it on, so I don't know which path
> separator the filter uses under Windows: '\' or '/'.
>
> But this should catch either, I would think:
>
> $content =~ s,<title>(.+)[\\/]([^<]+)</title>,<title>$2</title>,i;
>
> That says, in English:
>
> match '<title>' followed by one or more characters, till you find a / or
> a \ (escaping the \ since it is a special char), followed by one or more
> 'not <' characters, followed by '</title>'
>
> The .+ is greedy, so it should swallow every leading path segment and
> stop only at the last / or \ before the file name.
>
> Try it out and see if it works for you. If it does, I'll make the change
> and check it in.
>
>
>
>
>>
>>>how about retaining at least the file name without the leading path?
>>>
>>> my $content = $self->run_ppthtml( $doc->fetch_filename ) || return;
>>>+ $content =~ s,<title>(.+?)/([^<]+)</title>,<title>$2</title>,i;
>>>
>>>
>>>
>>>
>>>Nick scribbled on 5/10/05 9:12 AM:
>>>
>>>>These two modules create titles inconsistent with the other ones. This
>>>>is due to the filtering programs using the full path as the title.
>>>>
>>>>Obviously it would be best to have a "real" document title, but if we
>>>>can't have that I think that it would be better to use only the name of
>>>>the file itself, not the full path. This way it would be consistent
>>>>between all the modules.
>>>>
>>>>I see this comment in pp2html.pm so I don't think I'm too off base
>>>>here:
>>>>
>>>>Currently produces document titles like /tmp/foo1234. Need to alter
>>>>to pass actual document title.
>>>>
>>>>
>>>>Below are diffs for both modules. I realize that this isn't best (it
>>>>would be nice to have a "real" title), but I think it is better than it
>>>>was before.
>>>>
>>>>
>>>>--- XLtoHTML.pm 2004-10-02 18:09:14.000000000 -0500
>>>>+++ XLtoHTML.pm.patched 2005-05-10 09:08:18.000000000 -0500
>>>>@@ -37,6 +37,9 @@
>>>> # update the document's content type
>>>> $doc->set_content_type( 'text/html' );
>>>>
>>>>+ # remove the full path in the title
>>>>+ $content_ref =~ s/<title>.*<\/title>/<title><\/title>/i;
>>>>+
>>>> # If filtered must return either a reference to the doc or a pathname.
>>>> return \$content_ref;
>>>>
>>>>
>>>>--- pp2html.pm 2005-03-23 23:55:06.000000000 -0600
>>>>+++ pp2html.pm.patched 2005-05-10 09:08:11.000000000 -0500
>>>>@@ -15,6 +15,10 @@
>>>> my $content = $self->run_ppthtml( $doc->fetch_filename ) || return;
>>>> # update the document's content type
>>>> $doc->set_content_type( 'text/html' );
>>>>+
>>>>+ # remove the full path in the title
>>>>+ $content =~ s/<title>.*<\/title>/<title><\/title>/i;
>>>>+
>>>> return \$content;
>>>> }
>>>>
>>>
>>>--
>>>Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
>>>
Received on Tue May 10 15:00:11 2005