Re: Ignore Question

From: Gentile, Jeff <GentileJ(at)not-real.netscout.com>
Date: Fri Feb 28 2003 - 16:24:14 GMT
Bill,
	I ended up getting around my header problem by subtracting the
header size from the file size. However, now that I've got my tech
notes indexing, I wanted to get the attachments indexed, and since
you so generously provided the "DirTree.pl" and "pdf2html.pm" code
I figured I'd use it. Unfortunately, I've run into the same
"Content-Length" problem in pdf2html.pm:
-----------------------------
    my $txt = <<EOF;
<html>    
<head>
$headers
</head>
<body>
<pre>
$$content_ref
</pre>
</body>
</html>
EOF

    if ( ref $file_or_content ) {
        unlink $file;
        return \$txt;
    }

    my $mtime  = (stat $file )[9];
--->use bytes;
    my $size = length $txt;
--->no bytes;
    my $ret = <<EOF;
Content-Length: $size
Last-Mtime: $mtime
Path-Name: $file

EOF

    $ret .= $txt;

    return \$ret;
------------------------

Notice the two lines I pointed to (marked with "--->"). I looked into the
"length" function, and it's a "known" issue that it reports characters
even though it says it reports "bytes"; "use bytes" is supposed to fix
that. I've got a PDF that is 787,244 bytes in size before conversion. The
text file size of the output is 208,786 bytes. The content length comes
back as 208,573 either way (not sure if "use bytes" works properly...
still in experimental phase).
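
For reference, here is a minimal sketch of the character-versus-byte
difference described above. The sample string is hypothetical and is not
taken from pdf2html.pm. It contains one character above U+00FF, so Perl
stores it internally as UTF-8 and the two counts differ. Whether this is
what causes the mismatch in the numbers above is not established here.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without turning on byte semantics

# Hypothetical sample text containing one wide character (U+263A).
my $txt = "smile: \x{263A}\n";

# Number of characters, which is what plain length() reports.
my $chars = length $txt;

# Number of bytes in Perl's internal (UTF-8) representation of the string.
my $bytes = bytes::length($txt);

print "chars=$chars bytes=$bytes\n";
```

If the converted text held only single-byte characters, both counts would
be identical, which would match seeing the same number with and without
"use bytes".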

Have you experienced this with pdf2html.pm? Do you think I should just
write the output to a temp file in pdf2html.pm and take the size of that?
Or am I missing something again?
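
The temp-file check asked about here could be sketched roughly as follows.
This is only an illustration under stated assumptions: $txt is a
hypothetical stand-in for the generated HTML, and File::Temp is used for
convenience rather than because pdf2html.pm does so. The binmode call
keeps the handle from translating newlines on platforms that do that, so
the size on disk can be compared directly with length().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the HTML text built in pdf2html.pm.
my $txt = "line one\nline two\n";

# Write the text to a temporary file that is removed when the program exits.
my ($fh, $tmp) = tempfile(UNLINK => 1);
binmode $fh;                          # raw bytes, no newline translation
print {$fh} $txt;
close $fh or die "close failed: $!";

my $size_on_disk = -s $tmp;           # size the filesystem reports
my $size_in_mem  = length $txt;       # size Perl reports for the string

print "disk=$size_on_disk length=$size_in_mem\n";
```

If the two numbers agree with binmode in place but differ without it,
newline translation would be one candidate explanation for the gap.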

Thanks.

Jeff

-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Monday, February 24, 2003 11:14
To: Multiple recipients of list
Subject: [SWISH-E] Re: Ignore Question

So maybe you are right: you could print the trimmed doc to a file and see
what stat() reports for the file size.  If Perl's length is returning
chars and you have multi-byte chars, then you will either need to convert
to 8859-1 in Perl or get Perl to report the correct length.

I do not know enough about Perl's support of Unicode to know if that's the
likely problem or not.
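
One way the "convert to 8859-1" option might look is sketched below. It
assumes the Encode module is available; with_headers is a hypothetical
helper name, and the path and mtime values are placeholders. The idea is
to encode the text to bytes first and take the length of those bytes, so
that Content-Length describes exactly what is written out.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode the body, then build the header block from it.
sub with_headers {
    my ($path, $mtime, $txt) = @_;

    # Convert characters to ISO-8859-1 bytes. Characters that have no
    # ISO-8859-1 equivalent are replaced with a substitution character.
    my $body = encode('ISO-8859-1', $txt);

    # $body holds bytes, so length() here is a byte count.
    my $size = length $body;

    return "Content-Length: $size\n"
         . "Last-Mtime: $mtime\n"
         . "Path-Name: $path\n\n"
         . $body;
}

# Placeholder path and mtime; the body has one non-ASCII character (U+00E9).
my $doc = with_headers('example.pdf', 0, "caf\x{e9}\n");
print $doc;
```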

-- 
Bill Moseley moseley@hank.org
Received on Fri Feb 28 16:24:46 2003