Bill,
Thanks, using the file stat was indeed giving me the wrong content
length. Instead, on Windows I can read the <FH> filehandle into a $variable
and use length($variable) for the Content-Length header.
Thanks for the quick and helpful answer!
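For the archive, a minimal sketch of that change. This is not the actual DirTree.pl code: the helper name build_record is made up for illustration, and the :crlf layer is requested explicitly only so the sketch behaves the same on non-Windows systems (on Windows, text mode is already the default).

```perl
use strict;
use warnings;

# Hypothetical helper: build one record for "swish-e -S prog".
# The length comes from the content as read, not from stat().
sub build_record {
    my ($path) = @_;
    my $mtime = (stat $path)[9];

    # Text-mode read: the :crlf layer turns CRLF into LF, as Windows does.
    open my $fh, '<:crlf', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    $content = '' unless defined $content;

    return "Path-Name: $path\n"
         . 'Content-Length: ' . length($content) . "\n"
         . "Last-Mtime: $mtime\n"
         . "\n"
         . $content;
}

# Usage (assumed): perl sketch.pl file1.txt file2.txt
print build_record($_) for @ARGV;
```

The trade-off is that the whole file is held in memory before anything is printed, which is probably fine for ordinary text files.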
Lee Thompson
-----Original Message-----
From: swish-e@sunsite.berkeley.edu [mailto:swish-e@sunsite.berkeley.edu]
On Behalf Of moseley@hank.org
Sent: Saturday, November 15, 2003 10:53 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: dirTree.pl
On Sat, Nov 15, 2003 at 07:17:14AM -0800, Lee Thompson wrote:
> Hi,
>
> Has anyone tried modifying dirTree.pl for use on Windows? It does
> find all files, but swish-e doesn't seem to be able to tell where one
> file ends and the next file starts.
Make sure the version you are using does NOT use binmode (swish-e reads
in text mode). Does Windows have a standard tool like "od" or "file" to
look at the output from DirTree.pl to see what kind of line endings it has?
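If no such tool is at hand, a few lines of Perl can stand in for "od". The sketch below assumes the DirTree.pl output was first redirected to a file; the helper name count_line_endings and the script name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count line-ending styles in a string of raw bytes.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (crlf => 0, lf => 0, cr => 0);
    $count{crlf}++ while $bytes =~ /\r\n/g;        # Windows style
    $count{lf}++   while $bytes =~ /(?<!\r)\n/g;   # LF not preceded by CR
    $count{cr}++   while $bytes =~ /\r(?!\n)/g;    # CR not followed by LF
    return \%count;
}

# Usage (assumed): perl DirTree.pl > out.txt, then perl endings.pl out.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $fh;    # look at the bytes exactly as stored on disk
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $c = count_line_endings($bytes);
    print "CRLF: $c->{crlf}  LF only: $c->{lf}  CR only: $c->{cr}\n";
}
```

Note that binmode is used here only to inspect the file; it is not a suggestion to add binmode to DirTree.pl itself.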
If that's not it, maybe you are using UTF-8 and the
content length is wrong. To check, you could output one file with
DirTree.pl, edit it, and note the content-length. Then cut all the
header lines, including the blank line between the header and content,
and save the file. The resulting file size should be what the
content-length header said. That's assuming you have an editor that
won't screw things up and add a line ending at the end if there isn't
already one there.
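The same check can be done without an editor. The sketch below is only an illustration: check_content_length is a made-up helper, the sample record and its path are invented, and the record is assumed to be a plain string with LF line endings, as a text-mode reader would see it. If the string has not been decoded, length counts bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: compare the Content-Length header of one record
# (headers, blank line, content) with the length of what follows the blank line.
sub check_content_length {
    my ($record) = @_;
    my ($headers, $body) = split /\n\n/, $record, 2;
    die "No blank line after headers\n" unless defined $body;
    my ($declared) = $headers =~ /^Content-Length:\s*(\d+)\s*$/mi
        or die "No Content-Length header\n";
    return ($declared, length $body);
}

# Invented sample record: "hello\n" is 6 characters long.
my $sample = "Path-Name: /tmp/a.txt\nContent-Length: 6\n\nhello\n";
my ($declared, $actual) = check_content_length($sample);
print "declared=$declared actual=$actual\n";
```

If the two numbers differ for real DirTree.pl output, the header is wrong for the way the content is being read.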
> Should the data from dirTree.pl have
> something specific that indicates where one file ends and the next
> starts?
No. It knows the end by the content-length.
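To illustrate why a wrong length produces warnings like the ones quoted below, here is a rough sketch of a length-delimited reader. It is NOT swish-e's actual code; read_records, the header list, and the sample streams are assumptions made for the example. When the declared length is too large, the reader swallows the start of the next record and then tries to parse whatever follows as headers.

```perl
use strict;
use warnings;

# Illustrative reader, NOT swish-e's actual code. It reads header lines up
# to a blank line, then takes exactly Content-Length characters as the body.
sub read_records {
    my ($stream) = @_;
    my (@docs, @warnings);
    my $end = length $stream;
    my $pos = 0;

    while ($pos < $end) {
        my %header;
        while ($pos < $end) {
            my $eol = index($stream, "\n", $pos);
            $eol = $end if $eol < 0;
            my $line = substr($stream, $pos, $eol - $pos);
            $pos = $eol < $end ? $eol + 1 : $end;
            last if $line eq '';    # blank line: headers are done
            if ($line =~ /^(Path-Name|Content-Length|Last-Mtime|Document-Type):\s*(.*)$/i) {
                $header{lc $1} = $2;
            }
            else {
                push @warnings, "Unknown header line: '$line'";
            }
        }
        my $len = $header{'content-length'};
        last unless defined $len && $len =~ /^\d+$/;
        push @docs, { path => $header{'path-name'}, body => substr($stream, $pos, $len) };
        $pos += $len;
    }
    return (\@docs, \@warnings);
}

my $rec2 = "Path-Name: b.txt\nContent-Length: 4\n\nend\n";
my $body = "1\n2\n3\n4\n";    # 8 characters as read in text mode
my $good = "Path-Name: a.txt\nContent-Length: 8\n\n$body$rec2";
my $bad  = "Path-Name: a.txt\nContent-Length: 12\n\n$body$rec2";   # 12 = size with CRLF on disk

for my $stream ($good, $bad) {
    my ($docs, $warnings) = read_records($stream);
    print scalar(@$docs), " docs, ", scalar(@$warnings), " warnings\n";
    print "  $_\n" for @$warnings;
}
```

In the second stream the first body runs 4 characters into the next record, so the reader sees '-Name: b.txt' where a Path-Name header should be and the second document ends up without a path.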
> It does put in the same headers as spider.pl, spider.pl works fine
> here.
spider.pl uses this to determine the content length (in the event that
the content ends up in utf-8 with multi-byte chars):
# ugly and maybe expensive, but perhaps more portable than "use bytes"
my $bytecount = length pack 'C0a*', $$content;
But DirTree.pl uses the length from the stat command. Hard to imagine
that would be wrong.
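One way it can be wrong, offered as a plausible explanation rather than a confirmed diagnosis: stat counts every byte on disk, while a text-mode reader on Windows sees each CRLF as a single LF. The sketch below writes a small CRLF file and compares the two numbers; the :crlf layer is used explicitly to mimic Windows text mode on any platform, and the file contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with Windows (CRLF) line endings, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "line one\r\nline two\r\n";
close $out;

# Size as reported by stat: counts every byte on disk, CRs included.
my $stat_size = -s $path;

# Length as seen by a text-mode reader; :crlf mimics Windows text mode.
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
my $content = do { local $/; <$in> };
close $in;
my $text_length = length $content;

print "stat size: $stat_size, text-mode length: $text_length\n";
```

Here the stat size is 20 and the text-mode length is 18, one less per line, so a Content-Length taken from stat would overshoot by the number of lines in the file.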
> The errors shown are:
>
> ----------------------------------------
> C:\KaTS\SWISH-E>swish-e -S prog -c conf/filetree.config -i ./prog-bin/DirTree.pl -f i:\Data\Taxonomy\mydrive.swish-e
> Indexing Data Source: "External-Program"
> Indexing "./prog-bin/DirTree.pl"
> External Program found: ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Logging information for IE6Setup.exe
> ...' from program ./prog-bin/DirTree.pl
> /WINNT/Active Setup Log.txt - Using TXT2 parser - (2819 words)
>
> Warning: Unknown header line: 'b.dll' from program
> ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Search fixed drives = FALSE' from
> program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Search remote drives = FALSE' from
> program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Search removable drives = FALSE' from
> program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Search CD-ROM drives = FALSE' from
> program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Search specific directories = TRUE'
> from program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Custom directories =
> C:\WINNT\Microsoft.NET\Framework' from program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Recurse custom dirs = TRUE' from
> program ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'Result = 0' from program
> ./prog-bin/DirTree.pl
>
> Warning: Unknown header line: 'END: Perform action: Search for File'
> from program ./prog-bin/DirTree.pl
> err: External program failed to return required headers Path-Name:
> .
> ----------------------------------------
>
> If I run dirTree.pl on its own I do get all the correct swish-e
> header lines, for example:
>
> Path-Name: /WINNT/Active Setup Log.txt
> Content-Length: 21382
> Last-Mtime: 1062197811
> Document-Type: TXT*
>
> Lee Thompson
>
> *********************************************************************
> Due to deletion of content types excluded from this list by policy,
> this multipart message was reduced to a single part, and from there to
> a plain text message.
> *********************************************************************
>
--
Bill Moseley
moseley@hank.org
Received on Mon Nov 17 00:45:31 2003