On Fri, 21 Feb 2003, Gentile, Jeff wrote:
> You had pointed me in the "prog" direction to ignore my "header"
> section in my technotes, which is the first 8 lines... I looked at
> the "DirTree.pl" and the example in SWISH-RUN... and when I index the
> files by the normal method, I get 41917 words; however, when I use the
> prog method, I get 161 words. Even accounting for the words in my
> header, there can't be more than 100 unique words there...
>
> Any ideas what I'm doing wrong?
Not really, but you can run your "prog" program without swish and see
exactly what it generates, then use -T indexed_words when indexing
to see which words are indexed.
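For that first step, here is a minimal Perl sketch of a prog-style script you can run by hand and read. It assumes the swish-e prog input format of header lines (Path-Name, Content-Length), a blank line, and then exactly Content-Length bytes of document text. The sub names prog_record and main are hypothetical, not part of swish-e or DirTree.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one document record in the (assumed) swish-e "prog" format:
# header lines, a blank line, then exactly Content-Length bytes of text.
sub prog_record {
    my ($path, $content) = @_;
    my $len = do { use bytes; length $content };   # count bytes, not characters
    return "Path-Name: $path\n"
         . "Content-Length: $len\n"
         . "\n"
         . $content;
}

# Print a record for each file named on the command line so the output
# can be inspected before it is ever fed to swish-e.
sub main {
    for my $path (@ARGV) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $content = do { local $/; <$fh> } // '';   # slurp the whole file
        close $fh;
        print prog_record($path, $content);
    }
}

main() unless caller;
```

Running something like `perl prog_sketch.pl file1 file2 | less` (script name hypothetical) shows each record. If a Content-Length does not match the text that follows it, swish-e will probably read the wrong amount of text per document, so that is worth checking when word counts drop unexpectedly.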
> my @line=<FH>;
>
> for (1..8) {
> shift @line;
> }
>
> foreach (@line) {
> $docsize .= $_;
> }
TIMTOWTDI, but I guess I'd do something like:
my @doc  = <FH>;
my $doc  = join '', splice( @doc, 8 );   # lines from <FH> already end in "\n"
my $size = length $doc;
if you are sure it will always be the first eight lines.
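The same idea as a small reusable sub, sketched under the assumption that the header is always a fixed number of lines. The name body_after_header is hypothetical. Lines read with <$fh> keep their trailing newline, so they are joined with '' and the byte count then matches the text exactly.

```perl
use strict;
use warnings;

# Read every line from $fh, drop the first $skip lines (the header),
# and return the remaining text plus its length in bytes.
sub body_after_header {
    my ($fh, $skip) = @_;
    my @lines = <$fh>;
    # splice() returns everything from index $skip onward; guard against
    # files shorter than the header so splice is never called past the end.
    my $body = @lines > $skip ? join('', splice(@lines, $skip)) : '';
    my $size = do { use bytes; length $body };
    return ($body, $size);
}
```

Passing the header length as a parameter keeps the "always eight lines" assumption in one visible place if the technotes format ever changes.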
--
Bill Moseley moseley@hank.org
Received on Fri Feb 21 22:35:25 2003