Re: Word Frequency List

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Feb 16 2005 - 19:13:29 GMT
Bill Moseley scribbled on 2/16/05 12:38 PM:

> 
> How about:
> 
> $ swish-e -T index_words | grep . | perl -nae 'print "$F[3] $F[0]\n"' | sort -rn
> 


this doesn't quite work, at least not the way my script does: the pipeline reports 1 for every word, where my script reports 2 for several of them.

karpet@cartermac 16% perl countwords
   word  count  unique docs
==========================
  ocean  2      2
    the  2      2
   over  2      2
   lies  2      2
     my  2      2
foobar  1      1
  sarah  1      1
karpet@cartermac 17% /usr/local/swish-e/latest/bin/swish-e -T index_words | grep . | perl -nae 'print "$F[3] $F[0]\n"' | sort -rn
1 the
1 sarah
1 over
1 ocean
1 my
1 lies
1 foobar
in ----->


my countwords script records, for each word, the total count across the index and 
the number of docs the word appears in.
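
To make the two numbers concrete, here is a minimal sketch of that kind of bookkeeping. It is not the countwords script itself (that script is not shown in this message), and the input format is an assumption made up for illustration: one "docid word" pair per line, one line per occurrence. That is not what swish-e -T index_words prints. The function name count_words is hypothetical too.

```shell
# Sketch only. Assumed input: one "docid word" pair per line, one line per
# occurrence. This is a hypothetical format, not swish-e -T output.
# Output columns: total count, number of unique docs, word.
count_words() {
  awk '
    {
      total[$2]++                    # every occurrence of the word counts
      if (!(($1, $2) in seen)) {     # first time this word shows up in this doc
        seen[$1, $2] = 1
        docs[$2]++
      }
    }
    END {
      for (w in total) print total[w], docs[w], w
    }
  ' | sort -k1,1rn -k3,3             # highest total first, ties sorted by word
}

# "the" occurs three times but only in two docs.
printf '%s\n' 'doc1 the' 'doc1 the' 'doc2 the' 'doc1 ocean' 'doc2 sarah' | count_words
```

For the sample input this prints "3 2 the", then "1 1 ocean" and "1 1 sarah", so the total count and the doc count can differ for the same word.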

I was using a BTREE version of swish-e, fwiw.
-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Feb 16 11:13:30 2005