
Re: Applicability of Swish-E... Thoughts?

From: Roman Chyla <chyla(at)>
Date: Fri Jul 08 2005 - 13:49:16 GMT

If I understand correctly, Greg would like something like a browsable index
of categories (or at least something that summarizes them):
a   15
a.b 20
a.c 25

I was trying to do something similar; see for example here (testing)

It is an external script that simply counts the number of occurrences
of each category for later browsing/searching.

I think this information could be collected from the swish-e index too,
for example by dumping the metadata out of the index and then counting it.

However, we would need the ability to dump only certain parts of the index.
Does that sound reasonable?
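A minimal sketch of the counting step, in Perl like the example further down the thread. It assumes a hypothetical input of one "record_id<TAB>category" pair per line; swish-e's own dump output looks different, so some reformatting would have to happen first. Each category is counted once per record, and the sub name count_categories is made up for this sketch.

```perl
use strict;
use warnings;

# Tally how many distinct records carry each category value.
# Input format is hypothetical: one "record_id<TAB>category" pair per line
# (an assumption; this is not swish-e's own dump format).
sub count_categories {
    my @lines = @_;    # copy, so chomp below does not touch the caller's data
    my %seen;          # record_id => { category => 1 }, skips repeated pairs
    my %count;         # category  => number of distinct records
    for my $line (@lines) {
        chomp $line;
        my ( $id, $category ) = split /\t/, $line, 2;
        next unless defined $category && length $category;
        next if $seen{$id}{$category}++;
        $count{$category}++;
    }
    return \%count;
}

# Small made-up sample, only to show the shape of the output.
my @sample = (
    "1\ta", "1\ta.b",
    "2\ta", "2\ta.c",
    "3\ta", "3\ta.b", "3\ta.b",    # repeated pair, counted once
);
my $count = count_categories(@sample);
for my $category ( sort keys %$count ) {
    printf "%-6s %d\n", $category, $count->{$category};
}
```

The output has the same "category count" shape as the listing at the top of this message.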


Peter Karman wrote:
> Net Virtual Mailing Lists scribbled on 7/7/05 4:46 AM:
>>... Yet I can do things like "category=bc" and get a result...
>>I originally tried doing:
>>  <id>278232</id>
>>  <category>a</category>
>>  <category>a.b</category>
>>  <category>a.b.b</category>
>>  <category>a.b.b.g</category>
>>  <category>a.d</category>
>>  <category>a.d.c</category>
>>  <category>a.d.c.bc</category>
>>... but this didn't seem any better. I feel as though I am missing
>>something very basic here. Might you know what it is?
> You need to add a period to WordCharacters (see the *Characters config
> params).
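To illustrate why "category=bc" matched, here is a simplified model of word splitting in Perl: a word is a maximal run of word characters. It is only an approximation (the real tokenizer also applies the other *Characters settings), and the sub name tokenize is made up for this sketch.

```perl
use strict;
use warnings;

# Simplified model of word splitting: a "word" is a maximal run of the
# characters listed in $word_chars. This only approximates swish-e, whose
# tokenizer also honours other *Characters settings.
sub tokenize {
    my ( $text, $word_chars ) = @_;
    my $class = '[' . quotemeta($word_chars) . ']+';
    return $text =~ /($class)/g;    # list of all matching runs
}

my $value   = 'a.d.c.bc';
my @without = tokenize( $value, join( '', 'a' .. 'z', 0 .. 9 ) );
my @with    = tokenize( $value, join( '', 'a' .. 'z', 0 .. 9 ) . '.' );
print "without period: @without\n";
print "with period:    @with\n";
```

With only letters and digits, "a.d.c.bc" is split into the four words a, d, c and bc, so category=bc finds the record; once the period is a word character the whole value stays a single word.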
>>What I would really like is a way to say something like "swish-e -w UNIX"
>>and have it return to me something like this:
>>a       15
>>a.b     15
>>a.b.b   5
>>a.b.b.g 2
>>a.b.b.h 3
>>a.b     10
>>a.b.g   10
>>a.b.g.b 10
>>... where the number to the right is the total count of matching records
>>for each category.
>>Is what I am after here possible with Swish-E? I know that I can feed
>>the output of it into a script to generate this summary, but this is slow
>>work... I know nothing about how Swish-E is architected at this point, but
>>it almost seems like Swish-E would already have everything it needs to
>>internally generate this summary very quickly.
> Swish-e is just a text indexer. It can keep track of text, and the context 
> (MetaNames) in which the text is found, and can even store the text itself (as a 
> Property). But it doesn't have any features for summarizing results like you're 
> describing.
> However, I can imagine some ways to still get what you want. If you knew all the
> possible categories you were interested in, you could use the API to perform a
> series of searches on an open index (or indexes) and still make it go pretty fast.
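> Example (in Perl) (UNTESTED!):
> use strict;
> use warnings;
> use SWISH::API;
> my $swish = SWISH::API->new( 'index.swish-e' );
> $swish->AbortLastError if $swish->Error;    # e.g. index file not found
> my $q = 'UNIX';
> my @categories = qw( a a.b a.b.b a.b.b.g a.b.b.h a.b.g );
> my %count;
> # one search per category against the already-open index
> for my $c (@categories)
> {
>      my $results = $swish->Query( "$q and category=$c" );
>      $swish->AbortLastError if $swish->Error;    # e.g. query syntax error
>      $count{$c} = $results->Hits || 0;
> }
> # do something with the count
> for my $c (@categories)
> {
>      printf "%-10s %d\n", $c, $count{$c};
> }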
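If the category list is not known in advance, one possible variation is to collect the categories of every hit (for instance from a stored property, assuming the index is configured to keep one) and roll each value up into its ancestors, so a.b.b.g also counts towards a.b.b, a.b and a. The rollup_counts sub below is a hypothetical helper that only does the counting; it does not talk to swish-e.

```perl
use strict;
use warnings;

# Given the categories of each matching record (how they are obtained, e.g.
# from a stored "category" property, is an assumption about the index setup),
# count how many records fall under each category and each of its ancestors.
sub rollup_counts {
    my @records = @_;    # each element: arrayref of category strings
    my %count;
    for my $cats (@records) {
        my %in_record;   # every category/ancestor this record belongs to
        for my $cat (@$cats) {
            my @parts = split /\./, $cat;
            # a.b.b.g -> a, a.b, a.b.b, a.b.b.g
            $in_record{ join '.', @parts[ 0 .. $_ ] } = 1 for 0 .. $#parts;
        }
        $count{$_}++ for keys %in_record;
    }
    return \%count;
}

my $count = rollup_counts( ['a.b.b.g'], [ 'a.b.b.h', 'a.d.c' ], ['a.b.g.b'] );
printf "%-8s %d\n", $_, $count->{$_} for sort keys %$count;
```

A record is counted only once per category, even when several of its categories share an ancestor.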
Received on Fri Jul 8 06:49:29 2005