
Re: Observations, problems, etc......

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Aug 11 2005 - 07:56:19 GMT
On Wed, Aug 10, 2005 at 11:20:47PM -0700, Net Virtual Mailing Lists wrote:
> #2. The PERL API seems to be quite robust, but it looks like the only
> real way to interface this to my PHP script is through system commands
> and if I'm going to do that, it seems better to just call the swish-e
> executable directly rather than deal with the overhead of Perl
> mothership.

In general, I've heard people argue that you should use the tools that work
best for the situation -- mix your languages for what each does best.
In this situation, though, I think you just need someone with PHP
experience at linking to C libraries.

> to search via a zip code radius.  So first I compile a list of all the
> zip codes and then do a (god awful) -w 'zip_code=(11111 OR 22222 OR 33333
> OR ...)'.  I'm not even sure what sort of performance implication this
> would have, because I can't get it to work (more about this later).

I can tell you that it will kill your performance.  Swish does a
complete search for each one.  A database will be much better for
doing that kind of search.
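For comparison, here is a minimal sketch of the database approach. It uses Python's built-in sqlite3 module purely for illustration, and the `listing` and `wanted_zip` tables are hypothetical. The zip list goes into a temporary table that is joined against an indexed column, so there is no giant OR expression to evaluate term by term.

```python
import sqlite3

# Hypothetical schema: one row per record, with its zip code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, title TEXT, zip TEXT)")
conn.execute("CREATE INDEX listing_zip ON listing (zip)")  # index makes the join cheap
conn.executemany(
    "INSERT INTO listing (id, title, zip) VALUES (?, ?, ?)",
    [(1, "Lake cabin", "11111"), (2, "Downtown loft", "22222"), (3, "Farmhouse", "99999")],
)


def listings_in_zips(conn, zips):
    """Return ids of listings whose zip is in `zips`, via a temp-table join
    instead of one long OR expression."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_zip (zip TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted_zip")  # reset from any previous call
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_zip (zip) VALUES (?)", [(z,) for z in zips]
    )
    rows = conn.execute(
        "SELECT l.id FROM listing l JOIN wanted_zip w ON l.zip = w.zip ORDER BY l.id"
    )
    return [row[0] for row in rows]


print(listings_in_zips(conn, ["11111", "22222", "33333"]))  # [1, 2]
```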

> don't know what I'm asking for here... An internal ZipCode datatype?...
> Perhaps passing in a latitude, longitude, and radius and having it return
> records that fall within it?  I am probably mentioning two subjects in
> one here.

Yes, if you want to find something within X distance of a place you
would likely want to use another method.  Again, some databases may be
specialized to do this kind of work -- I think PostgreSQL has this
feature.
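One common way to do a radius search outside Swish is to keep a table of zip-code centroids and compute great-circle distances with the haversine formula. Below is a minimal sketch, assuming a hypothetical `centroids` mapping of zip code to (latitude, longitude) in degrees. It does a linear scan. A real database would more likely prefilter with a bounding box or a spatial index.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def zips_within_radius(centroids, lat, lon, radius_miles):
    """Return the zip codes whose centroid lies within radius_miles of (lat, lon).
    `centroids` (zip -> (lat, lon)) is a hypothetical lookup table."""
    return sorted(
        z for z, (zlat, zlon) in centroids.items()
        if haversine_miles(lat, lon, zlat, zlon) <= radius_miles
    )


centroids = {"A": (40.0, -75.0), "B": (40.1, -75.0), "C": (41.0, -75.0)}
print(zips_within_radius(centroids, 40.0, -75.0, 10))  # ['A', 'B']
```

The resulting zip list is what you would then hand to the database join, rather than to a long Swish OR query.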

(BTW -- isn't it amazing how fast maps.google.com can figure out a route
between two distant and complex locations?)


> 
> #3. It would be really nice if it were possible to just output all the
> records without specifying a search of any sort.

Pick a word that isn't in the index and search for NOT that
word; every record will match.
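Here is why the trick works, shown in a toy inverted-index model. This is only an illustration, not Swish-e's actual code, and `search_not` is a hypothetical helper. A NOT query returns every document except the ones listed under the word, so a word with no entry in the index removes nothing.

```python
def search_not(index, all_ids, word):
    """Toy NOT query: every document id minus the ids in `word`'s posting list.
    `index` maps word -> set of document ids."""
    return set(all_ids) - index.get(word, set())


index = {"cabin": {1, 3}, "loft": {2}}
all_ids = {1, 2, 3}
print(sorted(search_not(index, all_ids, "zzqxj")))  # [1, 2, 3]
print(sorted(search_not(index, all_ids, "cabin")))  # [2]
```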


> The reason for this is:
> ease of integration.  It would be very nice to just be able to go to
> Swish-E all the time without first having to decide if it's appropriate to
> find data from the database system or Swish-E and then have to write
> separate queries for each.

Swish is a full-text search engine, not a database.


> Basically what it is:
> 
> I have a PropertyName/MetaName which is "category" and it is of the form
> "A.B.C.D".  Each item is assigned to a category (or multiple categories).
>  What I need to get is a count of the number of records that fall into
> each category.

Basically it's an aggregate function.

It would be interesting to see how much slower it is to do that in
SWISH::API compared to your C version.  In Perl you would save it in a
hash:

    $counts{ $result->property('category') }++;

Which is probably quite fast.
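Here is the same per-category tally as a Python sketch. It assumes the results are already in memory as dicts whose `category` value is a list of "A.B.C.D"-style strings, since a record can sit in several categories. That layout is a hypothetical stand-in for reading each result's `category` property through an API.

```python
from collections import Counter


def count_categories(results):
    """Count records per category; each record counts once per category."""
    counts = Counter()
    for result in results:
        # set() so a record that lists the same category twice is counted once
        counts.update(set(result["category"]))
    return counts


results = [
    {"category": ["A.B.C.D"]},
    {"category": ["A.B.C.D", "A.B.X"]},
    {"category": ["A.B.X", "A.B.X"]},
]
print(dict(sorted(count_categories(results).items())))  # {'A.B.C.D': 2, 'A.B.X': 2}
```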

> This runs only hundredths of a second slower than before I added this
> summarization code.  However, if I implement it using the Perl API, the
> runtime climbs up to about 3 seconds (30 times longer) because the Perl
> script has to process each result and build the summarization.

Does that 3 seconds include the time to start Perl, or just the
counting?  No doubt it's faster in C, but SWISH::API is a rather
thin layer on top of the C code.  I guess I need to try it myself.  Too late
tonight, though.

I'd be curious to see how you were timing this.


> #5. Up until this point, I think, I've been talking mostly about "feature
> requests".  But, all these considerations aside, I am now stuck.  I'm
> hoping someone can help.  It seems I can't find records which have a
> MetaName that is numeric?
> 
> su-2.05a$ /usr/local/bin/swish-e -m 1 -w 'zip=55'
> # SWISH format: 2.4.3
> # Search words: zip=55
> # Removed stopwords: 
> err: No search words specified

Sounds like you have been messing with wordcharacters and left out
digits.

moseley@bumby:~$ swish-e -w 'zip=55'         
# SWISH format: 2.5.4
# Search words: zip=55
# Removed stopwords: 
err: no results
.
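The following toy model shows how the search words can vanish. It is an illustration only, not Swish-e's parser, and `parse_meta_query` is a hypothetical helper. Characters outside the word-character set act as separators, so if digits are left out, a purely numeric value yields an empty word list. That is consistent with the "No search words specified" error above.

```python
import string


def parse_meta_query(query, wordchars):
    """Toy split of 'meta=value' into a metaname and its search words.
    Characters not in `wordchars` are treated as word separators."""
    meta, _, value = query.partition("=")
    words, current = [], []
    for ch in value:
        if ch in wordchars:
            current.append(ch)
        elif current:  # a separator ends the current word
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return meta.strip(), words


print(parse_meta_query("zip=55", string.ascii_letters + string.digits))  # ('zip', ['55'])
print(parse_meta_query("zip=55", string.ascii_letters))                  # ('zip', [])
```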

> In the first case when I specified a number it doesn't seem to get past
> isMetaNameOpNext. I don't think the problem is here, but somewhere the
> code seems to be removing numbers from the search words?  Any idea why or
> what I can do to prevent this?

Use -H9 and see what the "Parsed Words:" and "wordcharacters" lines show.

moseley@bumby:~$ swish-e -w 'zip=55' -H9 | grep Parsed
# Parsed Words: zip = 55





-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Thu Aug 11 00:56:30 2005