Re: Getting Results faster? (libswish-e)

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Dec 15 2004 - 14:47:31 GMT
On Wed, Dec 15, 2004 at 12:43:53AM -0800, Gunnar Mätzler wrote:
> 1. I use a results list, which fills itself by using a callback function. So
> it fetches only the results which have to be displayed at a given time.
> But i am a bit puzzled how to do this. What i do right now is:
> 
> SwishSeekResult( results, number_of_result_to_display ) - to set the results
> pointer to the wanted result
> result = SwishNextResult( results ); - to get the result.
> string = SwishResultPropertyStr( result, "swishdocpath" ); - to get the
> docpath.

Yes, I think that looks correct.  

If you don't display all properties, you might just store the result
and let your output generation code fetch only what it needs, calling
getResultPropValue() on the result and freeing each value afterwards
with freeResultPropValue().  That might save reading properties you
don't need.

> Is this correct? I am not so sure whether i am skipping a result or not. If
> for example i set the results pointer to result number 500 with
> "SwishSeekResult", doesn't "SwishNextResult" give me result number 501?

Yes -- but only because SwishSeekResult() takes a zero-based offset,
while the results are numbered starting from 1.  You are not skipping
a result.

SwishNextResult() actually returns the "current" result as set by
SwishSeekResult() and then moves the pointer to the next result.

    /* Check for a unique index file */
    if (!results->db_results->next)
    {
        if ((res = results->db_results->currentresult))
        {
            /* Increase Pointer */
            results->db_results->currentresult = res->next;
        }
    }
    /* else: a bit more complex when searching multiple indexes */
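
To make the offset arithmetic concrete, here is a minimal toy model of the
"return current, then advance" behaviour for the single-index case.  It is
only a sketch: toy_result, toy_results, toy_seek() and toy_next() are
hypothetical names made up for illustration, not libswish-e types or
functions, and the real code handles multiple indexes as well.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not part of libswish-e. */
typedef struct toy_result {
    int rank;                   /* 1-based number shown to the user */
    struct toy_result *next;
} toy_result;

typedef struct {
    toy_result *head;           /* first result in the linked list */
    toy_result *current;        /* cursor used by toy_next() */
} toy_results;

/* Position the cursor at a zero-based offset.
 * Returns 0 on success, -1 if the offset is out of range. */
int toy_seek(toy_results *rs, int offset)
{
    toy_result *r = rs->head;

    if (offset < 0)
        return -1;
    while (r != NULL && offset > 0) {
        r = r->next;
        offset--;
    }
    if (r == NULL)
        return -1;
    rs->current = r;
    return 0;
}

/* Return the current result, then move the cursor to the next one.
 * Returns NULL once the list is exhausted. */
toy_result *toy_next(toy_results *rs)
{
    toy_result *r = rs->current;

    if (r != NULL)
        rs->current = r->next;
    return r;
}
```

In this model, seeking to offset 500 and calling toy_next() once yields the
node whose rank is 501, and nothing in between is skipped.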


> 2. This method seems a bit slow. I can actually see the list filling itself
> line by line (while hearing a lot of hard disk access). It gets better after
> I scrolled through the complete list a few times. Is there a faster way to get
> a specific result out of the results list? I will definitely have to speed
> it up somehow.

The result list is in memory, and is a linked list.  Accessing it is
fast.  Reading a result's properties is slower since it has to go to
disk.  The first time swish reads a property for a given file, it reads
a table of file offsets into memory that tells swish where that file's
individual props are stored on disk.  Reading subsequent properties
just requires going to disk again for the property data.  All the
properties are stored at the same place on disk, so you should get some
OS buffering; that is probably why the list gets faster after you have
scrolled through it a few times.  Larger properties are compressed to
help reduce the number of times the disk must be read.
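
Since the property read is the slow part, one option on the application
side is to remember each row's string the first time the list callback asks
for it.  The following is a rough sketch under stated assumptions, not
something swish-e provides: row_cache, fetch_fn and demo_fetch() are
hypothetical names, and demo_fetch() merely stands in for whatever
seek/next/property sequence your callback runs today.  The string is copied
because the sketch assumes nothing about how long the fetched pointer stays
valid.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slow path: returns the text for row `index`, or NULL. */
typedef const char *(*fetch_fn)(int index, void *ctx);

typedef struct {
    char **rows;        /* rows[i] stays NULL until row i is fetched */
    int count;
    fetch_fn fetch;
    void *ctx;
} row_cache;

/* Returns 0 on success, -1 on a bad count or allocation failure. */
int row_cache_init(row_cache *c, int count, fetch_fn fetch, void *ctx)
{
    if (count < 0)
        return -1;
    c->rows = calloc((size_t)count, sizeof *c->rows);
    if (c->rows == NULL && count > 0)
        return -1;
    c->count = count;
    c->fetch = fetch;
    c->ctx = ctx;
    return 0;
}

/* Returns the cached string, calling the slow path only on a miss. */
const char *row_cache_get(row_cache *c, int index)
{
    if (index < 0 || index >= c->count)
        return NULL;
    if (c->rows[index] == NULL) {
        const char *s = c->fetch(index, c->ctx);
        size_t n;
        char *copy;

        if (s == NULL)
            return NULL;
        n = strlen(s) + 1;
        copy = malloc(n);
        if (copy == NULL)
            return NULL;
        memcpy(copy, s, n);     /* private copy owned by the cache */
        c->rows[index] = copy;
    }
    return c->rows[index];
}

void row_cache_free(row_cache *c)
{
    int i;

    for (i = 0; i < c->count; i++)
        free(c->rows[i]);
    free(c->rows);
    c->rows = NULL;
    c->count = 0;
}

/* Stand-in for the real lookup; counts its calls through ctx. */
const char *demo_fetch(int index, void *ctx)
{
    static char buf[32];

    ++*(int *)ctx;
    snprintf(buf, sizeof buf, "doc%d.html", index);
    return buf;
}
```

The trade-off is memory: every row that has been displayed keeps a copy of
its string until row_cache_free() is called, so for very large result sets
you may want to cap or evict entries.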

It might be worth profiling your code to see where the time is going.

-- 
Bill Moseley
moseley@hank.org

Received on Wed Dec 15 06:47:39 2004