Re: Grouping and caching results

From: <brad(at)not-real.auroraquanta.com>
Date: Sun Dec 21 2003 - 17:44:08 GMT
I wouldn't expect swish-e to do this, it is something that we do in our
search cgi using a db table for query results.
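
A minimal sketch of that kind of result cache, assuming a single table keyed
by the query string. The table name `query_cache`, the helper names, the
`max_age` setting and the `run_search` callback are all hypothetical; the
message above only says that a db table holds query results.

```python
import json
import sqlite3
import time


def open_cache(path=":memory:"):
    """Create (if needed) a one-table cache keyed by the query string."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS query_cache ("
        " query TEXT PRIMARY KEY,"
        " results TEXT NOT NULL,"
        " stored_at REAL NOT NULL)"
    )
    return db


def cached_search(db, query, run_search, max_age=300.0, now=None):
    """Return results for query, calling run_search only on a miss or stale row."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT results, stored_at FROM query_cache WHERE query = ?", (query,)
    ).fetchone()
    if row is not None and now - row[1] <= max_age:
        return json.loads(row[0])  # fresh cache hit: skip the search entirely

    # Miss or stale entry: run the real search (hypothetical callback) and store it.
    results = run_search(query)
    db.execute(
        "INSERT OR REPLACE INTO query_cache (query, results, stored_at)"
        " VALUES (?, ?, ?)",
        (query, json.dumps(results), now),
    )
    db.commit()
    return results
```

Here `run_search` stands in for whatever actually invokes the search; the
cache only decides whether that call is needed.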

Is there an issue with swish-e search speed that I am not aware of?

I would assume that running your stuff on a cluster of a few hundred
servers would make it very fast also. And since google does that, it is
obviously a good idea ;).

Brad
------------------------------------------------------------
 Brad Miele
 Technology Director
 IPNStock
 (866) 476-7862 x902
 bmiele@ipnstock.com

 Cauliflower is nothing but Cabbage with a College Education.
		-- Mark Twain


On Sun, 21 Dec 2003, John Angel wrote:

> Hi Bill,
>
> I think grouping results by site is a great feature (obviously Google
> wouldn't have it if it weren't) and swish-e should have it.
>
> I understand the problem is in reading the results. So, we come to another
> great idea - caching of results. Caching would drastically improve search
> speed.
>
> BTW, Mnogosearch already has both grouping results and caching.
>
> Regards,
> Ivan
>
> ----- Original Message -----
> From: "Bill Moseley" <moseley@hank.org>
> To: "John Angel" <angel_john@hotmail.com>
> Cc: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
> Sent: Sunday, December 14, 2003 15:29
> Subject: Re: [SWISH-E] Re: Grouping results
>
>
> > On Sun, Dec 14, 2003 at 12:05:05AM -0800, John Angel wrote:
> > > Bill, is this added to official to-do list? :)
> >
> > For internal to swish?  No.
> > For swish.cgi? No, because it works with the swish-e binary -- to fill
> > out the pages correctly (when there's duplicates) you need to be able to
> > continue to read results.
> >
> > >
> > >
> > > ----- Original Message -----
> > > From: "Bill Moseley" <moseley@hank.org>
> > > To: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
> > > Sent: Monday, December 01, 2003 23:37
> > > Subject: [SWISH-E] Re: Grouping results
> > >
> > >
> > > > On Mon, Dec 01, 2003 at 01:47:15PM -0800, John Angel wrote:
> > > > > That way there will be less than 10 results per page.
> > > > >
> > > > > > E.g. what if all 10 results on page are from the same site, there
> > > > > > will be only 2 results displayed?
> > > >
> > > > > Well, that's what I meant when I said you would need to do some post
> > > > > processing.  So instead of saying pages start at 0, 10, 20,... you
> > > > > would have to track better and just offer previous and next.
> > > >
> > > > So on the first page you fetch enough results to make a complete page.
> > > > Then look ahead for the first record on the "next" page and then pass
> > > > that as the starting location in your links (to the next page).
> > > > > "Previous Page" would also need to be tracked in links because you
> > > > > can't just subtract 20 from the current location.
> > > >
> > > > > Regardless, you would want to use the API so you can easily scan
> > > > > through all the results.
> > > >
> > > > BTW -- the result list that swish maintains doesn't have backwards
> > > > links, IIRC.  SwishSeek() just starts at the beginning of the linked
> > > > list and walks (runs?) the list looking for the requested entry.  When
> > > > > searching multiple indexes (and sorting by path), swish has to read all
> > > > > the pathnames off disk when sorting. So, in other words, you
> > > > > may want to avoid seeking too many times.
> > > >
> > > >
> > > > >
> > > > >
> > > > > >From: Bill Moseley <moseley@hank.org>
> > > > > >Reply-To: moseley@hank.org
> > > > > >To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
> > > > > >Subject: [SWISH-E] Re: Grouping results
> > > > > >Date: Tue, 25 Nov 2003 13:28:52 -0800 (PST)
> > > > > >
> > > > > >On Tue, Nov 25, 2003 at 01:24:41PM -0800, Bill Moseley wrote:
> > > > > > > On Sun, Nov 23, 2003 at 12:45:23PM -0800, John Angel wrote:
> > > > > > > > > Is it possible to group results by site like on Google (to
> > > > > > > > > display only 2 hits from the same site, not all of them)?
> > > > > > >
> > > > > > > Did I already respond to this?
> > > > > > >
> > > > > > > > You would have to post-process;  Need to think about what to do
> > > > > > > > if showing a page of results at a time -- you might come up short.
> > > > > > >
> > > > > > > Fake code:
> > > > > > >
> > > > > > > my %seen;
> > > > > > > while ( my $result = next_result() ) {
> > > > > > >     my $uri = URI->new( $result->swishdocpath );
> > > > > > >     next if $seen{ $uri->host }++ == 2;
> > > > > >
> > > > > >I assume you want something more like >= 2.
> > > > > >
> > > > > >
> > > > > > >     show_result( $result );
> > > > > > > }
> > > > > > >
> > > > > > > --
> > > > > > > Bill Moseley
> > > > > > > moseley@hank.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >--
> > > > > >Bill Moseley
> > > > > >moseley@hank.org
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Bill Moseley
> > > > moseley@hank.org
> > > >
> > > >
> > >
> >
> > --
> > Bill Moseley
> > moseley@hank.org
> >
> >
>
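
The post-processing described in the quoted thread (cap the hits per host,
fill a complete page, then look ahead for the first record of the "next"
page) could be sketched roughly as below. This is only an illustration under
stated assumptions: `results` is a plain list of document URLs already in
rank order, the per-host cap applies across the whole result list as in the
fake code, and the function name and parameters are hypothetical rather than
part of swish-e or its API.

```python
from urllib.parse import urlsplit


def grouped_page(results, start=0, page_size=10, per_host=2):
    """Return (page, next_start) for one page of host-limited results.

    start is a position in the raw result list; next_start is the raw
    position of the first record that belongs on the following page, or
    None when there is no further page.
    """
    seen = {}          # host -> number of hits encountered so far
    page = []
    next_start = None
    # Always scan from the top so the per-host counts are the same on
    # every page, whatever start position the link carried.
    for pos, path in enumerate(results):
        host = urlsplit(path).hostname or ""
        seen[host] = seen.get(host, 0) + 1
        if seen[host] > per_host:
            continue   # same effect as the ">= 2" correction in the thread
        if pos < start:
            continue   # counted above, but it was shown on an earlier page
        if len(page) == page_size:
            next_start = pos   # look-ahead hit: where the next page begins
            break
        page.append(path)
    return page, next_start
```

A "Next" link would carry `next_start` as its starting location. A
"Previous" link cannot be computed by subtraction, so the start positions of
earlier pages would have to be carried along in the links as well.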
Received on Sun Dec 21 17:44:16 2003