Skip to main content.
home | support | download

Back to List Archive

Applicability of Swish-E... Thoughts?

From: <mailinglists(at)not-real.net-virtual.com>
Date: Wed Jul 06 2005 - 21:59:46 GMT
Hello,

I am wondering if Swish-E would be well-suited for use for my application,
any "gotchas" I might run into, and possible solutions to some things I
already know about.

I currently am running a web site fed entirely from Postgres.  That has
performed fine but as the database begins to grow close to about 50,000
records performance of one area is a growing concern (I'll explain this in
a moment).  Two important components of this are:

#1. Using the ltree module for handling data in "categories" (more about
this later).
#2. Using the tsearch module for full-text searching/indexing.

I'm quite sure that Swish-E can accomodate #2 with no difficulty.

As for #1....  All these rows fall into a heirachal category "a.b.c.d",
"a.b.c.e", it is not possible for items to be in a sub-category ("a.b" for
example).  One of the important things is to be able to get the
information, based on the result of the query, as to how many items fall
into each category.  I think it is too length of a process to attempt to
summarize this in the script because it involves iterating through each
one of the results, even for categories which are not currently selected
for viewing.

The result I currently get from the database is something like:

category|count
--------|-----
a       |22
a.b     |10
a.b.a   |2
a.b.a.a |1
a.b.a.b |1
a.b.b   |8
a.b.b.a |6
a.b.b.b |2
a.c     |12
a.c.a   |4
a.c.a.a |2
a.c.a.b |2
a.c.b   |8
a.c.b.a |6
a.c.b.b |2

. This involves a join between the data table and the category table,
doing a group by on that result.  Although it probably doesn't apply much
to this discussion, the actual query looks something like:

SELECT
  count(*) as count,
  category.category,
  nlevel(category.category) AS level,
  subpath(category.category,0,nlevel(category.category)-1) as parent,
  category.head_title,
  category.cat_title,
  category.subcat_title
FROM
  data, category
WHERE
  data.category <@ category.category
GROUP BY
  category.category,
  category.head_title,
  category.cat_title,
  category.subcat_title

.. Using this, Postgres will summarize the counts for all the levels
above 4.  I hope this makes sense.  I tried just selecting the category
from the data table and processing them in my script, but this was slower
then letting Postgres do it.  This is the fundamental area where I need to
get a performance boost.  Can Swish-E help me out here?

As a second (somewhat related) matter, I understand that it is not
possible to add a "record/file" to the index.  The suggested method seems
to be to create an incremental index and search that as well.  I can
imagine a solution where I might put each category into a separate index
(there are 202 categories) and this would seem to imply I would need to
create (possibly) up to 202 incremental files.  How severe, if at all,
would the performance be of searching 202 medium sized indexes and up to
202 very small indexes as opposed to one large index?

As a third matter, throughout the day records are deleted from the
database.  I need these records to come out of the index ASAP.   I am okay
in the user-interface just saying "This record no longer exists" until a
full-index is built, but I'm wondering if there is some way to tell
Swish-E to exclude certain records from the results it gives me?  Would it
be as simple as maintaining a table of "deleted ids" and in adding
something to the query string tell Swish-E to not include records which
contain those IDs?

Thanks!


- Greg
Received on Wed Jul 6 14:59:51 2005