Hi,
we use swish-e for basic website indexing at ZyNet. It works - we are happy!
But since we don't use it for much that is clever, I've only a minimal grasp
of what it can do.
I'm looking at a project that needs a search-engine-like component that will
extract and store metadata from pages (Lastmodified, Metanames, properties
etc). The basic input will be a URI with some associated custom metadata that
doesn't appear in the URI contents (although I keep suggesting this is a
really good place to put it!). A key use of this will be generation of
HTML/XML from the search results, including RSS feeds, but I have to build a
lot of different types of queries.
However we plan to do a fair bit with searching metanames/properties, and we
also want prompt insertion and deletion(!) of records. For deletion I don't
think rebuilding will be acceptable (but hey the plans are fairly fluid
still).
I assume that deleting (or disabling) records isn't supported yet? I couldn't
see it in the docs.
After some thinking about what we needed, and some Googling, I decided I was
specifying something that looks a lot like what swish-e does already with
some extra bits (although I'm not sure precisely how much swish-e can do).
I guess part of it is I'm familiar with SQL relational databases, and have a
vague feel for what I can force into an index, and a good idea of what I can
query. But I don't have that "comfort" or knowledge of swish-e although
(superficially at least) it looks a lot closer to what I want to achieve
(indeed it may do all of it bar a little configuring and a couple of scripts,
and with lots of Perl bits which is a plus point for us).
Perhaps if I saw a few more complex examples of swish-e searches, rather than
me feebly struggling to figure out the syntax for "-s swishlastmodified
desc", I might feel more confident.
Of course the killer is I know that the SQL databases we use will do whatever
I ask of them, including cascading deletes. I've no idea how efficiently
they will do it, but I know that I can get them to do the task. But I've not
used the swish properties and metanames so I'm not sure whether something is
possible.
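To be concrete about the cascading delete, here is a minimal sketch of the
SQL side. It assumes SQLite (via Python's sqlite3 module) purely so it runs
anywhere; we would more likely use Postgres, where ON DELETE CASCADE is
spelled the same way. The table and column names (documents, properties) are
made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked; Postgres always enforces it.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one row per indexed URI, plus any number of properties.
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL)"
)
conn.execute("""
    CREATE TABLE properties (
        document_id INTEGER NOT NULL
            REFERENCES documents(id) ON DELETE CASCADE,
        name  TEXT NOT NULL,
        value TEXT
    )
""")

conn.execute(
    "INSERT INTO documents (id, uri) VALUES (1, 'http://www.example.com/a.html')"
)
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [(1, "author", "Joe Bloggs"), (1, "lastmodified", "2007-02-07")],
)

# Deleting the document row also removes its property rows.
conn.execute("DELETE FROM documents WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
```

This says nothing about how efficient it is, only that one DELETE is enough
to make the record and its metadata disappear promptly.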
I guess I can always store some additional metadata in a database (Postgres
probably) alongside the list of URIs to index, if my needs for additional properties
get too great, but I'd really prefer it if my queries only included one type
of system at a time! Has anyone gone with swish-e and ended up finding they had
to do extra stuff in another database? Or am I unduly pessimistic on this
point?
I'm thinking of queries like:
URI starts with 'http://www.example.com/somesubsystem/', and author is 'Joe
Bloggs', the most recent N sorted by date, and some other property isn't set,
and maybe after all that require a keyword.
Or is that madness, and should I run back to an SQL database straight away?
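For comparison, this is roughly how I would write that example query on the
SQL side. It is only a sketch under assumptions: SQLite through Python's
sqlite3 module stands in for whatever database we would really use, the
pages table and its columns are invented, "category" plays the part of "some
other property", and the keyword test is a crude substring match rather than
real full-text search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pages (
        uri           TEXT PRIMARY KEY,
        author        TEXT,
        last_modified TEXT,  -- ISO 8601 dates, so string order is date order
        category      TEXT,  -- hypothetical "other property"; NULL when unset
        body          TEXT
    )
""")
base = "http://www.example.com/"
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?, ?)",
    [
        (base + "somesubsystem/a.html", "Joe Bloggs", "2007-02-01", None, "indexing notes"),
        (base + "somesubsystem/b.html", "Joe Bloggs", "2007-02-05", None, "more indexing notes"),
        (base + "somesubsystem/c.html", "Joe Bloggs", "2007-02-06", "draft", "indexing draft"),
        (base + "other/d.html", "Joe Bloggs", "2007-02-07", None, "indexing elsewhere"),
        (base + "somesubsystem/e.html", "Jane Doe", "2007-02-07", None, "indexing by Jane"),
    ],
)


def recent_pages(conn, prefix, author, keyword, limit):
    """Return the URIs of the most recent `limit` pages matching every condition."""
    sql = """
        SELECT uri FROM pages
        WHERE substr(uri, 1, length(:prefix)) = :prefix  -- URI starts with prefix
          AND author = :author
          AND category IS NULL                           -- other property not set
          AND body LIKE '%' || :keyword || '%'           -- crude keyword match
        ORDER BY last_modified DESC
        LIMIT :limit
    """
    params = {"prefix": prefix, "author": author, "keyword": keyword, "limit": limit}
    return [row[0] for row in conn.execute(sql, params)]


results = recent_pages(conn, base + "somesubsystem/", "Joe Bloggs", "indexing", 2)
```

With the sample rows above, only a.html and b.html satisfy every condition,
and b.html comes first because it is the more recent of the two.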
Simon
Received on Wed Feb 7 12:49:17 2007