I'd like to glean metadata from the documents I'm indexing. The
documents have a predictable format:
...
<BODY BGCOLOR="#ffffff">
<H1>[list-name] title of message</H1>
<B>name of message author</B>
<A HREF="..."
TITLE="...">username at email.host
</A><BR>
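A minimal sketch, in Python, of pulling the three fields out of a document with the layout above. It assumes every header looks exactly as shown: a bracketed list name, the title inside `<H1>`, the author inside `<B>`, and the address as the link text. The names `msgtitle`, `authorname` and `authoremail` are the metanames from this message. `HEADER_RE` and `extract_metadata` are hypothetical helpers, not part of swish-e.

```python
import re

# Matches the header block described above. The bracketed list name is
# skipped. The title, the author name and the link text holding the
# (obfuscated) address are captured. DOTALL lets .*? span line breaks.
HEADER_RE = re.compile(
    r"<H1>\[[^\]]*\]\s*(?P<msgtitle>.*?)</H1>\s*"
    r"<B>(?P<authorname>.*?)</B>\s*"
    r"<A\s[^>]*>\s*(?P<authoremail>.*?)\s*</A>",
    re.IGNORECASE | re.DOTALL,
)


def extract_metadata(html: str) -> dict:
    """Return msgtitle/authorname/authoremail, or {} if the header is absent."""
    m = HEADER_RE.search(html)
    if m is None:
        return {}
    return {k: v.strip() for k, v in m.groupdict().items()}


sample = """<BODY BGCOLOR="#ffffff">
<H1>[list-name] title of message</H1>
<B>name of message author</B>
<A HREF="..."
TITLE="...">username at email.host
</A><BR>"""

print(extract_metadata(sample))
# → {'msgtitle': 'title of message', 'authorname': 'name of message author', 'authoremail': 'username at email.host'}
```

Non-greedy groups (`.*?`) keep each capture from running past its closing tag when more markup follows. A single regex is fragile if the archive's layout ever changes.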
...
I'd like to be able to search these documents with "swish-e -w
authorname=foo" or "swish-e -w authoremail=bar".
At what point during indexing could I hook in to extract these fields
so that I can search on them? Can I, for example, add a directive
somewhere saying:
# $html holds the document text (hypothetical variable)
@metanames{qw( msgtitle authorname )} =
    $html =~ m{<H1>\[[^\]]*\]\s*(.*?)</H1>\s*<B>(.*?)</B>}s;
or something like that?
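One possible answer to the "at what point" question, sketched under assumptions. Feed the documents to swish-e through an external program using its `-S prog` input method. That program copies the fields into `<META NAME=... CONTENT=...>` tags, and the config declares `MetaNames msgtitle authorname authoremail` so swish-e indexes those tags as searchable metanames. The script, `add_meta_tags` and `prog_record` are hypothetical. The `Path-Name:` / `Content-Length:` record header follows our understanding of swish-e's prog input format and should be checked against its documentation. The same applies to how arguments reach the program: here they are assumed to arrive as command-line file paths, for example via `SwishProgParameters`. The HTML parser is also assumed to be selected in the config, for example with `DefaultContents HTML*`.

```python
import html
import re
import sys

# Same header pattern as for plain extraction: skip the bracketed list
# name, capture title, author name and the address link text.
HEADER_RE = re.compile(
    r"<H1>\[[^\]]*\]\s*(?P<msgtitle>.*?)</H1>\s*"
    r"<B>(?P<authorname>.*?)</B>\s*"
    r"<A\s[^>]*>\s*(?P<authoremail>.*?)\s*</A>",
    re.IGNORECASE | re.DOTALL,
)


def add_meta_tags(doc: str) -> str:
    """Copy the header fields into <META> tags so a metaname-aware indexer sees them."""
    m = HEADER_RE.search(doc)
    if m is None:
        return doc  # leave documents without the expected header untouched
    metas = "".join(
        f'<META NAME="{name}" CONTENT="{html.escape(value.strip())}">\n'
        for name, value in m.groupdict().items()
    )
    # Insert before </HEAD> if present (it precedes <BODY>), else before
    # the first <BODY tag, else at the very start of the document.
    anchor = re.search(r"</HEAD>|<BODY\b", doc, re.IGNORECASE)
    pos = anchor.start() if anchor else 0
    return doc[:pos] + metas + doc[pos:]


def prog_record(path: str, doc: str) -> bytes:
    """Wrap one document in an (assumed) swish-e prog-method header."""
    content = doc.encode("utf-8")
    # Content-Length counts bytes of the encoded body, not characters.
    header = f"Path-Name: {path}\nContent-Length: {len(content)}\n\n"
    return header.encode("utf-8") + content


if __name__ == "__main__":
    # Each command-line argument is a file to rewrite and emit on stdout.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            sys.stdout.buffer.write(prog_record(path, add_meta_tags(fh.read())))
```

If swish-e does index those tags as assumed, a search like `swish-e -w authorname=foo` should then match messages whose injected author tag contains "foo". Injecting `<META>` tags keeps swish-e's normal HTML parsing intact, so the body text is indexed as before.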
Ben
--
"Don't get suckered in by the comments;
they can be terribly misleading.
Debug only code." -- Dave Storer
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Fri Sep 7 17:30:59 2007