Hi there,
I've used Swish-e in the past for indexing XML content. I used the -S prog
method along with a Python script to extract dynamic HTML documents from a
large XML collection and it all worked well.
I'm now trying to index a collection of XML documents where the hit
granularity is the same as the file dispersion, i.e. one XML document per
hit. Therefore, I am attempting to use -S fs rather than -S prog. However,
I'm having problems describing the content of my XML documents to
Swish-e.
My XML documents are quite simple and follow this format:
<section id="facilities" name="Facilities">
<!-- content -->
</section>
or
<subsection id="studios" name="Studios" section-id="facilities"
section-name="Facilities">
<!-- content -->
</subsection>
So far I have the following Swish-e configuration file:
IndexFile site.index
IndexDir .
IndexOnly .xml
IndexContents XML* .xml
# exclude the "index.xml" file
FileRules filename is index\.xml
# attempting to index the attribute values
UndefinedXMLAttributes index
# alter the path names to remove the leading "." and remove
# the trailing ".xml"
ReplaceRules remove \\.xml
ReplaceRules remove \\.
What I want to be able to do is use the @name attribute as the "swishtitle"
property, but I can't work out how to do this.
(I know I could do it using the -S prog method and transforming the XML
documents into HTML on-the-fly.)
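For reference, a rough sketch of that -S prog fallback in Python. It takes
the @name attribute of the root element and emits it as the HTML <title>,
which the HTML parser can then pick up as the title. The function name
to_prog_record and the flattening of the content to plain text are
assumptions made for illustration; the Path-Name, Content-Length and
Document-Type headers are the ones described for -S prog in the Swish-e
documentation.

```python
import html
import xml.etree.ElementTree as ET


def to_prog_record(path, xml_text):
    """Build one -S prog record (headers + HTML) for a single XML document.

    Hypothetical helper: the name and the text flattening are illustrative.
    """
    root = ET.fromstring(xml_text)

    # The @name attribute of <section>/<subsection> becomes the title.
    title = root.get("name", "")

    # Flatten all text content and collapse runs of whitespace.
    body_text = " ".join("".join(root.itertext()).split())

    page = (
        "<html><head><title>{}</title></head>"
        "<body>{}</body></html>\n"
    ).format(html.escape(title), html.escape(body_text))
    payload = page.encode("utf-8")

    # Content-Length must be the byte length of the document that follows.
    header = (
        "Path-Name: {}\n"
        "Content-Length: {}\n"
        "Document-Type: HTML*\n"
        "\n"
    ).format(path, len(payload))
    return header.encode("utf-8") + payload
```

A driver script would call this once per file and write each record to
standard output (as bytes) for Swish-e to read.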
There are also some other things I can't work out (maybe they could be added
to the FAQ?):
*) How do I query an index to find its available properties?
*) What are the names of Swish-e's default properties? (I know these are in
the documentation somewhere but they're difficult to find.)
*) How do I assign an XML attribute to a property? And what if I want it to
have a different name?
Any help with this would be great!
Cheers,
Richard
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard Lewis
Sonic Arts Research Archive
http://www.sara.uea.ac.uk/
JID: ironchicken@jabber.earth.li
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Received on Wed Sep 6 06:06:58 2006