StoreDescription for XML, indexing PowerPoint files

From: Andrew Smith <asmith(at)>
Date: Thu Jan 23 2003 - 23:03:34 GMT
To use StoreDescription with XML, you have to name a tag in the XML from 
which the description text is extracted, just as you do for HTML. That 
makes sense for HTML, which is a single standard where you can use e.g. 
<body> as the StoreDescription tag. It makes less sense for XML, which is 
extensible: you define your own tags and format, so the files being 
indexed may include many different kinds of XML documents with no single 
tag in common that could serve as the StoreDescription tag. It seems 
StoreDescription should be changed for XML files to either store the 
entire file (up to some number of characters, as is done for TXT 
descriptions) or accept multiple tags. Is there any way around this in 
the current Swish-e, so that the entire contents of an XML file are stored 
as its description?
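
To make the "entire file, up to some number of characters" idea concrete, 
here is a minimal Python sketch that is independent of Swish-e. It pulls 
the text out of every element regardless of tag name and truncates it. The 
function name xml_description and the 200-character default are 
placeholders chosen for illustration, and how the result would be handed 
to the indexer is left open.

```python
import xml.etree.ElementTree as ET


def xml_description(xml_text, max_chars=200):
    """Return up to max_chars of the text content of an XML document.

    Hypothetical helper: it ignores tag names entirely, so it works for
    any well-formed XML, whatever vocabulary the file uses.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields the text inside every element, in document order.
    words = " ".join(root.itertext()).split()
    # Collapse runs of whitespace, then cut to the requested length.
    return " ".join(words)[:max_chars]


sample = "<recipe><title>Pancakes</title><step>Mix flour and milk.</step></recipe>"
print(xml_description(sample, 20))
```

Because the tag names are never inspected, the same function applies to 
every kind of XML file in a collection, which is the property a single 
StoreDescription tag cannot provide.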

Finally, this has probably been asked before, but is there a Linux filter 
for converting MS PowerPoint files to text for indexing (i.e. something 
like pdftotext for PDF)? I haven't been able to find a good free one, and 
was thinking of just using the "strings" command to extract the printable 
strings from each file, but would like to know if there is anything better.
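
For reference, the "strings" fallback amounts to roughly the following 
Python sketch: scan the raw bytes and keep runs of printable ASCII above a 
minimum length. The name printable_strings and the minimum length of 4 are 
assumptions made for illustration, and the sample bytes are made up, not 
taken from a real PowerPoint file. Any text stored in a 16-bit encoding 
would be missed by this pattern.

```python
import re


def printable_strings(data, min_len=4):
    """Yield runs of at least min_len printable ASCII bytes from data.

    A rough imitation of the `strings` command, not a PowerPoint parser.
    """
    # Bytes 0x20 through 0x7e are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Made-up binary blob with two readable runs and one run that is too short.
data = b"\x00\x01Title slide\x00\xffab\x00Second bullet point\x02"
for text in printable_strings(data):
    print(text)
```

This approach recovers readable fragments but has no notion of document 
structure, so formatting data and other noise can end up in the index 
alongside the slide text.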


Andrew Smith
