Re: XML

From: Roy Tennant <roy.tennant(at)not-real.ucop.edu>
Date: Fri Nov 17 2000 - 16:48:10 GMT

Jose,
Thanks, but I'm not sure I completely understand. By "document" do 
you mean a particular file segment, say between a beginning and 
ending XML tag? If so, then I think I get how that could solve my 
problem. For any given SWISH-E hit, the result would note the
beginning and length of the desired file segment, which could then be
extracted from the file in question and displayed. If that's true,
then that's exactly the kind of function I want, and having it 
written in C would make it fast enough to make SWISH-E my software of 
choice for indexing my XML file store. It would also place SWISH-E at 
the head of the pack in terms of indexing a rich set of content, from 
PDFs (with file filters) to XML. Any chance of this happening with a 
near-term release? (he asks, knowing full well that no one is getting 
paid to do this...). Thanks,
Roy

>Hi Roy,
>
>On 16 Nov 2000, at 12:42, Roy Tennant wrote:
>
>>  To me, the problem with SWISH-E and XML is not the searching, but the
>>  results. What you would get back is that a given file matches your
>>  search, *not* each XML segment that matches and the URL of the file
>>  from which it was extracted (which is more like what I want). So
>>  that's why I'm looking at other things to search XML content (like
>>  sgrep) rather than use SWISH-E. To make SWISH-E really work the way I
>>  want it to, there would need to be a module that could extract
>>  relevant segments from files that match. Roy
>>
>>
>
>Well, this problem can be solved using a workaround. In 2.1.x I
>introduced a new value in the result list, visible only in the extended
>info output (option -x): the offset of the document inside the file (one
>file can contain several documents). For now this value is always 0 and
>the size is the total length of the file. But in the future these values
>could differ, to delimit a document inside the file (offset + length).
>
>So, reading only the bytes from the file that start at the offset and
>run for the given length will give you just the document you need.
>Unfortunately, this cannot be applied to filtered documents (Rainer's
>Filter option).
>
>cu
>Jose
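
For illustration, the offset + length idea described above could be sketched roughly as follows. This is only a sketch under assumptions: it presumes a search hit eventually reports a byte offset and a length for the matching document, which Jose notes is not yet the case (the offset is currently always 0 and the size is the whole file). The helper name read_segment and the sample XML are hypothetical, and nothing here parses real SWISH-E output.

```python
import os
import tempfile


def read_segment(path, offset, length):
    """Return `length` bytes of the file at `path`, starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)          # jump to the start of the document
        return f.read(length)   # read only that document's bytes


# Hypothetical file holding several "documents" (assumption, for demo only).
xml = b"<docs><doc>first</doc><doc>second</doc></docs>"
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(xml)
    demo_path = tmp.name

# Stand-ins for the offset and length a search hit might report.
offset = xml.index(b"<doc>second")
length = len(b"<doc>second</doc>")

segment = read_segment(demo_path, offset, length)
os.remove(demo_path)
print(segment.decode("ascii"))  # prints: <doc>second</doc>
```

Opening the file in binary mode keeps the offset and length in bytes, which matches how Jose describes them. As he points out, this would not work for filtered documents, since the bytes on disk are not the text that was indexed.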
Received on Fri Nov 17 16:49:49 2000