-----Original Message-----
From: jmruiz@boe.es [mailto:jmruiz@boe.es]
Sent: Wednesday, November 15, 2000 11:36 AM
To: Rainer.Scherg@rexroth.de
Cc: swish-e@sunsite.berkeley.edu
Subject: Re: [SWISH-E] RE: New alpha version swish-e-2.1.4
Hi, Rainer
On 15 Nov 2000, at 1:00, Rainer.Scherg@rexroth.de wrote:
>> I have to look on the code, but - as a first guess - the
>> descr. field can be stored along with the path data. So
>> we only need a routine to gather and store the descr. info.
>>
>
>Sure, it may not be difficult to add.
>
>
>So, a document may contain both title and description, right?
>I would also like the possibility that the description can be a field
>(Metaname). What about:
>
>StoreDescription <field>|size
>
>For a field (the field may be enclosed by <>). Eg:
>
>StoreDescription <myfield>
>Just for size. Eg:
>StoreDescription 400
>
>What do you think?
IMO there are standard ways to provide a description (HTML):
- META DESCRIPTION - Field (as used by e.g. Microsoft)
- First xxx Characters of the text/content stream...
For the first step, we should handle this.
Of course it would be better to specify the description fields
- especially, when using XML.
But I see some collisions when you want to index different doc types.
Following your thoughts, we might have something like:
StoreDescription 200
# Standard is TEXT_STREAM
StoreDescriptionType HTML <META http-equiv="DESCRIPTION">|TEXT_STREAM
StoreDescriptionType TXT TEXT_STREAM
StoreDescriptionType XML <description>
Another option: when a specific tag is not found, TEXT_STREAM is used as the fallback.
A separate size parameter for each tag would (IMO) hardly ever be used.
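A rough sketch in C of the fallback idea, just to make it concrete. This is not swish-e code: the function name pick_description and its parameters are made up for illustration, and it assumes the parser has already extracted the tag value (if any) and the plain text stream as C strings.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of swish-e).
 * Copies at most max_len characters of a description into out:
 *   - uses tag_value (e.g. the META description) if it is present and non-empty,
 *   - otherwise falls back to the beginning of text_stream.
 * The result is always NUL-terminated and never exceeds out_size - 1 chars.
 * Returns the number of characters stored.
 */
size_t pick_description(const char *tag_value, const char *text_stream,
                        size_t max_len, char *out, size_t out_size)
{
    const char *src;
    size_t n;

    if (out == NULL || out_size == 0)
        return 0;

    /* Prefer the tagged description, fall back to the text stream. */
    src = (tag_value != NULL && tag_value[0] != '\0') ? tag_value : text_stream;
    if (src == NULL)
        src = "";

    n = strlen(src);
    if (n > max_len)
        n = max_len;          /* size limit from the config, e.g. 200 */
    if (n > out_size - 1)
        n = out_size - 1;     /* never overflow the caller's buffer */

    memcpy(out, src, n);
    out[n] = '\0';
    return n;
}
```

With a limit of 200 from "StoreDescription 200", a call like pick_description(meta, body, 200, buf, sizeof buf) would store the META text when there is one and the first 200 characters of the body otherwise.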
>> Since filtering has been implemented I got the still existing
>> problem, that swish-e cannot retrieve the file size.
>> This is because filtering is implemented as PIPE stream.
>>
>
>Well, in fact that was the filter bug of 2.1.4: it read the size and
>then, if 0, did nothing (I applied a quick patch to read_stream
>function to avoid this situation).
Yep, I also guessed that vsize() might be the problem... ;-)
>> This would fix the bug, but brings a small performance penalty
>> due to the extra request for file information. This routine
>> could also be used e.g. to retrieve and store last modification
>> dates, etc.
>>
>> Any opinions?
>>
>I totally agree with you. There must be an additional parameter to
>countwordsXXX-routines (the size of the data). The modification is
>simple for normal files but needs some extra overhead for filtered
>files.
>
>I can modify it for the next release.
We should pass a structure to the routines instead. This will prevent
further modification of the subroutine interfaces and is more flexible.
#include <stdio.h>   /* FILE */
#include <time.h>    /* time_t */

struct FileInfo {
    FILE   *f;                       /* may also be a filter stream or tmpfile */
    char   *filepath_url;            /* path/URL of the indexed file */
    long    fsize;                   /* size of the original file */
    time_t  last_modification_time;  /* time of last modification of the original file */
};
This may involve some changes to the code...
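To illustrate the "extra request for file information", here is a minimal sketch of how such a structure could be filled before the file (or filter pipe) is opened. It assumes a POSIX stat() is available. The function name fileinfo_from_path is hypothetical, and filepath_url is declared const here only to keep the example self-contained.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

struct FileInfo {
    FILE       *f;                       /* may also be a filter stream or tmpfile */
    const char *filepath_url;            /* path/URL of the indexed file */
    long        fsize;                   /* size of the original file */
    time_t      last_modification_time;  /* time of last modification */
};

/*
 * Hypothetical helper (not part of swish-e).
 * Fills fi with size and modification time of the ORIGINAL file,
 * independent of any filter that is later attached to fi->f.
 * Returns 0 on success, -1 if the file cannot be stat'ed.
 */
int fileinfo_from_path(struct FileInfo *fi, const char *path)
{
    struct stat st;

    if (fi == NULL || path == NULL || stat(path, &st) != 0)
        return -1;

    fi->f = NULL;                        /* the stream is opened later by the caller */
    fi->filepath_url = path;
    fi->fsize = (long) st.st_size;
    fi->last_modification_time = st.st_mtime;
    return 0;
}
```

The caller would then set fi.f to either the plain file or the filter pipe and pass the whole structure to the indexing routines, so the size no longer has to be read from the stream itself.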
To prevent double work: what are you working on at the moment?
cu - rainer
Received on Wed Nov 15 12:00:43 2000