RE: New alpha version swish-e-2.1.4

From: <Rainer.Scherg(at)not-real.rexroth.de>
Date: Wed Nov 15 2000 - 11:59:12 GMT
-----Original Message-----
From: jmruiz@boe.es [mailto:jmruiz@boe.es]
Sent: Wednesday, November 15, 2000 11:36 AM
To: Rainer.Scherg@rexroth.de
Cc: swish-e@sunsite.berkeley.edu
Subject: Re: [SWISH-E] RE: New alpha version swish-e-2.1.4



Hi, Rainer

On 15 Nov 2000, at 1:00, Rainer.Scherg@rexroth.de wrote:


>> I have to look at the code, but - as a first guess - the 
>> descr. field can be stored along with the path data. So
>> we only need a routine to gather and store the descr. info.
>> 
>
>Sure, it may not be difficult to add. 
>
>
>So, a document may contain both title and description, right?
>I would also like the possibility that the description can be a field 
>(Metaname). What about:
>
>StoreDescription <field>|size
>
>For a field (the field may be enclosed by <>). Eg:
>
>StoreDescription <myfield>
>Just for size. Eg:
>StoreDescription 400
>
>What do you think?


IMO there are standard ways to provide a description (HTML):

 - META DESCRIPTION - Field (as used by e.g. Microsoft)
 - First xxx Characters of the text/content stream...

For the first step, we should handle this.
Of course it would be better to specify the description fields
- especially, when using XML.

But I see some collisions when you want to index different doc types.
Following your thoughts, we might have something like:

  StoreDescription       200
  # Standard is TEXT_STREAM
  StoreDescriptionType   HTML  <META http-equiv="DESCRIPTION">|TEXT_STREAM
  StoreDescriptionType   TXT   TEXT_STREAM
  StoreDescriptionType   XML   <description>

Another option: when a specific TAG is not found, TEXT_STREAM is used
as a fallback. A separate size parameter for each TAG would (IMO)
hardly be used...
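
The fallback idea above could be sketched roughly like this. It is only an
illustration: pick_description() and its parameters are hypothetical names,
not existing swish-e code, and extracting the tag value from the document is
assumed to happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of swish-e).
 * tag_value:   content of the configured description tag, or NULL/empty
 *              when the tag was not found in the document
 * text_stream: plain text of the document, used as the fallback
 * max_len:     the StoreDescription size limit
 * out, cap:    destination buffer and its capacity
 * Returns the number of characters stored (out is always NUL-terminated). */
size_t pick_description(const char *tag_value, const char *text_stream,
                        size_t max_len, char *out, size_t cap)
{
    const char *src;
    size_t n;

    assert(out != NULL && cap > 0);

    /* Prefer the tag; fall back to the text stream if it is missing. */
    src = (tag_value != NULL && tag_value[0] != '\0') ? tag_value
                                                      : text_stream;
    if (src == NULL)
        src = "";

    /* Truncate to the configured size and to the buffer capacity. */
    n = strlen(src);
    if (n > max_len)
        n = max_len;
    if (n > cap - 1)
        n = cap - 1;

    memcpy(out, src, n);
    out[n] = '\0';
    return n;
}
```

With one global size limit, the same call works for HTML, TXT and XML; only
the way tag_value is obtained would differ per document type.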



 
>>   Since filtering has been implemented I got the still existing
>>   problem, that swish-e cannot retrieve the file size.
>>   This is because filtering is implemented as PIPE stream.
>> 
>
>Well, in fact that was the filter bug of 2.1.4: it read the size and
>then, if 0, did nothing (I applied a quick patch to read_stream 
>function to avoid this situation).


Yep, I also guessed that vsize() might be the problem... ;-)


>>   This would fix the bug, but brings a small performance penalty
>>   due to the extra request for file information. This routine
>>   could also be used e.g. to retrieve and store last modification
>>   dates, etc.
>> 
>>  Any opinions?
>> 
>I totally agree with you. There must be an additional parameter to
>countwordsXXX-routines (the size of the data). The modification is 
>simple for normal files but needs some extra overhead for filtered 
>files.
>
>I can modify it for the next release.


We should pass a structure to the routines instead. This will prevent
further modification of the subroutine interfaces and is more flexible.

   struct FileInfo {
       FILE   *f;                       /* may also be a filter stream or tmpfile */
       char   *filepath_url;            /* path/URL of the indexed file */
       long    fsize;                   /* size of the original file */
       time_t  last_modification_time;  /* last modification time of the original file */
   };

This may involve some changes to the code...
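
A minimal sketch of how such a structure could be filled, assuming a POSIX
system with stat(). The size and modification time are taken from the
original file, so they stay available even when the content itself is read
through a filter pipe. fileinfo_init() is a hypothetical name, not an
existing swish-e function, and the struct is repeated here (with a const
path pointer) only to keep the example self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

struct FileInfo {
    FILE       *f;                       /* file, filter stream or tmpfile */
    const char *filepath_url;            /* path/URL of the indexed file */
    long        fsize;                   /* size of the original file */
    time_t      last_modification_time;  /* mtime of the original file */
};

/* Hypothetical helper: record size and mtime of the original file via
 * stat(), independent of how 'stream' was opened (fopen, popen, tmpfile).
 * Returns 0 on success, -1 if stat() fails (fields are then zeroed). */
int fileinfo_init(struct FileInfo *fi, const char *path, FILE *stream)
{
    struct stat st;

    assert(fi != NULL && path != NULL);

    fi->f = stream;
    fi->filepath_url = path;
    fi->fsize = 0;
    fi->last_modification_time = (time_t)0;

    if (stat(path, &st) != 0)
        return -1;

    fi->fsize = (long)st.st_size;
    fi->last_modification_time = st.st_mtime;
    return 0;
}
```

The indexing routines would then take a single struct FileInfo pointer, so
new fields can be added later without changing every function signature.
The cost is the one extra stat() call per file mentioned above.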

To prevent double work - what are you working on at the moment?


cu - rainer


Received on Wed Nov 15 12:00:43 2000