Hi Rainer,
On 21 Nov 2000, at 4:49, Rainer.Scherg@rexroth.de wrote:
> Hi Jose!
>
> you have implemented the "read_stream" routine.
>
> We could use this feature to "scan" the content and
> include a contenttype "MAGIC" (which should be default)
> in the config.
>
> MAGIC could decide, based on the content, which type of doc
> has to be indexed...
>
> On HTTP, we could parse the response header to determine the
> content type...
>
>
OK for me.
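
A minimal sketch of what such a MAGIC check might look like, assuming the first bytes of the document are already in a buffer (for example, from read_stream). The names has_prefix_ci and guess_doc_type are hypothetical and do not exist in the current sources, and the set of recognized prefixes is only an illustration:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive test that buf starts with prefix. */
static int has_prefix_ci(const unsigned char *buf, size_t len,
                         const char *prefix)
{
    size_t n = strlen(prefix);
    size_t i;

    if (len < n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower(buf[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Hypothetical MAGIC detector: guess the document type from its
** first bytes. Anything not recognized falls back to "TXT". */
static const char *guess_doc_type(const unsigned char *buf, size_t len)
{
    /* Skip leading whitespace before looking at the markup */
    while (len > 0 && isspace(*buf)) {
        buf++;
        len--;
    }

    if (has_prefix_ci(buf, len, "<?xml"))
        return "XML";
    if (has_prefix_ci(buf, len, "<wml"))
        return "WML";
    if (has_prefix_ci(buf, len, "<!doctype html") ||
        has_prefix_ci(buf, len, "<html"))
        return "HTML";
    return "TXT";
}
```

For HTTP, the Content-Type response header could be consulted first, with a check like this only as a fallback when the header is missing or too generic.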
>
> > IMO summary/description means "title" for html documents. Other
> > documents can have their own summary. So, any reference to title
> > should be removed outside the countwords_HTML routine.
>
> I don't think so. IMO the definition could look as follows:
>
> HTML:
> title = <Title>-Tag (or path, see below...)
> Description= <META http-equiv="Description"> | first xx chars of
> <BODY>
>
> TXT:
> title = empty
> Description= first xx chars of file
>
> XML:
> WML:
> similar to HTML (has to be defined)
>
>
> IMO we should store an empty title field, if there is no title
> (which means: don't store the filepath twice). This will save space in
> the database.
>
In fact, this is already done in buildFileEntry (index.c):

    if(len_title==len_filename)
    {
        if(memcmp(filename,title,len_title)==0)
        {
            len_title=0;    /* Flag to indicate that filename
                            ** and title are identical */
        }
    }
    if(len_title)
    {
        compress3(len_title,p);
        memcpy(p,title,len_title);p+=len_title;
    } else *p++='\0';       /* Do not store title - Just a 0 */

> On retrieval, an empty title field should be returned as
> "real_path" (URL, or filepath).
>
Now I am doing an estrdup of the filepath. In readFileEntry (index.c):

    uncompress3(len2,p);        /* Read length of title */
                                /* If 0 then filename == title */
    if(!len2)                   /* No title */
        buf2=estrdup(buf1);
    else {
        buf2 = emalloc(len2);
        memcpy(buf2,p,len2);    /* Read title */
        p+=len2;
    }
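
To show the store/read round trip in one place, here is a self-contained sketch. It is not the code from index.c: put_title and get_title are hypothetical names, a single length byte stands in for compress3/uncompress3 (so titles are limited to 255 bytes here), plain malloc replaces emalloc/estrdup, and I assume the stored length includes the terminating '\0':

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer: store the title at p and return the position
** just past it, or NULL if the title is too long for this sketch.
** A single length byte is used instead of compress3. */
static unsigned char *put_title(unsigned char *p, const char *filename,
                                const char *title)
{
    size_t len_title = strlen(title) + 1;   /* assumed to include '\0' */

    if (len_title > 255)
        return NULL;
    if (strcmp(filename, title) == 0)
        len_title = 0;                      /* same as filename: store nothing */

    *p++ = (unsigned char)len_title;        /* a 0 means "title == filename" */
    if (len_title) {
        memcpy(p, title, len_title);
        p += len_title;
    }
    return p;
}

/* Hypothetical reader: return a malloc'ed copy of the title and
** advance *pp past the field. An empty field yields the filename. */
static char *get_title(const unsigned char **pp, const char *filename)
{
    const unsigned char *p = *pp;
    size_t len = *p++;                      /* read length of title */
    const char *src = len ? (const char *)p : filename;
    size_t n = len ? len : strlen(filename) + 1;
    char *title = malloc(n);

    if (title != NULL)
        memcpy(title, src, n);
    *pp = p + len;
    return title;
}
```

With this layout an entry whose title equals its path costs one byte for the title field, and the caller still gets a usable string back on retrieval.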
cu
Jose
Received on Tue Nov 21 16:11:49 2000