

RE: title and non html

From: <jmruiz(at)not-real.boe.es>
Date: Tue Nov 21 2000 - 16:10:09 GMT
Hi Rainer,

On 21 Nov 2000, at 4:49, Rainer.Scherg@rexroth.de wrote:

> Hi Jose!
> 
> you have implemented the "read_stream" routine.
> 
> We could use this feature to "scan" the content and
> include a content type "MAGIC" (which should be the default)
> in the config.
> 
> MAGIC could decide, based on the content, which type of doc
> has to be indexed...
> 
> On HTTP, we could parse the response header to determine the
> content type...
> 
> 

OK for me.
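A minimal C sketch of how MAGIC could work, assuming the first few bytes of the document are already buffered from read_stream and that an HTTP Content-Type header, when present, takes priority. The names doc_type, guess_doc_type, type_from_header and type_from_content are hypothetical, not existing swish-e functions, and the sniffing rules are only a rough heuristic (a text file that mentions "<html" would be misdetected).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical document types for a "MAGIC" content type. */
typedef enum { DOC_UNKNOWN = -1, DOC_TXT, DOC_HTML, DOC_XML, DOC_WML } doc_type;

/* Case-insensitive search for needle in the first n bytes of buf. */
static int contains_ci(const char *buf, size_t n, const char *needle)
{
    size_t m = strlen(needle);
    for (size_t i = 0; i + m <= n; i++)
        if (strncasecmp(buf + i, needle, m) == 0)
            return 1;
    return 0;
}

/* Map an HTTP Content-Type value such as "text/html; charset=iso-8859-1"
 * to a doc_type, ignoring any parameters after ';'. */
static doc_type type_from_header(const char *ct)
{
    static const struct { const char *mime; doc_type type; } map[] = {
        { "text/html", DOC_HTML },
        { "text/xml", DOC_XML },
        { "application/xml", DOC_XML },
        { "text/vnd.wap.wml", DOC_WML },
        { "text/plain", DOC_TXT },
    };
    while (isspace((unsigned char)*ct)) ct++;
    size_t n = strcspn(ct, ";");
    while (n > 0 && isspace((unsigned char)ct[n - 1])) n--;

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strlen(map[i].mime) == n && strncasecmp(ct, map[i].mime, n) == 0)
            return map[i].type;
    return DOC_UNKNOWN;
}

/* MAGIC: guess the type from the first n bytes read from the stream. */
static doc_type type_from_content(const char *buf, size_t n)
{
    while (n > 0 && isspace((unsigned char)*buf)) { buf++; n--; }
    if (contains_ci(buf, n, "<wml"))
        return DOC_WML;
    if (contains_ci(buf, n, "<html") || contains_ci(buf, n, "<!doctype html"))
        return DOC_HTML;
    if (n >= 5 && strncasecmp(buf, "<?xml", 5) == 0)
        return DOC_XML;
    return DOC_TXT;
}

/* Prefer the HTTP header when it names a known type; else fall back to MAGIC. */
static doc_type guess_doc_type(const char *content_type, const char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (content_type != NULL) {
        doc_type t = type_from_header(content_type);
        if (t != DOC_UNKNOWN)
            return t;
    }
    return type_from_content(buf, n);
}
```

For a file-system document you would pass NULL as content_type; for HTTP, the value of the response's Content-Type header. WML is checked first because WML files also begin with "<?xml".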

> 
> > IMO summary/description means "title" for html documents. Other
> > documents can have their own summary. So, any reference to title
> > should be removed outside the countwords_HTML routine.
> 
> I don't think so. IMO the definition could look as follows:
> 
> HTML:  
>    title      = <Title>-Tag  (or path, see below...)
>    Description= <META  http-equiv="Description"> | first xx chars of <BODY>
> 
> TXT:
>    title      = empty
>    Description= first xx chars of file
> 
> XML:
> WML:
>    similar to HTML (has to be defined)
> 
> 
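The "first xx chars" rule for the description could look like the C sketch below. make_description is a hypothetical helper and max_len stands in for the unspecified "xx"; collapsing whitespace and backing up to the last word boundary are assumptions about what a readable description should look like, not part of the proposal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most max_len characters of text into out
 * (which must hold max_len + 1 bytes) as a one-line description.
 * Runs of whitespace collapse to one space; if the text has to be cut,
 * the cut moves back to the last word boundary when there is one. */
static size_t make_description(const char *text, char *out, size_t max_len)
{
    size_t len = 0;
    int pending_space = 0;
    int truncated = 0;

    assert(text != NULL && out != NULL);
    for (const char *p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            pending_space = (len > 0);      /* drop leading whitespace */
            continue;
        }
        if (len + pending_space + 1 > max_len) {
            truncated = 1;                  /* no room for the next char */
            break;
        }
        if (pending_space) {
            out[len++] = ' ';
            pending_space = 0;
        }
        out[len++] = *p;
    }
    /* If we stopped in the middle of a word, drop that partial word. */
    if (truncated && !pending_space) {
        size_t cut = len;
        while (cut > 0 && out[cut - 1] != ' ')
            cut--;
        if (cut > 0)
            len = cut - 1;                  /* also drop the space */
    }
    out[len] = '\0';
    return len;
}
```

For TXT the input would be the start of the file; for HTML, the text of <BODY> after tags are stripped.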
> IMO we should store an empty title field if there is no title
> (which means: don't store the filepath twice). This will save space in
> the database.
> 
In fact, this is already done in buildFileEntry (index.c):

        if(len_title==len_filename)
        {
                if(memcmp(filename,title,len_title)==0)
                {
                        len_title=0;  /* Flag to indicate that filename
                                        ** and title are identical */
                }
        }
        if(len_title)
        {
                compress3(len_title,p);
                memcpy(p,title,len_title);p+=len_title;
        } else *p++='\0';   /* Do not store title - just a 0 */
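To make the write side easy to try outside index.c, here is a self-contained C sketch of the same idea. put_varint is a hypothetical stand-in for compress3 (7 bits per byte, high bit meaning "more bytes follow"); the real compress3 byte layout may differ. put_title is also hypothetical, and it assumes the title is stored without its terminating NUL. A zero length encodes as a single 0 byte, which matches the `*p++='\0'` branch above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for compress3(): write n as a variable-length
 * integer, low 7 bits first, high bit set on every byte but the last. */
static unsigned char *put_varint(size_t n, unsigned char *p)
{
    while (n >= 0x80) {
        *p++ = (unsigned char)((n & 0x7F) | 0x80);
        n >>= 7;
    }
    *p++ = (unsigned char)n;
    return p;
}

/* Append the title field at p. If the title equals the filename, store
 * only a zero length, as buildFileEntry does. Returns the new write
 * position. The title is stored without its terminating NUL. */
static unsigned char *put_title(const char *filename, const char *title,
                                unsigned char *p)
{
    size_t len_title;

    assert(filename != NULL && title != NULL && p != NULL);
    len_title = strlen(title);
    if (strcmp(filename, title) == 0)
        len_title = 0;          /* flag: title is identical to filename */
    p = put_varint(len_title, p);
    memcpy(p, title, len_title);
    return p + len_title;
}
```

Comparing with strcmp replaces the length check plus memcmp in the original; the result is the same, just expressed in one call.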

> On retrieval, an empty title field should be returned as
> "real_path" (URL, or filepath).
> 

Currently, readFileEntry (index.c) does an estrdup of the filepath in that case:

        uncompress3(len2,p);   /* Read length of title */
                /* If 0 then filename == title */
        if(!len2)     /* No title */
                buf2=estrdup(buf1);
        else {
                buf2 = emalloc(len2);
                memcpy(buf2,p,len2);   /* Read title */
                p+=len2;
        }
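A self-contained C sketch of the matching read side, so the "empty title means real path" rule can be tested in isolation. get_varint is a hypothetical stand-in for uncompress3 using the same 7-bit scheme (again, the real byte layout may differ), and get_title is hypothetical as well; plain malloc replaces emalloc/estrdup. Because this sketch stores titles without a NUL, it allocates one extra byte and terminates the string itself; whether the real index stores the terminator is not shown in the snippet above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for uncompress3(): read a 7-bit varint
 * (high bit = more bytes follow) and advance *pp past it. */
static size_t get_varint(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    size_t n = 0;
    unsigned shift = 0;
    unsigned char b;

    do {
        b = *p++;
        n |= (size_t)(b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);
    *pp = p;
    return n;
}

/* Read the title field at *pp. A zero length means "no title stored",
 * so a copy of the filename is returned instead. Returns a malloc'd,
 * NUL-terminated string the caller must free, or NULL if out of memory. */
static char *get_title(const unsigned char **pp, const char *filename)
{
    size_t len, n;
    const char *src;
    char *title;

    assert(pp != NULL && *pp != NULL && filename != NULL);
    len = get_varint(pp);
    src = len ? (const char *)*pp : filename;
    n = len ? len : strlen(filename);

    title = malloc(n + 1);
    if (title == NULL)
        return NULL;
    memcpy(title, src, n);
    title[n] = '\0';            /* stored title carries no NUL here */
    *pp += len;                 /* skip the title bytes, if any */
    return title;
}
```

Returning a copy in both branches keeps ownership simple: the caller always frees the result, just as with the estrdup/emalloc pair above.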




cu
Jose
Received on Tue Nov 21 16:11:49 2000