Hi Rainer,
On 16 Nov 2000, at 6:57, Rainer.Scherg@rexroth.de wrote:
> Hi!
>
> I'm just making some changes to get file properties
> (size, last modification date, etc.) and got a little bit
> confused about the structures and data types swish-e is using.
>
> Right now the structure looks like this:
>
>
> swish.h:
> --------
> /*
> -- FileProperties (similar to fileinfo)
> -- Structure used to store information about a file to be indexed...
> -- Unused items may be NULL (e.g. if the file is not opened, fp == NULL)
> */
>
> typedef struct {
>     FILE   *fp;            /* may also be a filter stream, or NULL if not opened */
>     char   *path;          /* path to file to index (may be tmp) */
>     char   *virt_path_url; /* orig. path/URL of indexed file */
>     long    fsize;         /* size of the original file (not filtered) */
>     time_t  mtime;         /* time of last modification of orig. file */
> } FileProp;
>
>
> DocType, indextitleonly, and some other flags for this file (to be
> discussed...) could also be included. This would make the subroutine
> interfaces leaner and less complicated to handle...
>
Sounds good to me.
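Along the lines of Rainer's proposal, here is a minimal sketch of how fs_file_properties() might fill in FileProp using POSIX stat(). It assumes the two extra fields DocType and indextitleonly are added (names taken from the discussion, layout assumed). It is not the actual swish-e code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>

typedef struct {
    FILE   *fp;             /* filter stream, file stream, or NULL if not opened */
    char   *path;           /* path to file to index (may be tmp) */
    char   *virt_path_url;  /* orig. path/URL of indexed file */
    long    fsize;          /* size of the original file (not filtered) */
    time_t  mtime;          /* time of last modification of orig. file */
    int     DocType;        /* assumed addition: TXT, HTML, XML, ... */
    int     indextitleonly; /* assumed addition: nonzero = index title only */
} FileProp;

/* Copy a string into freshly allocated memory (NULL on failure). */
static char *copy_string(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d != NULL)
        strcpy(d, s);
    return d;
}

/* Hypothetical sketch: collect size and mtime without opening the file. */
FileProp *fs_file_properties(const char *filename)
{
    struct stat st;
    FileProp *fprop = calloc(1, sizeof *fprop);   /* flags start at 0 */

    if (fprop == NULL)
        return NULL;
    fprop->fp = NULL;                              /* not opened yet */
    fprop->path = copy_string(filename);
    fprop->virt_path_url = copy_string(filename);

    if (stat(filename, &st) == 0) {
        fprop->fsize = (long) st.st_size;
        fprop->mtime = st.st_mtime;
    } else {
        fprop->fsize = -1;                         /* unknown: stat() failed */
        fprop->mtime = (time_t) 0;
    }
    return fprop;
}
```

The caller would then set DocType and indextitleonly and open fp (directly or through a filter), so later routines only need the FileProp pointer.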
> -----------
>
> But as I went ahead with the coding, I realized that some of the
> information is also stored in other structures.
>
>
> e.g. file fs.c (only some essential parts):
> ---------------
> /* Indexes the words in the file */
>
> void printfile(struct SWISH *sw, struct docentry *e)
> {
>     int wordcount;
>     /* deleted: FILE *fp; */
>     char *s;
>     char *filterprog;
>     char *filtercmd;
>     int DocType;
>     FileProp *fprop;
>
>     [...]
>
>     fprop = fs_file_properties((char *)e->filename);
>     if (!fprop) progerr("Failed to alloc memory....");
BTW, there is no need for this line because in mem.h, if
no memory is available, the program exits.
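To illustrate the point, here is a hypothetical sketch of an emalloc()-style allocator that terminates the program instead of returning NULL. It is not the actual mem.h code. If fs_file_properties() allocates through such a function, the progerr() check after it can never fire.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: never returns NULL; exits on allocation failure. */
void *emalloc(size_t size)
{
    void *p = malloc(size ? size : 1);   /* malloc(0) may legally return NULL */

    if (p == NULL) {
        fprintf(stderr, "out of memory (requested %lu bytes)\n",
                (unsigned long) size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```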
>
>     if ((filterprog = hasfilter(e->filename, sw->filterlist)) != NULL) {
>         filtercmd = emalloc(strlen(filterprog) + 3 + strlen(e->filename) + 1);
>         sprintf(filtercmd, "%s '%s'", filterprog, e->filename);
>         fprop->fp = popen(filtercmd, "r");
>     } else {
>         fprop->fp = fopen(e->filename, "r");
>     }
>
>     if (fprop->fp) {
>         /* 08/00 Jose Ruiz */
>         /* get Doc Type as is in IndexContents or DefaultContents */
>         if ((DocType = getdoctype(e->filename, sw->indexcontents)) == NODOCTYPE
>             && sw->DefaultDocType != NODOCTYPE)
>             DocType = sw->DefaultDocType;
>
>         switch (DocType)
>         {
>         case TXT:
>             if (sw->verbose == 3) printf(" - Using TXT filter - ");
>             wordcount = countwords_TXT(sw, fprop->fp,
>                 e->filename, e->title,
>                 (isoksuffix(e->filename, sw->nocontentslist)
>                  && (sw->nocontentslist != NULL)));
>
>         [...]
>         }
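The buffer size passed to emalloc() for the filter command is strlen(filterprog) + 3 (a space and two single quotes) + strlen(e->filename) + 1 (the terminating NUL). Here is a minimal sketch of that computation with a hypothetical helper, build_filter_cmd(), using plain malloc() and snprintf() so it compiles on its own:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "<filterprog> '<filename>'".
   Size = prog + 3 (space + two quotes) + filename + 1 (NUL). */
char *build_filter_cmd(const char *filterprog, const char *filename)
{
    size_t len = strlen(filterprog) + 3 + strlen(filename) + 1;
    char *cmd = malloc(len);

    if (cmd == NULL)
        return NULL;
    snprintf(cmd, len, "%s '%s'", filterprog, filename);
    return cmd;
}
```

The single quotes protect spaces in filenames, but a filename that itself contains a single quote would still break the command handed to popen().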
>
> -------
> The countwords interface has to be changed to take fewer parameters and
> more structure types. The questions are: what should we pass to this
> routine, and is this code the right one?
>
> My proposal:
>
> The code within the switch statement is the same, or nearly the same,
> for all indexing methods (at the moment file_system and http).
>
> We should have a common routine "index_file()", which calls the
> appropriate indexing routine "countwords_XXX" (XML, HTML, TXT, etc.).
>
I agree, it is easier to handle and maintain.
>
> The interface could look like
>
> int index_file (struct SWISH *sw, FileProp *fprop, whatelse...) or
> int index_file (SWISH *sw, FileProp *fprop, ....)
>
Sounds good to me.
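Here is a minimal sketch of the proposed index_file(). It assumes DocType has been folded into FileProp and that every countwords_XXX takes (sw, fprop). The enum values, the trimmed-down struct SWISH, and the stub countwords functions (which just return fixed numbers) are placeholders for illustration, not swish-e's definitions.

```c
#include <stdio.h>

/* Placeholder document types and config struct (assumed). */
enum { NODOCTYPE = 0, TXT, HTML, XML };

struct SWISH { int verbose; int DefaultDocType; };

typedef struct {
    FILE *fp;
    char *path;
    int   DocType;
    int   indextitleonly;
} FileProp;

/* Stubs standing in for the real parsers; each returns a fixed value. */
static int countwords_TXT(struct SWISH *sw, FileProp *fprop)  { (void) sw; (void) fprop; return 1; }
static int countwords_HTML(struct SWISH *sw, FileProp *fprop) { (void) sw; (void) fprop; return 2; }
static int countwords_XML(struct SWISH *sw, FileProp *fprop)  { (void) sw; (void) fprop; return 3; }

/* Shared by all indexing methods: resolve the type, then dispatch. */
int index_file(struct SWISH *sw, FileProp *fprop)
{
    int doctype = fprop->DocType;

    if (doctype == NODOCTYPE)          /* fall back to DefaultContents */
        doctype = sw->DefaultDocType;

    switch (doctype) {
    case TXT:  return countwords_TXT(sw, fprop);
    case HTML: return countwords_HTML(sw, fprop);
    case XML:  return countwords_XML(sw, fprop);
    default:   return -1;              /* no usable document type */
    }
}
```

Keeping the DocType fallback in one place means the file-system and http methods only have to fill in FileProp and call index_file().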
>
> ----------
>
> Routine:
> int countwords(sw, vp, filename, title, indextitleonly)
> struct SWISH *sw;
> void *vp;
> char *filename;
> char *title;
> int indextitleonly;
>
> Can anybody explain why countwords_XXX gets "title" as a parameter?
> It could just be the same as the filename (if a title is missing in
> the document).
>
The reason is very simple: I copied countwords into
countwords_XXX (cut and paste) and then changed them. Anyway,
you are right: XML and TXT do not need the title.
> The "indextitleonly" could be a flag in the structure FileProp...
>
OK for me.
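With indextitleonly stored in FileProp, a TXT routine no longer needs the title or the flag as separate parameters. Here is a sketch under those assumptions. The word counting is a toy stand-in (runs of alphanumeric characters), not swish-e's tokenizer, and struct SWISH is a placeholder.

```c
#include <stdio.h>
#include <ctype.h>

struct SWISH { int verbose; };   /* placeholder */

typedef struct {
    FILE *fp;
    char *path;
    int   indextitleonly;
} FileProp;

/* Hypothetical reduced signature: per-file data comes from fprop. */
int countwords_TXT(struct SWISH *sw, FileProp *fprop)
{
    int c, words = 0, inword = 0;

    (void) sw;                          /* unused in this sketch */
    if (fprop->indextitleonly || fprop->fp == NULL)
        return 0;                       /* title only: contents not read */

    while ((c = getc(fprop->fp)) != EOF) {
        if (isalnum(c)) {
            if (!inword) {              /* start of a new word */
                words++;
                inword = 1;
            }
        } else {
            inword = 0;
        }
    }
    return words;
}
```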
BTW, I am now changing the struct definitions to typedefs, adding
Bas's fixes for compiling on IRIX, and doing some other minor work.
cu
Jose
Received on Thu Nov 16 19:27:13 2000