Hi Jose,
I have thought a little about the following routine, which was mentioned in
the earlier mails. Jose's introduction of "read_stream" (which reads the file
to be indexed completely into memory) is a great idea. It might speed up the
indexing process when there is a need to reposition within the data being
indexed (although a good OS would achieve the same with its file cache...).
----------------
char *read_stream(FILE *fp,int filelen)
{
    int c=0,offset=0,bufferlen=0;
    unsigned char *buffer;
    if(filelen)
    {
        buffer=emalloc(filelen+1);
        vread(buffer,1,filelen,fp);
        buffer[filelen]='\0';
    } else {   /* if we are reading from a popen call, filelen is 0 */
        buffer=emalloc((bufferlen=MAXSTRLEN)+1);
        while((c=fgetc(fp))!=EOF)
        {
            if(offset==bufferlen)
            {
                bufferlen+=MAXSTRLEN;
                buffer=erealloc(buffer,bufferlen+1);
            }
            buffer[offset++]=(unsigned char)c;
        }
        buffer[offset]='\0';
    }
    return (char *)buffer;
}
-----------------
As I mentioned, the routine is not optimized (one fgetc call per byte, lots of
possible reallocs) when reading data from a stream. So the routine could look
like this (the vread part with filelen might be better, but for most files,
whose size is below the initial READ_BUFFER_SIZE, it does not make any
difference...):
-----------------
#define READ_BUFFER_SIZE (128 * 1024)
char *read_stream(FILE *fp)
{
    unsigned char *buffer;
    size_t n, rd_len = 0;
    /* one extra byte for the terminating '\0' */
    buffer=emalloc(READ_BUFFER_SIZE+1);
    /* fread returns 0 on EOF and on error, so the loop always terminates */
    while ((n = fread(buffer+rd_len,1,READ_BUFFER_SIZE,fp)) > 0) {
        rd_len += n;
        /* keep room for the next full chunk plus the terminator */
        buffer = erealloc(buffer, rd_len+READ_BUFFER_SIZE+1);
    }
    buffer[rd_len]='\0';
    return (char *)buffer;
}
--------------------------
A different interface, which returns the buffer size, is also possible:
long (FILE *fp, unsigned char **buffer)
The return value would be the file size or the size of the filter output
(depends on...).
The routine is not yet tested, but should work fine.
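A rough sketch of that size-returning interface could look like the block below. Several things in it are my assumptions and not part of the existing code: the name read_stream_len is made up for illustration, plain malloc/realloc stand in for emalloc/erealloc so the example compiles on its own, the buffer grows by doubling instead of by a fixed READ_BUFFER_SIZE step, and -1 is returned on allocation or read failure.

```c
#include <stdio.h>
#include <stdlib.h>

#define READ_BUFFER_SIZE (128 * 1024)

/* Hypothetical name. Reads fp up to EOF, stores a NUL-terminated buffer
 * in *out and returns the number of bytes read, or -1 on failure. */
long read_stream_len(FILE *fp, unsigned char **out)
{
    size_t cap = READ_BUFFER_SIZE, len = 0, n;
    unsigned char *buf = malloc(cap + 1);      /* +1 for the terminator */

    if (buf == NULL)
        return -1;

    /* cap - len is always > 0 here, because the buffer grows when full */
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {                      /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2 + 1);
            if (tmp == NULL) {
                free(buf);
                return -1;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                          /* stopped on error, not EOF */
        free(buf);
        return -1;
    }
    buf[len] = '\0';
    *out = buf;
    return (long)len;
}
```

A caller would pass the address of an unsigned char pointer, check the return value for -1, and free the buffer when done. Because the length is returned, the caller does not need strlen, which also matters if the indexed data contains embedded '\0' bytes.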
cu - rainer
Received on Sun Nov 19 22:41:07 2000