Re: wondering why?, Updated

From: Weir James K Contr ASC/ENOI <James.Weir(at)not-real.wpafb.af.mil>
Date: Mon May 03 2004 - 10:28:42 GMT
Well, I finished re-indexing the whole thing; there is a lot of data out there.
I ran the same search and it came back with the same result:
Warning: Failed to uncompress Property. zlib uncompress returned: -5.  uncompressed size: 14899107 buf_len: 1243796

I have two indexes set up: one using Stemming_en2, which works,
and one using Metaphone, which does not.

Do I need to break the indexes up into smaller ones?

I am indexing simple text files. Each is less than 20K, but there are about 3 million of them.

Hope someone can help with this problem.

Jim 
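
For reference, the -5 in the warning above is zlib's Z_BUF_ERROR, which signals that decompression could not proceed with the buffers it was given. For the C uncompress() call named in the warning, the documented cause is an output buffer that is too small. Python's zlib.decompress grows its output buffer on its own, so the sketch below provokes the same code a different way, by passing a truncated input buffer. It is not Swish-e code, and the sample payload and helper name are made up for illustration:

```python
import zlib

Z_BUF_ERROR = -5  # numeric value of zlib's Z_BUF_ERROR constant


def try_decompress(blob):
    """Return the decompressed bytes, or the zlib error text on failure."""
    try:
        return zlib.decompress(blob)
    except zlib.error as exc:
        return str(exc)


original = b"swish-e property data " * 1000  # made-up sample payload
packed = zlib.compress(original)

# Correct offset and length: the data round-trips.
good = try_decompress(packed)

# Wrong length (only half of the compressed bytes): zlib gives up.
bad = try_decompress(packed[: len(packed) // 2])

print(good == original)
print(bad)
```

A reader that picks up compressed data at the wrong offset or with the wrong length, for example from a .prop file that does not belong to the index, would hit the same family of failure.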



> -----Original Message-----
> From: swish-e@sunsite.berkeley.edu 
> [mailto:swish-e@sunsite.berkeley.edu] On Behalf Of Bill Moseley
> Sent: Wednesday, April 28, 2004 2:23 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: wondering why?
> 
> 
> On Wed, Apr 28, 2004 at 11:40:38AM -0400, Weir James K Contr ASC/ENOI wrote:
> > > Are you using the same exact program for both indexing and 
> > > searching?  I ask because it looks like a data offset error.
> > > Another possibility is that you copied the index file but are 
> > > using the wrong associated .prop file.  Did you copy or move 
> > > the index by chance?
> 
> > I move the index file and .prop file to another folder after I index,
> > just in case I need to use the search while indexing (i.e., I index
> > into a Temp folder). Is moving the files not a good idea?
> 
> Swish-e does that for you.  When indexing it writes to .temp 
> files of the same name as the index and when done renames the 
> files.  It's not atomic so there's a slight chance that 
> someone could open the index right in the middle of the 
> rename and open the old index and the new .prop file and fail.
> 
> I suspect it's a tiny bit faster if you create the index in a 
> temporary directory and then rename the directories, but it 
> still isn't atomic since two files are opened.
> 
> > > You posted about this in early March.  Bill Schell found an
> > > off-by-one error that was causing problems, but I'm not sure 
> > > if that's related to your issue as I think that was only when 
> > > not using HTML2|XML2|TXT2 parser (i.e. using the old system 
> > > where the entire file is read into memory).  You are using 
> > > HTML* which will use libxml2.
> > 
> > I am only indexing TXT files. Should I use some other parser for that?
> 
> Yes, then you would be using the old system that reads the 
> entire file into memory and could get hit by that off-by-one 
> bug.  The TXT2 parser reads and indexes in chunks.
> 
> 
> 
> -- 
> Bill Moseley
> moseley@hank.org
> 
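
The rename race described in the quoted reply can be narrowed with a common POSIX pattern that, as far as this thread shows, is not part of Swish-e: write each build into its own directory, then repoint a symlink with a single rename. A reader that resolves the link once gets an index and a .prop file from the same build. A minimal Python sketch under those assumptions; the `publish` and `open_pair` helpers, the `current` link name, and the file names are all hypothetical:

```python
import os
import tempfile

INDEX = "index.swish-e"   # assumed index file name
PROP = INDEX + ".prop"    # assumed property file name


def publish(root, version, index_bytes, prop_bytes):
    """Write both files into root/<version>, then repoint root/current."""
    build_dir = os.path.join(root, version)
    os.makedirs(build_dir)
    for name, data in ((INDEX, index_bytes), (PROP, prop_bytes)):
        with open(os.path.join(build_dir, name), "wb") as fh:
            fh.write(data)
    # Create the new link under a temporary name, then rename it over
    # the old one. The rename is a single step on POSIX filesystems.
    tmp_link = os.path.join(root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(version, tmp_link)
    os.replace(tmp_link, os.path.join(root, "current"))


def open_pair(root):
    """Resolve the link once so both files come from the same build."""
    build_dir = os.path.realpath(os.path.join(root, "current"))
    with open(os.path.join(build_dir, INDEX), "rb") as fh:
        index_bytes = fh.read()
    with open(os.path.join(build_dir, PROP), "rb") as fh:
        prop_bytes = fh.read()
    return index_bytes, prop_bytes


with tempfile.TemporaryDirectory() as root:
    publish(root, "build-1", b"index v1", b"props v1")
    first = open_pair(root)
    publish(root, "build-2", b"index v2", b"props v2")
    second = open_pair(root)
```

This only helps readers that resolve the link before opening either file. A program that opens both paths through `current` separately can still straddle a swap, which is the same two-file problem the reply points out.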
Received on Mon May 3 03:28:42 2004