Compression scheme (was Port of swish-e to Windows)

From: Reino Va'inaste <reino(at)not-real.postimees.ee>
Date: Wed Apr 15 1998 - 16:51:00 GMT
>I spent quite some time finding out what was happening.
>First I figured out how the numbers were stored in the index file: each
>number is stored in radix 128, most significant 'digit' first.
>All 'digits' have 128 added (the high bit set), except the least significant
>'digit'. This way, it is possible to detect the start of a new number in
>a row of numbers. A row of numbers is ended with a zero byte.
>So far, so good, but...
>
>In the function compress() (index.c) there is a very subtle bug:
>Look at this loop:
>
>while ( i-- >= 0)
>       fputc(s[i] | (i ? 128 : 0), fp);
>
>Because the post-decrement is done after the comparison, i becomes -1 inside
>the loop, so s[-1] is written after each number!
>Now I was even more puzzled: if this was the case, why did the
>program ever work?
>After studying the pieces of code that read the numbers, I figured out 
>that these random bytes were consistently skipped.
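To make the off-by-one concrete, here is a small sketch of the quoted loop. It is not the swish-e source: compress_buggy, the buffer output (instead of fputc() to a FILE *) and the stray parameter are hypothetical. To reproduce the bug without undefined behaviour, s points one element into a larger array, so s[-1] is a defined byte chosen by the caller; in the real code it is whatever happens to sit before s on the stack.

```c
#include <limits.h>
#include <stddef.h>

/* Enough 7-bit digits for any unsigned int. */
#define MAXDIGITS ((sizeof(unsigned) * CHAR_BIT + 6) / 7)

/* Hypothetical model of the buggy loop in compress() (index.c), writing to
 * a buffer instead of calling fputc(). Digits of num are produced least
 * significant first, then emitted most significant first; every digit
 * except the least significant one gets the high bit (128).
 * 'stray' stands in for the unknown byte at s[-1] in the real code.
 * Returns the number of bytes written to out. */
size_t compress_buggy(unsigned num, unsigned char stray, unsigned char *out)
{
    unsigned char buf[MAXDIGITS + 1];
    unsigned char *s = buf + 1; /* s[-1] aliases buf[0], so it is defined here */
    int i = 0;
    size_t n = 0;

    buf[0] = stray;
    while (num) {               /* split into 7-bit digits */
        s[i++] = num & 127;
        num >>= 7;
    }
    /* i-- compares the old value, so the body also runs with i == -1
     * and emits s[-1] | 128 after the last real digit. */
    while (i-- >= 0)
        out[n++] = s[i] | (i ? 128 : 0);
    return n;
}
```

For instance, compress_buggy(1, 0x00, out) writes 01 80, and compress_buggy(300, 0x00, out) writes 82 2C 80. Because i is -1 (nonzero) when the extra byte is written, its high bit is always set; a stray byte of 0x05 would give 01 85 rather than 01 80.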


*** Yes, there will be 4 additional bytes (0x80) with _every_ word stored
in the index, and they are skipped. ***


>Only when s[-1] happens to be a zero byte do things go wrong, because zero
>signifies the end of a row of numbers and all subsequent numbers are
>ignored!


No: when s[-1] is zero, compression still works (with 4 unneeded bytes).

When i is -1, the program writes s[-1] | 128 (i is nonzero, so the high bit
is set), which is 0x80, not 0x00, when s[-1] is zero. This byte then belongs
to the _next_ item in the compressed values.

When you compress 1, 2, 3 you get the hex output

    01 80   02 80   03 80

but when you decompress it, the same input is grouped differently, because
a clear 8th bit marks the end of a number:

    01   80 02   80 03

So you may add 0x80 bytes before a compressed number without changing the
result.
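Reino's regrouping can be checked with a small decoder sketch. It is not the swish-e reader: decode_row is a hypothetical name, and it assumes all stored numbers are positive, so that a number decoding to 0 (a lone zero byte, or a trailing 0x80 followed by the zero terminator) ends the row. A leading 0x80 contributes a zero digit, so it does not change the value of the number it precedes.

```c
#include <stddef.h>

/* Hypothetical decoder for a zero-terminated row of radix-128 numbers.
 * Each number is a run of bytes with the high bit set, closed by one byte
 * with the high bit clear. Assumes stored values are positive, so a number
 * that decodes to 0 ends the row.
 * Stores at most max values in vals and returns how many were stored. */
size_t decode_row(const unsigned char *in, size_t len, unsigned *vals, size_t max)
{
    size_t pos = 0, count = 0;

    while (pos < len && count < max) {
        unsigned num = 0;
        unsigned char c;
        do {                        /* 7 bits per byte, most significant first */
            c = in[pos++];
            num = (num << 7) | (c & 127);
        } while ((c & 128) && pos < len);
        if (num == 0)               /* 0x00 or 0x80 0x00: end of row */
            break;
        vals[count++] = num;
    }
    return count;
}
```

Decoding 01 80 02 80 03 80 00 gives 1, 2, 3, the same as the clean row 01 02 03 00, and 80 80 82 2C 00 still decodes to the single value 300.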

greetings, Reino

>Furthermore, accessing s[-1] can of course have unpredictable effects
>too.
>
>So, I changed the above loop to:
>
>while ( --i >= 0)
>       fputc(s[i] | (i ? 128 : 0), fp);
>
>But then I had to change all the places where the numbers are read, too.
>After doing this, everything worked fine.
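With the pre-decrement, the loop stops after the least significant digit and no stray byte is written. Below is a buffer-based sketch of the changed loop, extended to write a whole zero-terminated row. compress_fixed and encode_row are hypothetical names; as in the original loop, a value of 0 produces no digits, so the row is assumed to hold only positive numbers.

```c
#include <limits.h>
#include <stddef.h>

/* Enough 7-bit digits for any unsigned int. */
#define MAXDIGITS ((sizeof(unsigned) * CHAR_BIT + 6) / 7)

/* Hypothetical buffer version of the corrected loop: --i is evaluated
 * before the comparison, so the body runs for i = n-1 .. 0 only.
 * Returns the number of bytes written to out (0 for num == 0). */
static size_t compress_fixed(unsigned num, unsigned char *out)
{
    unsigned char s[MAXDIGITS];
    int i = 0;
    size_t n = 0;

    while (num) {               /* least significant digit first */
        s[i++] = num & 127;
        num >>= 7;
    }
    while (--i >= 0)            /* emit most significant digit first */
        out[n++] = s[i] | (i ? 128 : 0);
    return n;
}

/* Hypothetical helper: writes count positive numbers followed by the zero
 * byte that ends the row. out needs room for count * MAXDIGITS + 1 bytes.
 * Returns the total number of bytes written. */
size_t encode_row(const unsigned *vals, size_t count, unsigned char *out)
{
    size_t n = 0;
    for (size_t k = 0; k < count; k++)
        n += compress_fixed(vals[k], out + n);
    out[n++] = 0;
    return n;
}
```

Encoding {1, 2, 3} yields 01 02 03 00, and {1, 300, 3} yields 01 82 2C 03 00.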

>Dr. Dirk Nerinckx
>Wave Research, Belgium
>
>Personal emails can be sent to:  enzo@skynet.be
>
Reino Všinaste ( reino@postimees.ee ) ph (372) 7 390 379 
Postimees Gildi 1 Tartu Estonia     GSM 251 75186
-- "Contact" ---------------
- Do you think there are people on other planets?
- I don't know. But if it's just us, it would be an awful waste of space.
Received on Wed Apr 15 09:57:34 1998