Re: Indexing differs for 2 lines swapped in file

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Oct 26 2003 - 17:19:02 GMT

I downloaded and unzipped the file.  I indexed with 

  swish-e -i BTTitle01312003-1.csv -v0 -T indexed_words > out

and was able to search:

  moseley@bumby:~$ swish-e -w j2ee -H0
  1000 BTTitle01312003-1.csv "BTTitle01312003-1.csv" 5252838

Note that the file you sent did not have j2ee on the last line; it was
on the second-to-last line.
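
To confirm where a term sits relative to the end of the file, something
like the following may help. It is only a sketch, assuming a POSIX shell
with grep, awk and cut; the function name lines_from_end is made up for
illustration and is not part of swish-e. It counts lines with awk rather
than wc -l, because wc -l counts newline characters and would come up one
short on a file whose last line is unterminated.

```shell
# Hypothetical helper (not part of swish-e).
# Usage: lines_from_end WORD FILE
# For every line containing WORD (fixed string, case-insensitive), print
# its line number, the total line count, and its distance from the end.
lines_from_end() {
    # awk counts an unterminated final line as a record; wc -l would not.
    total=$(awk 'END { print NR }' "$2")
    grep -n -i -F -- "$1" "$2" | cut -d: -f1 | while read -r n; do
        echo "line $n of $total ($((total - n)) from the end)"
    done
}

# Example: lines_from_end j2ee BTTitle01312003-1.csv
```

A distance of 0 means the match is on the last line, 1 means the
second-to-last line, and so on.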


Another thing to note is that your file does not end with a newline.
I don't know if that would cause a problem (for Windows) or not.  

moseley@bumby:~$ tail -1 BTTitle01312003-1.csv | od -c | tail
0000460   9   |   0   1   |   |   9   9   |   |   0   7   |   1   1   |
0000500   |   |   T   h   e       S   t   o   r   y       o   f       a
0000520       N   e   w       Z   e   a   l   a   n   d       S   h   e
0000540   e   p       S   t   a   t   i   o   n   |   |   W   e   y   e
0000560   r   h   a   e   u   s   e   r       E   n   v   i   r   o   n
0000600   m   e   n   t   a   l       C   l   a   s   s   i   c   s   |
0000620   N   F   |   0   0   0   3   2   5   2   5   8   5   |   1   9
0000640   9   9   /   0   1   /   0   6   |   2   0   0   2   /   0   2
0000660   /   1   6   |   Y   |   A

As you can see, it didn't cause a problem when I tested, but it's
something to check on Windows.
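
A quicker way to test for the missing newline, and to add one before
re-indexing, might look like this. It is a minimal sketch assuming a POSIX
shell; the function names are hypothetical, and whether the missing
newline actually matters to swish-e on Windows is still untested.

```shell
# Hypothetical helpers (not part of swish-e).

# Succeeds if FILE is non-empty and its last byte is a newline.
# Command substitution strips trailing newlines, so a lone newline
# byte comes back as the empty string.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Append a newline only when one is missing. Note that an empty file
# also counts as "missing" here and ends up holding a single newline.
ensure_trailing_newline() {
    ends_with_newline "$1" || printf '\n' >> "$1"
}

# Example:
#   ends_with_newline BTTitle01312003-1.csv || echo "no trailing newline"
```

Run ensure_trailing_newline on a copy of the file and index the copy, so
the original stays available for comparison.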

moseley@bumby:~$ tail out
    Adding:[1:swishdefault(1)]   'nf'   Pos:962436  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '0003252585'   Pos:962437  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '1999'   Pos:962438  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '01'   Pos:962439  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '06'   Pos:962440  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '2002'   Pos:962441  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '02'   Pos:962442  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '16'   Pos:962443  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'y'   Pos:962444  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'a'   Pos:962445  Stuct:0x9 ( BODY FILE )

The other thing to try is to download a current dev build of swish-e and
see if that changes things.  I'm using 2.4.0-pr4. For Windows:

  http://www.webaugur.com/wares/files/swish-e/daily/swish-e-2.4.0-pr4-2003-10-26.exe

BTW -- why are you indexing this big file?  A single large file doesn't
seem like a very useful thing to search.  You might as well use grep --
it's faster for simple queries:

moseley@bumby:~$ time fgrep -i j2ee BTTitle01312003-1.csv  > /dev/null

real    0m0.044s
user    0m0.010s
sys     0m0.040s


moseley@bumby:~$ time swish-e -w j2ww BTTitle01312003-1.csv -H0 >/dev/null

real    0m0.059s
user    0m0.060s
sys     0m0.000s



-- 
Bill Moseley
moseley@hank.org
Received on Sun Oct 26 17:31:19 2003