I downloaded and unzipped the file. I indexed with
swish-e -i BTTitle01312003-1.csv -v0 -T indexed_words > out
and was able to search:
moseley@bumby:~$ swish-e -w j2ee -H0
1000 BTTitle01312003-1.csv "BTTitle01312003-1.csv" 5252838
Note that in the file you sent, j2ee was not on the last line; it was on
the second-to-last line.
Another thing to note is that your file does not end with a newline.
I don't know if that would cause a problem (for Windows) or not.
moseley@bumby:~$ tail -1 BTTitle01312003-1.csv | od -c | tail
0000460 9 | 0 1 | | 9 9 | | 0 7 | 1 1 |
0000500 | | T h e S t o r y o f a
0000520 N e w Z e a l a n d S h e
0000540 e p S t a t i o n | | W e y e
0000560 r h a e u s e r E n v i r o n
0000600 m e n t a l C l a s s i c s |
0000620 N F | 0 0 0 3 2 5 2 5 8 5 | 1 9
0000640 9 9 / 0 1 / 0 6 | 2 0 0 2 / 0 2
0000660 / 1 6 | Y | A
As you can see, it doesn't cause a problem in my test, but it's something
to check on Windows.
moseley@bumby:~$ tail out
Adding:[1:swishdefault(1)] 'nf' Pos:962436 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '0003252585' Pos:962437 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '1999' Pos:962438 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '01' Pos:962439 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '06' Pos:962440 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '2002' Pos:962441 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '02' Pos:962442 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] '16' Pos:962443 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'y' Pos:962444 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'a' Pos:962445 Stuct:0x9 ( BODY FILE )
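If the missing trailing newline does turn out to matter on Windows, here is a rough sketch of a shell check for it. The function name ends_with_newline is made up for illustration; it relies only on `tail -c` and on the shell stripping trailing newlines from command substitution.

```shell
# Hypothetical helper: succeed if the file's last byte is a newline.
# "$(tail -c 1 FILE)" is empty when that byte is "\n", because command
# substitution strips trailing newlines. Empty files count as "no newline".
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage:
#   ends_with_newline BTTitle01312003-1.csv || echo "no trailing newline"
```

If the check fails, running `echo >> BTTitle01312003-1.csv` on a copy of the file would append the newline before re-indexing.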
The other thing to try is downloading a current dev build of swish-e to see
if that changes things. I'm using 2.4.0-pr4. For Windows:
http://www.webaugur.com/wares/files/swish-e/daily/swish-e-2.4.0-pr4-2003-10-26.exe
BTW -- why are you indexing this big file? Doesn't seem like a very
useful thing for searching as a single file. Might as well use grep --
it's faster for simple queries:
moseley@bumby:~$ time fgrep -i j2ee BTTitle01312003-1.csv > /dev/null
real 0m0.044s
user 0m0.010s
sys 0m0.040s
moseley@bumby:~$ time swish-e -w j2ww BTTitle01312003-1.csv -H0 >/dev/null
real 0m0.059s
user 0m0.060s
sys 0m0.000s
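With the whole file indexed as one document, swish-e can only report that the file matched. As a sketch of the grep alternative, the wrapper below prints each matching record with its line number. The name search_records is hypothetical; the flags are standard grep options.

```shell
# Hypothetical wrapper: list matching records with their line numbers.
# -F: treat the pattern as a fixed string (what fgrep does)
# -i: ignore case
# -n: prefix each match with its line number
search_records() {
    grep -F -i -n -- "$1" "$2"
}

# Usage:
#   search_records j2ee BTTitle01312003-1.csv
```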
--
Bill Moseley
moseley@hank.org
Received on Sun Oct 26 17:31:19 2003