Hi,
So, when we last left our intrepid (albeit somewhat clueless) hero, he was
running incremental + economy mode on his 900K++ records. Here is where we
ended up:
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,442,762 words alphabetically
Writing header ...
Writing index entries ...
Writing word data: 79%err: Ran out of memory (could not allocate
4296816 more bytes)!
.
soooo cloooooose...:(
From my perspective this is good in a way: unlike 2.4.3, it is not a
segfault. It still leaves me with not much else helpful to offer, though.
I will note that smaller builds of the index seemed OK, but we are talking
way smaller. I could conceivably test on, say, 50% of the records and, if
that passes, move up (or down if it fails) until I hit a magic number. I
can also run gdb, but would need some guidance there (happy to try).
Finally, I can bump the memory allotment in the kernel (David mentioned
that might be a bottleneck) to try and eke past this hump. Of course, that
would leave me having to do so again the next time memory hindered the
build.
I can also, of course, be happy with non-incremental and get on with my
life.
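For what it is worth, the "try 50%, then move up or down" idea is just a
binary search on the record count. Here is a minimal sketch in Python of
how I might drive it. Everything named here is hypothetical: build_ok is a
stand-in for "run one incremental build on the first n records and report
whether it finished", and the sketch assumes that if n records build
cleanly then any smaller count does too, which I have not verified.

```python
from typing import Callable


def largest_passing_size(total: int, build_ok: Callable[[int], bool]) -> int:
    """Return the largest n in [0, total] for which build_ok(n) is True.

    Assumption: success is monotonic, i.e. if n records build cleanly,
    every smaller count builds cleanly too. build_ok is a hypothetical
    callback that would run one build on the first n records.
    """
    lo, hi = 0, total  # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        if build_ok(mid):
            lo = mid  # mid records worked, so the limit is mid or higher
        else:
            hi = mid - 1  # mid records failed, so the limit is below mid
    return lo


if __name__ == "__main__":
    # Simulated run only: pretend builds fail above 700,000 records.
    # A real build_ok would launch the indexer and check how it exited.
    limit = largest_passing_size(900_000, lambda n: n <= 700_000)
    print(limit)
```

With 900K records that is roughly 20 builds in the worst case, which is a
lot given how long each one takes, so a coarser step might be more
practical.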
Brad
---------------------
Brad Miele
VP Technology
IPNStock.com
866 476 7862 x902
bmiele@ipnstock.com
On Thu, 19 Oct 2006, Bill Moseley wrote:
> On Thu, Oct 19, 2006 at 10:19:28PM -0400, Brad Miele wrote:
>> one question about debugging with gdb, what do i do :)? Sorry, i know how
>> to do gdb swish-e, and then run <switches and whatnot> but what do i do
>> after the crash to get more info?
>
> I'm way rusty.
>
> Depends on how hard it crashes. Basically, you get a backtrace (bt)
> where it segfaults. Then you look back through and try and see what
> was happening where and if it makes sense. Normally it doesn't. If
> it crashes hard then you may not even get a backtrace that makes any
> sense. From there you set breakpoints and watch variables to try and
> track down the problem. At one point I knew most of the indexing
> code, but I would need to completely relearn it to be able to make
> quick work of tracking down a segfault. The bummer is in your case it
> takes so long to happen.
>
>> finally, why do i need to use -e when i have so many resources? when
>> swish-e gave that out of memory error, i still had over 2G totally free
>> via top.
>
> 32bit limit in swish? I doubt there's correct integer overflow
> detection.
>
> Swish uses hash tables and the larger they get the slower access to
> the table is. Remember, swish was designed for indexing thousands or
> tens of thousands of files. It's very fast at that. The trade-off is
> it's not that scalable.
>
> I generated a million random docs once and -e was much slower at first
> but kept running at a reasonably steady pace, and without -e it was
> way faster for the first 100K files or so and then started slowing
> down as the hash tables filled. -e ended up being faster.
>
>
>
> --
> Bill Moseley
> moseley@hank.org
Received on Fri Oct 20 03:31:56 2006