Re: [swish-e] portable indexes for swish-e

From: Josh Rabinowitz <joshr-swishe(at)not-real.joshr.com>
Date: Fri Dec 14 2007 - 19:27:34 GMT
Follow up:

I updated the script to convert more types, and to rewrite files in subdirs of src/. It's appended below.

I also noticed that I have to leave the prototype of main() as
int main(int, char**), or swish-e has problems at startup.
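
The main() restriction looks like one case of a broader constraint: some signatures are fixed by the C standard or the system headers, so a blanket int-to-int64_t rewrite cannot touch them. The script's own comments name waitpid(), which takes an int * for its status argument, as another case. Below is a minimal sketch of one way to handle such a boundary, assuming a plain int is kept where the library expects one and widened afterwards. fetch_status() and fetch_status_wide() are hypothetical names for illustration and are not part of swish-e.

```c
#include <stdint.h>

/* Hypothetical stand-in for a library call such as waitpid(), whose
   prototype is fixed by the system headers and takes an int *. */
void fetch_status(int *status)
{
    *status = 3;
}

/* Keep a plain int at the library boundary, then widen the value
   for the rest of the program. */
int64_t fetch_status_wide(void)
{
    int native = 0;            /* must stay int: its address is passed on */
    fetch_status(&native);
    return (int64_t)native;    /* widening an int to int64_t is lossless */
}
```

If the local variable were rewritten to int64_t, passing its address to a function expecting int * would be a pointer type mismatch on any platform where int is 32 bits.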

I was able to get a swish-e executable built with the 64-bit integer types on a 32-bit system that was able to start up. When I tried to build an index with it, though, I got this with 2.6 from SVN:
http://rafb.net/p/xZyvpu13.html

With swish-e 2.4 from SVN with the int64 changes applied, I get:

% sman-update
...
*** glibc detected *** swish-e: realloc(): invalid next size: 0x0859c988 ***
======= Backtrace: =========
/lib/i686/nosegneg/libc.so.6[0xb7afea]
/lib/i686/nosegneg/libc.so.6(realloc+0x105)[0xb7be75]
/usr/local/lib/libswish-e.so.2(erealloc+0x2c)[0x4002c9cc]
swish-e[0x8068521]
swish-e[0x80694cd]
/usr/lib/libxml2.so.2(xmlParseCharData+0x1ad)[0x599c60d]
/usr/lib/libxml2.so.2(xmlParseChunk+0x9a0)[0x599edf0]
swish-e[0x806b894]
swish-e(parse_XML+0x43)[0x806ba83]
swish-e[0x805b237]
swish-e[0x8053c6c]
swish-e[0x805ee9a]
swish-e[0x804cec2]
/lib/i686/nosegneg/libc.so.6(__libc_start_main+0xdc)[0xb26dec]
swish-e[0x804b611]
======= Memory map: ========
00aef000-00b08000 r-xp 00000000 09:01 9633808    /lib/ld-2.5.so
00b08000-00b09000 r-xp 00019000 09:01 9633808    /lib/ld-2.5.so
00b09000-00b0a000 rwxp 0001a000 09:01 9633808    /lib/ld-2.5.so
00b11000-00c4e000 r-xp 00000000 09:01 9633802    /lib/i686/nosegneg/libc-2.5.so
00c4e000-00c50000 r-xp 0013d000 09:01 9633802    /lib/i686/nosegneg/libc-2.5.so
00c50000-00c51000 rwxp 0013f000 09:01 9633802    /lib/i686/nosegneg/libc-2.5.so
00c51000-00c54000 rwxp 00c51000 00:00 0
00c7c000-00c7e000 r-xp 00000000 09:01 9633816    /lib/libdl-2.5.so
00c7e000-00c7f000 r-xp 00001000 09:01 9633816    /lib/libdl-2.5.so
00c7f000-00c80000 rwxp 00002000 09:01 9633816    /lib/libdl-2.5.so
00c9b000-00cad000 r-xp 00000000 09:01 6547429    /usr/lib/libz.so.1.2.3
00cad000-00cae000 rwxp 00011000 09:01 6547429    /usr/lib/libz.so.1.2.3
00cfc000-00d21000 r-xp 00000000 09:01 9633842    /lib/i686/nosegneg/libm-2.5.so
00d21000-00d22000 r-xp 00024000 09:01 9633842    /lib/i686/nosegneg/libm-2.5.so
00d22000-00d23000 rwxp 00025000 09:01 9633842    /lib/i686/nosegneg/libm-2.5.so
00dad000-00db8000 r-xp 00000000 09:01 9633843    /lib/libgcc_s-4.1.2-20070626.so.1
00db8000-00db9000 rwxp 0000a000 09:01 9633843    /lib/libgcc_s-4.1.2-20070626.so.1
05968000-05a94000 r-xp 00000000 09:01 6539412    /usr/lib/libxml2.so.2.6.26
05a94000-05a99000 rwxp 0012b000 09:01 6539412    /usr/lib/libxml2.so.2.6.26
05a99000-05a9a000 rwxp 05a99000 00:00 0
08048000-0808b000 r-xp 00000000 09:01 6522803    /usr/local/bin/swish-e
0808b000-0808e000 rwxp 00042000 09:01 6522803    /usr/local/bin/swish-e
0808e000-080cf000 rwxp 0808e000 00:00 0
0857d000-085c0000 rwxp 0857d000 00:00 0
40000000-40001000 r-xp 40000000 00:00 0          [vdso]
40001000-40005000 rwxp 40001000 00:00 0
40011000-40054000 r-xp 00000000 09:01 6520839    /usr/local/lib/libswish-e.so.2.0.0
40054000-40063000 rwxp 00043000 09:01 6520839    /usr/local/lib/libswish-e.so.2.0.0
40063000-40065000 rw-p 40063000 00:00 0
40065000-40265000 r--p 00000000 09:01 6527227    /usr/lib/locale/locale-archive
40265000-405a1000 rw-p 40265000 00:00 0
40600000-40621000 rw-p 40600000 00:00 0
40621000-40700000 ---p 40621000 00:00 0
bfbbd000-bfbd3000 rw-p bfbbd000 00:00 0          [stack]
Broken pipe
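
The cause of this crash is not identified in this thread. glibc's "realloc(): invalid next size" message generally points to heap corruption, such as a write past the end of an allocated block. One way a type-width change can produce that, offered purely as an assumption about what to look for, is a size calculation that hard-codes the old element width. The sketch below shows an allocation that follows the element type instead; resize_ids() is a hypothetical helper, not a swish-e function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resize an array of 64-bit values to hold new_count elements.
   Returns the new pointer, or NULL (leaving arr untouched) on failure. */
int64_t *resize_ids(int64_t *arr, size_t new_count)
{
    /* Reject element counts whose byte size would overflow size_t. */
    if (new_count > SIZE_MAX / sizeof *arr) {
        return NULL;
    }
    /* sizeof *arr tracks the element type; a literal 4 would
       under-allocate once the elements become 8 bytes wide. */
    return realloc(arr, new_count * sizeof *arr);
}
```

Searching the rewritten sources for literal sizes and for sizeof(int) or sizeof(long) expressions that the regexes did or did not touch would be one place to start.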
               

At 12:48 PM -0500 12/14/07, Josh Rabinowitz wrote:
Hello, All:

So, I was talking to a swish-e developer about how nice it would be
if swish-e indexes were portable across OS's and architectures, and
he mentioned how one of the remaining barriers was that an 'int' is a
different size on different machines.
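
To illustrate the size problem: a value written to disk as a native integer occupies however many bytes that type has on the writing machine. A minimal sketch of the alternative, assuming values are stored as exactly 8 bytes in a fixed byte order, is below. The function names are hypothetical and this is not swish-e's actual index format. Byte order is a separate portability concern that fixed-width types alone do not settle, so the sketch fixes it as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as exactly 8 bytes, least significant byte first,
   regardless of the host's integer sizes or byte order. */
void put_u64_le(unsigned char out[8], uint64_t v)
{
    for (size_t i = 0; i < 8; i++) {
        out[i] = (unsigned char)(v >> (8 * i));
    }
}

/* Read back 8 bytes written by put_u64_le(). */
uint64_t get_u64_le(const unsigned char in[8])
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}
```

A file written this way reads back the same on any platform, because neither the width nor the byte order depends on the compiler's idea of int or long.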

So I got to thinking, and wrote a script that tries to make almost
all swish-e integer types of the same size, regardless of the
platform. It's pasted below.

What I found was that the resulting swish-e worked on the 64-bit
system I tried, but not on the 32-bit system (with the caveats in the
script). It would be great to get the indexes fully portable, though!

Again, the script I used is below; hopefully my email program won't
mangle it (let me know if you need a copy directly). I'm very
interested to hear feedback from other users and developers!

  Josh Rabinowitz
  Author of "How To Index Anything" and "Indexing Arbitrary Data Using
Perl and Swish-e"
######### begin script swish-e-src-rewrite.pl #############



#!/usr/bin/perl -w
use strict;
use Getopt::Long;

# call main()
main();

# main()
sub main {
    # #include <stdint.h> has the uint... and int... typedefs used below.
    # Note that we don't make exceptions for main() or waitpid(),
    # but need to.
    my @regexes = (
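        q(s/ \bunsigned\s+long\s+long\s+int\b     /uint64_t/gx),
        q(s/ \blong\s+long\s+unsigned\s+int\b     /uint64_t/gx),
        # without this, 'unsigned long long' becomes 'uint64_t int64_t'
        q(s/ \bunsigned\s+long\s+long\b           /uint64_t/gx),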
        q{s/ \bunsigned\s+long\s+int\b /uint64_t/gx},
        q{s/ \bunsigned\s+long\b       /uint64_t/gx},
        q(s/ \bunsigned\s+int\b        /uint64_t/gx),
        q(s/ \blong\s+long\s+int\b     /int64_t/gx),
        q(s/ \blong\s+long\b           /int64_t/gx),
        q(s/ \blong\s+int\b            /int64_t/gx),
        q(s/ \blong\b                  /int64_t/gx),
        q(s/ \bint\b                   /int64_t/gx),
    ); 
    my @files = glob( "src/*.c src/*.h src/*/*.c src/*/*.h src/*/*/*.h src/*/*/*.c");
    for my $file (@files) {
        print "$file\n";
        _apply_regexes( $file, @regexes );
    }  
}

#================================================================
# _apply_regexes( $file, @search_and_replace_regexes )
# applies the supplied regexes to each line of $file,
# keeping the original as $file.bak
sub _apply_regexes {
    my ($file, @regexes) = @_;
    # changes a file by applying the supplied regexes to each line
    my $tmpfile = "$file.tmp";
    open(my $rfh, "<", $file)    || die "$0: Can't open $file: $!";
    open(my $wfh, ">", $tmpfile) || die "$0: Can't open $tmpfile: $!";
        # clobber old $file.tmp
    print "Applying regexes to file $file\n"; # . join("\n", @regexes) . "\n";
    while(<$rfh>) {
        chomp();
        for my $r (@regexes) {
            # $r should operate on $_ !
            eval $r; 
            die "$0: Error in regex: $r: $@" if $@;
        }  
        print $wfh "$_\n";
    }  
    close($rfh) || die "$0: Can't close $file: $!";
    close($wfh) || die "$0: Can't close $tmpfile: $!";
    rename( $file, "$file.bak" ) || die "$0: Can't rename $file to $file.bak: $!";
    rename( $tmpfile, $file ) || die "$0: Can't rename $tmpfile to $file: $!";
}


-- 
----------------------------------------------------------------------
-- Josh Rabinowitz                           joshr-swishe@joshr.com --
-- SkateboardDirectory.com(tm)      http://SkateboardDirectory.com/ --
-- SkateTalk Chat Systems(tm)             http://www.skatetalk.com/ --
Received on Fri Dec 14 14:27:45 2007