OK, I tracked down the problem: my install of libxml2 was hosed. I recompiled and reinstalled it, then recompiled and reinstalled swish-e, and IT WORKS!
I also found where in swish.cgi I need to edit to get it to pull what I want. Now I just have to figure out how to write an if/then/else statement in Perl ;-) (for example: if 'flavor' is 'whatever', use 'strong'; else if 'flavor' is 'thisotherthing', use 'a'; otherwise use 'normal').
If you have any pointers I would love to hear them ;-)
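For reference, a minimal sketch of that kind of if/elsif/else chain in Perl. It is only an illustration: the helper name title_property_for is hypothetical, the flavor values are taken from the test config quoted below, and mapping the fallback case to swishtitle is an assumption about what "normal" should mean. How swish.cgi actually hands the properties around is not shown here. The one thing to watch is that Perl compares strings with eq, not = or ==.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swish.cgi): given the value of the
# 'flavor' property, return the name of the property to display as the title.
sub title_property_for {
    my ($flavor) = @_;
    $flavor = '' unless defined $flavor;    # treat a missing flavor as "normal"

    if ( $flavor eq 'strong' ) {            # 'eq' is string equality in Perl
        return 'strong';
    }
    elsif ( $flavor eq 'href' ) {           # note the spelling: elsif, not elseif
        return 'a';
    }
    else {
        return 'swishtitle';                # assumption: "normal" means the regular title
    }
}

# Show the mapping for the three flavors used in the test index.
for my $flavor (qw(strong href normal)) {
    printf "%s -> %s\n", $flavor, title_property_for($flavor);
}
```

The same choice could also be written as a hash lookup with a default, which may be easier to extend if more flavors turn up later.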
Again, THANK YOU for the help!!!
Much appreciated!
josh
>any chance you compiled swish-e *without* libxml2 support?
>
>try reindexing with:
>
> swish-e -v 9 -W 3 -c index.cfg
>
>and see which parser is being used. I see HTML2 by default.
>
>On 10/25/2007 10:28 AM, josh@relativelysane.com wrote:
>>> On 10/25/2007 10:07 AM, josh@relativelysane.com wrote:
>>>
>>>> The weird thing is that it's grabbing and populating flavor, and I know that's
>>>> from the PropertyNames string because when I remove it from there, flavor isn't
>>>> in the dump like the one above.
>>>
>>> Can you copy/paste what your config and example docs look like so we can try
>>> and duplicate what you are seeing?
>>>
>>> --
>>> Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
>>>
>>
>> Sure, they are literally identical to what you used in your example (with
>> the exception of the IndexDir field in my config). Full dumps of the files,
>> the indexing status, the -T INDEX_ALL, and the search query are below.
>>
>> [josh@josh]# cat index.cfg
>> IndexDir test
>> ExtractPath flavor regex !test/doc-(normal|strong|href)/.*$!$1!
>> PropertyNames flavor strong a
>>
>> [josh@josh test]# ls
>> doc-href doc-normal doc-strong
>>
>> [josh@josh doc-href]# cat docswith-ahref.html
>> <html>
>> <head><title>real title</title></head>
>> <body><a href="bar">title i want</a></body>
>> </html>
>>
>> [josh@josh doc-normal]# cat docsthatarenormal.html
>> <html>
>> <head><title>real title is the title i want</title></head>
>> <body><a href="bar">link text</a><strong>strong text</strong> blah </body>
>> </html>
>>
>> [josh@josh doc-strong]# cat docswith-strong.html
>> <html>
>> <head><title>real title</title></head>
>> <body><strong>title I want</strong></body>
>> </html>
>>
>>
>> [josh@josh]# swish-e -c index.cfg
>> Indexing Data Source: "File-System"
>> Indexing "test"
>> Removing very common words...
>> no words removed.
>> Writing main index...
>> Sorting words ...
>> Sorting 12 words alphabetically
>> Writing header ...
>> Writing index entries ...
>> Writing word text: Complete
>> Writing word hash: Complete
>> Writing word data: Complete
>> 12 unique words indexed.
>> 7 properties sorted.
>> 3 files indexed. 344 total bytes. 25 total words.
>> Elapsed time: 00:00:00 CPU time: 00:00:00
>> Indexing done!
>>
>>
>> [josh@josh]# swish-e -T INDEX_ALL
>> # Name:
>> # Saved as: index.swish-e
>> # Total Words: 12
>> # Total Files: 3
>> # Removed Files: 0
>> # Total Word Pos: 25
>> # Removed Word Pos: 0
>> # Indexed on: 2007-10-25 11:24:19 EDT
>> # Description:
>> # Pointer:
>> # Maintained by:
>> # MinWordLimit: 1
>> # MaxWordLimit: 40
>> # WordCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> # BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> # EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> # IgnoreFirstChar:
>> # IgnoreLastChar:
>> # StopWords:
>> # BuzzWords:
>> # Stemming Applied: 0
>> # Soundex Applied: 0
>> # Fuzzy Mode: None
>> # IgnoreTotalWordCountWhenRanking: 1
>>
>>
>> -----> METANAMES for index.swish-e <-----
>> swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
>> swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
>> swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
>> swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
>> swishdbfile : id= 5 type=38 META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
>> swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare) SortKeyLen: 100 *presorted*
>> swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
>> swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
>> swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
>> flavor : id=10 type= 1 META_INDEX Rank Bias= 0
>> flavor : id=11 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
>> strong : id=12 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
>> a : id=13 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
>>
>>
>> -----> WORD INFO in index index.swish-e <-----
>>
>> blah
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:12/9
>>
>> href
>> Meta:10 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/1
>>
>> i
>> Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:4/9
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:6/7
>> Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:4/49
>>
>> is
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:3/7
>>
>> link
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:8/9
>>
>> normal
>> Meta:10 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/1
>>
>> real
>> Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/7
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/7
>> Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/7
>>
>> strong
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:10/49
>> Meta:10 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/1
>>
>> text
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:9/9,11/49
>>
>> the
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:4/7
>>
>> title
>> Meta:1 test/doc-href/docswith-ahref.html Freq:2 Pos/Struct:2/7,3/9
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:2/7,5/7
>> Meta:1 test/doc-strong/docswith-strong.html Freq:2 Pos/Struct:2/7,3/49
>>
>> want
>> Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:5/9
>> Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:7/7
>> Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:5/49
>>
>>
>> -----> FILES in index index.swish-e <-----
>> Dumping File Properties for File Number: 1
>> (No Properties)
>>
>> ReadAllDocProperties:
>> swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
>> swishtitle: 7 ( 10) S: "real title"
>> swishdocsize: 8 ( 4) N: "98"
>> swishlastmodified: 9 ( 4) D: "2007-10-25 11:21:41 EDT"
>> flavor:11 ( 4) S: "href"
>>
>> ReadSingleDocPropertiesFromDisk:
>> swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
>> swishtitle: 7 ( 10) S: "real title"
>> swishdocsize: 8 ( 4) N: "98"
>> swishlastmodified: 9 ( 4) D: "2007-10-25 11:21:41 EDT"
>> flavor:11 ( 4) S: "href"
>>
>> Dumping File Properties for File Number: 2
>> (No Properties)
>>
>> ReadAllDocProperties:
>> swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
>> swishtitle: 7 ( 30) S: "real title is the title i want"
>> swishdocsize: 8 ( 4) N: "149"
>> swishlastmodified: 9 ( 4) D: "2007-10-25 11:22:43 EDT"
>> flavor:11 ( 6) S: "normal"
>>
>> ReadSingleDocPropertiesFromDisk:
>> swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
>> swishtitle: 7 ( 30) S: "real title is the title i want"
>> swishdocsize: 8 ( 4) N: "149"
>> swishlastmodified: 9 ( 4) D: "2007-10-25 11:22:43 EDT"
>> flavor:11 ( 6) S: "normal"
>>
>> Dumping File Properties for File Number: 3
>> (No Properties)
>>
>> ReadAllDocProperties:
>> swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
>> swishtitle: 7 ( 10) S: "real title"
>> swishdocsize: 8 ( 4) N: "97"
>> swishlastmodified: 9 ( 4) D: "2007-10-25 11:23:35 EDT"
>> flavor:11 ( 6) S: "strong"
>>
>> ReadSingleDocPropertiesFromDisk:
>> swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
>> swishtitle: 7 ( 10) S: "real title"
>> swishdocsize: 8 ( 4) N: "97"
>> swishlastmodified: 9 ( 4) D: "2007-10-25 11:23:35 EDT"
>> flavor:11 ( 6) S: "strong"
>>
>>
>> [josh@josh]# swish-e -w title AND flavor=strong -x '"<strong>" "<swishtitle>" "<flavor>"\n'
>> # SWISH format: 2.4.5
>> # Search words: title AND flavor=strong
>> # Removed stopwords:
>> # Number of hits: 1
>> # Search time: 0.000 seconds
>> # Run time: 0.009 seconds
>> "" "real title" "strong"
>>
>>
>>
>>
>> josh
>> _______________________________________________
>> Users mailing list
>> Users@lists.swish-e.org
>> http://lists.swish-e.org/listinfo/users
>
>--
>Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
>
Received on Thu Oct 25 12:46:05 2007