Re: [swish-e] Change the indexed 'title'

From: <josh(at)not-real.relativelysane.com>
Date: Thu Oct 25 2007 - 16:46:02 GMT
Ok, I traced the problem: my install of libxml2 was hosed. I re-compiled and installed that, then re-compiled and installed swish-e, and IT WORKS!

I also found where in swish.cgi I need to edit to get it to pull what I want. Now I just have to figure out how to do an 'if - then - else' in Perl ;-)  (for example: if 'flavor' = 'whatever' use 'strong', elseif 'flavor' = 'thisotherthing' use 'a', else use 'normal')

If you have any pointers I would love to hear them ;-)
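
For reference, a minimal sketch of that kind of branch in Perl. The helper name title_property_for is hypothetical and is not part of swish.cgi, and the mapping (flavor 'strong' uses the 'strong' property, 'href' uses 'a', anything else falls back to 'swishtitle') is only an assumption based on the test config quoted below. Note that Perl compares strings with 'eq' rather than '=', and spells the middle branch 'elsif'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swish.cgi): given a document's
# 'flavor' value, return the name of the property to show as the title.
sub title_property_for {
    my ($flavor) = @_;
    $flavor = '' unless defined $flavor;   # treat a missing flavor as "normal"

    if ( $flavor eq 'strong' ) {           # 'eq' compares strings in Perl
        return 'strong';
    }
    elsif ( $flavor eq 'href' ) {          # note the spelling: elsif
        return 'a';
    }
    else {
        return 'swishtitle';               # assumed fallback: the real <title>
    }
}

# Show which property would be picked for each flavor in the test index.
print title_property_for($_), "\n" for qw(strong href normal);
```

Where exactly this would hook into swish.cgi depends on how the results are formatted there, so treat it as a starting point only.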

Again, THANK YOU for the help!!!

Much appreciated!


josh

>any chance you compiled swish-e *without* libxml2 support?
>
>try reindexing with:
>
> swish-e -v 9 -W 3 -c index.cfg
>
>and see which parser is being used. I see HTML2 by default.
>
>On 10/25/2007 10:28 AM, josh@relativelysane.com wrote:
>>> On 10/25/2007 10:07 AM, josh@relativelysane.com wrote:
>>>
>>>> The weird thing is that it's grabbing and populating flavor, and I know that's
>>>> from the PropertyNames string, because when I remove it from there, flavor isn't
>>>> in the dump like the one above.
>>> can you copy/paste what your config and example docs look like so we can try
>>> and duplicate what you are seeing?
>>>
>>> -- 
>>> Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>>>
>> 
>> Sure, they are literally identical to what you used in your example (with the
>> exception of the IndexDir field in my config). Full dumps of the files, the
>> indexing status, the -T INDEX_ALL, and the search query are below.
>> 
>> [josh@josh]# cat index.cfg
>> IndexDir test
>> ExtractPath flavor regex !test/doc-(normal|strong|href)/.*$!$1!
>> PropertyNames flavor strong a
>> 
>> [josh@josh test]# ls
>> doc-href  doc-normal  doc-strong
>> 
>> [josh@josh doc-href]# cat docswith-ahref.html
>> <html>
>> <head><title>real title</title></head>
>> <body><a href="bar">title i want</a></body>
>> </html>
>> 
>> [josh@josh doc-normal]# cat docsthatarenormal.html
>> <html>
>> <head><title>real title is the title i want</title></head>
>> <body><a href="bar">link text</a><strong>strong text</strong> blah </body>
>> </html>
>> 
>> [josh@josh doc-strong]# cat docswith-strong.html
>> <html>
>> <head><title>real title</title></head>
>> <body><strong>title I want</strong></body>
>> </html>
>> 
>> 
>> [josh@josh]# swish-e -c index.cfg
>> Indexing Data Source: "File-System"
>> Indexing "test"
>> Removing very common words...
>> no words removed.
>> Writing main index...
>> Sorting words ...
>> Sorting 12 words alphabetically
>> Writing header ...
>> Writing index entries ...
>>   Writing word text: Complete
>>   Writing word hash: Complete
>>   Writing word data: Complete
>> 12 unique words indexed.
>> 7 properties sorted.
>> 3 files indexed.  344 total bytes.  25 total words.
>> Elapsed time: 00:00:00 CPU time: 00:00:00
>> Indexing done!
>> 
>> 
>> [josh@josh]# swish-e -T INDEX_ALL
>> # Name:
>> # Saved as: index.swish-e
>> # Total Words: 12
>> # Total Files: 3
>> # Removed Files: 0
>> # Total Word Pos: 25
>> # Removed Word Pos: 0
>> # Indexed on: 2007-10-25 11:24:19 EDT
>> # Description:
>> # Pointer:
>> # Maintained by:
>> # MinWordLimit: 1
>> # MaxWordLimit: 40
>> # WordCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> # BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> # EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> # IgnoreFirstChar:
>> # IgnoreLastChar:
>> # StopWords:
>> # BuzzWords:
>> # Stemming Applied: 0
>> # Soundex Applied: 0
>> # Fuzzy Mode: None
>> # IgnoreTotalWordCountWhenRanking: 1
>> 
>> 
>> -----> METANAMES for index.swish-e <-----
>>         swishdefault : id= 1 type= 1  META_INDEX  Rank Bias=  0
>>        swishreccount : id= 2 type=42  META_INTERNAL META_PROP:NUMBER
>>            swishrank : id= 3 type=42  META_INTERNAL META_PROP:NUMBER
>>         swishfilenum : id= 4 type=42  META_INTERNAL META_PROP:NUMBER
>>          swishdbfile : id= 5 type=38  META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
>>         swishdocpath : id= 6 type= 6  META_PROP:STRING(case:compare) SortKeyLen: 100  *presorted*
>>           swishtitle : id= 7 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>>         swishdocsize : id= 8 type=10  META_PROP:NUMBER *presorted*
>>    swishlastmodified : id= 9 type=18  META_PROP:DATE *presorted*
>>               flavor : id=10 type= 1  META_INDEX  Rank Bias=  0
>>               flavor : id=11 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>>               strong : id=12 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>>                    a : id=13 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>> 
>> 
>> -----> WORD INFO in index index.swish-e <-----
>> 
>> blah
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:12/9
>> 
>> href
>>  Meta:10 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/1
>> 
>> i
>>  Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:4/9
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:6/7
>>  Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:4/49
>> 
>> is
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:3/7
>> 
>> link
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:8/9
>> 
>> normal
>>  Meta:10 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/1
>> 
>> real
>>  Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/7
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/7
>>  Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/7
>> 
>> strong
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:10/49
>>  Meta:10 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/1
>> 
>> text
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:9/9,11/49
>> 
>> the
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:4/7
>> 
>> title
>>  Meta:1 test/doc-href/docswith-ahref.html Freq:2 Pos/Struct:2/7,3/9
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:2/7,5/7
>>  Meta:1 test/doc-strong/docswith-strong.html Freq:2 Pos/Struct:2/7,3/49
>> 
>> want
>>  Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:5/9
>>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:7/7
>>  Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:5/49
>> 
>> 
>> -----> FILES in index index.swish-e <-----
>> Dumping File Properties for File Number: 1
>>  (No Properties)
>> 
>> ReadAllDocProperties:
>>           swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
>>             swishtitle: 7 ( 10) S: "real title"
>>           swishdocsize: 8 (  4) N: "98"
>>      swishlastmodified: 9 (  4) D: "2007-10-25 11:21:41 EDT"
>>                 flavor:11 (  4) S: "href"
>> 
>> ReadSingleDocPropertiesFromDisk:
>>           swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
>>             swishtitle: 7 ( 10) S: "real title"
>>           swishdocsize: 8 (  4) N: "98"
>>      swishlastmodified: 9 (  4) D: "2007-10-25 11:21:41 EDT"
>>                 flavor:11 (  4) S: "href"
>> 
>> Dumping File Properties for File Number: 2
>>  (No Properties)
>> 
>> ReadAllDocProperties:
>>           swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
>>             swishtitle: 7 ( 30) S: "real title is the title i want"
>>           swishdocsize: 8 (  4) N: "149"
>>      swishlastmodified: 9 (  4) D: "2007-10-25 11:22:43 EDT"
>>                 flavor:11 (  6) S: "normal"
>> 
>> ReadSingleDocPropertiesFromDisk:
>>           swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
>>             swishtitle: 7 ( 30) S: "real title is the title i want"
>>           swishdocsize: 8 (  4) N: "149"
>>      swishlastmodified: 9 (  4) D: "2007-10-25 11:22:43 EDT"
>>                 flavor:11 (  6) S: "normal"
>> 
>> Dumping File Properties for File Number: 3
>>  (No Properties)
>> 
>> ReadAllDocProperties:
>>           swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
>>             swishtitle: 7 ( 10) S: "real title"
>>           swishdocsize: 8 (  4) N: "97"
>>      swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
>>                 flavor:11 (  6) S: "strong"
>> 
>> ReadSingleDocPropertiesFromDisk:
>>           swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
>>             swishtitle: 7 ( 10) S: "real title"
>>           swishdocsize: 8 (  4) N: "97"
>>      swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
>>                 flavor:11 (  6) S: "strong"
>> 
>> 
>> [josh@josh]# swish-e -w title AND flavor=strong -x '"<strong>" "<swishtitle>" "<flavor>"\n'
>> # SWISH format: 2.4.5
>> # Search words: title AND flavor=strong
>> # Removed stopwords:
>> # Number of hits: 1
>> # Search time: 0.000 seconds
>> # Run time: 0.009 seconds
>> "" "real title" "strong"
>> 
>> 
>> 
>> 
>> josh
>> _______________________________________________
>> Users mailing list
>> Users@lists.swish-e.org
>> http://lists.swish-e.org/listinfo/users
>
>-- 
>Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>
Received on Thu Oct 25 12:46:05 2007