Re: [swish-e] Change the indexed 'title'

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Oct 25 2007 - 15:38:02 GMT

any chance you compiled swish-e *without* libxml2 support?

try reindexing with:

 swish-e -v 9 -W 3 -c index.cfg

and see which parser is being used. I see HTML2 by default.
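
A side note on why flavor is populated either way: as far as I can tell,
ExtractPath works on the file path rather than on the parsed HTML, so seeing
flavor in the dump says nothing about whether <strong> or <a> were picked up.
A rough Python sketch of what the regex in the quoted index.cfg does to each
path. Python's re module only approximates swish-e's regex handling, and
extract_flavor is a made-up helper name, not part of swish-e:

```python
import re

# Same pattern as the ExtractPath line in the quoted index.cfg,
# with swish-e's $1 written as Python's \1.
PATTERN = re.compile(r"test/doc-(normal|strong|href)/.*$")

def extract_flavor(path):
    """Return the captured directory suffix, or None if the path does not match."""
    if PATTERN.search(path) is None:
        return None
    return PATTERN.sub(r"\1", path)

paths = [
    "test/doc-href/docswith-ahref.html",
    "test/doc-normal/docsthatarenormal.html",
    "test/doc-strong/docswith-strong.html",
]
for p in paths:
    print(p, "->", extract_flavor(p))
```

The three results line up with the flavor values in the file dumps below.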

On 10/25/2007 10:28 AM, josh@relativelysane.com wrote:
>> On 10/25/2007 10:07 AM, josh@relativelysane.com wrote:
>>
>>> The weird thing is that it's grabbing and populating flavor, and I know that's
>>> from the PropertyNames string because when I remove it from there, flavor isn't
>>> in the dump like the one above.
>>
>> can you copy/paste what your config and example docs look like so we can try
>> and duplicate what you are seeing?
>>
>> -- 
>> Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>>
> 
> Sure, they are literally identical to what you used in your example (with the exception of the IndexDir field in my config). Full dumps of the files, the indexing status, the -T INDEX_ALL, and the search query are below. 
> 
> [josh@josh]# cat index.cfg
> IndexDir test
> ExtractPath flavor regex !test/doc-(normal|strong|href)/.*$!$1!
> PropertyNames flavor strong a
> 
> [josh@josh test]# ls
> doc-href  doc-normal  doc-strong
> 
> [josh@josh doc-href]# cat docswith-ahref.html
> <html>
> <head><title>real title</title></head>
> <body><a href="bar">title i want</a></body>
> </html>
> 
> [josh@josh doc-normal]# cat docsthatarenormal.html
> <html>
> <head><title>real title is the title i want</title></head>
> <body><a href="bar">link text</a><strong>strong text</strong> blah </body>
> </html>
> 
> [josh@josh doc-strong]# cat docswith-strong.html
> <html>
> <head><title>real title</title></head>
> <body><strong>title I want</strong></body>
> </html>
> 
> 
> [josh@josh]# swish-e -c index.cfg
> Indexing Data Source: "File-System"
> Indexing "test"
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> Sorting 12 words alphabetically
> Writing header ...
> Writing index entries ...
>   Writing word text: Complete
>   Writing word hash: Complete
>   Writing word data: Complete
> 12 unique words indexed.
> 7 properties sorted.
> 3 files indexed.  344 total bytes.  25 total words.
> Elapsed time: 00:00:00 CPU time: 00:00:00
> Indexing done!
> 
> 
> [josh@josh]# swish-e -T INDEX_ALL
> # Name:
> # Saved as: index.swish-e
> # Total Words: 12
> # Total Files: 3
> # Removed Files: 0
> # Total Word Pos: 25
> # Removed Word Pos: 0
> # Indexed on: 2007-10-25 11:24:19 EDT
> # Description:
> # Pointer:
> # Maintained by:
> # MinWordLimit: 1
> # MaxWordLimit: 40
> # WordCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
> # BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
> # EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
> # IgnoreFirstChar:
> # IgnoreLastChar:
> # StopWords:
> # BuzzWords:
> # Stemming Applied: 0
> # Soundex Applied: 0
> # Fuzzy Mode: None
> # IgnoreTotalWordCountWhenRanking: 1
> 
> 
> -----> METANAMES for index.swish-e <-----
>         swishdefault : id= 1 type= 1  META_INDEX  Rank Bias=  0
>        swishreccount : id= 2 type=42  META_INTERNAL META_PROP:NUMBER
>            swishrank : id= 3 type=42  META_INTERNAL META_PROP:NUMBER
>         swishfilenum : id= 4 type=42  META_INTERNAL META_PROP:NUMBER
>          swishdbfile : id= 5 type=38  META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
>         swishdocpath : id= 6 type= 6  META_PROP:STRING(case:compare) SortKeyLen: 100  *presorted*
>           swishtitle : id= 7 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>         swishdocsize : id= 8 type=10  META_PROP:NUMBER *presorted*
>    swishlastmodified : id= 9 type=18  META_PROP:DATE *presorted*
>               flavor : id=10 type= 1  META_INDEX  Rank Bias=  0
>               flavor : id=11 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>               strong : id=12 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
>                    a : id=13 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
> 
> 
> -----> WORD INFO in index index.swish-e <-----
> 
> blah
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:12/9
> 
> href
>  Meta:10 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/1
> 
> i
>  Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:4/9
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:6/7
>  Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:4/49
> 
> is
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:3/7
> 
> link
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:8/9
> 
> normal
>  Meta:10 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/1
> 
> real
>  Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/7
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/7
>  Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/7
> 
> strong
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:10/49
>  Meta:10 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/1
> 
> text
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:9/9,11/49
> 
> the
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:4/7
> 
> title
>  Meta:1 test/doc-href/docswith-ahref.html Freq:2 Pos/Struct:2/7,3/9
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:2/7,5/7
>  Meta:1 test/doc-strong/docswith-strong.html Freq:2 Pos/Struct:2/7,3/49
> 
> want
>  Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:5/9
>  Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:7/7
>  Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:5/49
> 
> 
> -----> FILES in index index.swish-e <-----
> Dumping File Properties for File Number: 1
>  (No Properties)
> 
> ReadAllDocProperties:
>           swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
>             swishtitle: 7 ( 10) S: "real title"
>           swishdocsize: 8 (  4) N: "98"
>      swishlastmodified: 9 (  4) D: "2007-10-25 11:21:41 EDT"
>                 flavor:11 (  4) S: "href"
> 
> ReadSingleDocPropertiesFromDisk:
>           swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
>             swishtitle: 7 ( 10) S: "real title"
>           swishdocsize: 8 (  4) N: "98"
>      swishlastmodified: 9 (  4) D: "2007-10-25 11:21:41 EDT"
>                 flavor:11 (  4) S: "href"
> 
> Dumping File Properties for File Number: 2
>  (No Properties)
> 
> ReadAllDocProperties:
>           swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
>             swishtitle: 7 ( 30) S: "real title is the title i want"
>           swishdocsize: 8 (  4) N: "149"
>      swishlastmodified: 9 (  4) D: "2007-10-25 11:22:43 EDT"
>                 flavor:11 (  6) S: "normal"
> 
> ReadSingleDocPropertiesFromDisk:
>           swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
>             swishtitle: 7 ( 30) S: "real title is the title i want"
>           swishdocsize: 8 (  4) N: "149"
>      swishlastmodified: 9 (  4) D: "2007-10-25 11:22:43 EDT"
>                 flavor:11 (  6) S: "normal"
> 
> Dumping File Properties for File Number: 3
>  (No Properties)
> 
> ReadAllDocProperties:
>           swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
>             swishtitle: 7 ( 10) S: "real title"
>           swishdocsize: 8 (  4) N: "97"
>      swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
>                 flavor:11 (  6) S: "strong"
> 
> ReadSingleDocPropertiesFromDisk:
>           swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
>             swishtitle: 7 ( 10) S: "real title"
>           swishdocsize: 8 (  4) N: "97"
>      swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
>                 flavor:11 (  6) S: "strong"
> 
> 
> [josh@josh]# swish-e -w title AND flavor=strong -x '"<strong>" "<swishtitle>" "<flavor>"\n'
> # SWISH format: 2.4.5
> # Search words: title AND flavor=strong
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.000 seconds
> # Run time: 0.009 seconds
> "" "real title" "strong"
> 
> 
> 
> 
> josh
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
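
For what it's worth, the dump above already shows the mismatch: strong and a
are defined as properties (ids 12 and 13), but no file record carries them,
which fits the empty "" in the search output. A minimal Python sketch that
checks this against a trimmed excerpt of the dump. The regexes are guesses at
the -T INDEX_ALL layout shown here, not a documented format, and the excerpt
drops the "> " quoting and most lines:

```python
import re

# Trimmed excerpt of the quoted -T INDEX_ALL output (one file record only).
DUMP = """\
        swishdocpath : id= 6 type= 6  META_PROP:STRING(case:compare) SortKeyLen: 100  *presorted*
          swishtitle : id= 7 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
              flavor : id=10 type= 1  META_INDEX  Rank Bias=  0
              flavor : id=11 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
              strong : id=12 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
                   a : id=13 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
ReadAllDocProperties:
          swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
            swishtitle: 7 ( 10) S: "real title"
                flavor:11 (  6) S: "strong"
"""

# "name : id=N type=N  META_PROP..." lines from the METANAMES section.
DEFINED = re.compile(r"^\s*(\S+)\s+: id=\s*\d+ type=\s*\d+\s+META_PROP")
# 'name: N ( len) S: "value"' lines from the per-file property dumps.
STORED = re.compile(r'^\s*(\S+):\s*\d+ \(\s*\d+\) [SND]: "')

defined, stored = set(), set()
for line in DUMP.splitlines():
    m = DEFINED.match(line)
    if m:
        defined.add(m.group(1))
    m = STORED.match(line)
    if m:
        stored.add(m.group(1))

never_populated = sorted(defined - stored)
print("defined but never stored:", never_populated)
```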

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

Received on Thu Oct 25 11:38:03 2007