Re: MetaNames - XML

From: Peter Karman <karman(at)not-real.cray.com>
Date: Fri Oct 01 2004 - 19:00:37 GMT
No, I made a typo. The directive is:

MetaNameAlias

NOT

MetaNamesAlias

See the docs:

http://swish-e.org/current/docs/SWISH-CONFIG.html#item_MetaNameAlias
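
If it helps, here is a rough shell sketch for renaming the misspelled directive in an existing config rather than editing each line by hand. The function name fix_metanamealias is made up for illustration; az.config is the file name from the message quoted below. It only corrects the spelling at the start of a line and keeps a backup copy. It does not check that the rest of the config is valid, so the docs linked above are still the place to confirm what MetaNameAlias accepts.

```shell
#!/bin/sh
# fix_metanamealias FILE   (hypothetical helper, not part of swish-e)
# Rewrites the misspelled directive "MetaNamesAlias" to "MetaNameAlias"
# where it starts a line, keeping the original in FILE.bak.
fix_metanamealias() {
    cfg=$1
    cp "$cfg" "$cfg.bak" || return 1
    # Match only at the start of a line and only when followed by
    # whitespace, so commented-out lines and other text are left alone.
    sed 's/^MetaNamesAlias\([[:space:]]\)/MetaNameAlias\1/' "$cfg.bak" > "$cfg"
}

# Usage, with the file name from this thread:
#   fix_metanamealias az.config
#   swish-e -c az.config
```

After running it, re-index and see whether the "Bad directive" messages for those lines go away.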

Antonio Barrera wrote on 10/1/04 1:58 PM:

> Probably made a newbie error, but this is the result of adding the
> MetaNamesAlias
> MetaNames maintitle alttitle description longdescription keywords
> MetaNamesAlias swishdefault keywords
> MetaNamesAlias swishdefault description
> MetaNamesAlias swishdefault longdescription
> #UndefinedMetaTags index
> PropertyNames maintitle title description longdescription link
> ConvertHTMLEntities yes
> 
> "az.config" [converted] 141L, 5332C written
> [antonio@libserv4 antonio]$ !sw
> swish-e -c az.config
> Bad directive on line #34 of file az.config: MetaNamesAlias swishdefault keywords
> Bad directive on line #35 of file az.config: MetaNamesAlias swishdefault description
> Bad directive on line #36 of file az.config: MetaNamesAlias swishdefault longdescription
> 
> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of Peter Karman
> Sent: Friday, October 01, 2004 2:40 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: MetaNames - XML
> 
> you probably want to alias some of your metanames into swishdefault. 
> Otherwise you have to specify the metaname to search in. try this in your
> config:
> 
> MetaNamesAlias swishdefault  keywords
> 
> that way content in <keywords> should get indexed under both metanames.
> 
> 
> 
> Antonio Barrera wrote on 10/1/04 1:34 PM:
> 
> 
>>I'm using 2.4.2, here's my full test.
>>
>>[antonio@libserv4 antonio]$ swish-e -c az.config -T indexed_words 
>>Indexing Data Source: "File-System"
>>Indexing "/home/antonio/azs"
>>
>>Checking dir "/home/antonio/azs"...
>>  143.xml - Using XML2 parser -     Adding:[1:swishdefault(1)]   'http'   Pos:3  Stuct:0x1 ( FILE )
>>    Adding:[1:swishdefault(1)]   'libweb.princeton.edu'   Pos:4  Stuct:0x1 ( FILE )
>>    Adding:[1:swishdefault(1)]   'depart'   Pos:5  Stuct:0x1 ( FILE )
>>    Adding:[1:swishdefault(1)]   'fiscal'   Pos:6  Stuct:0x1 ( FILE )
>>    Adding:[1:swishdefault(1)]   'photoservices.php'   Pos:7  Stuct:0x1 ( FILE )
>>    Adding:[1:alttitle(11)]   'copi'   Pos:11  Stuct:0x1 ( FILE )
>>    Adding:[1:alttitle(11)]   'servic'   Pos:12  Stuct:0x1 ( FILE )
>>    Adding:[1:maintitle(10)]   'photoservic'   Pos:15  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'copi'   Pos:23  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'photocopi'   Pos:24  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'photodupl'   Pos:25  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'photocopi'   Pos:26  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'reproduct'   Pos:27  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'xerox'   Pos:28  Stuct:0x1 ( FILE )
>>    Adding:[1:keywords(14)]   'copier'   Pos:29  Stuct:0x1 ( FILE )
>> (15 words)
>>
>>Removing very common words...
>>no words removed.
>>Writing main index...
>>Sorting words ...
>>Sorting 13 words alphabetically
>>Writing header ...
>>Writing index entries ...
>>  Writing word text: Complete
>>  Writing word hash: Complete
>>  Writing word data: Complete
>>13 unique words indexed.
>>10 properties sorted.                                              
>>1 file indexed.  408 total bytes.  15 total words.
>>Elapsed time: 00:00:00 CPU time: 00:00:00 Indexing done!
>>[antonio@libserv4 antonio]$ cat azs/143.xml
>><?xml version="1.0" encoding="ISO-8859-1"?>
>><record id='162'>
>><link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link>
>><title>
>><alttitle>Copying services</alttitle>
>><maintitle>Photoservices</maintitle>
>></title>
>><description></description>
>><longdescription></longdescription>
>><keywords>copy, photocopying, photoduplication, photocopiers, reproduction, xerox, copiers</keywords>
>></record>
>>[antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "xerox"
>># SWISH format: 2.4.2
>># Search words: xerox
>># Removed stopwords: 
>>err: no results
>>.
>>[antonio@libserv4 antonio]$
>>
>>-----Original Message-----
>>From: swish-e@sunsite3.berkeley.edu 
>>[mailto:swish-e@sunsite3.berkeley.edu]
>>On Behalf Of Peter Karman
>>Sent: Friday, October 01, 2004 1:09 PM
>>To: Multiple recipients of list
>>Subject: [SWISH-E] Re: MetaNames - XML
>>
>>
>>
>>Antonio Barrera wrote on 10/1/04 10:18 AM:
>>
>>
>>
>>>Bill,
>>>
>>>Here are the search results using different MetaNames treatments.
>>>
>>>Using specified MetaTags:
>>>- MetaNames maintitle alttitle brief_description long_description keywords
>>>
>>>[antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "photoservices" -p maintitle link description
>>># SWISH format: 2.4.2
>>># Search words: photoservices
>>># Removed stopwords:
>>>err: no results
>>>.
>>
>>
>>hmm. works for me with the latest CVS version (2.5.2):
>>
>>karpet@cartermac 6% swish-e -i xml -c c -T indexed_words
>>Indexing Data Source: "File-System"
>>Indexing "xml"
>>     Adding:[1:swishdefault(1)]   'http'   Pos:7  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'libweb'   Pos:8  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'princeton'   Pos:9  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'edu'   Pos:10  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'departments'   Pos:11  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'fiscal'   Pos:12  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'photoservices'   Pos:13  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:swishdefault(1)]   'php'   Pos:14  Stuct:0x9 ( BODY FILE )
>>     Adding:[1:alttitle(11)]   'copying'   Pos:18  Stuct:0x8B ( META BODY TITLE FILE )
>>     Adding:[1:alttitle(11)]   'services'   Pos:19  Stuct:0x8B ( META BODY TITLE FILE )
>>     Adding:[1:maintitle(10)]   'photoservices'   Pos:22  Stuct:0x8B ( META BODY TITLE FILE )
>>     Adding:[1:keywords(14)]   'copy'   Pos:31  Stuct:0x89 ( META BODY FILE )
>>     Adding:[1:keywords(14)]   'photocopying'   Pos:32  Stuct:0x89 ( META BODY FILE )
>>     Adding:[1:keywords(14)]   'photoduplication'   Pos:33  Stuct:0x89 ( META BODY FILE )
>>     Adding:[1:keywords(14)]   'photocopiers'   Pos:34  Stuct:0x89 ( META BODY FILE )
>>     Adding:[1:keywords(14)]   'reproduction'   Pos:35  Stuct:0x89 ( META BODY FILE )
>>     Adding:[1:keywords(14)]   'xerox'   Pos:36  Stuct:0x89 ( META BODY FILE )
>>     Adding:[1:keywords(14)]   'copiers'   Pos:37  Stuct:0x89 ( META BODY FILE )
>>Removing very common words...
>>no words removed.
>>Writing main index...
>>Sorting words ...
>>Sorting 17 words alphabetically
>>Writing header ...
>>Writing index entries ...
>>   Writing word text: Complete
>>   Writing word hash: Complete
>>   Writing word data: Complete
>>17 unique words indexed.
>>4 properties sorted.
>>1 file indexed.  408 total bytes.  18 total words.
>>Elapsed time: 00:00:00 CPU time: 00:00:00 Indexing done!
>>karpet@cartermac 7% cat xml
>><?xml version="1.0" encoding="ISO-8859-1"?>
>><record id='162'>
>><link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link>
>><title>
>><alttitle>Copying services</alttitle>
>><maintitle>Photoservices</maintitle>
>></title>
>><description></description>
>><longdescription></longdescription>
>><keywords>copy, photocopying, photoduplication, photocopiers, reproduction, xerox, copiers</keywords>
>></record>
>>karpet@cartermac 8% swish-e -w photoservices
>># SWISH format: 2.5.2
>># Search words: photoservices
>># Removed stopwords:
>># Number of hits: 1
>># Search time: 0.006 seconds
>># Run time: 0.037 seconds
>>1000 xml "Copying services Photoservices" 408
>>
>>
>>
>>
>>
>>>Using unspecified MetaTags:
>>>UndefinedMetaTags index
>>>
>>># SWISH format: 2.4.2
>>># Search words: photoservices
>>># Removed stopwords: 
>>># Number of hits: 1
>>># Search time: 0.000 seconds
>>># Run time: 0.025 seconds
>>>1000 /home/antonio/az/143.xml "143.xml" 408 "Photoservices"
>>>"http://libweb.princeton.edu/departments/fiscal/photoservices.php" ""
>>>
>>>
>>>
>>>Antonio
>>>
>>>-----Original Message-----
>>>From: swish-e@sunsite3.berkeley.edu
>>>[mailto:swish-e@sunsite3.berkeley.edu]
>>>On Behalf Of Bill Moseley
>>>Sent: Friday, October 01, 2004 9:57 AM
>>>To: Multiple recipients of list
>>>Subject: [SWISH-E] Re: MetaNames - XML
>>>
>>>On Fri, Oct 01, 2004 at 06:40:30AM -0700, Antonio Barrera wrote:
>>>
>>>
>>>
>>>>Problem occurs with the MetaNames, some of them are not being indexed.
>>>
>>>
>>>I guess I'm not following what's not working.  Can you index using -T 
>>>indexed_words and point out what's missing?
>>>
>>>I'm not that happy with how indexing XML works -- for example if you 
>>>tell swish to ignore a tag it ignores everything inside that tag even 
>>>if you specify a metaname or property.  Plus, should be able to ignore 
>>>metatags and properties separately.
>>>
>>>
>>>--
>>>Bill Moseley
>>>moseley@hank.org
>>>
>>>Unsubscribe from or help with the swish-e list: 
>>>  http://swish-e.org/Discussion/
>>>
>>>Help with Swish-e:
>>>  http://swish-e.org/current/docs
>>>  swish-e@sunsite.berkeley.edu
>>>
>>
>>
>>--
>>Peter Karman  -  http://www.cray.com/craydoc/ -  karman(at)not-real.cray.com

-- 
Peter Karman  -  http://www.cray.com/craydoc/ -  karman(at)not-real.cray.com
Received on Fri Oct 1 12:00:47 2004