Re: MetaNames - XML

From: Peter Karman <karman(at)not-real.cray.com>
Date: Fri Oct 01 2004 - 17:09:12 GMT
Antonio Barrera wrote on 10/1/04 10:18 AM:

> Bill, 
> 
> Here are the search results using different MetaNames treatments.
> 
> Using specified MetaTags:
> - MetaNames maintitle alttitle brief_description long_description keywords
> 
> [antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "photoservices" -p maintitle link description
> # SWISH format: 2.4.2
> # Search words: photoservices
> # Removed stopwords: 
> err: no results
> .

Hmm, it works for me with the latest CVS version (2.5.2):

karpet@cartermac 6% swish-e -i xml -c c -T indexed_words
Indexing Data Source: "File-System"
Indexing "xml"
     Adding:[1:swishdefault(1)]   'http'   Pos:7  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'libweb'   Pos:8  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'princeton'   Pos:9  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'edu'   Pos:10  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'departments'   Pos:11  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'fiscal'   Pos:12  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'photoservices'   Pos:13  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'php'   Pos:14  Stuct:0x9 ( BODY FILE )
     Adding:[1:alttitle(11)]   'copying'   Pos:18  Stuct:0x8B ( META BODY TITLE FILE )
     Adding:[1:alttitle(11)]   'services'   Pos:19  Stuct:0x8B ( META BODY TITLE FILE )
     Adding:[1:maintitle(10)]   'photoservices'   Pos:22  Stuct:0x8B ( META BODY TITLE FILE )
     Adding:[1:keywords(14)]   'copy'   Pos:31  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'photocopying'   Pos:32  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'photoduplication'   Pos:33  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'photocopiers'   Pos:34  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'reproduction'   Pos:35  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'xerox'   Pos:36  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'copiers'   Pos:37  Stuct:0x89 ( META BODY FILE )
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 17 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
17 unique words indexed.
4 properties sorted.
1 file indexed.  408 total bytes.  18 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
karpet@cartermac 7% cat xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<record id='162'>
<link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link>
<title>
<alttitle>Copying services</alttitle>
<maintitle>Photoservices</maintitle>
</title>
<description></description>
<longdescription></longdescription>
<keywords>copy, photocopying, photoduplication, photocopiers, reproduction,
xerox, copiers</keywords>
</record>
karpet@cartermac 8% swish-e -w photoservices
# SWISH format: 2.5.2
# Search words: photoservices
# Removed stopwords:
# Number of hits: 1
# Search time: 0.006 seconds
# Run time: 0.037 seconds
1000 xml "Copying services Photoservices" 408
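
For a larger index, reading the -T indexed_words output by eye gets tedious. Here is a minimal sketch, in Python, that groups the "Adding:" lines by metaname so you can see at a glance which metanames actually received words. It assumes the debug line format is exactly what appears in the transcript above; the function name words_by_metaname and the SAMPLE text are made up for illustration and are not part of swish-e.

```python
import re
from collections import defaultdict

# Matches the start of a debug line such as:
#   Adding:[1:maintitle(10)]   'photoservices'   Pos:22  Stuct:0x8B ( ...
# Only the part up to Pos: is needed, so lines wrapped by a mail client
# after the "Stuct" field still match.
ADDING_RE = re.compile(
    r"Adding:\[(\d+):(\w+)\((\d+)\)\]\s+'([^']*)'\s+Pos:(\d+)"
)


def words_by_metaname(debug_output):
    """Return {metaname: [words...]} from -T indexed_words style output."""
    grouped = defaultdict(list)
    for line in debug_output.splitlines():
        match = ADDING_RE.search(line)
        if match:
            _file_num, metaname, _meta_id, word, _pos = match.groups()
            grouped[metaname].append(word)
    return dict(grouped)


# Hypothetical sample: a few lines copied from the transcript above,
# including the mail-client line wrapping.
SAMPLE = """\
     Adding:[1:swishdefault(1)]   'photoservices'   Pos:13  Stuct:0x9 (
BODY FILE )
     Adding:[1:alttitle(11)]   'copying'   Pos:18  Stuct:0x8B ( META
BODY TITLE FILE )
     Adding:[1:maintitle(10)]   'photoservices'   Pos:22  Stuct:0x8B (
META BODY TITLE FILE )
     Adding:[1:keywords(14)]   'copy'   Pos:31  Stuct:0x89 ( META BODY
FILE )
"""

if __name__ == "__main__":
    for name, words in words_by_metaname(SAMPLE).items():
        print(name, words)
```

To use it on real output, capture the indexing run to a file and pass the file's text to words_by_metaname. Any metaname from the config that is missing from the resulting dictionary received no words during indexing.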



> 
> Using unspecified MetaTags:
> UndefinedMetaTags index
> 
> # SWISH format: 2.4.2
> # Search words: photoservices
> # Removed stopwords: 
> # Number of hits: 1
> # Search time: 0.000 seconds
> # Run time: 0.025 seconds
> 1000 /home/antonio/az/143.xml "143.xml" 408 "Photoservices"
> "http://libweb.princeton.edu/departments/fiscal/photoservices.php" ""
>  
> 
> 
> Antonio
> 
> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of Bill Moseley
> Sent: Friday, October 01, 2004 9:57 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: MetaNames - XML
> 
> On Fri, Oct 01, 2004 at 06:40:30AM -0700, Antonio Barrera wrote:
> 
>>The problem occurs with the MetaNames: some of them are not being indexed.
> 
> 
> I guess I'm not following what's not working.  Can you index using -T
> indexed_words and point out what's missing?
> 
> I'm not that happy with how indexing XML works -- for example, if you tell
> swish to ignore a tag it ignores everything inside that tag, even if you
> specify a metaname or property.  Plus, you should be able to ignore metatags
> and properties separately.
> 
> 
> --
> Bill Moseley
> moseley@hank.org
> 

-- 
Peter Karman  -  http://www.cray.com/craydoc/ -  karman(at)not-real.cray.com
Received on Fri Oct 1 10:09:38 2004