Antonio Barrera wrote on 10/1/04 10:18 AM:
> Bill,
>
> Here are the search results using different MetaNames treatments.
>
> Using specified MetaTags:
> - MetaNames maintitle alttitle brief_description long_description keywords
>
> [antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "photoservices" -p maintitle link description
> # SWISH format: 2.4.2
> # Search words: photoservices
> # Removed stopwords:
> err: no results
> .
Hmm. It works for me with the latest CVS version (2.5.2):
karpet@cartermac 6% swish-e -i xml -c c -T indexed_words
Indexing Data Source: "File-System"
Indexing "xml"
Adding:[1:swishdefault(1)] 'http' Pos:7 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'libweb' Pos:8 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'princeton' Pos:9 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'edu' Pos:10 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'departments' Pos:11 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'fiscal' Pos:12 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'photoservices' Pos:13 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'php' Pos:14 Stuct:0x9 ( BODY FILE )
Adding:[1:alttitle(11)] 'copying' Pos:18 Stuct:0x8B ( META BODY TITLE FILE )
Adding:[1:alttitle(11)] 'services' Pos:19 Stuct:0x8B ( META BODY TITLE FILE )
Adding:[1:maintitle(10)] 'photoservices' Pos:22 Stuct:0x8B ( META BODY TITLE FILE )
Adding:[1:keywords(14)] 'copy' Pos:31 Stuct:0x89 ( META BODY FILE )
Adding:[1:keywords(14)] 'photocopying' Pos:32 Stuct:0x89 ( META BODY FILE )
Adding:[1:keywords(14)] 'photoduplication' Pos:33 Stuct:0x89 ( META BODY FILE )
Adding:[1:keywords(14)] 'photocopiers' Pos:34 Stuct:0x89 ( META BODY FILE )
Adding:[1:keywords(14)] 'reproduction' Pos:35 Stuct:0x89 ( META BODY FILE )
Adding:[1:keywords(14)] 'xerox' Pos:36 Stuct:0x89 ( META BODY FILE )
Adding:[1:keywords(14)] 'copiers' Pos:37 Stuct:0x89 ( META BODY FILE )
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 17 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
17 unique words indexed.
4 properties sorted.
1 file indexed. 408 total bytes. 18 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
karpet@cartermac 7% cat xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<record id='162'>
<link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link>
<title>
<alttitle>Copying services</alttitle>
<maintitle>Photoservices</maintitle>
</title>
<description></description>
<longdescription></longdescription>
<keywords>copy, photocopying, photoduplication, photocopiers, reproduction,
xerox, copiers</keywords>
</record>
karpet@cartermac 8% swish-e -w photoservices
# SWISH format: 2.5.2
# Search words: photoservices
# Removed stopwords:
# Number of hits: 1
# Search time: 0.006 seconds
# Run time: 0.037 seconds
1000 xml "Copying services Photoservices" 408
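
If the -T indexed_words output gets long, a small script can group the indexed words by metaname so it is easier to spot what is missing. This is only a rough sketch: it assumes the "Adding:[file:metaname(id)] 'word' Pos:n" line shape shown above, and the function names (words_by_metaname, metanames_for) are made up for illustration, not part of swish-e.

```python
import re
from collections import defaultdict

# Matches debug lines of the shape quoted in this thread, e.g.
#   Adding:[1:maintitle(10)] 'photoservices' Pos:22 Stuct:0x8B ( META BODY TITLE FILE )
# Assumption: other swish-e versions may format these lines differently.
ADDING_RE = re.compile(r"Adding:\[(\d+):(\w+)\((\d+)\)\]\s+'([^']*)'\s+Pos:(\d+)")


def words_by_metaname(debug_output):
    """Return {metaname: [words in the order they were indexed]}."""
    grouped = defaultdict(list)
    for match in ADDING_RE.finditer(debug_output):
        _filenum, metaname, _meta_id, word, _pos = match.groups()
        grouped[metaname].append(word)
    return dict(grouped)


def metanames_for(debug_output, word):
    """Return the sorted metanames under which `word` was indexed."""
    grouped = words_by_metaname(debug_output)
    return sorted(name for name, words in grouped.items() if word in words)


# A few lines copied from the output above (mail line wrapping left in).
SAMPLE = """\
Adding:[1:swishdefault(1)] 'photoservices' Pos:13 Stuct:0x9 (
BODY FILE )
Adding:[1:alttitle(11)] 'copying' Pos:18 Stuct:0x8B ( META
BODY TITLE FILE )
Adding:[1:maintitle(10)] 'photoservices' Pos:22 Stuct:0x8B (
META BODY TITLE FILE )
Adding:[1:keywords(14)] 'copy' Pos:31 Stuct:0x89 ( META BODY
FILE )
"""

if __name__ == "__main__":
    for name, words in words_by_metaname(SAMPLE).items():
        print(name, words)
    print(metanames_for(SAMPLE, "photoservices"))
```

Feed it the saved output of the indexing run (for example, the output of swish-e -i xml -c c -T indexed_words redirected to a file). A metaname you listed in MetaNames that never shows up in the result is a likely candidate for the one that is not being indexed.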
>
> Using unspecified MetaTags:
> UndefinedMetaTags index
>
> # SWISH format: 2.4.2
> # Search words: photoservices
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.000 seconds
> # Run time: 0.025 seconds
> 1000 /home/antonio/az/143.xml "143.xml" 408 "Photoservices" "http://libweb.princeton.edu/departments/fiscal/photoservices.php" ""
>
>
>
> Antonio
>
> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of Bill Moseley
> Sent: Friday, October 01, 2004 9:57 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: MetaNames - XML
>
> On Fri, Oct 01, 2004 at 06:40:30AM -0700, Antonio Barrera wrote:
>
>>The problem occurs with the MetaNames: some of them are not being indexed.
>
>
> I guess I'm not following what's not working. Can you index using -T
> indexed_words and point out what's missing?
>
> I'm not that happy with how indexing XML works -- for example, if you tell
> swish to ignore a tag, it ignores everything inside that tag even if you
> specify a metaname or property. Plus, you should be able to ignore metatags
> and properties separately.
>
>
> --
> Bill Moseley
> moseley@hank.org
>
--
Peter Karman - http://www.cray.com/craydoc/ - karman(at)not-real.cray.com
Received on Fri Oct 1 10:09:38 2004