Hi-
Yes, swish is returning everything after the <body as the description.
See attached config and HTML files.
--------------
Z:\>D:\swish-e\swish-e.exe -c D:\swish-e\conf\test.config -f D:\swish-e\tmp_test.index
Indexing Data Source: "File-System"
Indexing "D:/temp/indextest"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 20 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
20 unique words indexed.
5 properties sorted.
1 file indexed. 289 total bytes. 28 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
---------------
D:\swish-e>swish-e -w This -f test.index -p swishdescription
# SWISH format: 2.1-dev-25
# Search words: This
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.030 seconds
1000 D:/temp/indextest/index2.html "index2.html" 289
"class="portal" bgcolor="#F0F0F0" text="#000000" link="#000000"
vlink="#000000" alink="#000000" leftmargin="0" rightmargin="0"
topmargin="0" bottommargin="0" marginwidth="0" marginheight="0"> This is a test..."
.
-----------------
This is on Win2000 SP2 with the swish-e-2.1-dev-25-2002-03-22-win32.exe
binary distro.
-markus
-----Original Message-----
From: swish-e@sunsite.berkeley.edu [mailto:swish-e@sunsite.berkeley.edu]
On Behalf Of Bill Moseley
Sent: Wednesday, 27 March 2002 15:43
To: Multiple recipients of list
Subject: [SWISH-E] Re: Getting a description out of the html <body>
At 04:30 AM 3/27/2002 -0800, Markus Strickler wrote:
>For example, if my html contains:
><body class="portal" bgcolor="#F0F0F0" text="#000000" link="#000000"
>vlink="#000000" alink="#000000" leftmargin="0" rightmargin="0"
>topmargin="0" bottommargin="0" marginwidth="0" marginheight="0">
>
>Swishdescription will start with:
>class="portal" bgcolor="#F0F0F0" text="#000000" link="#000000"
>vlink="#000000" alink="#000000" leftmargin="0" rightmargin="0"
>topmargin="0" bottommargin="0" marginwidth="0" marginheight="0">
Are you saying that swish is returning that for the description?
>Is this a bug? Or did I do something wrong in the config?
Yes, look at line 23 of your config file. I hope my ESP powers are
working! Post your config AND the HTML file, as you shouldn't see the
contents of the tag.
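
To make the expected behaviour concrete, here is a minimal Python sketch of a
description extractor that skips the attributes of the <body> tag and keeps
only the text that follows it. This is not swish-e's parser; the class name
BodyText, the function describe, the max_chars limit, and the sample HTML are
all hypothetical, since the original HTML file was stripped from this message.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect only the character data that appears inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # The attributes (class, bgcolor, ...) arrive in attrs and are
        # deliberately ignored: they are not part of the description.
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def describe(html, max_chars=200):
    """Return up to max_chars of whitespace-normalised body text."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return text[:max_chars]


# Hypothetical stand-in for the stripped attachment.
sample = (
    '<html><head><title>index2.html</title></head>'
    '<body class="portal" bgcolor="#F0F0F0" marginheight="0">'
    ' This is a test...</body></html>'
)
print(describe(sample))
```

With a parser like this the description would begin at "This is a test..."
rather than at class="portal", which is the result the report above expects.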
Oh, regarding <span>, you might look at XMLClassAttributes in the 2.1
docs. I thought I had added a config option to let you define what
attributes to use (instead of hard-coding "class"), and I can't imagine
(at this moment) why that couldn't be extended to HTML parsing.
Bill Moseley
mailto:moseley@hank.org
*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Wed Mar 27 15:31:09 2002