RE: StoreDescription / swishdescription field parsing wrong meta tags

From: Tref Gare <TrefG(at)not-real.areeba.com.au>
Date: Tue Dec 17 2002 - 02:09:08 GMT

Thanks Bill,

I've done some testing on the format you suggested and have worked out
that it's the jsp files that are causing the grief.

That is, if I try to get a swishdescription from two identical files,
one a .jsp and the other an .html, the html description works fine, but
the jsp uses the first meta tag it finds rather than the body tag:

          swishdocpath: 6 ( 38) S: "http://devbox:88enuehire/venueHire.jsp"
            swishtitle: 7 ( 10) S: "venue hire"
          swishdocsize: 8 (  4) N: "0000000009138"
     swishlastmodified: 9 (  4) D: "2002-11-26 11:01:15"
      swishdescription:10 (120) S: "PageID 158 - generated by RedDot 4.5 (SP3) - 4.5.3.14 - 2-K5b true venue hire venue hire acmi, venue hire acmi, venue hi"
           description:12 ( 16) S: "acmi, venue hire"
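
A minimal sketch of the comparison described above, for anyone who wants
to try it: the same markup is written once, copied to a second file with
a .jsp extension, and both are indexed with the IndexContents and
StoreDescription settings from Bill's example. The directory and file
names (sd-test, page.html, page.jsp, test.conf, test.index) and the
trimmed-down markup are assumptions made for illustration, not the real
site files, and swish-e is only run if it is found on the PATH.

```shell
#!/bin/sh
# Sketch only: all file and directory names here are hypothetical.
set -e
mkdir -p sd-test

# One small page with a generator meta tag ahead of the body text.
cat > sd-test/page.html <<'EOF'
<html>
<head>
<meta name="GENERATOR" content="PageID 158 - generated by RedDot">
<title>venue hire</title>
<meta name="description" content="acmi, venue hire">
</head>
<body>
  Body text
</body>
</html>
EOF

# Byte-for-byte copy, so the extension is the only difference.
cp sd-test/page.html sd-test/page.jsp

# The two directives from Bill's config, mapping .jsp to the HTML2 parser.
cat > sd-test/test.conf <<'EOF'
IndexContents HTML2 .htm .html .jsp
StoreDescription HTML2 <body> 120
EOF

if command -v swish-e >/dev/null 2>&1; then
    # Dump the stored properties for both files while indexing.
    swish-e -c sd-test/test.conf -i sd-test/page.html sd-test/page.jsp \
            -f sd-test/test.index -T properties -v0
else
    echo "swish-e not found; test files written to sd-test/"
fi
```

If the two swishdescription lines differ for files that are otherwise
identical, that narrows the difference down to how the extension is
handled.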


Tref

------------------------------------------------------
Tref Gare
Development Consultant
Areeba
Level 19/114 William St, Melbourne VIC 3000
email: trefg@areeba.com.au
phone: +61 3 9642 5553
fax: +61 3 9642 1335
website: http://www.areeba.com.au
------------------------------------------------------

-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org] 
Sent: Tuesday, 17 December 2002 12:16 PM
To: Tref Gare; Multiple recipients of list
Subject: Re: [SWISH-E] StoreDescription / swishdescription field parsing
wrong meta tags

At 04:59 PM 12/16/02 -0800, Tref Gare wrote:
>Hi all and thanks again for any assistance anyone may be able to give.
>
>I'm indexing a bunch of html files (alongside some pdfs and jsp) and am
>having trouble getting the StoreDescription to work quite as I'd
>expect.

I can't reproduce this.  Can you generate an example like the one below
that shows the problem?

(if you can turn off wrapping in your mail program it makes it easier to
cut-n-paste - thanks.)

$ cat 1.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><META NAME="GENERATOR" CONTENT="PageID 531 - generated by RedDot
4.5 (SP3) - 4.5.3.14 - 2-K5b" />
 <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
 <meta http-equiv="imagetoolbar" content="no">
 <meta http-equiv="MSThemeCompatible" content="no">
 <meta name="MSSmartTagsPreventParsing" content="true">
 <!-- metadata -->
 <title>Run Lola Run</title>
 <meta name="DC.Title" lang="en" content="Run Lola Run">
 <meta name="DC.Subject" scheme="to be advised before development"
content="Run Lola Run">
 <meta name="keywords" content="Run Lola Run">
 <meta name="DC.Description" lang="en" content="Run Lola Run">
 <meta name="Description" content="Run Lola Run">
 <meta name="DC.Creator" lang="en" content="corporateName=Australian
Centre for the Moving Image; address=Federation Square, Melbourne, VIC;
contact=+61 3 8663 2200">
 <meta name="DC.Publisher" lang="en" content="corporateName=Australia
Centre for the Moving Image">
 <meta name="DC.Date.modified" scheme="ISO8601" content="2002-11-21">
</head>
<body>
  Body text
</body>
</html>


$ cat c
parserwarnlevel 9
IndexContents HTML2 .htm .html .jsp 
StoreDescription HTML2 <body> 120


$ ./swish-e -c c -i 1.html -T properties -v0
          swishdocpath: 6 (  6) S: "1.html"
            swishtitle: 7 ( 12) S: "Run Lola Run"
          swishdocsize: 8 (  4) N: "1184"
     swishlastmodified: 9 (  4) D: "2002-12-16 17:07:50"
      swishdescription:10 (  9) S: "Body text"

$ ./swish-e -w not dkdk -x '%d\n' -H0
Body text

$ ./swish-e -w not dkdk -x '<swishdescription>\n' -H0   
Body text
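
When comparing several files it can help to pull one property out of
the -T properties dump. The following is a rough sketch, not part of
swish-e: prop_value is a hypothetical helper name, and it assumes each
property is on a single unwrapped line in the layout shown above
(name:id (length) type: "value").

```shell
#!/bin/sh
# prop_value NAME
# Reads `-T properties` debug output on stdin and prints the quoted value
# of every property named exactly NAME, one value per line.
# Hypothetical helper; the line layout is assumed from the output above.
prop_value() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*[0-9]*[[:space:]]*([[:space:]]*[0-9]*)[[:space:]]*[A-Z]:[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p"
}

# Example with a line copied from the output above; prints: Body text
printf '%s\n' '      swishdescription:10 (  9) S: "Body text"' \
    | prop_value swishdescription
```

Because the name is anchored at the start of the line, asking for
description does not also match swishdescription.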
-- 
Bill Moseley
mailto:moseley@hank.org
Received on Tue Dec 17 02:12:57 2002