RE: StoreDescription / swishdescription field parsing wrong meta tags

From: Tref Gare <TrefG(at)not-real.areeba.com.au>
Date: Tue Dec 17 2002 - 02:13:13 GMT
Copying the file in question here.

PS: I've tried to fix the wrapping problems in my Outlook. Hopefully
this is better (it now wraps at 132 chars), but on the face of it I can't
seem to get the verdammt thing to stop wrapping altogether.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><META NAME="GENERATOR" CONTENT="PageID 313 - generated by RedDot
4.5 (SP3) - 4.5.3.14 - 2-K5b" />
 <meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
 <meta http-equiv="imagetoolbar" content="no">
 <meta http-equiv="MSThemeCompatible" content="no">
 <meta name="MSSmartTagsPreventParsing" content="true">
 <!-- metadata -->
 <title>cinemas</title>
 <meta name="DC.Title" lang="en" content="cinemas">
 <meta name="DC.Subject" scheme="to be advised before development"
content="cinemas">
 <meta name="keywords" content="cinemas">
 <meta name="DC.Description" lang="en" content="cinemas">
 <meta name="Description" content="cinemas">
 <meta name="DC.Creator" lang="en" content="corporateName=Australian
Centre for the Moving Image; address=Federation Square, Melbourne, VIC;
contact=+61 3 8663 2200">
 <meta name="DC.Publisher" lang="en" content="corporateName=Australia
Centre for the Moving Image">
 <meta name="DC.Date.modified" scheme="ISO8601" content="2002-11-12">
 
 <meta name="DC.Availability " LANG="en" content="contact=Contact the
webmaster on +61 3 8663 2200">
 <meta name="DC.Language" scheme="RFC3066" content="en">
 <meta name="AGLS.Mandate.act" content="Film Act 2001 Act No. 87/2001">
 <meta name="DC.Relation.IsPartOf" scheme="URI"
content="http://www.acmi.net.au/">
 <meta name="DC.Rights" scheme="URI"
content="http://www.acmi.net.au/copyright.html">
 <meta name="robots" content="index,follow">
 <meta http-equiv="PICS-RATING" content="">
 <!-- metadata -->
 <link rel="stylesheet" type="text/css" href="/global/style/acmi.css">
 <style type="text/css"><!--
  @import "/global/style/acmi-ex.css";
 -->
 </style>
 <link rel="stylesheet" type="text/css"
href="/global/style/experience.css">
 <script language="JavaScript1.2" src="/global/js/fixStyle.js"
type="text/javascript"></script>
 <script language="JavaScript1.2" src="/global/js/lib.js"
type="text/javascript"></script>
 <script language="JavaScript1.2" src="/global/js/calendar.js"
type="text/javascript"></script>
 <script language="JavaScript1.2" src="/global/js/events-calendar.js"
type="text/javascript"></script>

</head>
<body marginwidth="0" marginheight="0">
<center>
<!-- Top navigation -->
<table width="763" border="0" cellspacing="0" cellpadding="0"
height="95">
  <tr> 
     <td rowspan="2" width="152" valign="bottom"><a
href="/index.htm"><IMG src="/global/images/logo_acmi.gif"
alt="Australian Centre for the Moving Image" border="0" width="139"
height="73"></a></td>
     <td id="searchCell" width="611" height="49" align="right"
valign="top" background="/global/images/nav/bg_exp_menubanner.jpg"><form
method="post" action="/search/searchKeyword.jsp" name="search"><input 
    type="text" name="txt_search" size="8" class="search"
id="txt_search" value=""
    title="Enter search text" />&#160;<select 
    name="catsel" class="search" id="searchwithin" title="search
within">
        <option>acmi website</option>
        <option>lending collection</option>
        <option>NFVLS collection</option>
    </select>&#160;<input type="image" 
    src="/global/images/nav/but_exp_search.gif" border="0" width="68"
height="22" alt="search" align="top" id="search" /></form></td>
  </tr>
  <tr> 
    <td width="611" valign="top"><nobr><A
href="/experience/experience.htm"><IMG
src="/global/images/nav/but_exp_nav_experience.gif" alt="experience"
border="0" width="124" height="46"></A><A href="/belong/belong.htm"><IMG
src="/global/images/nav/but_exp_nav_belong.gif" alt="belong" border="0"
width="81" height="46"></A><A href="/borrow/borrow.htm"><IMG
src="/global/images/nav/but_exp_nav_borrow.gif" alt="borrow" border="0"
width="75" height="46"></A><A href="/learn/learn.htm"><IMG
src="/global/images/nav/but_exp_nav_learn.gif" alt="learn" border="0"
width="57" height="46"></A><A href="/play/play.htm"><IMG
src="/global/images/nav/but_exp_nav_play.gif" alt="play" border="0"
width="57" height="46"></A><A href="/buy/buy.htm"><IMG
src="/global/images/nav/but_exp_nav_buy.gif" alt="buy" border="0"
width="52" height="46"></A><A href="/venuehire/venueHire.jsp"><IMG
src="/global/images/nav/but_exp_nav_venuehire.gif" alt="venue hire"
border="0" width="91" height="46"></A><A href="/about/about.htm"><IMG
src="/global/images/nav/but_exp_nav_about.gif" alt="about" border="0"
width="74" height="46"></A></nobr></td>
  </tr>
</table>
<!-- Navigation context -->
<table width="763" height="40" border="0" cellspacing="0"
cellpadding="0">
  <tr> 
 <td width="13" height="40">&#160;</td>
    <td width="203" height="40"
background="/global/images/bg_breadcrumb1.gif">&#160;</td>
    <td width="547" height="40"
background="/global/images/bg_breadcrumb2.gif" class="navContext"
valign="top"><navContext3B7232131C0F454ABED319E13F819A15>cinemas</navContext3B7232131C0F454ABED319E13F819A15></td>
  </tr>
</table>
<table width="750" border="0" cellspacing="0" cellpadding="0">
  <tr>
      <td width="187" valign="top" id="tblSidebar">
          <table width="187" border="0" cellspacing="0" cellpadding="0">
  <tr>
      <td width="187" height="480" valign="top" align="left"
background="/global/images/nav/bg_exp_sidenav.gif">
        <ul id="sideMenuFF3BA41323624B5296D4D54CCCFE5621" class="menu">
            <li class="menuSection">
        <a href="/about/calendar.htm" class="menuSection">acmi events
calendar</a>
        <ul class="menuSectionCurr">
            
        </ul>
    </li>

    
<li class="menuSection">
        <a href="/about/exhibitions.htm"
class="menuSection">exhibitions</a>
        <ul class="menuSectionCurr">
            
                <li class="menuSubsection">
                    <a
href="/about/84DBC3DEBB4945E8AA6B7CA67FED59FF.htm"
class="menuSubsection">AS TEST</a>
                </li>
                <li class="menuSubsection">
                    <a
href="/about/546910BACB6C4C94B450920789F571DA.htm"
class="menuSubsection">TEST EVENT 2</a>
                </li>
                <li class="menuSubsection">
                    <a
href="/about/116EC59F96744661907B4E8C6EDA370E.htm"
class="menuSubsection">deep space</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/comingsoon_exhibitions.htm"
class="menuSubsection">coming soon</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/exhibitions_archive.htm"
class="menuSubsection">archive</a>
                </li>
        </ul>
    </li>

    
<li class="menuSection">
        <a href="/about/cinemas.htm" class="menuSection">cinemas</a>
        <ul class="menuSectionCurr">
            
                <li class="menuSubsection">
                    <a href="/about/cinemas_archive.htm"
class="menuSubsection">archive</a>
                </li>
        </ul>
    </li>

    
<li class="menuSection">
        <a href="/about/events_programs.htm" class="menuSection">events
& programs</a>
        <ul class="menuSectionCurr">
            
                <li class="menuSubsection">
                    <a href="/about/talks_forums.htm"
class="menuSubsection">talks & forums</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/workshops.htm"
class="menuSubsection">production workshops</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/special_events.htm"
class="menuSubsection">special events</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/school_programs.htm"
class="menuSubsection">school programs</a>
                </li>
        </ul>
    </li>

    
<li class="menuSection">
        <a href="/about/visit_acmi.htm" class="menuSection">visit</a>
        <ul class="menuSectionCurr">
            
                <li class="menuSubsection">
                    <a href="/about/locations_transport.htm"
class="menuSubsection">locations & transport</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/specialneeds_access.htm"
class="menuSubsection">special needs access</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/tours.htm"
class="menuSubsection">tours</a>
                </li>
                <li class="menuSubsection">
                    <a href="/about/virtual_tour.htm"
class="menuSubsection">virtual tour</a>
                </li>
                <li class="menuSubsection">
                    <a
href="/about/06C348106CCB47D1AAB6E01261A93877.htm"
class="menuSubsection">example page</a>
                </li>
        </ul>
    </li>

    
<li class="menuSection">
        <a href="/about/exhibition_collection.htm"
class="menuSection">exhibition collection</a>
        <ul class="menuSectionCurr">
            
                <li class="menuSubsection">
                    <a href="/about/exhibition_collection_works.htm"
class="menuSubsection">selected works</a>
                </li>
        </ul>
    </li>

    
<li class="menuSection">
        <a href="/about/1EA7EDB822E94D31A1B676BDF10E3A36.htm"
class="menuSection">e-cards</a>
        <ul class="menuSectionCurr">
            
        </ul>
    </li>

    

        </ul><!-- end side menu -->
        
        
    </td>
  </tr>
  </table>
      </td>
      <td width="563" id="tblContent">
        <!-- content with rhs -->
        <table width="563" border="0" cellspacing="0" cellpadding="0">
        <tr>
         <td valign="top"><br>
        
        
        
         <h1>cinemas</h1>
          ACMI's cinemas celebrate the vibrancy and diversity of the
moving image, engaging audiences with dynamic screening programs and
events. 
<P>ACMI and our partners present a program of regular screenings for
visitors to enjoy every week or on a drop-in basis.</P>
<P>Special seasons and events are also presented by ACMI and other
screen culture organisations, production houses, community and cultural
groups, educational institutions and corporations.</P>
<h2></h2>



<table border="0" cellpadding="0" cellspacing="0">
     <tr>
      <td style="text-align:right"><a href="/about/cinemas.htm"><br />
           <IMG src="/global/images/buttons/but_more.gif" alt="more"
border="0" width="71" height="25"></a></td>
      <td width="10"><div style="width:10px;"><spacer type="block"
width="10" height="1" /></div></td>
      <td><h3 class="tightVertical"><a
href="/about/cinemas_archive.htm"></a></h3>
       <p class="tightVertical"></p>
      </td>
     </tr>
</table><br />


<table border="0" cellpadding="0" cellspacing="0">
     <tr>
      <td style="text-align:right"><img border="0" alt="<%EventType%>"
src="/experience/images/ico_cinema.gif" width="11" height="13"><br><a
href="/about/5621150FDD9D48F0AAE734F2584CCAC8.htm"><img
src="/experience/images/event_thumbs/driving.jpg" width="70" height="70"
alt="Event Thumbnail" border="0" /><br />
           <IMG src="/global/images/buttons/but_more.gif" alt="more"
border="0" width="71" height="25"></a></td>
      <td width="10"><div style="width:10px;"><spacer type="block"
width="10" height="1" /></div></td>
      <td><h3 class="tightVertical"><a
href="/about/116EC59F96744661907B4E8C6EDA370E.htm">Run Lola Run</a></h3>
       <p class="tightVertical">Iris (Samantha Morton), engulfed by
grief and a sense of maternal abandonment, surrenders herself to a
series of increasingly reckless and self-abasing encounters. Barry
Akroyd's verite camerawork infuses the film with a subjectivity that
potently transmi</p>
      </td>
     </tr>
</table><br />

           </td>
           <td width="20"><div style="width:20px;"><spacer type="block"
width="20" height="1" /></div></td>
           <td valign="top">
             
             <table width="155" border="0" cellspacing="0"
cellpadding="0">
       <tr>
         <td colspan="2"><img src="/global/images/img_cpanel_top.gif"
width="155" height="32" alt="" /></td>
       </tr>
       <tr>
         <td width="21"><img src="/global/images/img_cpanel_side.gif"
width="21" height="145" alt="" /></td>
         <td width="134">
            
            
         </td>
       </tr>
     </table>
             <br />
             <div style="width:20px;"><spacer type="block" width="20"
height="1" /></div>
           </td>
         </tr>
       </table>
       <!-- end content table with rhs -->
    </td>
  </tr>
  <tr>
    <td>&#160;</td>
    <td align="left">
        <table width="408" border="0" cellspacing="0" cellpadding="0">
        <tr> 
           <td class="footer" width="54"></td>
           <td class="footer" width="72"><a href="/about/contact.jsp"
class="footer">contact us</a></td>
           <td class="footer" width="57"><a href="/about/privacy.htm"
class="footer">privacy</a></td>
           <td class="footer" width="69"><a href="/about/copyright.htm"
class="footer">copyright</a></td>
           <td class="footer" width="103"><a href="/about/terms.htm"
class="footer">terms of use</a></td>
           <td class="footer" width="53"><IMG
src="/global/images/logo_vicgov.gif" alt="Victoria The place to be"
border="0" width="39" height="28"></td>
        </tr>
       </table>
     </td>
  </tr>
</table>
<p>&#160;</p>
</center>
</body>
</html>
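
One way to see which meta tags a parser could pick up from a page like the one above is to list them by name. The following is a minimal sketch in Python using only the standard library. It is not part of swish-e, and the names `MetaLister`, `list_metas` and `suspicious_names` are made up for illustration. It also flags names with stray whitespace, since the page above declares `name="DC.Availability "` with a trailing space inside the quotes. Whether that has any bearing on the swishdescription problem is not established here.

```python
from html.parser import HTMLParser


class MetaLister(HTMLParser):
    """Collect (name, content) pairs from <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are handled alike. Self-closing tags (" />")
        # are routed here as well by the default handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is None:
            # http-equiv tags and valueless name attributes are skipped.
            return
        self.metas.append((name, attr_map.get("content") or ""))


def list_metas(html_text):
    """Return the (name, content) pairs in document order."""
    parser = MetaLister()
    parser.feed(html_text)
    parser.close()
    return parser.metas


def suspicious_names(metas):
    """Return meta names that carry leading or trailing whitespace."""
    return [name for name, _ in metas if name != name.strip()]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python list_metas.py cinemas.htm
    with open(sys.argv[1], encoding="iso-8859-1") as handle:
        found = list_metas(handle.read())
    for name, content in found:
        # Collapse line wraps inside content values for display only.
        print(repr(name), "->", " ".join(content.split()))
    for name in suspicious_names(found):
        print("whitespace in name:", repr(name))
```

The file is read as iso-8859-1 because that is the charset the page declares. The content values are kept verbatim, so line breaks introduced by mail wrapping remain visible in the returned pairs.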


------------------------------------------------------
Tref Gare
Development Consultant
Areeba
Level 19/114 William St, Melbourne VIC 3000
email: trefg@areeba.com.au
phone: +61 3 9642 5553
fax: +61 3 9642 1335
website: http://www.areeba.com.au
------------------------------------------------------

-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org] 
Sent: Tuesday, 17 December 2002 12:16 PM
To: Tref Gare; Multiple recipients of list
Subject: Re: [SWISH-E] StoreDescription / swishdescription field parsing
wrong meta tags

At 04:59 PM 12/16/02 -0800, Tref Gare wrote:
>Hi all and thanks again for any assistance anyone may be able to give.
>
>I'm indexing a bunch of html files (alongside some pdfs and jsp) and am
>having trouble getting the StoreDescription to work quite as I'd expect.

I can't reproduce.  Can you generate an example like this that shows the
problem?  

(if you can turn off wrapping in your mail program it makes it easier to
cut-n-paste - thanks.)

$ cat 1.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><META NAME="GENERATOR" CONTENT="PageID 531 - generated by RedDot
4.5 (SP3) - 4.5.3.14 - 2-K5b" />
 <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
 <meta http-equiv="imagetoolbar" content="no">
 <meta http-equiv="MSThemeCompatible" content="no">
 <meta name="MSSmartTagsPreventParsing" content="true">
 <!-- metadata -->
 <title>Run Lola Run</title>
 <meta name="DC.Title" lang="en" content="Run Lola Run">
 <meta name="DC.Subject" scheme="to be advised before development"
content="Run Lola Run">
 <meta name="keywords" content="Run Lola Run">
 <meta name="DC.Description" lang="en" content="Run Lola Run">
 <meta name="Description" content="Run Lola Run">
 <meta name="DC.Creator" lang="en" content="corporateName=Australian
Centre for the Moving Image; address=Federation Square, Melbourne, VIC;
contact=+61 3 8663 2200">
 <meta name="DC.Publisher" lang="en" content="corporateName=Australia
Centre for the Moving Image">
 <meta name="DC.Date.modified" scheme="ISO8601" content="2002-11-21">
</head>
<body>
  Body text
</body>
</html>


$ cat c
parserwarnlevel 9
IndexContents HTML2 .htm .html .jsp 
StoreDescription HTML2 <body> 120


$ ./swish-e -c c -i 1.html -T properties -v0
          swishdocpath: 6 (  6) S: "1.html"
            swishtitle: 7 ( 12) S: "Run Lola Run"
          swishdocsize: 8 (  4) N: "1184"
     swishlastmodified: 9 (  4) D: "2002-12-16 17:07:50"
      swishdescription:10 (  9) S: "Body text"

$ ./swish-e -w not dkdk -x '%d\n' -H0
Body text

$ ./swish-e -w not dkdk -x '<swishdescription>\n' -H0   
Body text
-- 
Bill Moseley
mailto:moseley@hank.org
Received on Tue Dec 17 02:16:49 2002