Hello,
I'm using swish-e to index standard HTML files. Sometimes the "description"
meta tag contains HTML special characters, for example:
<title>Ingénieur</title>
<meta name="description" content="La formation d'Ingénieur en Génie
Electromécanique permet d'avoir des compétences polyvalentes qui réponds
aux profils de postes dans nombreux domaines, entre autres : -Conception
et fabrication mécanique -Systèmes électriques, électrotechniques et
automatismes industriels -Mécanique des matériaux et des structures
-Energétique et mécanique des fluides -Gestion de production, qualité,
maintenance et contrôle -Mettre en place un système de management selon
les normes &quot;ISO 9001:2000&quot; &quot;ISO14001:2004&quot;
&quot;ISO/TS16949:2002&quot; ">
In this case the stored description contains a dangling &qu because it is
truncated in the middle of an entity. This makes the XML parser that runs
on top of it crash.
I guess the truncation process cannot be changed to take special entities
or Unicode entities into account, so I tried to increase the size of the
stored description with
PropertyNamesMaxLength 1000 description
but it seems this is not the way. I also couldn't manage to apply
StoreDescription HTML to the description meta tag.
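To make the failure concrete, here is a minimal Python sketch of the problem and of one possible workaround on the consumer side. It does not touch swish-e itself. The helper names (`strip_dangling_entity`, `parses`) and the `<description>` wrapper element are hypothetical, chosen only for illustration, and the sample string is a shortened version of the description above.

```python
import re
import xml.etree.ElementTree as ET

# Matches an entity cut off before its closing ';' at the very end of a
# string, e.g. "&qu" or "&#12". This is an assumption about what the
# truncated property looks like, not swish-e behaviour.
_DANGLING_ENTITY = re.compile(r"&#?[0-9A-Za-z]*$")


def strip_dangling_entity(text):
    """Remove a trailing, incomplete entity left behind by truncation."""
    return _DANGLING_ENTITY.sub("", text)


def parses(fragment):
    """Return True if the fragment is well-formed inside a wrapper element."""
    try:
        ET.fromstring("<description>%s</description>" % fragment)
        return True
    except ET.ParseError:
        return False


# Shortened sample: the last entity was cut in the middle by truncation.
truncated = "les normes &quot;ISO 9001:2000&quot; &qu"
cleaned = strip_dangling_entity(truncated)

print(parses(truncated))  # the dangling "&qu" is not well-formed XML
print(parses(cleaned))    # the complete &quot; entities are kept
```

Note that this only trims the tail of the string. A trailing fragment such as "&T" in unescaped text would also be removed, so it is a stopgap for the parser crash rather than a fix for the truncation.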
Here is my conf file:
----------------------------------------------------------------
IndexFile ../data/resume.index
# StoreDescription HTML2 1000
IndexDir toIndex/
IndexContents HTML2 *
TranslateCharacters :ascii7:
FollowSymLinks no
ReplaceRules replace "toIndex/([0-9]*)/([0-9]*)/([0-9]*).([a-zA-Z]{1,2}*)" "$3"
MetaNames swishtitle description lastupdate
PropertyNamesMaxLength 1000 description
PropertyNames description lastupdate
----------------------------------------------------------------
Thanks for your support,
xav
Received on Tue Aug 30 07:38:35 2005