Re: [swish-e] Swish-2.4.5 index problem on a typo3 website

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Oct 30 2008 - 03:19:04 GMT
Einhorn Stephan wrote on 10/27/08 5:03 AM:
> > Hello Peter,
> >
> > Thanks for your answer, but....
> > I already have a Title meta on the source :
> >
> > A example page :

[snip]

When I took your example page, made a real HTML file out of it, and indexed
it, the title showed up as expected:

% cat typo.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!--     This website is powered by TYPO3 - inspiring people to share!
    TYPO3 is a free open source Content Management Framework initially created
by Kasper Skaarhoj and licensed under GNU/GPL.
    TYPO3 is copyright 1998-2006 of Kasper Skaarhoj. Extensions are copyright of
their respective owners.
    Information and contribution at http://typo3.com/ and http://typo3.org/
--><base href="http://www.xxx.fr/">
<link rel="stylesheet" type="text/css" href="typo3temp/stylesheet_65e620f356.css">
<link rel="stylesheet" type="text/css" href="fileadmin/crfc_tpl/css/css_xxx.css">
<link rel="alternate" type="application/rss+xml" title="CRFC - Actualités"
href="/index.php?id=751&amp;type=100">
<link rel="alternate" type="application/rss+xml" title="CRFC - Agenda"
href="/index.php?id=752&amp;type=100">
<meta name="verify-v1" content="wz0hdhyJ69K4tHZ1oVd+bWlOjX6q/rF8xfV86me15HA=">
<link rel="schema.dc" href="http://purl.org/metadata/dublin_core_elements">
<title>L'organisation</title>
<meta name="generator" content="TYPO3 4.1 CMS">
<script type="text/javascript"
src="typo3temp/javascript_757c080409.js"></script><script type="text/javascript"
src="t3lib/jsfunc.menu.js"></script>
</head>
<body>
hello world
</body>
</html>

% swish-e -i typo.html
...
% swish-e -w hello
# Run time: 0.006 seconds
1000 typo.html "L'organisation" 1428


That tells me the default config works as expected, so either (a) something in
your config is throwing off the title identification, or (b) your HTML doesn't
really look like the example you sent (or both).
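
One way to check possibility (b) outside of swish-e is to run the real pages
through an independent parser and see whether a <title> turns up at all. The
following is only a rough sketch using Python's standard html.parser, which is
not the parser swish-e uses, so a pass here does not prove swish-e sees the same
thing. The names TitleFinder and find_title are made up for this example.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False   # True while we are between <title> and </title>
        self.done = False       # True once the first title has been closed
        self.parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "title" also matches <TITLE>.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def find_title(html_text):
    """Return the stripped title text, or None if no complete <title> was seen."""
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    if not finder.done:
        return None
    return "".join(finder.parts).strip()


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_title.py page1.html page2.html ...
    for path in sys.argv[1:]:
        # The example page declares iso-8859-1; adjust if your pages differ.
        with open(path, encoding="iso-8859-1") as fh:
            print(path, repr(find_title(fh.read())))
```

Any file that prints None here is worth a closer look before blaming the
config.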

Try indexing a single test document, as I did, but with your config, and see
whether you can reproduce your problem. If you can't, add more test docs a few
at a time until the problem appears.
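
The "add a few at a time" approach can be sketched as a small loop. This is
only an illustration: first_failing_batch and its reproduces argument are
hypothetical names, and reproduces stands in for whatever check you use (for
example, indexing the given files with your config and looking at whether the
title comes back in the search results). That check is deliberately left to
the caller.

```python
def first_failing_batch(docs, batch_size, reproduces):
    """Grow the test set batch by batch and return the batch that first
    triggers the problem, or None if it never shows up.

    docs        -- list of document paths, in the order they should be added
    batch_size  -- how many documents to add per step (must be >= 1)
    reproduces  -- caller-supplied function: takes the list of documents
                   included so far and returns True if the problem appears
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    included = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        included.extend(batch)
        # Re-run the check against everything included so far.
        if reproduces(included):
            return batch
    return None


if __name__ == "__main__":
    # Stand-in check for demonstration only: pretend d.html is the culprit.
    docs = ["a.html", "b.html", "c.html", "d.html", "e.html"]
    print(first_failing_batch(docs, 2, lambda seen: "d.html" in seen))
```

Once a batch is identified, the same function can be rerun on just that batch
with a batch size of 1 to narrow it down to a single document.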

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu Oct 30 23:39:02 2008