Re: [swish-e] Returning document title rather than file name in search results?

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Mar 13 2009 - 02:39:18 GMT

Greg Keith wrote on 3/11/09 4:14 PM:
> I want the document title returned as the first link, if there
> is one - most of the documents I'm indexing are HTML, so there should
> be a <title> tag for most of them. I am not clear on how to do this -
> it looks like it should be the proper combination of specifying the
> title_property in swish.cgi and the MetaNames directive in my
> swish.conf. However, I don't know what the proper combination is - I
> tried not having any MetaNames directive in the swish.conf, and
> having title_property set to "title" rather than "swishtitle", but
> this just produces a "(null)" result for each document found. My
> swish.conf and swish.cgi are below.
> 
> Can anyone enlighten me?
> 

The MetaNames config option is irrelevant in this case. MetaNames are for
limiting a query to certain *contexts*, i.e. the fields a search can be
restricted to. PropertyNames are for returning *contents* of hits, i.e. the
stored values printed with each result.

The best thing to do is find a document you think *should* be returning a title
and isn't, and then make a test case with it. Here's an example:

[karpet@pekmac:~/tmp]$ swish-e -i title.html
Indexing Data Source: "File-System"
Indexing "title.html"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 6 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
6 unique words indexed.
4 properties sorted.
1 file indexed.  94 total bytes.  6 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
[karpet@pekmac:~/tmp]$ swish-e -w hello
# SWISH format: 2.5.6
# Search words: hello
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 title.html "this is the title" 94
.
[karpet@pekmac:~/tmp]$ cat title.html
<html>
 <head>
  <title>this is the title</title>
 </head>
 <body>hello world</body>
</html>


What you'll probably find, in the case of your HTML anyway, is that the swish-e
HTML parser isn't finding your <title> tagset for some reason: it isn't there,
or is named slightly differently, or...
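
To narrow down which documents are the problem before building a test case,
something like the following could help. It is only a rough sketch in Python
using the standard library's html.parser, not swish-e's own parser, so the two
may disagree on badly formed markup. The names TitleFinder and find_title are
made up for this illustration.

```python
import sys
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element, if any.

    Hypothetical helper for illustration; this is NOT how swish-e parses HTML.
    """

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None  # stays None when no closed <title> element is seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> matches as well.
        if tag == "title" and self.title is None:
            self._in_title = True
            self._parts = []

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def find_title(html_text):
    """Return the title text, "" for an empty title, or None if there is none."""
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    return finder.title


if __name__ == "__main__":
    # Usage: python find_title.py file1.html file2.html ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            title = find_title(fh.read())
        print(f"{path}: {'(no title found)' if title is None else repr(title)}")
```

Any file reported with no title, or with an empty one, is a good candidate for
the single-file swish-e test shown above.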

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu Mar 12 22:39:13 2009