On 8 Sep 2007 at 19:36, Peter Karman wrote:
> If you're using the spider.pl or DirTree.pl with -S prog, then yes, you
> can filter the content with a regex and output additional <meta> tags
> with the content.
I'm planning to write a -S prog program that does its own XML parsing
and passes just plain text to swish for indexing. Is it possible to
produce meta fields in this scenario? The text would not contain any
tags (no "<" or ">"). Of course I could write them, but it seems
like a waste to have swish parse it as XML a second time.
Something like outputting:
Path-Name: MYPATH
Content-Lines: NUMBER_OF_LINES
Last-Mtime: $mtime
Document-Type: TEXT
Meta: Subject=MYSUBJECT
Meta: AUTHOR=MYAUTHOR
DOCUMENT-CONTENT-TEXT
(I wishfully changed the Content-Length header to Content-Lines,
as calculating the number of bytes swish thinks the file contains can be a
bit tedious if some lines end in CRLF and others in just CR or LF.
A line count would be much easier for me, and I think for swish too, if it
reads the input line by line. But this is not so important.)
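On the byte count: a minimal sketch in Python, assuming the body is held as
raw bytes so that whatever line endings it contains are counted exactly as
they will be written. The function names are made up for illustration. The
headers shown (Path-Name, Content-Length, Last-Mtime) are the ones I
understand the -S prog input to accept. The proposed "Meta:" and
"Content-Lines:" headers are left out because they are only a wish here.

```python
import sys


def format_prog_document(path_name: str, mtime: int, body: bytes) -> bytes:
    """Build one record: header lines, a blank line, then the raw body.

    Content-Length is the length of the byte string itself, so CRLF,
    CR and LF line endings need no special handling.
    """
    headers = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(body)}\n"
        f"Last-Mtime: {mtime}\n"
        "\n"  # blank line separates headers from content
    )
    return headers.encode("utf-8") + body


def emit(record: bytes) -> None:
    """Write the record in binary mode so newlines are not translated."""
    sys.stdout.buffer.write(record)
```

A program would call emit(format_prog_document(path, mtime, body)) once per
document. Writing through sys.stdout.buffer matters on platforms that
translate "\n" on text streams, since that would change the byte count
after it was computed.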
.Timo
Received on Mon Sep 10 01:03:04 2007