[I'm cc'ing back to the list]
On Fri, Jan 16, 2004 at 11:39:06AM -0500, Julie Wetherill wrote:
>
> Followed your instructions. Wouldn't ya know, with just MetaNames and no
> PropertyNames, I can retrieve on the metaname "description". Just FYI, the
> instruction

Yep. If that had not worked it would have been an obvious bug, I think.

> swish-e.new -T index_metanames (note that my executable is "swish-e.new"
> rather than swish-e)
>
> causes a segmentation fault. Don't know why. Seems like this could be a
> helpful command if I could get it to work.

Dave, can you test this on Windows?

> Anyway, I do have a related problem that maybe you can explain. I need to
> retrieve on metadata imbedded in PDFs. Adobe uses Dublin Core tags
> (dc:description, dc:title, dc:creator). I can't get swish-e to recognize
> these as metanames (whether these are in PDFs or in HTML).
$ cat c
MetaNames dc:description
$ cat 1.html
<html>
<head><title><b>title</title>
<meta name="dc:description" content="=foo">
</head>
<body>
hello
</body>
</html>
$ swish-e -c c -i 1.html -v0 -T indexed_words
Adding:[1:swishdefault(1)] 'b' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'title' Pos:3 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:dc:description(10)] 'foo' Pos:6 Stuct:0x85 ( META HEAD FILE )
Adding:[1:swishdefault(1)] 'hello' Pos:9 Stuct:0x9 ( BODY FILE )
$ swish-e -w dc:description=foo
# SWISH format: 2.4.1
# Search words: dc:description=foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.043 seconds
1000 1.html "<b>title" 125
.
> Warning: Substituted possible embedded null character(s) in file
> '/home/hul/htdocs/ois/systems/aleph/docs/test/serial_claiming_in_Aleph.pdf'

Looks like you are not filtering the PDF files, so swish-e is reading the
raw binary PDF data instead of extracted text.
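
A filter setup along these lines might be a starting point. It is only a
sketch: it assumes pdftotext (from xpdf) is installed and in your PATH, and
the config file name "c2" is made up for the example.

$ cat c2
# Assumption: pdftotext is in PATH; '-' makes it write the text to stdout
FileFilter .pdf pdftotext "'%p' -"
# Parse the filter output as plain text
IndexContents TXT* .pdf

Plain pdftotext output carries no meta tags, so this alone would not give
you dc:description as a metaname; it should only stop swish-e from indexing
the raw PDF bytes.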
--
Bill Moseley
moseley@hank.org
Received on Fri Jan 16 16:57:38 2004