David Brown wrote on 9/1/10 8:26 PM:
> Side question: I thought I read somewhere that swish-e's HTML parser will
> weight text in headings more heavily than in regular text, but I'm not
> finding that in the documentation. Is this in fact the case? If not, then I
> might as well just stick with pdftotext. If it does, then I'll try harder
> to get pdftohtml 0.40a compiled or look into any alternatives you all might
You can limit or weight matching terms based on their context (MetaName), but the
parser doesn't do anything special with heading text by default.
fwiw, I use pdftotext and pdfinfo via the SWISH::Filters::Pdf2HTML filter.
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Users mailing list
Received on Wed Sep 1 23:30:07 2010