On Thu, 7 May 1998, Brendan Jones wrote:
> If people search on four terms, you're providing four separate context
> outputs for each document - disconnected or potentially overlapping
> slabs of text which swamps the reader - and may provide no more
> illumination than my preferred method of document title and first 100
> words. All IMO, of course.
Of course.
> My interface won't chop out content between <SCRIPT>...</SCRIPT>, but then
> again, no site I maintain has any javascript or applets.
And let everybody else eat cake, I suppose.
> Chopping out content between <SCRIPT>...</SCRIPT> would be a relatively
> trivial perl modification if I ever needed to implement it. Not messy at
> all.
Not as trivial as you think:
<SCRIPT>
document.write( "</SCRIPT>" );
document.write( "oops!" );
</SCRIPT>
Granted, it's a pathological case, but my background is in
compilers and it's the pathological cases that are the fun
ones. (You have to balance quotation marks to get the above
right.)
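
A minimal sketch of the quote balancing described above, in Python
rather than perl. The function names are hypothetical, and it assumes
ASCII tag names and only simple '...' and "..." string literals with
backslash escapes; script comments containing stray quotes would
confuse it. The naive version is included only for contrast.

```python
import re

def strip_scripts_naive(html):
    # Ends the block at the first "</script>", even inside a string literal.
    return re.sub(r"(?is)<script.*?</script>", "", html)

def strip_scripts(html):
    # Remove <SCRIPT>...</SCRIPT> blocks, ignoring any "</SCRIPT>" that
    # appears inside a quoted string. Assumes ASCII tag names, so that
    # lower-casing does not change string offsets.
    out = []
    lower = html.lower()
    i, n = 0, len(html)
    while i < n:
        start = lower.find("<script", i)
        if start == -1:
            out.append(html[i:])
            break
        out.append(html[i:start])
        j = lower.find(">", start)      # end of the opening tag
        if j == -1:
            break                       # unterminated tag: drop the rest
        j += 1
        quote = None                    # quote character we are inside, if any
        while j < n:
            c = html[j]
            if quote:
                if c == "\\":
                    j += 1              # skip the escaped character
                elif c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
            elif lower.startswith("</script", j):
                end = lower.find(">", j)
                j = n if end == -1 else end + 1
                break
            j += 1
        i = j
    return "".join(out)

page = ('before<SCRIPT>\n'
        'document.write( "</SCRIPT>" );\n'
        'document.write( "oops!" );\n'
        '</SCRIPT>after')
print(repr(strip_scripts_naive(page)))
print(repr(strip_scripts(page)))
```

On the example above, the naive version leaves the tail of the script
(including "oops!") in the output, while the quote-aware version
leaves only the text outside the block.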
> > I personally throw out stopwords altogether ...
>
> In many cases doing this is fine, but in some situations, the stopword
> adds nonzero value. For example, say someone does an AND search for
> "fee fie foe foo" and "foe" is a stopword that has been thrown out because
> it is too common.
>
> In my view, the correct way to implement this search is to first find all
> documents which contain fee, fie and foo. So far, this is what you've
> implemented. But that output could be further refined (and correctly
> refined) by then throwing out any documents in this list which do NOT also
> contain foe, even though foe is a stopword.
You're missing the point: if a word is so common as to be
considered a stopword, then most of the results will be thrown
out.
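
For reference, the two-pass AND search described in the quoted text
can be sketched as follows. This is an illustration only, not how
SWISH++ or swish-e works; the toy documents, the index layout and the
name and_search are all assumptions. Because the stopword is absent
from the index, the second pass has to consult the document text
itself.

```python
from collections import defaultdict

# Toy corpus (made up for illustration).
documents = {
    1: "fee fie foe foo",
    2: "fee fie foo",
    3: "fee fie foe foo fum",
    4: "fee foe",
}
stopwords = {"foe"}

# Build an inverted index that omits stopwords entirely.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        if word not in stopwords:
            index[word].add(doc_id)

def and_search(terms):
    # Pass 1: intersect the posting sets of the indexed (non-stopword) terms.
    indexed = [t for t in terms if t not in stopwords]
    skipped = [t for t in terms if t in stopwords]
    if not indexed:
        return set(), set()
    candidates = set.intersection(*(index.get(t, set()) for t in indexed))
    # Pass 2: keep only candidates whose text also contains every stopword.
    refined = {d for d in candidates
               if all(t in documents[d].split() for t in skipped)}
    return candidates, refined

candidates, refined = and_search(["fee", "fie", "foe", "foo"])
print(sorted(candidates), sorted(refined))
```

How many candidates the second pass removes depends entirely on how
often the stopword actually occurs in the documents that matched the
other terms.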
> > SWISH++ discards comments.
>
> Well, you've sold me on swish++ rather than swish-e on this alone!
Incidentally, do you know if:
<!-- "-->" -->
is a valid comment in HTML? (I had to look this one up
myself.) My interpretation of the HTML 4.0 spec. is that it is
NOT a valid comment since quotes are not to be treated
specially, i.e., balanced.
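
Under that interpretation, a comment scanner simply stops at the
first "-->" it sees. A minimal sketch, with a hypothetical function
name; it simplifies the full SGML comment syntax (for example, it
does not allow white space between "--" and ">"):

```python
def strip_comments(html):
    # Remove <!-- ... --> comments, ending each one at the first "-->".
    # Quotation marks inside a comment get no special treatment.
    out = []
    i = 0
    while True:
        start = html.find("<!--", i)
        if start == -1:
            out.append(html[i:])
            break
        out.append(html[i:start])
        end = html.find("-->", start + 4)
        if end == -1:
            break                # unterminated comment: drop the rest
        i = end + 3
    return "".join(out)

print(strip_comments('a<!-- "-->" -->b'))   # prints: a" -->b
```

The comment ends right after the first quotation mark, so the rest
(the closing quote and the second "-->") is left behind as ordinary
text.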
- Paul J. Lucas
NASA Ames Research Center          Caelum Research Corporation
Moffett Field, California          San Jose, California
<pjl AT ptolemy DOT arc DOT nasa DOT gov>
Received on Thu May 7 23:56:09 1998