On Fri, 4 Dec 1998, Jacques Delsemme wrote:
> 1. I had to increase the number of characters read to 2048 characters,
OK.
> 2. By the same token, I've decreased the number of words returned to no
> more than 50.
I'll make this a parameter to the function.
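Roughly along these lines, as a sketch only. The name first_words, the
argument order, and the default of 50 are hypothetical, not the actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $max_words whitespace-separated
# words from $text. The default of 50 is the limit suggested above.
sub first_words {
    my ($text, $max_words) = @_;
    $max_words = 50 unless defined $max_words;

    # split ' ' splits on runs of whitespace and drops leading whitespace.
    my @words = split ' ', $text;

    # Truncate the list if it is longer than the limit.
    $#words = $max_words - 1 if @words > $max_words;

    return join ' ', @words;
}

print first_words("one two three four five", 3), "\n";
```

A caller that wants a different limit passes it as the second argument;
leaving it out falls back to the default.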
> 3. I've inserted the line:
>
> s/<!--.*-->//gi; # remove comments tags
>
> to remove comments tags. I do this first.
I don't understand why. The line:
s!<.*?>!!g;
in my code will remove comments also. I don't see why it has to
be done first. Please explain.
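For reference, a small test of the two approaches on a made-up string
where the comment itself contains a tag. The input and variable names
are assumptions for illustration only, and the comment pattern is
written non-greedy with /s here rather than exactly as in the line
quoted above:

```perl
use strict;
use warnings;

# Hypothetical input: a comment that itself contains a tag.
my $html = 'before <!-- <b>hidden</b> --> after';

# Tag stripping only: each match stops at the first ">", so the
# comment is consumed piecemeal rather than as a unit.
(my $tags_only = $html) =~ s!<.*?>!!g;

# Comments first (non-greedy, with /s so a comment may span lines),
# then tags.
(my $comments_first = $html) =~ s/<!--.*?-->//gs;
$comments_first =~ s!<.*?>!!g;

print "tags only:      [$tags_only]\n";
print "comments first: [$comments_first]\n";
```

Running both on real pages would show whether the difference matters
in practice.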
> 4. You are using the "description" meta tag to extract the description of the
> page. Is this use universal?
Probably not universal, but fairly common. See:
http://www.w3.org/TR/REC-html40/appendix/notes.html#recs
under "Provide keywords and descriptions." Although it says:
The value of the name attribute sought by a
search engine is not defined by this
specification.
the example given uses "description." For what it's worth,
AltaVista uses "description"; see:
http://www.altavista.com/av/content/addurl_meta.htm
Excite doesn't use META tags at all. Hotbot points one to:
http://searchenginewatch.internet.com/webmasters/meta.html
that also uses "description." (They also point you to a
"Search Engine Features" page, but that page states that
AltaVista doesn't use META tags, which is wrong.)
There is also the "Dublin Core" set of names:
http://purl.oclc.org/dc/
All of their names start with "DC." so their description would
look like:
<META NAME="DC.description" CONTENT="blah blah">
I've changed the regular expression in the Perl function to
allow an optional "DC." before "description":
(?:DC\.)?description
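A minimal sketch of how that pattern might be used, with a "?" after
the group so that the "DC." prefix really is optional. The function name
meta_description is hypothetical, and the sketch assumes NAME comes
before CONTENT and that both values are double-quoted:

```perl
use strict;
use warnings;

# Hypothetical helper: return the CONTENT of a "description" or
# "DC.description" META tag, or undef if none is found.
# Assumes NAME precedes CONTENT and both values are double-quoted.
sub meta_description {
    my ($html) = @_;
    if ($html =~ /<META\s+NAME="(?:DC\.)?description"\s+CONTENT="([^"]*)"/i) {
        return $1;
    }
    return;
}

print meta_description('<META NAME="DC.description" CONTENT="blah blah">'), "\n";
print meta_description('<meta name="description" content="plain">'), "\n";
```

The /i flag lets the same pattern match upper- or lower-case tag and
attribute names.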
- Paul
Received on Fri Dec 4 15:02:23 1998