On Wed, Jun 25, 2003 at 08:29:17AM -0500, Cleveland@mail.winnefox.org wrote:
> > If you don't want to index a page then use robots.txt or a
> > meta robots tag to say don't follow links.
>
> What we have is a directory of pdf files. There are about 10 of them we
> don't want indexed, but they are linked on a browse.html page that has
> links to all the files. I don't know much about pdf files. Is there a
> way to put meta tags in them?
As in <meta> noindex tags? No, that's HTML. (PDFs can have metadata
associated with them, though, so I suppose there might be a way to
check for it after the PDF has been converted to text/html.)

If what you want is to avoid indexing the pdf files in a directory then
I'd probably use a robots.txt file. Unfortunately, you cannot use
regular expressions in that file.
<web_root>/robots.txt:
----------------------
User-agent: *
# don't allow spidering in the /pdfdocs directory
Disallow: /pdfdocs/
# don't allow spidering of these specific files:
Disallow: /otherpdfdir/pdfdoc.pdf
Disallow: /otherpdfdir/pdfdoc2.pdf
Or, if you want more control, just add tests in the "test_url" callback
function. There you can use regular expressions, and returning false
from the callback makes the spider skip that URL:

    return if $uri->path =~ m[^/pdfdocs/\w+\.pdf$];
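
For about ten specific files, a slightly fuller version of that callback might look something like the sketch below. It is only an illustration: the %skip list and the skip_path helper are made-up names, the file paths are the placeholder ones from the robots.txt example, and it assumes the callback is handed a URI object as its first argument and that a false return value means "skip this URL".

```perl
use strict;
use warnings;

# Hypothetical list of individual files to keep out of the index
# (placeholder paths, borrowed from the robots.txt example).
my %skip = map { $_ => 1 } qw(
    /otherpdfdir/pdfdoc.pdf
    /otherpdfdir/pdfdoc2.pdf
);

# Hypothetical helper: returns 1 if this URL path should not be indexed.
sub skip_path {
    my ($path) = @_;
    return 1 if $skip{$path};                      # exact match on a listed file
    return 1 if $path =~ m[^/pdfdocs/\w+\.pdf$];   # pdf files directly in /pdfdocs/
    return 0;
}

# Assumed callback shape: first argument is a URI object, and a
# false return value tells the spider to skip the URL.
sub test_url {
    my ( $uri, $server ) = @_;
    return !skip_path( $uri->path );
}
```

The sub would then be wired in as test_url => \&test_url in the server's settings, assuming the usual spider config layout. Note that \w+ does not match hyphens or spaces and the pattern is case-sensitive, so /pdfdocs/annual-report.pdf or /pdfdocs/FILE.PDF would still be spidered unless the pattern is loosened.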
Will either of those work for you?
--
Bill Moseley
moseley@hank.org
Received on Wed Jun 25 14:06:53 2003