Re: avoid indexing php code

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Dec 19 2001 - 14:14:10 GMT

Hi Klaus,

At 02:03 AM 12/19/2001 -0800, Klaus Hollenbach wrote:
>Is it somehow possible to make swish-e not index PHP code in an HTML file?
>I switched off the IndexComments directive, but the PHP code still seems
>to be indexed.

"not index"?  What version are you running?  2.1-dev, I hope.  

It's more helpful if you post examples.  I can't get swish to index either
PHP or ASP code.  Wouldn't you rather spider the site, so that ASP/PHP fills
in your pages and you index what people actually see?

Anyway:

> cat 1.php
<html><head><title>PHP Test</title></head>
<body>
<?php echo "Hello World<p>"; ?>
</body></html>

> cat c
Defaultcontents HTML

> ./swish-e -c c -i 1.php -v 0 -T indexed_words 
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'php'   Pos:1  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'test'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
Indexing done!

> cat c
Defaultcontents HTML2
ParserWarnLevel 9
> ./swish-e -c c -i 1.php -v 0 -T indexed_words 
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'php'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'test'   Pos:3  Stuct:0x7 ( HEAD TITLE FILE )
1.php:3: error: htmlParseStartTag: invalid element name
<?php echo "Hello World<p>"; ?>
 ^
Indexing done!

 
>
>Is there a similar directive that avoids indexing non-HTML code, e.g.
>everything enclosed within ASP-style tags ( <% ... %> )?


> cat 1.html
<title>titleword</title>
foo
<% comment %>


> cat c
Defaultcontents HTML
> ./swish-e -c c -i 1.html -v 0 -T indexed_words
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'titleword'   Pos:1  Stuct:0x3 ( TITLE FILE )
    Adding:[1:swishdefault(1)]   'foo'   Pos:2  Stuct:0x1 ( FILE )
Indexing done!


> cat c
Defaultcontents HTML
IndexComments yes
> ./swish-e -c c -i 1.html -v 0 -T indexed_words 
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'titleword'   Pos:1  Stuct:0x3 ( TITLE FILE )
    Adding:[1:swishdefault(1)]   'foo'   Pos:2  Stuct:0x1 ( FILE )
Indexing done!

> cat c
Defaultcontents HTML2
> ./swish-e -c c -i 1.html -v 0 -T indexed_words 
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'titleword'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'foo'   Pos:5  Stuct:0x9 ( BODY FILE )
Indexing done!

> cat c
Defaultcontents HTML2
IndexComments yes
> ./swish-e -c c -i 1.html -v 0 -T indexed_words 
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'titleword'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'foo'   Pos:5  Stuct:0x9 ( BODY FILE )
Indexing done!
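
As an aside for reading the debug output above: the hex value after "Stuct:" looks like a bitmask of the structure names printed in parentheses. Below is a minimal Python sketch of that decoding. The four bit values are only inferred from the transcripts in this message (0x1, 0x3, 0x7, 0x9); swish-e may define further flags that do not show up here, and decode_struct is a hypothetical helper name, not part of swish-e.

```python
# Bit values inferred from the -T indexed_words output in this message only.
# Listed highest bit first, which matches the order swish-e prints them in.
FLAGS = [
    (0x8, "BODY"),
    (0x4, "HEAD"),
    (0x2, "TITLE"),
    (0x1, "FILE"),
]


def decode_struct(value: int) -> str:
    """Return the structure names whose bits are set in value."""
    return " ".join(name for bit, name in FLAGS if value & bit)


if __name__ == "__main__":
    for value in (0x1, 0x3, 0x7, 0x9):
        print(hex(value), "(", decode_struct(value), ")")
```

Read this way, 0x7 says the word was found in the title inside the head, and 0x9 says it was found in the body.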

Now enable libxml2 parser warnings:

> cat c
Defaultcontents HTML2
ParserWarnLevel 9
IndexComments yes
> ./swish-e -c c -i 1.html -v 0 -T indexed_words 
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'titleword'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
1.html:3: error: htmlParseStartTag: invalid element name
<% comment %>
 ^
    Adding:[1:swishdefault(1)]   'foo'   Pos:5  Stuct:0x9 ( BODY FILE )
Indexing done!
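
In the runs above neither parser indexes the words inside <?php ... ?> or <% ... %>. If your setup does pick them up, one possible workaround is to strip those blocks from each file before swish-e sees it. The following is a rough Python sketch of such a pre-filter, not anything swish-e provides; strip_server_code is a hypothetical name, and how you hook it into indexing is left open. The regex is naive: it does not handle an unclosed <?php at the end of a file, and it stops at the first ?> or %> even when that sequence sits inside a string literal.

```python
import re

# Matches <?php ... ?>, short <? ... ?> and ASP-style <% ... %> blocks.
# DOTALL lets a block span several lines; .*? keeps each match minimal.
SERVER_CODE = re.compile(r"<\?.*?\?>|<%.*?%>", re.DOTALL)


def strip_server_code(html: str) -> str:
    """Replace every server-side code block with a single space."""
    return SERVER_CODE.sub(" ", html)


if __name__ == "__main__":
    page = (
        "<html><head><title>PHP Test</title></head>\n"
        "<body>\n"
        '<?php echo "Hello World<p>"; ?>\n'
        "</body></html>\n"
    )
    print(strip_server_code(page))
```

Replacing each block with a space rather than an empty string keeps the words on either side of it from running together.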




Bill Moseley
mailto:moseley@hank.org
Received on Wed Dec 19 14:14:24 2001