Re: Getting older .cgi scripts to work with

From: Andrew Lord <andrewlord(at)not-real.internode.on.net>
Date: Mon Jun 10 2002 - 15:21:59 GMT
On Monday 10 June 2002 23:01, you wrote:
> At 05:38 AM 06/10/02 -0700, Andrew Lord wrote:
> >In the meantime, I've been working on getting swishdev to index my .php
> >files via the http method.  No problems creating the index but a search of
> >the index only provides a result when looking for words contained in the
> >title of the .php file.  Those contained in any MetaNames are not found at
> >search.

> Best way to get help is to provide some samples:

Hi Bill,

No worries.  Example follows below:

bz23.php generates the following HTML:
-------------------------------------------------------
<html>
<body>
<a href="/bz23.php?id=1">testdoc orderin system</a><br>
<a href="/bz23.php?id=2">nodoc diorder anarch</a><br>
</body>
</html>
---------------------------------------------

bz23.php?id=1 generates the following content from a MySQL database:
---------------------------------------------
<html>

<head>
<title>Crazy Thang</title>
<meta NAME="Meta1" VALUE="testword, aardvark">
<meta NAME="Meta2" VALUE="nothing at all">

</head>
<body BACKGROUND="bkgnd.gif" BGCOLOR="#FFFFFF">
<p><a HREF="#Anchor"><font SIZE="2" FACE="Arial">
etc..

bz23.php?id=2 generates the following content from a MySQL database:
---------------------------------------------
<html>

<head>
<title>Sane Yang</title>
<meta NAME="Meta1" VALUE="">
<meta NAME="Meta2" VALUE="">

</head>
<body BACKGROUND="bkgnd.gif" BGCOLOR="#FFFFFF">
<p><a HREF="#Anchor"><font SIZE="2" FACE="Arial">
etc..
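
For anyone wanting to check what a generated page exposes in its head section, here is a minimal sketch using Python's standard html.parser module. It only lists each meta tag with its attributes. The names MetaCollector, list_meta_tags and SAMPLE are made up for this illustration and are not part of swish-e; SAMPLE is a trimmed copy of the id=1 page shown above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values are kept as-is.
        if tag == "meta":
            self.metas.append(dict(attrs))


def list_meta_tags(html_text):
    """Return one dict of attributes per <meta> tag, in document order."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.metas


# Trimmed copy of the bz23.php?id=1 output (hypothetical test fixture).
SAMPLE = """<html>
<head>
<title>Crazy Thang</title>
<meta NAME="Meta1" VALUE="testword, aardvark">
<meta NAME="Meta2" VALUE="nothing at all">
</head>
<body></body>
</html>"""

if __name__ == "__main__":
    for meta in list_meta_tags(SAMPLE):
        print(meta)
```

Running it against the real page output instead of SAMPLE would show exactly which attribute names and values the indexer is being handed.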

Swish ====> swish-e-2.1-dev-10-06-02

---------swishdev.conf---------------------------------------
IndexComments yes
ReplaceRules replace "/home/httpd/html/" "//localhost.localdomain/"
MinWordLimit 3
WordCharacters abcdefghijklmnopqrstuvwxyz&SŲ0123456789_\|/-+=?!@$%^'
IgnoreLimit 50 1000
IndexComments 0
IndexReport 1
IndexName swishdev
IndexFile swishdev.swish
IndexDir http://localhost.localdomain/bz23.php
EquivalentServer http://localhost.localdomain/
MaxDepth 10
Delay 1
SpiderDirectory /home/httpd/html/swishdev/src
------------------------------------------------------------------

Indexing is performed at the command line as follows:

/home/httpd/html/swishdev/src/swish-e -S http \
    -c /home/httpd/html/indexes/swishdev.conf \
    -f /home/httpd/html/indexes/swishdev.swish -T indexed_words

Indexing Data Source: "HTTP-Crawler"
Indexing "http://localhost.localdomain/bz23.php"
 Adding:[1:swishdefault(1)] 	'testdoc'	Pos:1	Stuct:0x1 ( FILE )
 Adding:[1:swishdefault(1)] 	'orderin'	Pos:2	Stuct:0x1 ( FILE )
 Adding:[1:swishdefault(1)] 	'system'	Pos:3	Stuct:0x1 ( FILE )
 Adding:[1:swishdefault(1)] 	'nodoc'	Pos:4	Stuct:0x1 ( FILE )
 Adding:[1:swishdefault(1)] 	'diorder'	Pos:5	Stuct:0x1 ( FILE )
 Adding:[1:swishdefault(1)] 	'anarch'	Pos:6	Stuct:0x1 ( FILE )
 Adding:[2:swishdefault(1)] 	'crazy'	Pos:1	Stuct:0x7 ( HEAD TITLE FILE )
 Adding:[2:swishdefault(1)] 	'thang'	Pos:2	Stuct:0x7 ( HEAD TITLE FILE )
 Adding:[3:swishdefault(1)] 	'sane'	Pos:1	Stuct:0x7 ( HEAD TITLE FILE )
 Adding:[3:swishdefault(1)] 	'yang'	Pos:2	Stuct:0x7 ( HEAD TITLE FILE )
Removing very common words. . .
 Getting IgnoreLimit stopwords: Complete
no words removed.
Writing main index. . .
Sorting words . . .
Sorting 10 words alphabetically
Writing header . . .
Writing index entries . . .
 Writing word text: Complete
 Writing word hash: Complete
 Writing word data: Complete
10 unique words indexed.
2 properties sorted.
3 files indexed. 3844 totalbytes. 10 total words.
Elapsed time: 00:00:03 CPU time: 00:00:00
Indexing done!
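
The trace above can also be checked mechanically. The following is a rough sketch, assuming the "Adding:" lines always look like the ones shown here (the regular expression is inferred from this output only, not from any swish-e documentation). It groups the indexed words by file number and metaname, which makes it easy to see whether a given word was indexed at all. The names ADDING_RE, words_by_file and TRACE are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
#  Adding:[2:swishdefault(1)] 	'crazy'	Pos:1	Stuct:0x7 ( HEAD TITLE FILE )
# The pattern is an assumption based on the trace in this message.
ADDING_RE = re.compile(
    r"Adding:\[(?P<file>\d+):(?P<meta>\w+)\(\d+\)\]\s+'(?P<word>[^']*)'"
)


def words_by_file(trace_text):
    """Group indexed words by (file number, metaname)."""
    found = defaultdict(list)
    for line in trace_text.splitlines():
        match = ADDING_RE.search(line)
        if match:
            key = (int(match.group("file")), match.group("meta"))
            found[key].append(match.group("word"))
    return dict(found)


# A few lines copied from the trace above (tabs written as \t).
TRACE = """\
 Adding:[1:swishdefault(1)] \t'testdoc'\tPos:1\tStuct:0x1 ( FILE )
 Adding:[2:swishdefault(1)] \t'crazy'\tPos:1\tStuct:0x7 ( HEAD TITLE FILE )
 Adding:[2:swishdefault(1)] \t'thang'\tPos:2\tStuct:0x7 ( HEAD TITLE FILE )
"""

if __name__ == "__main__":
    for (file_no, meta), words in sorted(words_by_file(TRACE).items()):
        print(file_no, meta, words)
```

In the full trace, every word is filed under swishdefault, and none of the words from the Meta1 or Meta2 tags appear.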

Searching for the word "testword" was done as follows:

/home/httpd/html/swishdev/src/swish-e \
    -f /home/httpd/html/indexes/swishdev.swish -w testword

Result was:

# SWISH format: 2.1-dev-25
# Search words: testword
err: no results

Searching for the word "crazy" was done as follows:

/home/httpd/html/swishdev/src/swish-e \
    -f /home/httpd/html/indexes/swishdev.swish -w crazy

Result was:

# SWISH format: 2.1-dev-25
# Search words: crazy
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.043 seconds
1000 http://localhost.localdomain/bz23.php?id=1 "testdoc orderin system" 
________________________________________________________________

Please let me know if you need any further information to help pinpoint the
problem with the indexing of MetaNames.

Cheers,

Andrew Lord
Received on Mon Jun 10 15:25:47 2002