Re: contract work for a site search utility

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Mar 03 2004 - 21:05:23 GMT

On Wed, Mar 03, 2004 at 12:31:14PM -0800, Gil Vidals wrote:
> Well if it will only take you a minute, could you do it cheaper than by the
> day ;-) If it's as easy as you say, can you just show me how this is done?

How what is done?  Indexing?

moseley@bumby:~$ cat c
HTMLLinksMetaName links

moseley@bumby:~$ cat 1.html
<html>
<head>
<title>Title</title>
</head>
<body>

text <a href="http://www.abc.com">abc site</a>

</body>

moseley@bumby:~$ swish-e -c c -i 1.html -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:links(10)]   'http'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:links(10)]   'www'   Pos:6  Stuct:0x9 ( BODY FILE )
    Adding:[1:links(10)]   'abc'   Pos:7  Stuct:0x9 ( BODY FILE )
    Adding:[1:links(10)]   'com'   Pos:8  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'text'   Pos:9  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'abc'   Pos:10  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'site'   Pos:11  Stuct:0x9 ( BODY FILE )

OK, so the link http://www.abc.com was indexed as four words (http, www,
abc, com).  That can be changed with WordCharacters, but I like being able
to search for "abc.com" and still find it.
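
To illustrate why a phrase search for "abc.com" still finds the link, here
is a small Python sketch.  It is only an approximation: it assumes words are
runs of letters and digits, which is not necessarily swish-e's real
tokenizer, and split_words and phrase_matches are made-up helper names.

```python
import re


def split_words(text):
    """Split text into lowercase runs of letters and digits.

    Assumption: this only approximates how the URL was broken up in the
    indexing output shown above; it is not swish-e's own code.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase, indexed_text):
    """Return True if the words of phrase appear consecutively in indexed_text."""
    wanted = split_words(phrase)
    words = split_words(indexed_text)
    if not wanted:
        return False
    last_start = len(words) - len(wanted)
    return any(words[i:i + len(wanted)] == wanted for i in range(last_start + 1))


if __name__ == "__main__":
    print(split_words("http://www.abc.com"))
    print(phrase_matches("abc.com", "http://www.abc.com"))
```

Because the dot is not part of a word here, "abc.com" becomes the two-word
phrase "abc com", which sits inside the four words of the full URL.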

So to search:

moseley@bumby:~$ swish-e -w 'links=("www.abc.com")' -H0
1000 1.html "Title" 108

> It should search <a href> tags; however, javascript links should be searched
> as well.

All bets are off with javascript.  You need a javascript interpreter to
figure that out.  If the links are simple you could filter the files and
convert the javascript links into something that swish-e can index (e.g.
convert each one to a meta tag).
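
A rough sketch of such a filter, in Python.  Everything specific here is an
assumption: it only recognises the simple pattern location = 'http://...'
(optionally window.location or location.href), the function name
add_link_meta is made up, and the meta name "links" would still have to be
set up as a metaname in the swish-e config (for example with MetaNames)
before swish-e would index it.

```python
import re
from html import escape

# Assumption: "simple" javascript links look like
#   window.location = 'http://...'   or   location.href = "http://..."
JS_URL = re.compile(
    r"""(?:window\.)?location(?:\.href)?\s*=\s*(['"])(https?://[^'"]+)\1"""
)


def add_link_meta(html_text, meta_name="links"):
    """Return html_text with one <meta> tag added per javascript URL found."""
    urls = []
    for match in JS_URL.finditer(html_text):
        url = match.group(2)
        if url not in urls:  # keep first-seen order, drop repeats
            urls.append(url)
    if not urls:
        return html_text

    tags = "".join(
        '<meta name="{}" content="{}">\n'.format(meta_name, escape(u, quote=True))
        for u in urls
    )
    # Put the tags just before </head>; if there is no </head>, prepend them.
    new_text, count = re.subn(
        r"(?i)</head>", lambda m: tags + m.group(0), html_text, count=1
    )
    return new_text if count else tags + html_text


if __name__ == "__main__":
    page = (
        "<html><head><title>T</title></head><body>"
        "<a href=\"#\" onclick=\"window.location='http://www.123.com/x'\">go</a>"
        "</body></html>"
    )
    print(add_link_meta(page))
```

Anything more involved than this (URLs built from variables, links written
by document.write, and so on) will not be caught by a regular expression.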

You can use the included swish.cgi or search.cgi examples for creating a
search interface.  Look at http://search.apache.org/ -- it has a way to
search "HTML Links".

> The code should search the entire site up to N pages deep.

Filter the results by the number of path segments in each file name or URL.
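
A minimal sketch of that filtering in Python, assuming result lines shaped
like the one in the search example above (rank, path, quoted title, size)
and paths that contain no spaces.  The names path_depth and filter_by_depth
are made up for the example.  Note that this counts directory depth in the
path, which is only a stand-in for how many links deep a page is.

```python
from urllib.parse import urlparse


def path_depth(path_or_url):
    """Count the non-empty path segments, ignoring any scheme and host."""
    path = urlparse(path_or_url).path
    return len([segment for segment in path.split("/") if segment])


def filter_by_depth(result_lines, max_depth):
    """Keep result lines whose path has at most max_depth segments.

    Lines that do not start with a numeric rank (headers, terminators,
    error messages) are skipped.
    """
    kept = []
    for line in result_lines:
        fields = line.strip().split(None, 2)
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        if path_depth(fields[1]) <= max_depth:
            kept.append(line.strip())
    return kept


if __name__ == "__main__":
    lines = [
        '1000 1.html "Title" 108',
        '900 http://www.abc.com/a/b/c.html "Deep page" 200',
    ]
    for hit in filter_by_depth(lines, 2):
        print(hit)
```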


> 
> 
> 
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, March 03, 2004 12:28 PM
> To: Gil Vidals
> Cc: Multiple recipients of list
> Subject: Re: contract work for a site search utility
> 
> 
> On Wed, Mar 03, 2004 at 12:12:27PM -0800, Gil Vidals wrote:
> > I've downloaded and studied Swish-e. My company, Position Research, has a
> > small project which involves locating a given URL on a given website. For
> > example, use Swish-e to see if the url www.123.com is anywhere on the site
> > www.abc.com. If it is, then return the page from www.abc.com where the link
> > to www.123.com was found.
> 
> You mean search href tags?
> 
> > Let me know if you are interested and approximately how many hours of work
> > is required to produce the perl code.
> 
> HTMLLinksMetaName links
> 
> Less than a minute.  But I charge by the day.  Invoice to follow.
> 
> Or do you mean something more custom than that?
> 
> --
> Bill Moseley
> moseley@hank.org
> 
> 
> 
> 
> 

-- 
Bill Moseley
moseley@hank.org
Received on Wed Mar 3 13:05:24 2004