On Sun, Oct 10, 2004 at 04:46:03PM -0700, Mark Greenaway wrote:
> I thought I did show everything. You must mean the output:
I mean everything. Cat all your input files, and show the commands you
run for indexing, their output, and the command you use for testing.
Ok, so in your last email you had (my reformatting):
File: nacl.config
MetaNames outputs organisation strategy domain mission hqcountry countries web email
PropertyNames outputs organisation strategy domain mission hqcountry countries web email
SwishProgParameters nacl.pl
IndexDir spider.pl
IndexFile index.nacl
> /swish-e -c nacl.config -i site4.html -v0 -T properties
Ok, but that's not indexing using spider.pl. That's why I want to see
everything you type.
> swishdocpath: 6 ( 10) S: "site4.html"
> swishdocsize: 8 ( 4) N: "724"
> swishlastmodified: 9 ( 4) D: "2004-10-09 09:51:16 EST"
> outputs:19 ( 43) S: "Papers Journals Newsletters Policy Research"
> organisation:20 ( 5) S: "Site4"
> strategy:21 ( 18) S: "research education"
> domain:22 ( 23) S: "government politics law"
> mission:23 ( 28) S: "To influence decision makers"
> hqcountry:24 ( 9) S: "Australia"
> countries:25 ( 9) S: "Australia"
> web:26 ( 23) S: "http://www.site4.org.au"
> email:27 ( 16) S: "jim@site4.org.au"
Well, that's very odd, isn't it.
moseley@bumby:~$ wget --quiet http://incres.anu.edu.au/site4.html
moseley@bumby:~$ ls -l site4.html
-rw-r--r-- 1 moseley moseley 724 2004-10-07 20:50 site4.html
moseley@bumby:~$ grep hqcountry site4.html
<meta name="hqcountry" content="Australia">
moseley@bumby:~$ cat nacl.config
MetaNames outputs organisation strategy domain mission hqcountry countries web email
PropertyNames outputs organisation strategy domain mission hqcountry countries web email
SwishProgParameters nacl.pl
IndexDir spider.pl
IndexFile index.nacl
moseley@bumby:~$ swish-e -c nacl.config -i site4.html -v0 -T properties
swishdocpath: 6 ( 10) S: "site4.html"
swishtitle: 7 ( 24) S: "Site4 - confusion reigns"
swishdocsize: 8 ( 4) N: "724"
swishlastmodified: 9 ( 4) D: "2004-10-07 20:50:38 PDT"
outputs:19 ( 43) S: "Papers Journals Newsletters Policy Research"
organisation:20 ( 5) S: "Site4"
strategy:21 ( 18) S: "research education"
domain:22 ( 23) S: "government politics law"
mission:23 ( 28) S: "To influence decision makers"
hqcountry:24 ( 9) S: "Australia"
countries:25 ( 9) S: "Australia"
web:26 ( 23) S: "http://www.site4.org.au"
email:27 ( 16) S: "jim@site4.org.au"
Also, note the difference in the file's modification time between your
output and mine. Why the difference?
moseley@bumby:~$ HEAD http://incres.anu.edu.au/site4.html | grep Last-Mod
Last-Modified: Fri, 08 Oct 2004 03:50:38 GMT
moseley@bumby:~$ TZ=UT swish-e -c nacl.config -i site4.html -v0 -T properties | grep swishlast
swishlastmodified: 9 ( 4) D: "2004-10-08 03:50:38 "
But your example has a different time. So now I'm wondering if I'm
indexing the same file you are indexing. I imagine I am, since the
file sizes match, but it's odd that the dates are different.
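The timezone arithmetic can be checked mechanically. Below is a minimal Python sketch (not part of swish-e) that converts the server's Last-Modified time into both zones and measures the gap to the timestamp in Mark's output. It assumes fixed offsets for early October 2004: UTC-7 for US Pacific daylight time, and UTC+10 for the Australian Eastern time that swish-e printed as "EST" (Australian daylight saving had not yet started). The name show() is just an illustrative helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for early October 2004 (before Australian DST began):
PDT = timezone(timedelta(hours=-7), "PDT")   # US Pacific daylight time
AEST = timezone(timedelta(hours=10), "EST")  # Australian Eastern, printed as "EST"

# Last-Modified header returned by the web server for site4.html.
server_time = datetime(2004, 10, 8, 3, 50, 38, tzinfo=timezone.utc)

# swishlastmodified value from Mark's output, read as UTC+10 (assumption).
marks_time = datetime(2004, 10, 9, 9, 51, 16, tzinfo=AEST)


def show(dt, tz):
    """Render an aware datetime in tz, in the style swish-e prints dates."""
    return dt.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S %Z")


if __name__ == "__main__":
    print("server time as PDT:", show(server_time, PDT))
    print("server time as EST:", show(server_time, AEST))
    print("Mark's timestamp:  ", show(marks_time, AEST))
    # Subtracting aware datetimes compares them in UTC.
    print("difference:        ", marks_time - server_time)
```

Read as PDT, the server time reproduces the 2004-10-07 20:50:38 PDT shown above. Read as UTC+10 it becomes 2004-10-08 13:50:38 EST, still about 20 hours before the 2004-10-09 09:51:16 EST in Mark's output, so a timezone setting alone would not explain the mismatch under these assumptions.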
Anyway, you can see how it works for me. If I were you, I'd try it
from another machine. Your description of the installation seems more
complex than needed; make install should be all you need. There's no
need to copy files from the src directory anywhere.
moseley@bumby:~$ md5sum site4.html
9595829207c6a4c5890206becaf8ad68 site4.html
moseley@bumby:~$ cat site4.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<meta name="organisation" content="Site4">
<meta name="strategy" content="research education">
<meta name="domain" content="government politics law">
<meta name="outputs" content="Papers Journals Newsletters Policy Research">
<meta name="countries" content="Australia">
<meta name="hqcountry" content="Australia">
<meta name="mission" content="To influence decision makers">
<meta name="web" content="http://www.site4.org.au">
<meta name="email" content="jim@site4.org.au">
<TITLE>Site4 - confusion reigns</TITLE>
</HEAD>
<BODY>
<H1>Site4 - NACL Matrix test site</H1>
<hr>
<a href="http://incres.anu.edu.au/nacl/index.html">link</a>
<hr>
</BODY>
</HTML>
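The checks above (md5sum, grep, cat) can be rolled into one script that either side can run on their own copy of the page. This is a minimal Python sketch, assuming a local copy saved as site4.html; the helper names are hypothetical. It reads the meta tags with Python's html.parser, not with swish-e's own HTML parser, so it only confirms the input file, not how the indexer treats it.

```python
import hashlib
import os
from html.parser import HTMLParser

# The MetaNames line from nacl.config.
META_NAMES = ("outputs organisation strategy domain mission "
              "hqcountry countries web email").split()


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag == "meta":
            a = dict(attrs)
            if a.get("name"):
                self.metas[a["name"].lower()] = a.get("content") or ""


def collect_metas(html_text):
    """Return a dict mapping meta name -> content."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.metas


def missing_meta_names(metas, wanted=META_NAMES):
    """Return configured MetaNames that have no matching <meta> tag."""
    return [name for name in wanted if name not in metas]


def md5_hex(data):
    """Same hex digest that md5sum prints, for comparing copies of a file."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    path = "site4.html"  # assumed local copy, e.g. fetched with wget
    if os.path.exists(path):
        with open(path, "rb") as fh:
            raw = fh.read()
        print("md5:", md5_hex(raw))
        metas = collect_metas(raw.decode("latin-1"))
        for name in META_NAMES:
            print(f"{name}: {metas.get(name, '<missing>')!r}")
        print("missing:", missing_meta_names(metas) or "none")
    else:
        print(f"{path} not found")
```

If the digest matches 9595829207c6a4c5890206becaf8ad68 and nothing is reported missing, both machines are indexing the same input, and any difference in the swish-e output comes from the setup rather than the file.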
--
Bill Moseley
moseley@hank.org
Unsubscribe from or help with the swish-e list:
http://swish-e.org/Discussion/
Help with Swish-e:
http://swish-e.org/current/docs
swish-e@sunsite.berkeley.edu
Received on Sun Oct 10 17:14:20 2004