On Tue, Mar 20, 2007 at 07:19:42PM -0500, Matthew Stanislawski wrote:
> Here's the output of my script, for this particular document:
> http://mattstan.net/spew.out
Do you see problems when you run it like this:
cat spew.out | swish-e -i stdin -S prog -T indexed_words -c c
See my output below.
> Getting a lot of errors like this:
>
> https://opcenter-test.cso.uiuc.edu/doc/DOORCODES:33: error: Entity
> 'nbsp' not defined
Well, it's an XML doc, so I suspect you have to define the entities.
You should probably be using the XML2 parser, too:

    DefaultContents XML2
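
If fixing the script that generates the document isn't practical, one workaround is to rewrite the entity before swish-e sees it. This is only a sketch, not something swish-e provides: fix_nbsp is a hypothetical helper name, and it assumes &nbsp; is the only undefined entity in the stream. It swaps in the numeric reference &#160;, which is the same six bytes long, so any Content-Length headers in the -S prog output should stay valid.

```shell
# Hypothetical helper (not part of swish-e): replace the HTML-only entity
# &nbsp; with the numeric character reference &#160;, which an XML parser
# accepts without a DTD. Both strings are 6 bytes, so byte counts are unchanged.
fix_nbsp() {
    # In a sed replacement a bare & means "the matched text", so escape it.
    sed 's/&nbsp;/\&#160;/g'
}

# Small demonstration on a one-line sample.
printf '<td>Door&nbsp;Name</td>\n' | fix_nbsp
```

With that in place, something like `fix_nbsp < spew.out | swish-e -i stdin -S prog -T indexed_words -c c` can be compared against the original run to see whether the entity error goes away.
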
> Given the first error above, that's what it looks like. I didn't
> install libxml2 myself, however, so I don't exactly know my way around
> it. How can I troubleshoot libxml2 directly?
Can you check what version of libxml2 you have installed?
I'm on Debian:
moseley@bumby:~/WS2/lib/WS2/C$ apt-cache policy aspell
aspell:
  Installed: 0.60.5-1
  Candidate: 0.60.5-1
  Version table:
 *** 0.60.5-1 0
        500 http://128.101.240.212 unstable/main Packages
        100 /var/lib/dpkg/status
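
On a system without apt, something along these lines should report the libxml2 version. It's a sketch that assumes xml2-config (shipped with the libxml2 development files) or xmllint is on your PATH; libxml2_version is a hypothetical wrapper name, not a standard command.

```shell
# Hypothetical wrapper: report the libxml2 version using whichever
# helper tool happens to be installed.
libxml2_version() {
    if command -v xml2-config >/dev/null 2>&1; then
        # Prints the library version on stdout.
        xml2-config --version
    elif command -v xmllint >/dev/null 2>&1; then
        # xmllint writes its version banner to stderr, so merge the streams.
        xmllint --version 2>&1 | head -n 1
    else
        echo "libxml2 helper tools not found"
    fi
}

libxml2_version
```
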
You can compare with this:
moseley@bumby:~$ cat spew.out | swish-e -i stdin -S prog -T indexed_words -c c
Indexing Data Source: "External-Program"
Indexing "stdin"
Adding:[1:swishdocpath(14)] 'https' Pos:1 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'opcent' Pos:2 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'test' Pos:3 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'cso' Pos:4 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'uiuc' Pos:5 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'edu' Pos:6 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'doc' Pos:7 Stuct:0x1 ( FILE )
Adding:[1:swishdocpath(14)] 'doorcod' Pos:8 Stuct:0x1 ( FILE )
Adding:[1:source(20)] 'doc' Pos:3 Stuct:0x1 ( FILE )
Adding:[1:source(20)] '5' Pos:4 Stuct:0x1 ( FILE )
Adding:[1:owner(21)] 'monnin' Pos:9 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] '1170455836' Pos:13 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'opcent' Pos:16 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'door' Pos:17 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'system' Pos:18 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'door' Pos:19 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'inform' Pos:20 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'door' Pos:22 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'name' Pos:23 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'connect' Pos:24 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'room' Pos:25 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'access' Pos:26 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'via' Pos:27 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'sensor' Pos:28 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'descript' Pos:29 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'm1' Pos:30 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] '1452' Pos:31 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'and' Pos:32 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] '1440' Pos:33 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'prox' Pos:34 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'card' Pos:35 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'yes' Pos:36 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'glass' Pos:37 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'doubl' Pos:38 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'door' Pos:39 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'between' Pos:40 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'opcent' Pos:41 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'and' Pos:42 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'datacent' Pos:43 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'main' Pos:44 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'entranc' Pos:45 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'into' Pos:46 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'datacent' Pos:47 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'm2' Pos:48 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] '1440' Pos:49 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'and' Pos:50 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] '1420' Pos:51 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'prox' Pos:52 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'card' Pos:53 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'yes' Pos:54 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'metal' Pos:55 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'singl' Pos:56 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'door' Pos:57 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'between' Pos:58 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'datacent' Pos:59 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'and' Pos:60 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'help' Pos:61 Stuct:0x1 ( FILE )
Adding:[1:excerpt(23)] 'des' Pos:62 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'request' Pos:69 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'review' Pos:70 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'main' Pos:75 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'tool' Pos:78 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'doormon' Pos:81 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'opcent' Pos:85 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'door' Pos:86 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'system' Pos:87 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'door' Pos:88 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'inform' Pos:89 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'door' Pos:94 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'door' Pos:101 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'name' Pos:102 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'connect' Pos:107 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'room' Pos:108 Stuct:0x1 ( FILE )
https://opcenter-test.cso.uiuc.edu/doc/DOORCODES:33: error: Entity 'nbsp' not defined
body><tr><td>Door Name<br/></td><td>Connects Rooms<br/></td><td>Access Via
^
Adding:[1:details(15)] 'access' Pos:112 Stuct:0x1 ( FILE )
Adding:[1:details(15)] 'via' Pos:113 Stuct:0x1 ( FILE )
--
Bill Moseley
moseley@hank.org
Received on Tue Mar 20 20:54:41 2007