Re: [swish-e] Output not readable

From: Saubhagya Srivastava <saubhagya777(at)not-real.gmail.com>
Date: Mon May 19 2008 - 10:25:15 GMT
Hi,

I believe http://validator.w3.org/ validates against XHTML. My HTML files
open fine in a browser and the tags are intact. The validator at
http://validator.w3.org/ reports errors even for many sites that work
perfectly well.

My main problems are:

1. The output is not readable. Is there a program to read that output?

2. I cannot index a website; swish-e says: No such file or directory.

If possible, please send me a complete example.

PROBLEM DESCRIPTION is as follows:

I'm using swish-e version 2.

I want to build an index for a website, and later for more than one
website, but that does not work when I put a URL in the IndexDir option.
Also, I can't find any spider executable to spider the web in a Windows
environment. Following is the output:

------------------------------------------------------------------------------------------------------------

C:\SWISH-E\bin>swish-e.exe -c Config_http.txt

Indexing Data Source: "File-System"

Indexing "http://www.download.com"

Warning: Invalid path 'http://www.download.com': No such file or directory

Removing very common words...

no words removed.

Writing main index...

err: No unique words indexed!

.

------------------------------------------------------------------------------------------------------------


Another config.txt is attached that indexes local HTML files, but it gives
tag errors such as "';' is expected" at some places in an HTML file, whereas
the HTML file is perfect and opens in a browser. Following is the output:


-------------------------------------------------------------------------------------------------------

C:\SWISH-E\bin>swish-e.exe -c Config.txt

Indexing Data Source: "File-System"

Indexing "D:HTML_files"

Checking dir "D:HTML_files"...

  Guide.html - Using DEFAULT (HTML2) parser - D:HTML_files/Guide.html:1374:
error: htmlParseEntityRef: expecting ';'

                write &lt;&lt SettingsVersion &lt;&lt volume &lt;&lt
balance;

                             ^

D:HTML_files/Guide.html:1374: error: htmlParseEntityRef: expecting ';'

                write &lt;&lt SettingsVersion &lt;&lt volume &lt;&lt
balance;

                                                     ^

D:HTML_files/Guide.html:1374: error: htmlParseEntityRef: expecting ';'

                write &lt;&lt SettingsVersion &lt;&lt volume &lt;&lt
balance;

                                                                    ^

D:HTML_files/Guide.html:1376: error: htmlParseEntityRef: no name

        catch( ConfigException & )

                                ^

D:HTML_files/Guide.html:1399: error: htmlParseEntityRef: no name

        catch( ConfigException & )

                                ^

(23896 words)

  Slide105.html - Using DEFAULT (HTML2) parser -  (210 words)

Removing very common words...

no words removed.

Writing main index...

Sorting words ...

Sorting 2,011 words alphabetically

Writing header ...

Writing index entries ...

  Writing word text: Complete

  Writing word hash: Complete

  Writing word data: Complete

2,011 unique words indexed.

4 properties sorted.

2 files indexed.  214,655 total bytes.  24,106 total words.

Elapsed time: 00:00:00 CPU time: 00:00:00

Indexing done!

-------------------------------------------------------------------------------------------------------
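
The htmlParseEntityRef messages in the log above point at `&lt` written without its closing `;` and at a bare `&` in `catch( ConfigException & )`. Browsers are lenient about both, which is why the page still displays. A minimal Python sketch for listing such spots in a file follows; it is not part of swish-e, the function name `find_bad_entities` is made up for illustration, and the pattern only checks the shape of a reference (name or number followed by `;`), not whether the entity name is a known one.

```python
import re

# An '&' that does NOT start a well-formed reference such as &lt; &#60; or &#x3C;
BAD_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def find_bad_entities(text):
    """Return (line_number, column, line) for every suspicious '&' in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BAD_AMP.finditer(line):
            # Columns are reported 1-based, like the line numbers.
            hits.append((lineno, match.start() + 1, line))
    return hits


if __name__ == "__main__":
    # Two lines taken from the log above, used here as sample input.
    sample = (
        "write &lt;&lt SettingsVersion &lt;&lt volume\n"
        "catch( ConfigException & )"
    )
    for lineno, col, line in find_bad_entities(sample):
        print(f"{lineno}:{col}: {line}")
```

Writing `&lt;` in full and replacing a literal `&` with `&amp;` at the reported positions should make these particular messages go away; the words were indexed despite them, as the word counts in the log show.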


*In this case the generated output is not readable; it contains something
like Japanese characters. This is the main problem.*

*Expected solution:*

Please send me a sample config.txt (configuration file) and the EXEs/DLLs
needed to index a website. I'm using a Windows XP SP2 environment.


On Fri, May 16, 2008 at 12:07 PM, William Conlon <bill@tothept.com> wrote:

> please post your questions within the email, so the question and
> answers can be included in the searchable archive.
>
> Your question 2:  this shows HTML parsing errors from invalid HTML
> syntax.  Validate your pages (if you are bothered by these parsing
> errors), for example using http://validator.w3.org/.
>
>
> On May 15, 2008, at 11:30 PM, Saubhagya Srivastava wrote:
>
> > Hi,
> >
> > The problem description and config files are attached.
> > Please give a solution to my problems.
> >
> > Even the output is not readable, it gives some japanese characters,
> > whereas HTML files are English, ANSI.
> >
> > Regards,
> > Saubhagya
> > <Config.txt><Config_http.txt><Problem Description.doc>
> > _______________________________________________
> > Users mailing list
> > Users@lists.swish-e.org
> > http://lists.swish-e.org/listinfo/users
>


Received on Mon May 19 06:22:45 2008