Hi,
I understand that http://validator.w3.org/ validates XHTML. My HTML files
open fine in a browser and the tags are intact. The validator at
http://validator.w3.org/ reports errors even for many sites that are
running perfectly well.
My main problems are:
1. The output is not readable. Is there a program to read that output?
2. I cannot index a website; swish-e says: No such file or directory.
If possible, please send me a complete example.
PROBLEM DESCRIPTION is as follows:
I'm using swish-e version 2.
I want to build an index for one website, and later for more than one, but
that does not work when I put a URL in the IndexDir option. Also, I can't
find any spider executable to crawl the web in a Windows environment. The
following is the output:
------------------------------------------------------------------------------------------------------------
C:\SWISH-E\bin>swish-e.exe -c Config_http.txt
Indexing Data Source: "File-System"
Indexing "http://www.download.com"
Warning: Invalid path 'http://www.download.com': No such file or directory
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.
------------------------------------------------------------------------------------------------------------
Another config.txt is attached that indexes HTML files, but it reports
tag errors such as "expecting ';'" at some places in the HTML file, whereas
the HTML file is perfect and opens in a browser. The following is the output:
-------------------------------------------------------------------------------------------------------
C:\SWISH-E\bin>swish-e.exe -c Config.txt
Indexing Data Source: "File-System"
Indexing "D:HTML_files"
Checking dir "D:HTML_files"...
Guide.html - Using DEFAULT (HTML2) parser - D:HTML_files/Guide.html:1374:
error: htmlParseEntityRef: expecting ';'
write << SettingsVersion << volume <<
balance;
^
D:HTML_files/Guide.html:1374: error: htmlParseEntityRef: expecting ';'
write << SettingsVersion << volume <<
balance;
^
D:HTML_files/Guide.html:1374: error: htmlParseEntityRef: expecting ';'
write << SettingsVersion << volume <<
balance;
^
D:HTML_files/Guide.html:1376: error: htmlParseEntityRef: no name
catch( ConfigException & )
^
D:HTML_files/Guide.html:1399: error: htmlParseEntityRef: no name
catch( ConfigException & )
^
(23896 words)
Slide105.html - Using DEFAULT (HTML2) parser - (210 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,011 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
2,011 unique words indexed.
4 properties sorted.
2 files indexed. 214,655 total bytes. 24,106 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
-------------------------------------------------------------------------------------------------------
*In this case the generated output is not readable; it contains what look
like Japanese characters. This is the main problem.*
*Expected solution:*
Please send me a sample config.txt (configuration file) and the EXEs/DLLs
needed to index a website. I'm using a Windows XP SP2 environment.
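
Note on the htmlParseEntityRef messages in the log above: messages of this
kind are typically reported when an "&" in the page does not begin a complete
entity reference such as "&amp;" (for example the bare "&" in
"catch( ConfigException & )"). Browsers tolerate this, which is why the page
still displays. The following is only a minimal sketch in Python, not part of
swish-e, for listing such lines in a page before indexing; the function name
find_bare_ampersands and the sample text are hypothetical, and the pattern is
an approximation of what the parser checks.

```python
import re

# An "&" that is NOT followed by a complete entity reference
# (named like &amp;, decimal like &#38;, or hex like &#x26;).
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def find_bare_ampersands(text):
    """Return (line_number, stripped_line) pairs for lines with a bare '&'."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if BARE_AMP.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the HTML file's text instead.
    sample = "catch( ConfigException & )\ncatch( ConfigException &amp; )\n"
    for number, line in find_bare_ampersands(sample):
        print(f"{number}: {line}")
```

Replacing each reported "&" with "&amp;" in the HTML source should make that
line acceptable to a validator as well.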
On Fri, May 16, 2008 at 12:07 PM, William Conlon <bill@tothept.com> wrote:
> please post your questions within the email, so the question and
> answers can be included in the searchable archive.
>
> Your question 2: this shows HTML parsing errors from invalid HTML
> syntax. Validate your pages (if you are bothered by these parsing
> errors), for example using http://validator.w3.org/.
>
>
> On May 15, 2008, at 11:30 PM, Saubhagya Srivastava wrote:
>
> > Hi,
> >
> > The problem description and config files are attached.
> > Please give a solution to my problems.
> >
> > Even the output is not readable; it shows some Japanese characters,
> > whereas the HTML files are English, ANSI.
> >
> > Regards,
> > Saubhagya
> > <Config.txt><Config_http.txt><Problem Description.doc>
> > _______________________________________________
> > Users mailing list
> > Users@lists.swish-e.org
> > http://lists.swish-e.org/listinfo/users
>
Received on Mon May 19 06:22:45 2008