
Re: [swish-e] Output not readable

From: Saubhagya Srivastava <saubhagya777(at)not-real.gmail.com>
Date: Mon May 19 2008 - 10:53:13 GMT
Hi,

> I suppose http://validator.w3.org/ validates against XHTML. My HTML files
> open easily in a browser and the tags are intact. The validator at
> http://validator.w3.org/ reports errors even for many sites that
> run perfectly well.
>
> My main problems are:
>
> 1. The output is not readable. Is there a program to read that output?
>
> 2. I cannot index a website; swish-e says: No such file or directory.
>
> If possible, please send me a complete example.
>
> PROBLEM DESCRIPTION is as follows:
>
> I'm using swish-e version 2.
>
> I want to build an index for one website, and later for more than one, but
> that does not work when I put a URL in the IndexDir option. I also can't
> find any spider executable to crawl the web in a Windows environment. The
> output follows:
>
>
> ------------------------------------------------------------------------------------------------------------
>
> C:\SWISH-E\bin>swish-e.exe -c Config_http.txt
>
> Indexing Data Source: "File-System"
>
> Indexing "http://www.download.com"
>
> Warning: Invalid path 'http://www.download.com': No such file or directory
>
>
> Removing very common words...
>
> no words removed.
>
> Writing main index...
>
> err: No unique words indexed!
>
> .
>
>
> ------------------------------------------------------------------------------------------------------------
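The log above names the data source as "File-System" and then rejects the URL as an invalid path, which suggests the IndexDir value was looked up as a local file or directory. A minimal Python sketch of that distinction follows; it is not part of swish-e, and `looks_like_url` is a hypothetical helper introduced only for illustration.

```python
from urllib.parse import urlparse


def looks_like_url(index_dir: str) -> bool:
    """Return True when the value is an http(s) URL rather than a local path."""
    parsed = urlparse(index_dir)
    # A Windows drive prefix such as "D:" parses as a one-letter scheme,
    # so only http and https with a host name count as URLs here.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# The two IndexDir values that appear in the logs of this message.
for value in ("http://www.download.com", "D:HTML_files"):
    kind = "URL, cannot be opened as a file" if looks_like_url(value) else "local path"
    print(f"{value!r}: {kind}")
```

A check like this only classifies the value; fetching the pages still needs a separate crawling step before a file-system index can see them.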
>
>
> Another config.txt is attached that indexes local HTML files, but it reports
> tag errors such as "expecting ';'" at some places in an HTML file, although
> the HTML file seems perfect and opens in a browser. The output follows:
>
>
>
> -------------------------------------------------------------------------------------------------------
>
> C:\SWISH-E\bin>swish-e.exe -c Config.txt
>
> Indexing Data Source: "File-System"
>
> Indexing "D:HTML_files"
>
> Checking dir "D:HTML_files"...
>
>   Guide.html - Using DEFAULT (HTML2) parser - D:HTML_files/Guide.html:1374:
> error: htmlParseEntityRef: expecting ';'
>
>                 write &lt;&lt SettingsVersion &lt;&lt volume &lt;&lt
> balance;
>
>                              ^
>
> D:HTML_files/Guide.html:1374: error: htmlParseEntityRef: expecting ';'
>
>                 write &lt;&lt SettingsVersion &lt;&lt volume &lt;&lt
> balance;
>
>                                                      ^
>
> D:HTML_files/Guide.html:1374: error: htmlParseEntityRef: expecting ';'
>
>                 write &lt;&lt SettingsVersion &lt;&lt volume &lt;&lt
> balance;
>
>                                                                     ^
>
> D:HTML_files/Guide.html:1376: error: htmlParseEntityRef: no name
>
>         catch( ConfigException & )
>
>                                 ^
>
> D:HTML_files/Guide.html:1399: error: htmlParseEntityRef: no name
>
>         catch( ConfigException & )
>
>                                 ^
>
> (23896 words)
>
>   Slide105.html - Using DEFAULT (HTML2) parser -  (210 words)
>
> Removing very common words...
>
> no words removed.
>
> Writing main index...
>
> Sorting words ...
>
> Sorting 2,011 words alphabetically
>
> Writing header ...
>
> Writing index entries ...
>
>   Writing word text: Complete
>
>   Writing word hash: Complete
>
>   Writing word data: Complete
>
> 2,011 unique words indexed.
>
> 4 properties sorted.
>
> 2 files indexed.  214,655 total bytes.  24,106 total words.
>
> Elapsed time: 00:00:00 CPU time: 00:00:00
>
> Indexing done!
>
>
> -------------------------------------------------------------------------------------------------------
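The parser messages above ("expecting ';'" and "no name") point at `&lt` written without its closing semicolon and at a bare `&`. In HTML these would normally be written `&lt;` and `&amp;`. A rough Python sketch for locating such spots in a file is shown below; the regular expression and the `find_bad_entities` name are assumptions made for this illustration, not the rule the swish-e HTML2 parser applies, and the pattern will also flag ampersands inside scripts or unescaped URLs.

```python
import re

# An '&' that does not begin a complete reference such as &lt; &#60; or &#x3C;
BAD_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def find_bad_entities(text):
    """Return (line_number, column, line) for each suspicious '&' in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BAD_AMP.finditer(line):
            # Columns are reported 1-based, like the line numbers.
            hits.append((lineno, match.start() + 1, line))
    return hits


# Fragments modelled on the lines quoted in the log above.
sample = "write &lt;&lt SettingsVersion\ncatch( ConfigException & )\nok &amp; fine"
for lineno, col, line in find_bad_entities(sample):
    print(f"{lineno}:{col}: {line}")
```

To scan a real page, the file contents can be read with `open(path, encoding=...)` and passed to `find_bad_entities`; the correct encoding for `Guide.html` is not stated in this thread.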
>
>
> *In this case the generated output is not readable; it contains what look
> like Japanese characters. This is the main problem.*
>
> *Expected solution:*
>
> Please send me a sample config.txt (configuration file) and the executables/DLLs
> needed to index a website. I'm using a Windows XP SP2 environment.
>
>

>
> On Fri, May 16, 2008 at 12:07 PM, William Conlon <bill@tothept.com> wrote:
>
>> Please post your questions within the email, so the questions and
>> answers can be included in the searchable archive.
>>
>> Your question 2:  this shows HTML parsing errors from invalid HTML
>> syntax.  Validate your pages (if you are bothered by these parsing
>> errors), for example using http://validator.w3.org/.
>>
>>
>> On May 15, 2008, at 11:30 PM, Saubhagya Srivastava wrote:
>>
>> > Hi,
>> >
>> > The problem description and config files are attached.
>> > Please give a solution to my problems.
>> >
>> > The output is not even readable; it shows some Japanese characters,
>> > whereas the HTML files are English, ANSI.
>> >
>> > Regards,
>> > Saubhagya
>> > <Config.txt><Config_http.txt><Problem Description.doc>
>> > _______________________________________________
>> > Users mailing list
>> > Users@lists.swish-e.org
>> > http://lists.swish-e.org/listinfo/users
>>
>>
>
> Regards,
Saubhagya


Received on Mon May 19 06:50:44 2008