Re: [swish-e] How do I index via HTTP when authentication is

From: William M Conlon <bill(at)not-real.tothept.com>
Date: Wed Feb 20 2008 - 18:39:47 GMT

On Feb 20, 2008, at 10:24 AM, Adam Douglas wrote:

>> What does the debug show when swish-e GETs the login page?
>
> Well, I have the following set in SwishSpiderConfig.pl: debug =>
> DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS | DEBUG_ERRORS | DEBUG_INFO.
> However, it doesn't show me anything regarding authentication at the
> console. This is what I get; it appears I have a few other issues to
> resolve, but I'm not sure how.

You need to validate your (X)HTML. In particular, every literal '&',
whether in a URI or in ordinary text, needs to be written as '&amp;'.
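
As a rough illustration of that fix, here is a minimal Perl sketch that escapes bare ampersands while leaving existing entity references alone. The function name and the sample strings are made up for the example, and the regex is a heuristic; a proper validator remains the better check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace each '&' that does not already begin an
# entity reference (named such as &amp;, decimal such as &#38;, or hex
# such as &#x26;) with '&amp;'.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

# Sample inputs (illustrative only).
print escape_bare_ampersands('Gagnon & Associates'), "\n";
print escape_bare_ampersands('<a href="page.cfm?a=1&b=2">link</a>'), "\n";
print escape_bare_ampersands('already &amp; escaped'), "\n";
```

The negative lookahead keeps text that is already escaped from being escaped a second time.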

> Just so you know, blowfish is my internal dev web server. I removed
> the reference to it in previous posts, but it doesn't really matter,
> so I'll just leave it in.
>

I thought you were starting spidering at the login page, /login/.   
What is the debug for the first request?

> /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
> RobotRules <http://blowfish.venmarces.com/robots.txt>: Unexpected line:
> Sitemap: http://www.venmarces.com/sitemap.xml
> http://blowfish.venmarces.com/education/hvac-dft-terms/:95: error:
> htmlParseEntityRef: no name
> http://www.aiha.org">http://www.aiha.org</a></dd><dt>AMCA</dt><dd>Air
> Movement &
>
> ^
> http://blowfish.venmarces.com/education/hvac-dft-terms/:95: error:
> htmlParseEntityRef: no name
> //www.ansi.org/">http://www.ansi.org/</a></dd><dt>ARI</dt><dd>Air-Conditioning &
>
> ^
> http://blowfish.venmarces.com/education/hvac-dft-terms/:95: error:
> htmlParseEntityRef: no name
> RV</dt><dd>Heat Recovery Ventilator</dd><dt>HVAC</dt><dd>Heating,
> Ventilation, &
>
> ^
> http://blowfish.venmarces.com/education/hvac-dft-terms/:95: error:
> htmlParseEntityRef: no name
> </dd><dt>LC</dt><dd>Light Commercial</dd><dt>LEED</dt><dd>Leadership in
> Energy &
>
> ^
> http://blowfish.venmarces.com/education/hvac-dft-terms/:95: error:
> htmlParseEntityRef: no name
> <dd>Operation Method Sheet (DFT Term)</dd><dt>OSHA</dt><dd>Occupational
> Safety &
>
> ^
> http://blowfish.venmarces.com/education/hvac-dft-terms/:95: error:
> htmlParseEntityRef: no name
> stration<br /><em>Example:</em> U.S. Department of Labor - Occupational
> Safety &
>
> ^
> http://blowfish.venmarces.com/news/announcements/159/:102: error:
> htmlParseEntityRef: expecting ';'
> inally posted at <a
> href="http://www.cap-e.com/spotlight/index.cfm?Page=1&NewsID
>
> ^
> http://blowfish.venmarces.com/findarep/dir/?nCompanyID=85:106: error:
> htmlParseEntityRef: no name
> D & B Engineering of NJ Inc. (Head Office)
> </td>
>    ^
> http://blowfish.venmarces.com/findarep/dir/?nCompanyID=86:106: error:
> htmlParseEntityRef: no name
> Dan Rainville & Associates (Head Office)
> </td>
>                ^
> http://blowfish.venmarces.com/findarep/dir/?nCompanyID=129:106: error:
> htmlParseEntityRef: no name
> Gagnon & Associates (Head Office)                               </td>
>         ^
> http://blowfish.venmarces.com/findarep/dir/?nCompanyID=101:106: error:
> htmlParseEntityRef: no name
> Kasmerchak Gonzalez & Associates (Head Office)
> </td>
>                      ^
> http://blowfish.venmarces.com/findarep/dir/?nCompanyID=1249:106: error:
> htmlParseEntityRef: no name
> Process Engineering & Equipment Co (Head Office)
> </td>
>                      ^
> http://blowfish.venmarces.com/findarep/dir/?nCompanyID=199:106: error:
> htmlParseEntityRef: no name
> Rome, Eddleman & Associates (Head Office)
> </td>
>
>> Do you need cookies enabled?
>
> Yes the session requires a cookie.

Then you need this in your @servers stanza:

          use_cookies => 1,
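
For context, a minimal sketch of where that line sits in SwishSpiderConfig.pl. The base_url here is only a guess built from the host in your log and the /login/ page mentioned above; the debug flags are the ones you listed. Treat it as an outline rather than your actual config.

```perl
# SwishSpiderConfig.pl (sketch; base_url is an assumed starting point)
@servers = (
    {
        # Assumed start URL: the dev host from the log plus /login/.
        base_url    => 'http://blowfish.venmarces.com/login/',

        # Keep a cookie jar so the session cookie set at login is
        # sent back on later requests.
        use_cookies => 1,

        # Debug flags as quoted earlier in this thread.
        debug       => DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS
                     | DEBUG_ERRORS | DEBUG_INFO,
    },
);

1;
```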


>
>> Is the username szUsername (as used for the form POST), or is it szID
>> as in the swish-e GET below?
>
> szUsername and szPassword are used via POST. I have set up separate
> variables for use via GET: szID (username) and szPWD (password). I have
> manually tested authentication via the URL query string using szID
> and szPWD, and it works fine.
>
> I have checked my access logs for the web server, and Swish-e is
> connecting and indexing. I then checked the error log for the web server
> and found only one error. I'm not sure what is causing it on that line,
> and I have yet to figure out how to resolve it.
>
> Use of uninitialized value in concatenation (.) or string at
> /usr/local/lib/swish-e/perl/SWISH/VenmarCESTemplate.pm line 188.
>
> Best,
> Adam

Received on Wed Feb 20 13:39:52 2008