
Re: [swish-e] How do I index via HTTP when authentication is

From: Adam Douglas <ADouglas(at)not-real.venmarces.com>
Date: Wed Feb 20 2008 - 20:18:07 GMT
> You need to validate your (x)html.  In particular the '&' in 
> the URI needs to be '&amp;'

I do this regularly, but oddly neither Firebug nor the Web Developer
Toolbar flagged it. I have now resolved all of the (x)html validation
issues, and no errors are reported apart from the list of URLs being
processed. Even so, nothing from the authenticated content is being
indexed. It also appears that indexing stops before the entire web site
has been covered, since not all announcements were indexed. I wonder
what is causing this. I checked, and no authentication takes place with
the account I created and told Swish-e to use in the config file.
Several things must be off here.
 
> I thought you were starting spidering at the login page, /login/.   
> What is the debug for the first request?

I am starting the spider at /login/, as set in the SwishSpiderConfig.pl
file. Could the setting "SwishProgParameters default
http://blowfish.venmarces.com/" in swishe.venmarces.private.conf be the
issue? I commented out that line in swishe.venmarces.private.conf, ran
"swish-e -S prog -c swishe.venmarces.private.conf", and received the
following message.

Indexing Data Source: "External-Program"
Indexing "/usr/local/lib/swish-e/spider.pl"
External Program found: /usr/local/lib/swish-e/spider.pl
Unquoted string "query" may clash with future reserved word at
SwishSpiderConfig.pl line 115.
Failed to read /usr/local/lib/swish-e/spider.pl configuration parameters
'SwishSpiderConfig.pl'  syntax error at SwishSpiderConfig.pl line 115,
near "-  >"

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
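The syntax error above comes from the stray spaces inside the "->"
operator. For reference, a test_url entry that skips logout links might
look like the fragment below. This is a minimal sketch: only the
/logout/ pattern and the ->query call come from this message; the rest
assumes the usual spider.pl convention that test_url receives the URL as
a URI object and that a false return value tells the spider to skip it.

```perl
# Fragment: one entry inside a server hash in @servers (not a full file).
# Assumption: spider.pl passes the URL as a URI object in $_[0].
test_url => sub {
    my ( $uri, $server ) = @_;
    my $query = $uri->query;    # undef when the URL has no query string
    # Returning false skips this URL, so the spider never follows a
    # logout link and ends its own authenticated session.
    return 0 if defined $query && $query =~ /logout/;
    return 1;
},
```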

Line 115 is "$_[0]-  >query =~ /logout/);". I removed the two spaces
between "$_[0]-" and ">" so that it now reads
"$_[0]->query =~ /logout/);". Then I ran "swish-e -S prog -c
swishe.venmarces.private.conf" again, and this time the result was more
encouraging but still not successful. Here is what I received. What
does this mean?

Indexing Data Source: "External-Program"
Indexing "/usr/local/lib/swish-e/spider.pl"
External Program found: /usr/local/lib/swish-e/spider.pl
/usr/local/lib/swish-e/spider.pl: Reading parameters from
'SwishSpiderConfig.pl'

 -- Starting to spider:
https://blowfish.venmarces.com/login/?szID=username&szPWD=password --
?Testing 'test_url' user supplied function #1
'https://blowfish.venmarces.com/login/?szID=username&szPWD=password'
+Passed all 1 tests for 'test_url' user supplied function

vvvvvvvvvvvvvvvv HEADERS for
https://blowfish.venmarces.com/login/?szID=username&szPWD=password
vvvvvvvvvvvvvvvvvvvvv

---- Request ------
GET https://blowfish.venmarces.com/login/?szID=username&szPWD=password
Accept-Encoding: gzip, x-gzip, deflate
From: webmaster@venmarces.com
User-Agent: swish-e spider http://swish-e.org/


---- Response ---
Status: 500 Can't locate object method "new" via package
"LWP::Protocol::https::Socket"
Content-Type: text/plain
Client-Date: Wed, 20 Feb 2008 20:11:47 GMT
Client-Warning: Internal response

^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^


Summary for:
https://blowfish.venmarces.com/login/?szID=username&szPWD=password
Connection: Close: 1  (1.0/sec)
      Unique URLs: 1  (1.0/sec)

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!

 
> >> Do you need cookies enabled?
> >
> > Yes the session requires a cookie.
> 
> Then you need this in your @servers stanza:
> 
>           use_cookies => 1,

Ah, I completely missed adding this one, even though you told me before.
Sorry about that. I have now added "use_cookies => 1,".
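A minimal sketch of an @servers stanza with cookies enabled, for
reference. Only "use_cookies => 1" comes from the advice quoted above;
base_url and email are assumed values copied from the debug output
(base_url without the login query string), and any other settings the
real SwishSpiderConfig.pl needs are left out.

```perl
# SwishSpiderConfig.pl (hypothetical minimal version)
# base_url and email are assumptions taken from the debug output above.
@servers = (
    {
        base_url    => 'https://blowfish.venmarces.com/login/',
        email       => 'webmaster@venmarces.com',    # sent as the From: header
        use_cookies => 1,    # keep the session cookie between requests
    },
);

1;    # the file is loaded by spider.pl and should return a true value
```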

Best,
Adam

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Feb 20 15:18:11 2008