Friday, 13 September 2013

How do I write a scraper that logs into secure sites using WebGains as an example?

I'm writing a scraper for WebGains to pick up commission details etc. (they
don't provide an API).
I'm using PHP and cURL, and I'm stuck right at the beginning: logging in.
Once I'm logged into the website, I'll be fine from there.
I used Fiddler to check which parameters are being passed, but since the
site uses SSL the traffic is encrypted, so this doesn't help.
I then used the Chrome Extension Virtual Event to view the event-driven
code under the login button which I discovered is all in the following
file:
http://us.webgains.com/public/wp-content/themes/clean-home/js/login.js
From that, I've gathered that I need to call the page
'https://us.webgains.com/loginform.html?action=checkauth&callback=?' to do
the login.
I'm trying to call that with my username and password (using curl) and
then re-use the curl session for subsequent page requests to pick up
commission details I need to automate.
This page looks like a .html page, but presumably in reality it's got some
server side code going on. If I try to load it directly in the browser
it's empty.
When I call the page using curl (PHP code below), I get no response either.
Am I calling the right page?
Is there a better tool to figure out what page to call/parameters to pass
in order to log into a secure site?
How do I log into WebGains using PHP/cURL?
Here's the PHP code I have:
$targeturl = "https://us.webgains.com/loginform.html?action=checkauth&callback=?";
$ch = curl_init($targeturl);
$cookie = 'cookies.txt';
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "username=myusername&password=mypassword");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
$response = curl_exec($ch);
echo "<textarea cols=60 rows=10>" . print_r(curl_getinfo($ch), true) . "</textarea>";
echo "<textarea cols=60 rows=10>" . curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) . "</textarea>";
echo "<textarea cols=60 rows=10>" . $response . "</textarea>";
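
For the "re-use the curl session for subsequent page requests" part, here is
a minimal sketch of how that could look, assuming the login call itself
works. The helper names (session_options, login, fetch_page) are made up for
illustration, and the field names "username" and "password" are copied from
the code above, not verified against the real login form. One thing that may
be worth checking: "callback=?" is the placeholder jQuery swaps for a
generated function name when it makes a JSONP request, so a literal "?" sent
from cURL may not be what the server expects.

```php
<?php
// Hypothetical helper: builds the cURL options shared by the login call and
// every later request, so that all of them use the same cookie file.
function session_options(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow any redirect after login
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are loaded from here; also enables cookie handling
    ];
}

// Hypothetical helper: POSTs the credentials on an existing handle.
// The field names are assumptions taken from the question's code.
function login($ch, string $loginUrl, string $user, string $pass)
{
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
        'username' => $user,
        'password' => $pass,
    ]));
    return curl_exec($ch);
}

// Hypothetical helper: GETs a page on the same handle, so any cookies set
// during login() are sent along automatically.
function fetch_page($ch, string $url)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch the handle back from POST to GET
    return curl_exec($ch);
}

// Usage sketch (the URLs are placeholders, not real WebGains endpoints):
// $ch = curl_init();
// curl_setopt_array($ch, session_options('cookies.txt'));
// login($ch, 'https://example.com/login', 'myusername', 'mypassword');
// $html = fetch_page($ch, 'https://example.com/reports');
// curl_close($ch);
```

The point of keeping one handle for everything is that cURL holds the session
cookies in memory for that handle, while the cookie jar file is only written
out when the handle is closed. A second handle opened before that would not
see the login cookie.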
