14 January 2013

Ruby Script to Scrape an API

I wrote a basic Ruby script last weekend to scrape a public API for data.

The public API is World of Warcraft's Quest Data API.  It takes a quest ID and returns information on the quest (quest title, start location, level, etc.).  The problem is, I don't know quests by ID.  I want to query by level or by start location.  So I thought about pulling down their data and sorting it locally.

So I wrote this little script:
 
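require 'open-uri'
require 'json'

# Walks quest IDs starting at y and prints "title, category,level" for each one.
def Rubyresults(y)
  $a = 0  # counter; set back to 0 every time the rescue below retries
  begin
    @output = JSON.parse(open("http://us.battle.net/api/wow/quest/#{y}").read)
    puts @output['title'] + ", " + @output['category'] + "," + @output['level'].to_s
    y = y + $a
    $a += 1
  end until y > 9269
rescue
  # Any error (such as the 404 for a missing quest ID) lands here:
  # move on to the next ID and rerun the method body.
  #puts "invalid quest id"+","+@output['id'].to_s
  y = y + 1
  retry
end

puts Rubyresults(1)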


Script Breakdown
I start off by requiring two libraries, open-uri and json. These let me call a URL/URI and parse the JSON results it returns.

I built the script to take a param (in my case "y"), which is the quest ID I want to start at.  So I do a:
def Rubyresults(y)

Next, I set a global variable $a to 0. This will be my counter.

Now I start the loop.  I use a begin ... end until loop, which runs until a certain value is reached.
I start off in the loop with an instance variable, @output, and assign it the parsed response from the World of Warcraft quest API.  I leave the ID as a variable: "#{y}"  This is the value being passed in and incremented.

The next line prints the fields I care about.  Since I already ran JSON.parse on the response, I can use @output['title'] (or any other key returned in the JSON) to get a value.  I made the output comma-separated so that I could sort it as a CSV doc later: @output['title'] + ", " + @output['category'] and so forth.  So each line of output is title, category, level.
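
Joining the fields by hand works as long as no title contains a comma. As a rough sketch of an alternative, Ruby's standard csv library can build the row and quote fields when needed. The method name quest_row and the sample record here are made up for illustration; only the three keys come from the script above.

```ruby
require 'csv'

# Builds one CSV row from a parsed quest hash.
# The keys are the same ones the script above reads.
def quest_row(quest)
  CSV.generate_line([quest['title'], quest['category'], quest['level']])
end

# Hypothetical sample record, not real API output.
sample = { 'title' => 'Hello, World', 'category' => 'Elwynn Forest', 'level' => 10 }
puts quest_row(sample)
# prints: "Hello, World",Elwynn Forest,10
```

The title is wrapped in double quotes because it contains a comma, so a spreadsheet or CSV parser would still see three columns.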

Now I increment y by adding $a to it, and then $a itself is incremented by 1 on each pass.  So if I start this at quest ID 1, $a goes 0, 1, 2, 3 and so on, and whatever it currently holds gets added to y each time.

The loop is set to stop once y goes past 9269.

Now for the rescue.  I added the rescue because the World of Warcraft API doesn't have a quest for every ID.  For example, it has a quest for IDs 1 and 2, but not 3.  If you query for quest ID 3, it returns a 404.  If that happens, the whole script errors out and stops.  So the rescue lets me continue on.  I initially added a puts to output the invalid ID, but later commented it out.

In the rescue clause, I add 1 to y and retry, which runs the method body again from the new y. This is what keeps the script going.
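
The same skip-and-continue idea could also be written so that each ID is requested exactly once, with the rescue wrapped around a single request. This is only a sketch under a few assumptions: collect_quests, FETCH_QUEST and the stub fetcher are names invented here, and the fetcher is passed in so the loop can be tried without hitting the network. It uses URI.open because on current Ruby versions plain open no longer accepts URLs, and it rescues OpenURI::HTTPError, which is the error open-uri raises for responses like a 404.

```ruby
require 'open-uri'
require 'json'
require 'stringio'

# Default fetcher: same URL as the script above (hypothetical constant name).
FETCH_QUEST = lambda do |id|
  URI.open("http://us.battle.net/api/wow/quest/#{id}").read
end

# Returns the parsed quests for first_id..last_id, visiting each ID once.
def collect_quests(first_id, last_id, fetch = FETCH_QUEST)
  quests = []
  first_id.upto(last_id) do |id|
    begin
      quests << JSON.parse(fetch.call(id))
    rescue OpenURI::HTTPError
      next # missing quest ID: skip it and carry on with the next one
    end
  end
  quests
end

# Stub fetcher for trying the loop offline: pretends ID 3 does not exist.
stub = lambda do |id|
  raise OpenURI::HTTPError.new('404 Not Found', StringIO.new) if id == 3
  JSON.generate({ 'id' => id, 'title' => "Quest #{id}" })
end

p collect_quests(1, 4, stub).map { |quest| quest['id'] }
# prints: [1, 2, 4]
```

Rescuing only the HTTP error means a typo or a JSON problem would still stop the script instead of being silently retried.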

At the very end is the puts Rubyresults(1) call.  This sets the quest ID to start with, which is passed in as y.

The output of this script looks like this (I still need to work on not getting duplicate entries):

Kanrethad's Quest, Designer Island,80
Sharptalon's Claw, Ashenvale,23
Give Gerard a Drink, Elwynn Forest,1
Ursangous' Paw, Ashenvale,24
Ursangous' Paw, Ashenvale,24
Shadumbra's Head, Ashenvale,24
A Fishy Peril, Elwynn Forest,10
A Fishy Peril, Elwynn Forest,10
Discover Rolf's Fate, Elwynn Forest,10
Discover Rolf's Fate, Elwynn Forest,10
Bounty on Murlocs, Elwynn Forest,10
Protect the Frontier, Elwynn Forest,10
Protect the Frontier, Elwynn Forest,10
Cloth and Leather Armor, Elwynn Forest,10
Cloth and Leather Armor, Elwynn Forest,10
The Fargodeep Mine, Elwynn Forest,7
The Fargodeep Mine, Elwynn Forest,7
Report to Thomas, Elwynn Forest,10
Report to Thomas, Elwynn Forest,10
The Jasperlode Mine, Elwynn Forest,10
The Jasperlode Mine, Elwynn Forest,10
Fine Linen Goods, Elwynn Forest,10
Fine Linen Goods, Elwynn Forest,10
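
Until the duplicates are fixed in the script itself, the saved output can be cleaned up and sorted afterwards, which is what I wanted the data for in the first place. A minimal sketch, assuming the lines look like the ones above and that no title contains a comma or a double quote (the method name tidy_quests is made up here):

```ruby
require 'csv'

# Parses "title, category,level" lines, drops repeats, and sorts by level.
def tidy_quests(lines)
  rows = lines.map do |line|
    title, category, level = CSV.parse_line(line).map(&:strip)
    { title: title, category: category, level: level.to_i }
  end
  rows.uniq.sort_by { |row| [row[:level], row[:title]] }
end

# A few lines copied from the output above.
sample = [
  "Ursangous' Paw, Ashenvale,24",
  "Ursangous' Paw, Ashenvale,24",
  "The Fargodeep Mine, Elwynn Forest,7",
  "A Fishy Peril, Elwynn Forest,10"
]

tidy_quests(sample).each do |row|
  puts "#{row[:level]}\t#{row[:title]}\t#{row[:category]}"
end
```

From there, querying by level or start location is a one-liner, for example tidy_quests(sample).select { |row| row[:category] == 'Elwynn Forest' }.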
