Grabbing data from the internet: Web scraping

Web Scraping

Facebook is explicit in their prohibition of web crawlers!
Standard format of a blanket web scraping ban (from facebook.com/robots.txt)
All bots are allowed to scrape all pages except those under the /Sitecore/ path (from owgr.com/robots.txt).
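If you would rather check a site's rules in code than read robots.txt by eye, Python's standard library can do it. The sketch below is a rough illustration: the two rule sets are simplified stand-ins for the blanket-ban and partial-ban formats described above (not verbatim copies of either site's file), and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network needed
    return parser.can_fetch(agent, url)


# Simplified, assumed rule sets that mirror the two formats described above.
blanket_ban = ["User-agent: *", "Disallow: /"]
partial_ban = ["User-agent: *", "Disallow: /Sitecore/"]

print(allowed(blanket_ban, "https://example.com/any/page"))        # False
print(allowed(partial_ban, "https://example.com/ranking"))         # True
print(allowed(partial_ban, "https://example.com/Sitecore/login"))  # False
```

For a live site, `RobotFileParser` can also fetch the file itself: call `set_url()` with the robots.txt address and then `read()` instead of `parse()`.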

Scraping the Official World Golf Rankings

Assigning a value to the url variable (we can also make this more flexible with f-strings)
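As a rough sketch of the f-string idea: the base address and the query parameter names below (`pageNo`, `pageSize`) are hypothetical placeholders, not the real rankings URL, so swap in whatever the page you are scraping actually uses.

```python
# Placeholder address: an assumption for illustration, not the real rankings URL.
BASE_URL = "https://www.example.com/rankings"


def build_url(page_size, page=1):
    """Build a rankings URL; the query parameter names are hypothetical."""
    return f"{BASE_URL}?pageNo={page}&pageSize={page_size}"


url = build_url(300)
print(url)  # https://www.example.com/rankings?pageNo=1&pageSize=300
```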
Now you have a Beautiful Soup object!
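For reference, here is a minimal sketch of how a Beautiful Soup object gets built. To keep it runnable offline it parses a small made-up HTML string; in a live scrape the HTML would typically come from something like `requests.get(url).text`. The page content here is invented for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page (an assumption for illustration).
html = """
<html>
  <head><title>Rankings</title></head>
  <body>
    <table>
      <tr><td class="name">Player A</td></tr>
    </table>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)  # Rankings
```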
The rankings are organized neatly in an HTML table tag.
# Find the first table element on the page
table = soup.find('table')
The full loop iterating through the HTML table
player['name'] = rows.find('td',{'class':'name'}).text
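A loop like this can be sketched as follows. This is not the original code, only an approximation under stated assumptions: the sample table is made up, the `rank` class is hypothetical, and only the `name` class comes from the snippet above.

```python
from bs4 import BeautifulSoup

# Made-up table; only the 'name' class is taken from the article's snippet.
html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td class="rank">1</td><td class="name">Player A</td></tr>
  <tr><td class="rank">2</td><td class="name">Player B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")  # first table element on the page

players = []
for row in table.find_all("tr"):
    name_cell = row.find("td", {"class": "name"})
    if name_cell is None:
        # Header rows have no matching <td>, so skip them.
        continue
    player = {
        "rank": int(row.find("td", {"class": "rank"}).text),
        "name": name_cell.text.strip(),
    }
    players.append(player)

print(players)
```

Checking for `None` before reading `.text` matters because `find` returns `None` when a row has no matching cell, and calling `.text` on that raises an `AttributeError`.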
Now we have a CSV with the OWGRs!
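One way to get from a list of player dictionaries to a CSV is the standard library's `csv` module. This is a sketch under assumptions: the original may well have used a different tool (pandas, for example), and the sample rows, column names, and file name below are made up.

```python
import csv
import os
import tempfile


def write_players_csv(players, path):
    """Write a list of player dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["rank", "name"])
        writer.writeheader()
        writer.writerows(players)


# Made-up sample rows standing in for the scraped rankings.
players = [
    {"rank": 1, "name": "Player A"},
    {"rank": 2, "name": "Player B"},
]

# Hypothetical output location in the system temp directory.
csv_path = os.path.join(tempfile.gettempdir(), "owgr_sample.csv")
write_players_csv(players, csv_path)

# Read the file back to confirm what was written (values come back as strings).
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(rows)
```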
The time I won the US Open.

Data scientist; CFA charterholder and financial valuation specialist; avid golfer and racquet sport aficionado; homebrewing hobbyist; TWTR: @wmkarney

Will Karnasiewicz