Web scraping tutorial

Web scraping tutorial: In our previous tutorial, we covered what web scraping is and walked through a simple example using Python's requests module. That tutorial also showed how to get every <h2> element of a website. If you want to learn the basics of web scraping, you can visit the link below:

Web scrapping python

However, the code in that article only works for some websites, because it uses a plain HTTP request to get the data. Web scraping is a very powerful tool, but a plain request often cannot fetch data from protected websites, for example sites that block non-browser clients or build their pages with JavaScript. In this tutorial we discuss how to fetch data from such websites, which do not allow their data to be fetched directly.

Note

Web scraping can be illegal, or against a website's terms of use, in many cases, i.e. some websites do not allow others to access or reuse their information. I highly recommend that you do not scrape those websites; use their APIs instead. This article is for study purposes only.
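
One common way to see what a site asks automated clients not to fetch is its robots.txt file. The following is a minimal sketch using Python's standard urllib.robotparser module. The rules and the example.com URLs are made up for illustration, and a robots.txt check does not replace reading a website's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of downloading a real file
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(rules_text, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(SAMPLE_RULES, "https://example.com/players"))
print(is_allowed(SAMPLE_RULES, "https://example.com/private/data"))
```

For a real website you would call set_url() with the address of its robots.txt file and then read(), instead of parsing a hard-coded string.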

How to use web scraping on various websites

As I discussed earlier, most websites do not allow their data to be scraped directly. So we need a platform through which we can scrape a secure website's data, or in other words, protected data.

Here we need an automated program that can extract a website's data. To scrape multiple websites this way, we need the following:

Software Requirements:

  1. Jupyter Notebook (optional)
  2. selenium (for its WebDriver)
  3. bs4 (for BeautifulSoup)
  4. lxml (the parser passed to BeautifulSoup in the example code)

What is Selenium?

Selenium is a portable software-testing framework for web applications. It is open-source and free of cost, i.e. anyone can download it. Selenium is mainly used to automate browsers, e.g. to open pages and run tests. It has several components, all of which are free to use. These are listed below:

  1. Selenium IDE
  2. Selenium RC
  3. WebDriver
  4. Selenium Grid

Install Selenium

To install Selenium, run the following command in the command prompt:

pip install selenium
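
To confirm that the install worked, you can ask Python which versions it sees. This is a small sketch using the standard importlib.metadata module (Python 3.8 or newer). The helper name installed_version is made up for this example, and the list of packages is an assumption based on the requirements above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# "beautifulsoup4" is the pip name of the package that is imported as bs4
for name in ("selenium", "beautifulsoup4", "lxml"):
    print(name, installed_version(name) or "not installed")
```

Any package reported as "not installed" can be added with pip in the same way as selenium.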

Introduction to Selenium and WebDriver

The section above listed the components of Selenium. For web scraping we will use Selenium's WebDriver, a tool for automating web application testing and for exploring data. In testing, it is also used to check that everything works in the correct and expected way.

Here are some important points about Selenium WebDriver:

  • We will use WebDriver for web scraping rather than the simple request method.
  • It allows us to fetch a website's data through a real browser.
  • There are different web drivers for different browsers.
  • A web driver accepts commands and sends them to a particular browser.
  • It also retrieves the results of those commands.
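
The command-and-result idea in the points above can be sketched in a few lines. This is only an illustration under some assumptions: Selenium 4.6 or newer (which can locate a matching driver by itself), a recent Chrome installed on the machine, and a made-up helper name page_title.

```python
def page_title(url):
    """Open url in headless Chrome and return the page title."""
    # Imported inside the function so this file still loads where Selenium is absent
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)  # assumes Selenium 4.6+ finds a driver
    try:
        driver.get(url)       # a command sent to the browser
        return driver.title   # a result retrieved from the browser
    finally:
        driver.quit()         # always close the browser
```

For example, page_title('https://www.nba.com/players') would open that page in the background and return its title.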

Install WebDriver

First of all, we need to install a web driver. Here I am using the Chrome browser's web driver. If you use Chrome too, you can visit the link below. If you use another browser (such as Firefox or IE), you have to install the web driver for that particular browser. Here are the steps to install the web driver for Chrome:

  1. Click To Install Chrome Web driver
  2. Open the link above and click on ChromeDriver 2.42 (or whichever version matches your installed Chrome).
  3. Download the zip file for your OS, i.e. Mac, Linux or Windows.
  4. Extract the file to any location and copy the path of that location to use the web driver.
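
Before starting the browser, it can help to check that the driver file really is where you think it is. The sketch below uses only the standard library. The helper name locate_chromedriver is made up for this example, and the path is the one used in the example code later in this tutorial, so replace it with your own.

```python
import shutil
from pathlib import Path

def locate_chromedriver(explicit_path=None):
    """Return a usable chromedriver path as a string, or None if none is found."""
    if explicit_path and Path(explicit_path).is_file():
        return str(Path(explicit_path))
    # Fall back to searching the directories listed in the PATH variable
    return shutil.which("chromedriver")

# Replace this with the location where you extracted the driver
print(locate_chromedriver(r"C:\Users\akd62\OneDrive\Desktop\chromedriver.exe"))
```

If this prints None, the driver was not found at the given path or on PATH, and Selenium will most likely fail to start the browser with that path.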

Example

So far we have talked a lot about web drivers and Selenium. Let's take an example of web scraping using Selenium. Here I fetch each player's name and details link from the website https://www.nba.com. You can use any other website instead.

Here is the code for the same:

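from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# Set the path to your own ChromeDriver executable here.
# Selenium 4 takes the path through a Service object; the older
# webdriver.Chrome(executable_path=...) form only works in Selenium 3.
service = Service(r'C:\Users\akd62\OneDrive\Desktop\chromedriver.exe')
driver = webdriver.Chrome(service=service)

url = 'https://www.nba.com/players'

try:
    # Open the page in the browser
    driver.get(url)

    # Parse the rendered HTML (the 'lxml' parser needs: pip install lxml)
    soup = BeautifulSoup(driver.page_source, 'lxml')

    # Find the <div> with class "static" and every <a> tag inside it
    div = soup.find('div', class_='static')
    links = div.find_all('a') if div else []
    for link in links:
        print(link.text)
        print("Player Name: ", link['title'])
        print("More Details:", "https://www.nba.com" + link['href'])
        print('')
finally:
    # Always close the browser, even if something above fails
    driver.quit()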

Output: [screenshots Output 1 and Output 2]

Explanation:

  1. First of all, the code opens a web browser (set your driver path here).
  2. Then it passes the given link to the browser.
  3. It does this with the browser's get() method.
  4. Next, it uses the BeautifulSoup module to extract data (same as in the previous article).
  5. Here we extract the anchor <a> tags inside the element with class static.
  6. A for loop then goes through each of those tags.
  7. At last, we need to close the driver using the quit() method.
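
The parsing steps (4 to 6) can be tried without opening a browser. The sketch below runs BeautifulSoup on a made-up HTML snippet that imitates the structure the example expects. The sample markup, the player names and the helper name extract_players are assumptions for illustration, not the real nba.com page. It uses Python's built-in html.parser instead of lxml, and urljoin instead of string concatenation so that links which are already absolute are left intact.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Made-up markup that imitates the structure the tutorial code expects
SAMPLE_HTML = """
<div class="static">
  <a href="/players/1/player-one" title="Player One">One</a>
  <a href="/players/2/player-two" title="Player Two">Two</a>
</div>
"""

def extract_players(html, base_url="https://www.nba.com"):
    """Return (name, absolute link) pairs for <a> tags inside div.static."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="static")
    if div is None:
        return []  # the page does not have the expected structure
    return [
        (a["title"], urljoin(base_url, a["href"]))
        for a in div.find_all("a")
        if a.has_attr("title") and a.has_attr("href")
    ]

for name, link in extract_players(SAMPLE_HTML):
    print(name, link)
```

With Selenium, you would pass driver.page_source to extract_players in place of SAMPLE_HTML.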

Download source code

Download project on Github

So, that is all for this web scraping tutorial (using Selenium and WebDriver). I hope you enjoyed the post. Thanks

Credit goes to:

Rajat Sharma

GNDU, Gurdaspur
