Web scraping in Python

What is web scraping in Python?

Web scraping is the process of fetching pages from other websites and extracting the information you need from them. It is done by a software program, which can be written in many languages, such as Python, Java, or JavaScript. I recommend Python because it is easy to learn and powerful, and its libraries (such as requests and Beautiful Soup) make web scraping an easy task.

Note

Web scraping is not allowed on many websites: some sites forbid accessing or reusing their information, and scraping them may violate their terms of service or the law. Do not scrape those websites; use their APIs instead. This article is for study purposes only.
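One machine-readable signal of what a site allows crawlers to access is its robots.txt file. The sketch below uses Python's standard urllib.robotparser module to check URLs against a robots.txt; the rules, the URLs, and the helper name is_allowed are made-up examples, and robots.txt does not replace reading a site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. For a real site you would instead call
# parser.set_url("https://example.com/robots.txt") followed by parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed("https://example.com/blog/post"))     # True
print(is_allowed("https://example.com/private/data"))  # False
```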

Why web scraping?

The main idea behind web scraping is to fetch and access information from other sources. The legal and genuine way to access information is through an API, which is highly recommended. But if no API is available and you still need the information, you can use web scraping. Web scraping involves the following main tasks:

  1. Fetch/download the web page (the same page a browser downloads when you open it).
  2. Extract information from the web page, including links, images, etc.
  3. Format or arrange the fetched data as required.
  4. Use this data later.
  5. Save the data to a local file, a database, or an Excel sheet.
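Tasks 3 to 5 (formatting the data, storing it, and using it later) can be sketched with Python's built-in sqlite3 module. The records below are hard-coded stand-ins for scraped data, and the table name headings is hypothetical; an in-memory database is used so the sketch leaves no files behind.

```python
import sqlite3

# Hypothetical scraped records: (heading text, link) pairs with messy whitespace
raw_rows = [("  Web scraping in Python ", "/web-scraping"), ("Python lists\n", "/lists")]

# Format: strip stray whitespace from each heading
rows = [(title.strip(), link) for title, link in raw_rows]

# Store: use sqlite3.connect("scraped.db") instead to keep a file on disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headings (title TEXT, link TEXT)")
conn.executemany("INSERT INTO headings VALUES (?, ?)", rows)
conn.commit()

# Use later: read the stored rows back, sorted by title
for title, link in conn.execute("SELECT title, link FROM headings ORDER BY title"):
    print(title, link)
conn.close()
```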

Web scraping process

We have already discussed the idea behind web scraping. In simple words, it is the process of fetching and extracting information from other websites. Web scraping consists of the following three steps:

  1. Fetch the web page from the website.
  2. Extract information using a web scraping program. You can extract anything from a web page, including HTML, links, images, etc.
  3. Organize the extracted information in a specific structure, such as a database or a file system.
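The three steps map naturally onto three small functions. This is a minimal sketch that uses only the standard library (urllib.request for fetching, html.parser for extracting, csv for organizing) rather than the requests and bs4 modules used later in this article; the function and class names are hypothetical, and the demo parses an inline HTML string instead of a live page.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Step 1: download a page and return its HTML as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkExtractor(HTMLParser):
    """Step 2: collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def organize(links: list[str]) -> str:
    """Step 3: arrange the links as CSV text (write to a file to keep it)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["link"])
    writer.writerows([link] for link in links)
    return buffer.getvalue()


html = '<p><a href="/home">Home</a> <a href="/about">About</a></p>'
print(organize(extract_links(html)))
```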

Web scraping in Python: example

Now let's look at an example of web scraping in Python. I am using the following modules/software for the code below:

Software requirements:

  1. Jupyter Notebook (optional)
  2. bs4 module (Beautiful Soup)
  3. requests module
  4. lxml module (the HTML parser used in the examples)

You can install the above packages using:

pip install beautifulsoup4
pip install requests
pip install lxml
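The examples below pass 'lxml' to BeautifulSoup, which only works when the lxml package is installed; otherwise bs4 raises bs4.FeatureNotFound. A small sketch that falls back to Python's built-in 'html.parser' (the helper name make_soup is hypothetical):

```python
import bs4


def make_soup(html: str) -> bs4.BeautifulSoup:
    """Parse html with lxml if it is installed, else with the built-in parser."""
    try:
        return bs4.BeautifulSoup(html, "lxml")
    except bs4.FeatureNotFound:
        return bs4.BeautifulSoup(html, "html.parser")


soup = make_soup("<h2>Hello</h2>")
print(soup.h2.text)  # Hello
```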

Getting started with web scraping in Python:

If you want to scrape a web page, first you need to analyze the page thoroughly. To do so, you need a basic understanding of front-end development, because you have to analyze the page's HTML structure and CSS. Web scraping lets you fetch the data of any tag, class, or id; you just need to know the page structure and the name of the tag, class, or id in which the data is placed.
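Once you know a page's structure, you can combine tag, class, and id in a single CSS selector passed to select(). The HTML below is a made-up stand-in for an inspected page, parsed with Python's built-in 'html.parser' so lxml is not required:

```python
import bs4

html = """
<div id="content">
  <h2 class="title">First post</h2>
  <h2 class="title">Second post</h2>
</div>
<div class="sidebar"><h2 class="title">Categories</h2></div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# "div#content h2.title": <h2 class="title"> tags anywhere inside <div id="content">
titles = [tag.text for tag in soup.select("div#content h2.title")]
print(titles)  # ['First post', 'Second post']
```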

I am going to fetch data from our own website (I do not recommend that you do the same). If you are using the Google Chrome web browser, you can follow these steps to find the tag, class, or id of any content:

  1. Select the element/text on the website that you want to fetch.
  2. Right-click on it > Inspect.
  3. Navigate between the different elements to see their parent/child elements.
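The parent/child relationships you see in the Inspect panel are also available in bs4. A minimal sketch on an inline HTML snippet (made up for illustration), using the .parent attribute and find_all() with recursive=False to list only direct children:

```python
import bs4

html = '<ul class="menu"><li><a href="/python">Python</a></li><li><a href="/java">Java</a></li></ul>'
soup = bs4.BeautifulSoup(html, "html.parser")

link = soup.find("a")                    # first <a> on the page
print(link.parent.name)                  # li  (its parent element)
print(link.parent.parent["class"])       # ['menu']  (grandparent <ul>)

menu = soup.find("ul", class_="menu")
children = [li.a.text for li in menu.find_all("li", recursive=False)]
print(children)                          # ['Python', 'Java']
```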

Example – 1. Fetch an HTML tag

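import bs4
import requests

# Download the page (requires network access)
website_link = requests.get("http://onlinetutorial.co.in")
# Parse the HTML with the lxml parser (install it with: pip install lxml)
ab = bs4.BeautifulSoup(website_link.text, 'lxml')
# select() takes a CSS selector and returns a list of matching tags
heading2 = ab.select("h2")
print(heading2)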

Output


Explanation

  • As you can see in the example above, first you need to import the modules (make sure to install them first).
  • Fetch the web page using the get() method of the requests module, passing the URL of that page.
  • Use the BeautifulSoup class from bs4 to parse the page's HTML into a searchable structure (here with the lxml parser). To learn more about web scraping, you can visit https://en.wikipedia.org/wiki/Web_scraping
  • Select the elements you want to fetch using the select() method, which takes a CSS selector.
  • If you have done the above steps successfully, you will see the data enclosed in a Python list. You can then loop through that list to use the data more conveniently.
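The same steps can be wrapped in functions. This sketch adds two things the code above does not have: a timeout on requests.get() and a call to raise_for_status(), which raises requests.HTTPError for 4xx/5xx responses instead of silently parsing an error page. The function names are hypothetical, and parsing is split out (using the built-in 'html.parser') so it can be tried on an inline string without network access.

```python
import bs4
import requests


def fetch_html(url: str) -> str:
    """Download a page; raise requests.HTTPError on a 4xx/5xx status."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def select_tags(html: str, selector: str) -> list:
    """Parse html and return every tag matching the CSS selector."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return soup.select(selector)


# Offline demo; with network access you could call
# select_tags(fetch_html("http://onlinetutorial.co.in"), "h2")
tags = select_tags("<h2>One</h2><h2>Two</h2>", "h2")
print([tag.text for tag in tags])
```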

Example – 2. Looping through elements:

You can loop through the result because it is a list, and you can access individual elements by index. For example, to access the first <h2> on the page (continuing the code above):

print(heading2[0])  # Prints the first <h2>
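Indexing with heading2[0] raises IndexError when the page has no <h2>. bs4 also offers select_one(), which returns the first match or None; a short sketch on an inline snippet:

```python
import bs4

soup = bs4.BeautifulSoup("<h2>First</h2><h2>Second</h2>", "html.parser")

first = soup.select_one("h2")   # first matching tag, or None
print(first.text)               # First

missing = soup.select_one("h3")
print(missing)                  # None (there is no <h3> in the snippet)
```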

If you want to loop through all elements of the result list instead, you can do the following (continuing the code above):

# Print whole elements, i.e. each tag with its text
for i in heading2:
    print(i)

# Print the text only
for i in heading2:
    print(i.text)

Output


As you can see in the result above, it prints all the headings (<h2>) of the page.
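Instead of printing inside the loop, you can collect the text into a list for later use. get_text(strip=True) also removes the surrounding whitespace that headings often contain; the HTML here is an inline stand-in for the fetched page:

```python
import bs4

html = "<h2>\n  Web scraping  \n</h2><h2>Python lists</h2>"
soup = bs4.BeautifulSoup(html, "html.parser")

heading2 = soup.select("h2")
texts = [h.get_text(strip=True) for h in heading2]
print(texts)  # ['Web scraping', 'Python lists']
```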

Example – 3. Fetch elements using class or id:

You can also select elements by their class or id. The syntax is the same, except that:

  1. You select an id using the hash symbol (#) followed by the id name, e.g. #IdName.
  2. You select a class using the dot symbol (.) followed by the class name, e.g. .className.
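To see the two selectors side by side on a small made-up page: a class selector can match several elements, while an id selector should match at most one, so select_one() is a natural fit for ids. The id and class names below are hypothetical:

```python
import bs4

html = """
<div id="main">Main block</div>
<div class="box">Box 1</div>
<div class="box">Box 2</div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

by_id = soup.select_one("#main")          # ids are meant to be unique
by_class = soup.select(".box")            # classes can repeat

print(by_id.text)                         # Main block
print([tag.text for tag in by_class])     # ['Box 1', 'Box 2']
```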

Here is the code for doing the same:

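import bs4
import requests

website_link = requests.get("http://onlinetutorial.co.in")
ab = bs4.BeautifulSoup(website_link.text, 'lxml')
# ".widget_categories" matches every element with class="widget_categories"
categories = ab.select(".widget_categories")
for i in categories:
    print(i.text)  # all the text inside each matching block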

Output


Explanation

  • Here I am using the class named widget_categories; you can use your own class name instead.
  • As you can see, I have used the dot symbol to select a class; you can use the hash symbol to select an id instead.
  • If you want to access the data of a single block, you can use its id.
  • But if you want to access the data of more than one block, I recommend using a class name, because a class can be shared by many elements while an id is unique.
  • Furthermore, you can loop through the matched blocks to get each block's contents.
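Looping through each block also lets you dig into its contents, for example the links inside a category widget. The markup below only imitates a categories widget and is an assumption, not the real structure of the site used above:

```python
import bs4

html = """
<div class="widget_categories"><ul>
  <li><a href="/category/python">Python</a></li>
  <li><a href="/category/java">Java</a></li>
</ul></div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

links = []
for block in soup.select(".widget_categories"):
    # "a[href]" matches only <a> tags that actually have an href attribute
    for link in block.select("a[href]"):
        links.append((link.text, link["href"]))
print(links)  # [('Python', '/category/python'), ('Java', '/category/java')]
```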

Download source code

Download project on Github

So that is all for today's article; I hope you enjoyed the post. We will discuss web scraping in more depth in future tutorials. Thanks!

Credit goes to:

Rajat Sharma

GND, Gurdaspur
