C# Programming 16 - What is Web scraping and Why we use this?

Hello All,

In this article we will learn about web scraping and how to implement it in C#.

What is Web Scraping?

Web scraping is the process of programmatically extracting data from websites.

Let's create a simple console application that retrieves the list of article titles from a given website.

Step 1: Create a New Project: Open Visual Studio and create a new C# console application project.

Step 2: Install Libraries: The second step is to make the required libraries available. The two most commonly used for scraping in C# are HtmlAgilityPack, which is installed as a NuGet package, and HttpClient, which ships with .NET in the System.Net.Http namespace and typically needs no separate install.

Step 3: Send a Request: The next step is to send a request to the website that you want to scrape. This can be done using the HttpClient class.

    // Download the page's HTML. Calling .Result blocks until the request completes.
    string url = "https://lastbenchcoder.blogspot.com/";
    HttpClient httpClient = new HttpClient();
    HttpResponseMessage response = httpClient.GetAsync(url).Result;
    response.EnsureSuccessStatusCode(); // throws if the status code is not 2xx
    string html = response.Content.ReadAsStringAsync().Result;
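
Blocking on .Result is fine for a small console program, but the asynchronous form is generally preferred. Here is a minimal sketch of the same request written with async/await. The class name AsyncFetchExample and the helper FetchHtmlAsync are hypothetical names introduced only for this illustration, and an async Main assumes C# 7.1 or later.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class AsyncFetchExample
{
    // A single HttpClient instance is reused for the lifetime of the program.
    private static readonly HttpClient httpClient = new HttpClient();

    // Hypothetical helper: downloads the page at the given URL and returns its HTML.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            // Throws HttpRequestException if the status code is not 2xx.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    static async Task Main(string[] args)
    {
        string html = await FetchHtmlAsync(httpClient, "https://lastbenchcoder.blogspot.com/");
        Console.WriteLine($"Downloaded {html.Length} characters.");
    }
}
```

Passing the HttpClient in as a parameter keeps the helper easy to reuse for more than one URL.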

Step 4: Parse the HTML: Once you have sent the request, the website returns its HTML. The next step is to parse that HTML using the HtmlAgilityPack library.

    // Load the downloaded HTML into a document that can be queried.
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

Step 5: Find and Extract the Data: The final step is to find the data and export it to the desired file format (CSV, Excel, PDF, etc.). In this example we will display the data in the console.

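    var titleNodes = doc.DocumentNode.SelectNodes("//h2[@class='entry-title']");

    // SelectNodes returns null when nothing matches, so check before looping.
    if (titleNodes != null)
    {
        foreach (var node in titleNodes)
        {
            // Decode HTML entities and trim surrounding whitespace.
            Console.WriteLine(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
    }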
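
Step 5 mentions exporting to a file, so here is a minimal sketch of writing the titles to a CSV file using only the .NET base class library. The class name CsvExportExample, the helpers EscapeCsvField and BuildCsv, the file name titles.csv, and the sample titles are all hypothetical and stand in for the values collected from node.InnerText.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class CsvExportExample
{
    // Quotes a field when it contains a comma, a double quote, or a line break,
    // and doubles any embedded double quotes (the usual CSV convention).
    public static string EscapeCsvField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        string escaped = field.Replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Builds CSV text: a header row followed by one trimmed title per row.
    public static string BuildCsv(IEnumerable<string> titles)
    {
        var sb = new StringBuilder();
        sb.Append("Title\r\n");
        foreach (string title in titles)
        {
            sb.Append(EscapeCsvField(title.Trim())).Append("\r\n");
        }
        return sb.ToString();
    }

    static void Main(string[] args)
    {
        // Hypothetical sample titles; in the scraper these would come from the title nodes.
        var titles = new List<string> { "First post", "Tips, tricks and \"gotchas\"" };
        File.WriteAllText("titles.csv", BuildCsv(titles), Encoding.UTF8);
        Console.WriteLine("Wrote titles.csv");
    }
}
```

Escaping matters because article titles often contain commas or quotes, which would otherwise break the columns when the file is opened in a spreadsheet.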

Here is the complete source code:

                
using System;
using System.Net.Http;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string url = "https://lastbenchcoder.blogspot.com/";
        HttpClient httpClient = new HttpClient();
        HttpResponseMessage response = httpClient.GetAsync(url).Result;
        response.EnsureSuccessStatusCode();
        string html = response.Content.ReadAsStringAsync().Result;

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

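        var titleNodes = doc.DocumentNode.SelectNodes("//h2[@class='entry-title']");

        // SelectNodes returns null when nothing matches, so check before looping.
        if (titleNodes != null)
        {
            foreach (var node in titleNodes)
            {
                // Decode HTML entities and trim surrounding whitespace.
                Console.WriteLine(HtmlEntity.DeEntitize(node.InnerText).Trim());
            }
        }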
    }
}

Running the program prints the title of each article found on the page, one per line.

That's it for this article. See you in the next one.

Take care. Bye.


Chakrapani Upadhyaya
