In this article we will learn about web scraping and how to implement it in C#.
What is Web Scraping?
Web scraping is the process of extracting data from websites, usually by downloading a page's HTML and pulling out the parts you need.
Let's create a simple console application that retrieves the list of article titles from a given website.
Step 1: Create a new project. Open Visual Studio and create a new C# console application project.
Step 2: Install the required libraries. This example uses HtmlAgilityPack, which is installed from NuGet (for example with Install-Package HtmlAgilityPack in the Package Manager Console, or dotnet add package HtmlAgilityPack from the command line). HttpClient is part of .NET itself, in the System.Net.Http namespace, so it needs no separate installation.
Step 3: Send a request to the website that you want to scrape. This can be done with the HttpClient class.
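// Address of the page to scrape
string url = "https://lastbenchcoder.blogspot.com/";
HttpClient httpClient = new HttpClient();
// .Result blocks until the request completes, which is acceptable in a small console demo
HttpResponseMessage response = httpClient.GetAsync(url).Result;
// Throws HttpRequestException if the status code does not indicate success
response.EnsureSuccessStatusCode();
string html = response.Content.ReadAsStringAsync().Result;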
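The blocking .Result calls above are fine for a short demo, but the same request can also be written asynchronously. The following is only a minimal sketch under a few assumptions: the class name PageFetcher, the 30-second timeout, and the User-Agent value "TitleScraperDemo/1.0" are illustrative choices that do not come from the original example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // A single HttpClient is reused for every request instead of creating one per call.
    private static readonly HttpClient Client = CreateClient();

    public static HttpClient CreateClient()
    {
        // Assumed timeout; adjust to suit the site being scraped.
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
        // Hypothetical User-Agent string identifying this demo scraper.
        client.DefaultRequestHeaders.UserAgent.ParseAdd("TitleScraperDemo/1.0");
        return client;
    }

    public static async Task<string> FetchHtmlAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException if the status code does not indicate success.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

With a Main method declared as static async Task Main(), the page could then be fetched with string html = await PageFetcher.FetchHtmlAsync(url);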
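Step 4: Parse the response. Load the downloaded HTML into an HtmlAgilityPack HtmlDocument so that it can be queried.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);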
Step 5: The final step is to find the data and extract it to the desired output (CSV, Excel, PDF, etc.). In this example we select the title nodes with an XPath expression and display them in the console.
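var titleNodes = doc.DocumentNode.SelectNodes("//h2[@class='entry-title']");
// SelectNodes returns null (not an empty collection) when nothing matches
if (titleNodes != null)
{
    foreach (var node in titleNodes)
    {
        Console.WriteLine(node.InnerText.Trim());
    }
}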
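The XPath test @class='entry-title' only matches when the class attribute is exactly that string, so a heading marked up as class='post-title entry-title' would be missed. The sketch below shows one way to match on the class token instead, and it can be tried on an in-memory HTML string without any network access. The class name TitleExtractor and the sample markup are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TitleExtractor
{
    public static List<string> ExtractTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match h2 elements whose space-separated class list contains "entry-title".
        var nodes = doc.DocumentNode.SelectNodes(
            "//h2[contains(concat(' ', normalize-space(@class), ' '), ' entry-title ')]");

        var titles = new List<string>();
        if (nodes == null)
        {
            return titles; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            // Decode HTML entities such as &amp; and trim surrounding whitespace.
            titles.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return titles;
    }
}
```

For example, calling TitleExtractor.ExtractTitles with the made-up string "<h2 class='entry-title'>First</h2><h2 class='post-title entry-title'><a href='#'>Second</a></h2><h2 class='other'>Skipped</h2>" should return the first two headings and ignore the third.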
Complete code:
using System;
using System.Net.Http;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        // Step 3: download the page
        string url = "https://lastbenchcoder.blogspot.com/";
        HttpClient httpClient = new HttpClient();
        HttpResponseMessage response = httpClient.GetAsync(url).Result;
        response.EnsureSuccessStatusCode();
        string html = response.Content.ReadAsStringAsync().Result;

        // Step 4: parse the HTML
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step 5: select the title nodes and print them
        var titleNodes = doc.DocumentNode.SelectNodes("//h2[@class='entry-title']");
        if (titleNodes == null)
        {
            // SelectNodes returns null when nothing matches the XPath expression
            Console.WriteLine("No titles found.");
            return;
        }

        foreach (var node in titleNodes)
        {
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
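Step 5 mentions exporting the data to a file such as CSV rather than printing it. A rough sketch of that idea follows, using only the standard library. The class name CsvExporter, the single "Title" column, and the file name in the usage note are assumptions made for illustration and are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvExporter
{
    // Wrap a field in double quotes when it contains a comma, quote, or line break,
    // and double any embedded quotes (the usual CSV quoting convention).
    public static string EscapeField(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return value;
        }
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Build the CSV text: a header row followed by one row per title.
    public static string BuildCsv(IEnumerable<string> titles)
    {
        var sb = new StringBuilder();
        sb.Append("Title\r\n");
        foreach (string title in titles)
        {
            sb.Append(EscapeField(title)).Append("\r\n");
        }
        return sb.ToString();
    }

    // Write the CSV text to disk as UTF-8.
    public static void Save(IEnumerable<string> titles, string path)
    {
        File.WriteAllText(path, BuildCsv(titles), Encoding.UTF8);
    }
}
```

Given a list of title strings collected in the foreach loop, a call such as CsvExporter.Save(titles, "titles.csv") would write them to a file next to the executable.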
Finally, running the console application prints the title of each article found on the page, one per line.
Take care. Bye.