The web scraper for C# allows .NET developers to write logic that extracts content from web applications and turns it into JSON, spreadsheets, C# objects, or even SQL, using simple C# and LINQ code.
This leaves the developer with clean, efficient web-scraping applications that are easy to understand and debug.
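As a rough illustration, a scraper built with the library follows a pattern of subclassing WebScraper, queuing a starting URL in Init, and handling each response in Parse; the class name, target URL, and CSS selector below are placeholders, not part of any real site:

```csharp
using IronWebScraper;

// A minimal sketch of a scraper class; BlogScraper, the URL and the
// "h2.title" selector are illustrative assumptions, not real targets.
class BlogScraper : WebScraper
{
    public override void Init()
    {
        // Queue the first page; Parse is called with the response.
        this.Request("https://example.com/blog", Parse);
    }

    public override void Parse(Response response)
    {
        // Select elements with a CSS selector and record each match.
        foreach (var heading in response.Css("h2.title"))
        {
            Scrape(new ScrapedData { { "Title", heading.TextContentClean } });
        }
    }
}
```

Scraped records are written out by the library, so the developer's code stays focused on selection logic rather than threading or I/O.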
The C# Web Scraping Library is extremely polite, ensuring that no domain or IP address receives too many concurrent requests. It intelligently throttles on both the client and server side, watching for excessive CPU usage and slowing to an appropriate pace. In addition, it can obey robots.txt directives, including bot-specific crawl rates and limitations. The exact URLs and content types to be scraped can be set using logical workflows and regex/wildcard rules.
Screen scraping is made easier with identity control: the library automatically manages threads, rate limits, URLs, duplicates, retries, proxies, headers, and cookies across an army of virtual browsers that can mimic human behavior and even click buttons, fill in forms, or log in behind security walls. This is useful for migrating legacy systems, populating enterprise search facilities, and for statistical competitive analysis.
Full documentation, support, and downloadable DLLs for the C# Web Scraper are available from http://ironsoftware.com/csharp/webscraper/ , in addition to links to a .NET 4.5+ NuGet package with full Azure and Mono compatibility.
Requirements: a Windows C# development environment such as Visual Studio Community 2015 or newer.
'IronWebScraper' is a new web scraping library for the C# / .NET programming platform. Release features include: - Reading structured content from websites to populate databases, search indexes and applications. - Submission to Nuget.org: https:/