Yesterday, after I finished a few finals, I decided to start working on a basic web scraping program. It has always annoyed me that I spend a lot of my development time offline, yet so many good references are only available online (www.cplusplus.com, the Java APIs, the MSDN Library, etc.). So I started work on my WebRetriever program, which will crawl any provided web page and retrieve all the content it can reach from that page by following hyperlinks. The first step was to create a URL class, which does the basic URL parsing and lets me easily inspect individual pieces of a URL.
The C# URL code can be found here: URL.cs
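
To give a rough idea of what a class like this needs to do, here is a minimal sketch. It is not the contents of URL.cs: the name `SimpleUrl` and its members are made up for illustration, and it simply wraps the built-in `System.Uri` instead of parsing the string by hand. It exposes the individual pieces of a URL and resolves a hyperlink found on a page against that page's address, which is what a crawler needs before it can follow a link.

```csharp
using System;

// Hypothetical stand-in for a URL helper class; NOT the actual URL.cs.
// Assumption: it delegates all parsing to System.Uri.
public sealed class SimpleUrl
{
    private readonly Uri _uri;

    public SimpleUrl(string address)
    {
        // Throws UriFormatException if the string is not a valid absolute URI.
        _uri = new Uri(address, UriKind.Absolute);
    }

    // The individual pieces of the URL.
    public string Scheme => _uri.Scheme;        // e.g. "http"
    public string Host => _uri.Host;            // e.g. "www.cplusplus.com"
    public int Port => _uri.Port;               // default port of the scheme if none is given
    public string Path => _uri.AbsolutePath;    // e.g. "/reference/string/"
    public string Query => _uri.Query;          // includes the leading '?', or "" if absent
    public string Fragment => _uri.Fragment;    // includes the leading '#', or "" if absent

    // Resolve a hyperlink found on this page (absolute or relative)
    // against this URL, so the crawler knows where to go next.
    public SimpleUrl Resolve(string href)
    {
        return new SimpleUrl(new Uri(_uri, href).AbsoluteUri);
    }

    public override string ToString()
    {
        return _uri.AbsoluteUri;
    }
}
```

With this sketch, `new SimpleUrl("http://www.cplusplus.com/reference/string/")` reports `www.cplusplus.com` as its `Host`, and calling `Resolve("../vector/")` on it yields a URL pointing at `/reference/vector/` on the same host. A hand-written parser gives more control over malformed links, but leaning on `System.Uri` keeps the example short.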