HTTP clients designed for easy tool building
Fast, asynchronous URL retriever
Retriever-based Spider
Starts with an initial list of URLs and crawls them asynchronously, passing the results to header_processors, html_processors, and tree_processors, which implement additional functionality.
check_site demonstrates the HTML processor feature by reporting HTML validation errors from pytidylib.
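The processor idea described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `crawl`, `fetch`, `FAKE_SITE`, `LinkCollector`, and the `(url, data)` processor signature are all hypothetical names chosen for the example, an in-memory dictionary stands in for real HTTP requests, and tree processors are left out for brevity.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/": ({"Content-Type": "text/html"}, '<a href="/about">About</a>'),
    "http://example.test/about": ({"Content-Type": "text/html"}, "<p>About us</p>"),
}


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


async def fetch(url):
    """Assumed stand-in for an asynchronous retriever: returns (headers, html)."""
    await asyncio.sleep(0)  # yield to the event loop as a real request would
    return FAKE_SITE[url]


async def crawl(start_urls, header_processors=(), html_processors=()):
    """Crawl from start_urls, handing every result to the given processors."""
    pending = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    seen = set(pending)
    while pending:
        batch, pending = pending, []
        # Retrieve the whole batch concurrently.
        results = await asyncio.gather(*(fetch(url) for url in batch))
        for url, (headers, html) in zip(batch, results):
            for process in header_processors:
                process(url, headers)
            for process in html_processors:
                process(url, html)
            # Queue links that belong to the fake site and are not yet seen.
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                link = urljoin(url, href)
                if link in FAKE_SITE and link not in seen:
                    seen.add(link)
                    pending.append(link)
    return seen


content_types = {}
page_sizes = {}


def record_content_type(url, headers):
    content_types[url] = headers["Content-Type"]


def record_page_size(url, html):
    page_sizes[url] = len(html)


visited = asyncio.run(
    crawl(
        ["http://example.test/"],
        header_processors=[record_content_type],
        html_processors=[record_page_size],
    )
)
```

Keeping processors as plain callables means a validation tool in the style of check_site only needs to supply one more HTML processor; the crawl loop itself stays unchanged.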
Callback used to process a URL after it’s been retrieved
Start the spider with the provided list of URLs
Block until the spider has crawled the entire site
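The callback, start, and block-until-done steps listed above might fit together as in the sketch below. It is an assumption-laden illustration rather than the library's API: `MiniSpider`, `start`, `wait`, `on_retrieved`, and `fake_fetch` are hypothetical names, and the fetch function fabricates a response instead of touching the network.

```python
import asyncio


class MiniSpider:
    """Hypothetical stand-in for a spider; not the library's Spider class."""

    def __init__(self, fetch, on_retrieved):
        self.fetch = fetch                # async callable: url -> body
        self.on_retrieved = on_retrieved  # callback run after each retrieval
        self.loop = asyncio.new_event_loop()
        self.tasks = []

    async def _retrieve(self, url):
        body = await self.fetch(url)
        self.on_retrieved(url, body)

    def start(self, urls):
        """Schedule one retrieval per URL; work begins once the loop runs."""
        for url in urls:
            self.tasks.append(self.loop.create_task(self._retrieve(url)))

    async def _join(self):
        await asyncio.gather(*self.tasks)

    def wait(self):
        """Block the caller until every scheduled retrieval has finished."""
        try:
            self.loop.run_until_complete(self._join())
        finally:
            self.loop.close()


async def fake_fetch(url):
    """Assumed placeholder for a real HTTP request."""
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


retrieved = {}


def remember(url, body):
    retrieved[url] = body


spider = MiniSpider(fake_fetch, remember)
spider.start(["http://example.test/a", "http://example.test/b"])
spider.wait()
```

Separating `start` from `wait` lets a caller register further processors or do other setup between the two calls, at the cost of having to remember to call `wait` before reading the results.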