This script has some non-standard dependencies, feedparser and beautifulsoup4, both of which you are probably already using if you’re doing anything related to web scraping or feed reading. I’ve copied my solution below, which you should be able to interpret fairly easily. I start by looking for link tags pointing to RSS feeds, then parse the page for any a hrefs pointing to links with “xml”, “rss”, or “feed” in the URL. Finally, I use feedparser to go through the list of possible RSS feeds and validate them to ensure that the links point to valid feeds.

Feel free to fork this gist on GitHub or download the raw file.
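The three steps described above can be sketched roughly as follows. This is not the gist itself, only an illustration under a few assumptions: it swaps in the standard library's html.parser for BeautifulSoup so the candidate-gathering part runs without extra installs, it treats a link tag as a feed when its type is application/rss+xml or application/atom+xml, and it counts a URL as a valid feed when feedparser returns at least one entry. The names FeedLinkCollector, candidate_feeds, and validate_feeds are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkCollector(HTMLParser):
    """Collect candidate feed URLs from <link> and <a> tags (hypothetical helper)."""

    # Assumption: these two MIME types are what marks a <link> as a feed.
    FEED_TYPES = ("application/rss+xml", "application/atom+xml")
    KEYWORDS = ("xml", "rss", "feed")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if tag == "link":
            # Step 1: <link> tags that advertise a feed MIME type.
            if (attrs.get("type") or "").lower() in self.FEED_TYPES:
                self._add(href)
        elif tag == "a":
            # Step 2: anchors whose URL mentions "xml", "rss", or "feed".
            if any(keyword in href.lower() for keyword in self.KEYWORDS):
                self._add(href)

    def _add(self, href):
        # Resolve relative links against the page URL and skip duplicates.
        url = urljoin(self.base_url, href)
        if url not in self.candidates:
            self.candidates.append(url)


def candidate_feeds(html, base_url):
    """Return possible feed URLs found in an HTML document, in page order."""
    collector = FeedLinkCollector(base_url)
    collector.feed(html)
    return collector.candidates


def validate_feeds(urls):
    """Step 3: keep only URLs that feedparser parses into at least one entry."""
    import feedparser  # third-party; imported here so the steps above run without it

    return [url for url in urls if feedparser.parse(url).entries]
```

To use it, fetch a page's HTML however you normally do, pass it with the page URL to candidate_feeds, and hand the result to validate_feeds. The keyword match is deliberately loose, so it will pick up unrelated links such as a "feedback" page, which is why the validation pass matters.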