obscraper: scrape posts from the overcomingbias blog
obscraper lets you scrape blog posts and associated metadata from the
overcomingbias blog.
It’s easy to get a single post:
>>> import obscraper
>>> intro_url = 'https://www.overcomingbias.com/2006/11/introduction.html'
>>> intro = obscraper.get_post_by_url(intro_url)
>>> intro.title
'How To Join'
>>> intro.plaintext
'How can we better believe what is true? ...'
>>> intro.internal_links
{
'http://www.overcomingbias.com/2007/02/moderate_modera.html': 1,
'http://www.overcomingbias.com/2006/12/contributors_be.html': 1
}
>>> intro.comments
20
Or a full list of post names and edit dates:
>>> import obscraper
>>> edit_dates = obscraper.get_edit_dates()
...
>>> len(edit_dates)
4352
>>> {name: str(edit_dates[name]) for name in list(edit_dates)[:5]}
{'2022/01/much-talk-is-sales-patter':
'2022-01-14 20:46:35+00:00',
'2022/01/old-man-rant':
'2022-01-13 15:21:33+00:00',
'2022/01/my-11-bets-at-10-1-odds-on-10m-covid-deaths-by-2022':
'2022-01-12 19:15:10+00:00',
'2022/01/to-innovate-unify-or-fragment':
'2022-01-11 01:03:44+00:00',
'2022/01/on-what-is-advice-useful':
'2022-01-10 18:46:26+00:00'}
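Once the edit dates are in hand, a common next step is to pick out the posts that changed recently. The following is a minimal sketch, not part of the obscraper API: `names_edited_since` is a hypothetical helper, and it assumes that `edit_dates` is a dict mapping post names to timezone-aware `datetime` objects, as the output above suggests. The sample data is copied from that output so the snippet runs without network access.

```python
from datetime import datetime, timezone


def names_edited_since(edit_dates, cutoff):
    """Return post names last edited at or after `cutoff`, newest first.

    Hypothetical helper: assumes `edit_dates` maps post names to
    timezone-aware datetimes, like the dict shown above.
    """
    recent = [name for name, edited in edit_dates.items() if edited >= cutoff]
    return sorted(recent, key=lambda name: edit_dates[name], reverse=True)


# Sample data copied from the output above. In real use this would be
# the result of obscraper.get_edit_dates() (assumed to have this shape).
sample_edit_dates = {
    '2022/01/much-talk-is-sales-patter':
        datetime(2022, 1, 14, 20, 46, 35, tzinfo=timezone.utc),
    '2022/01/old-man-rant':
        datetime(2022, 1, 13, 15, 21, 33, tzinfo=timezone.utc),
    '2022/01/on-what-is-advice-useful':
        datetime(2022, 1, 10, 18, 46, 26, tzinfo=timezone.utc),
}

# The cutoff must be timezone-aware too, or the comparison raises TypeError.
cutoff = datetime(2022, 1, 12, tzinfo=timezone.utc)
recent_names = names_edited_since(sample_edit_dates, cutoff)
print(recent_names)
```

The resulting names could then be used to decide which posts are worth downloading again, rather than scraping the whole site each time.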
For more on how to use the package, see Getting Started.
Features
Get posts by their URLs or edit dates, or get all posts hosted on the overcomingbias site
Provides detailed post metadata including post URLs, titles, authors, tags, publish dates, and last edit dates
Provides summary of post content including full post text as HTML or plaintext, and a list of hyperlinks to other overcomingbias posts
Asynchronous execution and caching for fast downloads
Use via import obscraper or the simple command line interface
Comprehensively tested
Supports Python 3.8+
Documentation
See Getting Started for an introduction to the package.
A full reference to the obscraper public API can be found at Public API Reference.
For the full details, check out the well-documented code.
Bugs/Requests
Please use the GitHub issue tracker to submit bugs or request features.
Changelog
See the Changelog for a list of the fixes and enhancements in each version.
License
Copyright (c) 2022 Christopher McDonald
Distributed under the terms of the MIT license.
All overcomingbias posts are copyright the original authors.