obscraper: scrape posts from the overcomingbias blog
====================================================
``obscraper`` lets you scrape blog posts and associated metadata from the
`overcomingbias `_ blog.
It's easy to get a single post:
.. code-block:: python
>>> import obscraper
>>> intro_url = 'https://www.overcomingbias.com/2006/11/introduction.html'
>>> intro = obscraper.get_post_by_url(intro_url)
>>> intro.title
'How To Join'
>>> intro.plaintext
'How can we better believe what is true? ...'
>>> intro.internal_links
[
'http://www.overcomingbias.com/2007/02/moderate_modera.html': 1,
'http://www.overcomingbias.com/2006/12/contributors_be.html': 1
]
>>> intro.comments
20
Or a full list of post names and edit dates::
>>> import obscraper
>>> edit_dates = obscraper.get_edit_dates()
...
>>> len(edit_dates)
4352
>>> {name: str(edit_dates[name]) for name in list(edit_dates)[:5]}
{'2022/01/much-talk-is-sales-patter':
'2022-01-14 20:46:35+00:00',
'2022/01/old-man-rant':
'2022-01-13 15:21:33+00:00',
'2022/01/my-11-bets-at-10-1-odds-on-10m-covid-deaths-by-2022':
'2022-01-12 19:15:10+00:00',
'2022/01/to-innovate-unify-or-fragment':
'2022-01-11 01:03:44+00:00',
'2022/01/on-what-is-advice-useful':
'2022-01-10 18:46:26+00:00'}
For more on how to use the package, see :doc:`Getting Started `.
Features
********
- Get posts by their URLs or edit dates, or get all posts hosted on the
overcomingbias site
- Provides detailed post metadata including post URLs, titles, authors, tags,
publish dates, and last edit dates
- Provides summary of post content including full post text as HTML or
plaintext, and a list of hyperlinks to other overcomingbias posts
- Asynchronous execution and caching for fast downloads
- Use via ``import obscraper`` or the simple command line interface
- Comprehensively tested
- Supports python 3.8+
Documentation
*************
See :doc:`Getting Started ` for an introduction to the package.
A full reference to the obscraper public API can be found at
:doc:`Public API Reference `.
For the full details, check out the well-documented
`code `_.
Bugs/Requests
*************
Please use the `GitHub issue tracker `_
to submit bugs or request features.
Changelog
*********
See the :doc:`Changelog ` for a list of fixes and enhancements of each
version.
License
*******
Copyright (c) 2022 Christopher McDonald
Distributed under the terms of the
`MIT `_ license.
All overcomingbias posts are copyright the original authors.