Changelog

Versions follow the Semantic Versioning 2.0.0 standard.

obscraper 0.8.2 (2022-12-17)

Improvements

  • Build with hatchling.

obscraper 0.8.1 (2022-07-07)

Improvements

  • Use an HTTP/2 client for improved performance.

obscraper 0.8.0 (2022-05-27)

Improvements

  • No longer fail on posts where comments appear to be disabled (i.e. posts without a disqus_id). This applies to 4 posts:

    • 2007/12/welcome-to-overcoming-bias

    • 2008/08/about-the-future-of-humanity-institute

    • 2008/08/yudkowskys-book

    • 2009/02/the-most-important-thing

Trivial/Internal Changes

  • Use absolute rather than relative imports for clarity.

obscraper 0.7.0 (2022-03-14)

Breaking changes

  • Post name is now formatted without the leading slash. E.g.:

    • Before: /2009/02/the-most-important-thing

    • After: 2009/02/the-most-important-thing
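Code that saved post names from earlier versions can be migrated by dropping the leading slash. A minimal sketch; normalize_post_name and migrate_keys are hypothetical helpers, not part of obscraper:

```python
def normalize_post_name(name: str) -> str:
    """Convert a pre-0.7.0 post name to the 0.7.0+ format (hypothetical helper)."""
    # Old-style names looked like "/2009/02/the-most-important-thing".
    return name[1:] if name.startswith("/") else name


def migrate_keys(posts: dict) -> dict:
    """Re-key a dict that was saved using old-style post names (hypothetical helper)."""
    return {normalize_post_name(name): post for name, post in posts.items()}
```

Names already in the new format pass through unchanged, so the helper is safe to apply more than once.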

Bug fixes

  • Fixed a small bug in the regular expressions used to match URLs.

obscraper 0.6.0 (2022-03-11)

Breaking changes

  • Updated API: the internal_links and external_links attributes of Post are now lists (possibly containing duplicates) rather than dictionaries.
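Because internal_links and external_links are now plain lists that may repeat a link, callers who want per-link counts or a deduplicated view can derive them with the standard library. A sketch only; a plain list of made-up placeholder links stands in for the Post attribute:

```python
from collections import Counter

# Stand-in for post.internal_links (made-up placeholder values).
internal_links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
]

# Number of times each link appears in the post.
link_counts = Counter(internal_links)

# Unique links, keeping the order of first appearance.
unique_links = list(dict.fromkeys(internal_links))
```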

obscraper 0.5.0 (2022-02-10)

Major update.

Improvements

  • Asynchronous execution: internals now execute requests and postprocessing asynchronously using trio. This is at least 10% faster than the previous multithreaded version.

  • Improved tests: migrated all tests to pytest. Added more systematic testing of random posts.

  • Sessions: internals now use (asynchronous) sessions, reducing the load on the overcomingbias server and increasing download speed.

Breaking changes

  • Updated interface for consistency and clarity:

  • Updated get_all_posts to return None for any post that could not be retrieved.

  • Removed outdated max_workers argument from public API functions.
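Callers that only want successfully retrieved posts can filter out the None values. A minimal sketch, assuming get_all_posts returns a dict keyed by post name whose values are Post objects or None; drop_missing is a hypothetical helper and the data below are placeholders (strings stand in for Post objects):

```python
from typing import Dict, Optional, TypeVar

T = TypeVar("T")


def drop_missing(posts: Dict[str, Optional[T]]) -> Dict[str, T]:
    """Keep only the entries whose post was retrieved (hypothetical helper)."""
    return {name: post for name, post in posts.items() if post is not None}


# Placeholder stand-in for the result of get_all_posts().
posts = {
    "2010/09/jobs-explain-lots": "<Post>",
    "2099/01/placeholder-post": None,
}

retrieved = drop_missing(posts)
# Names of posts that could not be retrieved, e.g. for logging or a retry.
failed = [name for name, post in posts.items() if post is None]
```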

Trivial / internal changes

  • Source code now follows the black format.

obscraper 0.4.0 (2022-02-06)

Features

Bug fixes

  • AttributeNotFoundError exceptions are now caught when downloading multiple posts. This prevents crashes on “broken” posts, e.g. 2009/02/the-most-important-thing.

obscraper 0.3.0 (2022-02-03)

Breaking Changes

  • get_all_posts, get_posts_by_edit_date and grab_edit_dates now use post names rather than post URLs as keys.

  • “Short” URLs (of the form overcomingbias.com/?p=12345) are no longer accepted. This may change again in the future.
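Callers that may still hold short URLs can detect them before passing them in. A minimal sketch; is_short_url and its regular expression are illustrative assumptions, not obscraper's own URL matching:

```python
import re

# Assumed shape of a short URL: optional scheme and "www.", then the site root with ?p=<digits>.
_SHORT_URL = re.compile(r"(?:https?://)?(?:www\.)?overcomingbias\.com/?\?p=\d+")


def is_short_url(url: str) -> bool:
    """Return True if url looks like overcomingbias.com/?p=12345 (hypothetical helper)."""
    return _SHORT_URL.fullmatch(url) is not None
```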

Features

Improved Documentation

  • Add information on exceptions raised by public API functions.

Trivial / internal changes

  • Most internal interfaces now use post names rather than URLs.

obscraper 0.2.0 (2022-01-19)

Breaking Changes

  • get_posts_by_urls will now fail when a post attribute cannot be extracted from the post HTML, since this situation is technically a bug. Previously it returned None.

  • The Post name attribute now contains the year and month of publication, as in URLs. E.g. ‘jobs-explain-lots’ becomes ‘2010/09/jobs-explain-lots’. This ensures the post URL can be reconstructed from the post name.
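The round trip between a post name and its URL can be sketched as below. This assumes post URLs have the form https://www.overcomingbias.com/<name>.html; the base URL, the .html suffix, and the helper names are assumptions for illustration, not taken from obscraper:

```python
import re

BASE_URL = "https://www.overcomingbias.com"  # assumed base URL

# Assumed long-URL shape: <base>/<yyyy>/<mm>/<slug>.html
_LONG_URL = re.compile(r"https?://(?:www\.)?overcomingbias\.com/(\d{4}/\d{2}/[^/?#]+)\.html")


def name_to_url(name: str) -> str:
    """Build a post URL from a name such as '2010/09/jobs-explain-lots' (hypothetical helper)."""
    return f"{BASE_URL}/{name}.html"


def url_to_name(url: str) -> str:
    """Extract the post name from a long post URL (hypothetical helper)."""
    match = _LONG_URL.fullmatch(url)
    if match is None:
        raise ValueError(f"not a recognised post URL: {url}")
    return match.group(1)
```

Because the name carries the year and month, no lookup is needed to go from name to URL.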

Improvements

  • Let users specify the maximum number of threads used to download posts via the optional max_workers argument.

  • Remove repeated whitespace from the post text when it is returned as plaintext.
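Collapsing repeated whitespace can be sketched with a single regular expression. This treats newlines and tabs like spaces; obscraper's exact rules may differ, and collapse_whitespace is a hypothetical helper:

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one space and trim both ends (hypothetical helper)."""
    return re.sub(r"\s+", " ", text).strip()
```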

Trivial/Internal Changes

  • Post now represents the post URL as a property rather than an attribute.

obscraper 0.1.3 (2022-01-18)

First public release!

For the initial list of features, see Getting Started and Public API Reference.