Changelog
Versions follow the Semantic Versioning 2.0.0 standard.
obscraper 0.8.2 (2022-12-17)
Improvements
Build with hatchling.
obscraper 0.8.1 (2022-07-07)
Improvements
Use HTTP/2 client for improved performance.
obscraper 0.8.0 (2022-05-27)
Improvements
No longer fail for posts where comments appear to be disabled (i.e. posts without a disqus_id). This applies to 4 posts:
2007/12/welcome-to-overcoming-bias
2008/08/about-the-future-of-humanity-institute
2008/08/yudkowskys-book
2009/02/the-most-important-thing
Trivial/Internal Changes
Use absolute rather than relative imports for clarity.
obscraper 0.7.0 (2022-03-14)
Breaking changes
Post name is now formatted without the leading slash. E.g.:
Before: /2009/02/the-most-important-thing
After: 2009/02/the-most-important-thing
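Names stored by code written against earlier releases can be brought in line with the new format by stripping the leading slash. A minimal sketch; `normalize_post_name` is a hypothetical helper for illustration, not part of obscraper:

```python
def normalize_post_name(name: str) -> str:
    """Return a post name without any leading slash (the 0.7.0 format)."""
    # lstrip removes every leading "/" and leaves already-normalized names unchanged.
    return name.lstrip("/")


old_name = "/2009/02/the-most-important-thing"
new_name = normalize_post_name(old_name)
print(new_name)
```

The helper is idempotent, so it is safe to apply to a mix of old and new names.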
Bug fixes
Fixed small bug matching URLs in regular expressions.
obscraper 0.6.0 (2022-03-11)
Breaking changes
Updated API: the internal_links and external_links attributes of Post are now lists (possibly containing duplicates) rather than dictionaries.
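Because the lists may contain duplicates, callers who want per-link counts or a de-duplicated list can derive them themselves. A sketch, assuming the list elements are URL strings; the `links` list below is a placeholder standing in for `Post.internal_links`, and its URLs are illustrative only:

```python
from collections import Counter

# Placeholder standing in for post.internal_links (element type is an assumption).
links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
]

# How often each link appears in the post.
link_counts = Counter(links)

# Unique links, keeping the order of first appearance.
unique_links = list(dict.fromkeys(links))

print(link_counts.most_common())
print(unique_links)
```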
obscraper 0.5.0 (2022-02-10)
Major update.
Improvements
Asynchronous execution: internals now execute requests and postprocessing asynchronously using trio. This is at least 10% faster than the previous multithreaded version.
Improved tests: migrated all tests to pytest. Added more systematic testing of random posts.
Sessions: internals now use (asynchronous) sessions, reducing the load on the overcomingbias server and increasing download speed.
Breaking changes
Updated interface for consistency and clarity:
grab_edit_dates is now get_edit_dates
get_votes and get_comments are now get_vote_counts and get_comment_counts
Updated behaviour of get_all_posts to return None when the post could not be retrieved.
Removed outdated max_workers argument from public API functions.
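Since get_all_posts can now yield None for a post that could not be retrieved, callers need to skip those entries before using the results. A sketch, using a plain dictionary keyed by post name as a stand-in for the return value; the exact return type is an assumption, and the values are placeholder strings rather than real Post objects:

```python
# Stand-in for the result of get_all_posts (shape assumed: post name -> Post or None).
posts = {
    "2009/02/the-most-important-thing": None,  # could not be retrieved
    "2010/09/jobs-explain-lots": "placeholder for a Post object",
}

# Keep only the posts that were actually retrieved.
retrieved = {name: post for name, post in posts.items() if post is not None}

# Collect the names that failed, e.g. for logging or a later retry.
missing = sorted(name for name, post in posts.items() if post is None)

print(f"{len(retrieved)} retrieved, {len(missing)} missing: {missing}")
```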
Trivial / internal changes
Source code now follows the black format.
obscraper 0.4.0 (2022-02-06)
Features
Added logging functionality, and documentation in the Getting Started guide.
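Library log output of this kind is usually switched on through the standard logging module. A sketch, assuming the library logs under a logger named "obscraper" (the usual convention of naming loggers after the package, but not confirmed here; the Getting Started guide is the authoritative reference):

```python
import logging

# Send log records to stderr with a timestamp; other libraries stay at WARNING.
logging.basicConfig(
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    level=logging.WARNING,
)

# Assumed logger name: the package name.
logger = logging.getLogger("obscraper")
logger.setLevel(logging.DEBUG)
```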
Bug fixes
AttributeNotFoundError exceptions are now caught when downloading multiple posts. This prevents crashes on “broken” posts, e.g. 2009/02/the-most-important-thing.
obscraper 0.3.0 (2022-02-03)
Breaking Changes
get_all_posts, get_posts_by_edit_date and grab_edit_dates now return post names rather than post URLs in their keys.
“Short” URLs (of the form overcomingbias.com/?p=12345) are no longer accepted. This might change again in the future.
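The difference between the two URL forms can be illustrated with a pair of regular expressions. This is a sketch only: the patterns below are hypothetical and are not the library's OB_POST_URL_PATTERN, and the optional scheme, "www." prefix and ".html" suffix are assumptions about how long URLs look:

```python
import re

# Hypothetical patterns for illustration; not taken from obscraper.
LONG_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?overcomingbias\.com/"
    r"(\d{4}/\d{2}/[a-z0-9-]+)(?:\.html)?/?$"
)
SHORT_URL = re.compile(r"^(?:https?://)?(?:www\.)?overcomingbias\.com/\?p=\d+$")


def classify_url(url: str) -> str:
    """Classify a URL as a 'short' post URL, a 'long' post URL, or 'other'."""
    if SHORT_URL.match(url):
        return "short"
    if LONG_URL.match(url):
        return "long"
    return "other"


print(classify_url("overcomingbias.com/?p=12345"))
print(classify_url("https://www.overcomingbias.com/2009/02/the-most-important-thing.html"))
```

Only the long form carries the year, month and slug that make up a post name, so a short URL cannot be converted to a name without an extra request.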
Features
Add get_post_by_name and get_posts_by_names to the public API.
Add OB_POST_URL_PATTERN to the public API.
Add url_to_name and name_to_url to the public API.
Improved Documentation
Add information on exceptions raised by public API functions.
Trivial / internal changes
Most internal interfaces now use post names rather than URLs.
obscraper 0.2.0 (2022-01-19)
Breaking Changes
get_posts_by_urls will now fail when a post attribute cannot be extracted from the post HTML, since this situation is technically a bug. Previously it returned None.
The Post name attribute now contains the year and month of publication, as in URLs. E.g. ‘jobs-explain-lots’ becomes ‘2010/09/jobs-explain-lots’. This ensures the post URL can be reconstructed from the post name.
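The point of including the year and month is that the name alone is enough to rebuild the URL. A sketch of that reconstruction; `build_post_url` is a hypothetical helper, and the base URL and ".html" suffix are assumptions about the site's URL scheme rather than details taken from obscraper:

```python
BASE_URL = "https://www.overcomingbias.com"  # assumed base URL


def build_post_url(name: str) -> str:
    """Rebuild a post URL from a name of the form 'YYYY/MM/slug'."""
    year, month, slug = name.split("/")  # raises ValueError for old-style names
    return f"{BASE_URL}/{year}/{month}/{slug}.html"


print(build_post_url("2010/09/jobs-explain-lots"))
```

An old-style name such as 'jobs-explain-lots' has no year or month, so the split fails and no URL can be built from it.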
Improvements
Let users specify the maximum number of threads used to download posts, via the optional max_workers argument.
Remove repeated whitespace within the text, when getting post text as plaintext.
Trivial/Internal Changes
Post now represents the post URL as a property rather than an attribute.
obscraper 0.1.3 (2022-01-18)
First public release!
For the initial list of features, see Getting Started and Public API Reference.