blog.8-p.info

This is a re-do of Looking back 2017, through Hacker News. Google Cloud Platform hosts several public datasets that are readily accessible from BigQuery. In this article, I look at what happened in 2019 through the stories on Hacker News.

Setup

Previously I used the %bq magic, but it appears to be deprecated; %%bigquery is now the way to go, according to Migrating from the datalab Python package.

# To use "%%bigquery"
%load_ext google.cloud.bigquery

For the BigQuery client library, I need to set up authentication by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Without it, you get the error "Project was not passed and could not be determined from the environment."
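Before loading the magic, it can help to confirm that the variable is actually visible to the notebook kernel. The following is a minimal sketch; the helper name describe_credentials is my own, and it only inspects the environment variable (it does not cover other ways of providing credentials).

```python
import os


def describe_credentials(environ=None):
    """Report whether GOOGLE_APPLICATION_CREDENTIALS looks usable.

    `environ` defaults to os.environ; passing a plain dict makes the
    check easy to try out without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: " + path
    return "using credentials file " + path


print(describe_credentials())
```

If the variable is missing here, it usually needs to be exported before the notebook server starts, since the kernel inherits its environment from that process.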

The dictionary below parameterizes the queries that follow. Updating this notebook for another year, say 2011, only requires changing the cell below.

params = {"year": 2019}
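Outside of a notebook, the same dictionary could be turned into named query parameters for the client library. This is a sketch rather than what the magic does internally: infer_bq_type, parameter_specs, and run_query are hypothetical helpers of mine, and the type mapping only covers a few scalar types. The google.cloud.bigquery import is deferred so the pure helpers can be tried without the library or credentials.

```python
def infer_bq_type(value):
    """Map a Python scalar to a BigQuery type name (a partial, assumed mapping)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, str):
        return "STRING"
    raise TypeError("unsupported parameter type: " + type(value).__name__)


def parameter_specs(params):
    """Turn {"year": 2019} into [("year", "INT64", 2019)]."""
    return [(name, infer_bq_type(value), value) for name, value in params.items()]


def run_query(sql, params):
    """Run `sql` with named parameters such as @year and return a DataFrame."""
    from google.cloud import bigquery  # imported lazily; needs credentials to run

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, type_name, value)
            for name, type_name, value in parameter_specs(params)
        ]
    )
    return client.query(sql, job_config=job_config).to_dataframe()


print(parameter_specs({"year": 2019}))
```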

Then import pandas, only to set max_colwidth. Without that, pandas truncates long strings such as URLs.

import pandas as pd
# None disables truncation (pandas 1.0 and later; older versions used -1)
pd.set_option('display.max_colwidth', None)

The dataset's description on Google Cloud Platform doesn't explain the schema, but it mentions that the dataset is based on the Hacker News API.

%%bigquery --params $params
SELECT title, timestamp, `by`, score, descendants as comments, url, id FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story' AND extract(year FROM timestamp) = @year
ORDER BY score DESC LIMIT 50

title timestamp by score comments url id
0 Switch from Chrome to Firefox 2019-05-30 16:09:19+00:00 WisNorCan 3287 981 https://www.mozilla.org/en-US/firefox/switch/ 20052623
1 I Sell Onions on the Internet 2019-04-23 13:00:24+00:00 eightturn 3015 435 https://www.deepsouthventures.com/i-sell-onions-on-the-internet/ 19728132
2 Announcing unlimited free private repos 2019-01-07 17:03:59+00:00 razer6 2867 684 https://blog.github.com/2019-01-07-new-year-new-github/ 18847043
3 Slack’s new WYSIWYG input box is terrible 2019-11-20 23:13:09+00:00 ingve 2776 1076 https://quuxplusone.github.io/blog/2019/11/20/slack-rich-text-box/ 21589647
4 Show HN: A retro video game console I've been working on in my free time 2019-03-14 20:25:03+00:00 pkiller 2690 210 https://internalregister.github.io/2019/03/14/Homebrew-Console.html 19393279
5 My Business Card Runs Linux 2019-12-24 10:15:42+00:00 rcarmo 2584 397 https://www.thirtythreeforty.net/posts/2019/12/my-business-card-runs-linux/ 21871026
6 Blizzard Suspends Professional Hearthstone Player for Hong Kong Comments 2019-10-08 09:23:08+00:00 hownottowrite 2525 1126 https://playhearthstone.com/en-us/blog/23179289/ 21190265
7 Raspberry Pi 4 2019-06-24 06:00:28+00:00 MarcScott 2504 837 https://www.raspberrypi.org/blog/raspberry-pi-4-on-sale-now-from-35 20260863
8 Twitter to ban political advertising 2019-10-30 20:07:19+00:00 coloneltcb 2447 1004 https://twitter.com/jack/status/1189634360472829952 21401973
9 No Thank You, Mr. Pecker 2019-02-07 22:52:16+00:00 coloneltcb 2444 730 https://medium.com/@jeffreypbezos/no-thank-you-mr-pecker-146e3922310f 19109474
10 Julian Assange arrested in London 2019-04-11 09:37:56+00:00 kragniz 2369 1119 https://www.bbc.co.uk/news/uk-47891737 19632449
11 Save .org 2019-11-23 01:02:44+00:00 jaden 2297 342 https://savedotorg.org/ 21611677
12 Otonomo, with nearly $55M in funding, is cloning our product 2019-04-22 14:55:04+00:00 sahaskatta 2292 603 https://smartcar.com/blog/how-otonomo-is-cloning-our-product/ 19719380
13 Mazda is purging touchscreens from its vehicles 2019-06-17 06:11:43+00:00 meteor333 2220 952 https://www.motorauthority.com/news/1121372_why-mazda-is-purging-touchscreens-from-its-vehicles 20200335
14 A Conspiracy to Kill IE6 2019-05-01 16:25:19+00:00 zacman85 2195 363 http://blog.chriszacharias.com/a-conspiracy-to-kill-ie6 19798678
15 Unveiling the first-ever image of a black hole [video] 2019-04-10 13:08:34+00:00 doktorn 2164 488 https://www.youtube.com/watch?v=Dr20f19czeE 19624226
16 Reflecting on My Failure to Build a Billion-Dollar Company 2019-02-07 15:28:10+00:00 jamesjyu 2101 353 https://medium.com/@shl/reflecting-on-my-failure-to-build-a-billion-dollar-company-b0c31d7db0e7 19105733
17 Ken Thompson's Unix Password 2019-10-09 13:22:51+00:00 stargrave 2101 637 https://leahneukirchen.org/blog/archive/2019/10/ken-thompson-s-unix-password.html 21202905
18 Google to restrict modern ad blocking Chrome extensions to enterprise users 2019-05-29 19:38:04+00:00 estranhosidade 2093 877 https://9to5google.com/2019/05/29/chrome-ad-blocking-enterprise-manifest-v3/ 20044430
19 GitHub Sponsors 2019-05-23 08:32:43+00:00 Heliosmaster 2082 501 https://github.com/sponsors 19989684
20 Court: Suspicionless Searches of Travelers’ Phones and Laptops Unconstitutional 2019-11-12 20:21:22+00:00 coloneltcb 2063 580 https://www.aclu.org/press-releases/federal-court-rules-suspicionless-searches-travelers-phones-and-laptops 21517722
21 Firefox Send: Free encrypted file transfer service 2019-03-12 13:24:42+00:00 dnlserrano 2031 512 https://blog.mozilla.org/blog/2019/03/12/introducing-firefox-send-providing-free-file-transfers-while-keeping-your-personal-information-private/ 19367850
22 The boring technology behind a one-person Internet company 2019-09-16 16:40:08+00:00 mxschumacher 2010 451 https://broadcast.listennotes.com/the-boring-technology-behind-listen-notes-56697c2e347b 20985875
23 Ask HN: What books changed the way you think about almost everything? 2019-02-05 17:31:45+00:00 anderspitman 2009 1165 None 19087418
24 Having Kids 2019-12-14 16:49:19+00:00 yarapavan 1992 863 http://paulgraham.com/kids.html 21790396
25 California bans private prisons 2019-09-12 19:02:53+00:00 anigbrowl 1952 597 https://www.theguardian.com/us-news/2019/sep/12/california-private-prison-ban-immigration-ice 20955103
26 Vulnerability in the Mac Zoom client allows malicious websites to enable camera 2019-07-08 22:17:47+00:00 mplanchard 1937 456 https://medium.com/@jonathan.leitschuh/zoom-zero-day-4-million-webcams-maybe-an-rce-just-get-them-to-visit-your-website-ac75c83f4ef5 20387298
27 Google Is Eating Our Mail 2019-04-26 10:35:21+00:00 saintamh 1931 685 https://www.tablix.org/~avian/blog/archives/2019/04/google_is_eating_our_mail/ 19756125
28 Joe Armstrong has died 2019-04-20 13:50:28+00:00 okket 1928 188 https://twitter.com/FrancescoC/status/1119596234166218754 19706514
29 I tried creating a web browser, and Google blocked me 2019-04-02 13:43:02+00:00 smaddock 1920 605 https://blog.samuelmaddock.com/posts/google-widevine-blocked-my-browser/ 19553941
30 Spotify to Apple: Time to Play Fair 2019-03-13 10:20:44+00:00 dmitriid 1903 841 https://www.timetoplayfair.com/timeline/ 19377322
31 Google’s GDPR Workaround 2019-09-04 11:59:36+00:00 donohoe 1868 588 https://brave.com/google-gdpr-workaround/ 20876248
32 16-inch MacBook Pro 2019-11-13 13:38:04+00:00 rayascott 1840 1688 https://www.apple.com/newsroom/2019/11/apple-introduces-16-inch-macbook-pro-the-worlds-best-pro-notebook/ 21523780
33 Tell HN: Thank you for not redesigning Hacker News 2019-09-01 19:27:00+00:00 ramphastidae 1831 390 None 20854214
34 A guide to difficult conversations 2019-03-26 10:35:34+00:00 davesuperman 1829 400 https://medium.dave-bailey.com/the-essential-guide-to-difficult-conversations-41f736e63ccf 19490573
35 Microsoft has removed the “use offline account” option when installing Windows 2019-09-28 23:11:53+00:00 rahuldottech 1793 763 https://www.reddit.com/r/Windows10/comments/daim1y/ms_has_removed_the_use_offline_account_option/ 21103683
36 Serverless: slower and more expensive 2019-09-23 07:00:38+00:00 kiyanwang 1787 712 http://einaregilsson.com/serverless-15-percent-slower-and-eight-times-more-expensive/ 21046547
37 Tesla Cybertruck 2019-11-22 04:30:23+00:00 sahin-boydas 1765 1928 https://www.tesla.com/cybertruck 21602437
38 A uBlock Origin update was rejected from the Chrome Web Store 2019-10-12 13:43:01+00:00 ismaildonmez 1757 588 https://github.com/uBlockOrigin/uBlock-issues/issues/745 21233041
39 Richard M. Stallman resigns 2019-09-17 02:15:50+00:00 maxdeviant 1747 2180 https://www.fsf.org/news/richard-m-stallman-resigns 20990583
40 UC terminates subscriptions with Elsevier in push for open access 2019-02-28 18:51:11+00:00 tingletech 1743 233 https://www.universityofcalifornia.edu/press-room/uc-terminates-subscriptions-worlds-largest-scientific-publisher-push-open-access-publicly 19273955
41 Ask HN: What do you do with your Raspberry Pi? 2019-06-24 16:02:25+00:00 xylo 1741 1069 None 20264911
42 Turning a MacBook into a Touchscreen with $1 of Hardware (2018) 2019-08-06 12:41:15+00:00 soegaard 1724 215 https://www.anishathalye.com/2018/04/03/macbook-touchscreen/ 20624576
43 We Stood Up to a Patent Troll and Won 2019-11-04 22:21:58+00:00 eastdakota 1675 246 https://blog.cloudflare.com/the-project-jengo-saga-how-cloudflare-stood-up-to-a-patent-troll-and-won/ 21447215
44 Learning at work is work, and we must make space for it 2019-12-11 14:30:15+00:00 sarapeyton 1673 453 https://sloanreview.mit.edu/article/learning-for-a-living/ 21762640
45 Google terminated our business via our Google Play Developer Account 2019-02-09 20:14:13+00:00 jacquesm 1668 498 https://blog.usejournal.com/google-wrongly-terminated-our-new-business-via-our-google-play-developer-account-5f5b7b742542?gi=78a7126ab7f8 19124324
46 The Lonely Work of Moderating Hacker News 2019-08-08 09:49:23+00:00 lordnacho 1663 777 https://www.newyorker.com/news/letter-from-silicon-valley/the-lonely-work-of-moderating-hacker-news 20643052
47 You Are Not Google (2017) 2019-04-04 19:19:05+00:00 gerbilly 1623 572 https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb 19576092
48 Start with a Website, Not a Mobile App 2019-01-04 16:07:34+00:00 jenthoven 1620 559 https://www.atrium.co/blog/founders-should-build-website-not-mobile-app/ 18824993
49 Sunsetting Python 2 2019-09-09 06:43:24+00:00 azizsaya 1616 704 https://www.python.org/doc/sunset-python-2/ 20915746
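To make the filtering and ordering in the query above concrete, here is a rough in-memory equivalent in plain Python. The function top_stories and the dictionary layout are my own assumptions for illustration; the first two rows come from the table above, and the other two are hypothetical rows added only to show what gets filtered out.

```python
from datetime import datetime, timezone


def top_stories(items, year, limit=50):
    """Keep stories from `year`, highest score first, at most `limit` rows."""
    stories = [
        item for item in items
        if item["type"] == "story" and item["timestamp"].year == year
    ]
    # Rows without a score sort last; the rest are ordered by score, descending.
    stories.sort(
        key=lambda item: (item["score"] is not None, item["score"] or 0),
        reverse=True,
    )
    return stories[:limit]


items = [
    {"type": "story", "title": "I Sell Onions on the Internet", "score": 3015,
     "timestamp": datetime(2019, 4, 23, 13, 0, 24, tzinfo=timezone.utc)},
    {"type": "story", "title": "Switch from Chrome to Firefox", "score": 3287,
     "timestamp": datetime(2019, 5, 30, 16, 9, 19, tzinfo=timezone.utc)},
    # Hypothetical rows: a comment, and a story from another year.
    {"type": "comment", "title": None, "score": 9999,
     "timestamp": datetime(2019, 1, 1, tzinfo=timezone.utc)},
    {"type": "story", "title": "A story from another year", "score": 5000,
     "timestamp": datetime(2018, 6, 1, tzinfo=timezone.utc)},
]

for story in top_stories(items, 2019):
    print(story["score"], story["title"])
```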

Commonly Shared/Upvoted Domains

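Note that NO DOMAIN means items without an associated URL, such as "Ask HN" stories. Because the query below does not filter on type, NO DOMAIN also counts comments and other items that have no URL, which is why its count is so large.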
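The domain column in the query below comes from a regular expression applied to the URL. As a small sketch of what that expression extracts, here is the pattern from the query in Python's re module (with the dot after www escaped so that it only matches a literal dot). The function name extract_domain is mine, and Python's re is standing in for BigQuery's regular expression engine.

```python
import re

# Optional "www." prefix, then everything up to the first slash.
DOMAIN_PATTERN = re.compile(r'^https?://(?:www\.)?([^/]*)/?(?:.*)')


def extract_domain(url):
    """Return the host part of `url`, or "NO DOMAIN" when there is no usable URL."""
    if url is None:
        return "NO DOMAIN"
    match = DOMAIN_PATTERN.match(url)
    return match.group(1) if match else "NO DOMAIN"


print(extract_domain("https://www.mozilla.org/en-US/firefox/switch/"))
print(extract_domain("http://paulgraham.com/kids.html"))
print(extract_domain(None))
```

Only a leading "www." is stripped, so other subdomains are kept. That is why en.wikipedia.org and edition.cnn.com appear as their own rows in the results.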

%%bigquery --params $params
SELECT
    domains_this_year.domain,

    domains_this_year.count as count_this_year,
    domains_last_year.count as count_last_year,
    (domains_this_year.count / domains_last_year.count) as count_yoy,

    domains_this_year.score as score_this_year,
    domains_last_year.score as score_last_year,
    (domains_this_year.score / domains_last_year.score) as score_yoy
FROM
    (SELECT
     domain, COUNT(1) AS count, SUM(score) AS score FROM
        (SELECT
         IFNULL(REGEXP_EXTRACT(url,r'^https?://(?:www\.)?([^/]*)/?(?:.*)'), "NO DOMAIN") AS domain, score
         FROM `bigquery-public-data.hacker_news.full`
         WHERE extract(year FROM timestamp) = @year)
     GROUP BY domain
     ORDER BY count DESC LIMIT 100) domains_this_year
    JOIN
    (SELECT
     domain, COUNT(1) AS count, SUM(score) AS score FROM
         (SELECT
          IFNULL(REGEXP_EXTRACT(url,r'^https?://(?:www\.)?([^/]*)/?(?:.*)'), "NO DOMAIN")  AS domain, score
          FROM `bigquery-public-data.hacker_news.full`
          WHERE extract(year FROM timestamp) = (@year-1))
     GROUP BY domain
     ORDER BY count DESC LIMIT 100) domains_last_year
    ON domains_this_year.domain = domains_last_year.domain
ORDER BY count_this_year DESC
LIMIT 50

domain count_this_year count_last_year count_yoy score_this_year score_last_year score_yoy
0 NO DOMAIN 2786600 2420204 1.151391 229058 241363 0.949019
1 medium.com 17424 18113 0.961961 103059 117330 0.878369
2 github.com 13539 13336 1.015222 262248 228399 1.148201
3 youtube.com 8008 7972 1.004516 47554 48127 0.988094
4 nytimes.com 6643 5769 1.151499 174981 159671 1.095885
5 en.wikipedia.org 4566 2779 1.643037 52026 28513 1.824641
6 theguardian.com 3926 3481 1.127837 59598 77924 0.764822
7 bloomberg.com 3863 3469 1.113577 103875 123642 0.840127
8 twitter.com 3775 2460 1.534553 102478 48913 2.095108
9 arstechnica.com 3231 3554 0.909116 44255 56575 0.782236
10 theverge.com 3069 3178 0.965702 40884 39752 1.028477
11 techcrunch.com 2932 3729 0.786270 86911 100636 0.863617
12 bbc.com 2832 2213 1.279711 61032 52447 1.163689
13 wsj.com 2681 2207 1.214771 52781 41874 1.260472
14 arxiv.org 2471 1574 1.569886 24085 22904 1.051563
15 youtu.be 2303 2177 1.057878 2778 2218 1.252480
16 dev.to 1976 954 2.071279 8218 4106 2.001461
17 washingtonpost.com 1893 1647 1.149362 26031 23650 1.100677
18 wired.com 1815 1800 1.008333 22292 22155 1.006184
19 nature.com 1779 1051 1.692674 28962 20019 1.446726
20 forbes.com 1711 1221 1.401310 19642 12070 1.627341
21 theatlantic.com 1663 1761 0.944350 28942 32640 0.886703
22 reuters.com 1624 1373 1.182811 48865 36061 1.355065
23 reddit.com 1511 1090 1.386239 25897 16633 1.556965
24 cnbc.com 1421 1442 0.985437 23143 28778 0.804191
25 zdnet.com 1366 971 1.406797 24841 12397 2.003791
26 bbc.co.uk 1327 1356 0.978614 38649 28625 1.350183
27 npr.org 1107 820 1.350000 26464 25169 1.051452
28 phys.org 1097 836 1.312201 14258 10430 1.367018
29 towardsdatascience.com 1051 391 2.687980 5898 4432 1.330776
30 hackernoon.com 1023 2586 0.395592 5756 14281 0.403053
31 economist.com 1005 954 1.053459 19912 26013 0.765463
32 linkedin.com 923 800 1.153750 5632 2391 2.355500
33 businessinsider.com 890 859 1.036088 14674 13205 1.111246
34 technologyreview.com 875 982 0.891039 8047 8202 0.981102
35 theregister.co.uk 805 933 0.862808 16234 5914 2.745012
36 spectrum.ieee.org 787 900 0.874444 12469 16677 0.747676
37 latimes.com 784 440 1.781818 24548 10313 2.380297
38 cnn.com 780 365 2.136986 8006 3518 2.275725
39 fastcompany.com 768 700 1.097143 11307 9588 1.179287
40 newyorker.com 745 747 0.997323 19336 17399 1.111328
41 qz.com 740 1023 0.723363 12961 17417 0.744158
42 vox.com 735 458 1.604803 11659 9100 1.281209
43 scientificamerican.com 704 544 1.294118 12380 10110 1.224530
44 edition.cnn.com 690 364 1.895604 7700 2695 2.857143
45 gizmodo.com 683 818 0.834963 9695 13309 0.728454
46 engadget.com 649 806 0.805211 6940 6935 1.000721
47 ft.com 638 394 1.619289 7148 1975 3.619241
48 iafrikan.com 632 874 0.723112 2027 1511 1.341496
49 venturebeat.com 590 816 0.723039 5227 5365 0.974278
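The count_yoy and score_yoy columns are plain ratios of this year's value to last year's, and the JOIN keeps only domains that made the top 100 in both years. Below is a minimal sketch of that calculation using two rows from the table above; year_over_year is a name I made up, and the domain only-2019.example is a hypothetical entry added to show the effect of the inner join.

```python
def year_over_year(this_year, last_year):
    """Return {domain: this/last} for domains present in both mappings."""
    return {
        domain: this_year[domain] / last_year[domain]
        for domain in this_year
        # Skip domains missing from last year, and avoid dividing by zero.
        if last_year.get(domain)
    }


counts_2019 = {"medium.com": 17424, "github.com": 13539, "only-2019.example": 10}
counts_2018 = {"medium.com": 18113, "github.com": 13336}

for domain, ratio in year_over_year(counts_2019, counts_2018).items():
    print(domain, round(ratio, 6))
```

A ratio above 1 means the domain was shared more often (or scored more points) than in the previous year. A domain that entered the top 100 only this year does not show up at all, so fast-growing newcomers can be missing from the table.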
