Show HN: Hacker-News Buddies

https://hn-buddies.stupidlabs.lol

Find the Hacker News users whose comments most closely match yours.

Matching is based on TF-IDF-weighted keywords extracted from comments. It covers data from Jan 1, 2020 through May 31, 2026. Keywords that are too rare or too broad across authors are filtered out before scoring.
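To make the idea concrete, here is a minimal sketch of TF-IDF keyword matching with a document-frequency filter. It is not the site's actual pipeline: the tokenizer, the `min_df` and `max_df_ratio` thresholds, the cosine scoring, and all function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_vectors(comments_by_author, min_df=2, max_df_ratio=0.8):
    """Return {author: {keyword: tf-idf weight}} after filtering keywords.

    min_df and max_df_ratio are hypothetical thresholds: a keyword is kept
    only if at least min_df authors use it (not too rare) and at most
    max_df_ratio of all authors use it (not too broad).
    """
    tf = {a: Counter(tokenize(" ".join(cs))) for a, cs in comments_by_author.items()}
    n = len(tf)

    # Document frequency: how many authors used each keyword at least once.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    vocab = {t for t, d in df.items() if d >= min_df and d / n <= max_df_ratio}

    return {
        a: {t: c * math.log(n / df[t]) for t, c in counts.items() if t in vocab}
        for a, counts in tf.items()
    }


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_buddies(vectors, author, k=3):
    # Rank every other author by similarity to the given one.
    scores = [(o, cosine(vectors[author], v)) for o, v in vectors.items() if o != author]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

Calling `build_vectors` on a dict that maps each author to a list of comment strings, then `top_buddies(vectors, "some_author")`, gives a ranked list of the closest matches. An author whose keywords are all filtered out ends up with an empty vector and scores 0 against everyone, which is one way authors can drop out of the results.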

Because of this filtering, many authors are NOT covered here. Full coverage would have yielded more than a few trillion records, and I don't have that much compute or disk.
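A rough sketch of why the record count explodes, assuming (hypothetically) that every keyword shared by d authors emits one record per unordered author pair. The real record layout is not described here, and the keyword counts in the usage note are made up.

```python
from math import comb


def pair_records(doc_freqs):
    """Count (author, author, keyword) records under the pairing assumption.

    doc_freqs maps keyword -> number of authors who used it. A keyword used
    by d authors contributes d * (d - 1) / 2 unordered author pairs.
    """
    return sum(comb(d, 2) for d in doc_freqs.values())


def filter_keywords(doc_freqs, min_df, max_df):
    # Keep keywords that are neither too rare nor too broad.
    return {k: d for k, d in doc_freqs.items() if min_df <= d <= max_df}
```

With made-up counts such as `{"the": 100000, "rust": 5000, "umap": 40, "zzyzx": 1}`, the single broad keyword accounts for almost all of the pairs, so dropping it with `filter_keywords(doc_freqs, 2, 10000)` shrinks the total by orders of magnitude.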

Details of the process: https://hn-buddies.stupidlabs.lol/about-data

---

You can also see who talks most about a certain topic or keyword. For example,

NSA: https://hn-buddies.stupidlabs.lol/?keyword=nsa

Trump: https://hn-buddies.stupidlabs.lol/?keyword=trump

---

Global insights page: https://hn-buddies.stupidlabs.lol/insights

---

This also lets you uncover duplicate accounts. For example:

1. "fdklhhjf" and "selamcan"

2. "angkatoto" and "jalantoto"

3. "Donnakravo" and "dommakravosec" and "kravossedonna" and "kravosdonna"

6 points | by freakynit 7 hours ago

4 comments

  • enragebait 7 hours ago
    This looks interesting, though you should know that “buddies” aren’t the ones who are too much alike, buddies are the ones different enough to complement each other, yet alike enough not to disagree too often. That’s why they say “opposites attract” even among “platonic” relationships.

    This could mean you might take the “temperament” of the person posting (like “estj”) and map the 2nd and 4th to the 4th and 2nd (the 2nd is regarded as the “input” and the 4th the “output”) so “S” is compatible with “P” and “N” is compatible with “J”.

    And then give a bonus modification for the opposite of the others, so “E” likes “I” and vice versa (always a quiet dude hanging out with a talkative dude, though not exclusively). And “T” prefers the company of “F” though not exclusively (see these as technical and creative.)

    This gives you compatible interfaces (input/output) and diverging (thus “more interesting”) social dispositions.

    You could probably turn that into a good dating algorithm if it isn’t already, though it works for “pals” too!

    • freakynit 5 hours ago
      These seem fun to explore. Will definitely check out. Thanks!
  • malandin 7 hours ago
    Great project! I was thinking of building something similar with not only search but analytics as well. Could you hint at where the dataset comes from? I'd really like to have a look
    • freakynit 5 hours ago
      Thank you. This has been on my mind for the past year. I wanted to do it using vector-embedding similarity matching, but due to costs and compute requirements I had to resort to a keyword-based approach.

      The data comes from a daily-updated public BigQuery dataset: https://news.ycombinator.com/item?id=40644563

  • tetris11 7 hours ago
    I like it! I'm just missing from the corpus for some reason...

    Quick glance: TF-IDF, cosine-similarity, the only thing missing is a nice UMAP :-)

    • freakynit 5 hours ago
      Thanks for the UMAP suggestion. Will add.

      Most of the authors are actually missing. Full processing would have yielded a multi-trillion-row dataset. I didn't really have that kind of compute with me.

      I even tried running the cross-join on BigQuery... after one hour, only about 3% was done, so I had to cancel it.

  • holg 6 hours ago
    interesting, to me it seems "buddies" is the wrong term anyhow...
    • freakynit 5 hours ago
      Yea.. even if not wrong, it's definitely not correct. Couldn't think of any other, so I stuck with it for now.