My first patch to the Linux kernel

(pooladkhay.com)

133 points | by pooladkhay 2 days ago

14 comments

  • monus 27 minutes ago
    > You may wonder whether I tried asking an LLM for help or not. Well, I did. In fact it was very helpful in some tasks like summarizing kernel logs [^13] and extracting the gist of them. But when it came to debugging based on all the clues that were available, it concluded that my code didn't have any bugs, and that the CPU hardware was faulty.

    This matches my experience whenever I do an unconventional or deep work like the article mentions. The engineers comfortable with this type of work will multiply their worth.

  • fonheponho 38 minutes ago
    Everybody seems to be missing the forest for the trees on this.

    There is absolutely no "sign extension" in the C standard (go ahead, search it). "Sign extension" is a feature of some assembly instructions on some architectures, but C has nothing to do with it.

    Citing integer promotion from the standard is justified, but it's just one part (perhaps even the smaller part) of the picture. The crucial bit is not quoted in the article: the specification of "Bitwise shift operators". Namely

    > The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. [...]

    > The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

    What happens here is that "base2" (of type uint8_t, which is "unsigned char" in this environment) gets promoted to "int", and then left-shifted by 24 bits. You get undefined behavior because, while "base2" (after promotion) has a signed type ("int") and nonnegative value, E1×2^E2 (i.e., base2 × 2^24) is NOT representable in the result type ("int").

    What happens during the conversion to "uint64_t" afterwards is irrelevant; even the particulars of the sign bit of "int", and how you end up with a negative "int" from the shift, are irrelevant; you got your UB right inside the invalid left-shift. How said UB happens to materialize on this particular C implementation may perhaps be explained in terms of sign extension of the underlying ISA -- but do that separately; be absolutely clear about what is what.

    The article fails to mention the root cause (violating the rules for the bitwise left-shift operator) and fails to name the key consequence (undefined behavior); instead, it leads with not-a-thing ("sign-extension bug in C"). I'm displeased.

    BTW this bug (invalid left shift of a signed integer) is common, sadly.

    • manwe150 7 minutes ago
      It was implementation defined for shifting negative numbers, but now the standard specifies twos-complement for this and all related IB
  • ashwinnair99 3 hours ago
    The first one always takes way longer than the code itself deserves. Most of the work is figuring out the unwritten rules, not writing the patch.
    • fooker 3 hours ago
      This is a big problem in open source that seems taboo to discuss.

      In my opinion, unwritten rules are for gatekeeping. And if a new person follows all the unwritten rules, magically there's no one willing to review.

      I think this is how large BFDL-style open source projects slowly become less and less relevant over the next few decades.

      • cromka 3 hours ago
        Agreed. The level of aggressive gatekeepers is just crazy, take Linux ARM mailing list for example. I found the Central and Eastern Europeans particularly aggressive there and I'm saying this as on myself. They sure do like to feel special there, with very little soft skills.
      • tossandthrow 1 hour ago
        This will likely be alleviated when Ai first projects take over as important OSS projects.

        Fir these projects everything "tribal" has to be explicitly codified.

        On a more general note: this is likely going to have a rather big impact on software in general - the "engineer to company can not afford to loose" is likely loosing their moat entirely.

    • yu3zhou4 3 hours ago
      Can confirm that it also happens in other complex systems! Still a lot of good time and the novelty factor helps with pushing through
    • seb1204 3 hours ago
      Sand that after so many years these rules are still not written down.
  • ngburke 2 hours ago
    Sign extension bugs are the worst. Silent for ages then suddenly everything is on fire. Spent a lot of time in C doing low-level firmware work and ran into the same class of issue more than once. Nice writeup, congrats on the patch.
  • dingensundso 58 minutes ago
    Nice blogpost. Was an really interesting read. Would be interesting to read about the experience of getting the patch accepted and merged.

    One thing I noticed: The last footnote is missing.

  • ozgrakkurt 36 minutes ago
    Great blog post. Using _BitInt typedefs for integers is a good option for anyone starting a fresh c project. It has worked well for me so far. _BitInt integers don’t promote to signed automatically like regular integers in c
  • NotCamelCase 59 minutes ago
    Lovely article with a happy ending!

    One thing that I am glad to have been taught early on in my career when it comes to debugging, especially anything involving HW, is to `make no assumptions'. Bugs can be anywhere and everywhere.

  • foltik 3 hours ago
    Well done and great writeup! Any idea why the bug hadn’t shown up sooner, like when running self tests?
  • siddyboi 45 minutes ago
    Huge congrats on tracking that down and getting your first Linux kernel patch merged!
  • knorker 1 hour ago
    Integer promotion rules in C are so deceptive.

    I don't believe there's anybody who can reason about them at code skimming speeds. It's probably the best place to hide underhanded code.

  • yu3zhou4 3 hours ago
    Congrats and happy for you, you had a lot of fun and did something genuinely interesting
  • mbana 3 hours ago
    I love these kind of posts.
  • leontloveless 19 minutes ago
    [dead]
  • algolint 2 hours ago
    [dead]