Bombadil: Property-based testing for web UIs by Antithesis

(github.com)

106 points | by Klaster_1 4 days ago

11 comments

NoraCodes 42 minutes ago
My kingdom for a way to stop this godforsaken industry from stripping Tolkien's fiction for parts.
- jkestner 19 minutes ago
  Let's start naming things after Iain Banks ships.
  - patapong 13 minutes ago
    I am in support. In general he was really good with names, I thought; they always had an otherworldly flair while being clear to pronounce. Skaffen-Amtiskaw, Anaplian, Elethiomel...
    - chrisweekly 6 minutes ago
      Yes!
      "Just Another Victim Of The Ambient Morality" is one of my favorites.
- jzelinskie 11 minutes ago
  I'm just waiting for them to exhaust LotR and move on to Roverandom
- pythonaut_16 17 minutes ago
  Makes me want to name a project or company Sauron in response.
  - ffsm8 8 minutes ago
    I've worked at a company that had a team call themselves Sauron before.
    So occasionally I got emails from "some colleague on behalf of Sauron" back then.
- paulnpace 7 minutes ago
  Bad actors use Tolkien. Good actors use Orwell.
- eclectician 34 minutes ago
  We can go strip Shakespeare instead.
thibran 43 minutes ago
I've been doing property-based testing for frontend stuff for years. The hardest part is that there is so much between the test inputs and the application under test that 50% of the time I find problems in the frontend test frameworks/libs and not in our code.
- terpimost 27 minutes ago
  Are you talking about user flows with multiple interactions and data exchange, which PBT previously wasn't able to address?
  - thibran 12 minutes ago
    PBT allows us to test more combinations without writing hundreds of tests. Yes, it's about user flow inside a single module of our gigantic application.
- owickstrom 42 minutes ago
  Interesting. What kind of properties are you checking?
  - thibran 14 minutes ago
    I use quicktheories (Java) and generate a deterministic random test scenario, then I generate input values and run the tests. This way I can create tests that should fail or succeed, but that differ in the steps executed and in their order, with "random input".
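A minimal sketch of the "deterministic random scenario" idea, in plain Python rather than quicktheories. The action names and the `generate_scenario` helper are hypothetical; the only point is that a scenario derived from a seed can be replayed exactly.

```python
import random

# Hypothetical action names for illustration; not taken from any real test suite.
ACTIONS = ["open_form", "type_text", "submit", "cancel"]


def generate_scenario(seed: int, steps: int = 5) -> list[tuple[str, str]]:
    """Build a list of (action, input) pairs that depends only on the seed."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    scenario = []
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        value = ""
        if action == "type_text":
            # Only typing needs an input value; other actions keep an empty string.
            length = rng.randint(0, 8)
            value = "".join(rng.choice("abc123 ") for _ in range(length))
        scenario.append((action, value))
    return scenario


# Same seed, same scenario: a failing run can be replayed from its seed alone.
assert generate_scenario(42) == generate_scenario(42)
```

Logging the seed with every failure is what makes the randomness usable: the steps and their order vary between seeds, but never between two runs of the same seed.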
IanCal 1 hour ago
I'm a huge fan of property based testing, I've built some runners before, and I think it can be great for UI things too so very happy to see this coming around more.
Something I couldn't see was how those examples actually work; there are no actions specified. Do they watch a user, default to randomly hitting the keyboard, or neither, so that you need to specify some actions to take?
What about rerunning things?
Is there shrinking?
edit - a suggestion for examples, have a basic UI hosted on a static page which is broken in a way the test can find. Like a thing with a button that triggers a notification and doesn't actually have a limit of 5 notifications.
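A rough sketch of what such a broken example could look like, modelled in plain Python instead of a real page. `BrokenNotifications` and `find_violation` are hypothetical names, and this does not use Bombadil's API; it only shows random actions checked against a "never more than 5" property.

```python
import random

LIMIT = 5  # the intended maximum number of visible notifications


class BrokenNotifications:
    """Toy stand-in for the UI: notify() never enforces LIMIT (the bug)."""

    def __init__(self) -> None:
        self.visible = 0

    def notify(self) -> None:
        self.visible += 1  # bug: should stop growing at LIMIT

    def dismiss(self) -> None:
        self.visible = max(0, self.visible - 1)


def find_violation(seed: int, max_steps: int = 200):
    """Run random actions; return the trace that breaks the property, or None."""
    rng = random.Random(seed)
    ui = BrokenNotifications()
    trace = []
    for _ in range(max_steps):
        action = rng.choice(["notify", "dismiss"])
        getattr(ui, action)()
        trace.append(action)
        if ui.visible > LIMIT:  # property: never more than LIMIT notifications
            return trace
    return None
```

A returned trace is usually much longer than the six clicks needed to trigger the bug, which is where the shrinking question comes in.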
- danbruc 47 minutes ago
  How effective is property based testing in practice? I would assume it has no trouble uncovering things like missing null checks or an inverted condition because you can cover edge cases like null, -1, 0, 1, 2^n - 1 with relatively few test cases and exhaustively test booleans. But beyond that, if I have a handful of integers, dates, or strings, then the state space is just enormous and it seems all but impossible to me that blindly trying random inputs will ever find any interesting input. If I have a condition like (state == "disallowed") or (limit == 4096) when it should have been 4095, what are the odds that a random input will ever pass this condition and test the code behind it?
  Microsoft had a remotely similar tool named Pex [1], but instead of randomly generating inputs, it instrumented the code so that it could also be executed symbolically. It then used their Z3 theorem prover to systematically find inputs that make each encountered condition either true or false, and with that incrementally explored all possible execution paths. If I remember correctly, it then generated a unit test for each discovered input with the corresponding output, and you could then judge whether the output is what you expected.
  [1] https://www.microsoft.com/en-us/research/publication/pex-whi...
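To put rough numbers on the `limit == 4096` question: with uniform 32-bit integers, the chance that at least one of a million draws equals one specific value is about 0.00023. The sketch below computes that and contrasts it with a generator that mixes in boundary values. The 20% edge-case ratio and the helper names are assumptions for illustration, not the behaviour of any particular library, and the bias does nothing for a condition like `state == "disallowed"`.

```python
import random


def p_hit_uniform(trials: int, bits: int = 32) -> float:
    """Chance that at least one of `trials` uniform draws equals one specific value."""
    return 1.0 - (1.0 - 2.0 ** -bits) ** trials


def boundary_values(bits: int = 32) -> list[int]:
    """Edge cases of the kind listed above: -1, 0, and 2^n - 1, 2^n, 2^n + 1."""
    values = {-1, 0}
    for n in range(bits):
        values.update({2**n - 1, 2**n, 2**n + 1})
    return sorted(values)


def biased_int(rng: random.Random, bits: int = 32, edge_ratio: float = 0.2) -> int:
    """Draw an edge case with probability `edge_ratio`, otherwise a uniform value."""
    if rng.random() < edge_ratio:
        return rng.choice(boundary_values(bits))
    return rng.randrange(2**bits)


if __name__ == "__main__":
    print(f"uniform, 1e6 draws: {p_hit_uniform(10**6):.6f}")
    print(f"4095 and 4096 among edge cases: {4095 in boundary_values()}, {4096 in boundary_values()}")
```

With the biased generator, both 4095 and 4096 are among fewer than a hundred edge cases, so an off-by-one at a power of two is likely to be exercised within a few hundred draws.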
  - skybrian 25 minutes ago
    One thing you can find pretty quickly with just basic fuzzing on strings is Unicode-related bugs.
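As a small illustration of the kind of Unicode bug meant here, assume some code relies on uppercasing never changing a string's length (a hypothetical property, chosen because it is easy to break). Random strings from a modest Latin range tend to refute it quickly, because a character like "ß" uppercases to "SS".

```python
import random


def upper_keeps_length(s: str) -> bool:
    """Property a naive truncation or layout routine might assume."""
    return len(s.upper()) == len(s)


def random_text(rng: random.Random, length: int = 8) -> str:
    # Code points U+0020..U+024F: ASCII, Latin-1 Supplement, Latin Extended-A/B.
    return "".join(chr(rng.randint(0x20, 0x24F)) for _ in range(length))


def find_counterexample(seed: int, tries: int = 1000):
    """Return the first random string that breaks the property, or None."""
    rng = random.Random(seed)
    for _ in range(tries):
        s = random_text(rng)
        if not upper_keeps_length(s):
            return s
    return None


# Known counterexample: "ß" uppercases to "SS", so the length grows by one.
assert upper_keeps_length("hello")
assert not upper_keeps_length("straße")
```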
- owickstrom 1 hour ago
  Hey, yeah the default specification includes a set of action generators that are picked from randomly. If you write a custom spec you can define your own action generators and their weights.
  Rerunning things: nothing built for that yet, but I do have some design ideas. Repros are notoriously shaky in testing like this (unless run against a deterministic app, or inside Antithesis), but I think Bombadil should offer best-effort repros if it can at least detect and warn when things diverge.
  Shrinking: also nothing there yet. I'm experimenting with a state machine inference model as an aid to shrinking. It connects to the prior point about shaky repros, but I'm cautiously optimistic. Because the speed of browser testing isn't great, shrinking is also hard to do within reasonable time bounds.
  Thanks for the questions and feedback!
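For readers unfamiliar with weighted action generators, here is a generic sketch of the idea in Python. The action names, the weights, and `pick_actions` are all hypothetical; Bombadil's actual spec format is not shown here and may look quite different.

```python
import random

# Hypothetical actions and weights, for illustration only.
WEIGHTED_ACTIONS = {"click": 5, "type": 3, "scroll": 1, "navigate_back": 1}


def pick_actions(rng: random.Random, count: int) -> list[str]:
    """Pick `count` actions, each chosen in proportion to its weight."""
    names = list(WEIGHTED_ACTIONS)
    weights = list(WEIGHTED_ACTIONS.values())
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    print(pick_actions(random.Random(0), 10))
```

Raising a weight makes that action proportionally more frequent in a run, which is one way to steer exploration toward the parts of the UI you care about.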
warpspin 1 hour ago
I especially like that it's a single executable according to the docs.
Recently evaluated other testing tools/frameworks and if you're not already running the npm-dependencyhell-shitshow for your projects, most tools will pull in at least 100 dependencies.
I might be old fashioned but that's just too much for my taste. I love single-use tools with limited scope like e.g. esbuild or now this.
Will give this a try, soon.
- owickstrom 55 minutes ago
  Glad you noticed! I've been putting quite some energy into keeping things this way. VERY worth it, IMO.
picardo 10 minutes ago
For most static UI surfaces, I probably wouldn't use it, but I can see a use case in this for testing generative UI workloads.
css_apologist 8 minutes ago
very cool! does this work? can you describe the kinds of real bugs you've caught with this?
jryio 48 minutes ago
Hey Oskar ~ great project and looks promising. I would be curious to hear what is still work-in-progress for Bombadil.
It's helpful to know what the tool maintainers see as upcoming or incomplete work. It also saves a consultant like me a lot of time to evaluate new tools for clients if I also know the limitations before diving in. Maybe a section in the manual for "What Bombadil can't do".
Great work!
owickstrom 1 hour ago
Author here, happy to answer questions about Bombadil! :)
- degenerate 9 minutes ago
  From a project management perspective, the 5 examples don't help me understand how/why I might switch from Playwright/Cypress to this framework. It seems like Bombadil is a much lower-level test framework focusing on DOM properties, but in the "Why Bombadil?" introduction you say "maintaining suites of Playwright or Cypress tests takes a lot of work" ... I'd like it if there were an example showing how this is true, perhaps a 1:1 comparison of Playwright vs Bombadil for testing something such as notifications clearing when I click clear. Basically, beefing up the examples with real-world ones that Playwright users might have written is a good way to foster adoption.
- owickstrom 1 hour ago
  btw, some background on the project: https://wickstrom.tech/2026-01-28-there-and-back-again-from-...
- bombcar 50 minutes ago
  All I can think of is "the Token Ring had no power over him" but then I realized that "token ring" has a completely different meaning now in the age of AI.
  Nice name, now who is he?
elcapitan 58 minutes ago
"Bombadil" means that I'll probably skip most of these tests.
- philipallstar 21 minutes ago
  I sense a kindred spirit.
sequoia 44 minutes ago
Struggling to understand what this is or how it works.
orliesaurus 1 hour ago
Bombadillo Crocodillo
Ok I will see myself out
(Yes I know it's actually from the Tolkien book)