Those are all much smaller. Smaller providers have a much stronger incentive to be reliable, as they will lose customers if they are not. In a corporate setting, management will say "this would not have happened if you had gone with AWS". It's the current version of "no one ever got fired for buying IBM" (we had Microsoft and others in between).
Hetzner provides a much simpler set of services than AWS. Less complexity to go wrong.
A lot of people want the brand recognition too. It's also become the standard way of doing things and is part of the business culture. I have sometimes been told it's unprofessional, or looks bad, to run things yourself instead of using a managed service.
There is this weird thing that happens with hyperscale - the combination of highly central decision-making, extreme interconnection / interdependence of parts, and the attractiveness of lots of money all conspire to create a system pulled by unstable attractors to a fracturing point (slowed / mitigated at least a little by the inertia of such a large ship).
Are smaller scale services more reliable? I think that's too simple a question to be relevant. Sometimes yes, sometimes no, but we know one thing for sure - when smaller services go down the impact radius is contained. When a corrupt MBA who wants to pump short term metrics for a bonus gains power, the damage they can do is similarly contained. All risk factors are boxed in like this. With a hyperscale business, things are capable of going much more wrong for many more people, and the recursive nature of vertical+horizontal integration causes a calamity engine that can be hard to correct.
Take the financial sector in '08: huge monoliths that had integrated every kind of financial service with every other kind of financial service. A few central points of failure, with every failure mode exposed to every other failure mode.
There's a reason asymmetric warfare is hard for both parties - cellular networks of small units that can act independently are extremely fault tolerant and robust against changing conditions. Giants, when they fall, do so in spectacular fashion.
Have you considered that a widespread outage is a feature, not a bug?
If AWS goes down, no one will blame you for your web store being down as pretty much every other online service will be seeing major disruptions.
But when your super small provider goes down, it's now your problem, and you'd better have some answers ready for your manager. And you'll still be affected by the AWS outage anyway, as you probably rely on an API that runs on their cloud!
> Have you considered that a widespread outage is a feature
It's a "feature" right up there with planned obsolescence and garbage culture (the culture of throw-away).
The real problem is not having a fail-over provider. Modern software is so abstracted (tens, hundreds, even thousands of layers), and yet we still make the mistake of depending on one or two layers to make things "go".
When your one small provider goes down, no problem, switch over to your other provider. Then laugh at the people who are experiencing AWS downtime...
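The switch-over idea above is only a few lines of code. A minimal sketch, assuming nothing about any real provider API (`fetch` is whatever callable talks to a given provider, and the provider names are placeholders):

```python
def fetch_with_failover(fetch, providers):
    """Try each provider in order; return the first successful result."""
    last_error = None
    for provider in providers:
        try:
            return fetch(provider)
        except Exception as exc:  # real code would catch specific errors
            last_error = exc      # remember why this provider failed
    raise RuntimeError("all providers down") from last_error


# Hypothetical usage: the primary is down, the secondary answers.
def flaky(provider):
    if provider == "primary":
        raise IOError("primary down")
    return "ok from " + provider


print(fetch_with_failover(flaky, ["primary", "secondary"]))
# ok from secondary
```

The same shape also exists one layer up: low-TTL DNS or a health-checking load balancer doing the `for` loop for you.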
> Smaller providers have a much stronger incentive to be reliable, as they will lose customers if they are not.
Hard disagree. A smaller provider will think twice about whether they use a Tier I data center versus a Tier IV data center, because the cost difference is substantial and in many cases prohibitively expensive.
This. There's a fundamental logic error here. You simply don't hear about downtime at smaller providers that often because an outage there doesn't affect a significant portion of the internet the way an AWS outage does. But that doesn't mean they are more stable in general.
Yeah, I'd like to see hard data on uptime/reliability between these two providers before declaring that big = bad and small = good.
Fly.io (and DigitalOcean) had horrible uptime when they first got started. In the last 6-12 months, Fly.io has been much better. But they would go down all the time or have unexpected CI bugs/changes.
DigitalOcean accidentally hard-deleted users' object stores before their IPO.
Not to mention the familiarity of the company, its services, and what to expect. You can hire people with experience with AWS, Azure, or GCP, but the more niche you go, the higher the chance that some people you hire won't know how to work with those systems and their nuances. That's fine (they can learn as they work), but it adds ramp-up time and could lead to inadvertent mistakes.
This could also be an anti-pattern for hiring - getting people with Amazing Web Service (tm) certification and missing out on candidates with a solid understanding of the foundational principles these services are built on
I agree, though the industry does this all the time by hiring someone with a degree over someone who built key infrastructure and has no degree, solely because of the degree. Remember, the creator of Homebrew couldn't get past a Google interview because they asked him to hand-craft some algorithm; I probably would not have done well with that either. Does that make him or me a worse developer? Doubtful. Does it mean Google missed out on hiring someone who loves his craft? Yes.
I think that is often the perception, but is usually mistaken.
Smaller providers tend to have simpler systems so it only adds to ramp up time if you hire someone who only knows AWS or whatever. Simpler also means fewer mistakes.
If you stick to a simple set of services (e.g. VPS or containers + object storage) there are very few service specific nuances.
I've actually tried Hetzner on and off with one server for the past two years and keep running into downtime every few months.
First I used an EX101 with an i9-13900. Within a week it just froze. It could not be reset remotely, and there was nothing in kern.log. Support offered no solution but a hard reboot, and no mention of what might be wrong other than user error.
A few months later, one of the drives just disconnected from the RAID array by itself. It took support an hour to respond, and they said they found no issue, so it must have been my fault.
Then I changed to a Ryzen-based server, and it also mysteriously had problems like this. Again, support blamed the user.
It was only after I cancelled the server, several months later, that I saw this, so I know it isn't just me.
The good news is that we're just living in a perfect natural experiment:
Cloudflare just caused a massive internet outage costing millions of dollars worldwide, in part due to a very sloppy mistake that definitely ought to have been prevented (using Rust's `unwrap` in production). Let's see how many customers they lose because of that, and we'll see how strong their incentives are. (If you look at the evolution of their share price, it doesn't look like the incident terrified their shareholders, at least…)
>I have sometimes been told its unprofessional or looks bad to run things yourself instead of using a managed service.
That's an incredibly bad take lol.
There are times when "The Cloud" makes sense, sure. But in my experience, the majority of the time companies overuse the cloud. On-prem is GOOD. It's cheaper, arguably more secure if you configure it right (a challenge, I know, but hear me out), and gives you data sovereignty.
I don't quite think companies realize how bad it would be if, e.g., AWS was hacked.
Any data you have in the cloud is no longer your data. Not really. It's Amazon's, Microsoft's, Apple's, whoever's.
> I don't quite think companies realize how bad it would be if EG AWS was hacked.
I don't think they'd care. Companies only care about one thing: stock price. Everything rolls up into that. If AWS got hacked and said company was affected by it, it wouldn't be a big deal because they'd be one of many and they'd be lost in the crowd. Any hit to their stock/profits would be minimal and easily forgotten about.
Now, if they were on prem or hosted with Bob's Cloud and got hacked? Different story altogether.
> Companies only care about one thing: stock price.
It's rarely affected in any case. Take a look at the CrowdStrike price chart (or revenue, or profits). I think most people (including investors) just take it for granted that systems are unreliable and regard it as something you live with.
I think that's more of an indicator that it hasn't affected their business. They lost nearly 1/5 of their stock price after that incident (obviously not accounting for other factors; I'm not a stock analyst). Investors thought they'd lose customers and reacted in obvious fashion.
But it's since been restored. According to the news, they lost very few customers over the incident. That is why their stock came back. If they had continued having problems, I doubt it would have been so rosy. So yes, to your point, a blip here or there happens.
Configuring something on premises to match the capabilities of AWS or Azure or CloudFlare is very, very difficult and involves a lot of local money and expertise that often isn’t available at any affordable price.
>Configuring something on premises to match the capabilities of AWS or Azure or CloudFlare is very, very difficult and involves a lot of local money and expertise that often isn’t available at any affordable price.
A large number of cloud customers don't need the complexity that the cloud can offer. Like, yes, it's hard to replicate the cloud's features 1:1. But so many people just have some VMs and some routes.
> Smaller providers have a much stronger incentive to be reliable, as they will lose customers if they are not.
I disagree, because conversely, outages at larger providers cause millions or maybe even billions of dollars in losses for their customers. They might be more "stuck" in their current providers' proprietary schemes, but these kinds of losses will cause them to move away, or at least diversify across cloud providers. In turn, this will cause income losses for the cloud provider.
It does mean that you get fewer services; you have to do more sysadmin work internally or use other providers for those, which a lot of people are very reluctant to do.
When forced to use AWS, I only use the extra features I am specifically told to, or that are already in use, in order to keep the system less tied to AWS and easier for me to manage (I am not an AWS specialist, so it's easier for me to just run stuff like I would on any server or VPS). I particularly dislike RDS, of the things I have used. I like Lightsail because it's reasonably priced and very much like just getting a VPS.
S3 is something of an exception, but it does not tie you down (everyone provides object storage now, and you can use S3 even if everything else is somewhere else). For me it works well for storing lots of large files that are not accessed very much (so egress fees are low).
My client's (extremely large) AWS-based infrastructure experienced no downtime this year.
So, if it's based on some random person's clients, it's not clearly better at all.
I don't use Cloudflare for anything, so no comment there.
>GP stated that their clients had experienced no downtime since switching at the start of the year
That's the least useful information.
What matters for his service availability is what he should expect going forward. What matters for reviewing his decision making process is what he should have expected at the time of choosing service providers.
Earlier this year, a Hetzner server I manage was shut down, and after I started it via the console, it booted into a rescue system. In the same month, it was rebooted without a reason. There was some maintenance notice, but the server was not listed as impacted.
Note that I'm not saying Hetzner is bad, just that incidents happen in Europe too. The server didn't have a lot of issues like this over the years.
They've recently introduced bunny.net Shield to add a security layer. I've not made use of it yet so I don't know what the coverage is like or how effective it is: https://bunny.net/shield/
I've done something similar. It's worth noting Scaleway in the same space, for people looking for an AWS replacement more like managed services (equivalents to Fargate/Lambda/SQS/S3/etc.) rather than just bare instance hosting.
+1 for Scaleway. I also use Hetzner for most of my compute, but some stuff really benefits from managed services. I've used Scaleway's serverless compute offerings and managed DBs and been quite happy with them.
Well, they're not comparable to Hetzner anymore, both in terms of features and price. Only their Dedibox brand could compare, as it's the classic hosting approach vs. cloud.
For the hobby crowd it's a shame; for a corporation it's still cheaper than AWS, with the extra bonus of not having any ties to the US.
We are also looking to migrate off Cloudflare. I thought Bunny.net was mostly a pure CDN, not a reverse proxy like Cloudflare. Am I wrong? One of the most important things for us would be DDoS protection.
American solo developer here. Moved to Hetzner two months ago. They have servers in Oregon for west coast people. My storage box is in Germany but that is okay, it is for backups.
They are based in the UK. That is technically Europe, but I believe for privacy regulations it isn't the same as an EU country. I could be very wrong, though; would love to be educated on this by someone.
I know you were joking, but responding in seriousness - while in general it's worthwhile asking "Quis custodiet ipsos custodes?", in this particular case, I don't see any issue with Down Detector detecting the Down Detector Down Detector. Assuming they are in different availability zones, using different code, with a different deployment cadence, this approach works quite well in practice.
Haha, this is the exact comment I was hoping to see! Indeed, I was joking. The Watchmen graphic novel is very important to me, as it opened my eyes to the concept of "who watches the watchmen", which I was ultimately alluding to here, albeit extremely facetiously.
"To serve the Emperor. To protect His domains. To judge and stand guard over His subjects. To carry the Emperor's law to all worlds under His blessed protection. To pursue and punish those who trespassed against His word."
Three down detectors walk into a bar. The bartender asks them if they're all up. The first says "I don't know". The second says "I don't know". The third says "Yes".
Had to check, but that is actually beyond what DNS allows: labels (the part between dots) are limited to 63 characters. We could sneakily drop an "s" somewhere in there and then it would fit.
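To make the limit concrete, here is a quick stdlib-only check of the RFC 1035 length rules (63 octets per label, 255 per name); the example strings are just for illustration:

```python
def dns_labels_ok(hostname: str) -> bool:
    """Check RFC 1035 length limits: each dot-separated label must be
    1-63 octets, and the whole name at most 255 octets."""
    labels = hostname.split(".")
    return len(hostname) <= 255 and all(1 <= len(label) <= 63 for label in labels)


# Five "downdetector(s)" fused into a single label is 64 characters,
# one over the limit; dropping one "s" squeezes it under:
too_long = "downdetectors" * 4 + "downdetector"       # 64-char label
just_fits = "downdetectors" * 3 + "downdetector" * 2  # 63-char label

print(dns_labels_ok(too_long + ".com"))   # False
print(dns_labels_ok(just_fits + ".com"))  # True
```

Adding dots resets the count, which is why the joke domain gets away with chaining "downdetectors" across multiple labels.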
It's a centralization vs decentralisation vs distributed system question.
Since down detectors serve to detect failures of centralized (and decentralized) systems, the idea would be to at least get that right: a distributed system to detect outages.
You basically run detectors that heartbeat each other. Just a few suffice.
Once you start to see clusters of detectors go silent, you can assume things are falling apart, which is fine so long as a few remain.
Self healing also helps to make the web of nodes resilient to inevitable infrastructure failures.
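A sketch of how that heartbeat scheme could look, with invented names and thresholds: each node records when it last heard each peer, flags peers silent past a timeout, and treats a silent majority as the cluster falling apart.

```python
import time


class Detector:
    """One node in a small web of down detectors that heartbeat each other."""

    def __init__(self, peers, timeout=5.0, now=None):
        now = time.monotonic() if now is None else now
        self.timeout = timeout
        self.last_seen = {peer: now for peer in peers}  # last heartbeat per peer

    def on_heartbeat(self, peer, now=None):
        self.last_seen[peer] = time.monotonic() if now is None else now

    def down_peers(self, now=None):
        now = time.monotonic() if now is None else now
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}

    def cluster_falling_apart(self, now=None):
        # "Clusters of detectors going silent": more than half the peers down.
        return len(self.down_peers(now)) > len(self.last_seen) / 2
```

With three peers and a 5-second timeout, one quiet peer is just a blip, while two silent peers trip the majority check. The `now` parameter is there so time can be injected; a real node would omit it and use the monotonic clock.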
Thank you for your service! Now, for an even bigger challenge: since it seems the increased demand for the Cloudflare status page brought down Amazon CloudFront for a bit as well, build a new CDN capable of handling that load as well...
But CDNs are made for static content so your comment means I can't run a dynamic website unless I have unlimited file descriptors and flawless connectivity.
"Need" is a strong word. But I think the point is that if you expect wildly spikey traffic/don't want the site to go down if it receives a very sudden influx of requests, going static is a very good answer, much cheaper than "serverless" or over-provisioning.
I think an important caveat here is that Downdetector was not actually down; the Cloudflare human-verification component was (AFAIK). I wonder if this downdetector down detector accounts for that aspect? It was technically "not down" but still unusable.
I have a similar project: https://hostbeat.info/
More like an Uptime Robot, and sure, I was really surprised yesterday by how many alerts I got and how many notifications were sent to this system's users. Good work anyway.
I feel like the classic eastdakota reply would be that the Cloudflare CDN does not host your data and merely proxies it (bonus points if he uses the words "mere conduit" in his reply, and therefore Cloudflare can't be held responsible, yada yada).
I randomly started vibe coding a website monitoring tool last week knowing full well about the mature competitors in this space and questioning myself along the way. Doesn't seem so crazy now.
I made a picture of myself taking a picture of myself taking a picture of myself in a mirror... at some point I solved my halting problem and walked away.
the internet can be divided up into factions like Divergent. AWSubbies (orange), Azure-ants (blue), CloudFlaricons (black) & the Rogues (jester colors, like Google). A proper down detector would identify platform outages based on the number of faction members who are down.
I wonder though where is it hosted? Digital Ocean? :)
As the Web becomes more and more entangled, I don't know if there is any guarantee of what is really independent. We should make a diagram of this. Hopefully no cyclic dependencies there yet.
Cloudflare > Bunny.net
AWS > Hetzner
Business email > Infomaniak
Not a single client site has experienced downtime, and it feels great to finally decouple from U.S. services.
> Then laugh at the people who are experiencing AWS downtime...
Let's not stroke our egos too much here, mkay?
https://docs.hetzner.com/robot/dedicated-server/general-info...
However, I would say that the effect of this outage on customer retention will be (relatively) smaller than it would be for a smaller CDN.
This is fine if you are big. Anything smaller and it is a big problem.
This is also why they do not manage their own plumbing and do not run a restaurant in-house. Some things are better left to companies that do it at scale.
This sounds like a good thing.
You can use whatever infrastructure you want for whatever reason, but you may not have an accurate picture of the availability.
This may be true over a long enough timeframe, but GP stated that their clients had experienced no downtime since switching at the start of the year.
That is clearly better than both AWS and Cloudflare during that time.
Valid. I should have made it clear that I meant "clearly better from GP's perspective."
Am I missing something or is bunny.net not actually a replacement for that?
MailPace data is also hosted in the EU only
Ah yes, the place for RabbitMQ endpoints.
but who detects the down detector detecting the down detector detecting the down detector
Arbites.
Maybe distributed down detection?
I know there are people here perfectly capable of running with that idea and we might just see a distributed down detector announced on HN :)
https://youtu.be/DpMfP6qUSBo
It's downdetectorsdown all the way down.
https://downdetectorsdowndetectorsdowndetectorsdowndetectors...
https://datatracker.ietf.org/doc/html/rfc1035
Also I think I triggered a nice error log in domaintools just now. https://whois.domaintools.com/downdetectorsdowndetectorsdown...
From there, the "who's watching who?" can become mathematically interesting.
* https://en.wikipedia.org/wiki/Directed_Graph
Looks like it's hosted in London?
Downdetector was indeed down during the cf outage, but I think the index page was still returning 200 (although I didn't check).
Running a headless browser to take a screenshot to check would probably get you blocked by cf...
script.js calls `fetchStatus()`, which calls `generateMockStatus()` to get the statuses, which just makes up random response times:
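For anyone who didn't open the source: a rough Python rendering of the behavior being described (the real page is JavaScript; the function names come from the comment above, while the fields and the 50-500 ms range are guesses):

```python
import random


def generate_mock_status():
    # Makes up a plausible-looking status instead of measuring anything.
    return {
        "service": "downdetector.com",
        "up": True,                              # always reported "up"
        "response_ms": random.randint(50, 500),  # invented response time
    }


def fetch_status():
    # Despite the name, nothing is fetched over the network.
    return generate_mock_status()


print(fetch_status())
```

In other words, the "detector" part is decorative; the numbers change on refresh but never reflect real probes.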
Jokes aside, as far as I can tell, https://downdetectorsdowndetector.com/ is NOT using Cloudflare CDN/Proxy
https://downdetectorsdowndetector.com/ is NOT using Cloudflare SSL
However, selesti reports it uses cloudflare DNS?
https://checkforcloudflare.selesti.com/?q=https://downdetect...
https://downdetectorsdowndetector.com/ is using Cloudflare DNS!
Checked 8 global locations, found DNS entries for Cloudflare in 3
Found in: England, Russia, USA
Not found in: China, Denmark, Germany, Spain, Netherlands
So, naturally, the feature request is: who watches the watchmen? We need downdetectorsdowndetectorsdowndetector.com next.
It looks really nice, good job!
So if any of the services you want to check on is down, chances are this site will be too ;)
Yo dawg I hear you like downdetector so...