this post was submitted on 01 Nov 2023
Programming
Inform and throttle. Think about how your own computer works. If storage reaches its max capacity, you get a signal back saying “filesystem full” (or whatever), not “internal storage error”. If the CPU gets busy, it doesn’t crash; things slow down, get queued up, and get prioritised (along with many other complicated mechanisms I’m not across!).
You could borrow those ideas, come up with a way to implement the behaviour in your systems, then present them to whoever could allocate the time & money.
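To make "inform and throttle" a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration (`BoundedStore`, `ThrottledWriter`, `StoreFullError`, and the byte-based capacity are my assumptions, not anything from your system). The store rejects writes with a specific "full" error, and the writer queues requests up to a limit and tells callers to back off beyond it.

```python
import queue


class StoreFullError(Exception):
    """Raised when the store refuses a write because it is at capacity."""


class BoundedStore:
    """Hypothetical in-memory store with a hard capacity in bytes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.records = {}

    def write(self, key, value):
        # Account for the record being replaced, if there is one.
        old_size = len(self.records.get(key, b""))
        new_used = self.used_bytes - old_size + len(value)
        if new_used > self.capacity_bytes:
            # Inform: say exactly what is wrong instead of a generic failure.
            raise StoreFullError(
                f"store full: {self.used_bytes}/{self.capacity_bytes} bytes used, "
                f"write of {len(value)} bytes rejected"
            )
        self.records[key] = value
        self.used_bytes = new_used


class ThrottledWriter:
    """Queues writes up to a limit; beyond it, callers are told to slow down."""

    def __init__(self, store, max_pending):
        self.store = store
        self.pending = queue.Queue(maxsize=max_pending)

    def submit(self, key, value):
        """Return True if the write was queued, False if the caller should back off."""
        try:
            self.pending.put_nowait((key, value))
            return True
        except queue.Full:
            return False

    def drain(self):
        """Apply queued writes in order and return how many were applied.

        A StoreFullError propagates to the caller; the write that triggered
        it has already been removed from the queue.
        """
        applied = 0
        while not self.pending.empty():
            key, value = self.pending.get_nowait()
            self.store.write(key, value)
            applied += 1
        return applied
```

The point of the split is that "slow down" (`submit` returning `False`) and "out of space" (`StoreFullError`) are different signals, so whoever is on the other end can react to each one differently.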
Another approach is to try to get a small, resource-constrained version of the system running and hammer it by loading heaps of data like those customers do. How does it behave? What are the fatal errors and what can we deal with later?
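A rough sketch of what that hammering could look like, again in Python. The `hammer` harness and `make_tiny_target` stand-in are hypothetical names; in practice `write` would be a call into the real client for the small test system. It keeps writing until something fails and records how far it got and what the first error was.

```python
import time


def hammer(write, payload, max_writes):
    """Call write(i, payload) until it raises or max_writes is reached."""
    result = {"ok": 0, "error": None, "slowest_s": 0.0}
    for i in range(max_writes):
        start = time.perf_counter()
        try:
            write(i, payload)
        except Exception as exc:
            # Record the first failure and stop; this is the behaviour to study.
            result["error"] = f"{type(exc).__name__}: {exc}"
            break
        elapsed = time.perf_counter() - start
        result["slowest_s"] = max(result["slowest_s"], elapsed)
        result["ok"] += 1
    return result


def make_tiny_target(capacity_bytes):
    """Stand-in for a resource-constrained system: a dict with a byte budget."""
    data = {}
    used = 0

    def write(key, value):
        nonlocal used
        if used + len(value) > capacity_bytes:
            raise MemoryError("capacity reached")
        data[key] = value
        used += len(value)

    return write


if __name__ == "__main__":
    report = hammer(make_tiny_target(1000), b"x" * 100, max_writes=50)
    print(report)
```

The interesting output is the `error` field and the `slowest_s` trend: they tell you whether the system degrades gradually or falls over with an unhelpful message.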
That is exactly what we do. The problem is that, as a managed service offering, it is on us to scale in response to these alerts.
I think people are misunderstanding my original post. When I say that a customer cluster will go into stop writes, that does not mean it is not functional. It is an entirely intended function of the database so that no important data is lost or overwritten.
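For anyone unfamiliar with the idea, here is a toy Python model of it. This is not our database's actual implementation; `Node`, `StopWritesError`, the record-count capacity, and the 90% default threshold are all made up for illustration. Past the threshold the node refuses new writes but keeps serving reads, and it accepts writes again once capacity is added.

```python
class StopWritesError(Exception):
    """Raised when a write is refused because the node is in stop-writes."""


class Node:
    """Toy model: capacity is counted in records, not bytes."""

    def __init__(self, capacity, stop_writes_pct=90):
        self.capacity = capacity
        self.stop_writes_pct = stop_writes_pct
        self.records = {}

    @property
    def stop_writes(self):
        # True once usage reaches the configured percentage of capacity.
        return len(self.records) * 100 >= self.capacity * self.stop_writes_pct

    def put(self, key, value):
        if self.stop_writes:
            # Refuse the write so existing data is never evicted or overwritten.
            raise StopWritesError(
                f"stop-writes: {len(self.records)}/{self.capacity} records used"
            )
        self.records[key] = value

    def get(self, key):
        # Reads keep working while the node is in stop-writes.
        return self.records[key]

    def resize(self, new_capacity):
        # Scaling up is what clears the condition.
        self.capacity = new_capacity
```

In this model, the `resize` call is the part that lands on us as the operator.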
The problem is more organizational. We have a 5-minute SLA to respond to these types of events, and they can happen at any time, whenever a customer decides to load more data.
I don't have a problem with customers that can correctly project their load and let us know in advance. Those are my favorite customers. But they're not most of our customers.
As for automation: as I exhaustively detailed in another response, we do have another product that does this a lot better, and it's the one we are marketing far more heavily. The one where I'm feeling all the pain is our enterprise-level managed service offering, which goes to customers that have "special requirements". That usually means they will never get automation as robust as the other product line's.