jerry

joined 1 year ago
[–] jerry@fedia.io 3 points 2 days ago

Note that even once I get this fixed, another problem will inevitably crop up. Posting here is fine, but an email to jerry@infosec.exchange or a ping to @jerry@infosec.exchange (my Mastodon account) would probably be faster (note, when federation breaks here, messages to @jerry@infosec.exchange wouldn't get through from fedia.io...

[–] jerry@fedia.io 3 points 2 days ago (2 children)

It’s almost caught up. My apologies. I am trying to get it fixed permanently.

[–] jerry@fedia.io 3 points 2 days ago (4 children)

:( I'm working on it

[–] jerry@fedia.io 2 points 3 days ago (7 children)

ok - the queues are processing again. I will work on a more permanent fix after dinner

[–] jerry@fedia.io 2 points 3 days ago
[–] jerry@fedia.io 5 points 4 days ago (10 children)

ok - rabbitmq started having problems with the delivery queue again. I got it going again. Those messages should be delivered soon.

[–] jerry@fedia.io 3 points 4 days ago
[–] jerry@fedia.io 1 points 6 days ago (1 children)

I am not 100% sure. Fedia.io is running on a beast of a server, and so long as it’s working correctly, it should be able to deliver it instantly. But that doesn’t mean that the receiving servers are able to consume and render them that fast.

[–] jerry@fedia.io 5 points 1 week ago

Once it is more reliable, then I’ll agree, but thank you

[–] jerry@fedia.io 2 points 1 week ago

there isn't an interval per se - I don't yet know why it's not immediate. It's possible that the delay is on the receiving side - the server that fedia.io runs on is quite substantial and unless there is some sort of bug, the processing should happen immediately.

[–] jerry@fedia.io 2 points 1 week ago (2 children)

Outbound federation was indeed broken. I fixed it, but there was a huge backlog the server had to work through, which took about 12 hours to complete. I just checked and everything appears to be working ok. I am going to create some automation that will detect and alert on this (or other issues) happening in the future.
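As a rough illustration of the kind of detection described above, here is a minimal sketch that reads a queue's depth from the RabbitMQ management HTTP API and produces an alert message when the backlog is too large. It assumes the management plugin is enabled on localhost:15672 and the default vhost is in use; the URL, the threshold, and the function names are illustrative assumptions, not the automation actually deployed on fedia.io.

```python
import base64
import json
import urllib.request

# Assumptions (not from the post): management plugin on localhost:15672,
# default vhost "/" (URL-encoded as %2F), and an arbitrary threshold.
MANAGEMENT_URL = "http://localhost:15672/api/queues/%2F/deliver"
BACKLOG_THRESHOLD = 10_000


def fetch_queue_info(url, user, password, timeout=10.0):
    """Return the management API's JSON description of one queue."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def backlog_alert(queue_info, threshold=BACKLOG_THRESHOLD):
    """Return an alert message if the queue is too deep, otherwise None."""
    depth = queue_info.get("messages", 0)  # total messages held in the queue
    if depth > threshold:
        name = queue_info.get("name", "?")
        return f"queue {name!r} has {depth} messages waiting (threshold {threshold})"
    return None
```

A periodic job could call `backlog_alert(fetch_queue_info(MANAGEMENT_URL, user, password))` and forward any non-empty result by email or some other channel. Treating a failed fetch as an alert in its own right would also cover the case where the broker itself is unreachable.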

[–] jerry@fedia.io 5 points 1 week ago (1 children)

Not entirely. It looks like the rabbit issue was only impacting one of the queues (“deliver”), though I would have expected that to impact things like microblog too. All I can say with clarity is that the instance was operating in a very unhealthy state.

The queue appears like it’ll take several hours to flush, but it’s working.

 

Hi all. Several of you have reported problems with fedia.io not federating with other instances correctly.

The cause is that rabbitmq crashed, but not all the way. It crashed to the point where new connections would timeout, but the service was still running such that it wouldn't auto restart. I will be creating some automation to detect that proactively and restart rabbitmq if/when it happens again.
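One way such a check might work is sketched below, under stated assumptions. A plain process check would miss this failure mode because the service is still running, so the probe opens a connection, sends the AMQP 0-9-1 protocol header, and waits for the broker's first frame with a timeout. The host, port, timeout, and the systemd unit name `rabbitmq-server` are assumptions about the setup, not details from this post.

```python
import socket
import subprocess

# AMQP 0-9-1 protocol header. A healthy broker answers with a method frame
# (frame type 1) carrying Connection.Start.
AMQP_HEADER = b"AMQP\x00\x00\x09\x01"


def broker_responds(host="127.0.0.1", port=5672, timeout=5.0):
    """Return True if the broker completes the first step of the AMQP handshake."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(AMQP_HEADER)
            reply = conn.recv(1)
    except OSError:
        # Covers refused connections and timeouts (socket.timeout is an OSError).
        return False
    return reply == b"\x01"


def restart_if_unresponsive(service="rabbitmq-server", **probe_args):
    """Restart the (assumed) systemd unit when the probe fails.

    Returns True if a restart was issued, False if the broker was healthy.
    """
    if broker_responds(**probe_args):
        return False
    subprocess.run(["systemctl", "restart", service], check=True)
    return True
```

Waiting for the handshake reply rather than just a successful TCP connect matters here: the operating system can accept a connection on behalf of a hung process, so only the missing reply reveals that the broker has stopped serving clients.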

 

We made some changes a few minutes ago that we hope fix the problem. There may be some other lingering issues, but I am hoping the voting problem is fixed now. Let me know below if you continue to see that problem.

 

Until I implement a better system to screen out spammers, I will be closing registrations on Fedia.io. That’s not what I want - I’d like for it to be available for legitimate accounts, but the spam is off the hook.

Anyone seeing this can send me an email (jerry@infosec.exchange) and I’ll get an account created for you in the meantime.

 

Hello everyone. Today, I moved fedia.io behind the Fastly CDN. This should make the site consistently fast for everyone, no matter where you are in the world. It'll also help with bandwidth usage and mitigate DDoS attacks.

There were a few hiccups as I set that up today - my apologies if you saw errors or broken images for a bit.
EDIT: I previously said that this was the first time mbin or kbin was put behind a CDN. That is incorrect. kbin.earth has been behind Cloudflare. Apologies.

 

Hi all. I've been having some problems keeping fedia.io running - at the moment, either the message workers or the php web server processes are dying after an hour or so and I have to restart everything. I have been working with the mbin team and installed some updates that we hoped would fix the problems, but no luck. I am going to work on a cron job to automatically restart things once an hour. The downside is that you'll likely see some error 500s if you happen to hit it when the processes are restarting, but it should happen quickly and refreshing the page should make it work again.
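A minimal sketch of what such an hourly restart job could look like follows. The unit names are hypothetical, since the post does not say how the message workers or PHP processes are managed, and the use of systemd is likewise an assumption.

```python
import subprocess

# Hypothetical unit names, for illustration only.
SERVICES = ("mbin-messenger.service", "php-fpm.service")


def restart_commands(services=SERVICES):
    """Build one `systemctl restart` command per service."""
    return [["systemctl", "restart", name] for name in services]


def restart_all(services=SERVICES, runner=subprocess.run):
    """Restart each service in order and return the names that failed."""
    failed = []
    for name, command in zip(services, restart_commands(services)):
        result = runner(command)
        if result.returncode != 0:
            failed.append(name)
    return failed

# An hourly schedule in cron syntax would look like:
#   0 * * * * /usr/bin/python3 /path/to/restart_services.py
# where the (hypothetical) script calls restart_all() and reports any failures.
```

Restarting the services one after another rather than all at once keeps the window in which visitors see error 500 responses short, at the cost of a slightly longer overall run.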

 

Shortly after upgrading to Mbin 1.7.1-rc1, php ran out of workers. I dramatically increased the limit. It isn’t clear to me why that happened and if it’s related to the upgrade or just coincidental. My intuition is that it’s related, but I have no evidence.

 

Hello everyone. I just upgraded fedia.io to mbin 1.7.1-rc1. One of the notable changes is that mbin is deprecating mercure, which is the component that provided streaming updates. As such, you will have to refresh the web page to see new posts and comments.

 

The (relatively new) server that Fedia.io was running on, a Hetzner AX 162-R, died overnight. Hetzner tells me that the main board failed and had to be replaced. During the repair, the RAID set got corrupted and the server would no longer boot.

Every single AX 162 (R or M) I’ve rented from Hetzner has now failed at least once. This was the last one I had. It was on my to-do list to move fedia.io to a Dell server with the same specs. I knew this was going to happen, but I didn’t get it done in time.

For those of you who have been following along, Fedia has been cursed from the beginning. The kbin software was a god damned disaster, and very fortunately the mbin team spent an incredible amount of time and patience to help me sort out the many problems, nearly all of which are fixed now.

Except for the random occurrences where federation breaks due to an as-yet-unknown bug, the main stability issue has been hardware. I have had excellent luck with Hetzner’s Dell servers, so I am hopeful that is now fixed as well. The challenge is that the Dell server is quite expensive ($350 per month), so I will be looking for a more cost-effective way to host fedia.io, given the very small number of active users.

 

I will be rehoming fedia.io to a less expensive server the afternoon of July 1 - exact timing is TBD. Downtime should last about 2 hours. The current server is quite expensive and donations are dwindling, which is normally ok, but I am losing my job and have to be a bit more frugal.

 

Yesterday, the fedia.io server locked up. I was able to reboot it remotely and it came up clean. After less than an hour, the server froze again. This happened several more times throughout the day. Unfortunately, there were no logs recording what happened, and nothing on the console - just frozen hardware.

I contacted Hetzner early this morning and they diagnosed the server as having a faulty motherboard. Hetzner replaced the board and rebooted the server, and so far the server has been stable.

I have had pretty bad luck with this particular model of server from Hetzner, so I do not have confidence that this won't happen again, and so will be looking to migrate to a different type of server that is hopefully more stable and less expensive (I am losing my job at the end of June, and so need to save all the cash I can).

 

Fedia.io had a few issues over the past 24 hours - sometimes working fine until you clicked on certain posts, which resulted in an error 500, and other times just returning an error 500 no matter what.

The first issue I found was with amqproxy, which sits between rabbitmq and the queue runners that process incoming and outgoing posts and helps to reduce the load on the server. I found this morning that amqproxy was consistently failing, despite there being no apparent problem. I bypassed amqproxy, since the server can handle the load fine without it. That seemed to work and things returned to normal. A few hours later, the site started responding with error 500 to nearly all requests. This happened because the database server ran out of connections. The 300 it was set to should have been plenty, but clearly it was not. I've set that to 3000 and so far, so good.

My apologies for the instability. I continue to learn the nuances here and will keep making the service more reliable as I go.

 

The server fedia.io had been running on started developing stability problems overnight from Thursday April 4 to Friday April 5. By Saturday (today), the system was completely unbootable. After attempting to resolve the hardware issue with Hetzner (the ISP) for about 6 hours, I gave up and moved the site to a new server. All that took quite a lot of time, during which fedia.io was not available. My apologies for this.
