mastodon.social - Degraded performance – Incident details


Degraded performance

Resolved
Degraded performance
Started about 1 year ago
Lasted about 2 months

Affected

Website & API

Degraded performance from 3:19 PM to 2:52 PM

Background queues

Degraded performance from 3:19 PM to 2:52 PM

Updates
  • Resolved

    Performance has improved after upgrading various OS packages and upgrading Kubernetes to a more recent version that includes performance improvements. Work is still ongoing to identify other areas where performance can be improved, but current performance is better than it was.

  • Identified

    The situation is now stable and services are working, but performance has not returned to previous levels.

    The problem seems to be tied to the kernel version we are running, which may not have full support for the new hardware we deployed.

    We are working on updating our systems to a newer OS version and tuning the various system settings to ensure everything goes back to expected levels of performance.

  • Update

    A definite cause of the degradation has yet to be identified; however, moving to more powerful nodes for the time being seems to have mitigated the issue. Sidekiq queues are still behind but are no longer growing, and additional workers have been deployed to help them catch up.

    A ticket has been submitted with Hetzner to see if this could be an issue on their side. We are waiting to hear back.

  • Investigating

    There appears to be degraded performance after a move away from deprecated Hetzner node types. We are currently investigating.