this post was submitted on 17 Aug 2023
4 points (100.0% liked)

Lemmy Server Performance

3 readers
1 users here now

Lemmy Server Performance

lemmy_server uses the Diesel ORM that automatically generates SQL statements. There are serious performance problems in June and July 2023 preventing Lemmy from scaling. Topics include caching, PostgreSQL extensions for troubleshooting, Client/Server Code/SQL Data/server operator apps/sever operator API (performance and storage monitoring), etc.

founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] RoundSparrow@lemmy.ml 2 points 1 year ago (2 children)

ok, experimenting on a massive test data set of over 5 million posts... this PostgreSQL works pretty well

SELECT COUNT(ranked_recency.*) AS post_row_count
FROM
  (
     SELECT id, community_id, published,
        rank() OVER (
           PARTITION BY community_id
           ORDER BY published DESC, id DESC
           )
     FROM post_aggregates) ranked_recency
WHERE rank <= 1000
;

This limits any one community to 1000 posts, picking the most recent created posts. This gives a way to age out older data in very active communities without removing any posts at all for small communities.

[–] RoundSparrow@lemmy.ml 1 points 1 year ago* (last edited 1 year ago) (1 children)

An even less-intrusive approach is to not add any new field to existing tables. Establish a reference table say called include_range. There is already an ENUM value for each sort type, so include_range table with these columns: sort_type ENUM, lowest_id BigInt, highest_id BigInt

Run a variation of this to populate that table:

FROM
  (
     SELECT id, community_id, published,
        rank() OVER (
           PARTITION BY community_id
           ORDER BY published DESC, id DESC
           )
     FROM post_aggregates) ranked_recency
WHERE rank <= 1000

Against every sort order, including OLD. Capture only two BigInt results: the MIN(id) and the MAX(id) - that will give a range over the whole table. Then every SELECT on post_aggregates / post table includes a WHERE id >= lowest_id AND id <= highest_id

That would put in a basic sanity check that ages-out content, and it would be right against the primary key!

[–] RoundSparrow@lemmy.ml 1 points 1 year ago

A core design issue of either approach is that server operators can modify the building of this data without needing to modify or restart the lemmy_server Rust code.

[–] RoundSparrow@lemmy.ml 1 points 1 year ago

3 hours later... I put it into code and am experimenting with it. Some proof of concept results: https://github.com/LemmyNet/lemmy/files/12373819/auto_explain_list_post_community_0_18_4_dullbananas_with_inclusion_run0a.txt