This topic interests me a lot, but I've found plenty of it surprising since I finally started working with Postgres-dependent apps. Why, for example, is the id a good primary key? Joins are not uncommon, but nobody searches on id in my application, and it isn't even supposed to be user-visible. I would think every possible user search would have to look at every partition's index if I partitioned on id instead of creation date.
Or you could just warehouse the daily data into something like ClickHouse and start fresh every day. It's built for this kind of workload and has demonstrated some absolutely insane analytical performance at massive scale. We're currently running it on a $170/month VPS, querying over 500 billion rows daily without any issues. At that point, partitioning an ever-growing OLTP table starts looking like the harder problem.
For anyone interested in the topic, I suggest reading about Snowflake IDs [https://en.wikipedia.org/wiki/Snowflake_ID] or UUIDv7; the patterns from the article translate cleanly. A bigint is 64 bits, whereas a UUID is 128. There are other caveats, but it's all about tradeoffs.
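To make the Snowflake idea concrete, here is a rough sketch of how such an id packs a timestamp into a 64-bit integer. It assumes the layout described on the linked Wikipedia page (41 bits of milliseconds since a custom epoch, 10 bits of machine id, 12 bits of per-millisecond sequence). The function names `make_snowflake` and `split_snowflake` are made up for illustration, and a real generator would also need to handle clock skew and sequence rollover.

```python
# Sketch of a Snowflake-style 64-bit id (assumed 41/10/12 bit layout).
EPOCH_MS = 1288834974657  # Twitter's original epoch; any fixed epoch works
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def make_snowflake(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one integer that fits a signed bigint."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range for 41 bits")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range for 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range for 12 bits")
    # Timestamp sits in the high bits, so ids sort by creation time.
    return (
        (elapsed << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def split_snowflake(snowflake_id: int) -> tuple[int, int, int]:
    """Recover (timestamp_ms, machine_id, sequence) from an id."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence
```

Because the timestamp occupies the high bits, a range of ids maps to a range of creation times, which is why an id like this can stand in for a creation date when choosing partition boundaries.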