I wanted to set things up so that, whatever happens, you can sleep peacefully without worrying that the project will suddenly become unavailable for any reason, and then calmly repair everything at your own pace. Along the way, I wanted to make easy horizontal scaling possible.
The classic recipe is to run several instances of the application on different servers and, one level above them, route each user to an instance that is working. The synchronization point is the database, so it must be at least as reliable as the application itself.
Because the databases and the applications run on different servers, and any of them can become unavailable at random, the data must be fully and correctly synchronized once connectivity is restored. MongoDB has a dedicated mechanism for this: a server can be elected primary only with the votes of a majority (50% + 1) of the voting members. We also want the application servers to take part in this vote by running arbiters, because initially there are only 2 database servers, and 50% + 1 of 2 is the same two servers, so no vote can succeed once either of them is down. An additional arbiter runs on a backup server located outside the data center.
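The 50% + 1 rule is easy to check with a few lines of arithmetic. This is only a sketch of the counting, not of MongoDB's election protocol, and the function names are made up for illustration:

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election is still possible.
function tolerableFailures(votingMembers) {
  return votingMembers - votesNeeded(votingMembers);
}

for (const n of [2, 3, 5]) {
  console.log(`${n} voters: need ${votesNeeded(n)}, can lose ${tolerableFailures(n)}`);
}
// 2 voters: need 2, can lose 0
// 3 voters: need 2, can lose 1
// 5 voters: need 3, can lose 2
```

With only two voters, losing either one makes an election impossible, which is why extra voting members are needed.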
In total we have 2 database servers online and 5 voting members (the database servers are counted among them).
Now, whichever database instance drops out, the other one becomes the primary, and the disconnected one stays read-only until the connection is restored. Once the connection is back and synchronization is complete, it becomes the primary again. If, for some reason, both the second server and the backup server drop out, all databases go into read-only mode.
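The failover behavior can be modeled roughly as follows. The field names (`_id`, `host`, `priority`, `arbiterOnly`) are the ones MongoDB uses in a replica set configuration, but the host names, the set name `rs0`, the priorities and the exact placement of the arbiters are assumptions made for this sketch. The two helper functions are a simplified model, not MongoDB's real election logic (they ignore replication lag and timing):

```javascript
// Hypothetical layout: 2 data-bearing members plus 3 arbiters (assumed placement).
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.internal:27017", priority: 2 },
    { _id: 1, host: "db2.example.internal:27017", priority: 1 },
    { _id: 2, host: "app1.example.internal:27017", arbiterOnly: true },
    { _id: 3, host: "app2.example.internal:27017", arbiterOnly: true },
    { _id: 4, host: "backup.example.net:27017", arbiterOnly: true },
  ],
};

// Simplified model: a primary can exist only if a strict majority of voters
// is reachable and at least one of the reachable members holds data.
function canElectPrimary(config, reachableHosts) {
  const reachable = config.members.filter((m) => reachableHosts.includes(m.host));
  const majority = Math.floor(config.members.length / 2) + 1;
  const hasDataBearing = reachable.some((m) => !m.arbiterOnly);
  return reachable.length >= majority && hasDataBearing;
}

// Among reachable data-bearing members, the one with the highest priority is
// preferred; null means no primary, so the databases are read-only.
function preferredPrimary(config, reachableHosts) {
  if (!canElectPrimary(config, reachableHosts)) return null;
  const candidates = config.members
    .filter((m) => !m.arbiterOnly && reachableHosts.includes(m.host))
    .sort((a, b) => (b.priority ?? 1) - (a.priority ?? 1));
  return candidates[0].host;
}

const allHosts = replicaSetConfig.members.map((m) => m.host);
const withoutDb1 = allHosts.filter((h) => !h.startsWith("db1."));

console.log(preferredPrimary(replicaSetConfig, allHosts));   // db1 is preferred
console.log(preferredPrimary(replicaSetConfig, withoutDb1)); // db2 takes over
console.log(preferredPrimary(replicaSetConfig, ["db2.example.internal:27017"])); // null
```

Giving one database server a higher priority is one way to get the "becomes the primary again after synchronization" behavior; whether the original setup does it exactly this way is not stated.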
You also need to monitor the health of the cluster: that it is available, that the data is in place, that the backups are in order, and so on.
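As an illustration of the availability part of such a check, here is a small sketch that summarizes a document shaped like the output of MongoDB's `replSetGetStatus` command (fields `members[].name`, `health`, `stateStr`). The function name and the sample data are hypothetical; fetching the real status and checking data and backups are left out:

```javascript
// Summarize a replSetGetStatus-like document into a simple health report.
function summarizeReplicaSet(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const unhealthy = status.members
    .filter((m) => m.health !== 1)
    .map((m) => m.name);
  return {
    primary: primary ? primary.name : null, // null means read-only cluster
    unhealthy,                              // members that are not reachable
    ok: primary !== undefined && unhealthy.length === 0,
  };
}

// Hypothetical sample: one database server has dropped out.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "db1.example.internal:27017", health: 1, stateStr: "PRIMARY" },
    { name: "db2.example.internal:27017", health: 0, stateStr: "(not reachable/healthy)" },
    { name: "backup.example.net:27017", health: 1, stateStr: "ARBITER" },
  ],
};

const report = summarizeReplicaSet(sampleStatus);
if (!report.ok) {
  console.log(`cluster degraded, unhealthy members: ${report.unhealthy.join(", ")}`);
}
```

A report like this could be fed into whatever alerting is already in place.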
2) Application, Node.js
As with the database, there are at least two different physical servers. All application instances are equal, because the synchronization point is the data.
This is possible because the applications themselves do not store any state, so even if a single user connects to different instances, nothing is affected.
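A toy sketch of what "no state in the application" means in practice. A `Map` stands in for the shared database here, and the instance names, `saveNote` and `listNotes` are invented for the example:

```javascript
// Stand-in for the shared database; in the real setup this role is MongoDB's.
const sharedStore = new Map();

// An instance keeps nothing of its own: every read and write goes to the store.
function createInstance(name, store) {
  return {
    name,
    saveNote(userId, text) {
      const notes = store.get(userId) ?? [];
      store.set(userId, [...notes, text]);
    },
    listNotes(userId) {
      return store.get(userId) ?? [];
    },
  };
}

const instanceA = createInstance("app-1", sharedStore);
const instanceB = createInstance("app-2", sharedStore);

// The same user is routed to different instances on consecutive requests.
instanceA.saveNote("user-42", "first");
instanceB.saveNote("user-42", "second");

// Either instance returns both notes, because neither holds local state.
console.log(instanceA.listNotes("user-42"));
console.log(instanceB.listNotes("user-42"));
```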
Each server runs at least 2 instances of the application, which are monitored and load-balanced using pm2 and nginx.
On top of all this sits performance monitoring.