Implement reconnect/replay-on-write for database connections

Currently, CiviCRM always connects to a singular DSN. For epic:ro-db, we seek compatibility with a split DB architecture in which one routes MySQL requests to (a) read-only slave DBs and/or (b) read-write master DB. This issue specifically proposes a "reconnect-on-write" or "replay-on-write" (RPOW) mechanism as a general, global baseline.

Technical Overview

One possible technique is to sprinkle flags/hints into the application to indicate which use-cases should be served by slave/rodb or by master/rwdb. Some of this sprinkling is likely happen, but we have a large, open-ended application (with several built-in subsystems and several third-party extensions/addons). Auditing all of these is somewhat daunting task.

RPOW aims to provide a generic baseline that relies primarily on MySQL semantics (and doesn't require auditing every use-case carefully). The general idea is:

Connect optimistically to the read-only slave (expecting a read-only use-case). We can continue using the RODB as long as requests are read-oriented (e.g. SELECT).
If there is an actual write operation (e.g. UPDATE), then reconnect to the read-write master.

Dynamically switching to the read-write master is not quite as simple as it sounds:

Some SQL statements (eg SET @foo and CREATE TEMPORARY TABLE) can be legitimately used in a read-oriented operation (e.g. advanced querying/reporting) -- but they may also be prelude to a write-operation (e.g. building a temp-table with a list of targets and then updating each one). We allow these to execute on the RODB -- but, when/if we reconnect to RWDB, then we replay those statements.
To support replay, we must be able to classify any SQL statement into one of three buckets:
- READ (Ex: SELECT * FROM foo): The SQL statement has no side-effects; it simply reads data.
- BUFFER (Ex: SET @user_id = 123): The SQL statement has no long-term, persistent side-effects; it can, however, have temporary side-effects during the present MySQL session.
- WRITE (Ex: TRUNCATE foo): The SQL statement has long-term, persistent side-effects and must be executed on the master. (Generally, if we can't demonstrate that something is READ or BUFFER, then we assume it is WRITE.)
The MySQL query language has interesting edge-cases (e.g. SELECT @foo := id FROM bar FOR UPDATE) that should be handled correctly.
We don't know anything about the delay in sync'ing between RODB and RWDB. After making a write, we'll continue sending all requests (reads or writes) to RWDB for some period of time. (Ex: After updating a contact in RWDB, the user's browser gets a cookie -- and, for the next 60 seconds, any additional reads should hit RWDB.)

Limitations and Assumptions

RPOW makes sense if user's are primarily reading from MySQL. Stock CiviCRM relies extensively on MySQL for caching and session-state, which leads to frequent writes. However, if you use the Redis integration for caches/sessions/prevnext, then this is significantly reduced.
RPOW aims to mitigate/reduce the need for sprinkling use-case specific hints. However, there may still be scenarios where one wants to sprinkle hints. In particular: the contact-edit screen uses optimistic-locking (which reads the last-modified timestamp before authorizing updates); for correct oplocking, there should be a hint that any POST requests to the contact-edit screen need the RWDB.

Relevant Tasks / Patches / Subtasks

The following are types of patches / subtasks we may expect:

Adding a new DB driver -- an admin can opt-in to using RPOW behavior by setting CIVICRM_DSN to a special value (and registering DSNs for both RODBs and RWDB).
Adding a unit-test to ensure correct classification of a range of SQL examples.
Maintaining a cookie or session-variable to indicate that a user needs to be temporarily directed to RWDB by default.
Updating existing use-cases to reduce gratuitous writes or to send hints about specific write-oriented use-cases.

NOTE: I've currently got a draft project in https://github.com/totten/rpow ; however, I'm filing this issue here because some of this work will need to come back into core.

Pull Requests

Edited Mar 20, 2019 by totten

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information