Implement reconnect/replay-on-write for database connections
Currently, CiviCRM always connects to a singular DSN. For epic:ro-db, we seek compatibility with a split DB architecture in which one routes MySQL requests to (a) read-only slave DBs and/or (b) read-write master DB. This issue specifically proposes a "reconnect-on-write" or "replay-on-write" (RPOW) mechanism as a general, global baseline.
Technical Overview
One possible technique is to sprinkle flags/hints into the application to indicate which use-cases should be served by slave/rodb or by master/rwdb. Some of this sprinkling is likely happen, but we have a large, open-ended application (with several built-in subsystems and several third-party extensions/addons). Auditing all of these is somewhat daunting task.
RPOW aims to provide a generic baseline that relies primarily on MySQL semantics (and doesn't require auditing every use-case carefully). The general idea is:
- Connect optimistically to the read-only slave (expecting a read-only use-case). We can continue using the RODB as long as requests are read-oriented (e.g.
SELECT
). - If there is an actual write operation (e.g.
UPDATE
), then reconnect to the read-write master.
Dynamically switching to the read-write master is not quite as simple as it sounds:
-
Some SQL statements (eg
SET @foo
andCREATE TEMPORARY TABLE
) can be legitimately used in a read-oriented operation (e.g. advanced querying/reporting) -- but they may also be prelude to a write-operation (e.g. building a temp-table with a list of targets and then updating each one). We allow these to execute on the RODB -- but, when/if we reconnect to RWDB, then we replay those statements. -
To support replay, we must be able to classify any SQL statement into one of three buckets:
-
READ
(Ex:SELECT * FROM foo
): The SQL statement has no side-effects; it simply reads data. -
BUFFER
(Ex:SET @user_id = 123
): The SQL statement has no long-term, persistent side-effects; it can, however, have temporary side-effects during the present MySQL session. -
WRITE
(Ex:TRUNCATE foo
): The SQL statement has long-term, persistent side-effects and must be executed on the master. (Generally, if we can't demonstrate that something isREAD
orBUFFER
, then we assume it isWRITE
.)
-
-
The MySQL query language has interesting edge-cases (e.g.
SELECT @foo := id FROM bar FOR UPDATE
) that should be handled correctly. -
We don't know anything about the delay in sync'ing between RODB and RWDB. After making a write, we'll continue sending all requests (reads or writes) to RWDB for some period of time. (Ex: After updating a contact in RWDB, the user's browser gets a cookie -- and, for the next 60 seconds, any additional reads should hit RWDB.)
Limitations and Assumptions
- RPOW makes sense if user's are primarily reading from MySQL. Stock CiviCRM relies extensively on MySQL for caching and session-state, which leads to frequent writes. However, if you use the Redis integration for caches/sessions/prevnext, then this is significantly reduced.
- RPOW aims to mitigate/reduce the need for sprinkling use-case specific hints. However, there may still be scenarios where one wants to sprinkle hints. In particular: the contact-edit screen uses optimistic-locking (which reads the last-modified timestamp before authorizing updates); for correct oplocking, there should be a hint that any POST requests to the contact-edit screen need the RWDB.
Relevant Tasks / Patches / Subtasks
The following are types of patches / subtasks we may expect:
- Adding a new DB driver -- an admin can opt-in to using RPOW behavior by setting
CIVICRM_DSN
to a special value (and registering DSNs for both RODBs and RWDB). - Adding a unit-test to ensure correct classification of a range of SQL examples.
- Maintaining a cookie or session-variable to indicate that a user needs to be temporarily directed to RWDB by default.
- Updating existing use-cases to reduce gratuitous writes or to send hints about specific write-oriented use-cases.
NOTE: I've currently got a draft project in https://github.com/totten/rpow ; however, I'm filing this issue here because some of this work will need to come back into core.
Pull Requests
- #13394 - Reduce unnecessary SQL writes (m)
- #13500 - (REF) Add CRM_Utils_Cache::nack(). Use it for NaiveHasTrait (m)
- #13496 - Implement local array-cache for use with Redis/Memcache (m)
- #13489 - Deprecate CRM_Core_BAO_Cache for I/O. Optionally redirect I/O to Redis or Memcache. #13489 (m)
- #13514 - CRM_Utils_Cache::nack() - Fix format (m)