Deadlocks on acl_cache
One of the prevalent deadlocks we see is on the acl_contact_cache. I've found that I can replicate this locally fairly consistently just by running 2 sets of tests at once.
The thing that is somewhat odd about this is that we don't use ACLs at all so in fact the table is always empty. Notably the table is also extremely slow to truncate - despite being empty.
What happens The flow we have is
- api call to create a contact
- when the contact is created CRM_Contact_BAO_Contact_Utils::clearContactCaches() is called. It hits a deadlock because
-
it has an FK to contact.id (& in our case it has an FK and an index seemingly)
-
it is called a lot
- it retries and succeeds
- we then try to add an email - the add function fails on a foreign key constraint because the earlier deadlock rolled the contact create back
Proposals I think there are a number of things we can do & we should do some combo:
-
Make the delete function less expensive & less locky
-
a) Add an index on modified date - we are constantly deleting where modified_date < x so let's index it (done 5.22)
-
a.1) possibly add a warning for any sites missing the index in a - ie we have a system check but an early return was added
-
remove the index on acl_id - we ALSO have an FK so having both is probably harmless-but-confusing
-
alter the FK on contact ID to be an index not a foreign key. This is perhaps a confusing suggestion but it is also the one that will make a real dent in these deadlocks. Remember this is a cache table not a 'persistent' table. The impact of the rows outliving the existence of the contact for a little bit is basically none because we don't do ACL lookups on non-existent contacts (this is also potentially helpful for better handling 0 or anonymous contact_id where we've had to work around this before
-
Protect against conflict through mysql locks. I think we can use a mysql lock to avoid 2 processes attempting this flush at once. I'll do a PR for this
-
How often do we REALLY need to flush those caches. Maybe we could cache a 'last flushed' key & flush no more than every 30 seconds.
-
Consider checking if the table is empty before attempting to empty it.
-
Review the places where acl cache is called from & decide how necessary there are - here is the list:
CRM_ACL_BAO_Cache::resetCache CRM/Contact/BAO/Contact/Utils.php: CRM_ACL_BAO_Cache::resetCache(); CRM/Core/BAO/Cache.php: CRM_ACL_BAO_Cache::resetCache(); CRM/Utils/System.php: CRM_ACL_BAO_Cache::resetCache(); CRM/ACL/Page/EntityRole.php: CRM_ACL_BAO_Cache::resetCache(); CRM/ACL/Form/ACLBasic.php: CRM_ACL_BAO_Cache::resetCache(); CRM/ACL/Form/EntityRole.php: CRM_ACL_BAO_Cache::resetCache(); CRM/ACL/BAO/ACL.php: CRM_ACL_BAO_Cache::resetCache();
CRM_Contact_BAO_Contact_Utils::clearContactCaches() CRM/Contact/BAO/Contact/Utils.php: public static function clearContactCaches($isEmptyPrevNextTable = FALSE) { CRM/Contact/BAO/GroupContact.php: CRM_Contact_BAO_Contact_Utils::clearContactCaches(); CRM/Contact/BAO/GroupContact.php: CRM_Contact_BAO_Contact_Utils::clearContactCaches(); CRM/Contact/BAO/Contact.php: CRM_Contact_BAO_Contact_Utils::clearContactCaches(); CRM/Contact/Import/Form/Preview.php: CRM_Contact_BAO_Contact_Utils::clearContactCaches(TRUE); bin/cli.class.php: CRM_Contact_BAO_Contact_Utils::clearContactCaches();
[ ] A bigger task perhaps - but permit opportunistic flushing to be turned off in favour of a cron (per group contact cache)
Deadlock output
LATEST DETECTED DEADLOCK
2019-12-19 11:29:45 0x700010cd2000
*** (1) TRANSACTION:
TRANSACTION 34518715, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 11 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 7
MySQL thread id 571, OS thread handle 123145583353856, query id 709677 localhost 127.0.0.1 drupalcivi_lkznr updating
DELETE
FROM civicrm_acl_cache
WHERE modified_date IS NULL
OR (modified_date <= '20191218222444')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 143182 page no 3 n bits 80 index PRIMARY of table drupalcivi_lkznr
.civicrm_acl_cache
trx id 34518715 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 00003c14; asc < ;;
1: len 6; hex 0000020eb6ba; asc ;;
2: len 7; hex 3b000001442ac5; asc ; D* ;;
3: SQL NULL;
4: len 4; hex 00000002; asc ;;
5: SQL NULL;
*** (2) TRANSACTION:
TRANSACTION 34518714, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
66 lock struct(s), heap size 8400, 71 row lock(s), undo log entries 74
MySQL thread id 573, OS thread handle 123145584189440, query id 710022 localhost 127.0.0.1 drupalcivi_lkznr updating
UPDATE civicrm_contact SET contact_type = 'Organization' , contact_sub_type = NULL , email_greeting_id = NULL , postal_greeting_id = NULL , addressee_id = 3 , employer_id = 15055 WHERE ( civicrm_contact.id = 15055 )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 143182 page no 3 n bits 80 index PRIMARY of table drupalcivi_lkznr
.civicrm_acl_cache
trx id 34518714 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;