Deadlocks on acl_contact_cache table

We are seeing the acl_contact_cache table locking up our entire system. The query has been seen running for over 6000 seconds on multiple occasions. Slowly, queries stack up behind this and other requests by other users to use acl_contact_cache stack up as well.

We had not previously set the max_execution_time on the database and that has now been set to 450 on the database, which prevents the long-running query from tangling up the entire system. However, the conflict appears to happen when two users are doing an operation that tries to populate the acl_contact_cache table at the same time.

Our database has almost 900,000 contacts, and almost 500,000 activities logged. So a user goes to search > find activities and leaves conditions blank, and this query occurs:

SELECT count(DISTINCT contact_a.id) as rowCount FROM civicrm_contact contact_a LEFT JOIN civicrm_activity_contact
ON ( civicrm_activity_contact.contact_id = contact_a.id ) LEFT JOIN civicrm_activity
ON ( civicrm_activity.id = civicrm_activity_contact.activity_id
AND civicrm_activity.is_deleted = 0 AND civicrm_activity.is_current_revision = 1 ) INNER JOIN civicrm_contact
ON ( civicrm_activity_contact.contact_id = civicrm_contact.id and civicrm_contact.is_deleted != 1 ) INNER JOIN civicrm_acl_contact_cache aclContactCache ON contact_a.id = aclContactCache.contact_id WHERE ( ( civicrm_activity.activity_type_id = 101 AND civicrm_activity.status_id = 9 ) OR ( civicrm_activity.activity_type_id IN (2,3,1,104,109,78,76,85,93,4,5,6,7,8,12,17,19,22,34,40,35,36,37,38,39,44,45,46,47,48,51,58,54,52,80,81,84,94,96,97,99,100,101,103,105,106) ) ) AND aclContactCache.user_id = 26114 AND contact_a.is_deleted = 0 AND (civicrm_activity.activity_type_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 17, 19, 22, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 58, 76, 77, 78, 80, 81, 82, 84, 85, 86, 87, 93, 94, 96, 97, 99, 100, 101, 102, 103, 104, 105, 106, 107, 109) AND civicrm_activity.`id` IN (SELECT `id` FROM `civicrm_activity` WHERE id IN (SELECT activity_id FROM civicrm_activity_contact WHERE contact_id IN (SELECT contact_id FROM civicrm_acl_contact_cache WHERE user_id = 26114) AND contact_id IN (SELECT `id` FROM `civicrm_contact` WHERE is_deleted != 1))));

6000 seconds later, the database is hung, with about 100 queries stuck behind it. Manually killing all processes that are trying to use the acl_contact_cache table will clear out the backlog of other queries.

The table is transitory, as after getting the results there is a truncate almost immediately after.

One proposed solution could be to individualize the acl_contact_cache table to create for each user, eg. acl_contact_cache_{contact_id} … This would allow the acl_contact_cache_XXX table to get locked and not adversely affect every other user on the system.

The system should do a check upon my first search and see if my acl_contact_cache_XXX table is built, if it is not, build it upon all the contacts I have access to (regardless of the particular search).

Additionally the acl_contact_cache_XXX table could persist for a pre-defined period of time (e.g. 1 hour) so it doesn't have to search everyone in the db each time. Same concept for civicrm_group_contact_cache. Could have major performance benefits.

Also finding a way to keep this table from being locked would likely require a reimagination of how the table is actually used. Another consideration would be the use of the MEMORY engine for this table, to permit faster population since its contents are transitory anyway

Thoughts?

@eileen @kcristiano

Similar issue on acl_cache table: #1486.

Edited Nov 01, 2021 by andyburns

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information