Proposal - populate all requested smart groups at once
This is tied to #2480 (closed), but I wanted to split it off because it relates specifically to the existing load() mechanism, where I think we could tweak the existing code slightly to produce a function that supports APIv4 more efficiently.
What currently happens
load() is called for each requested group in turn.
If you want contacts in 8 different groups, it is called 8 times.
(take a moment to realise I'm not talking about populating all groups in this gl - only those groups specifically required for the action)
It follows these steps:
- checks if it has already been loaded within this php process
- checks if the cache date on the civicrm_group is NULL or longer ago than the smart group cache timeout
- creates a temp table
- populates the temp table with the contents of the saved search
- acquires a MySQL lock for the group - if not acquired it returns at this point
- removes records for that group from the group_contact_cache table
- inserts rows from the temp table into the group_contact_cache table
- drops the temp table
- updates the group contact cache table.
- releases the lock
What I think we could do is have a new function
public function loadGroups($groupIds) {
}
which would start with $groupIds as the list of groups to resolve, but would:
- filter out any ids already loaded in this php process
- filter out any groups whose caches have not yet expired (only groups with a NULL or expired cache date need rebuilding)
- attempt to acquire locks on each group, filter out any groups for which a lock cannot be acquired
- create a temp table
- populate the temp table with the contents of each of the filtered groups (one by one still - we could test UNION later on if we want - out of scope)
- remove records from the group_contact_cache table for all the groups
- insert rows from the temp table for all the groups
- drop the temp table
- update the group contact cache table for all groups
- release all the locks
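The filtering steps at the start of that list could be sketched roughly as below. This is a minimal illustration only: `filterGroupsToLoad()`, its parameters and the `$acquireLock` callback are hypothetical names, not existing CiviCRM functions, and the database work (temp table, delete, insert, releasing locks) is deliberately left out.

```php
<?php
// Hypothetical sketch: none of these names exist in CiviCRM.

/**
 * Work out which of the requested groups actually need rebuilding.
 *
 * @param int[] $groupIds        Requested smart group ids.
 * @param int[] $alreadyLoaded   Ids already loaded in this PHP process.
 * @param array $cacheDates      Map of group id => unix timestamp, or NULL if never cached.
 * @param int $timeoutSeconds    Smart group cache timeout.
 * @param int $now               Current unix timestamp.
 * @param callable $acquireLock  function(int $groupId): bool, TRUE if the lock was acquired.
 *
 * @return int[] Ids that are stale and for which we now hold a lock.
 */
function filterGroupsToLoad(array $groupIds, array $alreadyLoaded, array $cacheDates, int $timeoutSeconds, int $now, callable $acquireLock): array {
  $toLoad = [];
  foreach (array_unique($groupIds) as $groupId) {
    // Skip anything already loaded within this PHP process.
    if (in_array($groupId, $alreadyLoaded, TRUE)) {
      continue;
    }
    // Skip groups whose cache is still fresh (a NULL date counts as stale).
    $cacheDate = $cacheDates[$groupId] ?? NULL;
    if ($cacheDate !== NULL && ($now - $cacheDate) <= $timeoutSeconds) {
      continue;
    }
    // Only try for a lock once we know the group is stale.
    if (!$acquireLock($groupId)) {
      continue;
    }
    $toLoad[] = $groupId;
  }
  return $toLoad;
}
```

Because the lock is only attempted after the freshness check, a group with a valid cache never takes a lock. The ids returned are the ones that would go on to share a single temp table and a single delete / insert pass.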
While this potentially means fewer PHP cycles, what I'm really trying to get to is fewer inserts into the civicrm_group_contact_cache table, as these can clash with each other or with delete actions. There is a risk that the number of rows inserted at once would be too many (although this risk already exists and it's unclear whether this WOULD actually make it worse). If this turns out to be an issue in testing, we might iterate through the temporary table, inserting only, say, 50k rows per query.
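If batching does turn out to be needed, one way to do it would be to walk the temp table in id ranges. The sketch below only builds the SQL strings and does not run them. It assumes the temp table has an auto-increment `id` column starting at 1 plus `group_id` and `contact_id` columns; `buildBatchedInserts()` and the table name passed in are made up for illustration.

```php
<?php
// Hypothetical helper, not part of CiviCRM.
// Assumption: the temp table has columns id (auto-increment from 1), group_id, contact_id.

/**
 * Build one INSERT ... SELECT per batch of temp table rows.
 *
 * @param string $tempTable  Name of the temp table (trusted, generated internally).
 * @param int $totalRows     Number of rows in the temp table.
 * @param int $batchSize     Maximum rows to insert per query.
 *
 * @return string[] SQL statements, one per batch.
 */
function buildBatchedInserts(string $tempTable, int $totalRows, int $batchSize = 50000): array {
  if ($batchSize < 1) {
    throw new InvalidArgumentException('Batch size must be at least 1');
  }
  $queries = [];
  for ($start = 1; $start <= $totalRows; $start += $batchSize) {
    // Clamp the final range so it does not run past the last row.
    $end = min($start + $batchSize - 1, $totalRows);
    $queries[] = "INSERT INTO civicrm_group_contact_cache (group_id, contact_id) "
      . "SELECT group_id, contact_id FROM {$tempTable} WHERE id BETWEEN {$start} AND {$end}";
  }
  return $queries;
}
```

Ranging over an id column rather than using LIMIT / OFFSET keeps each batch well defined without needing an ORDER BY, at the cost of adding that column to the temp table.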
I think @mattwire developed the process whereby we build a temp table first and then insert, and I believe it has been a big improvement. I think this would get us further. I would want to focus on testing / implementing it in APIv4 rather than everywhere, but APIv4 could then figure out which groups it wants and request that they be loaded before joining onto the group_contact_cache table (in practice LEFT JOIN civicrm_group_contact_cache UNION civicrm_group_contact if not all groups are smart groups).
I've also been pondering the (out of scope!!!) idea of really splitting out the temporary table building from the insert into smart group contact cache - perhaps the table could be durable & the query could use the temporary table itself and we could store the temporary table name & timestamp and somehow fire off a non-browser process to reap it into the group_contact_cache and drop the table.
@pfigel @colemanw @seamuslee @mattwire @totten @bgm @sluc23 @BjoernE