A/B mailing report very slow for databases with many mailings
The report of A/B mailings have become unusable for us, due to how long it takes to load.
When opening the page, a n/a
is displayed instead of each figure while waiting for an AJAX call to retrieve all the open/click/etc rates. In our set-up this call takes close to 5 minutes to complete.
I pinned it down to what seems (to me) to be a bug in the query optimizer of MySQL. For example, the query generated by civi to count the number of opens of a mailing looks like this:
SELECT COUNT(civicrm_mailing_event_opened.id) as opened
FROM civicrm_mailing_event_opened
INNER JOIN civicrm_mailing_event_queue
ON civicrm_mailing_event_opened.event_queue_id = civicrm_mailing_event_queue.id
INNER JOIN civicrm_mailing_job
ON civicrm_mailing_event_queue.job_id = civicrm_mailing_job.id
AND civicrm_mailing_job.is_test = 0
WHERE civicrm_mailing_job.mailing_id = 16574 AND civicrm_mailing_event_opened.time_stamp <= 20180807115642
MySQL 5.7.22 executes it with this query plan (I removed some columns for better readability):
+-------------+------------------------------+--------+-----------------------------------+---------------------------------------------+----------+
| select_type | table | type | key | ref | rows |
+-------------+------------------------------+--------+-----------------------------------+---------------------------------------------+----------+
| SIMPLE | civicrm_mailing | const | PRIMARY | const | 1 |
| SIMPLE | civicrm_mailing_job | ref | FK_civicrm_mailing_job_mailing_id | const | 10 |
| SIMPLE | civicrm_mailing_event_opened | ALL | NULL | NULL | 43351509 |
| SIMPLE | civicrm_mailing_event_queue | eq_ref | PRIMARY | civicrm_mailing_event_opened.event_queue_id | 1 |
+-------------+------------------------------+--------+-----------------------------------+---------------------------------------------+----------+
As you can see, it performs a full table scan of the opened
table before joining on the queue
table, which does make a lot of sense. For some reason MySQL seems to be put off by the check on the timestamp: if I remove it from the where clause I get a much more sensible query plan:
+----+-------------+------------------------------+-------+------------------------------------------------+--------------------------------+----------+
| id | select_type | table | type | key | ref | rows |
+----+-------------+------------------------------+-------+------------------------------------------------+--------------------------------+----------+
| 1 | SIMPLE | civicrm_mailing | const | PRIMARY | const | 1 |
| 1 | SIMPLE | civicrm_mailing_job | ref | FK_civicrm_mailing_job_mailing_id | const | 10 |
| 1 | SIMPLE | civicrm_mailing_event_queue | ref | FK_civicrm_mailing_event_queue_job_id | civicrm_mailing_job.id | 11450265 |
| 1 | SIMPLE | civicrm_mailing_event_opened | ref | FK_civicrm_mailing_event_opened_event_queue_id | civicrm_mailing_event_queue.id | 2 |
+----+-------------+------------------------------+-------+------------------------------------------------+--------------------------------+----------+
Another interesting fact is that Percona server 5.7.22 is not affected by this issue, it produces the optimum query plan in both cases. I don't have any other version of MySQL to test, if anyone has I'd be curious to see.
One solution I can see is to replace the INNER JOIN
s by STRAIGHT_JOIN
and reorder the table names. I tried it on our set-up by applying on the queries for open, click and bounce events: it seems to produce correct results and brought the report computation time from about 270 seconds down to 2.7 (2 orders of magnitude less!).
Is it acceptable to have such query optimizations in CiviCRM's code? I don't see any reason against it for that particular one, because even if new column / indexes were to be added in the future, the join order mailing -> job -> queue -> event
would still be the optimal one and so it makes sense to force it. By the way, the mailing table could be removed from the query, it is not used at all.