Address CiviCRM Mailing table complexity - make queries easier & data more prunable
Issue
- on large databases the civicrm_mailing screen is slow-to-impossible to load as the query cannot get the status of the mailing from the civicrm_mailing table and must do a many-to-one join with DISTINCT to get the status of the mailing. This is not well suited to paging because of the many to 1 so many more records are retrieved than are needed
- the mailing job records are really just a form of queue management. ie their logic is temporary and related to active sending. Once the send has happened it should be possible to delete them - but the mailing_job records are the only way the civicrm_mailing_event_queue table links to Bounce entries
- the verp hash needs to be of limited length. While efforts were made in the past to reduce the length the ids of the values in the hash can get large enough to start to blow it up. It really doesn't need to have a queue_id AND a job_id along with the hash
- complexity complexity complexity. The mailing_job has fields like 'parent_id' and job_type' = 'child'. It's just confusing as well as bloated and unnecessary to refer to it outside it's 'real' function - tracking the actual sending
- I have some historical nervousness about the civimail system sending out stuff it shouldn't - I feel like great transparency about loading from civicrm_mailing & then processing would be possible with
Proposal
Stage 1
- adding the field mailing_id to civicrm_mailing_event_queue
- altering the verp calculation to stop including the job_id on new mailings(we would need to handle it's presence for a while in bounce processing)
- altering the job_id index in civicrm_event_queue to set to NULL on delete
- add columns for status_id, start_date, end_date to civicrm_mailing, update as appropriate.
- add a batch script to allow people to populate the columns above in their own time.
- use more efficient queries on sites that have fully migrated (eg. conditional on status_id being populated for all mailings then the civicrm_mailing dashboard could use a more performant query)
Handy query for getting row counts in these tables
``` SELECT 'civicrm_mailing' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing UNION SELECT 'civicrm_mailing_job' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_job UNION SELECT 'civicrm_mailing_event_queue' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_queue UNION SELECT 'civicrm_mailing_event_bounce' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_bounce UNION SELECT 'civicrm_mailing_event_confirm' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_confirm UNION SELECT 'civicrm_mailing_event_delivered' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_delivered UNION SELECT 'civicrm_mailing_event_forward' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_forward UNION SELECT 'civicrm_mailing_event_opened' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_opened UNION SELECT 'civicrm_mailing_event_reply' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_reply UNION SELECT 'civicrm_mailing_event_subscribe' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_subscribe UNION SELECT 'civicrm_mailing_event_trackable_url_open' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_trackable_url_open UNION SELECT 'civicrm_mailing_event_unsubscribe' as mailing_table, COUNT(*) as number_rows FROM civicrm_mailing_event_unsubscribe ```Stage 2
- deprecate the field civicrm_mailing.is_completed
- set up some processes around deleting mailing_job records once no longer needed. 8)remove the interim queries for the partially migrated mailing records
Stage 3 7) eventually the civicrm_mailing.is_completed column could go 8) stop handling the job id in the verp (probably after a few years) 9) add self-cleanup on civicrm_mailing_job records (how would we ideally clean them up once completed?) 10) migrate mailing_job to our queue system
Technical notes
The way in which mailing events link back to the mailing looks like (eg. for bounce)
- civicrm_mailing_event_bounce.event_queue_id => civicrm_mailing_event_queue.id
- civicrm_mailing_event_queue.job_id => civicrm_mailing_job.id
- civicrm_mailing_job.mailing_id => civicrm_mailing.id
A similar pattern is followed for civicrm_mailing_event_delivered civicrm_mailing_event_forward civicrm_mailing_event_opened civicrm_mailing_event_reply civicrm_mailing_event_subscribe
civicrm_mailing_event_confirm links back through civicrm_mailing_event_subscribe
Still confusing
How do civicrm_mailing_recipients & civicrm_mailing_event_queue relate to each other?**** UPDATE - it seems like mailing_event_recipients doesn't add much extra over civicrm_mailing_event_queue but it is used to serve 2 purposes
-
it fulfills the role of a temp table when generating the entries for civicrm_mailing_queue
-
it is used for include / exclude queries - this could be done by event_queue too but perhaps would be more blocking
It does feel like it doesn't add anything that just using queue wouldn't, if we pointed the various queries at queue
What is the status with this historic evilness #4385 (closed)