Contact export is very slow, creates huge temporary tables
Overview
When exporting contacts with the default output fields, a query is performed which can generate huge temporary files, even for databases that are not very greatly populated. I was in fact alerted to this problem when an export of only 80 contacts was performed that maxed out our 40GB of spare disk space and crashed the entire server. The following is a study of this single query, which I'll call Query1, and how it can be optimized to eliminate the problem.
Exponential Growth of Temporary Tables
The issue is caused by the joins of multiple tables, each having multiple rows for each of the exported contacts, combined with the fact MySQL creates temporary tables if the queries contain GROUP_CONCAT (among other causes). Although the query returns only one row per exported contact, the temporary table that is built in the process has a row count which is a sum of products.
e.g. Assume a join of tables A, B and C and a WHERE clause which includes N contacts. The temporary table generated will contain
(A1*B1*C1) + (A2*B2*C2) … + (AN*BN*CN)
rows, where A1, for example, is the number of rows in table A for contact 1. Each additional exported field having multiple rows per contact causes exponential growth in the temporary table size, while the number of exported contacts increases the table size arithmetically.
A Tale of Two Contacts
To illustrate a real-world case, the row counts for the tables of interest of two contacts in my organization are as follows:
Table | Contact_2 | Contact_1016 |
---|---|---|
civicrm_group_contact | 91 | 39 |
civicrm_group_contact_cache | 30 | 17 |
civicrm_tag | 18 | 6 |
civicrm_note | 6 | 11 |
civicrm_email | 4 | 3 |
civicrm_phone | 2 | 4 |
civicrm_address | 2 | 3 |
PRODUCT | 4,717,440 | 1,575,288 |
SUM OF PRODUCTS | ... | 6,292,728 |
Query1, shown along with its slow query log in Queries, was captured from an export operation with default output fields of these two contacts. That query joins each of these tables, among others. "SUM OF PRODUCTS" indicates the number of rows in the resulting temporary table.
An export of 2000 similar contacts rather than just these 2 would be expected to generate a temporary table 1000 times this size (6 billion rows)!
Query Optimization
Query1 was modified to create the functionally equivalent Query2 (also shown with its slow query log in Queries), employing three subqueries to eliminate the product terms due to the first three DB tables shown above. Performance was vastly improved as shown in the following table:
Parameters from Slow Query Log | Query1 | Query2 |
---|---|---|
Execution Time (S) | 67 | 0.38 |
Temp file size (MB) | 4,473 | 30 |
Rows Examined | 29,579,813 | 138,786 |
Just to be clear, these measurements were made running the query for only the two contacts described above.
Recommended Action
When building queries for export operations, subqueries should be used in the join clause for any table which might return more that one row per contact to ensure that it contributes only one row per contact to the temporary table. At the very least, this should be done for those fields that might contain numerous rows per contact, such as groups (smart and non-smart), tags, and notes.
Additional Issues with Contact Export
I would like additionally to note the following secondary issues with contact export:
- Smart groups are not rebuilt upon an export with the 'groups' field. If civicrm_group_contact_cache has been cleared (e.g. due to adding a contact), before running an export, the export will not include any smart groups in the ‘groups’ column.
- The groups, tags and notes filed are each truncated to a length of 1024 characters due to the default limitation of GROUP_CONCAT. This limit could be increased by executing “SET SESSION group_concat_max_len = 100000;”, for example.
- With the default output fields, is_primary is not specified in the email/phone/address join clauses. The results returned are therefore indeterminate for each of these fields when more than one row exists per contact.
Test Environment
- Query1 was captured using CiviCRM 5.24.3.
- To eliminate tainted timing measurements due to possible throttling or over-allocation by my VPS service provider, the above measurements were made on a freshly built Ubuntu 18.04 virtual machine on my local PC where only MariaDB 10.4 was installed (no Apache, PHP, CMS), along with a copy of my site’s Drupal database. Increasing RAM to 12GB from 8GB and parameter tuning using MySQLTuner led to only modest improvements in the execution speed.
References
Queries
Query1: Slow query log and SQL
# Thread_id: 53 Schema: drupal QC_hit: No
# Query_time: 67.547173 Lock_time: 0.000418 Rows_sent: 2 Rows_examined: 29579813
# Rows_affected: 0 Bytes_sent: 12505
# Tmp_tables: 5 Tmp_disk_tables: 2 Tmp_table_sizes: 4472889344
# Full_scan: No Full_join: Yes Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
SELECT
contact_a.id AS contact_id,
contact_a.contact_type AS `contact_type`,
contact_a.contact_sub_type AS `contact_sub_type`,
contact_a.sort_name AS `sort_name`,
contact_a.display_name AS `display_name`,
contact_a.do_not_email AS `do_not_email`,
contact_a.do_not_phone AS `do_not_phone`,
contact_a.do_not_mail AS `do_not_mail`,
contact_a.do_not_sms AS `do_not_sms`,
contact_a.do_not_trade AS `do_not_trade`,
contact_a.is_opt_out AS `is_opt_out`,
contact_a.legal_identifier AS `legal_identifier`,
contact_a.external_identifier AS `external_identifier`,
contact_a.nick_name AS `nick_name`,
contact_a.legal_name AS `legal_name`,
contact_a.image_URL AS `image_URL`,
contact_a.preferred_communication_method AS `preferred_communication_method`,
contact_a.preferred_language AS `preferred_language`,
contact_a.preferred_mail_format AS `preferred_mail_format`,
contact_a.hash AS `hash`,
contact_a.source AS `contact_source`,
contact_a.first_name AS `first_name`,
contact_a.middle_name AS `middle_name`,
contact_a.last_name AS `last_name`,
contact_a.prefix_id AS `prefix_id`,
contact_a.suffix_id AS `suffix_id`,
contact_a.formal_title AS `formal_title`,
contact_a.communication_style_id AS `communication_style_id`,
contact_a.email_greeting_id AS email_greeting_id,
contact_a.postal_greeting_id AS postal_greeting_id,
contact_a.addressee_id AS addressee_id,
contact_a.job_title AS `job_title`,
contact_a.gender_id AS `gender_id`,
contact_a.birth_date AS `birth_date`,
contact_a.is_deceased AS `is_deceased`,
contact_a.deceased_date AS `deceased_date`,
contact_a.household_name AS `household_name`,
IF(contact_a.contact_type = 'Individual',
NULL,
contact_a.organization_name) AS organization_name,
contact_a.sic_code AS `sic_code`,
contact_a.user_unique_id AS `user_unique_id`,
contact_a.employer_id AS `current_employer_id`,
contact_a.is_deleted AS `contact_is_deleted`,
contact_a.created_date AS `created_date`,
contact_a.modified_date AS `modified_date`,
contact_a.addressee_display AS addressee_display,
contact_a.addressee_custom AS addressee_custom,
contact_a.email_greeting_display AS email_greeting_display,
contact_a.email_greeting_custom AS email_greeting_custom,
contact_a.postal_greeting_display AS postal_greeting_display,
contact_a.postal_greeting_custom AS postal_greeting_custom,
IF(contact_a.contact_type = 'Individual',
contact_a.organization_name,
NULL) AS current_employer,
civicrm_address.id AS address_id,
civicrm_location_type.id AS location_type_id,
civicrm_location_type.name AS `location_type`,
civicrm_address.street_address AS `street_address`,
civicrm_address.street_number AS `street_number`,
civicrm_address.street_number_suffix AS `street_number_suffix`,
civicrm_address.street_name AS `street_name`,
civicrm_address.street_unit AS `street_unit`,
civicrm_address.supplemental_address_1 AS `supplemental_address_1`,
civicrm_address.supplemental_address_2 AS `supplemental_address_2`,
civicrm_address.supplemental_address_3 AS `supplemental_address_3`,
civicrm_address.city AS `city`,
civicrm_address.postal_code_suffix AS `postal_code_suffix`,
civicrm_address.postal_code AS `postal_code`,
civicrm_address.geo_code_1 AS `geo_code_1`,
civicrm_address.geo_code_2 AS `geo_code_2`,
civicrm_address.manual_geo_code AS `manual_geo_code`,
civicrm_address.name AS `address_name`,
civicrm_address.master_id AS `master_id`,
civicrm_address.county_id AS county_id,
civicrm_address.state_province_id AS state_province_id,
civicrm_address.country_id AS country_id,
civicrm_phone.id AS phone_id,
civicrm_phone.phone_type_id AS `phone_type_id`,
civicrm_phone.phone AS `phone`,
civicrm_phone.phone_ext AS `phone_ext`,
civicrm_email.id AS email_id,
civicrm_email.email AS `email`,
civicrm_email.on_hold AS `on_hold`,
civicrm_email.is_bulkmail AS `is_bulkmail`,
civicrm_email.signature_text AS `signature_text`,
civicrm_email.signature_html AS `signature_html`,
civicrm_im.id AS im_id,
civicrm_im.provider_id AS `im_provider`,
civicrm_im.provider_id AS provider_id,
civicrm_im.name AS `im`,
civicrm_openid.id AS openid_id,
civicrm_openid.openid AS `openid`,
civicrm_worldregion.id AS worldregion_id,
civicrm_worldregion.name AS `world_region`,
civicrm_website.id AS website_id,
civicrm_website.url AS `url`,
CONCAT_WS(',',
GROUP_CONCAT(DISTINCT IF(civicrm_group_contact.status = 'Added',
civicrm_group_contact.group_id,
'')),
GROUP_CONCAT(DISTINCT civicrm_group_contact_cache.group_id)) AS `groups`,
GROUP_CONCAT(DISTINCT (civicrm_tag.name)) AS tags,
GROUP_CONCAT(DISTINCT (civicrm_note.note)) AS notes
FROM
civicrm_contact contact_a
LEFT JOIN
civicrm_address ON (contact_a.id = civicrm_address.contact_id)
LEFT JOIN
civicrm_country ON (civicrm_address.country_id = civicrm_country.id)
LEFT JOIN
civicrm_email ON (contact_a.id = civicrm_email.contact_id)
LEFT JOIN
civicrm_phone ON (contact_a.id = civicrm_phone.contact_id)
LEFT JOIN
civicrm_im ON (contact_a.id = civicrm_im.contact_id)
LEFT JOIN
civicrm_openid ON (civicrm_openid.contact_id = contact_a.id)
LEFT JOIN
civicrm_location_type ON civicrm_address.location_type_id = civicrm_location_type.id
LEFT JOIN
civicrm_group_contact ON contact_a.id = civicrm_group_contact.contact_id
LEFT JOIN
civicrm_group_contact_cache ON contact_a.id = civicrm_group_contact_cache.contact_id
LEFT JOIN
civicrm_entity_tag ON (civicrm_entity_tag.entity_table = 'civicrm_contact'
AND civicrm_entity_tag.entity_id = contact_a.id)
LEFT JOIN
civicrm_note ON (civicrm_note.entity_table = 'civicrm_contact'
AND contact_a.id = civicrm_note.entity_id)
LEFT JOIN
civicrm_worldregion ON civicrm_country.region_id = civicrm_worldregion.id
LEFT JOIN
civicrm_group_contact `civicrm_group_contact-5e9e3db3d6d9a` ON (contact_a.id = `civicrm_group_contact-5e9e3db3d6d9a`.contact_id
AND `civicrm_group_contact-5e9e3db3d6d9a`.status IN ('Added'))
LEFT JOIN
civicrm_tag ON civicrm_entity_tag.tag_id = civicrm_tag.id
LEFT JOIN
civicrm_website ON contact_a.id = civicrm_website.contact_id
WHERE
(((`civicrm_group_contact-5e9e3db3d6d9a`.group_id IN ('581'))))
AND (contact_a.is_deleted = 0)
AND contact_a.is_deleted != 1
AND contact_a.id IN (2,1016)
GROUP BY contact_a.id
LIMIT 0 , 100000;
Query2: Slow query log and SQL
# Thread_id: 53 Schema: drupal QC_hit: No
# Query_time: 0.382351 Lock_time: 0.000376 Rows_sent: 2 Rows_examined: 138786
# Rows_affected: 0 Bytes_sent: 12525
# Tmp_tables: 8 Tmp_disk_tables: 5 Tmp_table_sizes: 30154752
# Full_scan: No Full_join: Yes Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
SELECT
contact_a.id AS contact_id,
contact_a.contact_type AS `contact_type`,
contact_a.contact_sub_type AS `contact_sub_type`,
contact_a.sort_name AS `sort_name`,
contact_a.display_name AS `display_name`,
contact_a.do_not_email AS `do_not_email`,
contact_a.do_not_phone AS `do_not_phone`,
contact_a.do_not_mail AS `do_not_mail`,
contact_a.do_not_sms AS `do_not_sms`,
contact_a.do_not_trade AS `do_not_trade`,
contact_a.is_opt_out AS `is_opt_out`,
contact_a.legal_identifier AS `legal_identifier`,
contact_a.external_identifier AS `external_identifier`,
contact_a.nick_name AS `nick_name`,
contact_a.legal_name AS `legal_name`,
contact_a.image_URL AS `image_URL`,
contact_a.preferred_communication_method AS `preferred_communication_method`,
contact_a.preferred_language AS `preferred_language`,
contact_a.preferred_mail_format AS `preferred_mail_format`,
contact_a.hash AS `hash`,
contact_a.source AS `contact_source`,
contact_a.first_name AS `first_name`,
contact_a.middle_name AS `middle_name`,
contact_a.last_name AS `last_name`,
contact_a.prefix_id AS `prefix_id`,
contact_a.suffix_id AS `suffix_id`,
contact_a.formal_title AS `formal_title`,
contact_a.communication_style_id AS `communication_style_id`,
contact_a.email_greeting_id AS email_greeting_id,
contact_a.postal_greeting_id AS postal_greeting_id,
contact_a.addressee_id AS addressee_id,
contact_a.job_title AS `job_title`,
contact_a.gender_id AS `gender_id`,
contact_a.birth_date AS `birth_date`,
contact_a.is_deceased AS `is_deceased`,
contact_a.deceased_date AS `deceased_date`,
contact_a.household_name AS `household_name`,
IF(contact_a.contact_type = 'Individual',
NULL,
contact_a.organization_name) AS organization_name,
contact_a.sic_code AS `sic_code`,
contact_a.user_unique_id AS `user_unique_id`,
contact_a.employer_id AS `current_employer_id`,
contact_a.is_deleted AS `contact_is_deleted`,
contact_a.created_date AS `created_date`,
contact_a.modified_date AS `modified_date`,
contact_a.addressee_display AS addressee_display,
contact_a.addressee_custom AS addressee_custom,
contact_a.email_greeting_display AS email_greeting_display,
contact_a.email_greeting_custom AS email_greeting_custom,
contact_a.postal_greeting_display AS postal_greeting_display,
contact_a.postal_greeting_custom AS postal_greeting_custom,
IF(contact_a.contact_type = 'Individual',
contact_a.organization_name,
NULL) AS current_employer,
civicrm_address.id AS address_id,
civicrm_location_type.id AS location_type_id,
civicrm_location_type.name AS `location_type`,
civicrm_address.street_address AS `street_address`,
civicrm_address.street_number AS `street_number`,
civicrm_address.street_number_suffix AS `street_number_suffix`,
civicrm_address.street_name AS `street_name`,
civicrm_address.street_unit AS `street_unit`,
civicrm_address.supplemental_address_1 AS `supplemental_address_1`,
civicrm_address.supplemental_address_2 AS `supplemental_address_2`,
civicrm_address.supplemental_address_3 AS `supplemental_address_3`,
civicrm_address.city AS `city`,
civicrm_address.postal_code_suffix AS `postal_code_suffix`,
civicrm_address.postal_code AS `postal_code`,
civicrm_address.geo_code_1 AS `geo_code_1`,
civicrm_address.geo_code_2 AS `geo_code_2`,
civicrm_address.manual_geo_code AS `manual_geo_code`,
civicrm_address.name AS `address_name`,
civicrm_address.master_id AS `master_id`,
civicrm_address.county_id AS county_id,
civicrm_address.state_province_id AS state_province_id,
civicrm_address.country_id AS country_id,
civicrm_phone.id AS phone_id,
civicrm_phone.phone_type_id AS `phone_type_id`,
civicrm_phone.phone AS `phone`,
civicrm_phone.phone_ext AS `phone_ext`,
civicrm_email.id AS email_id,
civicrm_email.email AS `email`,
civicrm_email.on_hold AS `on_hold`,
civicrm_email.is_bulkmail AS `is_bulkmail`,
civicrm_email.signature_text AS `signature_text`,
civicrm_email.signature_html AS `signature_html`,
civicrm_im.id AS im_id,
civicrm_im.provider_id AS `im_provider`,
civicrm_im.provider_id AS provider_id,
civicrm_im.name AS `im`,
civicrm_openid.id AS openid_id,
civicrm_openid.openid AS `openid`,
civicrm_worldregion.id AS worldregion_id,
civicrm_worldregion.name AS `world_region`,
civicrm_website.id AS website_id,
civicrm_website.url AS `url`,
/*
CONCAT_WS(',',
GROUP_CONCAT(DISTINCT IF(civicrm_group_contact.status = 'Added', civicrm_group_contact.group_id, '')),
GROUP_CONCAT(DISTINCT civicrm_group_contact_cache.group_id)) AS `groups`,
*/
CONCAT_WS(',', sel_group_list.group_list, sel_smartgroup_list.group_list) `groups`,
#GROUP_CONCAT(DISTINCT (civicrm_tag.name)) AS tags,
tags,
GROUP_CONCAT(DISTINCT (civicrm_note.note)) AS notes
FROM
civicrm_contact contact_a
LEFT JOIN
civicrm_address ON (contact_a.id = civicrm_address.contact_id)
LEFT JOIN
civicrm_country ON (civicrm_address.country_id = civicrm_country.id)
LEFT JOIN
civicrm_email ON (contact_a.id = civicrm_email.contact_id)
LEFT JOIN
civicrm_phone ON (contact_a.id = civicrm_phone.contact_id)
LEFT JOIN
civicrm_im ON (contact_a.id = civicrm_im.contact_id)
LEFT JOIN
civicrm_openid ON (civicrm_openid.contact_id = contact_a.id)
LEFT JOIN
civicrm_location_type ON civicrm_address.location_type_id = civicrm_location_type.id
LEFT JOIN
civicrm_group_contact ON contact_a.id = civicrm_group_contact.contact_id
#LEFT JOIN civicrm_group_contact_cache ON contact_a.id = civicrm_group_contact_cache.contact_id
#LEFT JOIN civicrm_entity_tag ON (civicrm_entity_tag.entity_table = 'civicrm_contact' AND civicrm_entity_tag.entity_id = contact_a.id)
LEFT JOIN civicrm_note ON (civicrm_note.entity_table = 'civicrm_contact' AND contact_a.id = civicrm_note.entity_id)
LEFT JOIN civicrm_worldregion ON civicrm_country.region_id = civicrm_worldregion.id
#LEFT JOIN civicrm_group_contact `civicrm_group_contact-5e9e3db3d6d9a` ON (contact_a.id = `civicrm_group_contact-5e9e3db3d6d9a`.contact_id AND `civicrm_group_contact-5e9e3db3d6d9a`.status IN ('Added'))
#LEFT JOIN civicrm_tag ON civicrm_entity_tag.tag_id = civicrm_tag.id
LEFT JOIN (
SELECT contact_id, CONCAT_WS(',', GROUP_CONCAT(cgc.group_id)) group_list FROM civicrm_group_contact cgc
LEFT JOIN civicrm_contact contact_a
ON contact_a.id = cgc.contact_id AND cgc.status = "ADDED"
WHERE contact_a.id in (2,1016)
GROUP BY contact_a.id
) AS sel_group_list ON contact_a.id = sel_group_list.contact_id
LEFT JOIN (
SELECT contact_id, CONCAT_WS(',', GROUP_CONCAT(cgcc.group_id)) group_list FROM civicrm_group_contact_cache cgcc
LEFT JOIN civicrm_contact contact_a
ON contact_a.id = cgcc.contact_id
WHERE contact_a.id in (2,1016)
GROUP BY contact_a.id
) AS sel_smartgroup_list ON contact_a.id = sel_smartgroup_list.contact_id
LEFT JOIN (
SELECT contact_a.id, cet.entity_id, GROUP_CONCAT(DISTINCT(ct.name)) tags FROM civicrm_contact contact_a
LEFT JOIN civicrm_entity_tag cet ON contact_a.id = cet.entity_id
LEFT JOIN civicrm_tag ct ON cet.tag_id = ct.id
WHERE contact_a.id in (2,1016)
GROUP BY contact_a.id
) AS sel_tag_list ON contact_a.id = sel_tag_list.entity_id
LEFT JOIN civicrm_website ON contact_a.id = civicrm_website.contact_id
WHERE
#(((`civicrm_group_contact-5e9e3db3d6d9a`.group_id IN ('581')))) AND
(contact_a.is_deleted = 0)
AND contact_a.is_deleted != 1
AND contact_a.id IN (2,1016)
GROUP BY contact_a.id
LIMIT 0 , 100000;