Deduping with multiple fields doesn't work with more than 999 contacts using MariaDB 10.3+
After upgrading to MariaDB 10.3 or later, finding dupes with a rule than uses multiple fields creates queries that runs for hours when checking 1000 contacts, but works fine for 999 contacts.
Here are the steps to replicate:
- MariaDB 10.3+
- Create a dedupe rule with First Name and Last Name
- Use the rule on a group with 1000 or more contacts (or use the Deduper extension to limit to 1000 contacts)
- Result: query runs for a very long time
- Use the rule on a group with 999 or fewer contacts
- Result: results returned quickly as expected
Turns out this is due to a setting that was added to MariaDB in 10.3: In Predicate Conversion Threshold, which converts the long list of ids in the query that CiviCRM builds to a temp table when the length of the list of ids exceeds the setting, which is 1000 by default. Changing the setting to 0 fixes the issue by disabling the conversion.
Not clear why the temp table is so slow. I'm going to try creating temp tables with indexes explicitly to see if that helps. If not and nothing else suggests itself as a solution, will at least add some documentation unless others have thoughts on how to resolve this so it doesn't require adjusting settings for everyone using MariaDB.