Allow dedupe rules to be used in place of hard-coded matching rules
Overview
From a user's perspective, a Contact can have a many-to-many relationship with external and internal data, and the confidence behind these relationships varies. Although I work for a small non-profit, I always remind the staff that Google, Facebook, Salesforce, etc. all make their money by profiling users. That is exactly what we use CiviCRM for: keeping track of relationships and interactions with our constituents.
Current behaviour
CiviCRM currently has several ways of determining what it considers matching data.
First, there are UNIQUE database keys, e.g. contact_id, transaction_id, etc.
Second, there are hard-coded rules, e.g. matching a Drupal account's email to a CiviCRM contact's email, or matching on external_identifier.
Third, there are user-configurable rules for matching a contact during various operations, e.g. when someone makes a contribution, creates a CMS account or signs up for an event; when a staff member creates a record; when a staff member searches for and merges duplicates; or when importing external data.
Proposed behaviour
I believe the hard-coded rules should be migrated to use the Dedupe mechanism. There are a number of places where matching on either email or external_identifier is enforced, which limits those connections to a one-to-one relationship.
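As a rough illustration of what migrating a hard-coded rule could look like, the Python sketch below models a weighted rule group: each field carries a weight, and a candidate counts as a match when the summed weights of its matching fields reach a threshold. CiviCRM's own dedupe rule groups are likewise built from weighted fields and a threshold, but the `Rule`, `RuleGroup` and `find_matches` names here are hypothetical and are not CiviCRM's API. The point is that the hard-coded "email equals email" check becomes just one configurable rule group, and the lookup returns every candidate that passes rather than assuming a one-to-one match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One weighted field comparison (hypothetical model)."""
    field: str
    weight: int


@dataclass(frozen=True)
class RuleGroup:
    """Weighted rules plus the score needed to call two records a match."""
    rules: tuple[Rule, ...]
    threshold: int

    def score(self, incoming: dict, existing: dict) -> int:
        # Add the weight of every field whose non-empty values agree
        # (case-insensitive, ignoring surrounding whitespace).
        total = 0
        for rule in self.rules:
            a, b = incoming.get(rule.field), existing.get(rule.field)
            if a and b and str(a).strip().lower() == str(b).strip().lower():
                total += rule.weight
        return total

    def is_match(self, incoming: dict, existing: dict) -> bool:
        return self.score(incoming, existing) >= self.threshold


def find_matches(group: RuleGroup, incoming: dict, contacts: dict) -> list[int]:
    # Return every contact id that reaches the threshold, in insertion order,
    # instead of stopping at the first hit (no one-to-one assumption).
    return [cid for cid, record in contacts.items() if group.is_match(incoming, record)]
```

With `RuleGroup((Rule("email", 10),), threshold=10)` the sketch behaves like an email-only match; adding a `last_name` rule and adjusting weights makes it stricter without changing any code.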
I think the Dedupe mechanism should also be extended to other entity types, such as location records, so I don't end up with duplicate home addresses or phone numbers.
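A minimal sketch of what dedupe for location entities might do, assuming a hypothetical per-entity table of match fields (`MATCH_FIELDS`): values are normalized before comparison so that "12 Main St." and "12  main st" count as the same home address, and phone numbers are compared on their digits only. Which fields count, and how they are normalized, are assumptions for illustration; in the proposal they would come from user-configurable rules rather than a constant.

```python
import re

# Hypothetical per-entity rules: which fields make two records "the same".
MATCH_FIELDS = {
    "Address": ("location_type", "street_address", "city", "postal_code"),
    "Phone": ("location_type", "phone"),
}


def normalize(field: str, value) -> str:
    text = str(value or "")
    if field == "phone":
        return re.sub(r"\D", "", text)  # keep digits only
    # Lower-case, drop punctuation, collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_key(entity: str, record: dict) -> tuple:
    return tuple(normalize(f, record.get(f)) for f in MATCH_FIELDS[entity])


def add_if_new(entity: str, existing: list, record: dict) -> bool:
    """Append record unless an equivalent one already exists; report whether it was added."""
    key = match_key(entity, record)
    if any(match_key(entity, r) == key for r in existing):
        return False
    existing.append(record)
    return True
```

Because `location_type` is part of the key, the same street address can still be stored once as Home and once as Work.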
In the long run, I'd like to see the UNIQUE database keys replaced by a relationship that includes a confidence level. This could be as simple as keeping the UNIQUE database keys while an extension maintains a table holding the "confidence" of each relationship and even the source (or sources) of the data.
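One possible shape for such an extension table, sketched with Python's built-in `sqlite3` module: each row links a contact to an identifier in an external source together with a confidence between 0 and 1, so one contact can carry several external identifiers and one identifier can point at several contacts. The table and column names are hypothetical and not part of the CiviCRM schema; the upsert syntax assumes SQLite 3.24 or later.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contact (id INTEGER PRIMARY KEY, display_name TEXT);
        -- Hypothetical extension table: many-to-many links with confidence.
        CREATE TABLE contact_external_link (
            contact_id  INTEGER NOT NULL REFERENCES contact(id),
            source      TEXT    NOT NULL,  -- where the data came from
            external_id TEXT    NOT NULL,  -- identifier within that source
            confidence  REAL    NOT NULL CHECK (confidence BETWEEN 0 AND 1),
            PRIMARY KEY (contact_id, source, external_id)
        );
    """)
    return conn


def link(conn, contact_id, source, external_id, confidence):
    # Record a link; if it already exists, keep the higher confidence.
    conn.execute("""
        INSERT INTO contact_external_link (contact_id, source, external_id, confidence)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (contact_id, source, external_id)
        DO UPDATE SET confidence = MAX(confidence, excluded.confidence)
    """, (contact_id, source, external_id, confidence))


def contacts_for(conn, source, external_id, min_confidence=0.0):
    # All contacts linked to an external identifier, most confident first.
    return conn.execute("""
        SELECT contact_id, confidence FROM contact_external_link
        WHERE source = ? AND external_id = ? AND confidence >= ?
        ORDER BY confidence DESC, contact_id
    """, (source, external_id, min_confidence)).fetchall()
```

Multiple sources for the same relationship are simply multiple rows with different `source` values.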
Comments
As I finish writing this, I realize the bulk of this could be done in an external module, but only if there is a mechanism for an external module to determine what is or is not a match. The cleanup of the import code certainly addressed a number of these issues, so there is plenty of flexibility in what constitutes a match for a Contact. I'm using this to maintain a one-to-many relationship with external data instead of being limited to the one-to-one restriction imposed by external_identifier.
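The kind of mechanism described here could be as small as a registry of matcher callbacks that extensions contribute to, similar in spirit to a CiviCRM hook. In this sketch all names are hypothetical (this is not an existing CiviCRM hook): each matcher returns a confidence, or `None` when it has no opinion, and the core keeps the highest confidence reported.

```python
from typing import Callable, Optional

# Hypothetical hook signature: (entity name, incoming record, existing candidate)
# -> confidence in [0, 1], or None when the matcher has no opinion.
Matcher = Callable[[str, dict, dict], Optional[float]]
_matchers: list[Matcher] = []


def register_matcher(fn: Matcher) -> Matcher:
    """Let an extension contribute a matcher; usable as a decorator."""
    _matchers.append(fn)
    return fn


def match_confidence(entity: str, incoming: dict, candidate: dict) -> float:
    # Ask every registered matcher and keep the highest confidence reported;
    # 0.0 means no matcher considered the records related.
    scores = [s for fn in _matchers if (s := fn(entity, incoming, candidate)) is not None]
    return max(scores, default=0.0)


@register_matcher
def email_matcher(entity, incoming, candidate):
    # Example rule: the same email on a Contact is a fairly strong signal.
    if entity != "Contact":
        return None
    a, b = incoming.get("email"), candidate.get("email")
    return 0.8 if a and b and a.lower() == b.lower() else None


@register_matcher
def external_id_matcher(entity, incoming, candidate):
    # Example rule: a candidate may hold several external ids (one-to-many);
    # any of them matching is treated as certain.
    ext = incoming.get("external_id")
    return 1.0 if ext is not None and ext in candidate.get("external_ids", ()) else None
```

Because the core only combines scores, the same registry could serve Contributions or Activities once matchers for those entities are registered.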
And, as mentioned above, I can use hook_civicrm_import to keep track of my confidence in the imported data.
But what constitutes a match for entities other than Contacts, e.g. Contributions and Activities, is not configurable: it is hard-coded.