Default dedupe rules make no sense
I was asked by the CiviCamp Hartford organizers to present on deduping, so I've been looking closely at the dedupe functionality from a beginner's perspective. I have concerns!
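Primarily: the default Unsupervised rule is looser than the Supervised rule, and the General rule is so specific as to be almost useless. The first point seems backwards. A Supervised match is reviewed by a person, so it can afford to cast a wider net than a rule that is applied with no one checking.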
The default Unsupervised rule is "email only", which seems sensible. The default Supervised rule is "first name AND last name AND email". I believe a more sensible Supervised default is "(first name AND last name) OR email".
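To make the comparison concrete, here is a minimal sketch of the three rules as boolean logic. It assumes a weight-and-threshold encoding loosely modeled on how CiviCRM rules assign each field a weight and compare the sum against a threshold. The `Contact` class, `field_matches`, `is_duplicate`, and the specific weights are hypothetical illustrations, not CiviCRM's actual code. A missing (`None`) value never counts as a match, mirroring how the NULL-email case behaves in the current rule.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Contact:
    first_name: Optional[str]
    last_name: Optional[str]
    email: Optional[str]


# Hypothetical rule encoding: field -> weight, plus a threshold the summed weights must reach.
Rule = Tuple[Dict[str, int], int]


def field_matches(a: Contact, b: Contact, field: str) -> bool:
    """A field contributes only when both values are present and identical,
    so a missing (None) value never matches, like SQL NULL equality."""
    va, vb = getattr(a, field), getattr(b, field)
    return va is not None and vb is not None and va == vb


def is_duplicate(a: Contact, b: Contact, rule: Rule) -> bool:
    weights, threshold = rule
    score = sum(w for field, w in weights.items() if field_matches(a, b, field))
    return score >= threshold


# Hypothetical encodings of the rules discussed in this issue.
UNSUPERVISED: Rule = ({"email": 1}, 1)                                 # email only
SUPERVISED: Rule = ({"first_name": 1, "last_name": 1, "email": 1}, 3)  # first AND last AND email
PROPOSED: Rule = ({"first_name": 1, "last_name": 1, "email": 2}, 2)    # (first AND last) OR email

jane1 = Contact("Jane", "Doe", "jane@example.org")
jane2 = Contact("Jane", "Doe", None)                  # same name, no email
other = Contact("John", "Smith", "jane@example.org")  # shared email, different name

for label, rule in [("Unsupervised", UNSUPERVISED), ("Supervised", SUPERVISED), ("Proposed", PROPOSED)]:
    print(label, is_duplicate(jane1, jane2, rule), is_duplicate(jane1, other, rule))
```

Run as written, the Supervised rule flags neither pair, the Unsupervised rule flags only the shared-email pair, and the proposed rule flags both. In this encoding, giving email a weight equal to the threshold is what turns "AND" into "OR": email alone is enough, while first name or last name alone is not.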
If I get some buy-in on the idea, I have a PR to submit. However, the current Supervised rule has a bug: it treats two contacts with identical names but a NULL email as non-duplicates. Fixing that bug slows the query down by two orders of magnitude, and so does my proposed alternative rule.
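One plausible way the NULL-email miss arises, assuming the rule is evaluated as an equality self-join: SQL's three-valued logic makes `NULL = NULL` unknown rather than true, so the pair is silently dropped. The sketch below uses a simplified, hypothetical single-table schema in SQLite (CiviCRM's real schema and queries differ) and compares plain `=` with a NULL-safe comparison (`IS` in SQLite; MySQL spells it `<=>`). It illustrates the semantics only and says nothing about the performance cost reported above.

```python
import sqlite3

# Hypothetical single-table schema; CiviCRM's real schema is more involved.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO contact (first_name, last_name, email) VALUES (?, ?, ?)",
    [("Jane", "Doe", None), ("Jane", "Doe", None)],
)

# Self-join on first name AND last name AND email; {op} is the email comparison.
PAIR_QUERY = """
    SELECT t1.id, t2.id
      FROM contact t1
      JOIN contact t2
        ON t1.id < t2.id
       AND t1.first_name = t2.first_name
       AND t1.last_name = t2.last_name
       AND t1.email {op} t2.email
"""

# Plain equality: NULL = NULL evaluates to NULL (unknown), so the pair is dropped.
strict = conn.execute(PAIR_QUERY.format(op="=")).fetchall()

# NULL-safe equality: SQLite's IS treats two NULLs as equal, so the pair is kept.
null_safe = conn.execute(PAIR_QUERY.format(op="IS")).fetchall()

print("strict:", strict)        # strict: []
print("null-safe:", null_safe)  # null-safe: [(1, 2)]
```

A NULL-safe comparison is only one possible fix. Applied to an email-only rule, it would pair every two contacts that both lack an email, which is presumably not wanted there.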