Api support for deduping
I'm somewhat belatedly raising this as a way to track api support for deduping.
The goal is to facilitate a LeXIM approach to the dedupe screen. LeXIM stands for "leap by extension, iterate by month". The monthly iteration part is needed to expose the relevant core workings via the api so that an extension can leverage them. Generally this also includes improving test coverage and code cleanup. Ideally we will also migrate the core form to use api endpoints.
We've already added some api functions to support this. In the course of trying to leverage them via an angular extension I've hit a bunch of gaps. Many of the fixes are already merged into 5.18 or earlier. Note that the dedupetools extension is not necessarily going to be the 'preferred' dedupe form, but the apis needed for that extension are probably the same as would be needed for any other api-based interface.
Outstanding as of now are:
- Open up permissions for users with 'merge duplicate contacts'. Since the js api relies on api permissions, these need to be suitably open. https://github.com/civicrm/civicrm-core/pull/15187 https://github.com/civicrm/civicrm-core/pull/15188
- Allow refreshing the search results. This is similar to 'refresh duplicates' in the existing screen (if we merge this I'll add this functionality to the deduper). https://github.com/civicrm/civicrm-core/pull/15196
- Performance. For batch deduping, geocoding every address is a real problem. The addresses are already geocoded and we copy them in their entirety. I'm pretty sure my changes to better support extension geocoding accidentally caused geocoding to start happening. https://github.com/civicrm/civicrm-core/pull/15154
- Tangential cleanup. https://github.com/civicrm/civicrm-core/pull/15184 https://github.com/civicrm/civicrm-core/pull/15156
Still to do
- Support removing a row from the dedupe cache. The use case here is that someone goes through a set of dedupe results, takes action on a bunch and marks a bunch 'ask me later'. At the end, if they want to do a bulk action (e.g. mark the remaining ones as non-duplicates), that action will also apply to the ones they set aside. By facilitating deletion from a result set we allow them to act on it more precisely.
  We already have Dedupe.delete to remove a row, but ideally we would also support calculating the cacheKey. This might look like the Dedupe.delete api optionally accepting the params needed to calculate the cacheKey, or a new api function (Dedupe.uncache or similar).
- Enhance Dedupe.getstatistics to return a count regardless of whether a dedupe has been attempted. Currently, if you call Dedupe.getstatistics after a dedupe has run, it tells you how many were merged and how many were skipped. Before a dedupe is done, however, it does not return a count of the rows cached to be deduped.
- Add an api to support bulk marking of duplicates for a result set. There is a prototype for this in dedupetools but, unlike the other apis in this extension, it has not been upstreamed yet.
- Consider porting to apiv4, and think about which inputs and outputs we would change.
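
To make the first two to-do items more concrete, here is a rough TypeScript sketch of the behaviour being asked for. It is only an in-memory model, not CiviCRM code: `DedupeCriteria`, `calculateCacheKey`, `DedupeCache`, `uncache` and `count` are all hypothetical names, and the key format is invented for illustration. The point is that the caller passes the same params used for the search and the key is calculated for them, and that a row count is available before any merge has been attempted.

```typescript
// Hypothetical model only: none of these names exist in CiviCRM.
interface DedupeCriteria {
  ruleGroupId: number;
  groupId?: number;
  criteria?: Record<string, unknown>;
}

interface CachedPair {
  contactId1: number;
  contactId2: number;
}

// Assumption: the cache key is derived deterministically from the search
// params. The format here is made up; JSON.stringify is only stable for
// objects whose keys were inserted in the same order.
function calculateCacheKey(params: DedupeCriteria): string {
  const criteria = JSON.stringify(params.criteria ?? {});
  return `merge_${params.ruleGroupId}_${params.groupId ?? 0}_${criteria}`;
}

class DedupeCache {
  private rows = new Map<string, CachedPair[]>();

  // Store the result set for a given search.
  fill(params: DedupeCriteria, pairs: CachedPair[]): void {
    this.rows.set(calculateCacheKey(params), [...pairs]);
  }

  // Remove one pair from the result set. The caller supplies the search
  // params rather than a precomputed key. Returns true if a row was removed.
  uncache(params: DedupeCriteria, pair: CachedPair): boolean {
    const key = calculateCacheKey(params);
    const current = this.rows.get(key) ?? [];
    const remaining = current.filter(
      (p) => !(p.contactId1 === pair.contactId1 && p.contactId2 === pair.contactId2)
    );
    this.rows.set(key, remaining);
    return remaining.length < current.length;
  }

  // Number of rows still cached, available before any merge has run.
  count(params: DedupeCriteria): number {
    return (this.rows.get(calculateCacheKey(params)) ?? []).length;
  }
}
```

With something like this, a user could drop their 'ask me later' pairs via `uncache` and then run a bulk action on whatever `count` reports as remaining.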