Add REGEXP/NOT REGEXP operators to API4
We have a few SearchKit use-cases that require regular expressions in where
, so it would be nice for API4 and SearchKit to support MySQL's REGEXP
operator. I have a working patch to add this feature here.
One detail I got stuck on is that for typical Civi deployments (i.e. using the default collation), MySQL's REGEXP
is case-insensitive, which is not what you would usually expect when dealing with regular expressions. In Search Builder, where we also also offer the REGEXP
operator, we've chosen to force case-sensitivity via BINARY
; I feel like while that might make sense for a user-facing component like Search Builder, it's not clear that this is the right default for a more low-level component like API4 - but you could make that argument for SearchKit, so it would be nice for API4 to at least support case-sensitivity when explicitly requested.
This could be done by making REGEXP
a ternary operator where the last operand is roughly equivalent to match_type
in MySQL's REGEXP_LIKE()
1. Queries could look like:
->addWhere('first_name', 'REGEXP', ['^A', 'c']) // case-sensitive
->addWhere('first_name', 'REGEXP', ['^A', 'i']) // case-insensitive
// optionally, also support the simpler notation
->addWhere('first_name', 'REGEXP', '^A']) // use defaults (i.e. rely on collation)
The above could be added after the fact in a backwards-compatible way, so adding the operator in the simpler version from my patch (and following the collation's case-sensitivity by default) would also be an option.
-
REGEXP_LIKE()
is only available in MySQL 8+, so initially we'd have to implement this usingBINARY
and only support thec
value for the last operand. Once our minimum requirements are bumped to MySQL 8 (in a few decades😬 ), we could switch the implementation over toREGEXP_LIKE()
and support other values.↩