Deadlocked queries cause an instant error for end-users, but are retried in other circumstances
We have a lot of smart groups, and consequently a lot of deadlocks. There's already code in Civi that detects deadlock exceptions and then retries the query a few times (https://github.com/civicrm/civicrm-packages/pull/197/files). This is really helpful: I have extra logging on this, and I can see that it prevents a lot of errors. But this code doesn't kick in for people directly using the UI; they get the 'unknown error' yellow-screen-of-death immediately. This can happen on exports, or when opening complex groups. I dug into this and sort-of know why it's happening:
- When a deadlock happens, an exception is thrown if $GLOBALS['_PEAR_default_error_options'] is set to 'exceptionHandler'. This exception triggers the code which retries the query.
- An exception is not thrown if $GLOBALS['_PEAR_default_error_options'] is set to 'handle'.
- $GLOBALS['_PEAR_default_error_options'] is set to 'exceptionHandler' when you generate smart groups via the API, and in most other circumstances.
- But $GLOBALS['_PEAR_default_error_options'] is set to 'handle' in the UI in general (?) - at least, it is on exports and when using 'Manage Groups'. So in this case no exception is raised. And if there's no exception, the try-catch blocks have nothing to work with and front-end users get 'unknown error' immediately.
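To make the difference between the two settings concrete, here is a minimal, language-neutral model in Python. It is not CiviCRM or PEAR code. Every name in it (DeadlockError, FatalErrorPage, FlakyQuery, execute_with_retry) is hypothetical, and the two callbacks only imitate the behaviour described in the list above: one turns the error into an exception the retry loop can catch, the other reports the error to the user straight away.

```python
class DeadlockError(Exception):
    """Hypothetical stand-in for the exception the retry code catches."""


class FatalErrorPage(Exception):
    """Hypothetical stand-in for the 'unknown error' screen shown to the user."""


def exception_handler(message):
    # Models the 'exceptionHandler' setting: the error becomes an exception.
    raise DeadlockError(message)


def handle(message):
    # Models the 'handle' setting: the error is reported to the user at once,
    # so nothing the retry loop knows how to catch is ever raised.
    raise FatalErrorPage(message)


class FlakyQuery:
    """A fake query that deadlocks a fixed number of times, then succeeds."""

    def __init__(self, deadlocks, error_callback):
        self.deadlocks = deadlocks
        self.error_callback = error_callback
        self.calls = 0

    def execute(self):
        self.calls += 1
        if self.calls <= self.deadlocks:
            self.error_callback("deadlock found when trying to get lock")
        return "ok"


def execute_with_retry(query, max_attempts=3):
    # Retry only on DeadlockError; any other error propagates immediately.
    for attempt in range(1, max_attempts + 1):
        try:
            return query.execute()
        except DeadlockError:
            if attempt == max_attempts:
                raise
```

In this model, a query that deadlocks twice succeeds on the third attempt when the callback is exception_handler. With handle, the same query fails on the first attempt and the retry loop never gets a second try, which matches what front-end users are seeing.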
This is WP and Civi 5.3.1. We've done / commissioned a lot of work to optimize smart groups to ease the strain on the server in general. But we're always going to have some dynamic smart groups, so it'd be nice to get around this one if possible.
I am happy to fund a fix for this, but I'm not sure what I'm getting myself into. Is this a bug, or is it a consequence of how error handling needs to work for end users?