Support utf8mb4 so notes can save Emojis and other good things
(Joe is highjacking this bug report issue to deal with general support for utf8mb4 as suggested by @gharris at #392 (comment 8614))
As part of moving to support MySQL8, we should plan on better support for utf8mb4 charset.
This issue was originally about bug with emojis that will be solved by this. It needs planning and agreement on approach as well as eventual development.
Now is a good time to consider shifting from charset utf8 (which only supports 3 bytes and doesn't fully implement the UTF8 standard https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434) to the new default charset of utf8mb4, and at same time change the collation from utf8_unicode_ci to the new default of utf8mb4_0900_ai_ci. This will allow new emoticons as well as more languages and their characters to be properly stored in CiviCRM. My sense is that this is not difficult to code in GenCode. However, extensions that define fields in other collations and then compare them to core fields will likely need to be modified. Maybe an extension that changes the charset and collation of all core fields, and that hooks into the dynamic creation of fields for custom fields, is a way to start down this path.
---------- Original description below
Apparently Notes don't like emoji characters. Kind of funny that watching the logs via SSH the emoji renders, but it won't save. After only removing the emoji characters from the text, the note saved correctly.
Characters that were removed:
Here's a backtrace:
Aug 17 13:15:39 [info] $backTrace = #0 /basedir/civicrm/civicrm/CRM/Core/Error.php(232): CRM_Core_Error::backtrace("backTrace", TRUE)
#1 /basedir/civicrm/civicrm/packages/PEAR.php(921): CRM_Core_Error::handle(Object(DB_Error))
#2 /basedir/civicrm/civicrm/packages/DB.php(985): PEAR_Error->__construct("DB Error: unknown error", -1, 16, (Array:2), "INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...")
#3 /basedir/civicrm/civicrm/packages/PEAR.php(575): DB_Error->__construct(-1, 16, (Array:2), "INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...")
#4 /basedir/civicrm/civicrm/packages/PEAR.php(223): PEAR->_raiseError(Object(DB_mysqli), NULL, -1, 16, (Array:2), "INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...", "DB_Error", TRUE)
#5 /basedir/civicrm/civicrm/packages/DB/common.php(1907): PEAR->__call("raiseError", (Array:7))
#6 /basedir/civicrm/civicrm/packages/DB/mysqli.php(933): DB_common->raiseError(-1, NULL, NULL, "INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...", "1366 ** Incorrect string value: '\xF0\x9F\x98\xB3\xE2\x9D...' for column 'not...")
#7 /basedir/civicrm/civicrm/packages/DB/mysqli.php(403): DB_mysqli->mysqliRaiseError()
#8 /basedir/civicrm/civicrm/packages/DB/common.php(1216): DB_mysqli->simpleQuery("INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...")
#9 /basedir/civicrm/civicrm/packages/DB/DataObject.php(2415): DB_common->query("INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...")
#10 /basedir/civicrm/civicrm/packages/DB/DataObject.php(1040): DB_DataObject->_query("INSERT INTO civicrm_note (entity_table , entity_id , note , contact_id , modi...")
#11 /basedir/civicrm/civicrm/CRM/Core/DAO.php(571): DB_DataObject->insert()
#12 /basedir/civicrm/civicrm/CRM/Core/BAO/Note.php(173): CRM_Core_DAO->save()
#13 /basedir/civicrm/civicrm/CRM/Note/Form/Note.php(194): CRM_Core_BAO_Note::add((Array:26), (Array:0))
#14 /basedir/civicrm/civicrm/CRM/Core/Form.php(489): CRM_Note_Form_Note->postProcess()
#15 /basedir/civicrm/civicrm/CRM/Core/QuickForm/Action/Upload.php(169): CRM_Core_Form->mainProcess()
#16 /basedir/civicrm/civicrm/CRM/Core/QuickForm/Action/Upload.php(136): CRM_Core_QuickForm_Action_Upload->realPerform(Object(CRM_Note_Form_Note), "upload")
#17 /basedir/civicrm/civicrm/packages/HTML/QuickForm/Controller.php(203): CRM_Core_QuickForm_Action_Upload->perform(Object(CRM_Note_Form_Note), "upload")
#18 /basedir/civicrm/civicrm/packages/HTML/QuickForm/Page.php(103): HTML_QuickForm_Controller->handle(Object(CRM_Note_Form_Note), "upload")
#19 /basedir/civicrm/civicrm/CRM/Core/Controller.php(351): HTML_QuickForm_Page->handle("upload")
#20 /basedir/civicrm/civicrm/CRM/Contact/Page/View/Note.php(182): CRM_Core_Controller->run()
#21 /basedir/civicrm/civicrm/CRM/Contact/Page/View/Note.php(225): CRM_Contact_Page_View_Note->edit()
#22 /basedir/civicrm/civicrm/CRM/Core/Invoke.php(309): CRM_Contact_Page_View_Note->run((Array:4), NULL)
#23 /basedir/civicrm/civicrm/CRM/Core/Invoke.php(84): CRM_Core_Invoke::runItem((Array:14))
#24 /basedir/civicrm/civicrm/CRM/Core/Invoke.php(52): CRM_Core_Invoke::_invoke((Array:4))
#25 /basedir/civicrm/civicrm.php(1246): CRM_Core_Invoke::invoke((Array:4))
#26 /wp-base-dir/wp-includes/class-wp-hook.php(286): CiviCRM_For_WordPress->invoke("")
#27 /wp-base-dir/wp-includes/class-wp-hook.php(310): WP_Hook->apply_filters("", (Array:1))
#28 /wp-base-dir/wp-includes/plugin.php(453): WP_Hook->do_action((Array:1))
#29 /wp-base-dir/wp-admin/admin.php(224): do_action("toplevel_page_CiviCRM")
#30 {main}
Edit: Reordered emojis to match the way they appeared in original log file.
Summary of current state (edited Oct. 2, 2018):
Proposed ideas:
I. Create Extension for changing the way database definitions are handled in order to fit the LExIM approach
- Create blog post and extension developer heads up materials based on proposed changes
- Create new global variables that reside in civicrm.settings.php, thus creating new defaults for new installations
a. dbChar=utf8mb4 - To define the character set used by the installation
b. dbColl=utf8mb4_unicode_520_ci - To define the collation used by the installation
c. dbEngine=InnoDB - To define whether the table is innodb or myisam or whatever comes next - Define new global variables that can replace some of code (I believe that JoeMurray has expressed concerns with this step. If I've misread which concerns are being raised, please redirect.)
a. dbEngineCharColl - a concatenation that would end up looking like "ENGINE=InnoDB DEFAULT CHARACTER SET utf8 COLLATE utf8mb4_unicode_520_ci"
b. dbCharColl - another common concatenation that looks like it is reused multiple times throughout the code - During bootstrap, check if the new variables are set, and if they are empty, fill them with the current defaults of utf8 and utf8_unicode_ci, this keeps current installs from breaking on upgrade
- Replace all instances of database definitions with the appropriate variables (I think this is a set of five SED commands that I haven't built yet!)
II. Create an extension to upgrade existing installs to the new defaults, similar to Eileen's innodbtriggers extension, perhaps utilizing something similar to the script posted above, though it should probably utilize SQL commands directly, rather than a bash script
- Check current installed version of MySQL to see if utf8mb4_0900_ai_ci is available and act accordingly.
a. For MySQL 5.7 and below the current preferred collation seems to be utf8mb4_unicode_520_ci.
b. For MySQL 8.0 the current preferred collation seems to be utf8mb4_0900_ai_ci. - Convert existing database to the determined collation above
a. There's a bash script below to get started (#339 (comment 8461))
b. Is this done via PHP script or via MySQL commands?
c. Should this conversion be limited to only the core fields as to not break existing extensions? - Change the value of civicrm.settings.php dbChar and dbColl variables to match above via PHP script.
- Create a warning in System Status if my.cnf defaults do not match dbEngine, dbChar, and dbColl variables
- Create a hook into post event on creation of custom fields (https://docs.civicrm.org/dev/en/latest/hooks/hook_civicrm_post/) to change the charset and collation of tables and fields created to support newly defined custom fields.
III. Figure out what to do about other Extensions that have hard coded collation/charsets
- Create helpful documentation for converting code to the Database variable model
- Create template-able MySQL conversion scripts to be included in the update. Or can this be done with the hook?
IV. Rejoice in the sound of crickets instead of phone calls!