Canonize API for storing translated data
https://lab.civicrm.org/dev/translation/-/issues/67 · totten · 2021-08-12T19:25:30Z

# Goal
Enable richer user experiences which incorporate data-translation. Specifically, provide a CRUD API for administrative applications that need to read/write alternate versions of a string in the database.
# Background
* This is most immediately motivated by https://lab.civicrm.org/dev/mail/-/issues/83, which aims to improve the process+experience of drafting+testing workflow templates. For this case, the string that is being edited (i.e. `civicrm_msg_template.msg_html`) is a relatively rich piece of content (with HTML tags, tokens, Smarty expressions - which in turn may vary based on the context for which the template will be used). The richness of the text implies that one should have more features available (token-pickers, syntax-highlighting, ad nauseam). Editing a translation of this content in a generic textbox (as with multilingual UI, Transifex UI, or POEdit) would be difficult and error-prone.
* This is intended as a step in support of https://lab.civicrm.org/community/feature-request/-/issues/26, which is a broad effort (initiated by @ayduns @BjoernE) to re-conceive how the multilingual subsystem works. TLDR: The current multilingual subsystem requires significant MySQL schema manipulation. This works for 1-3 languages but does not scale to 10 languages. Resolving it requires changes in the storage/lifecycle of translated data.
* Inspired by this discussion, Eileen wrote a proof-of-concept extension https://github.com/eileenmcnaughton/civi-data-translate. The scope of `civi-data-translate` mostly matches the scope of this filing, but not quite perfectly. It matches insofar as it introduces an APIv4 interface and a MySQL table for strings. It diverges insofar as it specifically touches on `MessageTemplate`. (The work for `MessageTemplate` is left as a separate matter.) Its biggest obstacle is dependency-hell: it requires a skilled administrator to maintain a deployment, which disincentivizes development and usage.
# Approaches
Working within the limits of available code and capacity, it appears feasible to adapt `civi-data-translate` to this purpose. Either:
1. Move its APIv4 interface and data-storage to core-proper, or...
2. Move its APIv4 interface and data-storage to core-extension.
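
To make "APIv4 interface and data-storage" a little more concrete, here is a minimal sketch of the kind of CRUD surface being discussed. It is not the `civi-data-translate` schema or the APIv4 signature: the table name, column names, and the `TranslationStore` class are all hypothetical, Python stands in for PHP, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3


class TranslationStore:
    """Hypothetical CRUD wrapper around one table of translated strings."""

    def __init__(self, conn):
        self.conn = conn
        # One row per (record, field, language). All names are assumptions.
        conn.execute(
            """CREATE TABLE IF NOT EXISTS translation (
                   entity_table TEXT NOT NULL,
                   entity_field TEXT NOT NULL,
                   entity_id    INTEGER NOT NULL,
                   language     TEXT NOT NULL,
                   string       TEXT NOT NULL,
                   PRIMARY KEY (entity_table, entity_field, entity_id, language)
               )"""
        )

    def save(self, table, field, entity_id, language, string):
        """Create the translated value, or overwrite it if it already exists."""
        self.conn.execute(
            "INSERT OR REPLACE INTO translation VALUES (?, ?, ?, ?, ?)",
            (table, field, entity_id, language, string),
        )

    def get(self, table, field, entity_id, language):
        """Return the translated value, or None if there is no translation."""
        row = self.conn.execute(
            "SELECT string FROM translation WHERE entity_table = ? "
            "AND entity_field = ? AND entity_id = ? AND language = ?",
            (table, field, entity_id, language),
        ).fetchone()
        return row[0] if row else None

    def delete(self, table, field, entity_id, language):
        """Remove one translated value; a no-op if it does not exist."""
        self.conn.execute(
            "DELETE FROM translation WHERE entity_table = ? "
            "AND entity_field = ? AND entity_id = ? AND language = ?",
            (table, field, entity_id, language),
        )


store = TranslationStore(sqlite3.connect(":memory:"))
store.save("civicrm_msg_template", "msg_html", 1, "fr_FR", "<p>Merci</p>")
print(store.get("civicrm_msg_template", "msg_html", 1, "fr_FR"))
```

Keying rows by table, field, record, and language means that adding a language adds rows rather than columns, which is the property the schema-manipulation concern in the Background is about. Either approach above (core-proper or core-extension) could expose the same surface.
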
# Comments
* Having an API to edit the strings would be meaningless if we did not have a data-store.
* There is a performance question about using MySQL for a string table. (Most FOSS applications use `gettext` MO files which are optimized for fast lookup of static strings. This is how Civi handles translation of its numerous app-strings.) In prior discussions with @BjoernE @ayduns et al., we identified this balance:
    * There is a difference between *administration* (browsing/editing strings) and *runtime lookup* (substituting 1000 strings during a page-load).
    * For administration, there is no question about whether the performance of a MySQL string-table would be acceptable. It would be. In fact, many different tools/workflows/stores can be acceptable.
    * The performance question is relevant to *runtime lookup of heavily used strings*. The performance question is not necessarily closed, and it depends on other variables (*the number of data-strings, the use-case, the hardware, etc*).
    * If one does need to optimize lookup, the best known approach is to compile to gettext. To wit: Read strings from whatever source is handy, aggregate them, and [write them](https://github.com/pear/File_Gettext/blob/master/File/Gettext/MO.php) to a cache folder in `*.mo` format. (You can see [de.systopia.l10nmo](https://github.com/systopia/de.systopia.l10nmo) as a foray into this approach of blending/merging string sources.)
* I was worried about proposing this - specifically, worried that it might conflict with a more optimized dataflow. However, on reflection, I think it is complementary progress. Suppose you wanted to patch `l10nmo` to include a feed of strings provided by web-based administrators. If each web UI stored strings differently, then you'd probably give up. But if they use the same (shared/documented) string API, then it's easier to pull from there.
* (*I mention this as a hypothetical. In practice, some things like dev/mail#83 can be achieved without this level of optimization. The upshot is that we can bite off a chunk of work here on the API/storage side and make some incremental progress.*)
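
The "compile to gettext" idea from the comments above can be sketched roughly as follows. This is an illustration, not how `l10nmo` or `File_Gettext` work internally: `merge_sources` and `compile_mo` are hypothetical helpers, the string sources are plain dicts standing in for "whatever source is handy", and the MO bytes are kept in memory rather than written to a cache folder. Python is used only because its standard `gettext` module can read the result back.

```python
import gettext
import io
import struct


def merge_sources(*sources):
    """Aggregate string sources; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


def compile_mo(messages):
    """Serialize {msgid: msgstr} to GNU MO bytes (little-endian, no hash table)."""
    catalog = dict(messages)
    # The empty msgid carries metadata; readers take the charset from it.
    catalog.setdefault("", "Content-Type: text/plain; charset=UTF-8\n")
    # The MO format expects the original strings in sorted order.
    entries = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items())

    count = len(entries)
    header_size = 7 * 4
    ids_start = header_size + 2 * count * 8  # string data follows both index tables
    ids_blob = b""
    strs_blob = b""
    id_index = []
    str_index = []
    for msgid, msgstr in entries:
        id_index.append((len(msgid), len(ids_blob)))
        str_index.append((len(msgstr), len(strs_blob)))
        ids_blob += msgid + b"\0"
        strs_blob += msgstr + b"\0"
    strs_start = ids_start + len(ids_blob)

    # Header: magic, version, count, offset of each index table, empty hash table.
    out = struct.pack("<7I", 0x950412DE, 0, count,
                      header_size, header_size + count * 8, 0, 0)
    for length, offset in id_index:
        out += struct.pack("<2I", length, ids_start + offset)
    for length, offset in str_index:
        out += struct.pack("<2I", length, strs_start + offset)
    return out + ids_blob + strs_blob


# Two hypothetical feeds: packaged strings, plus edits saved by web-based admins.
packaged = {"Thank you": "Merci", "Receipt": "Reçu"}
admin_edits = {"Thank you": "Merci beaucoup"}

mo_bytes = compile_mo(merge_sources(packaged, admin_edits))
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(catalog.gettext("Thank you"))
```

The point of the sketch is the split described above: administration can read and write any convenient store, and only the compile step needs to know where strings come from. If every web UI writes through the same string API, that API becomes one more argument to `merge_sources`.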