Multi-lingual for multinational/global-scale
(This issue is not a singular problem with a singular fix. Think of it more as a long-term epic or general problem statement - it may entail several different approaches, issues, extensions, patches, phases, or experiments. My hope is to capture some discussion from sprints+calls over the years, esp points from @ayduns @BjoernE @bgm @eileen @ejegg et al)
Background
Internationalization (i18n) is the process of adapting a software system to support different languages and locales. For some organizations, their i18n needs are met by flagging one language for the entire system (e.g. an English organization uses the English language; a French organization uses the French language).
Other organizations require a more advanced form of i18n called multilingual -- this means that one organization offers its business-programs (eg events, conferences, newsletters, donation pages) in multiple languages concurrently. There are a few contexts in which multilingual makes sense:
- The organization targets a single region with multiple endemic languages. For example, in Canada, you may have an organization serving significant populations of French and English speakers. Similarly in Belgium (Flemish/French), southwestern US (English/Spanish), Switzerland (Swiss German/Swiss French/Swiss Italian), and so on. As a rule-of-thumb, few locales would have more than 3 endemic languages.
- The organization targets a large number of countries with diverse languages. For example, a pan-European organization might have 8 languages, and a global organization might have 30 languages.
When adapting business software to a multilingual organization, one might take an informal approach or a formal approach, by which I mean:
- Informal translation (duplicates): The software remains effectively unchanged. If the user needs more languages, then he simply adds more records. Thus, if the user wants to make a bilingual newsletter, then he actually makes two distinct Mailings (one per language). For a bilingual fundraising campaign, he makes two ContributionPages. (This requires less upfront technical work, but it also requires on-going training. Fine-tuning workflows+reports may be tricky.)
- Informal translation (in-line): As above, the software remains effectively unchanged. If the user needs more languages, then he just does everything in-situ. Thus, a bilingual newsletter has one "Subject" with two languages (Subject: Hello world / Bonjour tout le monde). (This again requires less upfront technical work, but it's progressively uglier as you add each new language. Fine-tuning workflows+reports may still be tricky.)
- Formal translation: The software is updated to allow translation of different records. Thus, a bilingual newsletter is one Mailing with two different Subjects. A bilingual fundraising campaign is one ContributionPage with two descriptions. (This requires more upfront technical work, but it allows more tuning of the workflows and reports.)
There is some documentation about CiviCRM and i18n, eg
- https://docs.civicrm.org/user/en/latest/the-civicrm-community/localising-civicrm/
- https://docs.civicrm.org/dev/en/latest/translation/
Problem/Goal (General)
In its default/basic mode, CiviCRM supports a single language (which can also be used for informal translation). Additionally, the Civi administrator can enable formal multilingual. The formal mode is well suited to some organizations and challenging for others.
The next section will summarize some specific technical issues, and the needs obviously vary case-by-case, but (broadly speaking) I found it helpful to consider two types of multilingual organizations:
- (Loosely) Bilingual orgs: For regional organizations supporting 2-3 endemic languages (loosely - "bilingual" orgs), Civi's formal multilingual is "pretty good"; it's often "optimal"; and (even at its worst) it is at least "acceptable".
- (Loosely) Multinational orgs: For far-flung organizations supporting 8+ languages (loosely - "multinational" or "global" or "pan-continental"), the design is... less optimal.
Problems (More specific)
- MySQL Columns: For every language, Civi's MySQL schema replicates ~100 DB columns in ~25 tables. (For 8 languages, that would be a total of ~800 columns.) This can max out some hard limits in MySQL.
- Roles: For an organization supporting 1-3 endemic languages, there are many staff who speak each of the languages, and (e.g.) it's realistic to say that the event-manager role is responsible for translating the event-description. However, with 10+ languages, it is not realistic -- so you may create more specialized roles that don't exist in a unilingual/bilingual organization. For example, one person may set the initial terms of the ContributionPage, and 9 other people translate the description. More nuanced roles mean you also need to think about:
  - Workflows: If your role is "Italian translator", then you don't want to manually fish through all the Events and ContributionPages to see which ones need translation. The workflow should draw attention to the things that need you.
  - Permissions: If your role is "Italian translator", then you certainly need permission to edit the Italian "Description" of an Event... but maybe you shouldn't have permission to change the registration-deadlines, the price-structure, or the Spanish translation.
- Negotiation/Fallback/Sparsity: With a larger number of languages, it takes a lot of labor to fill every translatable field on every record in every locale in a timely way. There will be gaps in the translation matrix -- whether purposeful ("it's not worthwhile to translate this one between en_US and en_CA") or incidental ("the translator hasn't gotten to this yet"). With sparser translations, the fallback/language-negotiation becomes more important. (Ex: If en_US is missing, then fall back to en_CA or en_GB. If fr_CA is missing, then fall back to fr_FR.)
- Entities/Fields: The multilingual translation support is targeted at specific entities+fields. However, the value of formally translating any specific entity or field may be assessed differently. Consider two opposing examples:
  - ContributionPage: For a regional organization with bilingual constituents, you might commit to formally translating every ContributionPage. In a global organization with different countries, you might find that each country has so many differences (language, pricing, taxes, etc) that you prefer to create a new ContributionPage for each.... so you never use formal translation of ContributionPages. (Or, if we're really exacting, you might have one ContributionPage for each country... but within bilingual countries, you'd want different translations of the page!)
  - MessageTemplate: For a regional organization with bilingual constituents, you may decide it's preferable to send bilingual notifications (ie send a receipt with both English+French text), so you don't need formal translation of MessageTemplates. However, in a global organization, it's crazy to put 10 languages into 1 receipt -- instead, you should formally distinguish the translations for each language.
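To make the negotiation/fallback point above concrete, here is a minimal sketch in Python (not Civi code). The `FALLBACKS` table and the `negotiate()` helper are hypothetical names invented for illustration; the chains simply mirror the en_US and fr_CA examples given in the list.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical fallback chains, mirroring the examples in the text.
FALLBACKS: Mapping[str, Sequence[str]] = {
    "en_US": ("en_CA", "en_GB"),
    "fr_CA": ("fr_FR",),
}


def negotiate(
    translations: Mapping[str, str],
    locale: str,
    default_locale: Optional[str] = None,
) -> Optional[str]:
    """Return the best available string for `locale`, or None if nothing fits."""
    # Try the requested locale first, then its fallbacks, then the site default.
    candidates = [locale, *FALLBACKS.get(locale, ())]
    if default_locale is not None:
        candidates.append(default_locale)
    for candidate in candidates:
        text = translations.get(candidate)
        if text:  # an empty string counts as "not translated yet"
            return text
    return None


# A sparse translation matrix for one field of one record.
subject = {"en_CA": "Hello world", "fr_FR": "Bonjour tout le monde"}

print(negotiate(subject, "en_US"))  # served from en_CA
print(negotiate(subject, "fr_CA"))  # served from fr_FR
print(negotiate(subject, "de_DE"))  # no chain and no default: None
```

Whether the chains should be global, per-entity, or per-field is an open design question; the sketch assumes one global table only to keep it short.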
Considerations
For purposes of this issue, I would say that support is "complete" when it is possible (by a mix of extensions/patches) to configure the MySQL schema/roles/workflows/permissions/fallbacks/entities/fields in a way that satisfies (a) a regional/bilingual organization and (b) a multinational/global organization.
However, this still leaves considerable room for interpretation/variation. Ask yourself: is it true that multilingual users fall in two distinct buckets (bilingual vs multinational)? Or are those idealized extremes, with most organizations taking some place in between?
- If it is truly two categories, then you might address this by implementing two separate subsystems:
  - The existing "bilingual" subsystem provides permissions/workflows/entities that are suitable for bilingual orgs.
  - A new "multinational/global" subsystem provides permissions/workflows/entities for multinational orgs.
- If there are many shades of grey, then you might address this by giving more subtlety to each aspect, eg
  - The data-storage layer might use localized columns, or a dedicated string table, or something else. (Perhaps you can swap the data-storage while keeping the rest.)
  - Different modules might provide different workflows/UIs - so one module allows editing translations in-situ (per-entity), while another module provides more centralized translations (per-language). (Perhaps these different workflows/UIs can be used separately - or perhaps they can coexist, giving alternate access to the same data.)
  - The list of localized entities/fields might be configurable.
  - The process of configuring l10n/i18n might be softened - instead of a system-level flag (configuring translation for everything), it could be more case-by-case/opportunistic -- where you can optionally "attach" n-ary translations to any given field.
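As one illustration of the "dedicated string table" and "attach" ideas, here is a rough sketch using Python's built-in sqlite3 module. The `translation` table, its columns, and the `attach()`/`pending()` helpers are all hypothetical (they are not an existing Civi schema or API), and the `intro_text` field name is used only as an example. Adding a language adds rows instead of columns, and a per-language "what still needs translating?" list becomes a simple query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical string table: one row per (record, field, locale).
conn.execute("""
    CREATE TABLE translation (
        entity    TEXT    NOT NULL,
        entity_id INTEGER NOT NULL,
        field     TEXT    NOT NULL,
        locale    TEXT    NOT NULL,
        string    TEXT    NOT NULL,
        PRIMARY KEY (entity, entity_id, field, locale)
    )
""")


def attach(entity, entity_id, field, locale, string):
    """Attach (or overwrite) one translation of one field on one record."""
    conn.execute(
        "INSERT OR REPLACE INTO translation VALUES (?, ?, ?, ?, ?)",
        (entity, entity_id, field, locale, string),
    )


def pending(locale, source_locale="en_US"):
    """List fields that exist in `source_locale` but not yet in `locale`."""
    rows = conn.execute(
        """
        SELECT src.entity, src.entity_id, src.field
        FROM translation AS src
        LEFT JOIN translation AS dst
          ON dst.entity = src.entity
         AND dst.entity_id = src.entity_id
         AND dst.field = src.field
         AND dst.locale = ?
        WHERE src.locale = ? AND dst.locale IS NULL
        ORDER BY src.entity, src.entity_id, src.field
        """,
        (locale, source_locale),
    )
    return rows.fetchall()


attach("Event", 1, "description", "en_US", "Annual conference")
attach("Event", 1, "description", "it_IT", "Conferenza annuale")
attach("ContributionPage", 7, "intro_text", "en_US", "Please donate")

# Work queue for a hypothetical "Italian translator" role.
print(pending("it_IT"))
```

The same table could back either an in-situ (per-entity) editor or a centralized (per-language) one; the sketch takes no position on which UI is preferable, and it ignores permissions entirely.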