Escape-on-Input => Escape-on-Output
Introduction
In a multi-tier application like CiviCRM, a piece of data (such as a first_name
) may be passed through several systems and formats (MySQL/PHP/Firefox/etc; SQL/HTML/URL/JSON/CSV/PDF/etc). In each system, this piece of data may be encoded in a different way. The differences are sometimes imperceptible to a casual reader (ex: "banana" looks the same in SQL, HTML, URL, JSON, and CSV), but mistakes and confusion about encoding are still problematic. The impact ranges from quirky (odd characters appearing occasionally in unexpected places) to malicious (security vulnerabilities).
CiviCRM has a long-standing technical debt with respect to encoding. This debt is not specifically a bug, feature, or vulnerability in itself. (If you know a specific piece of data that is misencoded, then that is often a security problem that should be reported to the security team for discrete resolution.) Rather, this debt is an unusual convention ("Escape-on-Input") which has unintuitive consequences and thereby invites bugs.
See also: Dev Guide: Sanitize inputs or outputs?
At the recent Barcelona sprint, there was a partial meeting of the security team, and attendees agreed that this was the root issue behind multiple problems and merited special attention. Unfortunately, this convention is deeply embedded. Changing it is difficult and risky, so it has lingered for several years without substantive improvement, and (with its current framing/visibility) would remain indefinitely. The aim of this filing is to have a public discussion/record which works out the what/how/why of a (possible) cleanup.
Definitions
- Escape-on-Input (aka Esc-In): An architectural style in which data is taken as input and immediately escaped; it is then written to disk in escaped (HTML-esque) format. Consequently, subsequent searches+reads yield HTML-esque strings. These can be outputted literally on HTML pages without any extra processing, but (for other media) they require double processing (decode/re-encode).
- Escape-on-Output (aka Esc-Out): An architectural style in which data is is taken as input and stored in its canonical format. When the data is presented as output, it is escaped in a way that matches the current output-format.
-
Canonical (String format): An encoding in which strings look... like they should. For example, if a user types
first_name
of:<
, then the string is:<
-
HTML-esque (String format): An encoding in which key HTML characters are escaped as entities. For example, canonical string
:<
becomes HTML-esque string:<
. In HTML-esque, one does not intend to convey HTML tags. -
O(1) Plan/Task: A plan/task in which the number of tasks is fixed. (Ex: To change behavior of all "quickforms", you might patch+test the base class
HTML_Quickform
one time.) It may strictly be 1 or 2 or 3 steps... but the number of steps does not depend on how many screens/use-cases are impacted. - O(n) Plan/Task: A plan/task in which the number of tasks depends on how many screens/use-cases are impacted. (Ex: To change behavior of all "quickforms", you might patch+test every subclass.)
- Smarty Filter/Autoescape: A Smarty filter allows one to programmatically change the behavior/content of all Smarty templates. For example, you can write a Smarty prefilter to automatically add HTML "escape" commands to all output.
-
Linter: A programmer tool which scans source code and checks for compliance. For example, a linter could have the rule "all Smarty variable include the
|escape
or|noescape
modifier."
Safe and unsafe situations
The status quo is peculiar. It is loosely safe in some common situations, but it's surprisingly unsafe in other situations. Consider this table:
Read/write data via SQL/DAO/BAO | Read/write data via APIv3/v4 | |
---|---|---|
Process inputs+outputs via HTML_Quickform+Smarty (HTML-only) | OK | Caution |
Process inputs+outputs via most frameworks (Drupal Form API, AngularJS, DOM API, PHPExcel, etc) | Caution | OK |
Interchange data between SQL/DAO/BAO and APIv3/v4 | Caution | Caution |
In "OK" situations, you can take a string and send it to the screen -- and there's a high probability it will end-up encoded correctly without any thinking/effort.
In "Caution" situations, you should treat most strings with suspicion -- many would merit extra encoding/decoding steps.
The general aim of this cleanup issue is to be "OK" on the entire grid.
Relevant background/reading
- (Jan 2010) CRM-5667: Introduction of "Escape-on-Input"
- (Dec 2012) svn log: Mitigation for APIv3 consumers
- (Dec 2012) CRM-11532: First filing about the problematic status-quo. (Shorter)
- (Apr 2018, et al) security/core#6: Second filing about the problematic status-quo. (Longer)
- (2018) Dev Guide: Sanitize inputs or outputs?: Reference material for developers
- (Oct 2019) POC: Auto-escaping in Smarty v2
- (Oct 2019) Report: Categorize DB fields by escaping implication
- (Oct 2019) Report: List of Smarty expressions
Candidate Approaches
-
Do Nothing:
- How: Kick off your shoes. Pull up Netflix.
- Pro: Less work
- Con: Encourages developers to write buggy code. Feels embarrassing whenever they realize this is a thing.
-
Epic w/Smarty Filter:
- Pitch: Smarty does all HTML-escaping automatically and by default.
-
How:
- Remove Esc-In policy (
HTMLInputCoder
) - Write a generic upgrade step to recode all stored strings.
- To restore output-coding, add a Smarty prefilter to auto-escape output.
- Remove Esc-In policy (
- Pro: Cleans up the database.
- Con: The main problem is Smarty templates are not homogeneous - most generate HTML, but some generate other formats. They often pass-through data from the database, but they also display data which is computed and/or loaded from a different source. Consequently, there is an O(n) task of re-testing all screens/fields and patching a large number of exceptions.
-
Con: Additionally, there's the
r-tech
issue - even if we correctly change every screen/field/line in the main application, there is an open-set of third-party integrations. If these use SQL/DAO/BAO and work correctly in the current system, then they'll likely break as soon as the epic change is made, which will create stress for upgrade-scheduling. - Con: Smarty is difficult to write unit tests for.
-
Epic w/DAO Filter:
- Pitch: DAO does all HTML-escaping automatically and by default.
-
How:
- Remove Esc-In policy (
HTMLInputCoder
) - Write a generic upgrade step to recode all stored strings.
- To restore output-coding, update DAO/BAO with an optional escaping mode. (Take the current skiplist - everything else defaults to autoescaping.)
- Remove Esc-In policy (
- Pro: Cleans up the database.
- Pro: Looks like an O(1) plan.
- Pro: Restoring output coding can be done at a single point - no need to fix existing templates.
- Pro: Output coding can be unit-tested.
-
Con: Only addresses DAO-based query-building (
$d=new DAO_Foo(); $d->bar=123; $d->fetch()
). Does not addressexecuteQuery('SELECT foo AS bar...')
,executeQuery('UPDATE foo SET bar...')
, or other query-building. This seems to be left as O(n) task? - Question: Do we get read/write consistency?
-
Wait by Extension Approach:
- Pitch: Grow new UIs on top of APIv3/v4. Obsolete everything all UIs that rely on DAO/BAO/SQL.
- How: Do nothing specifically for this problem. Instead, wait for extensions like Afform or Data Processor to gradually replace the HTML_Quickform/Smarty UIs. Once there's a critical mass, remove all old UIs and do an "Epic" switch.
- Pro: Don't have to do anything now! Creates pressure to do other useful changes.
- Con: You may be waiting a long time.
-
Incremental, Per-Component/Per-Field:
- Pitch: Convert fields incrementally
- How: Make a list of all relevant DB fields (~500). In the first month, take a subset (such as "all CiviEvent fields") and transition/test the subset. In the second month, do another subset of fields (such as "all CiviCampaign fields").
- Pro: This allows tasks to spread out over time with more manageable/affordable chunks.
-
Con: This still has the
r-tech
issue for third-party integrations. It creates a long-term drip-drip of compatibility breaks -- every month, some subset of fields have their data-format changed. This will make it quite challenging for extensions/integrations to keep in sync. - Comment: Bespoke screens (e.g. "New participant") are amenable to simple grep+divide+conquer, but we might get boxed-in with how to update metadata-driven screens which tend to treat all strings the same way (e.g. import/export).
-
Incremental, Setting Plus Smarty Linting:
- Pitch: Add toggle to allow old+new storage-formats. Incrementally enhance code to make new-storage-format work.
-
How:
- Define a new setting to indicate whether strings are stored as "html-esque" (initial-default; long-term-deprecate) or "canonical" (initial-experimental; long-term-goal).
- Define a migration script - which can be run whenever one changes the setting.
- Update Smarty/Quickform/API to check the setting and filter accordingly. Define corresponding Smarty filter. (Naive example:
{$records[$cid]->display_name|crmEscape:'Contact.display_name'}
means "escape this data based on the escaping-rules for theContact
fielddisplay_name
) - Write a linter to report on which Smarty templates use the filter. Over time (month-by-month), incrementally update Smarty templates to use the new filters and re-test those screens.
- Pro: This allows tasks to be spread out over time with more manageable/affordable chunks. The linter provides a guide for helping us track "cleared ground".
- Pro: Site-implementers and extension/integration-authors will have greater flexibility on scheduling the change - they don't have to update all codes and all databases at the same time.
- Con: Requires exposing some new filters/functions/settings.
-
Incremental, Setting Plus DAO Filter:
-
How:
- Define a new setting to indicate whether strings are stored as "html-esque" (initial-default; long-term-deprecate) or "canonical" (initial-experimental; long-term-goal).
- Define a migration script - which can be run whenever one changes the setting.
- Update DAO/API to check the setting and filter accordingly.
- Pro: At a glance, this looks like an O(1) plan.
- Pro: Site-implementers and extension/integration-authors will have greater flexibility on scheduling the change - they don't have to update all codes and all databases at the same time.
-
Con: Only addresses DAO-based query-building (
$d=new DAO_Foo(); $d->bar=123; $d->fetch()
). Does not address literal queries (executeQuery('SELECT foo AS bar...')
,executeQuery('UPDATE foo SET bar...')
) or other forms of query-building (CRM_Utils_SQL
/APIv4/adhoc PHP). This seems to be left as O(n) task? - Question: Do we get read/write consistency?
-
How:
-
Incremental, SQL Views with Deprecation Alerts:
- Pitch: Support both formats in MySQL at the same time. Borrow the idea of wide-spread VIEWs from multilingual.
-
How:
- Every entity (eg
Contact
) should have a concrete TABLE and a derived VIEW (egcivicrm_contact
andcivicrm_contact_new
) which present the same data in different formats (old/HTML-esque and new/canonical). -
executeQuery()
generates a deprecation warning whenever there's a query that hits old-format table. The warning says something like "civicrm_contact_old
is deprecated. Please usecivicrm_contact_new
instead. Be sure to update the escaping on any consumers." - At the mid-point during transition, swap the SQL schema (eg the old-format is given by SQL VIEW and the new-format is given by SQL TABLE).
- Eventually, remove the old-format VIEWs completely.
- Every entity (eg
- Pro: Existing (unpatched) consumers in any medium continue to work throughout the transition. But they start to emit warnings immediately.
- Pro: Extension/integration-authors (and site-admins/upgraders) have greater flexibility on scheduling changes - they don't have to update all codes and all databases at the same time.
- Con: Query performance on the filtered VIEW should be slower than the canonical TABLE. During the transitional process, queries will flop-flop around between faster/slower.