Import code cleanup
I've been working to rationalise the import code prior to fixing a couple of larger things. This ticket is intended to explain what is happening in the current code as I make sense of it.
Basic structure
The import has 4 screens which are (in order): DataSource, MapField, Preview, Summary.
The first 3 screens each call the relevant Parser class in the submit function - in each case in a different 'mode' with the goal of achieving different things.
The code then 'sets' submitted values and values from the parser on the form to pass them through to the next form. This setting appears to be for 2 reasons:
- because the early civi devs didn't know how else to pass values submitted on one screen to the others
- because they had calculated certain values and perceived it to be more performant not to do so again.
Passing form values - the new way
The import forms can now all get the submitted values using the getSubmittedValue function. This can fetch a field submitted on the DataSource form even if the user is currently on the MapField form, based on a coded array of which values are on which form. So in the new code we use getSubmittedValue rather than the endless set+get routine to get the submitted values (this is in progress).
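The lookup described above could be sketched roughly as follows. This is only an illustration, not the actual CiviCRM code: the class name, the field map, the example field names, and the in-memory storage are all assumptions; only the method name getSubmittedValue comes from this ticket.

```php
<?php
// Hypothetical sketch: look up a value submitted on any import screen
// using a hard-coded map of which field lives on which form.
class ImportFormSketch {

  // Assumed map of field name => form the field is submitted on.
  private const FIELD_FORMS = [
    'dataSource' => 'DataSource',
    'skipColumnHeader' => 'DataSource',
    'mapper' => 'MapField',
  ];

  // Stand-in for the per-form submitted values (a real form would read
  // these from wherever the framework stores them).
  private array $submitted;

  public function __construct(array $submitted) {
    $this->submitted = $submitted;
  }

  // Works from any screen: the map says which form holds the field.
  public function getSubmittedValue(string $fieldName) {
    $form = self::FIELD_FORMS[$fieldName] ?? NULL;
    if ($form === NULL) {
      throw new InvalidArgumentException("Unknown field: $fieldName");
    }
    return $this->submitted[$form][$fieldName] ?? NULL;
  }

}

$form = new ImportFormSketch([
  'DataSource' => ['skipColumnHeader' => 1],
  'MapField' => ['mapper' => [['first_name']]],
]);
$skipHeader = $form->getSubmittedValue('skipColumnHeader');
```

The point of the design is that no screen has to 'set' a value for the next one: each screen asks for the field by name and the map resolves where it was submitted.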
Running the parser
Here is the goal of the parser at each step
DataSource
The DataSource screen doesn't 'really' use the parser, although it calls it in MAPFIELD mode. It actually only uses the csv parsing functionality of it to get
- rows - it actually requests the first 100 rows & then uses 2 of them to pass to the MapField screen as sample data - these are assigned to the template as dataValues
- column headers - if there are some they are assigned to the template
- various rowCount, columnCount variables to be assigned to the template to use with the smarty {section} tag
DataSource parsing - the new way
The DataSource form doesn't really need to call the Parser class at all. The MapField form can get its own values once
- these functions are available on all imports (currently only Contact)
$this->assign('columnNames', $this->getColumnHeaders());
$this->assign('columnCount', $this->getNumberOfColumns());
$this->assign('dataValues', array_values($this->getDataRows(2)));
- the rewrite of MapTable.tpl done for the Contact import is done for the others - allowing us to stop needing all those count variables.
The datasource class is also doing way too much handling of the temp tables. Once https://github.com/civicrm/civicrm-core/pull/23273 is merged, the datasource classes (CSV and SQL, which are currently used by the contact import and are a todo for the other imports) will handle creating the tables, dropping them, and adding the status columns.
MapField
The MapField class calls the parser in PREVIEW mode - the goal here is to validate the rows in the datasource and to provide access to download data about rows that have not imported.
MapField parsing - the new way
The goal here is to separate out the validate function and call only that, not 'run'. The validate function would update the output directly rather than doing this weird array_shifting and pushing on of errors.
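As a rough illustration of that separation (not the real Parser class): the sketch below has a validate function that only checks rows and records the outcome on each row. The class name, the 'status' and 'message' keys, and the email rule are assumptions made for the example; in the real code the output would presumably be written wherever the import stores row status.

```php
<?php
// Hypothetical sketch: validate() only checks rows and records the outcome
// on each row, instead of run() accumulating errors in side arrays.
class ParserSketch {

  /**
   * @param array $rows Rows keyed by row number, each an array of values.
   * @return array The same rows with 'status' and 'message' entries added.
   */
  public function validate(array $rows): array {
    foreach ($rows as $number => $row) {
      $message = $this->getValidationError($row);
      $rows[$number]['status'] = $message === NULL ? 'VALID' : 'ERROR';
      $rows[$number]['message'] = $message;
    }
    return $rows;
  }

  // Assumed rule, for illustration only: an email must be well formed.
  private function getValidationError(array $row): ?string {
    $email = $row['email'] ?? '';
    if ($email !== '' && filter_var($email, FILTER_VALIDATE_EMAIL) === FALSE) {
      return 'Invalid email';
    }
    return NULL;
  }

}

$parser = new ParserSketch();
$validated = $parser->validate([
  1 => ['first_name' => 'Ana', 'email' => 'ana@example.org'],
  2 => ['first_name' => 'Bob', 'email' => 'not-an-email'],
]);
```

Because the result stays attached to the row it belongs to, nothing has to be shifted or pushed onto a separate error array afterwards.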
Preview
The Preview class calls the parser in IMPORT mode - this is where the importing happens.
Summary
This class does not call the parser. However there is a postProcess there to drop the temp table - the intent in fact is that we RETAIN the temp table to export output on the fly rather than in CSV files - see https://github.com/civicrm/civicrm-core/pull/23291
Managing the mappings
The way it works is that the selection is passed to the postProcess on MapField looking something like
[['first_name'], ['phone_type_id', '1', 2], ['2_a_b', 'phone_type_id', '1', 2]]
This is interpreted in 2 ways
- to be saved in civicrm_mapping_field - for this purpose the above is translated to
[
  [
    'name' => 'First Name',
  ],
  [
    'name' => 'Phone',
    'location_type_id' => 1,
    'phone_type_id' => 2,
  ],
  [
    'name' => 'Phone',
    'relationship_type_id' => 2,
    'relationship_direction' => 'a_b',
    'location_type_id' => 1,
    'phone_type_id' => 2,
  ],
]
This conversion becomes more readable with this change
Note that the label rather than name is saved - this is a todo to fix - see https://github.com/civicrm/civicrm-core/pull/23288
- to be set on the parser - (trigger warning) - this is passed as a series of arrays like
$phoneTypeIDs = [NULL, 2, 2,]
$relationshipDirections = [NULL, NULL, 'a_b']
etc etc - the parser then (in a hugely unreadable way) matches all these arrays up by index to get the sort of info visible in the above mapping array - it does that in setActiveFields which converts the row to a $params keyed in a semi-meaningful way
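To make the index matching concrete, here is a small sketch of what combining such parallel arrays amounts to. It is not the code in setActiveFields: the function name combineByIndex and the 'name' array are assumptions for illustration; the other two arrays mirror $phoneTypeIDs and $relationshipDirections above.

```php
<?php
// Hypothetical sketch: rebuild per-column mapping info from the parallel
// arrays handed to the parser, matching entries by column index.
function combineByIndex(array $parallelArrays): array {
  $columns = [];
  foreach ($parallelArrays as $key => $values) {
    foreach ($values as $index => $value) {
      // Ensure every column gets an entry, even if all its values are NULL.
      $columns[$index] = $columns[$index] ?? [];
      if ($value !== NULL) {
        $columns[$index][$key] = $value;
      }
    }
  }
  ksort($columns);
  return $columns;
}

$columns = combineByIndex([
  // Assumed field-name array; the ticket only shows the two arrays below.
  'name' => ['first_name', 'phone', 'phone'],
  'phone_type_id' => [NULL, 2, 2],
  'relationship_direction' => [NULL, NULL, 'a_b'],
]);
```

Each entry of $columns then describes one import column in a single keyed array, which is the shape the mapping array above already has.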
- Stop dropping Temp Tables on completion (contact import only) - https://github.com/civicrm/civicrm-core/pull/23291
- Save Mappings by name not label - Contact - https://github.com/civicrm/civicrm-core/pull/23288