CiviCRM geographical data is never accurate
Overview
We are constantly finding problems with our lists of country names, state/province names, and state/province abbreviations. Some entries are missing, and others are incorrect. Some folks have suggested automating the updates.
Unfortunately, this isn't very straightforward. The ISO source we claim to follow isn't publicly available. Even at the country level, many official names differ from the ones we use (e.g. "United Kingdom" vs. "United Kingdom of Great Britain and Northern Ireland"). And the "official names" themselves differ depending on whether we consult Wikipedia, the CIA World Factbook, etc.
So some of our data is wrong by everyone's standard, but is it worth fixing states/provinces if we don't fix countries?
The code
For giggles, I wrote a quick-and-dirty script called civiregioncheck to identify discrepancies between Civi's data and published data. It's not complete: given the volume of changes needed, I don't want to sink more effort into finishing it unless we can act on the results.
Attached is a partial list of discrepancies:
- Country names that don't match ISO 3166-1.
- States/provinces whose ISO 3166-1 country and ISO 3166-2 name both match, but whose abbreviation is incorrect.
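As a rough illustration of what those two checks involve, here is a minimal Python sketch. It is not civiregioncheck's actual code: it assumes both data sets are already loaded into memory, and the `Subdivision` type, the function names, and the sample data are hypothetical. Subdivisions are matched on (country code, name), and only then are abbreviations compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subdivision:
    """Hypothetical record for one state/province."""
    country_code: str  # ISO 3166-1 alpha-2 code of the parent country
    name: str          # state/province name
    abbreviation: str  # the part of the ISO 3166-2 code after the hyphen


def country_name_mismatches(civi, reference):
    """Return (code, civi_name, reference_name) for codes present in both
    mappings whose names differ. Both arguments map alpha-2 code -> name."""
    return [
        (code, civi[code], reference[code])
        for code in sorted(civi.keys() & reference.keys())
        if civi[code] != reference[code]
    ]


def abbreviation_mismatches(civi, reference):
    """Return (country, name, civi_abbr, reference_abbr) for subdivisions that
    match on country and name but disagree on the abbreviation."""
    reference_abbr = {(s.country_code, s.name): s.abbreviation for s in reference}
    mismatches = []
    for s in civi:
        expected = reference_abbr.get((s.country_code, s.name))
        if expected is not None and expected != s.abbreviation:
            mismatches.append((s.country_code, s.name, s.abbreviation, expected))
    return mismatches


if __name__ == "__main__":
    # Country names come from the examples in this post. "ZZ" is a
    # user-assigned ISO 3166-1 code, so the subdivisions below are made up.
    civi_countries = {"GB": "United Kingdom", "US": "United States", "FR": "France"}
    reference_countries = {
        "GB": "United Kingdom of Great Britain and Northern Ireland",
        "US": "United States of America",
        "FR": "France",
    }
    print(country_name_mismatches(civi_countries, reference_countries))

    civi_subdivisions = [Subdivision("ZZ", "North Province", "NP"),
                         Subdivision("ZZ", "South Province", "SO")]
    reference_subdivisions = [Subdivision("ZZ", "North Province", "NP"),
                              Subdivision("ZZ", "South Province", "SP")]
    print(abbreviation_mismatches(civi_subdivisions, reference_subdivisions))
```

With this sample data, the first call flags GB and US (France matches), and the second flags the "SO" vs. "SP" abbreviation.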
Ideally, the script would also find states/provinces that are missing (or have been dissolved), and states/provinces with a matching abbreviation but an incorrect name.
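Those extra checks could be sketched the same way. Again, this is a hypothetical illustration, not part of civiregioncheck: the names and sample data are made up. Name mismatches are found by matching on (country code, abbreviation). An entry is only reported as missing or possibly dissolved when neither its name nor its abbreviation matches anything in the other data set for that country, so a simple rename or abbreviation error isn't double-counted as one removal plus one addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subdivision:
    """Hypothetical record for one state/province."""
    country_code: str  # ISO 3166-1 alpha-2 code of the parent country
    name: str
    abbreviation: str  # the part of the ISO 3166-2 code after the hyphen


def name_mismatches(civi, reference):
    """Return (country, abbreviation, civi_name, reference_name) for
    subdivisions that match on country and abbreviation but not on name."""
    reference_name = {(s.country_code, s.abbreviation): s.name for s in reference}
    mismatches = []
    for s in civi:
        expected = reference_name.get((s.country_code, s.abbreviation))
        if expected is not None and expected != s.name:
            mismatches.append((s.country_code, s.abbreviation, s.name, expected))
    return mismatches


def unmatched(left, right):
    """Return entries of `left` that share neither a name nor an abbreviation
    with any entry of `right` in the same country."""
    names = {(s.country_code, s.name) for s in right}
    abbreviations = {(s.country_code, s.abbreviation) for s in right}
    return [
        s for s in left
        if (s.country_code, s.name) not in names
        and (s.country_code, s.abbreviation) not in abbreviations
    ]


if __name__ == "__main__":
    # Made-up subdivisions of the user-assigned code "ZZ".
    civi = [Subdivision("ZZ", "North Province", "NP"),
            Subdivision("ZZ", "Old Province", "OP"),
            Subdivision("ZZ", "Est Province", "EP")]
    reference = [Subdivision("ZZ", "North Province", "NP"),
                 Subdivision("ZZ", "East Province", "EP"),
                 Subdivision("ZZ", "New Province", "NW")]
    print("name mismatches:", name_mismatches(civi, reference))
    print("missing from Civi:", unmatched(reference, civi))
    print("possibly dissolved:", unmatched(civi, reference))
```

Here "Est Province" is reported as a name mismatch for EP, "New Province" as missing, and "Old Province" as possibly dissolved.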
What next?
Are we really going to rename the UK (see above), or change "United States" to "United States of America"? And if not, where do we draw the line? ISO 3166-1 currently specifies the official country name of Taiwan as "Taiwan, Province of China". And we include Kosovo even though it has no official ISO 3166-1 code.
If this is too thorny to hash out, we'll go back to our patchwork process, but our data has hundreds of discrepancies.