totten · 6721f3e1
--- a/sample-data.md
+++ b/sample-data.md
+This is a discussion document/braindump from circa Jan 2022. It aims to provide a broad snapshot of the backlog of issues facing the sample data. As a discussion document, it may be used to assess priorities, compare approaches, or break out issues.
+
+## Issues
+
+* **Workflow**: The workflow for `civicrm_generated.mysql` is tedious/onerous. The file is prone to merge conflicts and is difficult to trace or patch.
+* **Ossification**: Because `civicrm_generated.mysql` is difficult to maintain, we avoid updating it, so many areas/topics have weak coverage in the sample data-set.
+* **D8/D9**: On Drupal 8/9, the default installation flow bypasses the install screen, so an evaluator/tester is highly unlikely to discover that sample data is available.
+* **Extension Entities**: There is no consistent approach to sample data for extensions (e.g., CiviVolunteer or Mosaico).
+* **Reproducibility/E2E**: Because the data is generated with an RNG, a small change can cascade into larger changes across the overall sample data-set. This makes the sample data unreliable for E2E testing.
+* **POV/Tuning**: Different evaluators/testers may have different interests (w.r.t. number of records, choice of subsystems, complexity of configurations). A single sample data-set makes it difficult to cater to all of those interests.
+* **Naive Adopter Cleanup**: A naive adopter may load the sample data as a way to learn/explore. While learning, they start adding their own data on top. Later, they realize they have created a confusing mix of data that must be cleaned up or migrated. There is no clear on/off switch (hide/show; import/delete).
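
The cascade described under Reproducibility/E2E can be shown in a few lines. This is an illustrative Python sketch, not CiviCRM's own generator; `generate_contacts`, its fields, and the name list are hypothetical. Every record draws from one shared, seeded random stream, so adding a single random draw to the first record shifts the values of the records generated after it, even though those records were never edited.

```python
import random

# Hypothetical pool of values; real sample data would use much larger lists.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli", "Fay", "Gus", "Hana"]


def generate_contacts(count, seed=42, extra_field=False):
    """Generate `count` contacts, all drawing from ONE shared seeded RNG stream."""
    rng = random.Random(seed)
    contacts = []
    for i in range(count):
        contact = {"id": i + 1, "first_name": rng.choice(FIRST_NAMES)}
        if extra_field and i == 0:
            # The "small change": one extra random field on the first record only.
            contact["is_deceased"] = rng.random() < 0.1
        contact["age"] = rng.randint(18, 90)
        contacts.append(contact)
    return contacts


before = generate_contacts(5)
after = generate_contacts(5, extra_field=True)

# Later records were not edited, yet their values shift because the stream did.
changed = [
    b["id"]
    for b, a in zip(before[1:], after[1:])
    if (b["first_name"], b["age"]) != (a["first_name"], a["age"])
]
print("records changed as a side effect:", changed)
```

Running the generator twice with the same seed gives identical output; the instability appears only when the sequence of random draws changes, which is exactly what a data-model patch tends to do.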
+
+## Goals
+
+Combining that list of issues with some existing/desirable characteristics, we can derive a list of positive goals/features:
+
+* Sample data-set(s) should provide broad coverage of different subsystems, locales, and scales (sizes).
+* Sample data-set(s) should be readily available to evaluators/demoers, trainers/learners, developers/testers, and automated (E2E) test systems.
+* Sample data-set(s) should be maintainable. The workflow for patching sample data and seeing the result in a working site should be simple/short. When the data model changes, it should be straightforward to discover and revise the corresponding sample data-set(s).
+* Sample data-set(s) should be reproducible (for consistency in tutorials, E2E tests, etc.), but they should be allowed to use controlled randomization (to generate large sets and to allow primitive fuzz-testing).
+* Sample data-set(s) should be available for different deployments/environments (Drupal/WordPress/Backdrop/Standalone; localhost/VM/VPS/Aegir; etc.).
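
One possible way to satisfy both the reproducibility and controlled-randomization goals, offered as an assumption for discussion rather than an agreed design, is to give each record its own random stream, seeded from a base seed, the entity name, and the record's index. In this Python sketch, `record_rng` and `generate` are hypothetical names and `score` is a placeholder field. It shows three properties: the same inputs give the same data, a larger data-set extends a smaller one instead of reshuffling it, and an extra random draw in one record leaves the others untouched.

```python
import hashlib
import random


def record_rng(base_seed, entity, index):
    """Return a private RNG for one record, seeded only from stable inputs."""
    key = f"{base_seed}:{entity}:{index}".encode("utf-8")
    # sha256 (unlike Python's salted hash()) yields the same seed on every run.
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def generate(entity, count, base_seed=2022, extra_field=False):
    """Generate `count` records; record i depends only on (base_seed, entity, i)."""
    rows = []
    for i in range(count):
        rng = record_rng(base_seed, entity, i)
        row = {"id": i + 1, "score": rng.randint(1, 100)}
        if extra_field and i == 0:
            # An extra draw consumes only record 0's private stream.
            row["is_deceased"] = rng.random() < 0.1
        rows.append(row)
    return rows


small = generate("contact", 10)
large = generate("contact", 1000)

# Reproducible: the same inputs always give the same data.
assert small == generate("contact", 10)
# Scalable: a larger data-set extends a smaller one rather than reshuffling it.
assert large[:10] == small
# Localized: a change to one record's generator leaves the other records untouched.
assert generate("contact", 10, extra_field=True)[1:] == small[1:]
```

One caveat: Python only guarantees that `random()` itself is stable across interpreter versions, so long-lived fixtures built this way may still need pinned algorithms or stored snapshots.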
\ No newline at end of file