This issue is an experiment. The goal is to brainstorm Little Ideas that will improve the quality-assurance regime. I'm not looking to replay past discussions -- if something was controversial before, it's probably still controversial. For purposes of this issue, humor me. Suppose there's a realistic budget -- like a week of senior developer/designer/documenter/administrator time. What process change could they make that might put a dent in improving QA? Perhaps something that makes PR review more thorough? Or perhaps it improves engagement with RCs? Or perhaps it lowers the barrier to testing?
Please post one idea per comment. If you like or dislike an idea, put a thumbsup or thumbsdown emoji on the comment. To discuss the idea in more depth, ping the commenter on Mattermost (https://chat.civicrm.org under dev).
Emojis:
- 👍 Good idea. Simple, clear steps. Would probably improve quality. Achievable within budget.
- 👎 Bad idea. Wouldn't improve quality.
- Don't really want to judge. It sounds good, but it's not simple enough or not clear enough or too long. Maybe if the idea was refined more.
- The comment is conversation.
Task: Perform a phonebank/canvass operation to reach out to partners and active contributors. Ask about their testing trends -- how often they test CiviCRM releases; how long it takes when they do; how that time is funded; how it's scheduled. Record this in a master spreadsheet/database.
Cost: Probably a few days of time (for canvassing) and a few more (for analysis/follow-up).
Rationale: We don't have a strong/consistent metric for the community's overall testing effort. It's hard to determine what a reasonable expectation is.
Benefits: If we have a bigger picture on this, then maybe we can better align efforts to improve efficacy/efficiency.
Risks:
We do the canvassing and don't get any useful information -- because people are non-committal/unsure about their own testing work.
We do the canvassing and get unfortunate information -- that the ecosystem is already at maximal efficiency, or that no optimization is possible because everyone is locked-in to their current approach.
Task: Update the PR issue template and PR review guidelines to add a new criterion: every PR author must state what resources they (or their organization) will commit to the RC period.
Cost:
For the project administration, a few days to discuss/write the criterion. A part-time effort in subsequent weeks to ensure that all mergers enforce the criterion.
For all contributors, an on-going increase in the cost of submitting PRs.
Rationale: During PR review, we have a bit of leverage with the submitter (and their client/boss). Moreover, it reinforces a natural incentive: if you want the change to get into release X, then you probably want release X to be generally stable/usable.
Benefits: Increased capacity for RC testing.
Risks:
Some prospective contributors may be turned off because this raises the bar to contribution.
Once we merge, we lose leverage. If someone were to misrepresent their commitment to RC testing, then we wouldn't have much recourse. (At least, not until they submit another PR.)
If you turn RC testing into a transactional relationship, then you may lose some people who are more keen to participate non-transactionally.
Task: Implement a sign-up mechanism and a bot which automatically adds a signpost to GitHub PRs. The signpost includes automated messages along these lines (a rough sketch of the routing appears at the end of this comment):
If the PR modifies CRM/Mailing/** or templates/CRM/Mailing/**, then post a message: "@totten, a new CiviMail PR is pending. Please review the concept and explanation."
If the PR modifies CRM/Contribute/BAO/InternalThing::thatICalledAnyway(), then post a message "@extauthor, a new PR modifies an internal function that you depend on. Please check that your extension continues to work."
Cost:
For project administrators: A few days to discuss design. A few days to implement and document. A few days of follow-up. Periodic usage questions.
For all contributors: Read/understand docs. Configure signposts. Pay attention to them.
Benefits: Streamline communications, making it easier to match people with the PRs that they care about.
Risks:
People ignore their pings.
The discoverability of the signposts -- or the UX for configuring signposts -- sucks.
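For illustration, a minimal sketch of how the signpost routing might look, assuming the bot receives the PR's changed file paths from GitHub's "list pull request files" API; the patterns, handles, and messages below are just the examples above, not an existing configuration:

```python
# Hypothetical signpost rules: glob patterns mapped to the message that the
# bot should post on a matching PR. All names here are illustrative only.
import fnmatch

SIGNPOSTS = [
    {
        "patterns": ["CRM/Mailing/**", "templates/CRM/Mailing/**"],
        "message": "@totten, a new CiviMail PR is pending. "
                   "Please review the concept and explanation.",
    },
    {
        "patterns": ["CRM/Contribute/BAO/**"],
        "message": "@extauthor, a new PR modifies an internal function that "
                   "you depend on. Please check that your extension "
                   "continues to work.",
    },
]

def signpost_comments(changed_files):
    """Return the comments the bot should post, given a PR's changed files."""
    comments = []
    for rule in SIGNPOSTS:
        matched = any(
            fnmatch.fnmatch(path, pattern)
            for path in changed_files
            for pattern in rule["patterns"]
        )
        if matched:
            comments.append(rule["message"])
    return comments

# Example: file list as reported by GitHub for a pull request.
print(signpost_comments(["CRM/Mailing/BAO/MailingJob.php"]))
```

Path-based rules alone wouldn't catch the per-function case (InternalThing::thatICalledAnyway()); that would need something like scanning the diff for the function name, but the structure would be the same.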
The current system of code review (someone has to review and sign off before a PR is merged) works well, but I don't think it increases the amount of time spent kicking the tyres. In addition to a code review, each PR submitter should request a community star to manually test the change without viewing the code (or, before viewing the code).
This could take the form of a blockchain -- "CiviCoins" or similar (but it wouldn't have to be) -- to incentivise development efforts.
CiviCoins could be exchanged for testing, bug fixes, and so on. 100cc could be (for example) 1 hour of core team time. Other studios/developers would set their own rates accordingly.
Coins could be earned for:
Reporting a bug which meets certain criteria (e.g. repro steps)
Writing PRs that get merged
Reviewing / blind-testing PRs
Releasing a good extension and maintaining it
Doing lightning talks / presentations at CiviCon
Helping another community member out - they can give you some of their CC
I find this stuff phenomenal for making sure, end to end, that bugs don't happen. A senior dev, working flat out on this for a week, would dramatically reduce the number of bugs making their way into RCs.
This was discussed multiple times, and possibly approved at some point in time. Have the core team change the test infrastructure so it can test on partner-provided databases, or on partner-provided remote staging instances.
Pros:
dramatically increase test coverage, as tests would run in contexts other than the standard testing database
Cons:
dramatically increase the testing time, so should only be done for RCs (vs every commit)
might generate more false positives (but that would then help us refine the tests to get rid of these)
My thoughts on a few of the proposals so far. To be interpreted as "yes, and ..." :-)
Testing the RC on staging instances / Leading lights:
as someone who deploys RC releases to production clients, I think that partners have to test it on their infrastructure. For example, I use PHP 7.0 and MariaDB 10.2, so I tend to discover more MariaDB 10.2 bugs. Often, the bugs we discover are things that do not have test coverage. We rely on the reporterror extension, and increasingly on central logging with Elasticsearch (which I'm adding to reporterror).
Of course, we don't want to cause too much disruption, so we upgrade our clients progressively, and we don't upgrade every month (usually every 3-4 months, with backports of patches).
We deploy to prod some PRs that we review.
While I can imagine people telling us we are taking too many risks: it's no different than deploying to prod a final release that has not had much testing. Someone has to be an early adopter. Every 3-4 releases, that's us. Another release, it's (hopefully) another shop.
CiviCRM databases are difficult to anonymise. There might be custom extensions or Drupal configs that give away too much information.
Measuring RC participation / Canvassing:
We do have some metrics, because we do see the PRs against an RC branch, or PRs fixing a newly introduced feature. We should find an easy way to label those PRs. It would give us the info we need.
Before a release, the dev-post-release channel (which I also see as the RC-channel) could hold brief status reports from people testing the RC.
Contributors can use the "contribution log" on civicrm.org to report their hours spent on RC testing.
CiviCoins:
We do have the contributor log. This is not limited to partners. Non-partners who get to a certain level of contributions will be shown on the civicrm.org/experts page.
Task: Set up an event for each RC release. When the RC is released, send an event invitation to a list of potential contributors, the invitation to include:
a description of different ways of participating in RC testing
accept/decline links.
Cost: Identify suitable platform, e.g. CiviEvent. Set up events. Send invitations. Curate contributor list.
Rationale: Those who are regularly involved in the QA & release process have the release schedule burned into their brains but this may not be the case for the potential contributors we wish to attract. Some people may have the resources to help occasionally but not every month. Some may not know how to participate or believe they don't have the skills. This approach:
publicises the RC and the importance of testing
educates about how to participate in RC testing
draws people into making a positive decision about whether to participate
allows identifying & following up those who have said they will participate
Benefits: If successful, increases the pool of RC testers.
Get some constructive processes for dealing with regressions
We all know regressions happen but I don't think we are really gathering information about them & learning from them beyond the level of 'I should stop trying because I got burnt' or 'I should beat this list of people up because there was a regression'.
We can't achieve all QA within the review process - it's very squeezed on resources and, unlike RC testing, it's not something 'anyone can do'. But I also think that if we had good processes around identifying and doing retrospectives on regressions, we could learn from them & hopefully reduce them.
In some cases the reasons regressions happen involve our wider community environment. For example, I can think of regressions where I saw the PR and noticed the risk but didn't comment, for complex reasons which I don't think make sense to go into outside the scenario of 'here we have a new regression, how did that happen'.
The challenge with learning from regressions is that it involves open tracking & discussion of problems, and it requires a no-blame environment - which is something I think we historically had but have had blips in. One way in which I think that can be achieved is treating the day we start this process as 'day zero', not entertaining discussion of any changes prior to that day, and tracking regressions solidly from then. We also need to commit to understanding that people can't get it all right (that's where RC testing is so important) and also realise that most of the mistakes will be made by the people doing most of the work.
Refining the automated tests on staging instances: I support the idea of being able to test on real databases. The ideal would be to download a Docker image, ready to run all the tests, that just takes the path to a CiviCRM database and then spits out a report of the test results. Then we would all run the tests ourselves on our own dev machines. I realize this might fall outside the scope of relatively easy projects, but I think submitting client databases to be tested is a non-starter for privacy reasons, so we'd need something along these lines for this approach to be practical.
Re: implementation of automated tests on staging instances. Customer instances could require custom extensions, and could have dependencies on their hosting environment. These might be difficult to replicate in a standard Docker container. Another possibility is to produce an extension that would run the tests, produce a report in a standard format from the output of phpunit, and have a 'Send to mothership' button. This way partners could run the tests on their own, check the results, correct the false positives, and send the failures in a standard format to higher authorities (micro-service? Jenkins? post-dev channel?). This would have the additional benefit of testing in various environments (OS family and version, PHP/MySQL/MariaDB/Apache/nginx flavors and versions, combinations of extensions and their versions, etc.).
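To make the 'standard format' concrete, here is a rough sketch -- the field names, and the assumption that the report is built from phpunit's JUnit XML output (--log-junit), are mine, not an agreed format. The idea is that the aggregate results travel together with a description of the environment, and the 'Send to mothership' button would POST this payload to whatever central service we pick:

```python
# Hypothetical report payload for the "Send to mothership" button.
# Assumes phpunit was run with --log-junit and that the first <testsuite>
# element aggregates the totals (as phpunit's JUnit output does).
import json
import platform
import xml.etree.ElementTree as ET

def build_report(junit_xml_path, environment):
    suite = ET.parse(junit_xml_path).getroot().find("testsuite")
    return {
        "results": {
            "tests": int(suite.get("tests", 0)),
            "failures": int(suite.get("failures", 0)),
            "errors": int(suite.get("errors", 0)),
        },
        "environment": environment,
    }

if __name__ == "__main__":
    # The environment block is what makes partner-run results comparable.
    environment = {
        "os": platform.platform(),
        "php": "7.0",          # in a real extension: phpversion()
        "db": "MariaDB 10.2",  # in a real extension: read from the DB server
    }
    print(json.dumps(build_report("phpunit-results.xml", environment), indent=2))
```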
> submitting client databases to be tested is a non-starter for privacy reasons
Agreed. I don't think it's a complete solution (input welcome!), but org.civicrm.contrib.anonymize is intended to sanitize CiviCRM DBs so you can safely move them to a testing environment (whether local to you, or centralised).
+1 Nicolas' comments re infrastructure and custom code.