I am interested in the above project because, from my research, it will reduce developer frustration by cutting the time spent on manual PR code reviews, and it will help maintain a healthy and consistent codebase, among other benefits.
Apart from PHPStan, Psalm (from Vimeo) and Phan can also help us run static analysis on the codebase. These three open source tools came up most often in my research.
Personally I prefer Psalm, as I found its error messages more descriptive. I have found Phan difficult to set up because it requires some configuration that is not clearly documented.
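For context, this is roughly what I ran when trying Psalm locally (the package name and flags are as I understand them from the docs; treat it as a sketch, not a recommendation):

```bash
# Pull in Psalm as a dev dependency and let it generate a starting config.
composer require --dev vimeo/psalm
vendor/bin/psalm --init   # writes a psalm.xml based on the project layout
vendor/bin/psalm          # run the analysis
```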
I have some questions, though:
What would be the major criterion for selecting a tool? I am guessing a static analysis tool for CiviCRM is mainly needed to detect code errors early, but most of these tools can do more than that.
I have been wondering whether integrating any of these open source tools will really take the three months of the GSoC time frame, since configuring Composer packages on Jenkins servers does not take much time.
Also, into which repositories would these static analysis tools be integrated?
Lastly, this integration does not seem to require much coding. I might be wrong, but Jenkins plugins seem to do most of the work. I would appreciate some clarification on this, as I think GSoC favours code-centric projects.
I am looking forward to your reply. Thanks!
@prondubuisi I think we want to determine the extent of code coverage that our existing automated test suites provide. Hmm...maybe that is not something that a static code analysis tool can provide.
My sense is we likely want to get some good baselines for each of our repos, and maybe separate reports for certain sensitive directories like api/, CRM/Financial/, CRM/Contribute/. It would be good to see whether merging a PR would cause or fix errors picked up by a static code analysis tool. We probably want the tool to write the output for each job into a specific directory and make that available on pages like http://site-list.test-1.civicrm.org:8001/?filter=core-13816-*
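As a very rough sketch of the kind of thing I mean (the directories, report paths and job name below are made up, and Psalm is only one of the candidate tools):

```bash
# Hypothetical per-job report layout; paths are invented for illustration only.
JOB_DIR=/var/www/reports/core-13816-static-analysis
mkdir -p "$JOB_DIR"
for dir in api CRM/Financial CRM/Contribute; do
  # One report per sensitive directory so each can be tracked against its own baseline.
  vendor/bin/psalm "$dir" > "$JOB_DIR/$(echo "$dir" | tr '/' '-').txt" || true
done
```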
We would also like to be able to generate lists of issues that will arise in our codebase when it runs on newer versions of PHP, eg statements that will no longer work when upgrading from 7.2 to 7.3.
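One possible way to generate that kind of list (this assumes the PHPCompatibility ruleset for phpcs; the packages and target version here are illustrative, not a decision):

```bash
# Sketch: install the PHPCompatibility standard and check the codebase against PHP 7.3.
composer require --dev phpcompatibility/php-compatibility dealerdirect/phpcodesniffer-composer-installer
vendor/bin/phpcs -p . --standard=PHPCompatibility --runtime-set testVersion 7.3 --extensions=php
```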
@prondubuisi we may want to use different tools for different purposes if their features are sufficiently distinct. Eg a fast tool that generates few false positives to run as PRs are submitted, and a longer-running tool with noisier output to run on demand for other checks, eg PHP version errors prior to a release.
@totten or @colemanw or @bgm would you be able to suggest someone who could help supervise a project that would do static code analysis, perhaps on PRs, maybe via a command in a comment on the PR (eg "Jenkins, please do static analysis").
We would want this project to focus not on a proof of concept, but on getting a polished, working set of new tests integrated into our infrastructure.
Okay @JoeMurray, so we want different tools for different kinds of analysis on the codebase. I believe all the cases for which we will need analysis tools integrated could come under this project, as long as they can be achieved within the GSoC timeline. I will base my research on the needs you mentioned and ask questions about any bottlenecks I encounter.
For code coverage, CiviCRM uses PHPUnit for testing, and logging for code coverage is turned on in phpunit.xml; we probably just need to make Jenkins generate the reports.
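Something roughly like this could produce the reports for Jenkins to publish (these are standard PHPUnit options; the actual CiviCRM test invocation may differ):

```bash
# Sketch: assumes a coverage driver such as Xdebug or pcov is available on the Jenkins node.
vendor/bin/phpunit --coverage-clover build/logs/clover.xml --coverage-html build/coverage
# Jenkins can then archive build/coverage or ingest clover.xml with its Clover plugin.
```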
This is a neat idea. psalm looks impressive, and it'd be grand for civicrm-core (and other projects) to have more accurate type information. I'm not sure I'd have capacity to mentor, but let me add some quick feedback.
It is, indeed, pretty easy to run composer require -- but executing a project like this is harder than that:
I ran psalm locally on the civicrm-core project, and it crashed. This is not surprising -- when I first tried phpcs a few years ago, it also crashed. Ditto phpmd. It's been a million years since I tried PHPUnit's code-coverage functionality, but I also feel it had a similarly glaring challenge (...like killing test performance or being unworkable with the E2E/WebTest suites). This, basically, is because it's hard to make QA tools perfect, and CiviCRM is big+complicated (compared to a typical PHP package), so there's a high probability of encountering something weird.
Of course, it's all just software, and you can run it repeatedly in isolation, so it's amenable to problem-solving (e.g. pulling up a debugger; e.g. hacking in log/print statements; e.g. iteratively changing the scope of the work). I suppose it's an opportunity to sharpen one's debugging skills and/or deep-dive into the static-analyzer, but I can't tell how hard that'll be. (30min? 14days? Dunno.)
IMHO, for something like psalm to be useful, the installation has to be done in tandem with a non-trivial code-cleanup effort. Turning on the tool produces an instant-backlog of code-cleanup issues. Now, psalm has some cool-looking options (like --set-baseline and <issueHandlers>) that can help one manage the roll-out. But the backlog is still there. If we just install the tool and ignore its advice, then we haven't made the product any better (and we haven't made our people any smarter).
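As a sketch of how the baseline flow works (assuming psalm is already configured for the repo):

```bash
# Record every existing issue so that only *new* problems fail future runs.
vendor/bin/psalm --set-baseline=psalm-baseline.xml

# Subsequent runs compare results against the recorded baseline.
vendor/bin/psalm --use-baseline=psalm-baseline.xml
```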
It may be interesting to consider how phpcs was rolled out before. The process I followed there was:
1. Locally, install phpcs with a ruleset that was close to Civi's conventions. Run it on civicrm-core. Investigate and work around the crashes. Get some empirical data, look for patterns, figure out a way to prioritize, try some bulk cleanups. As much as possible, submit those cleanups as digestible PRs (and think hard about the framing/timing for the commits+PRs -- another person will need to validate them). Tweak/relax rules that are too noisy/overly persnickety. Hopefully, this will narrow the backlog down to something tough-but-manageable (like a few hundred oddball problems instead of several thousand identical problems).
2. Try to engage other developers -- so that they can help with the remaining backlog. Try to get some influential folks to help (and to be loud about how they're helping). Take whatever advice you give the others -- and do it yourself. Be prepared to hold some hands. Lead by example.
3. After there's been a good chunk of progress, do a last round to figure out anything important which has fallen through the cracks.
4. Setup CI -- so that the ruleset continues to be enforced automatically in the future (roughly the kind of gate sketched after this list).
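For illustration, the CI gate could be as simple as something like this (the ruleset file and directories are placeholders, not the actual civicrm-core setup):

```bash
# Hypothetical Jenkins shell step; phpcs exits non-zero when violations remain, failing the job.
composer require --dev squizlabs/php_codesniffer
vendor/bin/phpcs -p --standard=phpcs.xml.dist --extensions=php CRM/ api/
```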
Notably, the bulk of the project was in steps 1-3, and those steps didn't require any special access to the infrastructure -- it needed just a local dev environment, Github, and chat/email. That's a good thing, because it means there aren't as many roadblocks, and there's a wider pool of developers (potential mentors) who'd be qualified to help.
Special thanks @totten. Nice breakdown. I will continue working out what would be feasible for everyone. I will ask more questions as I continue my research. Thanks again.
Hello @JoeMurray @totten, apart from civicrm-core, which other repositories here or elsewhere do you think would potentially benefit from static analysis tool integration?
Thanks @JoeMurray, I will see what I can come up with. Hopefully we can get mentors conversant with the above repositories and willing to help on this project.
JoeMurray changed title from Integrate an open source Static Analysis tool as part of CIVICRM continuos integration toolchain to Integrate an open source Static Analysis tool as part of CIVICRM continuous integration toolchain
If I recall the outcome here, there was some initial work that ran static code analysis against the CiviCRM codebase. This preliminary step generated so many noisy failures that it was not useful. The rules used in the analysis were not tuned enough to bring the noise down to a level where the output was helpful.
Nor was it possible to address the 'noise' by refactoring the code in a comprehensive, safe and timely way. So the initiative was helpful as exploration, but less so for improving the codebase or setting up ongoing static analysis.
@prondubuisi can you correct/supplement this analysis, or maybe provide a link to the project's report?