I am interested in the above project because, from my research, it will reduce developer frustration by cutting the time spent on manual PR code reviews, and it will help maintain a healthy and consistent codebase, among other benefits.
Apart from PHPStan, Psalm (from Vimeo) and Phan can also help us run static analysis on the codebase. These three open source tools came up most often in my research.
Personally I prefer Psalm, as I found its error messages more descriptive. I have found Phan difficult to set up because it requires some configuration that is not clearly documented.
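For context, this is roughly what I ran when trying Psalm locally (the package name and flags are as I understand them from the docs; treat it as a sketch, not a recommendation):

```bash
# Pull in Psalm as a dev dependency and let it generate a starting config.
composer require --dev vimeo/psalm
vendor/bin/psalm --init   # writes a psalm.xml based on the project layout
vendor/bin/psalm          # run the analysis
```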
I have some questions, though:
What would be the major criterion for selecting a tool? I am guessing a static analysis tool for CiviCRM is mainly needed to detect code errors early, but most of these tools can do more than that.
I have been wondering whether integrating any of these open source tools will really take the three months of the GSoC time frame, since configuring Composer packages on Jenkins servers does not take much time.
Also, into which repositories would these static analysis tools be integrated?
Lastly, this integration does not seem to require much coding. I might be wrong, but Jenkins plugins seem to do most of the work. I would appreciate some clarification on this, as I think GSoC favours code-centric projects.
I am looking forward to your reply. Thanks!
@prondubuisi I think we want to determine the extent of code coverage that our existing automated test suites provide. Hmm...maybe that is not something that a static code analysis tool can provide.
My sense is we likely want to get some good baselines for each of our repos, and maybe separate reports for certain sensitive directories like api/, CRM/Financial/, CRM/Contribute/. It would be good to see whether merging a PR would cause or fix errors picked up by a static code analysis tool. We probably want the tool to write the output for each job into a specific directory and make that available on pages like http://site-list.test-1.civicrm.org:8001/?filter=core-13816-*
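As a very rough sketch of the kind of thing I mean (the directories, report paths and job name below are made up, and Psalm is only one of the candidate tools):

```bash
# Hypothetical per-job report layout; paths are invented for illustration only.
JOB_DIR=/var/www/reports/core-13816-static-analysis
mkdir -p "$JOB_DIR"
for dir in api CRM/Financial CRM/Contribute; do
  # One report per sensitive directory so each can be tracked against its own baseline.
  vendor/bin/psalm "$dir" > "$JOB_DIR/$(echo "$dir" | tr '/' '-').txt" || true
done
```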
We would also like to be able to generate lists of issues that will arise in our codebase when it runs on newer versions of PHP, eg statements that will no longer work when upgrading from 7.2 to 7.3.
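One possible way to generate that kind of list (this assumes the PHPCompatibility ruleset for phpcs; the packages and target version here are illustrative, not a decision):

```bash
# Sketch: install the PHPCompatibility standard and check the codebase against PHP 7.3.
composer require --dev phpcompatibility/php-compatibility dealerdirect/phpcodesniffer-composer-installer
vendor/bin/phpcs -p . --standard=PHPCompatibility --runtime-set testVersion 7.3 --extensions=php
```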
@prondubuisi we may want to use different tools for different purposes if their features are sufficiently distinct. Eg a fast tool that generates few false positives to run as PRs are submitted, and a longer-running tool with noisier output to run on demand for other checks, eg PHP version errors prior to a release.
@totten or @colemanw or @bgm would you be able to suggest someone who could help supervise a project that would do static code analysis, perhaps on PRs, maybe via a command in a comment on the PR (eg "Jenkins, please do static analysis").
We would want this project to focus not on a proof of concept, but on getting a polished, working set of new tests integrated into our infrastructure.
Okay @JoeMurray, so we want different tools for different kinds of analysis on the codebase. I believe all the cases for which we will need analysis tools integrated could come under this project, as long as they can be achieved within the GSoC timeline. I will base my research on the needs you mentioned and ask questions about any bottlenecks I encounter.
For code coverage, CiviCRM uses PHPUnit for testing, and logging for code coverage is turned on in phpunit.xml; we probably just need to make Jenkins generate the reports.
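Something roughly like this could produce the reports for Jenkins to publish (these are standard PHPUnit options; the actual CiviCRM test invocation may differ):

```bash
# Sketch: assumes a coverage driver such as Xdebug or pcov is available on the Jenkins node.
vendor/bin/phpunit --coverage-clover build/logs/clover.xml --coverage-html build/coverage
# Jenkins can then archive build/coverage or ingest clover.xml with its Clover plugin.
```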
This is a neat idea. psalm looks impressive, and it'd be grand for civicrm-core (and other projects) to have more accurate type information. I'm not sure I'd have capacity to mentor, but let me add some quick feedback.
It is, indeed, pretty easy to run composer require -- but executing a project like this is harder than that:
I ran psalm locally on the civicrm-core project, and it crashed. This is not surprising -- when I first tried phpcs a few years ago, it also crashed. Ditto phpmd. It's been a million years since I tried PHPUnit's code-coverage functionality, but I also feel it had a similarly glaring challenge (...like killing test performance or being unworkable with the E2E/WebTest suites). This, basically, is because it's hard to make QA tools perfect, and CiviCRM is big+complicated (compared to a typical PHP package), so there's a high probability of encountering something weird.
Of course, it's all just software, and you can run it repeatedly in isolation, so it's amenable to problem-solving (e.g. pulling up a debugger; e.g. hacking in log/print statements; e.g. iteratively changing the scope of the work). I suppose it's an opportunity to sharpen one's debugging skills and/or deep-dive into the static-analyzer, but I can't tell how hard that'll be. (30min? 14days? Dunno.)
IMHO, for something like psalm to be useful, the installation has to be done in tandem with a non-trivial code-cleanup effort. Turning on the tool produces an instant-backlog of code-cleanup issues. Now, psalm has some cool-looking options (like --set-baseline and <issueHandlers>) that can help one manage the roll-out. But the backlog is still there. If we just install the tool and ignore its advice, then we haven't made the product any better (and we haven't made our people any smarter).
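As a sketch of how the baseline flow works (assuming psalm is already configured for the repo):

```bash
# Record every existing issue so that only *new* problems fail future runs.
vendor/bin/psalm --set-baseline=psalm-baseline.xml

# Subsequent runs compare results against the recorded baseline.
vendor/bin/psalm --use-baseline=psalm-baseline.xml
```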
It may be interesting to consider how phpcs was rolled out before. The process I followed there was:
1. Locally, install phpcs with a ruleset that was close to Civi's conventions. Run it on civicrm-core. Investigate and work around the crashes. Get some empirical data, look for patterns, figure out a way to prioritize, try some bulk cleanups. As much as possible, submit those cleanups as digestible PRs (and think hard about the framing/timing for the commits+PRs -- another person will need to validate them). Tweak/relax rules that are too noisy/overly persnickety. Hopefully, this will narrow the backlog down to something tough-but-manageable (like a few hundred oddball problems instead of several thousand identical problems).
2. Try to engage other developers -- so that they can help with the remaining backlog. Try to get some influential folks to help (and to be loud about how they're helping). Take whatever advice you give the others -- and do it yourself. Be prepared to hold some hands. Lead by example.
3. After there's been a good chunk of progress, do a last round to figure out anything important which has fallen through the cracks.
4. Setup CI -- so that the ruleset continues to be enforced automatically in the future (roughly the kind of gate sketched after this list).
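For illustration, the CI gate could be as simple as something like this (the ruleset file and directories are placeholders, not the actual civicrm-core setup):

```bash
# Hypothetical Jenkins shell step; phpcs exits non-zero when violations remain, failing the job.
composer require --dev squizlabs/php_codesniffer
vendor/bin/phpcs -p --standard=phpcs.xml.dist --extensions=php CRM/ api/
```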
Notably, the bulk of the project was in steps 1-3, and those steps didn't require any special access to the infrastructure -- it needed just a local dev environment, Github, and chat/email. That's a good thing, because it means there aren't as many roadblocks, and there's a wider pool of developers (potential mentors) who'd be qualified to help.
Special thanks @totten. Nice breakdown. I will continue working out what would be feasible for everyone. I will ask more questions as I continue my research. Thanks again.
Hello @JoeMurray @totten, apart from civicrm-core, which other repositories here or elsewhere do you think would potentially benefit from static analysis tool integration?
Thanks @JoeMurray, I will see what I can come up with. Hopefully we can get mentors conversant with the above repositories and willing to help on this project.
JoeMurray changed title from Integrate an open source Static Analysis tool as part of CIVICRM continuos integration toolchain to Integrate an open source Static Analysis tool as part of CIVICRM continuous integration toolchain
If I recall the outcome here, there was some initial work that ran static code analysis against the CiviCRM codebase. This preliminary step generated so many noisy failures that it was not useful. The rules used in the analysis were not tuned enough to bring the noise down to a level where the output was helpful.
Nor was it possible to address the 'noise' by refactoring the code in a comprehensive, safe and timely way. So the initiative was helpful as exploration, but less so for improving the codebase or setting up ongoing static analysis.
@prondubuisi can you correct/supplement this analysis, or maybe provide a link to the project's report?