Infrastructure issues
https://lab.civicrm.org/groups/infra/-/issues
2024-01-05T16:53:09Z

https://lab.civicrm.org/infra/ops/-/issues/1018
Move latest.civicrm.org from OSUOSL to paella.civicrm.org (OVH)
2024-01-05T16:53:09Z
bgm

* [x] OVH: assign an IP address, add a "virtual mac" of type OVH, use the full hostname as the vMAC name
* [x] KVM server: create a ZFS volume for the VM
  * ex: `zfs create -s -V 70G [pool]/[name-of-vm]` (see `zpool status; zfs list`)
* [x] Ansible: copy a relevant example from `host_vars/[vm]` for the new server, adapt values (hostname and IPs)
* [x] Ansible: add the hostname in the `hosts` file
* [x] Ansible: add the hostname as a preseed in `host_vars/kvm-foo.example.org` (the parent server)
* [x] Ansible: generate the preseed file: `ansible-playbook -l kvm-foo.example.org --tags kvm-server-preseeds ./site.yml`
* [x] KVM server: start the installation
  * `ssh root@x[...].example.org`
  * `/etc/preseeds/[hostname]/start.sh`
* [x] Change the preseed password
* [x] Ansible (create deploy user): `ansible-playbook -l xxxx.example.org -u myuser --become-user=root --ask-become-pass ./setup.yml`
* [x] Ansible (full installation): `ansible-playbook -l xxxx.example.org ./site.yml`
* [x] Test that the VM reboots cleanly
* [x] Migrate the services from latest.civicrm.org
  * [x] latest.civicrm.org pingback service (and mysql DB)
  * [x] stats.civicrm.org (static? deprecate?)
  * [x] releaser files
* [x] Monitoring: update the host in Icinga and re-enable Icinga on the new VM (it was disabled to avoid conflicts)
* [x] Update the A/AAAA records for latest.civicrm.org and stats.civicrm.org
* [x] OVH: configure rDNS for the IPv4 and IPv6 addresses
* [x] Backups: double-check that backups are running (they were initially disabled, so as not to cause conflicts with the current production VM)
* [x] Verify that monitoring is green
* [ ] Shut down the old VM?
Old DNS records:
- 140.211.167.189
- 2605:bc80:3010:102:0:3:5:0
New DNS records:
- 192.95.2.135
- 2607:5300:203:6713:700::

https://lab.civicrm.org/infra/ops/-/issues/1017
Test runs do composer install before applying the PR patch, meaning changes to composer.json/lock aren't being tested
2023-12-29T22:43:01Z
DaveD

e.g. https://github.com/civicrm/civicrm-core/pull/28813
Possibly because of https://github.com/civicrm/civicrm-buildkit/commit/3081d83c648f6aa244f33380e7a3054deb515bf7#diff-652e839cd6819e54de3eafe2ac9126fa3da1d288d271a0fa681f28048111fa22R25 ?

https://lab.civicrm.org/infra/ops/-/issues/1016
Upgrade Java v17 for jenkins and nodes
2023-12-11T20:28:09Z
bgm

https://lab.civicrm.org/infra/ops/-/issues/1014
Deprecate the l10n tar.gz
2024-02-07T19:23:27Z
bgm

If [PR#28139](https://github.com/civicrm/civicrm-core/pull/28139) is merged, we could get rid of the civicrm-l10n.tar.gz once the last ESR that relies on it is unsupported.
- [ ] Make sure that install docs and i18n guide (wiki page) have instructions about the new way to install languages since https://github.com/civicrm/civicrm-core/pull/28061 (CiviCRM 5.69)
- [ ] dev/drupal#193 Deprecate the old installer (including from the Drupal7 drush module)
- [ ] Remove it from buildkit builds
- [ ] Remove the link from the civicrm.org/download page
- [ ] Stop generating the l10n tarballs

https://lab.civicrm.org/infra/ops/-/issues/1013
mysql 8.0.29 being used for "max" but it can't even be downloaded anymore because it has a bad bug
2023-11-12T20:09:23Z
DaveD

It came up in another context that 8.0.29 is what's being used, but see https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html
> This release is no longer available for download. It was removed due to a critical issue that could cause data in InnoDB tables having added columns to be interpreted incorrectly. Please upgrade to MySQL 8.0.30 instead.
I don't know whether the version in use comes from a particular distribution that backports patches, in which case it may not be affected, but in any case the latest release is 8.0.35.

https://lab.civicrm.org/infra/ops/-/issues/1011
Moving services off the OSUOSL cluster
2024-01-04T22:26:43Z
bgm

The OSUOSL cluster was set up around 2014-2015, close to 10 years ago. The machines have been reliable and very cost-effective, thanks to OSUOSL. However, for performance and for planning the future, we should start slowly moving some services off those machines. Notably, when we had incidents upgrading some VMs, restoring MySQL backups was extremely slow.
The following VMs are on OSUOSL:
- lab.civicrm.org
- chat.civicrm.org
- www-prod.civicrm.osuosl.org (mostly docs, and some services, such as community-messages)
- latest.civicrm.org (stats/pingbacks)
- test-2.civicrm.org (jenkins node)
Less critical:
- manage.civicrm.osuosl.org (used mostly as an Ansible ProxyJump host, but some servers might still be configured to use LDAP)
Not used / mostly shut down:
- cxnapp-2
- www-cxn-2
We currently have VMs also at Linode (www-prod-2) and OVH (paella and test-3). The `test-3` server is dedicated to running tests, but paella runs these services:
- botdylan.civicrm.org (github bot)
- test-1.civicrm.org
- www-demo (for sandbox sites)
- www-test (not used, was for hosting a test site for civicrm.org)
Paella has around 230 GB of free disk space.
Internal reference: https://chat.civicrm.org/civicrm/pl/ocbbd61xsbg9xphwrjk4eq8wsw

https://lab.civicrm.org/infra/ops/-/issues/1010
extdir: upgrade to a more recent PHP version (ideally 8.0 or later)
2023-10-25T17:37:22Z
bgm

Currently runs on PHP 7.2.
civicrm.org runs on PHP 8.0.
Repo: https://lab.civicrm.org/infrastructure/extdir.git

https://lab.civicrm.org/infra/ops/-/issues/1009
Extension directory times out when uncached
2023-11-01T14:26:49Z
JonGold

Civi's Guzzle connections time out after the number of seconds specified in the `http_timeout` setting, which is 5 seconds by default.
As the extension directory has grown, the amount of time needed to generate the results of, say, `https://civicrm.org/extdir/ver=5.68.alpha1|uf=Drupal|status=stable|ready=` now exceeds 5 seconds.
This leads to timeouts when accessing the directory - difficult to troubleshoot because the next time, the cached results are returned quickly.
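The symptom is easier to see in a toy model. The sketch below is not Civi's or Guzzle's code: `SimulatedExtDir`, `fetch`, and the 7-second and 0.1-second figures are hypothetical. It also assumes the server finishes building and caching the result even after the client has given up, which the quick second response suggests. Only the 5-second default comes from the description above.

```python
DEFAULT_HTTP_TIMEOUT = 5.0  # seconds; the default `http_timeout` described above


class SimulatedExtDir:
    """Toy stand-in for the extension directory (hypothetical, no network)."""

    def __init__(self, generation_seconds, cached_seconds=0.1):
        self.generation_seconds = generation_seconds
        self.cached_seconds = cached_seconds
        self._cache = set()

    def response_time(self, query):
        """Return how long the simulated server takes to answer `query`."""
        if query in self._cache:
            return self.cached_seconds
        # Assumption: the server finishes and caches the result even if
        # the client has already stopped waiting.
        self._cache.add(query)
        return self.generation_seconds


def fetch(server, query, timeout=DEFAULT_HTTP_TIMEOUT):
    """Return "ok" if the simulated response arrives within `timeout`."""
    return "ok" if server.response_time(query) <= timeout else "timeout"


extdir = SimulatedExtDir(generation_seconds=7.0)
query = "ver=5.68.alpha1|uf=Drupal|status=stable|ready="
first = fetch(extdir, query)   # uncached: 7.0 s exceeds the 5 s timeout
second = fetch(extdir, query)  # cached: 0.1 s, well within the timeout
```

Passing a larger `timeout` for this one request makes the first call succeed too, which mirrors the per-request override this issue proposes as a fallback.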
I thought I'd raise this as an infra issue first. If reducing the time needed to generate the results isn't possible, I'll submit a PR to raise the connection timeout specifically when loading the extension directory.

https://lab.civicrm.org/infra/ops/-/issues/1008
promtail: ansible config for nginx logs on Gitlab
2023-09-29T15:57:48Z
bgm

I updated the promtail config on most webservers, but need to add some adjustment in Ansible so that promtail can ingest this file on lab.c.o : `/var/opt/gitlab/nginx/logs/gitlab_error.log`
(I had mostly used a template that was designed for standard Debian servers, where nginx logs are in `/var/log/nginx/access.log`)
and latest.civicrm.org logs are missing too

Assignee: bgm

https://lab.civicrm.org/infra/ops/-/issues/1007
Move civicrm extension extraction to Gitlab Pipeline
2023-09-24T18:54:22Z
bgm

I find the current Jenkins job to update extensions on Transifex inefficient to do on a daily basis:
- it fetches all extensions in the directory (filters by those available in-app)
- looks for new releases
- extracts and updates Transifex
Afterwards it does other things, such as fetching Transifex translations, committing to the repo, and building the mo files. Those things make sense on a daily basis.
Sometimes we want to re-run the extraction on a single extension. For example, recently we had an issue with the gdpr extension, because the maintainers are mixing `vX.Y` and `X.Y` tags.
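To make the tag problem concrete, here is a minimal sketch of how mixed `vX.Y` and `X.Y` tags could be normalized before picking the newest release. It is not the logic of the existing Jenkins job or of the pipeline; `normalize_tag` and `latest_release` are hypothetical helpers, and it assumes tags are purely numeric apart from an optional leading `v`.

```python
import re

# Optional leading "v", then dot-separated integers (e.g. "v3.2", "3.10").
_TAG_RE = re.compile(r"^v?(\d+(?:\.\d+)*)$")


def normalize_tag(tag):
    """Return a tuple of ints for tags like 'v3.2' or '3.2', else None."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def latest_release(tags):
    """Pick the tag with the highest version among mixed 'vX.Y' / 'X.Y' tags."""
    versions = {}
    for tag in tags:
        key = normalize_tag(tag)
        if key is not None:
            versions.setdefault(key, tag)  # keep the first spelling seen
    if not versions:
        return None
    return versions[max(versions)]
```

Comparing integer tuples rather than strings avoids two pitfalls at once: `v3.2` and `3.2` are treated as the same version, and `3.10` sorts after `3.2`.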
I did a test to move the "update transifex" process to a Gitlab Pipeline. Personally, I like that Gitlab uses Docker to manage the environment, so it's clearer/self-documented how the job is set up.
The proof of concept can be seen here:
https://lab.civicrm.org/dev/translation/-/blob/master/.gitlab-ci.yml
Note: the config is split in two tasks, so that for testing we can more easily run only one or the other (extract / commit). We can merge them once it's well-tested.
What's missing:
- [x] Configure the Transifex token in the Gitlab CI/CD settings of the project
- [x] Configure a Github token so that the pipeline can commit (personal token added to my `mlutfy-civicrm` account)
- [ ] Set up a Gitlab webhook so that we can trigger the pipeline for new releases
- [ ] Call the webhook when new releases are published on civicrm.org (extdir module, `modules/custom/extdir/extdir.drush.inc`)

https://lab.civicrm.org/infra/ops/-/issues/1005
Migrate all backups from rdiff-backup to borg/borgmatic
2024-01-15T01:38:58Z
bgm

Priority:
- [x] #970 lab.civicrm.org - runs Ubuntu 22.04
- [x] www-prod.civicrm.osuosl.org - runs Ubuntu 22.04
- [x] latest.civicrm.org - runs Ubuntu 20.04
- [x] chat.civicrm.org - runs Ubuntu 20.04
- [x] spark-1.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), then Debian Bookworm
- [x] spark-2.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), then Debian Bookworm (spark-2 was already running borg, because we back up to an EU server)
- [x] www-prod-2.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), ~~then Debian Bookworm~~ (done)
- [x] www-prod-2: also back up to backups-1.c.o
Followed by:
- [ ] botdylan.civicrm.org - needs upgrade to Debian Bullseye, then Debian Bookworm
- [x] test.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), then Debian Bookworm
- [ ] www-demo.civicrm.org (rdiff currently broken) - needs upgrade to Debian Bookworm
Low priority:
- [ ] backups-1.civicrm.org
- [ ] barbecue.civicrm.org
- [ ] padthai.civicrm.org
- [ ] paella.civicrm.org
- [ ] test-1.civicrm.org
- [ ] test-2.civicrm.org
- [ ] test-3.civicrm.org
These can probably be ignored:
- [x] cxnapp-2.civicrm.org (offline)
- [x] www-cxn-2.civicrm.osuosl.org (offline)
- [x] manage.civicrm.osuosl.org (not used anymore, except as a ProxyJump)
- [x] www-test.civicrm.org (offline?)
For each server:
- Verify includes/excludes
- Set up with Ansible
- Update monitoring
We have 116 GB available on sushi and 180 GB used, so presumably we will run out of space if we do them all at once, instead of waiting a bit to purge some old rdiff backups (after .. 6 months?). That said, Gitlab is one of the bigger backups, and it's already cleaned up.

https://lab.civicrm.org/infra/ops/-/issues/1004
(Test Systems) Java out-of-memory leads to zombie worker
2023-09-07T20:34:42Z
totten

(Originated on MM chat: https://chat.civicrm.org/civicrm/pl/otynrhfqeintdbwqpnjjakjdia. Note that it starts out with two different problems; one rc tarball problem is cleared up quickly. This issue is about the other problem. I'm taking the observations about it and trying to compose a full hypothesis of the problem.)
Suppose you have a job like https://test.civicrm.org/job/CiviCRM-Core-Matrix-PR/4551/BKPROF=dfl,SUITES=phpunit-crm,label=bknix-tmp/console -- the job is interrupted because one of the Java based agents (Jenkins master or Jenkins agent) runs out of memory.
```
Installing build4test_qpf3m database
ok 1806 - CRM_Dedupe_MergerTest::testBatchMergeSelectedDuplicates
ok 1807 - CRM_Dedupe_MergerTest::testBatchMergeAllDuplicates
ok 1808 - CRM_Dedupe_MergerTest::testGetCidRefs
ok 1809 - CRM_Dedupe_MergerTest::testGetMatches
ok 1810 - CRM_Dedupe_MergerTest::testGetMatchesExcludeDeleted with data set #0 (true)
ok 1811 - CRM_Dedupe_MergerTest::testGetMatchesExcludeDeleted with data set #1 (false)
ok 1812 - CRM_Dedupe_MergerTest::testGetMatchesIgnoreLocationType
ok 1813 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaMatched
ok 1814 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaMatchedWithLimit
ok 1815 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaMatchedWithSearchLimit
ok 1816 - CRM_Dedupe_MergerTest::testGetMatchesNoCriteria
ok 1817 - CRM_Dedupe_MergerTest::testGetMatchesNoCriteriaButLimit
ok 1818 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaNotMatched
FATAL: command execution failed
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:102)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected reader termination
at hudson.remoting.SynchronousCommandTransport$ReaderThread.lambda$new$1(SynchronousCommandTransport.java:50)
at java.base/java.lang.Thread.dispatchUncaughtException(Thread.java:1997)
Caused: java.io.IOException: Backing channel 'test-4' is disconnected.
at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
at com.sun.proxy.$Proxy74.isAlive(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1215)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1207)
at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:164)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
at hudson.model.Run.execute(Run.java:1900)
at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
at hudson.model.ResourceController.execute(ResourceController.java:107)
at hudson.model.Executor.run(Executor.java:449)
FATAL: Unable to delete script file /tmp/jenkins8376836365637261492.sh
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:102)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected reader termination
at hudson.remoting.SynchronousCommandTransport$ReaderThread.lambda$new$1(SynchronousCommandTransport.java:50)
at java.base/java.lang.Thread.dispatchUncaughtException(Thread.java:1997)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@4a72872d:test-4": Remote call on test-4 failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:993)
at hudson.FilePath.act(FilePath.java:1186)
at hudson.FilePath.act(FilePath.java:1175)
at hudson.FilePath.delete(FilePath.java:1722)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:163)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:164)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
at hudson.model.Run.execute(Run.java:1900)
at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
at hudson.model.ResourceController.execute(ResourceController.java:107)
at hudson.model.Executor.run(Executor.java:449)
Build step 'Execute shell' marked build as failure
ERROR: Step ‘Publish xUnit test result report’ failed: no workspace for CiviCRM-Core-Matrix-PR/BKPROF=dfl,SUITES=phpunit-crm,label=bknix-tmp #4551
Finished: FAILURE
```
Here's what happens next:
* Jenkins kills the communication channel.
* Jenkins assumes that the worker-node kills any ongoing work for the test-job.
* Jenkins establishes a new communication channel and begins running new jobs.
* **But** the worker-node did *not* kill everything. (*I'm not clear exactly what it did do -- eg if any POSIX signals were sent; eg if worker processes are running or suspended.*) For example, `mysqld` is present in the process-table, and it retains a hold on TCP port 5601.
* When Jenkins begins another job, it finds the worker-image (`/home/dispatcher/images/bknix-dfl-2.img`) is in use. In fact, all of the images are in use. So it creates a new one (`bknix-dfl-5.img`).
* When Jenkins starts using `bknix-dfl-5.img`, it tries to launch a new mysqld on TCP port 5601. But it can't; the port is conflicted. You get problems [like this](https://test.civicrm.org/job/CiviCRM-Core-Matrix-PR/4553/BKPROF=dfl,SUITES=phpunit-api4,label=bknix-tmp/console):
```
[mysql] Start daemon: mysqld --datadir="/home/homer/_bknix/ramdisk/worker-3/mysql/data"
[mysetup] Initialize folder: /home/homer/_bknix/ramdisk/worker-3/mysetup
Waiting for MySQL (maxWait=300, interval=0.5, windDown=0.5)...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/home/homer/_bknix/ramdisk/worker-3/mysql/run/mysql.sock' (2)
```
If you have several jobs running at the moment of `OutOfMemory`, then you may wind up doing this multiple times (e.g. 3 jobs die; 3 zombies left behind; 3 new images created; 3 tcp ports blocked).
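The port conflict described above could be detected before a new `mysqld` is launched. The following is only a sketch under assumptions: `port_in_use` and `allocate_port` are hypothetical helpers, not part of bknix or the Jenkins jobs, and the probe only checks the loopback address.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. something holds it.

    Any OSError counts as "in use", which also covers ports this user is
    not permitted to bind.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


def allocate_port(candidates, in_use=port_in_use):
    """Return the first candidate port that is not in use."""
    for port in candidates:
        if not in_use(port):
            return port
    raise RuntimeError("no free port among candidates")
```

For example, `allocate_port(range(5601, 5610))` would skip 5601 while a leftover `mysqld` still holds it. There is an inherent race between probing and the later bind by the real daemon, so a check like this narrows the problem rather than eliminating it.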
----------
> (Follow-up) *I'm not clear what the exact status is -- if any POSIX signals sent; if worker processes are running or suspended.*
What I did observe was that these zombie processes were still around ~2 hours after the original. But after ~2h30m, they had gone away on their own.
If the jobs were quietly executing in a headless fashion, then they should've wrapped up in <30min. So they're probably not running -- it suggests they're somehow suspended (and then some other timeout/reaper mechanism comes in after 2hr). But this is pure speculation.
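One way to move from speculation to observation would be to read the state of each leftover process from `/proc` on the worker node. This is a minimal, Linux-only sketch; `parse_state` and `process_state` are hypothetical helper names, and the PIDs would have to be found separately (e.g. with `ps`).

```python
from pathlib import Path


def parse_state(status_text):
    """Extract (code, label) from the 'State:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split(None, 2)  # e.g. ["State:", "T", "(stopped)"]
            if len(fields) < 2:
                continue
            label = fields[2].strip("()") if len(fields) > 2 else ""
            return fields[1], label
    return None


def process_state(pid):
    """Read the state of a live process on Linux; None if it is gone."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except OSError:
        return None
    return parse_state(text)
```

A code of `T` would support the "suspended" theory, while `S` or `R` would mean the process is still schedulable. A true zombie (`Z`) has already exited and released its sockets, so a `mysqld` that still holds a TCP port is an orphan rather than a zombie in the strict sense.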
---------
Brainstorming...
* Maybe figure out which java process ran out of memory -- and why.
* Maybe figure out what - if any - signals are being emitted when this OOM happens. Find a way to kill the zombies/orphans properly.
* Maybe introduce some firmer timeouts within the jobs. (We usually rely on Jenkins to timeout jobs; but obviously that doesn't work here. Try sprinkling `timeout` calls into `CiviCRM-Core-Matrix-PR.job` or similar leverage-point.)
* Maybe change the port-allocation function.
* Maybe enable network-namespaces for any non-interactive jobs.

Assignee: totten

https://lab.civicrm.org/infra/ops/-/issues/1003
Investigate SPF "Soft fail"
2023-09-07T01:33:13Z
totten

I got one of the emails from the `civicrm.org` security announcements. In Gmail, there's an option "Show original" which has an interesting report. The message makes it to my inbox (perhaps because of the history; perhaps because of the DKIM), but it shows a failure about SPF.
@bgm I'm not sure how we're routing mail right now. Perhaps we need some DNS tweak?
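For context, a `softfail` result means the sending IP matched only a `~`-qualified mechanism, typically a trailing `~all`, so no earlier mechanism in the record covered that IP. The sketch below reads the `all` qualifier from an SPF record string. The actual `civicrm.org` record is not shown in this issue, so the example record in the comment is hypothetical, as is the helper name `spf_all_result`.

```python
# SPF qualifiers and the result each one produces (RFC 7208).
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_all_result(record):
    """Return the result the `all` mechanism yields, or None if absent.

    Example (hypothetical record):
        "v=spf1 mx include:_spf.example.org ~all"
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        qualifier, mechanism = "+", term  # "+" is the default qualifier
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        if mechanism.lower() == "all":
            return QUALIFIERS[qualifier]
    return None
```

If the record ends in `~all`, the likely DNS tweak is to add the sending server (an `ip4:`, `ip6:`, `a`, `mx`, or `include:` term) ahead of it, rather than to change the `all` qualifier.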
> ![Screen_Shot_2023-09-06_at_6.24.14_PM](/uploads/156f371c5e9e404b73c971535be156cf/Screen_Shot_2023-09-06_at_6.24.14_PM.png)

https://lab.civicrm.org/infra/ops/-/issues/1002
Migrate/integrate download.civicrm.org with civicrm.org
2023-09-01T10:02:50Z
totten

In some side discussion about #1001, @colemanw and @bgm suggested migrating or integrating `download.civicrm.org` with `civicrm.org`. I wanted to record an issue to capture this.
How:
* Add a D9/D10 module on `civicrm.org` which either:
  1. Migrates the PHP logic from the `download.civicrm.org`, or
  2. Forwards HTTP sub-requests to `download.civicrm.org`.
Upshots:
* This lets you inherit the navigation, site-wide theming, and analytics.
* It's written with Symfony page-controllers and Twig, which are also supported by D9/D10.
* It doesn't have any interdependencies on Drupal content ("nodes" and "files"), so it should be fairly easy to install/maintain such a module on a local dev-site.
There are a few things to bear in mind:
* `download.civicrm.org` has a few areas of functionality: autobuild info (eg https://download.civicrm.org/latest/), redirects (eg `https://download.civicrm.org/civicrm-X.Y.Z-foo.tar.gz`), and release info (eg https://download.civicrm.org/about/). Each has a few subpages/features.
* Its basic purpose is to list/filter/cache information about the available builds (from Google Cloud Storage). It blends in some additional data from (1) release-notes in Github and (2) JSON files provided by each build.
* From the POV of a general reader on `civicrm.org`, some functionality (like "inspecting the git input used by a candidate build") is niche. But it's still useful for release-management. Migrating/integrating means you may have to reconcile more opinions about what to present.
* It's not currently designed around composable/mixable `block`s. It's just a couple HTML pages. But in Drupal, in the long-run, it probably makes sense to do more of the `block` stuff.

https://lab.civicrm.org/infra/extdir/-/issues/8
allow sorting (e.g. "Newest") in Extensions Directory
2023-06-16T18:58:12Z
AllenShaw

I'm trying to encourage people to keep up with new extensions by periodically checking the Extensions Directory. This would be a lot easier if users could sort by one or more of these options:
1. Created date for the Drupal node representing the extension
2. Date of latest release
Since the directory is a View, it seems like \#1 would be pretty easy; not sure about \#2, but would be nice.
Any thoughts @bgm?

https://lab.civicrm.org/infra/stats-collection/-/issues/15
Cleanup old data
2023-04-03T15:51:03Z
bgm

The tables have become rather huge and not all of it is relevant.
```
--- /var/lib/mysql/stats
8.9 GiB [##########] extensions.MYD
6.5 GiB [####### ] entities.MYD
3.5 GiB [### ] entities.MYI
2.6 GiB [## ] extensions.MYI
1.8 GiB [## ] stats.MYD
```

Assignee: bgm

https://lab.civicrm.org/infra/ops/-/issues/988
Provide demo-triage sites for multiple versions
2023-02-22T23:37:29Z
totten

## Goal
Provide demo sites for several recent versions. At time of writing, that would mean (say) `5.59-rc`, `5.58-stable`, `5.57`, `5.56`, and so on (perhaps going back 6-9 months).
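As an illustration of which sites such a window implies, the sketch below lists the versions for a given release candidate. `triage_versions` is a hypothetical helper, and it assumes one minor release per month, so that `count` versions roughly correspond to `count` months.

```python
def triage_versions(rc_minor, count, major=5):
    """List one RC, one stable, then older minors, newest first.

    Assumption: one minor release per month, so `count` ~ months covered.
    """
    if count < 2:
        raise ValueError("count must cover at least the RC and the stable")
    if rc_minor - count + 1 < 0:
        raise ValueError("count reaches below minor version 0")
    versions = [f"{major}.{rc_minor}-rc", f"{major}.{rc_minor - 1}-stable"]
    for minor in range(rc_minor - 2, rc_minor - count, -1):
        versions.append(f"{major}.{minor}")
    return versions
```

For instance, `triage_versions(59, 4)` gives `5.59-rc`, `5.58-stable`, `5.57`, and `5.56`, matching the example in the goal statement.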
## Motivation
The current demo sites are often used to determine whether a bug is easy to reproduce. However, there is another common question when triaging a bug: *When was the bug introduced?* (This helps you determine the specific cause and helps to establish the priority for the bug.)
Even for somebody who has a local dev system and knows how to switch around builds, it usually takes 5+ minutes just to set up an environment for evaluating the bug on one specific version. For folks with fewer resources, it may take longer or be impossible.
## Obstacle
The main reason we haven't done this is security -- if you run a 9 month old version, then the odds are that it has some known/published security vulnerabilities. Of course, I don't think we should worry that Civi contributors will abuse such sites. The problem is about creating an easy target for bots and scriptkids.
We want to find a simple way to facilitate triage while keeping out the riffraff.
## Brainstorming Solutions (Dislike)
* _HTTP Basic (Static/Shared)_: If it leaks to the riffraff, then it's easy to abuse. Changing is a problem for anyone who uses the system. Also, it obfuscates testing of Civi-CMS functionality that involves HTTP Basic.
* _HTTP Basic (LDAP/civicrm.org)_: I don't think we should submit real `civicrm.org` credentials directly to a hackable box. Also, it obfuscates testing of Civi-CMS functionality that involves HTTP Basic.
* _Port Knocking_: Same leakage problem as "HTTP Basic (Static/Shared)". Also, fairly obscure.
## Brainstorming Solutions (Like)
* _OpenID Connect_: If you try to access an older demo site, it redirects to a web-page that shows a warning and does some kind of identity check (e.g. `civicrm.org` account or `github.com` account or `contributer-key.yml` entry). Then it redirects back and sets a cookie. (*The access cookie should be separate from the one usually used by PHP/CMS/Civi.*)
  * _Good_: No special setup on the user's workstation. Just a redirect in the web-browser.
  * _Bad_: Only works well in web-browser. Using `curl`, `mysql`, etc is annoying.
* _Private Network_: Put older demos on a private network. Connect through some kind of VPN (e.g. "IKEv2", "OpenVPN") or mesh overlay (e.g. "Tailscale", "Nebula").
  * _Good_: Works with most tools+protocols (*browser, curl, mailcatcher, mysql, etc*)
  * _Bad_: Requires special setup on workstation. Public wifi (cafe/library/train/etc) may block access.

https://lab.civicrm.org/infra/gitlab/-/issues/43
Gitlab Bot: Identify newbies
2022-10-31T12:34:11Z
bgm

Related: community/community-engagement#24
- [ ] Post to the mattermost channel for community relations
- [ ] Add a Gitlab label to remind us to be nice

Assignee: bgm

https://lab.civicrm.org/infra/gitlab/-/issues/42
Gitlab Bot: When a PR is posted in an issue, add label "has-PR", gather stats in civicrm.org
2022-12-03T21:08:09Z
bgm

Related to community/community-engagement#24
This will help identify issues that need review or that can be closed (if the PR has been merged but we forgot to close).
- [x] Add "has-pull-request" label to issues that mention a PR
- [x] Keep some stats in civicrm.org (last active, etc, see comments)
- [ ] Add webhooks everywhere (still todo: extensions)
- [ ] Modify auto-close bot so that issues with a PR are pinged, but not closed

Assignee: bgm

https://lab.civicrm.org/infra/gitlab/-/issues/40
Automated reminders for Issues in Gitlab
2022-09-13T07:26:33Z
justinfreeman (Agileware)

It would be good if there was a way in Gitlab to automatically send users email reminders of issues that they have reported or been assigned, which need action. Unless we are actively working on a project or support request, or it's a personal project, we have no way to be reminded and triggered to come back and work on a specific CiviCRM issue. And there are many CiviCRM issues that still need work!
To solve this problem, it would be good to implement automated reminders for Issues in Gitlab so that the Reporter and Assignee can be reminded on a weekly basis that there is an Issue open that needs their attention.
There should be a method for disabling the reminder on a per issue basis (sssh!) or for all issues. I think it would be reasonable if the automated reminder was enabled by default for all users, provided there was a method of disabling.
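The selection rule described so far (open issues, Reporter and Assignee, with per-issue and per-user opt-outs) can be sketched as follows. This is only an illustration: the `Issue` class, its fields, and `weekly_reminders` are hypothetical and do not correspond to the Gitlab API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issue:
    """Hypothetical, minimal view of an issue for reminder purposes."""
    iid: int
    reporter: str
    assignee: Optional[str] = None
    is_open: bool = True
    reminders_muted: bool = False  # per-issue opt-out


def weekly_reminders(issues, opted_out_users=frozenset()):
    """Map each user to the sorted open issue IDs they should be reminded of."""
    reminders = {}
    for issue in issues:
        if not issue.is_open or issue.reminders_muted:
            continue
        # A set avoids a double reminder when reporter and assignee match.
        for user in {issue.reporter, issue.assignee}:
            if user is None or user in opted_out_users:
                continue
            reminders.setdefault(user, []).append(issue.iid)
    return {user: sorted(iids) for user, iids in reminders.items()}
```

Grouping by user rather than by issue means each person would get a single weekly digest instead of one email per issue, which is a design choice of this sketch rather than something the request specifies.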
If someone has taken the time and effort to report an issue, then there is a reasonable expectation that they will also be interested in following up, providing more information or marking it as closed if the problem has been resolved. However, if the system does not remind them about the issue, then it is easily lost in the noise and to be honest with you, can be quite hard to find in Gitlab.
Given that there is no such system in place currently, the chances of issues being reported and not followed up on are pretty high.
If someone has been assigned an Issue then it would be good to remind them of that Assignment. If there is a bug report which requires more information, then assign the issue back to the Reporter along with the request for more details.
Raising this request in response to my own note on https://lab.civicrm.org/dev/core/-/issues/3750#note_80079