ops issues: https://lab.civicrm.org/infra/ops/-/issues

# #493 INFRA-260 Monthly member payments not triggering receipts (josh, 2019-01-03)
https://lab.civicrm.org/infra/ops/-/issues/493

A handful of partners and members have noticed that the monthly membership payments on civicrm.org do not trigger receipts. Most notable is contact ID 8275 (I've been manually triggering receipts).

# #546 INFRA-206 cxnapp: Allow services to detect civicrm.org memberships (totten, 2019-04-04)
https://lab.civicrm.org/infra/ops/-/issues/546

We'd like to support agreements where CiviCRM Members (on civicrm.org) can receive extra services. For services based on https://github.com/civicrm/cxnapp/, this means validating that the callback URL is associated with an active membership.

# #843 padthai: faulty disk is causing performance issues (bgm, 2019-01-03)
https://lab.civicrm.org/infra/ops/-/issues/843

Current status:
* [x] Replace all 3 disks in padthai.c.o
* [x] Re-install padthai OS from scratch
* [x] Configure ~~test-ubu1204-5.c.o~~ test-1.c.o
* [x] (moved to #863) Configure ~~test-ubu1604-1.c.o~~ test-2.c.o
* [x] Restore botdylan.c.o
* [x] Restore www-test.c.o
* [ ] Ensure that backups are correctly configured on padthai/test-1/test-2, and that relevant files are backed up (i.e. anything other than `/etc`).
Initial ticket: What seems to be a faulty disk is causing performance issues.
```
root@padthai:~# zpool status
pool: zpadthai
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zpadthai    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda5    ONLINE       0     0     0
            sdb5    ONLINE       0     0     0
            sdc5    ONLINE       3     0     0
errors: No known data errors
```
```
# dmesg | grep sd
[54960868.851963] sd 0:0:2:0: [sdc] tag#3 CDB: Read(10) 28 00 1d 3a f4 90 00 00 f8 00
[54960868.859534] sd 0:0:2:0: [sdc] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[54960868.860477] sd 0:0:2:0: [sdc] tag#3 Sense Key : Medium Error [current]
[54960868.861381] sd 0:0:2:0: [sdc] tag#3 Add. Sense: Unrecovered read error
[54960868.862272] sd 0:0:2:0: [sdc] tag#3 CDB: Read(10) 28 00 1d 3a f4 90 00 00 f8 00
[54960868.863147] blk_update_request: critical medium error, dev sdc, sector 490402960
[54962799.895947] sd 0:0:2:0: [sdc] tag#1 CDB: Read(10) 28 00 0b 6a ec 40 00 00 18 00
[54962799.903136] sd 0:0:2:0: [sdc] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[54962799.904018] sd 0:0:2:0: [sdc] tag#0 CDB: Read(10) 28 00 0b 6a f3 90 00 00 08 00
[54962799.904884] blk_update_request: I/O error, dev sdc, sector 191558544
[54962799.905730] sd 0:0:2:0: [sdc] tag#2 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[54962799.906568] sd 0:0:2:0: [sdc] tag#2 CDB: Read(10) 28 00 0b 6a f9 50 00 00 10 00
[54962799.907392] blk_update_request: I/O error, dev sdc, sector 191560016
[54962799.908224] sd 0:0:2:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[54962799.909032] sd 0:0:2:0: [sdc] tag#1 Sense Key : Medium Error [current]
[54962799.909828] sd 0:0:2:0: [sdc] tag#1 Add. Sense: Unrecovered read error
[54962799.910607] sd 0:0:2:0: [sdc] tag#1 CDB: Read(10) 28 00 0b 6a ec 40 00 00 18 00
[54962799.911373] blk_update_request: critical medium error, dev sdc, sector 191556672
```

# #855 c-i: Add in monitoring of MySQL Queries (seamuslee, 2018-10-16)
https://lab.civicrm.org/infra/ops/-/issues/855

@bgm @totten
I think we should probably put in some monitoring of long queries on the 3 MySQL instances if possible, and alert if a query has been running for more than 300s. I would find it strange for any of our test jobs to run a query for longer than 300s.
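Such a long-query check could be sketched as a small filter over the server's process list. The `filter_long_queries` helper below is illustrative, not an existing script; in practice its input would come from `information_schema.PROCESSLIST` (shown in the comment).

```shell
# Illustrative helper: read "TIME<TAB>QUERY" lines and print those whose
# runtime (seconds) exceeds the threshold given as the first argument.
filter_long_queries() {
  awk -F'\t' -v t="$1" '$1+0 > t+0 { print $1 "s\t" $2 }'
}

# In practice the input would come from the server, e.g.:
#   mysql -N -B -e "SELECT TIME, INFO FROM information_schema.PROCESSLIST
#                   WHERE COMMAND <> 'Sleep'"
printf '12\tSELECT 1\n450\tSELECT * FROM civicrm_contact\n' \
  | filter_long_queries 300
```

A cron job could pipe any non-empty output into the existing alerting channel.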
# #861 Drupal demo sites failed to rebuild because of missing sendmail (bgm, 2018-11-13)
https://lab.civicrm.org/infra/ops/-/issues/861

Since 2018-11-09 01:00 UTC, Drupal demo sites (dmaster/dcase) failed to rebuild because of missing sendmail:
```
Starting Drupal installation. This takes a few seconds ... [ok]
sh: 1: /usr/sbin/sendmail: not found
WD mail: Error sending e-mail (from admin@example.com to [error]
admin@example.com).
Installation complete. User name: admin User password: ZUxW5Ai0soy4 [ok]
Unable to send e-mail. Contact the site administrator if the problem [error]
persists.
+ set +x
```

# #883 Backup lab.c.o and chat.c.o to backups-1.c.o (bgm, 2022-12-14)
https://lab.civicrm.org/infra/ops/-/issues/883

Currently only backed up to sushi.
* [ ] lab.c.o
* [ ] chat.c.o
but both seem pretty critical.bgmbgmhttps://lab.civicrm.org/infra/ops/-/issues/893Develop regular gitlab/mattermost crons to feed data to CiviCRM2019-08-29T14:43:40Zjoshjosh@civicrm.orgDevelop regular gitlab/mattermost crons to feed data to CiviCRMDevelop regular crons to feed data to CiviCRM from GitLab and MatterMost. Ideally, we are able to delineate in CiviCRM users that are actively contributing vs. those that are not or are no longer. Currently, the contributor log is only t...Develop regular crons to feed data to CiviCRM from GitLab and MatterMost. Ideally, we are able to delineate in CiviCRM users that are actively contributing vs. those that are not or are no longer. Currently, the contributor log is only tracking/valuing contributions within a rolling 12 months, so for the sake of consistency it makes sense to base "active" status on participation within the past rolling 12 months. That said, ideally we're able to differentiate between past contributors (i.e. people that have participated in some manner) vs. those that have registered but not participated.
Notes from chat with @bgm
- active mattermost user
- active gitlab user
- ex: commented in the past X months.

# #895 For in-app distribution, rename "CiviMobile" to "CiviMobile Web App" (totten, 2019-12-10)
https://lab.civicrm.org/infra/ops/-/issues/895

Goal: The extensions `com.webaccessglobal.module.civimobile` and `com.agiliway.civimobileapi` are both called "CiviMobile", but the pretty names should each have an extra descriptor to disambiguate, i.e.
* "CiviMobile Web App" (`com.webaccessglobal.module.civimobile`)
* "CiviMobile API" (`com.agiliway.civimobileapi`)
Caveat: Ensure that the web URL for the published CiviMobile extension remains the same or provides a redirect.

# #904 Proactively restart mysqld on test nodes (totten, 2019-07-10)
https://lab.civicrm.org/infra/ops/-/issues/904

__Issue__: MySQL periodically crashes, and we have not been able to find a concrete reason in the logs. It appears to happen most on `test-1` (which also handles the most test runs).
__Proposed Intervention__: Periodically, proactively restart mysqld.
You could easily add a Jenkins job which just restarts the daemon; however, the challenge is that some mix of concurrent jobs may be actively using mysqld. You need to wait for (or create) an opportunity to restart the daemon.
[flock](https://linux.die.net/man/1/flock) seems like it might do the job, as in:
1. Pick a naming convention for a lock file (e.g. `~/bknix-dfl/var/mysql-admin-lock`)
2. At the start of every test job (`CiviCRM-Core-PR`, `CiviCRM-Core-Matrix`, etc), wrap all the work in a call to `flock` which acquires a *shared/read lock*.
3. In some cleanup job (eg `CiviCRM-PR-Cleanup`), wrap the `mysqld restart` work in an *exclusive/write lock*.
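The shared/exclusive pattern in the steps above could look roughly like this (the lock path is from the proposal; the job bodies are placeholders for the real test and restart work):

```shell
LOCK="$HOME/bknix-dfl/var/mysql-admin-lock"
mkdir -p "$(dirname "$LOCK")"

# Test jobs hold a shared lock for their whole run; many can coexist,
# and none of them blocks another test job.
run_test_job() {
  flock --shared "$LOCK" -c 'echo "test job running; mysqld will not be restarted"'
}

# The cleanup job waits until no test job holds the shared lock,
# then takes the exclusive lock and restarts the daemon.
restart_mysqld() {
  flock --exclusive "$LOCK" -c 'echo "exclusive lock held; safe to restart mysqld"'
}

run_test_job
restart_mysqld
```

Because `flock` blocks until the lock is free, the restart naturally waits out any in-flight test jobs instead of interrupting them.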
Alternatively, https://plugins.jenkins.io/build-blocker-plugin might do the job.

# #907 alert.civicrm.org: upgrade to PHP 7.3 (community-messages) (bgm, 2021-03-21)
https://lab.civicrm.org/infra/ops/-/issues/907

* Currently runs PHP 5.6
* Service is on www-prod.c.o.o, which also runs PHP 7.3 for docs (running as php-fpm, so both can co-exist).
https://github.com/civicrm/civicrm-community-messages.git

# #910 Extension-SHA: PR tests should run on current base+head (totten, 2019-07-13)
https://lab.civicrm.org/infra/ops/-/issues/910

Report/discussion from [Mattermost](https://chat.civicrm.org/civicrm/pl/ouhttuz537fhjf1cj94rtbdaac):
> (**seamuslee**) @totten also I'm wondering if we should be doing a similar thing in the Extension-SHA job to be that it applies the commits from the PR ontop of the current branch of the ext rather than as it does now which is a git checkout <commit> ping @coleman
>
> (**totten**) @seamuslee yeah, i think i agree with that behavior, but need an example to be clear
> so if `civicrm/api4` has PR#1234 with base-branch `civicrm:56.78` and head-branch `alice:fix-foo`, then the test should be executed on the merged result of `civicrm:56.78 + alice:fix-foo`
>
> if so, that should be the behavior of https://github.com/civicrm/civicrm-infra/blob/master/jenkins-examples/Extension-SHA.bash#L62-L68
>
> which in turn is responsible for the bits of [output](https://test.civicrm.org/job/Extension-SHA/594/console) for " Build test site... `civibuild download`... `git clonebh` ..."
>
> ([git clonebh](https://github.com/civicrm/civicrm-buildkit/blob/master/bin/git-clonebh))
>
> if it's not behaving that way...then first thing is to check if the given test-run has the right inputs for `GIT_BASE` and `GIT_HEAD`
>
> (**seamuslee**) @totten i think @coleman has found that for example PR is created based on commit #5 from master (i.e. that is when it branches) but then we have a PR or something that adds commits #6, #7, #8 to master
> the PR test on the branch from #5 sometimes works, sometimes doesn't, because it isn't including the most recent commits

# #922 OVH - Boxes/billing for 2020 (migrate VMs from padthai to paella) (totten, 2020-01-31)
https://lab.civicrm.org/infra/ops/-/issues/922

I've been running some numbers re: `padthai` and `barbecue` hosting. Notes:
1. There are 5 VMs running on these boxes (`test-1`, `test-3`, `www-test`, `www-demo`, `botdylan`)
2. Currently, padthai goes through `ca.ovh.com` (which allows 1/3/6/12mo arrangements), and bbq goes through `us.ovhcloud.com` (which is strictly month-to-month).
3. `padthai` bills at a higher rate than `barbecue` ($122/mo vs $99/mo), but `barbecue` is markedly faster than `padthai`. (25% faster single-thread per Passmark). This is definitely noticeable when running PR tests. (`test-1` takes longer to run tests than `test-3`.)
4. Being able to prepay reduces the business taxes. I don't know an exact figure, but if we prepay $2k and would have otherwise paid about 30% * $2k in taxes, then I think it saves ~$600-700.
5. Rough notes/estimation: https://docs.google.com/spreadsheets/d/1FGnxC7Gki_mgU4xNcbKggsy9sglF7EsZCQ87eCRu3k4/edit?usp=sharing
I'm thinking that an optimal path might be to deploy 2x instances of the "Advance-2" (Xeon-E 2136, 64gb, 2x500 SSD) on the "ca.ovh.com" account. This would:
* Enable us to prepay more upfront (*reducing taxes*)
* Get better CPUs for both boxes (*faster cores and more of them*) ==> *running PRs faster* and maybe adding some GL worker capacity?
* Pay lower rate per box ($122+99/mo ==> $97+$97/mo)
* Manage both boxes from the same account (*less billing/admin*)
* Most of the VM's (`test-1`, `www-test`, `www-demo`, `botdylan`) would still be at OVH in the same CA data-center... so potentially one could keep the IPs and simply move the disk-images?
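As a rough cross-check of the figures above, using the monthly rates quoted in note 3 and the bullet list (the tax component is the estimate from note 4, not a computed value):

```shell
# Yearly hosting cost: current boxes vs. 2x Advance-2.
old=$(( (122 + 99) * 12 ))  # padthai + barbecue per year
new=$(( (97 + 97) * 12 ))   # 2x Advance-2 per year
echo "hosting per year: \$$old vs \$$new (difference \$$((old - new)))"
# Adding the estimated ~$600-700 tax saving from prepaying gets to roughly $1k/yr.
```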
That saves about $1k for the year on tax+hosting. The main Q is the cost/difficulty of migrating the VMs to new physical boxes. @bgm Do you think this is something we could do easily?
If the labor is a problem, then a fallback option would be to prepay padthai. (There's no technical work for anyone, and we get some of the tax savings, but gotta stick it out on older CPU.)

# #924 Automate ESR distribution using Gitlab (bgm, 2020-10-23)
https://lab.civicrm.org/infra/ops/-/issues/924

Proposed by Josh and brainstormed on CT calls: use Gitlab as a way to automate the distribution of [ESR](https://civicrm.org/esr) releases.
* [x] Create a Gitlab project for ESR users (anyone with a download key)
* [ ] Automatically add people to that project when they go through a CiviCRM form (new member form, or ESR key request for members)
* [membership form](https://civicrm.org/civicrm/contribute/transact?reset=1&id=64)
* [existing member form](https://civicrm.org/civicrm/profile/create?gid=137&reset=1)
* [x] Semi-automate the upload of CiviCRM tar files as release artifacts on Gitlab (ex: dummy repo, create tag, upload artifact) [example](https://lab.civicrm.org/extensions/civiexportexcel/snippets/19)
* [ ] Provide a way so that ESR users can easily download the files, ex, using the Gitlab API and personal access token. [example](https://gitlab.com/gitlab-com/support-forum/issues/4154)
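As a sketch of that download step, using GitLab's generic package registry API with a personal access token. The project path, version, and file name below are hypothetical; only the endpoint shape is from GitLab's API.

```shell
GITLAB="https://lab.civicrm.org"
PROJECT="esr%2Fcivicrm-esr"          # URL-encoded project path (hypothetical)
VERSION="5.39.1"
FILE="civicrm-5.39.1-drupal.tar.gz"

# Generic package download endpoint:
#   /api/v4/projects/:id/packages/generic/:package/:version/:file
URL="$GITLAB/api/v4/projects/$PROJECT/packages/generic/civicrm/$VERSION/$FILE"
echo "$URL"

# With a real token and network access:
#   curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" --output "$FILE" "$URL"
```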
Question: any accounting info that needs to be handled? What's the process for Xero?
cc @totten @josh

# #931 PR testing should validate ts() strings (bgm, 2020-04-03)
https://lab.civicrm.org/infra/ops/-/issues/931

PR review sometimes catches incorrect uses of `ts`, but not always. The gettext extraction scripts are pretty good at catching many invalid use-cases.
It would be nice to run string-extraction on pull-requests. Technically, it should be a quick check to add, similar to checking code syntax.
One issue we have is that there are some 20-25 errors that are systematically thrown by the string-extractor. Some are annoying to fix; others are incorrect uses of `ts` that are difficult to work around.
It would be really useful to have a way to flag tolerated or known issues, so that we can at least start applying some checks moving forward.
Worst case, it could be a list of regexes of code to ignore?
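The regex ignore-list idea could be as simple as filtering the extractor's warnings before failing the build. The warning text and patterns below are made up for illustration:

```shell
# Regexes matching tolerated/known warnings (hypothetical patterns).
KNOWN_ISSUES='CRM/Legacy/|templates/old/'

# Stand-in for the string-extractor's warning output.
warnings='ts() misuse in CRM/Legacy/Foo.php
ts() misuse in CRM/Core/Bar.php'

# Keep only warnings that do NOT match a known issue; fail the check on those.
new_warnings=$(echo "$warnings" | grep -Ev "$KNOWN_ISSUES" || true)
if [ -n "$new_warnings" ]; then
  echo "New ts() problems:"
  echo "$new_warnings"
fi
```

This lets the check be strict for new code while grandfathering the existing 20-25 errors.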
cc @seamuslee @totten @daved

# #934 Remove or Update access for civicrm-docs repo on GitHub (homotechsual, 2020-06-02)
https://lab.civicrm.org/infra/ops/-/issues/934

The [CiviCRM Docs](https://github.com/civicrm/civicrm-docs) repo on GitHub is currently unmaintained/unmanaged - it should be a mirror of the [docs-publisher](https://lab.civicrm.org/documentation/docs-publisher) repo from GitLab.
I'd like to suggest that we do the following:
1. Shut down the pull requests/issues tracker (I think the latter is already done!)
2. Rename the repo to docs-publisher (the repo itself has very little to do with actual docs content)
3. Set this up to mirror from docs-publisher on lab.
I initially had thought to just remove it, but @eileen depends on it :-)

# #936 latest.civicrm.org causes cv cron problems (bgm, 2020-02-22)
https://lab.civicrm.org/infra/ops/-/issues/936

Reported by @AllenShaw on mattermost:
> That url is reporting something that looks like bad behavior too:
https://latest.civicrm.org/stable.php?format=summary
```
{"malformed":{"name":"malformed","severity":"warning","title":"Version Check Failed","message":"The server failed to report on available versions. Perhaps the request was malformed."}}
```

# #945 c.o: ldapcivi service should not require "admin civicrm" (bgm, 2020-05-10)
https://lab.civicrm.org/infra/ops/-/issues/945

The ldapcivi LDAP service should not require the equivalent of "administer CiviCRM". The custom API calls probably do not have the correct alterAPIPermission settings.
(I'm trying to reduce the number of roles on the site, and people/bots with admin roles.)

# #948 latest.civicrm.org: upgrade to PHP 7.3 (bgm, 2020-06-02)
https://lab.civicrm.org/infra/ops/-/issues/948

This currently runs on PHP 5.6: https://latest.civicrm.org/stable.php?format=json
on stats.civicrm.org
Related issues (although not on stats.c.o): #907, #908, #909.
cc @totten

# #953 Schema definitions! (homotechsual, 2020-10-23)
https://lab.civicrm.org/infra/ops/-/issues/953

This is kind of a `/dev` issue and partly `/infra`.
We've defined a schema definition (using [JSON Schema](https://json-schema.org)) for docs-books and are in the process of defining one for APIv3 (using [OpenAPI](https://www.openapis.org/))
To be useful for validating docs files and/or providing documentation and client-generation options for the API, these schemas should be hosted somewhere (ideally at something like `schema.civicrm.org/v1/docs-book.json`).
We can then reference these schemas for validation of their respective files or in the case of OpenAPI to generate client implementations programmatically.
So onto the "ask". I think the ideal hosting place is to use a repo on GitLab (somewhere?) ideally with GitLab pages enabled and the domain `schema.civicrm.org` connected to it. There are additional schemas that would be useful - extension info.xml files could have a schema to allow validation. Essentially any structured data file can be validated with a schema and it's a useful way to document the available options in the file.
Currently planned schemas include:
- Docs Books
- APIv3
- Extension info.xml
Pinging @bgm, @totten & @seamuslee

# #968 Make adding new gitlab milestones part of the release process (DaveD, 2021-06-14)
https://lab.civicrm.org/infra/ops/-/issues/968

For main releases, if the release is `5.n.0`, suggest to create:
1. `5.n.1` milestone. Example for 5.33.0, create a 5.33.1 milestone.
1. `5.(n+2).0` milestone. Example for 5.33.0 create a 5.35.0 milestone. There will already be a 5.34.0 milestone from the previous release.
For smaller releases, e.g. 5.33.1, it's optional based on timing. If it's early in the cycle then could make the next point milestone, e.g. 5.33.2.
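The naming rule above is mechanical enough to script. A sketch (the GitLab API call in the comment is the standard project-milestones endpoint, with a placeholder project id):

```shell
# next_milestones X.Y.Z: print the "X.Y.(Z+1)" and "X.(Y+2).0" milestones
# that the release process should create.
next_milestones() {
  major=${1%%.*}; rest=${1#*.}
  minor=${rest%%.*}; patch=${rest#*.}
  echo "$major.$minor.$((patch + 1)) $major.$((minor + 2)).0"
}

next_milestones 5.33.0   # prints: 5.33.1 5.35.0

# Each could then be created with, e.g.:
#   curl --request POST --header "PRIVATE-TOKEN: $TOKEN" \
#     "https://lab.civicrm.org/api/v4/projects/<id>/milestones?title=5.33.1"
```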
This isn't a huge problem; it's just that every month several tickets are closed shortly after a release and there's no appropriate milestone available.

# #974 DockerHub account (homotechsual, 2021-03-21)
https://lab.civicrm.org/infra/ops/-/issues/974

Can we set up an organisation on DockerHub? We can add 3 users for free, and it'd be useful to publish infra-related images.
Note: I'm not asking for a `civicrm/` namespace - I think we're a way away from official container images. But rather something like `civicrm-infra/`, so we can publish, for example: `civicrm-infra/docs-publisher`, `civicrm-infra/community-messages`, `civicrm-infra/symfony-ci-base`, `civicrm-infra/docs-pr-test`, etc.

# #976 Add monitoring for GPG key validity (totten, 2021-05-12)
https://lab.civicrm.org/infra/ops/-/issues/976

CiviCRM releases are signed by a GPG key (email `info@civicrm.org`; key identified as `61819cb662da5fff79183ef83801d1b07a1e75cb`, aka `3801D1B07A1E75CB`). Every few years, the key expiration date should be extended.
It expired recently, so I pushed to:
* http://keyserver.ubuntu.com/pks/lookup?search=info%40civicrm.org&fingerprint=on&op=index
* https://keys.openpgp.org/search?q=3801D1B07A1E75CB
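One possible shape for such a daily check: compare the key's expiry timestamp against a warning window. Only the date arithmetic is shown below; the gpg invocation in the comment is the usual machine-readable key listing, and the 30-day threshold is arbitrary.

```shell
# days_until EPOCH: whole days from now until the given epoch timestamp.
days_until() {
  echo $(( ($1 - $(date +%s)) / 86400 ))
}

# In a real check, the expiry epoch would come from the keyring, e.g.:
#   gpg --with-colons --list-keys 3801D1B07A1E75CB | awk -F: '$1 == "pub" {print $7}'
expiry=$(( $(date +%s) + 90 * 86400 ))   # stand-in: pretend expiry is ~90 days out

if [ "$(days_until "$expiry")" -lt 30 ]; then
  echo "WARNING: release-signing key expires within 30 days"
else
  echo "OK: key still valid"
fi
```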
I'm not sure of the best way to assert that the key is valid, but it might be a good task to run once a day.

# #980 Move services using rest.php to new authx-based endpoint (bgm, 2022-01-25)
https://lab.civicrm.org/infra/ops/-/issues/980

There are services querying `/vendor/civicrm/civicrm-core/extern/rest.php`, such as the `ldapcivi` service used by Gitlab. This endpoint is long deprecated.
- [ ] Migrate civildap to new endpoint
- [ ] Migrate CiviCRM Spark Aegir servers to new endpoint (they fetch client information for the site setup)
Suggestions from Tim:
> needs authx, and it may need to set a header for XMLHttpRequest
> actually, with https://github.com/civicrm/civicrm-core/pull/19727, it would be simpler to switch
> or another way is to change &key=MY_SITE_KEY&api_key=MY_API_KEY to &_authx=Bearer+MY_API_KEY
> in each case, you probably need to relax a setting
> from the [bottom of docs](https://docs.civicrm.org/dev/en/latest/framework/authx/#settings), this is an example which allows all kinds of credentials:
```
cv ev 'Civi::settings()->set("authx_guards", []);'
cv ev 'Civi::settings()->set("authx_param_cred", ["jwt", "api_key", "pass"]);'
cv ev 'Civi::settings()->set("authx_header_cred", ["jwt", "api_key", "pass"]);'
cv ev 'Civi::settings()->set("authx_xheader_cred", ["jwt", "api_key", "pass"]);'
cv ev 'Civi::settings()->set("authx_login_cred", ["jwt", "api_key", "pass"]);'
cv ev 'Civi::settings()->set("authx_auto_cred", ["jwt", "api_key", "pass"]);'
```
> here, you probably want something narrower like
```
cv ev 'Civi::settings()->set("authx_param_cred", ["jwt", "api_key"]);'
## Allow JWT+API keys to come in via `?_authx=...` param
```
cc @totten

# #982 Joomla sandbox offline (bgm, 2022-08-09)
https://lab.civicrm.org/infra/ops/-/issues/982

The Joomla sandbox did not have automatic daily rebuilds using [buildkit](https://github.com/civicrm/civicrm-buildkit/). Until we do, I will disable the site, because it is running an ancient and insecure version of CiviCRM (and Joomla).
If anyone has experience with Joomla and buildkit, please comment here. It's too far outside my comfort zone, so I would need a hand with the setup.

# #988 Provide demo-triage sites for multiple versions (totten, 2023-02-22)
https://lab.civicrm.org/infra/ops/-/issues/988

## Goal
Provide demo sites for several recent versions. At time of writing, that would mean (say) `5.59-rc`, `5.58-stable`, `5.57`, `5.56`, and so on (perhaps going back 6-9 months).
## Motivation
The current demo sites are often used to determine whether a bug is easy to reproduce. However, there is another common question when triaging a bug: *When was the bug introduced?* (This helps you determine the specific cause and helps to establish the priority for the bug.)
Even for somebody who has a local dev system and knows how to switch around builds, it usually takes 5+ minutes just to set up an environment for evaluating the bug on one specific version. For folks with fewer resources, it may take longer or be impossible.
## Obstacle
The main reason we haven't done this is security -- if you run a 9-month-old version, then the odds are that it has some known/published security vulnerabilities. Of course, I don't think we should worry that Civi contributors will abuse such sites. The problem is about creating an easy target for bots and script kiddies.
We want to find a simple way to facilitate triage while keeping out the riffraff.
## Brainstorming Solutions (Dislike)
* _HTTP Basic (Static/Shared)_: If it leaks to the riffraff, then it's easy to abuse. Changing is a problem for anyone who uses the system. Also, it obfuscates testing of Civi-CMS functionality that involves HTTP Basic.
* _HTTP Basic (LDAP/civicrm.org)_: I don't think we should submit real `civicrm.org` credentials directly to a hackable box. Also, it obfuscates testing of Civi-CMS functionality that involves HTTP Basic.
* _Port Knocking_: Same leakage problem as "HTTP Basic (Static/Shared)". Also, fairly obscure.
## Brainstorming Solutions (Like)
* _OpenID Connect_: If you try to access an older demo site, it redirects to a web-page that shows a warning and does some kind of identity check (e.g. `civicrm.org` account or `github.com` account or `contributer-key.yml` entry). Then it redirects back and sets a cookie. (*The access cookie should be separate from the one usually used by PHP/CMS/Civi.*)
* _Good_: No special setup on the user's workstation. Just a redirect in the web-browser.
* _Bad_: Only works well in web-browser. Using `curl`, `mysql`, etc is annoying.
* _Private Network_: Put older demos on a private network. Connect through some kind of VPN (e.g. "IKEv2", "OpenVPN") or mesh overlay (e.g. "Tailscale", "Nebula").
* _Good_: Works with most tools+protocols (*browser, curl, mailcatcher, mysql, etc*)
* _Bad_: Requires special setup on the workstation. Public wifi (cafe/library/train/etc) may block access.

# #1002 Migrate/integrate download.civicrm.org with civicrm.org (totten, 2023-09-01)
https://lab.civicrm.org/infra/ops/-/issues/1002

In some side discussion about #1001, @colemanw and @bgm suggested migrating or integrating `download.civicrm.org` with `civicrm.org`. I wanted to record an issue to capture this.
How:
* Add a D9/D10 module on `civicrm.org` which either:
1. Migrates the PHP logic from the `download.civicrm.org`, or
2. Forwards HTTP sub-requests to `download.civicrm.org`.
Upshots:
* This lets you inherit the navigation, site-wide theming, and analytics.
* It's written with Symfony page-controllers and Twig, which are also supported by D9/D10.
* It doesn't have any interdependencies on Drupal content ("nodes" and "files"), so it should be fairly easy to install/maintain such a module on a local dev-site.
There are a few things to bear in mind:
* `download.civicrm.org` has a few areas of functionality: autobuild info (eg https://download.civicrm.org/latest/), redirects (eg `https://download.civicrm.org/civicrm-X.Y.Z-foo.tar.gz`), and release info (eg https://download.civicrm.org/about/). Each has a few subpages/features.
* Its basic purpose is to list/filter/cache information about the available builds (from Google Cloud Storage). It blends in some additional data from (1) release-notes in Github and (2) JSON files provided by each build.
* From the POV of a general reader on `civicrm.org`, some functionality (like "inspecting the git input used by a candidate build") is niche. But it's still useful for release-management. Migrating/integrating means you may have to reconcile more opinions about what to present.
* It's not currently designed around composable/mixable `block`s. It's just a couple of HTML pages. But in Drupal, in the long run, it probably makes sense to do more of the `block` stuff.

# #1003 Investigate SPF "Soft fail" (totten, 2023-09-07)
https://lab.civicrm.org/infra/ops/-/issues/1003

I got one of the emails from the `civicrm.org` security announcements. In Gmail, there's an option "Show original" which has an interesting report. The message makes it to my inbox (perhaps because of the history; perhaps because of the DKIM), but it shows a failure about SPF.
@bgm I'm not sure how we're routing mail right now. Perhaps we need some DNS tweak?
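For context: an SPF "soft fail" means the sending IP wasn't authorized by the domain's SPF TXT record, but the record ends with the lenient `~all` qualifier (a strict policy would end with `-all`), so receivers mark the message rather than reject it. A hypothetical record -- the actual `civicrm.org` record may differ:

```
civicrm.org.  3600  IN  TXT  "v=spf1 mx ip4:192.0.2.10 include:_spf.example-relay.net ~all"
```

If outgoing mail is relayed through a host not covered by any `mx`/`ip4`/`include:` mechanism, receivers log a softfail; the usual fix is adding the relay's IP (or its provider's `include:`) to the record. `dig +short txt civicrm.org` shows what's currently published.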
> ![Screen_Shot_2023-09-06_at_6.24.14_PM](/uploads/156f371c5e9e404b73c971535be156cf/Screen_Shot_2023-09-06_at_6.24.14_PM.png)

----------

https://lab.civicrm.org/infra/ops/-/issues/1004
(Test Systems) Java out-of-memory leads to zombie worker (totten, 2023-09-07)

(Originated on MM chat: https://chat.civicrm.org/civicrm/pl/otynrhfqeintdbwqpnjjakjdia. Note that it starts out with two different problems; one rc tarball problem is cleared up quickly. This issue is about the other problem. I'm taking the observations about it and trying to compose a full hypothesis of the problem.)
Suppose you have a job like https://test.civicrm.org/job/CiviCRM-Core-Matrix-PR/4551/BKPROF=dfl,SUITES=phpunit-crm,label=bknix-tmp/console -- the job is interrupted because one of the Java-based agents (Jenkins master or Jenkins agent) runs out of memory.
```
Installing build4test_qpf3m database
ok 1806 - CRM_Dedupe_MergerTest::testBatchMergeSelectedDuplicates
ok 1807 - CRM_Dedupe_MergerTest::testBatchMergeAllDuplicates
ok 1808 - CRM_Dedupe_MergerTest::testGetCidRefs
ok 1809 - CRM_Dedupe_MergerTest::testGetMatches
ok 1810 - CRM_Dedupe_MergerTest::testGetMatchesExcludeDeleted with data set #0 (true)
ok 1811 - CRM_Dedupe_MergerTest::testGetMatchesExcludeDeleted with data set #1 (false)
ok 1812 - CRM_Dedupe_MergerTest::testGetMatchesIgnoreLocationType
ok 1813 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaMatched
ok 1814 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaMatchedWithLimit
ok 1815 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaMatchedWithSearchLimit
ok 1816 - CRM_Dedupe_MergerTest::testGetMatchesNoCriteria
ok 1817 - CRM_Dedupe_MergerTest::testGetMatchesNoCriteriaButLimit
ok 1818 - CRM_Dedupe_MergerTest::testGetMatchesCriteriaNotMatched
FATAL: command execution failed
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:102)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected reader termination
at hudson.remoting.SynchronousCommandTransport$ReaderThread.lambda$new$1(SynchronousCommandTransport.java:50)
at java.base/java.lang.Thread.dispatchUncaughtException(Thread.java:1997)
Caused: java.io.IOException: Backing channel 'test-4' is disconnected.
at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
at com.sun.proxy.$Proxy74.isAlive(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1215)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1207)
at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:164)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
at hudson.model.Run.execute(Run.java:1900)
at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
at hudson.model.ResourceController.execute(ResourceController.java:107)
at hudson.model.Executor.run(Executor.java:449)
FATAL: Unable to delete script file /tmp/jenkins8376836365637261492.sh
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:102)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected reader termination
at hudson.remoting.SynchronousCommandTransport$ReaderThread.lambda$new$1(SynchronousCommandTransport.java:50)
at java.base/java.lang.Thread.dispatchUncaughtException(Thread.java:1997)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@4a72872d:test-4": Remote call on test-4 failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:993)
at hudson.FilePath.act(FilePath.java:1186)
at hudson.FilePath.act(FilePath.java:1175)
at hudson.FilePath.delete(FilePath.java:1722)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:163)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:164)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
at hudson.model.Run.execute(Run.java:1900)
at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
at hudson.model.ResourceController.execute(ResourceController.java:107)
at hudson.model.Executor.run(Executor.java:449)
Build step 'Execute shell' marked build as failure
ERROR: Step ‘Publish xUnit test result report’ failed: no workspace for CiviCRM-Core-Matrix-PR/BKPROF=dfl,SUITES=phpunit-crm,label=bknix-tmp #4551
Finished: FAILURE
```
Here's what happens next:
* Jenkins kills the communication channel.
* Jenkins assumes that the worker-node kills any ongoing work for the test-job.
* Jenkins establishes a new communication channel and begins running new jobs.
* **But** the worker-node did *not* kill everything. (*I'm not clear exactly what it did do -- eg if any POSIX signals were sent; eg if worker processes are running or suspended.*) For example, `mysqld` is present in the process-table, and it retains a hold on TCP port 5601.
* When Jenkins begins another job, it finds the worker-image (`/home/dispatcher/images/bknix-dfl-2.img`) is in use. In fact, all of the images are in use. So it creates a new one (`bknix-dfl-5.img`).
* When Jenkins starts using `bknix-dfl-5.img`, it tries to launch a new mysqld on TCP port 5601. But it can't; the port is conflicted. You get problems [like this](https://test.civicrm.org/job/CiviCRM-Core-Matrix-PR/4553/BKPROF=dfl,SUITES=phpunit-api4,label=bknix-tmp/console):
```
[mysql] Start daemon: mysqld --datadir="/home/homer/_bknix/ramdisk/worker-3/mysql/data"
[mysetup] Initialize folder: /home/homer/_bknix/ramdisk/worker-3/mysetup
Waiting for MySQL (maxWait=300, interval=0.5, windDown=0.5)...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/home/homer/_bknix/ramdisk/worker-3/mysql/run/mysql.sock' (2)
```
If you have several jobs running at the moment of `OutOfMemory`, then you may wind up doing this multiple times (e.g. 3 jobs die; 3 zombies are left behind; 3 new images are created; 3 TCP ports are blocked).
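A quick way to confirm a stale port-binding like this is a connect test from bash -- a minimal sketch (the `check_port` helper is hypothetical, not part of the existing worker scripts):

```shell
#!/usr/bin/env bash
# check_port PORT: report whether anything on localhost accepts connections
# on PORT, using bash's built-in /dev/tcp pseudo-device. The connect attempt
# only succeeds if some process is listening.
check_port() {
  if (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null; then
    echo "in-use"
  else
    echo "free"
  fi
}

check_port 5601    # reports "in-use" while a zombie mysqld still holds the port
```

Running this against 5601 (and the neighboring worker ports) before Jenkins reuses an image would distinguish "image in use" from "port in use".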
----------
> (Follow-up) *I'm not clear what the exact status is -- if any POSIX signals sent; if worker processes are running or suspended.*
What I did observe was that these zombie processes were still around ~2 hours after the original job. But after ~2h30m, they had gone away on their own.
If the jobs were quietly executing in a headless fashion, then they should've wrapped up in <30min. So they're probably not running -- it suggests they're somehow suspended (and then some other timeout/reaper mechanism comes in after 2hr). But this is pure speculation.
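Next time it happens, the suspended-vs-running question could be answered from the process-state column of `ps` -- a sketch (assuming Linux procps; the process names come from the job logs above):

```shell
#!/usr/bin/env bash
# List any leftover worker processes and their state. In the STAT column:
#   R = running, S = sleeping, T = stopped (SIGSTOP/SIGTSTP received),
#   Z = defunct (exited but never reaped by its parent).
# "T" would support the suspended hypothesis; "R"/"S" would suggest the
# zombies are still (slowly) running. The [m]/[p] brackets stop grep from
# matching its own command line in the ps output.
ps -eo pid=,stat=,etime=,cmd= | grep -E '[m]ysqld|[p]hpunit' || echo "no leftovers found"
```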
---------
Brainstorming...
* Maybe figure out which java process ran out of memory -- and why.
* Maybe figure out what - if any - signals are being emitted when this OOM happens. Find a way to kill the zombies/orphans properly.
* Maybe introduce some firmer timeouts within the jobs. (We usually rely on Jenkins to timeout jobs; but obviously that doesn't work here. Try sprinkling `timeout` calls into `CiviCRM-Core-Matrix-PR.job` or similar leverage-point.)
* Maybe change the port-allocation function.
* Maybe enable network-namespaces for any non-interactive jobs.

----------

https://lab.civicrm.org/infra/ops/-/issues/1005
Migrate all backups from rdiff-backup to borg/borgmatic (bgm, 2024-01-15)

Priority:
- [x] #970 lab.civicrm.org - runs Ubuntu 22.04
- [x] www-prod.civicrm.osuosl.org - runs Ubuntu 22.04
- [x] latest.civicrm.org - runs Ubuntu 20.04
- [x] chat.civicrm.org - runs Ubuntu 20.04
- [x] spark-1.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), then Debian Bookworm
- [x] spark-2.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), then Debian Bookworm (spark-2 was already running borg, because we back up to an EU server)
- [x] www-prod-2.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), ~~then Debian Bookworm~~ (done)
- [x] www-prod-2: also backup to backups-1.c.o
Followed by:
- [ ] botdylan.civicrm.org - needs upgrade to Debian Bullseye, then Debian Bookworm
- [x] test.civicrm.org - needs upgrade to ~~Debian Bullseye~~ (done), then Debian Bookworm
- [ ] www-demo.civicrm.org (rdiff currently broken) - needs upgrade to Debian Bookworm
Low priority:
- [ ] backups-1.civicrm.org
- [ ] barbecue.civicrm.org
- [ ] padthai.civicrm.org
- [ ] paella.civicrm.org
- [ ] test-1.civicrm.org
- [ ] test-2.civicrm.org
- [ ] test-3.civicrm.org
These can probably be ignored:
- [x] cxnapp-2.civicrm.org (offline)
- [x] www-cxn-2.civicrm.osuosl.org (offline)
- [x] manage.civicrm.osuosl.org (not used anymore, except as a ProxyJump)
- [x] www-test.civicrm.org (offline?)
For each server:
- Verify includes/excludes
- Setup with Ansible
- Update monitoring
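For the per-server includes/excludes step, a borgmatic config sketch (paths, repository, and retention values are illustrative, not the actual ops layout; field names follow borgmatic 1.8+, while older versions nest them under `location:`/`retention:` sections):

```yaml
# /etc/borgmatic/config.yaml (illustrative)
source_directories:
  - /etc
  - /home
  - /var/backups/mysql
exclude_patterns:
  - '*/cache'
  - /home/*/tmp
repositories:
  - path: ssh://borg@backups-1.civicrm.org/./{fqdn}.borg
    label: backups-1
keep_daily: 7
keep_weekly: 4
keep_monthly: 6
```

`borgmatic config validate` checks the file against the schema of the installed version.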
We have 116 GB available on sushi and 180 GB used, so presumably we will run out of space if we do them all at once, instead of waiting a bit to purge some old rdiff backups (after ~6 months?). Although Gitlab is one of the bigger backups, and it's already cleaned up.

----------

https://lab.civicrm.org/infra/ops/-/issues/1007
Move civicrm extension extraction to Gitlab Pipeline (bgm, 2023-09-24)

I find the current Jenkins job to update extensions on Transifex inefficient to do on a daily basis:
- it fetches all extensions in the directory (filters by those available in-app)
- looks for new releases
- extracts and updates Transifex
The job then does other things, like fetching Transifex translations, committing to the repo, and building the mo files. Those things do make sense on a daily basis.
Sometimes we want to re-run the extraction on a single extension. For example, recently we had an issue with the gdpr extension, because the maintainers are mixing `vX.Y` and `X.Y` tags.
I did a test to move the "update transifex" process to a Gitlab Pipeline. Personally, I like that Gitlab uses Docker to manage the environment, so it's more clear/self-documented how the job is setup.
The proof of concept can be seen here:
https://lab.civicrm.org/dev/translation/-/blob/master/.gitlab-ci.yml
Note: the config is split into two tasks, so that during testing we can more easily run only one or the other (extract / commit). We can merge them once everything is well-tested.
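The split described above could look roughly like this (a sketch only -- the job names, image, and script paths are illustrative, not the actual `.gitlab-ci.yml` contents):

```yaml
stages:
  - extract
  - commit

extract-strings:
  stage: extract
  image: php:8.1-cli
  script:
    # hypothetical wrapper around the existing extraction scripts
    - ./bin/extract-strings.sh "$EXTENSION_KEY"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "trigger"'    # webhook / manual re-runs
    - if: '$CI_PIPELINE_SOURCE == "schedule"'

commit-translations:
  stage: commit
  image: php:8.1-cli
  script:
    - ./bin/commit-translations.sh              # pull POs, build MOs, push
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'   # daily scheduled run only
```

A trigger token (Settings > CI/CD > Pipeline trigger tokens) would also cover the single-extension case: the extdir webhook could POST to GitLab's pipeline-trigger endpoint with `variables[EXTENSION_KEY]=<key>` to re-run extraction for one extension.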
What's missing:
- [x] Configure the Transifex token in the Gitlab CI/CD settings of the project
- [x] Configure a Github token so that the pipeline can commit (personal token added to my `mlutfy-civicrm` account)
- [ ] Setup a Gitlab webhook so that we can trigger the pipeline for new releases
- [ ] Call the webhook when new releases are published on civicrm.org (extdir module, `modules/custom/extdir/extdir.drush.inc`)

----------

https://lab.civicrm.org/infra/ops/-/issues/1008
promtail: ansible config for nginx logs on Gitlab (bgm, 2023-09-29)

I updated the promtail config on most webservers, but I need to add an adjustment in Ansible so that promtail can ingest this file on lab.c.o: `/var/opt/gitlab/nginx/logs/gitlab_error.log`

(I had mostly used a template that was designed for standard Debian servers, where nginx logs live in `/var/log/nginx/access.log`.)
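The missing target might look like this in promtail's `scrape_configs` (a sketch; the label names should match whatever the existing Ansible template emits):

```yaml
scrape_configs:
  - job_name: gitlab-nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: lab.civicrm.org
          __path__: /var/opt/gitlab/nginx/logs/gitlab_error.log
```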
and latest.civicrm.org logs are missing too.

----------

https://lab.civicrm.org/infra/ops/-/issues/1009
Extension directory times out when uncached (JonGold, 2023-11-01)

Civi's Guzzle connections time out after the number of seconds specified in the `http_timeout` setting -- which, by default, is 5 seconds.
As the extension directory has grown, the amount of time needed to generate the results of, say, `https://civicrm.org/extdir/ver=5.68.alpha1|uf=Drupal|status=stable|ready=` now exceeds 5 seconds.
This leads to timeouts when accessing the directory - difficult to troubleshoot, because on the next attempt the cached results are returned quickly.
I thought I'd raise this as an infra issue first. If reducing the time needed to generate the results isn't possible, I'll submit a PR to raise the connection timeout specifically when loading the extension directory.

----------

https://lab.civicrm.org/infra/ops/-/issues/1010
extdir: upgrade to a more recent PHP version (ideally 8.0 or later) (bgm, 2023-10-25)

Currently runs on PHP 7.2.
civicrm.org runs on PHP 8.0.
Repo: https://lab.civicrm.org/infrastructure/extdir.git

----------

https://lab.civicrm.org/infra/ops/-/issues/1011
Moving services off the OSUOSL cluster (bgm, 2024-01-04)

The OSUOSL cluster was set up around 2014-2015, close to 10 years ago. The machines have been reliable and very cost-effective, thanks to OSUOSL. However, for performance and for future planning, we should start slowly moving some services off them. Notably, when we had incidents upgrading some VMs, restoring MySQL backups was extremely slow.
The following VMs are on OSUOSL:
- lab.civicrm.org
- chat.civicrm.org
- www-prod.civicrm.osuosl.org (mostly docs, and some services, such as community-messages)
- latest.civicrm.org (stats/pingbacks)
- test-2.civicrm.org (jenkins node)
Less critical:
- manage.civicrm.osuosl.org (used mostly as an Ansible ProxyJump host, but some servers might still be configured to use LDAP)
Not used / mostly shut down:
- cxnapp-2
- www-cxn-2
We currently have VMs also at Linode (www-prod-2) and OVH (paella and test-3). The `test-3` server is dedicated to running tests, but paella runs these services:
- botdylan.civicrm.org (github bot)
- test-1.civicrm.org
- www-demo (for sandbox sites)
- www-test (not used, was for hosting a test site for civicrm.org)
Paella has around 230 GB of free disk space.
Internal reference: https://chat.civicrm.org/civicrm/pl/ocbbd61xsbg9xphwrjk4eq8wsw

----------

https://lab.civicrm.org/infra/ops/-/issues/1012
POT Scan: Update for sequentialcreditnotes, eventcart, flexmailer, etc (totten, 2023-11-09)

Over the past half-year, there have been a few new folders within the `civicrm-core` file-hierarchy, e.g.
```
setup
ext/sequentialcreditnotes
ext/flexmailer
ext/eventcart
ext/ewaysingle
ext/financialacls
ext/greenwich
ext/afform
ext/search
```
The POT scanner should check these folders. Eventually, we're going to notice missing strings. There are mitigating factors - e.g. the corpus of strings isn't huge, and a lot of the strings existed before (either in different core folders or in a standalone extension), so there may be a bit of lag-time between the file-reorg and the appearance of symptoms.
There are currently two distinct localization flows -- iirc:
* The core flow - which splits strings into multiple POT files (`contribute.pot`, `event.pot`, etc), sends those to Transifex, gets the POs back out, and recombines them into one MO (`civicrm.mo`). To capture the updated folders in this flow, we'd need to update `create-pot-files.sh`.
* The ext flow - which scans multiple repos, creating one POT per repo, sends those to Transifex, gets the POs back out, and publishes multiple MOs. To capture the updated folders in this flow, we'd need to update `civiextensions-update-transifex.php` and/or `create-pot-files-extensions.sh` and/or the scheduled-job.

----------

https://lab.civicrm.org/infra/ops/-/issues/1013
mysql 8.0.29 being used for "max" but it can't even be downloaded anymore because it has a bad bug (DaveD, 2023-11-12)

It came up in another context that 8.0.29 is what's being used, but see https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html
> This release is no longer available for download. It was removed due to a critical issue that could cause data in InnoDB tables having added columns to be interpreted incorrectly. Please upgrade to MySQL 8.0.30 instead.
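To tell whether a given server is on the affected release, compare the reported version against 8.0.30 -- a sketch (assumes `mysqld` is on `$PATH`; as noted below, a distro build may backport the fix while still reporting 8.0.29):

```shell
#!/usr/bin/env bash
# version_lt A B: true if version A sorts strictly before version B (sort -V).
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Extract the x.y.z version from the server binary and compare.
ver=$(mysqld --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
if version_lt "${ver:-0}" "8.0.30"; then
  echo "server reports ${ver:-unknown} -- older than 8.0.30 (8.0.29 has the InnoDB bug)"
else
  echo "server reports $ver -- 8.0.30 or newer"
fi
```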
I don't know if the version in use is from a particular distribution that backports patches, so maybe it's not affected, but also the latest is 8.0.35.

----------

https://lab.civicrm.org/infra/ops/-/issues/1014
Deprecate the l10n tar.gz (bgm, 2024-02-07)

If [PR#28139](https://github.com/civicrm/civicrm-core/pull/28139) is merged, we could get rid of the civicrm-l10n.tar.gz once the last ESR that relies on it is unsupported.
- [ ] Make sure that install docs and i18n guide (wiki page) have instructions about the new way to install languages since https://github.com/civicrm/civicrm-core/pull/28061 (CiviCRM 5.69)
- [ ] dev/drupal#193 Deprecate the old installer (including from the Drupal7 drush module)
- [ ] Remove it from buildkit builds
- [ ] Remove the link from the civicrm.org/download page
- [ ] Stop generating the l10n tarballs

----------

https://lab.civicrm.org/infra/ops/-/issues/1016
Upgrade Java v17 for jenkins and nodes (bgm, 2023-12-11)

----------

https://lab.civicrm.org/infra/ops/-/issues/1017
Test runs do composer install before applying the PR patch, meaning changes to composer.json/lock aren't being tested (DaveD, 2023-12-29)

e.g. https://github.com/civicrm/civicrm-core/pull/28813

Possibly because of https://github.com/civicrm/civicrm-buildkit/commit/3081d83c648f6aa244f33380e7a3054deb515bf7#diff-652e839cd6819e54de3eafe2ac9126fa3da1d288d271a0fa681f28048111fa22R25 ?

----------

https://lab.civicrm.org/infra/ops/-/issues/1018
Move latest.civicrm.org from OSUOSL to paella.civicrm.org (OVH) (bgm, 2024-01-05)

* [x] OVH: assign an IP address, add a "virtual mac" of type OVH, use the full hostname as the vMAC name
* [x] KVM server: create a ZFS volume for the VM
* ex: `zfs create -s -V 70G [pool]/[name-of-vm]` (see `zpool status; zfs list`)
* [x] Ansible: copy a relevant example from `host_vars/[vm]` for the new server, adapt values (hostname and IPs)
* [x] Ansible: add the hostname in the `hosts` file
* [x] Ansible: add the hostname as a preseed in `host_vars/kvm-foo.example.org` (the parent server)
* [x] Ansible: generate the preseed file: `ansible-playbook -l kvm-foo.example.org --tags kvm-server-preseeds ./site.yml`
* [x] KVM server: start the installation
* `ssh root@x[...].example.org`
* `/etc/preseeds/[hostname]/start.sh`
* [x] Change the preseed password
* [x] Ansible (create deploy user): `ansible-playbook -l xxxx.example.org -u myuser --become-user=root --ask-become-pass ./setup.yml`
* [x] Ansible (full installation): `ansible-playbook -l xxxx.example.org ./site.yml`
* [x] Test that the VM reboots cleanly
* [x] Migrate the services from latest.civicrm.org
* [x] latest.civicrm.org pingback service (and mysql DB)
* [x] stats.civicrm.org (static? deprecate?)
* [x] releaser files
* [x] Monitoring: update the host in Icinga and re-enable icinga on the new VM (it was disabled to avoid conflicts)
* [x] Update the A/AAAA records for latest.civicrm.org and stats.civicrm.org
* [x] OVH: configure rDNS for the IPv4 and IPv6 addresses
* [x] Backups: double-check that backups are running (they were initially disabled, so as not to cause conflicts with the current production VM)
* [x] Verify that monitoring is green
* [ ] Shut down the old VM?
Old DNS records:
- 140.211.167.189
- 2605:bc80:3010:102:0:3:5:0
New DNS records:
- 192.95.2.135
- 2607:5300:203:6713:700::