Syncing from upstream odoo/runbot (17.0) #1059

Open
bt-admin wants to merge 49 commits into brain-tec:17.0 from odoo:17.0
Conversation

@bt-admin

bt_gitbot

Because popovers are always in the topmost layer they work around the
overflow issue I had with dropdowns, which means it's now possible to
scroll the stagings horizontally while having the statuses remain
glued to their trigger.

And because popovers natively handle unicity, remove the bit of JS
which did that for dropdowns.

Using anchor positioning is *supposed* to correctly handle moving the
popover around when it's not visible but that doesn't seem to work. I
assume I did something wrong, but remove the positioning JS anyway,
it *should* not be needed.

Also just noticed that the branch folding doesn't work on mobile,
apparently `click` doesn't trigger correctly on the titles even with
the "enabler" the internet tells me about (`role=button`, `cursor:
pointer`, ...), added a `touchstart` handler to fix it.

Fixes #1356
Might have been there polluting my logs for a while despite
2fea318 claiming to have handled it:
the error shows up *at commit*, so commit inside the `try` in order
for the `except` to handle that case.
Since we pre-create forward-port PRs in their cron, when GH notifies
us that the PR has been opened and we unconditionally try to create it
again, the call fails with a `UniqueViolation` and we reply with a 400.

Github doesn't seem to be super happy with 400s, plus it shows up
weirdly in the logs.

We could LBYL, but because the webhook could be sent while we're in the
middle of creating the PRs this is often not going to work correctly,
so it's better to handle the violation and reply that we already know the PR.

Fixes #1362
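The EAFP shape described above can be sketched with a self-contained simulation (the class and function names are illustrative, not the mergebot's actual code): a fake store only checks uniqueness at commit time, mirroring how the PostgreSQL violation surfaces *at commit*, so the `commit()` call must sit inside the `try` for the `except` to catch the duplicate.

```python
class UniqueViolation(Exception):
    pass

class FakeDB:
    """Stand-in for a DB connection whose unique constraint only fires
    at COMMIT time, as described above."""
    def __init__(self):
        self.prs = set()
        self._pending = None

    def create(self, number):
        self._pending = number          # staged, constraint not checked yet

    def commit(self):
        number, self._pending = self._pending, None
        if number in self.prs:
            raise UniqueViolation(f"PR {number} already exists")
        self.prs.add(number)

def handle_pr_opened(db, number):
    try:
        db.create(number)
        db.commit()                     # commit *inside* the try: the error
                                        # shows up at commit, so this is where
                                        # the except can handle it
    except UniqueViolation:
        return 200, "already known"     # friendly reply instead of a 400
    return 201, "created"

db = FakeDB()
assert handle_pr_opened(db, 1362) == (201, "created")
assert handle_pr_opened(db, 1362) == (200, "already known")  # duplicate webhook
```

An LBYL pre-check (`if number in db.prs: ...`) would still race against the pre-creation cron, which is exactly why the commit message prefers handling the violation.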
The locator is kinda nice, but if a staging falls off the front page
it stops working, which sucks.

Update the locator to *exclusively* use `[data-batch-id]`, and add
that attribute on the branch list. This way it works on both; the
pages are small enough that the probably-less-optimized selector does
not seem to cause performance issues.

Also cleanup the outline spec a bit: it uses either `invert` or
`currentColor` for color which is fine, however the default width is
`medium` which is a bit much (UA-dependent but ~3px), so switch that
to `thin`.
Followup to ada6e20: in `join`
mode (designed to join back existing splits after finding a likely
culprit for failure), the process had a special case for finding a
single split, in which case it would do nothing (rather than deleting
the split and creating a new one, which is a waste of time).

*However* it didn't have a special case for *no splits*, so if the
problematic batch was alone (e.g. stage a batch, staging breaks)
or the very last one of a staging, it would proceed to "join" no
splits into a new split with no batches, which then broke staging
because that condition is not handled.

If there are no extant splits, just do nothing, there's no joining to
do.

Fixes #1364
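The special-casing described above can be sketched as follows (a minimal illustration with hypothetical names, not the mergebot's real data model): the added guard is the `not splits` branch, which previously fell through and produced a split with no batches.

```python
def join_splits(splits):
    """Join extant splits back into one; return None when there is
    nothing to join (the missing guard described above)."""
    if not splits:           # no extant splits: just do nothing
        return None
    if len(splits) == 1:     # pre-existing special case: reuse the lone split
        return splits[0]
    return [batch for split in splits for batch in split]

assert join_splits([]) is None                           # previously: empty split
assert join_splits([["b1"]]) == ["b1"]                   # single split reused
assert join_splits([["b1"], ["b2", "b3"]]) == ["b1", "b2", "b3"]
```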
The previous commit fixed the source of empty splits, but nothing says
I won't fuck up again in the future, so update the staging process to
not fall over if it hits an empty split again. Might make it more
difficult to realize there are empty splits but I don't think that's
worth making the mergebot unavailable over.

Resolves #1364
Complements the new lazy configs on runbot, which stops triggering
every job on every update of every PR: if some required statuses are
entirely missing from a PR (not even pending) when it is approved,
call the runbot and request that those statuses be computed /
generated.

This is currently completely specific to runbot/mergebot interactions,
going through the saas tunnel.
When the github_login of an employee is changed, they are first
deprovisioned, then reprovisioned with the new github_login.

The first part leads to the res.partner having their email blanked,
and since the two keys used to find an existing partner are the email
and the github_login, when they get reprovisioned the existing partner
is not found. This leads to trying to create a new one, which fails
because the user we're trying to create alongside the partner
conflicts.

Improve the provisioning process by adding a fallback on the *user*
searching first by email (login) and then by oauth_uid, this way the
neutered partner can be found (through the user), updated, and
reactivated.

This does not fix the case where the new github_login was already used
to interact with the mergebot, so there is a separate partner with that
set, which fails the creation due to another duplicate; but that's a
much rarer case, and easily resolved by deleting the second partner.

Fixes #1367
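The fallback described above can be sketched like this (a simplified illustration with dict-based records and hypothetical field access, not Odoo's actual ORM): the partner lookup fails because deprovisioning blanked the email, but the *user* still matches by login or oauth_uid, recovering the partner behind it.

```python
def find_partner(partners, users, email, github_login, oauth_uid):
    """Primary lookup on the partner (email / github_login), with the
    fallback on the user (login first, then oauth_uid) described above."""
    for partner in partners:
        if (email and partner.get("email") == email) or \
           (github_login and partner.get("github_login") == github_login):
            return partner
    # fallback through the user: the neutered partner is found indirectly
    for user in users:
        if user.get("login") == email or user.get("oauth_uid") == oauth_uid:
            return user["partner"]
    return None

# deprovisioning blanked the partner's keys...
neutered = {"email": None, "github_login": None}
users = [{"login": "dev@example.com", "oauth_uid": "1234", "partner": neutered}]
# ...but the user still matches by login, so the partner can be updated
# and reactivated instead of a conflicting new one being created
assert find_partner([neutered], users, "dev@example.com", "new-login", "1234") is neutered
```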
@bt-admin bt-admin added the 17.0 label Mar 30, 2026
xmo-odoo added 21 commits March 31, 2026 08:49
A few methods were accessed but never called, causing either
conditional bypass (yielding a likely error as it tries to read a
dangling staging) or data loss (in an assertion's error message, so not
exactly critical).
`changes` should be a dict of dicts, no sets allowed.
Feedback objects have a `repository` not a `repo`.
Probably NBD, especially as it would only occur for PRs without a
description, but still.
Because the loop variable used the same name as the parameter, any
scheduling of fp followup with more than one PR of which at least one
was `skipmerge` might force fw on a bunch of innocent PRs who never
asked for it.

skipmerge is not that common so the likelihood is low, and getting an
extra fw you were not ready for is not the end of the world, but
still...
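The shadowing hazard described above can be reproduced minimally (names here are illustrative, not the mergebot's actual identifiers): because the loop variable reuses the parameter's name, the skipmerge check ends up testing each batch PR rather than the PR that triggered the followup.

```python
def forced_fw_buggy(batch, source):
    """Buggy shape: the loop variable shadows the `source` parameter."""
    forced = []
    for source in batch:                 # BUG: rebinds the parameter
        if source["skipmerge"]:          # meant to test the *trigger*
            forced.append(source["name"])
    return forced

def forced_fw_fixed(batch, source):
    """Fixed shape: a distinct loop name keeps the parameter intact."""
    return [pr["name"] for pr in batch if source["skipmerge"]]

batch = [{"name": "a", "skipmerge": True}, {"name": "b", "skipmerge": False}]
trigger = {"name": "t", "skipmerge": False}
assert forced_fw_buggy(batch, trigger) == ["a"]   # forced despite the trigger
assert forced_fw_fixed(batch, trigger) == []      # nothing forced
```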
Not super necessary but other locations use a fullmatch.
- Fix the error message which would return something like
  `17.0:runbot_merge.branch() not found`, the latter half of which is
  not very useful.
- Explicitly search on the project's name rather than rely on an
  implicit `name_search()`.
This was not handling batch writes involving `closed`
correctly. They're likely very uncommon but no reason not to shift the
handling of that flag into the pre-write loop.
When running this test in runbot status mode with high load, the cron
leaving the validation comment has a strong tendency to not run, which
fails the test. This resolves itself on rerun but it is annoying.

Claude has imagined an interleaving where this would happen:
`post_comment -> dummy-central -> webhook` taking enough time that it
has not committed by the time `run_crons` is invoked, which means
`runbot_merge.cron_validate`'s trigger is not visible yet, does not
get run, and does not send the validation message.

However the introduction of even *massive* timeouts (10+ seconds)
between the comment and the run_cron does not work around the issue,
so it sounds pretty reliably like that was made up. There might be a
genuine issue in the triggering of the validation cron, but since
that's best effort anyway in the general case and the workaround
handles the issue well enough as far as I can see, good enough.
Any error raised during `fast_forward` yields a
`ForwardPortError`[^1], under the assumption that if pushing fails, it
fails on the *first* push, where it's innocuous (nothing happens): the
process of updating the reference branches to the (successful) staging's
would just reraise the FF error, treating the staging as dead / noop.

However with the recent stability troubles of github this is becoming
less and less true, with errors more and more frequently being a dice
roll as to whether the requested operation succeeded or failed. This
exact issue occurred on 2026-03-30: during forward-porting staging
157794 received an error 502 (bad gateway) with a json error message
of "server error", only `odoo/odoo` was being staged to so when that
failed the mergebot assumed this was a dead staging, and tried staging
the PRs again.

Unbeknownst to our poor bot, however, while github had returned an
error it had also updated odoo/odoo:saas-18.2 and integrated the two
PRs. This resulted in any further staging of those PRs failing because
they were creating empty commits, confusing their developers in turn.

A potential long-term solution might be to add more fine-grained error
handling here, separating non-fast-forward errors from others and
still allowing for an initial non-fast-forward error to be considered
innocuous, *however* that seems like unnecessary bother especially
when those conditions are very hard to test since the addition of the
sanity-check pushes to tmp (DC would need support for scenarisation in
order to simulate transient or mid-step errors).

It might also improve things (or make the errors easier to detect) to
go through git in order to perform the sync, rather than github.

But we're going to do neither here. What we are going to do is
quickly fail and neuter the branch. An improvement which does seem
like a good idea is to keep and reraise from the original error,
as the call which initially errored might have put the repository in a
differently problematic state than the original, and we likely want
the original version of the issue.

[^1]: maybe it should not and only the specific "not a forward port"
    error should be...
Apparently it didn't bother the fetch too much but might as well: a
few lines below we add the full refname to the branches set, so the
initialization should be similar.
Fix root to work on recordsets which are not singletons.
This is only for the extremely rare case of delegating to a list of
partners, instead of the usual `delegate+`, so I'm unsure this has
ever been used at all. But it would ping the wrong user in case of
error afterwards so...
This is similar to one of the changes in
bafa515, apparently I missed this
site back then.

Also replace an `str.__mod__` by an f-string.
Confused 4096 and 8192, this makes for a number which is a bit weird
and not necessarily recognizable at first glance. Though arbitrary, it
makes more sense to be reminiscent of classic "round" numbers (binary
round number in this case).
If a PR has no merge method set *and* has more than 50 commits, the
second half of the check would be executed on a `None` and blow up.

Guard it. For all we know the writer of the PR will set the method to
`merge` anyway.
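The guard described above boils down to a None check before the string-valued check runs (a sketch with illustrative names and an assumed `startswith` test; the actual condition lives in the mergebot):

```python
def large_pr_ok(merge_method, n_commits, limit=50):
    """Large PRs need an explicit multi-commit merge method, but the
    method may still be None if the author hasn't set one yet."""
    if n_commits <= limit:
        return True
    # without the `is not None` guard, the string check blows up on None
    return merge_method is not None and not merge_method.startswith("squash")

assert large_pr_ok(None, 10)        # small PR: method irrelevant here
assert not large_pr_ok(None, 60)    # guarded: no crash, just not ok yet
assert large_pr_ok("merge", 60)
assert not large_pr_ok("squash", 60)
```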
xmo-odoo and others added 20 commits April 1, 2026 16:08
Staging integration was one of the areas where we were still using the
github API to interact with the git database. Most of the other
changes were primarily driven by rate limits, with a bunch of
secondary factors:

- the degrading reliability
- the latency of change propagation through the API
- rate limiting as the mergebot's load increased

cf #247

Fixes #1378
When we forward-port a PR, we know the number of commits the new PR
has because we *just* created them, we don't need to trust github
especially when it apparently sometimes decides to lie to us and tell
us our PR has 0 commits.

Update `_create_port_branch` and `_cherry_pick` to return the number
of commits they processed, and squash it atop the PR object github
sends back (actually just pass `squash` explicitly to `_from_pr`, as
it requires fewer changes).

Fixes #1377
If github claims that a PR has 0 commits on creation or update, trust
it but verify (set up a cron job to fetch the PR and count the number
of commits it actually has).

Fixes #1377
…dation

When validate_pr() is called as a pre-validation step before staging, it can raise exceptions.Skip (for instance, on multi-commit PRs missing a merge_method).

Since Skip is a subclass of MergeError, and validate.py lacked an explicit handler for it, these PRs were previously caught by the general MergeError block. This resulted in the PR being incorrectly marked with pr.error = True and receiving an "unable to stage" comment. Once in this error state, the PR is blocked from further staging attempts even after the missing information (like the merge method) is provided.

This fix adds a specific handler for exceptions.Skip to ensure the validation simply bypasses the PR without flagging it as a failure, matching the behavior already present in stagings_create.py.
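The ordering point can be shown self-contained (class names follow the description above; the handler bodies are simplified): since `Skip` subclasses `MergeError`, a lone `except MergeError` swallows it, so `Skip` needs its own handler placed *before* the general block.

```python
class MergeError(Exception): pass
class Skip(MergeError): pass     # Skip subclasses MergeError, as described

def validate(pr, exc):
    try:
        raise exc
    except Skip:                 # must come before the MergeError block
        return "skipped"         # simply bypass the PR, no error flag
    except MergeError:
        pr["error"] = True       # "unable to stage": blocks further staging
        return "errored"

pr = {"error": False}
assert validate(pr, Skip()) == "skipped"
assert pr["error"] is False      # the PR can still be staged later
assert validate(pr, MergeError()) == "errored"
assert pr["error"] is True
```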
The hook config propagation has apparently become more delayed on GH,
so without some time to propagate, events will get dispatched even
within the scope.
Doing the two ops within the same scope, github would sometimes
dispatch the status hook before the comment one, leading to an
"incorrect" mail message ordering.

By splitting in two different scopes, the delay at the end of each
scope seems to give github enough time to ensure the hooks are
dispatched in the expected order.
Apparently the `commits` endpoint now returns a 422 when a commit is
not found, rather than a 404.
When trying to discover modify/delete conflicts, the mergebot would
`ls-remote` the source's target in order to get its head (and use that
as `--merge-base`), but it would not *fetch* that head. So for
long-deactivated branches the head might not be in the repository,
leading to an error when trying to `diff` between it and the source's
HEAD.

`fetchone` simultaneously resolves the reference and fetches the
commit, so it's a drop-in replacement which does a better job.

After more consideration, also remove `remote_head` (`ls-remote`):
this was its only callsite and it's proven to really be unsafe; any
attempt to use ls-remote that way should probably use `fetchone`.

There is only one other use of `ls-remote`, and it uses that endpoint
because that makes it easier to map refnames to oids on batch
resolution, but then it fetches in bulk using `fetch_heads`.

Fixes #1394
Having no subject makes it really confusing. Also reorganize a bit:
dump the entire response text as notification message instead of the
exception's repr and some generic stuff (especially since the repr
seems to be getting stripped out somewhere).

Fixes #1391
Sending everything in one shot is a problem when there is a lot of
backlog (e.g. the feedback cron was stopped for a while): hundreds of
feedbacks sent in a single run means there's no feedback whatsoever
for the entire thing, which is frustrating to watch.

Fixes #1388
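The batching idea can be sketched as follows (a hypothetical outline with an assumed `batch_size` and injected `send`/`commit` callables, not the cron's actual code): committing after each small batch makes progress visible and durable even with hundreds of feedbacks queued, instead of one long silent run.

```python
def flush_feedback(backlog, send, commit, batch_size=50):
    """Deliver queued feedbacks in small committed batches."""
    sent = 0
    while backlog:
        batch, backlog = backlog[:batch_size], backlog[batch_size:]
        for feedback in batch:
            send(feedback)
        commit()                 # progress persists batch by batch
        sent += len(batch)
    return sent

sent_log, checkpoints = [], []
assert flush_feedback(list(range(120)), sent_log.append,
                      lambda: checkpoints.append(len(sent_log))) == 120
assert checkpoints == [50, 100, 120]   # three visible checkpoints, not one
```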
Also stop trying to send feedbacks after 12 attempts (spaced 10
minutes apart).

Fixes #1388
They're merged, it's not necessary.

Also skip for closed PRs, but add a dependency on that one so the
statuses get recomputed when the PR gets reopened.
Since git-fw runs from inside a repository, it can look at the
configured remotes to try to find what repository the PR should live
in.

Fixes #1376
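The remote-based discovery could look roughly like this (an assumed heuristic for illustration, not necessarily git-fw's actual parsing): take each configured remote's URL and extract the `owner/repo` slug, for both ssh and https forms.

```python
import re

def repo_from_remote_url(url):
    """Extract `owner/repo` from a GitHub remote URL, ssh or https."""
    m = re.search(r'github\.com[:/]([^/]+/[^/.]+?)(?:\.git)?$', url)
    return m.group(1) if m else None

assert repo_from_remote_url("git@github.com:odoo/runbot.git") == "odoo/runbot"
assert repo_from_remote_url("https://github.com/odoo/runbot") == "odoo/runbot"
assert repo_from_remote_url("https://example.com/elsewhere.git") is None
```

In practice the URLs would come from something like `git remote get-url <name>` for each remote, and the first slug found would be the candidate repository for the PR.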
When hitting a private repository over HTTP, if it's not configured
however it's supposed to be configured git will prompt for a password
and then you're stuck. By disabling credentials prompting, if there's
a credential helper configured it will be used, otherwise the command
will fail and go to the ssh URL which will hopefully succeed.

If there are psychos out there who enter their credentials on every
remote op it won't work for them, but that seems an unlikely case.
The previous code would work fine as long as the user had already
fetched the ref to a remote-tracking branch somehow, otherwise it
would fetch the commit but then have nothing to switch to.

Here we change a few things:

- look for the ref in the main repository
- look for the ref in the dev repository (which may not exist)
- `fetch` the ref itself (not just its commits)

This should lead to having a remote-tracking branch for the fw's ref,
which we can then `switch` to. And set the upstream to ensure it's
properly configured.

Technically we could set up the branch by hand and set a URL as
`branch.<name>.remote`, and `push` works; however the UX for this is
terrible, as git says the branch has no upstream and won't be able to
track its remote state, so that seems undesirable.
This job was missed after `state` got fragmented into subfields; as a
result, if a PR is closed via `check`, then when any dependent of
`state` is hit, the PR will look reopened to the mergebot.