Last week I ran a routine CVE sweep across all 35 repositories managed by git-steer. The scan itself worked fine. What it revealed was more interesting: four systemic gaps that had been quietly compounding since we first automated security remediation.
Fix PRs were being created but never tracked to completion. Most repos had Dependabot disabled entirely. Workflow error handling was swallowing failures. And the changelog pipeline — fully built and tested — was sitting in a staging area, uncommitted.
None of these were emergencies. All of them were the kind of operational debt that erodes trust in automation over time. So we planned the fix, executed it, and shipped it in under 15 minutes.
Here’s what we found, what we built, and why the planning mattered more than the code.
Gap 1: The Orphaned PR Problem
What we found: git-steer’s security sweep creates RFC issues, dispatches a GitHub Actions workflow, and that workflow opens fix PRs. But once the PR existed, nothing tracked it. A PR could sit open for weeks, merge without updating the RFC, or close without anyone noticing the fix had failed.
Why it matters: If you automate the creation of security fixes but not the follow-through, you’ve built a system that looks like it’s working while vulnerabilities remain open. MTTR (Mean Time to Remediate) becomes unmeasurable because you never record when — or whether — a fix actually lands.
What we built: A new CI script (ci-pr-followup.mjs) that runs daily in the heartbeat workflow, positioned between the security sweep and dashboard generation so the dashboard always reflects fresh data. The script:
- Loads all active RFCs from the state repo
- For each RFC with a linked PR, fetches the PR status:
  - Merged: marks the RFC as `fixed`, computes MTTR in hours, closes the RFC issue with a resolution comment
  - Stale (>48h without updates): posts a nudge comment on the PR, adds a `stale` label
  - Merge conflict: posts a conflict notice, adds a `merge-conflict` label
  - Closed without merge: resets the RFC to `open` (the fix failed, try again)
- No PR after 24h: logs a warning that the workflow may have silently failed
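The status handling above can be sketched as a single classification function. This is a minimal illustration, not the contents of ci-pr-followup.mjs: the field names (`createdAt`, `mergedAt`, `updatedAt`, `mergeable`, `state`), the action names, and the choice to measure MTTR from RFC creation to PR merge are all assumptions made for the example.

```javascript
// Sketch only: field and action names are assumptions, not git-steer's schema.
const HOUR_MS = 60 * 60 * 1000;
const STALE_AFTER_HOURS = 48;
const MISSING_PR_AFTER_HOURS = 24;

// Elapsed hours between two ISO 8601 timestamps.
function hoursBetween(startIso, endIso) {
  return (Date.parse(endIso) - Date.parse(startIso)) / HOUR_MS;
}

// Decide what the follow-up run should do for one RFC.
// `pr` is null when the RFC has no linked PR yet.
function classifyRfc(rfc, pr, nowIso) {
  if (pr === null) {
    return hoursBetween(rfc.createdAt, nowIso) > MISSING_PR_AFTER_HOURS
      ? { action: "warn-missing-pr" }
      : { action: "wait" };
  }
  if (pr.merged) {
    // Assumed MTTR definition: RFC creation to PR merge, in hours.
    return { action: "mark-fixed", mttrHours: hoursBetween(rfc.createdAt, pr.mergedAt) };
  }
  if (pr.state === "closed") {
    return { action: "reset-to-open" }; // closed without merge: the fix failed
  }
  if (pr.mergeable === false) {
    return { action: "flag-merge-conflict" };
  }
  if (hoursBetween(pr.updatedAt, nowIso) > STALE_AFTER_HOURS) {
    return { action: "nudge-stale" };
  }
  return { action: "wait" };
}
```

The sketch treats only an explicit `mergeable === false` as a conflict, because GitHub can report `null` for that field while it is still computing mergeability.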
As a fallback, it also scans all managed repos for open PRs with security + automated labels that aren’t linked to any RFC. These orphans get flagged so nothing slips through.
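A rough sketch of that orphan check, assuming each open PR carries a `labels` array and a `url`, and each RFC record stores the linked PR in a `prUrl` field. Those field names are hypothetical; only the `security` and `automated` labels come from the description above.

```javascript
// Sketch only: `labels`, `url`, and `prUrl` are assumed field names.
function findOrphanPrs(openPrs, rfcs) {
  // Collect every PR URL that some RFC already tracks.
  const linked = new Set(rfcs.map((rfc) => rfc.prUrl).filter(Boolean));
  return openPrs.filter((pr) => {
    const labels = new Set(pr.labels);
    const isAutomatedFix = labels.has("security") && labels.has("automated");
    return isAutomatedFix && !linked.has(pr.url);
  });
}
```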
The decision: We considered doing this in the MCP server itself (poll on demand), but the heartbeat approach is better. It runs whether or not anyone is actively using the MCP tools, and it updates state before the dashboard regenerates — so the dashboard is always current.
Gap 2: Dependabot Was Disabled on Most Repos
What we found: When the security scan ran against all 35 repos, many returned 403 errors on the Dependabot alerts endpoint. Investigation revealed that vulnerability alerts and automated security fixes were never enabled on most ry-ops repositories. The scan was silently skipping them.
Why it matters: You can’t fix vulnerabilities you can’t see. A security scanning pipeline that silently skips half your repos is worse than no pipeline at all, because it creates false confidence.
What we built: Three layers of enforcement:
- Heartbeat enforcement step: Before the security scan runs, a new step iterates over all managed repos and calls `PUT /repos/{repo}/vulnerability-alerts` and `PUT /repos/{repo}/automated-security-fixes`. This is idempotent: if already enabled, the API returns success. If the repo doesn’t support it, we log and move on.
- MCP tool (`security_enforce`): For on-demand use. Call it from any MCP client to ensure Dependabot is enabled across all managed repos (or a specific one).
- Onboarding hook: We patched `config_add_repo` so that when you add a new repo to git-steer’s managed set, Dependabot is automatically enabled. No more forgetting.
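The enforcement loop might look roughly like the following. It is a sketch under assumptions: `putEndpoint` is a hypothetical injected function that performs an authenticated PUT and rejects on a non-success response, and repos are passed as `owner/name` strings. The real heartbeat step may be structured differently.

```javascript
// Sketch only: `putEndpoint` and the result shape are hypothetical.
const DEPENDABOT_ENDPOINTS = ["vulnerability-alerts", "automated-security-fixes"];

async function enforceDependabot(repos, putEndpoint) {
  const results = [];
  for (const repo of repos) {
    for (const endpoint of DEPENDABOT_ENDPOINTS) {
      const path = `/repos/${repo}/${endpoint}`;
      try {
        await putEndpoint(path); // idempotent: enabling twice is still a success
        results.push({ repo, endpoint, ok: true });
      } catch (err) {
        // Log and move on so one unsupported repo does not stop the sweep.
        console.warn(`${repo}: could not enable ${endpoint}: ${err.message}`);
        results.push({ repo, endpoint, ok: false, error: err.message });
      }
    }
  }
  return results;
}
```

Returning a per-repo result list, rather than only logging, lets the caller report how many repos could not be enforced instead of skipping them quietly.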
The decision: We chose “enforce on every heartbeat” over “enforce once on setup” because repos can have their settings changed externally. Daily enforcement is cheap (two idempotent API calls per repo) and corrects any drift within a day.
Gap 3: Workflows Were Swallowing Failures
What we found: Multiple workflows had error handling that silently discarded failures:
- heartbeat.yml: The log step always recorded `"completed"` regardless of whether earlier steps failed, and it only ran on success (missing `if: always()`)
- security-fix.yml: The report step tried to call `tonumber` on an empty string when no fixable vulnerabilities were found, crashing the entire report
- security-sweep.yml: PR bodies included `Closes #0` when no RFC issue existed, the RFC comment step ran even with `issueNumber=0`, and PR numbers were never written back to state
- ci-dashboard.mjs: Scan failures logged `x` to stdout with no indication of which repo failed or why
- ci-changelog.mjs: Failed PR fetches returned empty arrays silently
Why it matters: Silent failures in automation are poison. They don’t trigger alerts, they don’t show up in logs, and they compound. A workflow that “succeeds” while doing nothing useful is the hardest kind of bug to catch.
What we fixed:
| File | Fix |
|---|---|
| heartbeat.yml | `"completed"` replaced with `${{ job.status }}`, added `if: always()` to log step |
| security-fix.yml | `FIXED_COUNT="${FIXED_COUNT:-0}"`: default to 0 before `tonumber` |
| security-sweep.yml | Guard `issueNumber=0` in PR body (no `Closes #0`), guard RFC comment condition, add PR persistence step to write `prNumber`/`prUrl` back to `rfcs.jsonl` |
| ci-dashboard.mjs | Catch block now logs `repo.fullName: ${err.message}` |
| ci-changelog.mjs | Catch block now logs `repo.fullName: ${err.message}` |
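The actual fixes live in workflow YAML and shell, but the same three guards can be illustrated in JavaScript. Everything here is a sketch: the function names are hypothetical, and the `id` field on RFC records is an assumption about the `rfcs.jsonl` layout.

```javascript
// Sketch only: function names and the `id` field are hypothetical.

// Treat an empty or non-numeric count as 0 instead of crashing the report.
function parseFixedCount(raw) {
  const n = Number.parseInt(String(raw ?? "").trim(), 10);
  return Number.isNaN(n) ? 0 : n;
}

// Only emit a "Closes #N" line when a real RFC issue exists.
function buildPrBody(summary, issueNumber) {
  const lines = [summary];
  if (Number.isInteger(issueNumber) && issueNumber > 0) {
    lines.push("", `Closes #${issueNumber}`);
  }
  return lines.join("\n");
}

// Write the PR number and URL back onto the matching record of a JSONL string.
function attachPrToRfc(jsonl, rfcId, prNumber, prUrl) {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const rfc = JSON.parse(line);
      return JSON.stringify(rfc.id === rfcId ? { ...rfc, prNumber, prUrl } : rfc);
    })
    .join("\n");
}
```

Note that `attachPrToRfc` drops blank lines, including a trailing newline, so a caller that needs one would have to append it again.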
The decision: Every fix was minimal and targeted. We didn’t refactor the workflows or add retry logic or build a notification system. We made the failures visible. That’s the right first step — you can’t decide how to handle errors you can’t see.
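As an illustration of "making failures visible", a scan loop can record which repo failed and why rather than discarding the error. This is a sketch, not the code in ci-dashboard.mjs: `scanRepo` is a hypothetical injected async function, and the returned shape is an assumption.

```javascript
// Sketch only: `scanRepo` and the return shape are hypothetical.
async function scanAll(repos, scanRepo) {
  const results = [];
  const failures = [];
  for (const repo of repos) {
    try {
      results.push(await scanRepo(repo));
    } catch (err) {
      // Name the repo and the reason instead of printing a bare marker.
      console.error(`${repo.fullName}: ${err.message}`);
      failures.push({ repo: repo.fullName, message: err.message });
    }
  }
  return { results, failures };
}
```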
Gap 4: The Changelog Pipeline Was Built but Not Shipped
What we found: scripts/ci-changelog.mjs was fully written, tested, and working. The heartbeat workflow already had the changelog-sync task option defined. But the script was sitting in the working directory, uncommitted. The pipeline existed in every way except the one that mattered: it wasn’t running.
Why it matters: This is the classic “last mile” problem. The hardest part of shipping isn’t building the feature — it’s the commit, the PR, the merge, the deploy. An uncommitted script is an idea, not a capability.
What we did: Committed it. Added error logging to the getRecentMergedPRs catch block while we were in there. That’s it.
Sometimes the fix is just shipping the thing.
The GitHub Client Methods
To support the PR lifecycle loop and Dependabot enforcement, we added four new methods to src/github/client.ts:
- `getPullRequest(owner, repo, pullNumber)`: Returns the full PR state (merged status, mergeable flag, labels, timestamps). Used by the follow-up script to determine what action to take.
- `enableVulnerabilityAlerts(owner, repo)`: `PUT /repos/{owner}/{repo}/vulnerability-alerts`. Enables Dependabot alerts.
- `enableAutomatedSecurityFixes(owner, repo)`: `PUT /repos/{owner}/{repo}/automated-security-fixes`. Enables Dependabot auto-fix PRs.
- `checkVulnerabilityAlertsEnabled(owner, repo)`: `GET` with 204/404 check. For querying current state without modifying it.
These are thin wrappers around the GitHub API. No business logic, no retry logic, no caching. The client’s job is to be a typed, authenticated interface to the API. Policy decisions belong in the tools and workflows that call it.
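A thin wrapper in this spirit could look like the sketch below, shown as plain JavaScript without type annotations. The class shape, the injectable `fetchImpl`, and the error messages are assumptions for illustration. The endpoint paths and the 204/404 check are the ones described above, and only two of the four methods are shown.

```javascript
// Sketch only: class shape and `fetchImpl` injection are assumptions.
class GitHubClient {
  constructor(token, fetchImpl = fetch, baseUrl = "https://api.github.com") {
    this.token = token;
    this.fetchImpl = fetchImpl;
    this.baseUrl = baseUrl;
  }

  // Single authenticated request helper: no retries, no caching.
  request(method, path) {
    const doFetch = this.fetchImpl;
    return doFetch(`${this.baseUrl}${path}`, {
      method,
      headers: {
        Authorization: `Bearer ${this.token}`,
        Accept: "application/vnd.github+json",
      },
    });
  }

  async enableVulnerabilityAlerts(owner, repo) {
    const res = await this.request("PUT", `/repos/${owner}/${repo}/vulnerability-alerts`);
    if (!res.ok) throw new Error(`PUT vulnerability-alerts failed: ${res.status}`);
  }

  // 204 means alerts are enabled, 404 means they are not.
  async checkVulnerabilityAlertsEnabled(owner, repo) {
    const res = await this.request("GET", `/repos/${owner}/${repo}/vulnerability-alerts`);
    if (res.status === 204) return true;
    if (res.status === 404) return false;
    throw new Error(`Unexpected status ${res.status}`);
  }
}
```

Injecting `fetchImpl` keeps the wrapper free of policy and makes it easy to exercise against a fake transport.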
The Speed Factor: 15 Minutes
Here’s the timeline:
- Planning: We spent time before writing code to map every gap, decide which files to modify, determine the step order in the heartbeat workflow, and split the work into two logical PRs. The plan was a document, not a conversation.
- Implementation: With the plan in hand, every edit was mechanical. No exploration, no “let me try this.” Read the file, make the change, move on. Nine tasks, executed sequentially where dependent and in parallel where independent.
- Verification: `npm run build` passed on the first try. No TypeScript errors, no missing imports, no broken references.
The speed wasn’t about rushing. It was about not having to think during implementation because all the thinking happened during planning.
The Pattern
Every time we’ve had success shipping quickly, the pattern is the same:
- Discover the gaps by running the system, not by reading the code
- Map every change before touching a file — file name, line number, what changes
- Batch related changes into coherent PRs with clear scope
- Execute mechanically — the plan is the spec, implementation is just typing
- Verify immediately — build, test, review
Change management isn’t bureaucracy. It’s the discipline of knowing what you’re going to do before you do it. That discipline is what makes 15-minute implementations possible instead of 3-hour debugging sessions.
The gaps we found weren’t complex. The fixes weren’t clever. But without the plan, we’d have been jumping between files, discovering dependencies mid-edit, and second-guessing scope. With it, we shipped two PRs worth of hardening in the time it takes to drink a coffee.
git-steer is an MCP server for GitHub repository management. It handles repo settings, branch management, PR workflows, security scanning, and autonomous vulnerability remediation across 35+ repositories — all without cloning a single repo locally.