Last week I ran a routine CVE sweep across all 35 repositories managed by git-steer. The scan itself worked fine. What it revealed was more interesting: four systemic gaps that had been quietly compounding since we first automated security remediation.

Fix PRs were being created but never tracked to completion. Most repos had Dependabot disabled entirely. Workflow error handling was swallowing failures. And the changelog pipeline — fully built and tested — was sitting in a staging area, uncommitted.

None of these were emergencies. All of them were the kind of operational debt that erodes trust in automation over time. So we planned the fix, executed it, and shipped it in under 15 minutes.

Here’s what we found, what we built, and why the planning mattered more than the code.


Gap 1: The Orphaned PR Problem

What we found: git-steer’s security sweep creates RFC issues, dispatches a GitHub Actions workflow, and that workflow opens fix PRs. But once the PR existed, nothing tracked it. A PR could sit open for weeks, merge without updating the RFC, or close without anyone noticing the fix had failed.

Why it matters: If you automate the creation of security fixes but not the follow-through, you’ve built a system that looks like it’s working while vulnerabilities remain open. MTTR (Mean Time to Remediate) becomes unmeasurable because you never record when — or whether — a fix actually lands.

What we built: A new CI script (ci-pr-followup.mjs) that runs daily in the heartbeat workflow, positioned between the security sweep and dashboard generation so the dashboard always reflects fresh data. The script:

  • Loads all active RFCs from the state repo
  • For each RFC with a linked PR, fetches the PR status
  • Merged: marks the RFC as fixed, computes MTTR in hours, closes the RFC issue with a resolution comment
  • Stale (>48h without updates): posts a nudge comment on the PR, adds a stale label
  • Merge conflict: posts a conflict notice, adds a merge-conflict label
  • Closed without merge: resets the RFC to open — the fix failed, try again
  • No PR after 24h: logs a warning that the workflow may have silently failed
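
The branching above can be sketched as one pure function. This is a minimal illustration, not the actual ci-pr-followup.mjs: the name decideFollowUp, the action strings, and the RFC field createdAt are assumptions, and so is the order of the checks. The pr fields (merged, state, mergeable, merged_at, updated_at) follow the GitHub REST pull request object.

```javascript
// Sketch only: `rfc.createdAt` and the action names are assumed, not taken
// from git-steer. `pr` is shaped like a GitHub REST pull request object.
const HOUR_MS = 60 * 60 * 1000;

function hoursBetween(startIso, endIso) {
  return (Date.parse(endIso) - Date.parse(startIso)) / HOUR_MS;
}

function decideFollowUp(rfc, pr, now = new Date()) {
  const nowIso = now.toISOString();

  if (!pr) {
    // No PR yet: only warn once the workflow has had 24h to open one.
    return hoursBetween(rfc.createdAt, nowIso) > 24
      ? { action: "warn-missing-pr" }
      : { action: "wait" };
  }
  if (pr.merged) {
    // MTTR here is measured from RFC creation to merge, rounded to 0.1h.
    const mttrHours =
      Math.round(hoursBetween(rfc.createdAt, pr.merged_at) * 10) / 10;
    return { action: "mark-fixed", mttrHours };
  }
  if (pr.state === "closed") {
    // Closed without merge: the fix failed, so the RFC goes back to open.
    return { action: "reopen-rfc" };
  }
  if (pr.mergeable === false) {
    return { action: "flag-conflict", label: "merge-conflict" };
  }
  if (hoursBetween(pr.updated_at, nowIso) > 48) {
    return { action: "nudge", label: "stale" };
  }
  return { action: "none" };
}
```

Keeping the decision separate from the API calls means the rules can be checked against fixture data without touching GitHub.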

As a fallback, it also scans all managed repos for open PRs with security + automated labels that aren’t linked to any RFC. These orphans get flagged so nothing slips through.
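
A rough sketch of that orphan check, assuming each RFC record carries repo and prNumber fields (both names are guesses) and that open PRs have already been fetched per repo. The labels array mirrors the GitHub REST shape, where each label is an object with a name.

```javascript
// Sketch only: `repo` and `prNumber` on an RFC record are assumed field names.
const REQUIRED_LABELS = ["security", "automated"];

function findOrphanPrs(openPrsByRepo, rfcs) {
  // Index every PR that some RFC already points at.
  const linked = new Set(
    rfcs
      .filter((rfc) => rfc.prNumber)
      .map((rfc) => `${rfc.repo}#${rfc.prNumber}`)
  );

  const orphans = [];
  for (const [repo, prs] of Object.entries(openPrsByRepo)) {
    for (const pr of prs) {
      const names = pr.labels.map((label) => label.name);
      const isSecurityFix = REQUIRED_LABELS.every((name) => names.includes(name));
      if (isSecurityFix && !linked.has(`${repo}#${pr.number}`)) {
        orphans.push({ repo, number: pr.number });
      }
    }
  }
  return orphans;
}
```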

The decision: We considered doing this in the MCP server itself (poll on demand), but the heartbeat approach is better. It runs whether or not anyone is actively using the MCP tools, and it updates state before the dashboard regenerates — so the dashboard is always current.


Gap 2: Dependabot Was Disabled on Most Repos

What we found: When the security scan ran against all 35 repos, many returned 403 errors on the Dependabot alerts endpoint. Investigation revealed that vulnerability alerts and automated security fixes were never enabled on most ry-ops repositories. The scan was silently skipping them.

Why it matters: You can’t fix vulnerabilities you can’t see. A security scanning pipeline that silently skips half your repos is worse than no pipeline at all, because it creates false confidence.

What we built: Three layers of enforcement:

  1. Heartbeat enforcement step: Before the security scan runs, a new step iterates over all managed repos and calls PUT /repos/{owner}/{repo}/vulnerability-alerts and PUT /repos/{owner}/{repo}/automated-security-fixes. Both calls are idempotent: if the feature is already enabled, the API returns success. If the repo doesn’t support it, we log and move on.

  2. MCP tool (security_enforce): For on-demand use. Call it from any MCP client to ensure Dependabot is enabled across all managed repos (or a specific one).

  3. Onboarding hook: We patched config_add_repo so that when you add a new repo to git-steer’s managed set, Dependabot is automatically enabled. No more forgetting.
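
The enforcement loop might look roughly like this. The two paths are the GitHub REST endpoints for Dependabot alerts and automated security fixes; everything else (the function names, the injected request helper, the summary shape) is hypothetical and only illustrates the "log and move on" behavior.

```javascript
// Sketch only: `request(method, path)` is an assumed helper that performs an
// authenticated GitHub API call and throws on a non-2xx response.
function enforcementRequests(fullName) {
  // fullName is "owner/repo".
  return [
    { method: "PUT", path: `/repos/${fullName}/vulnerability-alerts` },
    { method: "PUT", path: `/repos/${fullName}/automated-security-fixes` },
  ];
}

async function enforceDependabot(repos, request, log = console.warn) {
  const summary = { enabled: [], skipped: [] };
  for (const fullName of repos) {
    try {
      for (const req of enforcementRequests(fullName)) {
        await request(req.method, req.path);
      }
      summary.enabled.push(fullName);
    } catch (err) {
      // One unsupported repo should not stop the sweep, but it must be visible.
      log(`${fullName}: ${err.message}`);
      summary.skipped.push(fullName);
    }
  }
  return summary;
}
```

Because the PUT calls are idempotent, the same function could plausibly back all three entry points: the heartbeat step, the on-demand tool, and the onboarding hook.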

The decision: We chose “enforce on every heartbeat” over “enforce once on setup” because repos can have their settings changed externally. Daily enforcement is cheap (two API calls per repo) and guarantees drift correction.


Gap 3: Workflows Were Swallowing Failures

What we found: Multiple workflows had error handling that silently discarded failures:

  • heartbeat.yml: The log step hardcoded "completed" instead of recording the real job status, and because it lacked if: always() it was skipped entirely whenever an earlier step failed
  • security-fix.yml: The report step tried to call tonumber on an empty string when no fixable vulnerabilities were found, crashing the entire report
  • security-sweep.yml: PR bodies included Closes #0 when no RFC issue existed, the RFC comment step ran even with issueNumber=0, and PR numbers were never written back to state
  • ci-dashboard.mjs: Scan failures logged x to stdout with no indication of which repo failed or why
  • ci-changelog.mjs: Failed PR fetches returned empty arrays silently

Why it matters: Silent failures in automation are poison. They don’t trigger alerts, they don’t show up in logs, and they compound. A workflow that “succeeds” while doing nothing useful is the hardest kind of bug to catch.

What we fixed:

  • heartbeat.yml: "completed" replaced with ${{ job.status }}, added if: always() to log step
  • security-fix.yml: FIXED_COUNT="${FIXED_COUNT:-0}" (default to 0 before tonumber)
  • security-sweep.yml: Guard issueNumber=0 in PR body (no Closes #0), guard RFC comment condition, add PR persistence step to write prNumber/prUrl back to rfcs.jsonl
  • ci-dashboard.mjs: Catch block now logs repo.fullName: ${err.message}
  • ci-changelog.mjs: Catch block now logs repo.fullName: ${err.message}
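
For the security-sweep.yml items, here is a hedged sketch of the two ideas in JavaScript: skip the Closes line when there is no RFC issue, and write the PR reference back into a JSONL state file. The function names and the id key are assumptions; prNumber and prUrl are the fields named in the fixes listed above. The real workflow may implement this differently.

```javascript
// Sketch only: `id` is an assumed key on each rfcs.jsonl record.
function buildPrBody(summary, issueNumber) {
  const lines = [summary];
  // Workflow inputs often arrive as strings, so coerce before checking.
  const n = Number(issueNumber);
  // Only reference an RFC issue that exists; never emit "Closes #0".
  if (Number.isInteger(n) && n > 0) {
    lines.push("", `Closes #${n}`);
  }
  return lines.join("\n");
}

function attachPrToRfc(jsonlText, rfcId, prNumber, prUrl) {
  // JSONL: one JSON object per line. Rewrite only the matching record.
  return (
    jsonlText
      .split("\n")
      .filter((line) => line.trim() !== "")
      .map((line) => {
        const record = JSON.parse(line);
        const updated =
          record.id === rfcId ? { ...record, prNumber, prUrl } : record;
        return JSON.stringify(updated);
      })
      .join("\n") + "\n"
  );
}
```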

The decision: Every fix was minimal and targeted. We didn’t refactor the workflows or add retry logic or build a notification system. We made the failures visible. That’s the right first step — you can’t decide how to handle errors you can’t see.
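
Making a failure visible can be small. The sketch below is one way to do it with Promise.allSettled, not the code in ci-dashboard.mjs or ci-changelog.mjs: scanRepo is a stand-in for whatever per-repo call the script makes, and the return shape is an assumption. The point is only that every rejected call produces a log line naming the repo and the reason.

```javascript
// Sketch only: `scanRepo` and the { results, failures } shape are assumed.
function reportSettled(repos, settled, log = console.error) {
  const results = [];
  const failures = [];
  settled.forEach((outcome, i) => {
    const repo = repos[i];
    if (outcome.status === "fulfilled") {
      results.push({ repo: repo.fullName, data: outcome.value });
      return;
    }
    const reason = outcome.reason;
    const message = reason instanceof Error ? reason.message : String(reason);
    // Name the repo and the cause instead of swallowing the error.
    log(`${repo.fullName}: ${message}`);
    failures.push({ repo: repo.fullName, message });
  });
  return { results, failures };
}

async function scanAll(repos, scanRepo, log = console.error) {
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    repos.map(async (repo) => scanRepo(repo))
  );
  return reportSettled(repos, settled, log);
}
```

Returning the failures alongside the results leaves the later decision (retry, alert, fail the job) to the caller.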


Gap 4: The Changelog Pipeline Was Built but Not Shipped

What we found: scripts/ci-changelog.mjs was fully written, tested, and working. The heartbeat workflow already had the changelog-sync task option defined. But the script was sitting in the working directory, uncommitted. The pipeline existed in every way except the one that mattered: it wasn’t running.

Why it matters: This is the classic “last mile” problem. The hardest part of shipping isn’t building the feature — it’s the commit, the PR, the merge, the deploy. An uncommitted script is an idea, not a capability.

What we did: Committed it. Added error logging to the getRecentMergedPRs catch block while we were in there. That’s it.

Sometimes the fix is just shipping the thing.


The GitHub Client Methods

To support the PR lifecycle loop and Dependabot enforcement, we added four new methods to src/github/client.ts:

  • getPullRequest(owner, repo, pullNumber): Returns full PR state: merged status, mergeable flag, labels, timestamps. Used by the follow-up script to determine what action to take.
  • enableVulnerabilityAlerts(owner, repo): PUT /repos/{owner}/{repo}/vulnerability-alerts. Enables Dependabot alerts.
  • enableAutomatedSecurityFixes(owner, repo): PUT /repos/{owner}/{repo}/automated-security-fixes. Enables Dependabot auto-fix PRs.
  • checkVulnerabilityAlertsEnabled(owner, repo): GET with 204/404 check. For querying current state without modifying it.

These are thin wrappers around the GitHub API. No business logic, no retry logic, no caching. The client’s job is to be a typed, authenticated interface to the API. Policy decisions belong in the tools and workflows that call it.
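
To show what "thin" means here, a plain-JavaScript approximation of two of those methods follows. The real client is TypeScript and may be built on a different HTTP layer; the constructor arguments, the injected fetchImpl, and the helper alertsEnabledFromStatus are all assumptions made for this sketch.

```javascript
// Sketch only: not the contents of src/github/client.ts.
function alertsEnabledFromStatus(status) {
  if (status === 204) return true;  // alerts are enabled
  if (status === 404) return false; // alerts are not enabled
  throw new Error(`Unexpected status ${status}`);
}

class GitHubClient {
  constructor(token, fetchImpl = globalThis.fetch, baseUrl = "https://api.github.com") {
    this.token = token;
    this.fetchImpl = fetchImpl;
    this.baseUrl = baseUrl;
  }

  // One authenticated request, with no retries, caching, or policy.
  async request(method, path) {
    return this.fetchImpl(`${this.baseUrl}${path}`, {
      method,
      headers: {
        Authorization: `Bearer ${this.token}`,
        Accept: "application/vnd.github+json",
      },
    });
  }

  async enableVulnerabilityAlerts(owner, repo) {
    const res = await this.request("PUT", `/repos/${owner}/${repo}/vulnerability-alerts`);
    if (!res.ok) throw new Error(`PUT vulnerability-alerts failed: ${res.status}`);
  }

  async checkVulnerabilityAlertsEnabled(owner, repo) {
    const res = await this.request("GET", `/repos/${owner}/${repo}/vulnerability-alerts`);
    return alertsEnabledFromStatus(res.status);
  }
}
```

Injecting the fetch function keeps the wrapper testable without a network, while callers such as the enforcement step decide what a thrown error should mean.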


The Speed Factor: 15 Minutes

Here’s the timeline:

  1. Planning: We spent time before writing code to map every gap, decide which files to modify, determine the step order in the heartbeat workflow, and split the work into two logical PRs. The plan was a document, not a conversation.

  2. Implementation: With the plan in hand, every edit was mechanical. No exploration, no “let me try this.” Read the file, make the change, move on. Nine tasks, executed sequentially where dependent and in parallel where independent.

  3. Verification: npm run build passed on the first try. No TypeScript errors, no missing imports, no broken references.

The speed wasn’t about rushing. It was about not having to think during implementation because all the thinking happened during planning.


The Pattern

Every time we’ve had success shipping quickly, the pattern is the same:

  1. Discover the gaps by running the system, not by reading the code
  2. Map every change before touching a file — file name, line number, what changes
  3. Batch related changes into coherent PRs with clear scope
  4. Execute mechanically — the plan is the spec, implementation is just typing
  5. Verify immediately — build, test, review

Change management isn’t bureaucracy. It’s the discipline of knowing what you’re going to do before you do it. That discipline is what makes 15-minute implementations possible instead of 3-hour debugging sessions.

The gaps we found weren’t complex. The fixes weren’t clever. But without the plan, we’d have been jumping between files, discovering dependencies mid-edit, and second-guessing scope. With it, we shipped two PRs worth of hardening in the time it takes to drink a coffee.


git-steer is an MCP server for GitHub repository management. It handles repo settings, branch management, PR workflows, security scanning, and autonomous vulnerability remediation across 35+ repositories — all without cloning a single repo locally.