
Building an AI Agent That Manages Your GitHub PRs Automatically
I got tired of the PR review bottleneck on my team. Three developers, dozens of PRs a week, and half of them sitting for days before anyone looked at them. So I built an AI agent that handles the first pass of every pull request — summarizing changes, flagging potential issues, checking for common mistakes, and even suggesting improvements.
It is not replacing human reviewers. But it cuts the time from "PR opened" to "meaningful feedback" from days to minutes. Here is exactly how I built it and what I learned along the way.
The Architecture
The system has three components:
- A GitHub webhook that fires on PR events (opened, updated, ready for review)
- A Node.js service that processes the webhook, fetches the diff, and sends it to Claude
- The Claude API that analyzes the code and generates review comments
I considered using a pre-built solution like CodeRabbit or GitHub Copilot code review, but I wanted full control over the review criteria and the ability to customize it for our codebase.
Setting Up the Webhook Handler
First, the webhook receiver. I used Express because it is simple and I did not need anything fancy:
import express from 'express';
import crypto from 'crypto';

const app = express();
// Keep the raw body around — GitHub signs the exact bytes it sent, and
// re-serializing the parsed JSON will not always reproduce them.
app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));

function verifyWebhookSignature(req, secret) {
  const signature = req.headers['x-hub-signature-256'];
  if (!signature) return false;
  const hmac = crypto.createHmac('sha256', secret);
  const digest = 'sha256=' + hmac.update(req.rawBody).digest('hex');
  // timingSafeEqual throws if the buffers differ in length, so check first
  if (signature.length !== digest.length) return false;
  return crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(digest));
}
app.post('/webhook', (req, res) => {
  if (!verifyWebhookSignature(req, process.env.GITHUB_WEBHOOK_SECRET)) {
    return res.status(401).send('Invalid signature');
  }
  // Acknowledge immediately — GitHub times out webhook deliveries after
  // 10 seconds, and a full review can easily take longer than that.
  res.status(200).send('OK');

  const event = req.headers['x-github-event'];
  if (event === 'pull_request') {
    const { action, pull_request } = req.body;
    if (['opened', 'synchronize', 'ready_for_review'].includes(action)) {
      reviewPR(pull_request).catch(err => console.error('Review failed:', err));
    }
  }
});

app.listen(3000);
The signature verification is important. Without it, anyone could send fake webhook payloads to your endpoint and trigger reviews on arbitrary code.
Fetching and Processing the Diff
GitHub's API gives you the diff in a couple of formats. I found that fetching individual file patches works better than the full diff because you can process each file separately and give the AI more focused context:
async function getPRFiles(owner, repo, prNumber) {
  const response = await fetch(
    // per_page defaults to 30 — bump it so larger PRs are not silently truncated
    `https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}/files?per_page=100`,
    {
      headers: {
        'Authorization': `Bearer ${process.env.GITHUB_TOKEN}`,
        'Accept': 'application/vnd.github.v3+json'
      }
    }
  );
  if (!response.ok) {
    throw new Error(`GitHub API error: ${response.status}`);
  }
  return response.json();
}
async function reviewPR(pullRequest) {
  const files = await getPRFiles(
    pullRequest.base.repo.owner.login,
    pullRequest.base.repo.name,
    pullRequest.number
  );

  // Filter out files we do not want to review
  const reviewableFiles = files.filter(f =>
    !f.filename.includes('package-lock.json') &&
    !f.filename.includes('node_modules') &&
    !f.filename.endsWith('.min.js') &&
    f.patch // Skip binary files
  );

  const reviews = [];
  for (const file of reviewableFiles) {
    const review = await analyzeFile(file, pullRequest);
    if (review) reviews.push(review);
  }

  await postReviewComments(pullRequest, reviews);
}
The AI Review Logic
This is where it gets interesting. The prompt engineering took the most iteration. My first attempt just said "review this code" and the AI gave generic advice. The key was being specific about what to look for:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function analyzeFile(file, pullRequest) {
  const prompt = `You are reviewing a pull request for a ${getProjectType()} project.

PR Title: ${pullRequest.title}
PR Description: ${pullRequest.body || 'No description provided'}

File: ${file.filename}

Changes:
${file.patch}

Review this code change for:
1. Bugs or logic errors
2. Security vulnerabilities (SQL injection, XSS, auth bypasses)
3. Performance issues (N+1 queries, unnecessary re-renders, memory leaks)
4. Missing error handling
5. Breaking changes to public APIs

Do NOT comment on:
- Code style or formatting (our linter handles that)
- Minor naming preferences
- Adding comments to obvious code

For each issue found, respond with JSON:
{
  "comments": [
    {
      "line": <line number in the diff>,
      "severity": "critical" | "warning" | "suggestion",
      "message": "<clear explanation of the issue and how to fix it>"
    }
  ]
}

If the code looks good, return { "comments": [] }`;

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 2000,
    messages: [{ role: 'user', content: prompt }]
  });
  return JSON.parse(response.content[0].text);
}
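One caveat: the `JSON.parse` at the end will throw if the model wraps its answer in prose or a code fence, which happens occasionally. A defensive parser is worth the few extra lines — this helper is my own sketch, not part of the code above:

```javascript
// Extract the first JSON object from the model's reply, falling back to an
// empty review ("no findings") if nothing parseable comes back.
function parseReviewJSON(text) {
  try {
    return JSON.parse(text);
  } catch {
    // Greedy match: first "{" to last "}" — catches fenced or prose-wrapped JSON
    const match = text.match(/\{[\s\S]*\}/);
    if (match) {
      try {
        return JSON.parse(match[0]);
      } catch {
        // fall through to the empty review
      }
    }
    return { comments: [] };
  }
}
```

Swapping `JSON.parse(response.content[0].text)` for `parseReviewJSON(...)` means one flaky response degrades to a skipped file instead of a crashed review.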
The "Do NOT comment on" Section Is Critical
Without explicit exclusions, the AI will nitpick everything. Nobody wants an AI reviewer that comments on every variable name. The exclusion list keeps the signal-to-noise ratio high, which is what makes people actually read the AI's feedback instead of ignoring it.
Posting Comments Back to GitHub
GitHub's review API lets you post inline comments on specific lines of a PR. This is way better than dumping everything in a single comment — developers see the feedback right next to the relevant code:
async function postReviewComments(pullRequest, reviews) {
  const comments = reviews.flatMap(r => r.comments || []);

  if (comments.length === 0) {
    // Post a simple approval comment
    await postComment(pullRequest, 'AI Review: No issues found. Looks good!');
    return;
  }

  const criticalCount = comments.filter(c => c.severity === 'critical').length;
  const body = criticalCount > 0
    ? `AI Review: Found ${criticalCount} critical issue(s) that should be addressed.`
    : `AI Review: Found ${comments.length} suggestion(s) for improvement.`;

  await createReview(pullRequest, body, comments);
}
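The `createReview` helper is not shown above, so here is a sketch of what it could look like, in the same fetch-based style as `getPRFiles`. One assumption worth flagging: GitHub's review endpoint needs a file `path` on each inline comment, so the comment objects would have to carry the filename through from `analyzeFile`:

```javascript
// Build the payload for GitHub's "create a review" endpoint. Each inline
// comment needs a file path plus a line number in the diff.
function buildReviewPayload(body, comments) {
  return {
    body,
    event: 'COMMENT', // never APPROVE or REQUEST_CHANGES — that decision stays human
    comments: comments.map(c => ({
      path: c.path,
      line: c.line,
      body: `[${c.severity}] ${c.message}`
    }))
  };
}

async function createReview(pullRequest, body, comments) {
  const owner = pullRequest.base.repo.owner.login;
  const repo = pullRequest.base.repo.name;
  await fetch(
    `https://api.github.com/repos/${owner}/${repo}/pulls/${pullRequest.number}/reviews`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.GITHUB_TOKEN}`,
        'Accept': 'application/vnd.github.v3+json',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(buildReviewPayload(body, comments))
    }
  );
}
```

Pinning `event` to `COMMENT` is deliberate: the agent should surface findings, not approve or block merges on its own.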
What I Learned After 3 Months
The agent has reviewed over 400 PRs now. Here is what surprised me:
- It catches real bugs. About 15% of PRs get a critical finding, and roughly 80% of those are legitimate issues. That is a better hit rate than I expected.
- Speed matters more than depth. Getting feedback in 2 minutes instead of 2 days changed our team's workflow more than the quality of the feedback itself.
- False positives kill trust. Early on, the agent flagged too many non-issues. Developers started ignoring it. Tuning the prompt to be more conservative was essential.
- Context is everything. The agent does not know your codebase conventions. I added a REVIEW_GUIDELINES.md file that gets included in every prompt, and the quality jumped significantly.
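The guidelines trick from that last point is simple to wire in. A sketch, assuming a REVIEW_GUIDELINES.md at the service's working directory (the file is read once at startup, since it is small enough to inline into every prompt):

```javascript
import { readFileSync } from 'fs';

// Loaded once at startup; falls back to empty if no guidelines file exists yet.
let guidelines = '';
try {
  guidelines = readFileSync('REVIEW_GUIDELINES.md', 'utf8');
} catch {
  // no guidelines file — prompts go out unmodified
}

// Prepend house conventions to any review prompt
function withGuidelines(prompt) {
  return guidelines
    ? `Project review guidelines:\n${guidelines}\n\n${prompt}`
    : prompt;
}
```

Then `analyzeFile` calls `anthropic.messages.create` with `withGuidelines(prompt)` instead of the bare prompt.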
Cost Breakdown
Running this on Claude Sonnet for a team of 3 developers doing maybe 40 PRs a week costs about $15-25/month in API calls. Each review processes 5-10 files and uses roughly 3,000-5,000 tokens per file. That is absurdly cheap for the time it saves.
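Those numbers roughly check out with back-of-envelope math. Pricing here is my assumption (Claude Sonnet list price at time of writing: $3 per million input tokens, $15 per million output), and the per-file output estimate is mine, since the review JSON is short:

```javascript
// Back-of-envelope monthly cost using the article's own numbers
const prsPerWeek = 40;
const filesPerPR = 7.5;          // midpoint of 5-10 files per review
const inputTokensPerFile = 4000; // midpoint of 3,000-5,000
const outputTokensPerFile = 300; // short JSON review per file (my estimate)
const weeksPerMonth = 4.33;

const inputTokens = prsPerWeek * filesPerPR * inputTokensPerFile * weeksPerMonth;
const outputTokens = prsPerWeek * filesPerPR * outputTokensPerFile * weeksPerMonth;
const cost = (inputTokens / 1e6) * 3 + (outputTokens / 1e6) * 15;
console.log(cost.toFixed(2)); // lands around $21/month — inside the $15-25 range
```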
Should You Build This?
If your team has more than 2 developers and PRs regularly sit unreviewed for more than a few hours, yes. The setup takes an afternoon, and the ROI is immediate. It does not replace human review — it makes human review faster by handling the mechanical checks so reviewers can focus on architecture and design decisions.
The full source code is about 300 lines of TypeScript. No framework, no complex infrastructure. Just a webhook, an API call, and some careful prompt engineering.