
Using Playwright Agents with Azure DevOps

How to use Playwright's agent and agentic testing capabilities with Azure DevOps pipelines. Covers AI-driven test execution, autonomous test generation, agent-based regression, and integrating Playwright MCP with Azure CI/CD.

InnovateBits · 5 min read

Playwright's AI agent capabilities — particularly through MCP (Model Context Protocol) integration — enable a new class of autonomous testing: tests that can explore an application, identify issues, and generate reports without explicit scripted steps. This article covers how to integrate these capabilities into Azure DevOps pipelines.


What Playwright agents can do

Traditional Playwright tests are deterministic: you write explicit steps, the test follows them exactly. Playwright agents are goal-driven: you give them an objective, and they determine the steps.

// Traditional Playwright (scripted)
await page.goto('/checkout')
await page.fill('[name="email"]', 'user@test.com')
await page.click('[data-testid="submit"]')
await expect(page.locator('.confirmation')).toBeVisible()
 
// Agentic Playwright (goal-based — experimental)
await agent.accomplish('Complete a checkout with email user@test.com and verify confirmation')

The agent uses a language model to interpret the current page state, decide the next action, and adapt when the UI behaves unexpectedly.
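Conceptually, the agent runs an observe/decide/act loop until the goal is met or a step budget runs out. A minimal sketch of that loop in plain JavaScript (the `model` and `tools` interfaces here are hypothetical stand-ins, not real Playwright or OpenAI APIs):

```javascript
// Minimal observe/decide/act loop (hypothetical model/tools interfaces)
async function runAgent(goal, model, tools, maxSteps = 20) {
  const history = []
  for (let step = 0; step < maxSteps; step++) {
    const state = await tools.observe()           // e.g. an accessibility snapshot
    const action = await model.decide({ goal, state, history })
    if (action.type === 'done') return { status: 'done', history }
    await tools.execute(action)                   // click, fill, navigate, …
    history.push(action)
  }
  return { status: 'max-steps-reached', history } // guard against infinite loops
}
```

The `maxSteps` cap matters in practice: it is the main defence against an agent that loops on the same element (see "Common errors and fixes" below).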


Playwright MCP integration

Playwright's MCP server exposes browser control to AI models. Set up in Azure DevOps:

steps:
  - script: npm ci
  - script: npx playwright install chromium
  - script: |
      # Install Playwright MCP
      npm install @playwright/mcp
    displayName: Install Playwright MCP
 
  - script: |
      node scripts/agent-test.js
    displayName: Run agent tests
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      BASE_URL: $(STAGING_URL)
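Note that Azure DevOps does not expose secret variables to scripts automatically; the explicit `env:` mapping above is required. If the key lives in a variable group, reference the group at pipeline level (the group name `ai-testing-secrets` here is an assumption):

```yaml
variables:
  - group: ai-testing-secrets  # contains OPENAI_API_KEY, marked as secret
```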
The agent script itself:

// scripts/agent-test.js
import { chromium } from '@playwright/test'
import { PlaywrightMCP } from '@playwright/mcp'
import OpenAI from 'openai'
 
const browser = await chromium.launch({ headless: true })
const page = await browser.newPage()
const mcp = new PlaywrightMCP(page)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
 
// Navigate to the app
await page.goto(process.env.BASE_URL)
 
// Give the AI agent a test objective
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  tools: mcp.tools,
  messages: [{
    role: 'user',
    content: `
      Test the checkout flow on ${process.env.BASE_URL}.
      Specifically verify:
      1. Empty cart prevents checkout
      2. Adding an item allows checkout
      3. Valid payment method completes the order
      Report any issues found as JSON.
    `
  }],
  tool_choice: 'auto',
})
 
// Agent executes actions via MCP tools
const report = await mcp.executeToolCalls(result.choices[0].message.tool_calls)
console.log(JSON.stringify(report, null, 2))
 
await browser.close()

Autonomous regression testing

Use AI agents for open-ended regression discovery — exploring areas of the application for unexpected behaviour:

// Agent-based exploratory regression
const explorationTargets = [
  'Checkout flow — payment edge cases',
  'User profile — edge cases around name and email updates',
  'Search — boundary conditions on filters',
]
 
const findings = []
 
for (const target of explorationTargets) {
  await page.goto(process.env.BASE_URL)
  
  const result = await openai.chat.completions.create({
    model: 'gpt-4o',
    tools: mcp.tools,
    messages: [{
      role: 'user',
      content: `
        Explore "${target}" on this application.
        Look for: error messages that don't make sense, UI that breaks,
        actions that have no effect, unexpected page state.
        Report findings as a JSON array with: description, severity, steps_to_reproduce.
      `
    }],
  })
 
  const sessionFindings = JSON.parse(/* extract from result */)
  findings.push(...sessionFindings)
}
 
// Write findings to Azure DevOps as work items
for (const finding of findings.filter(f => f.severity !== 'cosmetic')) {
  await createAzureDevOpsBug(finding)
}
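The `createAzureDevOpsBug` helper is referenced above but not defined. One possible sketch against the Azure DevOps work-item REST API (the `ORG`/`PROJECT` constants and the `SYSTEM_ACCESSTOKEN` mapping are assumptions, and the severity value may need mapping to Azure DevOps' "1 - Critical" … "4 - Low" scale):

```javascript
// Sketch: create a Bug work item via the Azure DevOps REST API.
// Assumes the pipeline maps System.AccessToken into the environment,
// e.g.  env: { SYSTEM_ACCESSTOKEN: $(System.AccessToken) }
const ORG = 'my-org'          // hypothetical organisation name
const PROJECT = 'my-project'  // hypothetical project name

function buildBugPatch(finding) {
  // Work items are created with a JSON Patch document
  return [
    { op: 'add', path: '/fields/System.Title', value: `[Agent] ${finding.description}` },
    { op: 'add', path: '/fields/Microsoft.VSTS.TCM.ReproSteps', value: finding.steps_to_reproduce },
    { op: 'add', path: '/fields/Microsoft.VSTS.Common.Severity', value: finding.severity },
  ]
}

async function createAzureDevOpsBug(finding) {
  const url = `https://dev.azure.com/${ORG}/${PROJECT}/_apis/wit/workitems/$Bug?api-version=7.1`
  const res = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json-patch+json', // required for work-item creation
      Authorization: `Bearer ${process.env.SYSTEM_ACCESSTOKEN}`,
    },
    body: JSON.stringify(buildBugPatch(finding)),
  })
  if (!res.ok) throw new Error(`Work item creation failed: ${res.status}`)
  return res.json()
}
```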

Integrating agent results into Azure DevOps

Agent test results need to be structured to appear in the Tests tab:

// Convert agent findings to JUnit XML
import { Builder } from 'xml2js'
 
function agentFindingsToJUnit(findings: Finding[]): string {
  const failures = findings.filter(f => f.type === 'failure')
  const passes   = findings.filter(f => f.type === 'pass')
 
  return new Builder().buildObject({
    testsuites: {
      testsuite: {
        $: { name: 'Agent Tests', tests: findings.length, failures: failures.length },
        testcase: findings.map(f => ({
          $: { name: f.description, classname: 'Agent' },
          ...(f.type === 'failure' ? {
            failure: { $: { message: f.description }, _: f.steps_to_reproduce }
          } : {})
        }))
      }
    }
  })
}

Then publish the generated XML in the pipeline:

- task: PublishTestResults@2
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: agent-results/results.xml
    testRunTitle: Agent Testing — $(Build.BuildNumber)
  condition: always()

Current limitations and best practices

Limitations:

  • Agent tests are non-deterministic — different runs may explore different paths
  • AI API costs scale with usage — budget for pipeline runs
  • Agents can get stuck in loops or take unexpected paths
  • Not suitable for regression assertions requiring exact values

Best practices:

  • Use agents for exploratory discovery, not for regression gating
  • Set a hard time limit on agent sessions (around five minutes is a reasonable ceiling)
  • Always have a human review agent-found bugs before creating work items
  • Run agent tests on a schedule (nightly), not on every PR
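For the time limit, Azure Pipelines supports `timeoutInMinutes` directly on a step; a sketch matching the earlier agent-test step:

```yaml
- script: node scripts/agent-test.js
  displayName: Run agent tests
  timeoutInMinutes: 5  # hard stop for runaway agent sessions
```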

Common errors and fixes

Error: Agent gets stuck clicking the same element repeatedly
Fix: Add a maxSteps limit and an instruction: "If you've taken more than 20 actions without making progress, stop and report what you've found so far."

Error: OpenAI rate limits exceeded during parallel agent sessions
Fix: Run agent sessions sequentially or use exponential backoff. Each agent session makes dozens of API calls; parallel sessions quickly hit rate limits.

Error: Agent findings include false positives (reporting expected behaviour as bugs)
Fix: Provide context in the prompt: "The following behaviours are expected: [list]". Also run agents against a known-good environment first to calibrate what "normal" looks like.
