Using Playwright Agents with Azure DevOps
How to use Playwright's agent and agentic testing capabilities with Azure DevOps pipelines. Covers AI-driven test execution, autonomous test generation, agent-based regression, and integrating Playwright MCP with Azure CI/CD.
Playwright's AI agent capabilities — particularly through MCP (Model Context Protocol) integration — enable a new class of autonomous testing: tests that can explore an application, identify issues, and generate reports without explicit scripted steps. This article covers how to integrate these capabilities into Azure DevOps pipelines.
What Playwright agents can do
Traditional Playwright tests are deterministic: you write explicit steps, the test follows them exactly. Playwright agents are goal-driven: you give them an objective, and they determine the steps.
// Traditional Playwright (scripted)
await page.goto('/checkout')
await page.fill('[name="email"]', 'user@test.com')
await page.click('[data-testid="submit"]')
await expect(page.locator('.confirmation')).toBeVisible()
// Agentic Playwright (goal-based — experimental)
await agent.accomplish('Complete a checkout with email user@test.com and verify confirmation')

The agent uses a language model to interpret the current page state, decide the next action, and adapt when the UI behaves unexpectedly.
Playwright MCP integration
Playwright's MCP server exposes browser control to AI models. Set up in Azure DevOps:
steps:
  - script: npm ci
  - script: npx playwright install chromium
  - script: |
      # Install Playwright MCP
      npm install @playwright/mcp
    displayName: Install Playwright MCP
  - script: |
      node scripts/agent-test.js
    displayName: Run agent tests
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      BASE_URL: $(STAGING_URL)

// scripts/agent-test.js
import { chromium } from '@playwright/test'
import { PlaywrightMCP } from '@playwright/mcp'
import OpenAI from 'openai'
const browser = await chromium.launch({ headless: true })
const page = await browser.newPage()
const mcp = new PlaywrightMCP(page)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
// Navigate to the app
await page.goto(process.env.BASE_URL)
// Give the AI agent a test objective
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  tools: mcp.tools,
  messages: [{
    role: 'user',
    content: `
      Test the checkout flow on ${process.env.BASE_URL}.
      Specifically verify:
      1. Empty cart prevents checkout
      2. Adding an item allows checkout
      3. Valid payment method completes the order
      Report any issues found as JSON.
    `,
  }],
  tool_choice: 'auto',
})
// Agent executes actions via MCP tools
const report = await mcp.executeToolCalls(result.choices[0].message.tool_calls)
console.log(JSON.stringify(report, null, 2))
await browser.close()

Autonomous regression testing
Use AI agents for open-ended regression discovery — exploring areas of the application for unexpected behaviour:
// Agent-based exploratory regression
const explorationTargets = [
  'Checkout flow — payment edge cases',
  'User profile — edge cases around name and email updates',
  'Search — boundary conditions on filters',
]
const findings = []
for (const target of explorationTargets) {
  await page.goto(process.env.BASE_URL)
  const result = await openai.chat.completions.create({
    model: 'gpt-4o',
    tools: mcp.tools,
    messages: [{
      role: 'user',
      content: `
        Explore "${target}" on this application.
        Look for: error messages that don't make sense, UI that breaks,
        actions that have no effect, unexpected page state.
        Report findings as a JSON array with: description, severity, steps_to_reproduce.
      `,
    }],
  })
  const sessionFindings = JSON.parse(/* extract from result */)
  findings.push(...sessionFindings)
}
// Write findings to Azure DevOps as work items
for (const finding of findings.filter(f => f.severity !== 'cosmetic')) {
  await createAzureDevOpsBug(finding)
}

Integrating agent results into Azure DevOps
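The createAzureDevOpsBug helper used above is left undefined. A minimal sketch against the Azure DevOps work item REST API is below, assuming AZDO_ORG_URL (e.g. https://dev.azure.com/myorg), AZDO_PROJECT, and AZDO_PAT environment variables; createAzureDevOpsBug and bugPayload are illustrative names, not part of any Playwright or Azure SDK:

```javascript
// Azure DevOps creates work items from JSON Patch documents.
// This helper builds the patch for one agent finding.
function bugPayload(finding) {
  return [
    { op: 'add', path: '/fields/System.Title', value: `[Agent] ${finding.description}` },
    { op: 'add', path: '/fields/Microsoft.VSTS.TCM.ReproSteps', value: finding.steps_to_reproduce },
    { op: 'add', path: '/fields/System.Tags', value: `agent-found; ${finding.severity}` },
  ]
}

async function createAzureDevOpsBug(finding) {
  const url = `${process.env.AZDO_ORG_URL}/${process.env.AZDO_PROJECT}` +
    `/_apis/wit/workitems/$Bug?api-version=7.1`
  const res = await fetch(url, {
    method: 'POST',
    headers: {
      // A PAT is sent as the password half of a Basic credential
      Authorization: `Basic ${Buffer.from(`:${process.env.AZDO_PAT}`).toString('base64')}`,
      'Content-Type': 'application/json-patch+json',
    },
    body: JSON.stringify(bugPayload(finding)),
  })
  if (!res.ok) throw new Error(`Work item creation failed: ${res.status}`)
  return res.json()
}
```

The PAT needs "Work Items (Read & Write)" scope; in a pipeline, pass it as a secret variable rather than hard-coding it.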
Agent test results need to be structured to appear in the Tests tab:
// Convert agent findings to JUnit XML
import { Builder } from 'xml2js'
function agentFindingsToJUnit(findings: Finding[]): string {
  const failures = findings.filter(f => f.type === 'failure')
  const passes = findings.filter(f => f.type === 'pass')
  return new Builder().buildObject({
    testsuites: {
      testsuite: {
        $: { name: 'Agent Tests', tests: findings.length, failures: failures.length },
        testcase: findings.map(f => ({
          $: { name: f.description, classname: 'Agent' },
          ...(f.type === 'failure' ? {
            failure: { $: { message: f.description }, _: f.steps_to_reproduce }
          } : {})
        }))
      }
    }
  })
}

- task: PublishTestResults@2
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: agent-results/results.xml
    testRunTitle: Agent Testing — $(Build.BuildNumber)
  condition: always()

Current limitations and best practices
Limitations:
- Agent tests are non-deterministic — different runs may explore different paths
- AI API costs scale with usage — budget for pipeline runs
- Agents can get stuck in loops or take unexpected paths
- Not suitable for regression assertions requiring exact values
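One way to soften the non-determinism is to repeat each agent session and keep only findings that recur across runs. A hedged sketch, where consistentFindings is an illustrative helper that deduplicates by description:

```javascript
// Keep only findings reported in at least minRuns of the repeated sessions.
// `runs` is an array of finding arrays, one per session.
function consistentFindings(runs, minRuns = 2) {
  const counts = new Map()
  for (const findings of runs) {
    // Count each distinct finding at most once per run
    for (const desc of new Set(findings.map(f => f.description))) {
      counts.set(desc, (counts.get(desc) ?? 0) + 1)
    }
  }
  const keep = new Set([...counts].filter(([, n]) => n >= minRuns).map(([d]) => d))
  // Return one representative finding per kept description
  const seen = new Set()
  return runs.flat().filter(f =>
    keep.has(f.description) && !seen.has(f.description) && seen.add(f.description))
}
```

This trades API cost (N sessions instead of one) for fewer one-off false alarms.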
Best practices:
- Use agents for exploratory discovery, not for regression gating
- Set a time limit for agent sessions (e.g. timeoutInMinutes: 5 on the pipeline job)
- Always have a human review agent-found bugs before creating work items
- Run agent tests on a schedule (nightly), not on every PR
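Besides the pipeline-level timeout, the time limit can be enforced in code. A minimal sketch using Promise.race, where withTimeLimit is an illustrative helper and runSession stands in for the agent loop:

```javascript
// Race the agent session against a wall-clock timer.
async function withTimeLimit(runSession, ms) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Agent session exceeded ${ms}ms`)), ms)
  })
  try {
    return await Promise.race([runSession(), timeout])
  } finally {
    clearTimeout(timer) // don't keep the event loop alive after the session ends
  }
}
```

Failing fast in code also gives you a chance to flush partial findings before the pipeline step is killed.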
Common errors and fixes
Error: Agent gets stuck clicking the same element repeatedly
Fix: Add a maxSteps limit and an instruction: "If you've taken more than 20 actions without making progress, stop and report what you've found so far."
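The maxSteps guard can be sketched as a wrapper around the model-call loop. Here runAgentSession and runAgentStep are illustrative names, with runAgentStep standing in for one model call plus MCP tool execution:

```javascript
// Cap the number of agent actions and detect repeated-action loops.
async function runAgentSession(runAgentStep, { maxSteps = 20 } = {}) {
  const actions = []
  for (let step = 0; step < maxSteps; step++) {
    const action = await runAgentStep(actions)
    if (action.done) return { status: 'completed', actions }
    // Bail out if the agent proposes the same action three times in a row
    const last = actions.slice(-2)
    if (last.length === 2 && last.every(a => a.name === action.name && a.target === action.target)) {
      return { status: 'stuck', actions }
    }
    actions.push(action)
  }
  return { status: 'step-limit', actions }
}
```

Whatever the exit reason, the partial action log is returned so it can go into the report.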
Error: OpenAI rate limits exceeded during parallel agent sessions
Fix: Run agent sessions sequentially or use exponential backoff. Each agent session makes dozens of API calls; parallel sessions quickly hit rate limits.
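The backoff can be a small wrapper around each session. A sketch under the assumption that the thrown error carries an HTTP status (as the OpenAI Node SDK's errors do); withBackoff and backoffDelay are illustrative helpers:

```javascript
// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(baseMs * 2 ** attempt, maxMs)
}

// Retry fn on 429 responses, rethrowing anything else immediately.
async function withBackoff(fn, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      const rateLimited = err?.status === 429
      if (!rateLimited || attempt === maxAttempts - 1) throw err
      await new Promise(r => setTimeout(r, backoffDelay(attempt)))
    }
  }
}
```

Wrapping each session (`await withBackoff(() => runSession(target))`) inside a sequential for-of loop keeps total request rate bounded.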
Error: Agent findings include false positives (reporting expected behaviour as bugs)
Fix: Provide context in the prompt: "The following behaviours are expected: [list]". Also run agents against a known-good environment first to calibrate what "normal" looks like.
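That context can be injected systematically rather than hand-edited per run. A small sketch of a prompt builder; buildExplorationPrompt and the expectedBehaviours list are assumptions, not part of any SDK:

```javascript
// Build an exploration prompt that pre-loads known-expected behaviours
// so the agent does not report them as bugs.
function buildExplorationPrompt(target, expectedBehaviours = []) {
  const expected = expectedBehaviours.length
    ? '\nThe following behaviours are expected — do NOT report them as bugs:\n' +
      expectedBehaviours.map(b => `- ${b}`).join('\n')
    : ''
  return `Explore "${target}" on this application.
Look for: error messages that don't make sense, UI that breaks,
actions that have no effect, unexpected page state.${expected}
Report findings as a JSON array with: description, severity, steps_to_reproduce.`
}
```

Keeping the expected-behaviour list in version control alongside the tests makes the calibration reviewable like any other fixture.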