
Using Playwright Agents with Azure DevOps

How to use Playwright's agent and agentic testing capabilities with Azure DevOps pipelines. Covers AI-driven test execution, autonomous test generation, agent-based regression, and integrating Playwright MCP with Azure CI/CD.

InnovateBits · 5 min read

Playwright's AI agent capabilities — particularly through MCP (Model Context Protocol) integration — enable a new class of autonomous testing: tests that can explore an application, identify issues, and generate reports without explicit scripted steps. This article covers how to integrate these capabilities into Azure DevOps pipelines.


What Playwright agents can do

Traditional Playwright tests are deterministic: you write explicit steps, the test follows them exactly. Playwright agents are goal-driven: you give them an objective, and they determine the steps.

// Traditional Playwright (scripted)
await page.goto('/checkout')
await page.fill('[name="email"]', 'user@test.com')
await page.click('[data-testid="submit"]')
await expect(page.locator('.confirmation')).toBeVisible()
 
// Agentic Playwright (goal-based — experimental)
await agent.accomplish('Complete a checkout with email user@test.com and verify confirmation')

The agent uses a language model to interpret the current page state, decide the next action, and adapt when the UI behaves unexpectedly.
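Conceptually, the agent runs an observe/decide/act loop until the goal is met or a step budget runs out. A minimal sketch of that loop in plain JavaScript (the `model` and `tools` interfaces here are hypothetical stand-ins, not real Playwright or OpenAI APIs):

```javascript
// Minimal observe/decide/act loop (hypothetical model/tools interfaces)
async function runAgent(goal, model, tools, maxSteps = 20) {
  const history = []
  for (let step = 0; step < maxSteps; step++) {
    const state = await tools.observe()           // e.g. an accessibility snapshot
    const action = await model.decide({ goal, state, history })
    if (action.type === 'done') return { status: 'done', history }
    await tools.execute(action)                   // click, fill, navigate, …
    history.push(action)
  }
  return { status: 'max-steps-reached', history } // guard against infinite loops
}
```

The `maxSteps` cap matters in practice: it is the main defence against an agent that loops on the same element (see "Common errors and fixes" below).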


Playwright MCP integration

Playwright's MCP server exposes browser control to AI models. Set up in Azure DevOps:

steps:
  - script: npm ci
  - script: npx playwright install chromium
  - script: |
      # Install Playwright MCP
      npm install @playwright/mcp
    displayName: Install Playwright MCP
 
  - script: |
      node scripts/agent-test.js
    displayName: Run agent tests
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      BASE_URL: $(STAGING_URL)
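Note that Azure DevOps does not expose secret variables to scripts automatically; the explicit `env:` mapping above is required. If the key lives in a variable group, reference the group at pipeline level (the group name `ai-testing-secrets` here is an assumption):

```yaml
variables:
  - group: ai-testing-secrets  # contains OPENAI_API_KEY, marked as secret
```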
The agent script itself:

// scripts/agent-test.js
import { chromium } from '@playwright/test'
import { PlaywrightMCP } from '@playwright/mcp'
import OpenAI from 'openai'
 
const browser = await chromium.launch({ headless: true })
const page = await browser.newPage()
const mcp = new PlaywrightMCP(page)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
 
// Navigate to the app
await page.goto(process.env.BASE_URL)
 
// Give the AI agent a test objective
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  tools: mcp.tools,
  messages: [{
    role: 'user',
    content: `
      Test the checkout flow on ${process.env.BASE_URL}.
      Specifically verify:
      1. Empty cart prevents checkout
      2. Adding an item allows checkout
      3. Valid payment method completes the order
      Report any issues found as JSON.
    `
  }],
  tool_choice: 'auto',
})
 
// Agent executes actions via MCP tools
const report = await mcp.executeToolCalls(result.choices[0].message.tool_calls)
console.log(JSON.stringify(report, null, 2))
 
await browser.close()

Autonomous regression testing

Use AI agents for open-ended regression discovery — exploring areas of the application for unexpected behaviour:

// Agent-based exploratory regression
const explorationTargets = [
  'Checkout flow — payment edge cases',
  'User profile — edge cases around name and email updates',
  'Search — boundary conditions on filters',
]
 
const findings = []
 
for (const target of explorationTargets) {
  await page.goto(process.env.BASE_URL)
  
  const result = await openai.chat.completions.create({
    model: 'gpt-4o',
    tools: mcp.tools,
    messages: [{
      role: 'user',
      content: `
        Explore "${target}" on this application.
        Look for: error messages that don't make sense, UI that breaks,
        actions that have no effect, unexpected page state.
        Report findings as a JSON array with: description, severity, steps_to_reproduce.
      `
    }],
  })
 
  const sessionFindings = JSON.parse(/* extract from result */)
  findings.push(...sessionFindings)
}
 
// Write findings to Azure DevOps as work items
for (const finding of findings.filter(f => f.severity !== 'cosmetic')) {
  await createAzureDevOpsBug(finding)
}
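The `createAzureDevOpsBug` helper is referenced above but not defined. One possible sketch against the Azure DevOps work-item REST API (the `ORG`/`PROJECT` constants and the `SYSTEM_ACCESSTOKEN` mapping are assumptions, and the severity value may need mapping to Azure DevOps' "1 - Critical" … "4 - Low" scale):

```javascript
// Sketch: create a Bug work item via the Azure DevOps REST API.
// Assumes the pipeline maps System.AccessToken into the environment,
// e.g.  env: { SYSTEM_ACCESSTOKEN: $(System.AccessToken) }
const ORG = 'my-org'          // hypothetical organisation name
const PROJECT = 'my-project'  // hypothetical project name

function buildBugPatch(finding) {
  // Work items are created with a JSON Patch document
  return [
    { op: 'add', path: '/fields/System.Title', value: `[Agent] ${finding.description}` },
    { op: 'add', path: '/fields/Microsoft.VSTS.TCM.ReproSteps', value: finding.steps_to_reproduce },
    { op: 'add', path: '/fields/Microsoft.VSTS.Common.Severity', value: finding.severity },
  ]
}

async function createAzureDevOpsBug(finding) {
  const url = `https://dev.azure.com/${ORG}/${PROJECT}/_apis/wit/workitems/$Bug?api-version=7.1`
  const res = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json-patch+json', // required for work-item creation
      Authorization: `Bearer ${process.env.SYSTEM_ACCESSTOKEN}`,
    },
    body: JSON.stringify(buildBugPatch(finding)),
  })
  if (!res.ok) throw new Error(`Work item creation failed: ${res.status}`)
  return res.json()
}
```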

Integrating agent results into Azure DevOps

Agent test results need to be structured to appear in the Tests tab:

// Convert agent findings to JUnit XML
import { Builder } from 'xml2js'
 
function agentFindingsToJUnit(findings: Finding[]): string {
  const failures = findings.filter(f => f.type === 'failure')
  const passes   = findings.filter(f => f.type === 'pass')
 
  return new Builder().buildObject({
    testsuites: {
      testsuite: {
        $: { name: 'Agent Tests', tests: findings.length, failures: failures.length },
        testcase: findings.map(f => ({
          $: { name: f.description, classname: 'Agent' },
          ...(f.type === 'failure' ? {
            failure: { $: { message: f.description }, _: f.steps_to_reproduce }
          } : {})
        }))
      }
    }
  })
}

Then publish the generated XML in the pipeline:

- task: PublishTestResults@2
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: agent-results/results.xml
    testRunTitle: Agent Testing — $(Build.BuildNumber)
  condition: always()

Current limitations and best practices

Limitations:

  • Agent tests are non-deterministic — different runs may explore different paths
  • AI API costs scale with usage — budget for pipeline runs
  • Agents can get stuck in loops or take unexpected paths
  • Not suitable for regression assertions requiring exact values

Best practices:

  • Use agents for exploratory discovery, not for regression gating
  • Set a hard time limit on agent sessions (around five minutes is a reasonable ceiling)
  • Always have a human review agent-found bugs before creating work items
  • Run agent tests on a schedule (nightly), not on every PR
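For the time limit, Azure Pipelines supports `timeoutInMinutes` directly on a step; a sketch matching the earlier agent-test step:

```yaml
- script: node scripts/agent-test.js
  displayName: Run agent tests
  timeoutInMinutes: 5  # hard stop for runaway agent sessions
```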

Common errors and fixes

Error: Agent gets stuck clicking the same element repeatedly
Fix: Add a maxSteps limit and an instruction: "If you've taken more than 20 actions without making progress, stop and report what you've found so far."

Error: OpenAI rate limits exceeded during parallel agent sessions
Fix: Run agent sessions sequentially or use exponential backoff. Each agent session makes dozens of API calls; parallel sessions quickly hit rate limits.

Error: Agent findings include false positives (reporting expected behaviour as bugs)
Fix: Provide context in the prompt: "The following behaviours are expected: [list]". Also run agents against a known-good environment first to calibrate what "normal" looks like.
