AI Transformation · 5 min read

Using Playwright Agents with Azure DevOps

How to use Playwright agentic testing with Azure DevOps — AI-driven test execution, autonomous browser agents, and CI/CD pipeline integration.

InnovateBits

Playwright's AI agent capabilities — particularly through MCP (Model Context Protocol) integration — enable a new class of autonomous testing: tests that can explore an application, identify issues, and generate reports without explicit scripted steps. This article covers how to integrate these capabilities into Azure DevOps pipelines.


What Playwright agents can do

Traditional Playwright tests are deterministic: you write explicit steps, the test follows them exactly. Playwright agents are goal-driven: you give them an objective, and they determine the steps.

TYPESCRIPT
// Traditional Playwright (scripted)
await page.goto('/checkout')
await page.fill('[name="email"]', 'user@test.com')
await page.click('[data-testid="submit"]')
await expect(page.locator('.confirmation')).toBeVisible()

// Agentic Playwright (goal-based — experimental)
await agent.accomplish('Complete a checkout with email user@test.com and verify confirmation')

The agent uses a language model to interpret the current page state, decide the next action, and adapt when the UI behaves unexpectedly.
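That interpret-decide-adapt cycle can be sketched as a plain loop. Everything here is illustrative rather than part of the Playwright API: the `Action` type and the `observe`/`decide`/`act` callbacks stand in for a page snapshot, a model call, and a browser action, and the step cap keeps a confused agent from looping forever.

```typescript
// Minimal sketch of an agent loop: observe the page, ask the model for the
// next action, apply it, repeat until the goal is met or a step cap is hit.
// The Action type and the callbacks are illustrative assumptions.
type Action = { type: 'click' | 'fill' | 'done'; target?: string; value?: string }

interface AgentDeps {
  observe: () => Promise<string>                            // snapshot of current page state
  decide: (goal: string, state: string) => Promise<Action>  // language-model call
  act: (action: Action) => Promise<void>                    // execute via the browser
}

async function runAgent(
  goal: string,
  deps: AgentDeps,
  maxSteps = 20,
): Promise<{ done: boolean; steps: number }> {
  for (let step = 1; step <= maxSteps; step++) {
    const state = await deps.observe()
    const action = await deps.decide(goal, state)
    if (action.type === 'done') return { done: true, steps: step }
    await deps.act(action)
  }
  // Step cap reached without the model declaring the goal complete
  return { done: false, steps: maxSteps }
}
```

Because the dependencies are injected, the loop itself can be exercised with stubs, without a browser or a model in the picture.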


Playwright MCP integration

Playwright's MCP server exposes browser control to AI models. Set up in Azure DevOps:

YAML
steps:
  - script: npm ci
  - script: npx playwright install chromium
  - script: |
      # Install Playwright MCP
      npm install @playwright/mcp
    displayName: Install Playwright MCP

  - script: |
      node scripts/agent-test.js
    displayName: Run agent tests
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      BASE_URL: $(STAGING_URL)
JAVASCRIPT
// scripts/agent-test.js
import { chromium } from '@playwright/test'
import { PlaywrightMCP } from '@playwright/mcp'
import OpenAI from 'openai'

const browser = await chromium.launch({ headless: true })
const page = await browser.newPage()
const mcp = new PlaywrightMCP(page)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

// Navigate to the app under test
await page.goto(process.env.BASE_URL)

// Give the AI agent a test objective
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  tools: mcp.tools,
  messages: [{
    role: 'user',
    content: `
      Test the checkout flow on ${process.env.BASE_URL}.
      Specifically verify:
      1. Empty cart prevents checkout
      2. Adding an item allows checkout
      3. Valid payment method completes the order
      Report any issues found as JSON.
    `,
  }],
  tool_choice: 'auto',
})

// Agent executes actions via MCP tools
const report = await mcp.executeToolCalls(result.choices[0].message.tool_calls)
console.log(JSON.stringify(report, null, 2))

await browser.close()

Autonomous regression testing

Use AI agents for open-ended regression discovery — exploring areas of the application for unexpected behaviour:

TYPESCRIPT
// Agent-based exploratory regression
const explorationTargets = [
  'Checkout flow — payment edge cases',
  'User profile — edge cases around name and email updates',
  'Search — boundary conditions on filters',
]

const findings = []

for (const target of explorationTargets) {
  await page.goto(process.env.BASE_URL)

  const result = await openai.chat.completions.create({
    model: 'gpt-4o',
    tools: mcp.tools,
    messages: [{
      role: 'user',
      content: `
        Explore "${target}" on this application.
        Look for: error messages that don't make sense, UI that breaks,
        actions that have no effect, unexpected page state.
        Report findings as a JSON array with: description, severity, steps_to_reproduce.
      `,
    }],
  })

  const sessionFindings = JSON.parse(/* extract from result */)
  findings.push(...sessionFindings)
}

// Write findings to Azure DevOps as work items
for (const finding of findings.filter(f => f.severity !== 'cosmetic')) {
  await createAzureDevOpsBug(finding)
}
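The `createAzureDevOpsBug` helper above is left undefined; one way to sketch it is against the Azure DevOps work item REST API, which accepts a JSON Patch body at the `$Bug` endpoint. The `Finding` shape and the `AZDO_*` environment variables are assumptions for this sketch, not part of the article's pipeline.

```typescript
// Sketch of createAzureDevOpsBug: files a Bug work item through the
// Azure DevOps REST API. The Finding shape and env vars are assumptions.
interface Finding {
  description: string
  severity: string
  steps_to_reproduce: string
}

// Build the JSON Patch payload the work item API expects (kept pure so it
// can be tested without touching the network)
function buildBugPatch(finding: Finding) {
  return [
    { op: 'add', path: '/fields/System.Title', value: `[Agent] ${finding.description}` },
    { op: 'add', path: '/fields/Microsoft.VSTS.TCM.ReproSteps', value: finding.steps_to_reproduce },
    { op: 'add', path: '/fields/System.Tags', value: `agent-found; severity-${finding.severity}` },
  ]
}

async function createAzureDevOpsBug(finding: Finding): Promise<void> {
  const org = process.env.AZDO_ORG
  const project = process.env.AZDO_PROJECT
  const pat = process.env.AZDO_PAT
  const res = await fetch(
    `https://dev.azure.com/${org}/${project}/_apis/wit/workitems/$Bug?api-version=7.1`,
    {
      method: 'POST',
      headers: {
        // Work item creation requires the JSON Patch content type
        'Content-Type': 'application/json-patch+json',
        Authorization: `Basic ${Buffer.from(`:${pat}`).toString('base64')}`,
      },
      body: JSON.stringify(buildBugPatch(finding)),
    },
  )
  if (!res.ok) throw new Error(`Work item creation failed: ${res.status}`)
}
```

Keeping the payload builder separate from the HTTP call also makes it easy to log or review the exact fields before anything lands in the backlog.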

Integrating agent results into Azure DevOps

To appear in the Azure DevOps Tests tab, agent results must be converted into a format the PublishTestResults task understands, such as JUnit XML:

TYPESCRIPT
// Convert agent findings to JUnit XML
import { Builder } from 'xml2js'

function agentFindingsToJUnit(findings: Finding[]): string {
  const failures = findings.filter(f => f.type === 'failure')

  return new Builder().buildObject({
    testsuites: {
      testsuite: {
        $: { name: 'Agent Tests', tests: findings.length, failures: failures.length },
        testcase: findings.map(f => ({
          $: { name: f.description, classname: 'Agent' },
          ...(f.type === 'failure' ? {
            failure: { $: { message: f.description }, _: f.steps_to_reproduce },
          } : {}),
        })),
      },
    },
  })
}
YAML
- task: PublishTestResults@2
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: agent-results/results.xml
    testRunTitle: Agent Testing — $(Build.BuildNumber)
  condition: always()

Current limitations and best practices

Limitations:

  • Agent tests are non-deterministic — different runs may explore different paths
  • AI API costs scale with usage — budget for pipeline runs
  • Agents can get stuck in loops or take unexpected paths
  • Not suitable for regression assertions requiring exact values

Best practices:

  • Use agents for exploratory discovery, not for regression gating
  • Set a time limit for agent sessions (for example, a 5-minute timeout)
  • Always have a human review agent-found bugs before creating work items
  • Run agent tests on a schedule (nightly), not on every PR
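The nightly-schedule recommendation maps directly onto an Azure Pipelines cron trigger; the branch name here is an assumption, and `always: true` keeps the run firing even when there are no new commits.

```yaml
# Run agent tests nightly on main, even without new commits
schedules:
  - cron: '0 2 * * *'        # 02:00 UTC daily
    displayName: Nightly agent test run
    branches:
      include:
        - main
    always: true
```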

Common errors and fixes

Error: Agent gets stuck clicking the same element repeatedly
Fix: Add a maxSteps limit and an instruction: "If you've taken more than 20 actions without making progress, stop and report what you've found so far."

Error: OpenAI rate limits exceeded during parallel agent sessions
Fix: Run agent sessions sequentially or use exponential backoff. Each agent session makes dozens of API calls; parallel sessions quickly hit rate limits.
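A small retry wrapper along these lines can absorb transient 429s between sequential sessions; the retry count and delays are illustrative, and `runAgentSession` in the usage comment is a hypothetical name for one exploration pass.

```typescript
// Retry an async call with exponential backoff, for transient rate-limit
// errors. The delay doubles on each attempt; parameters are illustrative.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      // Wait baseDelayMs * 2^attempt before the next try
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
    }
  }
  throw lastError
}

// Usage sketch: run sessions one at a time, each wrapped in backoff
// for (const target of explorationTargets) {
//   await withBackoff(() => runAgentSession(target))
// }
```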

Error: Agent findings include false positives (reporting expected behaviour as bugs)
Fix: Provide context in the prompt: "The following behaviours are expected: [list]". Also run agents against a known-good environment first to calibrate what "normal" looks like.

Tags
#playwright #agentic-ai #azure-devops #ai-testing #playwright-mcp #autonomous-testing
