
AI in Testing with Azure DevOps: 2025–2026 Guide

How to use AI-powered testing tools with Azure DevOps in 2025–2026. Covers GitHub Copilot for test generation, AI-assisted defect triage, intelligent test selection, and integrating AI test tools into Azure Pipelines.

InnovateBits · 5 min read

AI is changing what's possible in test automation. Tasks that previously took hours — writing test cases from requirements, triaging failures, selecting which tests to run — are becoming automated. Azure DevOps integrates with several AI tools that make these capabilities available directly in your existing pipeline.


AI-assisted test case generation with GitHub Copilot

GitHub Copilot (available in VS Code, JetBrains, and Copilot Workspace) generates test cases from code context. Combined with Azure DevOps, the workflow is:

  1. Developer opens a PR in Azure Repos
  2. QA engineer reviews the code diff in VS Code with Copilot
  3. Copilot suggests test cases based on the changed code

Prompt pattern for Playwright test generation:

// Given this function:
async function applyDiscount(code: string, cartTotal: number): Promise<number> {
  const discount = await discountService.validate(code)
  if (!discount.valid) throw new Error(discount.reason)
  return cartTotal * (1 - discount.percentage / 100)
}

// Generate Playwright tests covering:
// - Valid code, correct calculation
// - Invalid code throws correct error
// - Expired code error
// - Boundary: 0% discount
// - Boundary: 100% discount

Copilot generates structured test code that a QA engineer reviews and refines — cutting initial test writing time by 60–70%.


Integrating AI test generation into the pipeline

Use OpenAI or Azure OpenAI to generate test stubs when new code is merged:

- stage: AITestGeneration
  displayName: AI Test Suggestions
  condition: eq(variables['Build.Reason'], 'PullRequest')
  jobs:
    - job: GenerateTests
      steps:
        - checkout: self
          fetchDepth: 0  # full history so origin/main is available for the diff
        - script: npm ci
        - script: |
            node scripts/generate-test-suggestions.js \
              --diff "$(git diff origin/main...HEAD)" \
              --output suggestions/new-tests.md
          displayName: Generate AI test suggestions
          env:
            OPENAI_API_KEY: $(OPENAI_API_KEY)
 
        # There is no built-in CreatePRComment task, so post the comment via the
        # Azure DevOps REST API. The build service identity needs the
        # "Contribute to pull requests" permission on the repo.
        - script: |
            BODY=$(jq -Rs '{comments: [{parentCommentId: 0, content: ., commentType: 1}], status: 1}' suggestions/new-tests.md)
            curl -sf -X POST \
              -H "Authorization: Bearer $(System.AccessToken)" \
              -H "Content-Type: application/json" \
              -d "$BODY" \
              "$(System.CollectionUri)$(System.TeamProject)/_apis/git/repositories/$(Build.Repository.ID)/pullRequests/$(System.PullRequest.PullRequestId)/threads?api-version=7.1"
          displayName: Post test suggestions to PR

The AI-generated suggestions appear as a PR comment — the QA engineer reviews and implements the ones that add value.
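
The generate-test-suggestions.js script itself isn't shown in the article; a minimal sketch might look like this. The prompt wording and model name are illustrative, and the built-in fetch requires Node 18+.

```javascript
// Build the prompt sent to the model from the raw git diff.
function buildPrompt(diff) {
  return [
    'You are a QA engineer. For the code diff below, suggest test cases as a',
    'markdown checklist: happy paths, error paths, and boundary values.',
    '',
    'Diff:',
    diff,
  ].join('\n')
}

// Call the OpenAI chat completions endpoint and return the suggestions.
async function suggestTests(diff, apiKey) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // illustrative; any chat model works
      messages: [{ role: 'user', content: buildPrompt(diff) }],
    }),
  })
  const data = await res.json()
  return data.choices[0].message.content
}
```

Argument parsing and writing the returned markdown to the --output path are omitted here.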


AI-powered failure triage

When the pipeline fails, an AI triage step categorises the failure before anyone investigates manually:

- job: AITriage
  dependsOn: E2ETests
  condition: failed()
  steps:
    - script: |
        node scripts/triage-failures.js \
          --results test-results/results.xml \
          --output triage/triage-report.md
      displayName: AI failure triage
      env:
        OPENAI_API_KEY: $(OPENAI_API_KEY)
 
    # The built-in tasks include no work-item step; use the Azure CLI (or the
    # marketplace "Create Work Item" task) instead.
    - script: |
        az extension add --name azure-devops --only-show-errors
        az boards work-item create \
          --org "$(System.CollectionUri)" \
          --project "$(System.TeamProject)" \
          --type Task \
          --title "[AI Triage] Pipeline failure — $(Build.BuildNumber)" \
          --description "$(cat triage/triage-report.md)" \
          --assigned-to "$(QA_LEAD_EMAIL)"
      displayName: Create triage work item
      env:
        AZURE_DEVOPS_EXT_PAT: $(System.AccessToken)

The triage script reads the JUnit XML, extracts error messages, and asks the AI: "Is this a product bug, environment issue, test flakiness, or test code bug? Suggest the next investigation step."
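
A minimal sketch of that triage script (the article doesn't include it; the regex-based JUnit parsing and prompt wording below are illustrative, and a real implementation would use a proper XML parser):

```javascript
// Pull out each <testcase> that contains a <failure> or <error> element.
function extractFailures(junitXml) {
  const failures = []
  const caseRe = /<testcase\b([^>]*)>([\s\S]*?)<\/testcase>/g
  let m
  while ((m = caseRe.exec(junitXml)) !== null) {
    const [, attrs, body] = m
    const fail = /<(?:failure|error)\b[^>]*message="([^"]*)"/.exec(body)
    if (!fail) continue // testcase passed
    const name = /name="([^"]*)"/.exec(attrs)
    failures.push({ test: name ? name[1] : 'unknown', error: fail[1] })
  }
  return failures
}

// Assemble the question the AI is asked about each failure.
function buildTriagePrompt(failures) {
  const lines = failures.map((f) => `- ${f.test}: ${f.error}`)
  return [
    'For each failing test below, decide: product bug, environment issue,',
    'test flakiness, or test code bug. Suggest the next investigation step.',
    ...lines,
  ].join('\n')
}
```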

Example triage output:

Test: checkout > payment > credit card validation
Error: Expected 422, received 500
Triage: Likely a product bug. The server returned a 500 (unhandled error) 
        when given an invalid card number. Expected behaviour is a 422 
        validation error. Recommend: check payment service error handling 
        for invalid card formats.
Suggested action: Assign to payments team, investigate card validation logic.

Intelligent test selection

Run only tests likely to fail based on which code changed:

- script: |
    node scripts/select-tests.js \
      --changed-files "$(git diff --name-only origin/main...HEAD)" \
      --output selected-tests.txt
  displayName: AI test selection
 
- script: |
    TESTS=$(cat selected-tests.txt)
    npx playwright test $TESTS
  displayName: Run selected tests

The selection script uses:

  1. File-to-test mapping: which test files cover which source files (via import analysis)
  2. Historical failure data: tests that have failed when similar files changed previously
  3. Risk weighting: critical-tagged tests always run regardless

This can reduce PR pipeline time from 15 minutes to 3–5 minutes by running only the 20–30 tests most likely to catch regressions.
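
A sketch of how select-tests.js might combine those three inputs (hypothetical data shapes; the article doesn't show the script):

```javascript
// changedFiles:       ['src/cart.ts', ...]
// fileToTests:        { 'src/cart.ts': ['tests/cart.spec.ts'], ... }  (import analysis)
// historicalFailures: { 'src/cart.ts': ['tests/checkout.spec.ts'], ... }
// criticalTests:      contents of an always-run list
function selectTests(changedFiles, fileToTests, historicalFailures, criticalTests) {
  const selected = new Set(criticalTests) // risk weighting: critical tests always run
  for (const file of changedFiles) {
    for (const t of fileToTests[file] ?? []) selected.add(t)        // direct coverage
    for (const t of historicalFailures[file] ?? []) selected.add(t) // co-failure history
  }
  return [...selected].sort()
}
```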


Azure AI services for test validation

Use Azure AI Content Safety (part of Azure AI services, formerly Cognitive Services) to validate AI-generated content:

// Testing that AI-generated product descriptions are appropriate.
// The JavaScript SDK is the REST-style client @azure-rest/ai-content-safety.
import ContentSafetyClient, { isUnexpected } from '@azure-rest/ai-content-safety'
import { AzureKeyCredential } from '@azure/core-auth'
import { test, expect } from '@playwright/test'
 
test('AI product descriptions pass content safety check', async ({ request }) => {
  const client = ContentSafetyClient(
    process.env.CONTENT_SAFETY_ENDPOINT!,
    new AzureKeyCredential(process.env.CONTENT_SAFETY_KEY!)
  )
 
  const response = await request.get('/api/products/ai-descriptions')
  const products = await response.json()
 
  for (const product of products) {
    const result = await client.path('/text:analyze').post({
      body: { text: product.description },
    })
    if (isUnexpected(result)) throw new Error('Content safety call failed')
    // Expect zero severity in every category (hate, violence, sexual, self-harm)
    for (const category of result.body.categoriesAnalysis) {
      expect(category.severity).toBe(0)
    }
  }
})

Self-healing selectors (emerging capability)

Some AI testing tools (Healenium, Testim, Mabl) automatically update broken CSS selectors when UI changes. Integration with Azure DevOps:

- script: |
    # Healenium proxy for self-healing Selenium tests.
    # Simplified: a full Healenium setup also runs hlm-backend and a Postgres
    # instance; see the Healenium docs for the complete compose file.
    docker run -d \
      -p 8085:8085 \
      -e spring.datasource.url=jdbc:postgresql://$(DB_HOST):5432/healenium \
      healenium/hlm-proxy:latest
  displayName: Start Healenium proxy
 
- script: mvn test -Dselenide.remote=http://localhost:8085/wd/hub
  displayName: Run tests with self-healing

When a selector breaks, Healenium finds the element using visual similarity and updates the locator — preventing test failures from cosmetic UI changes.


Common errors and fixes

Error: OpenAI API rate limits hit during pipeline test generation
Fix: Cache AI responses for identical inputs. The diff for a small PR often generates the same test suggestions — no need to call the API twice.

Error: AI triage misclassifies obvious environment failures as product bugs
Fix: Add a pre-check step: if the error message contains "ECONNREFUSED" or "timeout", classify as environment failure immediately without calling the AI.
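
That pre-check can be a few lines at the top of the triage script (the pattern list is illustrative):

```javascript
// Obvious infrastructure signatures that never need an AI verdict.
const ENV_PATTERNS = [/ECONNREFUSED/, /ETIMEDOUT/, /ENOTFOUND/, /timeout/i]

// Returns 'environment' for a match, or null to fall through to AI triage.
function preClassify(errorMessage) {
  return ENV_PATTERNS.some((p) => p.test(errorMessage)) ? 'environment' : null
}
```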

Error: Intelligent test selection skips critical tests that should always run
Fix: Maintain an explicit always-run.txt list of critical test IDs. The selection script always includes these regardless of what changed.
