Shift-Right Testing: Quality in Production
Shift-right testing goes beyond shift-left to embed continuous quality validation in production. Learn canary releases, synthetic monitoring, chaos engineering, and observability-driven QA.
For the past decade, "shift-left" has been the dominant quality engineering mantra — test earlier, test closer to the requirements, find defects before they're expensive. The advice is sound and the results are real. But shift-left alone has a blind spot: staging environments are not production.
No matter how good your pre-deployment testing is, production has data, traffic patterns, infrastructure configurations, third-party integrations, and user behaviour that staging never fully replicates. The World Quality Report 2025–26 found that 38% of organisations are now running shift-right pilots — validating quality continuously in production, not just before deployment.
This guide covers the strategies, tools, and practices that make up a mature shift-right quality approach.
What Shift-Right Testing Is
Shift-right testing is the practice of continuing quality validation after deployment, using production systems and real user traffic as the test environment. It is not a replacement for pre-deployment testing — it's an additional layer that catches defects that only manifest at scale, in real conditions.
The core insight: the question "did our tests pass?" is less useful than "is our production system healthy for real users right now?"
Shift-right encompasses:
- Synthetic monitoring — scripted user journeys running continuously against production
- Canary releases — gradual rollouts with automated quality gates
- Chaos engineering — deliberate fault injection to validate resilience
- Observability-driven QA — deriving test insights from production telemetry
- A/B testing validation — quality checks across experiment variants
- Feature flag testing — validating behaviour across flag configurations
Synthetic Monitoring
Synthetic monitoring runs automated tests against production on a continuous schedule — every minute, every 5 minutes, every hour. Unlike passive alerting (which fires only after a failure has already reached users), synthetic monitoring proactively checks critical flows before users encounter issues.
Setting up Playwright-based synthetic monitoring with Checkly
Checkly executes Playwright scripts on a schedule and alerts on failures. You write tests once and they run from multiple geographic locations continuously.
```typescript
// checkly.config.ts
import { defineConfig } from 'checkly'
import { Frequency } from 'checkly/constructs'

export default defineConfig({
  projectName: 'InnovateBits Production',
  logicalId: 'innovatebits-prod',
  repoUrl: 'https://github.com/your-org/your-repo',
  checks: {
    activated: true,
    muted: false,
    runtimeId: '2024.02',
    frequency: Frequency.EVERY_5M,
    locations: ['us-east-1', 'eu-west-1', 'ap-southeast-1'],
    tags: ['production'],
    alertChannels: [],
    checkMatch: '**/__checks__/**/*.check.ts',
  },
})
```
```typescript
// __checks__/homepage.check.ts
import { BrowserCheck, Frequency } from 'checkly/constructs'

new BrowserCheck('homepage-check', {
  name: 'Homepage loads correctly',
  frequency: Frequency.EVERY_5M,
  locations: ['us-east-1', 'eu-west-1'],
  code: {
    entrypoint: './homepage.spec.ts',
  },
})
```
```typescript
// homepage.spec.ts (standard Playwright test)
import { test, expect } from '@playwright/test'

test('homepage is accessible and functional', async ({ page }) => {
  await page.goto(process.env.ENVIRONMENT_URL ?? 'https://www.yourapp.com')

  // Core availability check
  await expect(page).toHaveTitle(/YourApp/)

  // Navigation functional
  await expect(page.getByRole('navigation')).toBeVisible()

  // Core CTA present
  await expect(page.getByRole('button', { name: /get started/i })).toBeVisible()

  // Performance check (basic): total load time from the Navigation Timing API
  const loadTime = await page.evaluate(() => {
    const [entry] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[]
    return entry ? entry.loadEventEnd - entry.startTime : 0
  })
  expect(loadTime).toBeLessThan(5000) // Alert if >5s load time
})
```
What to monitor synthetically
Not everything warrants synthetic monitoring. Prioritise flows where:
- A failure would directly impact revenue (checkout, payment, auth)
- A failure would be invisible until many users experience it
- Dependencies on third-party services could silently degrade
Standard synthetic monitoring suite:
- Homepage availability + load time
- Login flow (creates a real session)
- Core user journey (add to cart, checkout initiation)
- API health endpoints (see the sketch after this list)
- Key integrations (payment provider, auth service, search)
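The API health items above don't need a full browser; they can run as lightweight API checks alongside the browser checks. A minimal sketch using Checkly's ApiCheck construct (the endpoint URL, response body shape, and thresholds are placeholders, not part of the setup above):

```typescript
// __checks__/api-health.check.ts
import { ApiCheck, AssertionBuilder, Frequency } from 'checkly/constructs'

// Hypothetical health endpoint; swap in your own URL and response shape
new ApiCheck('api-health-check', {
  name: 'API health endpoint responds',
  frequency: Frequency.EVERY_1M,
  locations: ['us-east-1', 'eu-west-1'],
  maxResponseTime: 2000, // fail the check if the response takes longer than 2s
  request: {
    method: 'GET',
    url: 'https://api.yourapp.com/health',
    assertions: [
      AssertionBuilder.statusCode().equals(200),
      AssertionBuilder.jsonBody('$.status').equals('ok'),
    ],
  },
})
```

API checks are far cheaper to run than browser checks, so they can afford a tighter schedule (every minute here versus every five minutes for the browser checks).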
Canary Releases with Quality Gates
A canary release deploys new code to a small percentage of production traffic (typically 1–10%) while keeping the majority on the stable version. Quality gates monitor the canary cohort and trigger automatic rollback if metrics degrade.
Quality gate metrics
Error rate gate: If the error rate on the canary cohort exceeds the baseline by more than 1%, pause or roll back.
Latency gate: If p95 latency increases by more than 20%, investigate before continuing the rollout.
Business metric gate: If conversion rate or key business events drop significantly in the canary cohort, roll back.
Implementation with Argo Rollouts (Kubernetes)
```yaml
# argo-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5            # 5% canary
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 25           # 25% if analysis passed
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: latency-check
        - setWeight: 100          # Full rollout
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01  # <1% error rate
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
```
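The rollout above also references a latency-check template that isn't shown. A minimal sketch of what it could look like, assuming the same Prometheus provider and a conventional http_request_duration_seconds histogram (the metric name is an assumption about your instrumentation):

```yaml
# AnalysisTemplate for the latency gate (sketch; metric name is a placeholder)
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-check
spec:
  metrics:
    - name: p95-latency
      interval: 1m
      successCondition: result[0] < 0.5  # p95 stays under 500ms
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```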
Chaos Engineering
Chaos engineering is the practice of deliberately introducing failures into your production system to validate that it handles them gracefully. The goal is to find resilience gaps before real failures find them for you.
Netflix famously pioneered chaos engineering with Chaos Monkey, which randomly terminates production instances. For most teams, a more measured approach is appropriate.
Starting with "game days"
A game day is a scheduled chaos exercise where your team deliberately introduces a failure scenario and evaluates how the system responds. Run them in staging before production:
Example game days:
- Kill the primary database and verify failover completes within SLA
- Throttle the payment service to 10% of normal capacity and verify the checkout flow degrades gracefully
- Introduce 500ms latency to the search API and verify caching prevents user impact
- Terminate 50% of API server instances and verify auto-scaling replaces them before requests start failing
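Game days produce more durable results when the verification step is scripted rather than eyeballed. A minimal sketch for Node 18+ that polls a health endpoint during the chaos window and fails if availability drops below the error budget (the URL, window length, and threshold are all placeholders):

```typescript
// gameday-check.ts: measure availability while a fault is being injected.
// Run it for the duration of the chaos window; a non-zero exit fails the game day.
const TARGET_URL = process.env.TARGET_URL ?? 'https://api.yourapp.com/health' // placeholder
const WINDOW_SECONDS = 60      // length of the chaos window
const MIN_SUCCESS_RATE = 0.99  // error budget: at most 1% failed probes

async function main(): Promise<void> {
  let ok = 0
  for (let i = 0; i < WINDOW_SECONDS; i++) {
    try {
      const res = await fetch(TARGET_URL, { signal: AbortSignal.timeout(2_000) })
      if (res.ok) ok++
    } catch {
      // Timeouts and connection errors count as failed probes
    }
    await new Promise((resolve) => setTimeout(resolve, 1_000))
  }
  const rate = ok / WINDOW_SECONDS
  console.log(`Availability during chaos window: ${(rate * 100).toFixed(1)}%`)
  process.exit(rate >= MIN_SUCCESS_RATE ? 0 : 1)
}

main()
```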
Chaos tools
- Chaos Monkey (Netflix, open source) — randomly terminates EC2 instances
- Litmus (CNCF) — cloud-native chaos engineering for Kubernetes
- Gremlin — commercial chaos-as-a-service with a broad attack library
- AWS Fault Injection Simulator — AWS-managed chaos for AWS workloads
```bash
# Litmus: inject pod failure in a namespace
kubectl apply -f - <<EOF
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-failure-experiment
spec:
  engineState: 'active'
  appinfo:
    appns: 'production'
    applabel: 'app=payment-service'
    appkind: 'deployment'
  chaosServiceAccount: pod-delete-sa  # service account with permission to run the experiment
  experiments:
    - name: pod-delete  # Litmus's generic pod-kill experiment
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '60'  # seconds
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
EOF
```
Observability-Driven QA
Observability (logs, metrics, traces) is not just for operations — it's a quality engineering tool. Production telemetry reveals defects that never surface in testing.
Deriving test insights from production data
User error patterns → new test cases
If your logs show that users frequently encounter a specific error (e.g., "Invalid phone format" on the registration form), that's a signal your test suite lacks coverage for that input pattern. Mine your error logs for user-encountered failures and add test cases for the top patterns.
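As a sketch, a Playwright regression test generated from that log mining might pin the exact inputs real users submitted (the selectors and sample inputs are illustrative, not from the original):

```typescript
// registration-phone.spec.ts: test cases mined from production error logs
import { test, expect } from '@playwright/test'

// Formats real users actually submitted, taken from error-log mining (examples)
const problemPhones = ['+44 7700 900123', '(555) 123-4567', '07700900123']

for (const phone of problemPhones) {
  test(`registration accepts phone format "${phone}"`, async ({ page }) => {
    await page.goto('/register')
    await page.getByLabel(/phone/i).fill(phone)
    await page.getByRole('button', { name: /sign up/i }).click()
    // The form should normalise the input, not reject it
    await expect(page.getByText(/invalid phone format/i)).not.toBeVisible()
  })
}
```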
Performance regressions → performance tests
If your traces show that a specific API endpoint slowed from 50ms to 500ms after a deploy, that's a missing performance test. Add a latency assertion to your CI pipeline for that endpoint.
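A minimal sketch of such an assertion using Playwright's request fixture (the endpoint and the 200ms budget are placeholders; assumes baseURL is set in playwright.config):

```typescript
// api-latency.spec.ts: guard an endpoint that previously regressed in production
import { test, expect } from '@playwright/test'

test('search endpoint stays within its latency budget', async ({ request }) => {
  const start = Date.now()
  const response = await request.get('/api/search?q=test') // hypothetical endpoint
  const elapsed = Date.now() - start

  expect(response.ok()).toBeTruthy()
  // Coarse round-trip measurement from the test runner; the budget is a
  // placeholder derived from the pre-regression baseline
  expect(elapsed).toBeLessThan(200)
})
```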
Feature flag states → test matrix gaps
If your feature flags create 8 possible configuration states and your tests only cover 2 of them, production is testing the other 6. Map your flag states to your test matrix and close the gaps.
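A sketch of generating that matrix so every combination gets at least a smoke test (the flag names and the query-string mechanism are hypothetical; wire it to your real flag provider):

```typescript
// flag-matrix.spec.ts: cover every flag combination, not just the default
import { test, expect } from '@playwright/test'

const flags = {
  newCheckout: [true, false],
  guestMode: [true, false],
  darkTheme: [true, false],
} // 2 × 2 × 2 = 8 configuration states

// Build the cartesian product of all flag values
const combinations = Object.entries(flags).reduce<Record<string, boolean>[]>(
  (acc, [name, values]) =>
    values.flatMap((value) => acc.map((combo) => ({ ...combo, [name]: value }))),
  [{}]
)

for (const combo of combinations) {
  test(`checkout works with flags ${JSON.stringify(combo)}`, async ({ page }) => {
    // Hypothetical: flags passed via query string; adapt to your flag provider
    const params = new URLSearchParams(
      Object.entries(combo).map(([key, value]) => [key, String(value)])
    )
    await page.goto(`/checkout?${params}`)
    await expect(page.getByRole('button', { name: /place order/i })).toBeVisible()
  })
}
```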
Recommended observability stack for QE
```yaml
# docker-compose for local observability (mirrors production stack)
services:
  jaeger:       # Distributed tracing
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686", "6831:6831/udp"]

  prometheus:   # Metrics
    image: prom/prometheus:latest
    ports: ["9090:9090"]

  grafana:      # Dashboards
    image: grafana/grafana:latest
    ports: ["3001:3000"]

  loki:         # Log aggregation
    image: grafana/loki:latest
    ports: ["3100:3100"]
```
Combining Shift-Left and Shift-Right
The most effective quality strategy combines both approaches into a continuous quality loop:
```
Requirements
    ↓
[Shift-Left]  Acceptance criteria + test cases written
    ↓
[Shift-Left]  Unit + API + E2E tests in CI
    ↓
Deploy to staging
    ↓
[Shift-Left]  Full regression suite
    ↓
Canary deploy (5%)
    ↓
[Shift-Right] Quality gates on canary cohort
    ↓
Full deploy
    ↓
[Shift-Right] Synthetic monitoring (continuous)
    ↓
[Shift-Right] Observability data → new test cases
    ↓
(Back to requirements for next feature)
```
Each stage catches defects that the previous stage would miss. The system is self-improving: production monitoring generates new test cases that strengthen pre-deployment testing, which in turn reduces production incidents.
Getting Started with Shift-Right
If you're starting from zero, the highest-value first step is synthetic monitoring on your three most critical user flows. The setup takes a few hours. The value — continuous visibility into production quality — is immediate and ongoing.
From there: add a canary release process for your highest-risk deployments. Then build toward chaos engineering as your production confidence grows.
Shift-right doesn't replace shift-left. It completes it.
For the CI/CD pipeline foundations that shift-right builds on, see our CI/CD Pipeline Guide. For the QE strategy that ties both approaches together, see our Quality Engineering Strategy Roadmap.