Smart Test Data Generation Using AI
How to use AI and LLMs to generate comprehensive, realistic test data — covering synthetic data generation, edge case discovery, and PII-safe test datasets.
Test data is one of the most underappreciated challenges in software testing. Your test suite is only as good as the data you run it against. Inadequate test data leads to coverage gaps, false confidence, and defects that only surface in production when real user data triggers edge cases you never considered.
AI makes comprehensive test data generation significantly easier. This guide covers how to use LLMs and AI tools to generate realistic, diverse, and edge-case-rich test data at scale.
The Test Data Problem
There are three common approaches to test data, each with significant problems:
Using production data is the most realistic option but comes with privacy and compliance risks. GDPR, HIPAA, and similar regulations restrict using real customer data for testing. Data breaches that expose test environments put real user data at risk.
Manually created test data is safe but limited. Engineers create a few representative examples and miss the long tail of edge cases that real users generate — unusual name formats, international characters, extreme field lengths, unusual date patterns.
Faker libraries (Faker.js, Python Faker) generate random realistic data but don't reason about domain-specific constraints or generate purposeful edge cases. They're great for volume data but poor for deliberate coverage.
AI-assisted test data generation combines the realism of production data with the safety of synthetic data, while adding the intelligence to generate edge cases you haven't thought of.
Using LLMs for Test Data Generation
Generating structured test datasets
LLMs excel at generating diverse, realistic structured data when given clear instructions:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function generateUserTestData(count: number): Promise<User[]> {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 2000,
    messages: [{
      role: 'user',
      content: `Generate ${count} realistic but fictional user records for testing an e-commerce platform.

Include diversity in:
- Names (international, unusual characters, hyphenated, single names, very long names)
- Email formats (subaddressing like user+tag@domain.com, different TLDs, edge cases)
- Phone numbers (different countries, formats, with/without country codes)
- Addresses (international, PO boxes, apartment formats, missing fields)

Also include these specific edge cases:
- A user with a name containing a SQL injection attempt
- A user with emoji in their display name
- A user with maximum-length fields
- A user with minimal required fields only

Return ONLY a valid JSON array, no markdown, no explanation.
Each object: { id, firstName, lastName, email, phone, address, createdAt }`,
    }],
  });

  const text = response.content[0].type === 'text' ? response.content[0].text : '';
  return JSON.parse(text);
}
```
This generates test users that cover:
- Normal cases (the happy path)
- International variations (which manual data creation typically neglects)
- Security edge cases (injection attempts)
- Boundary cases (maximum lengths, minimum fields)
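In practice, models sometimes wrap their output in markdown fences or add stray text despite the "no markdown" instruction, and a bare `JSON.parse` then throws. A small defensive helper avoids flaky generation runs — this is a sketch, and the `extractJsonArray` name is my own, not part of any SDK:

```typescript
// Strip optional markdown fences and parse the first JSON array found
// in an LLM response. Throws if no parseable array is present.
function extractJsonArray<T>(raw: string): T[] {
  // Remove ```json ... ``` fences if the model added them anyway
  const unfenced = raw.replace(/```(?:json)?/g, '').trim();

  // Slice from the first '[' to the last ']' to drop any stray prose
  const start = unfenced.indexOf('[');
  const end = unfenced.lastIndexOf(']');
  if (start === -1 || end === -1 || end < start) {
    throw new Error('No JSON array found in model output');
  }

  return JSON.parse(unfenced.slice(start, end + 1)) as T[];
}
```

With this in place, the generators above can call `extractJsonArray<User>(text)` instead of `JSON.parse(text)` and tolerate slightly messy model output.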
Generating edge case inputs for specific fields
For testing individual form fields or API parameters, prompt for targeted edge case generation:
```typescript
async function generateEmailEdgeCases(): Promise<string[]> {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1000,
    messages: [{
      role: 'user',
      content: `Generate a comprehensive list of email address edge cases for testing an email validation system.

Include:
- Valid emails that naive validators reject (valid per RFC 5321)
- Invalid emails that naive validators accept
- Internationalized domain names
- Subaddressing (user+tag@domain.com)
- IP address domains
- Very long local parts
- Unicode in local part
- Common typos (gmail.com → gmial.com)
- Disposable email domains
- Business email patterns

Return ONLY a JSON array of strings. No explanation.`,
    }],
  });

  return JSON.parse(response.content[0].type === 'text' ? response.content[0].text : '[]');
}
```
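Edge cases like these earn their keep when run against your actual validator. As an illustration (the `naiveEmailPattern` regex below is a deliberately simplistic stand-in, not from any real codebase), subaddressing shows how easily a hand-rolled pattern rejects valid input:

```typescript
// A deliberately naive validator of the kind these generated edge cases expose
const naiveEmailPattern = /^[a-zA-Z0-9.]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

function naiveIsValidEmail(email: string): boolean {
  return naiveEmailPattern.test(email);
}

// 'user+tag@example.com' is a valid address per RFC 5321, but the '+'
// is missing from the character class above, so it is wrongly rejected.
```

Feeding the whole generated list through the validator in a parameterised test (for example `test.each` in Jest or Vitest) turns the edge case list into a standing regression suite.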
Domain-specific test data
For domain-specific testing (healthcare, finance, e-commerce), LLMs understand domain context:
```typescript
async function generateProductCatalogData() {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 3000,
    messages: [{
      role: 'user',
      content: `Generate test product data for an e-commerce platform. Include edge cases that test:

1. Pricing edge cases: $0.00 products, very high prices ($99,999.99), prices with many decimal places
2. Inventory edge cases: 0 stock, negative stock (pre-order), very large inventory (10000+)
3. Name edge cases: very short (1 char), very long (200+ chars), special characters, unicode
4. Category edge cases: uncategorised products, multi-category products
5. Image edge cases: no image, multiple images, broken image URL
6. Description edge cases: empty, very long (5000+ chars), HTML in description

Return ONLY valid JSON array. Schema: { id, name, price, stock, category, description, imageUrl }`,
    }],
  });

  return JSON.parse(response.content[0].type === 'text' ? response.content[0].text : '[]');
}
```
Open Source Tools for AI-Assisted Test Data
Faker.js with AI augmentation
Faker.js generates basic realistic data. Combine it with an LLM for edge case layers:
```typescript
import { faker } from '@faker-js/faker';

// Faker for volume data
function generateBulkUsers(count: number) {
  return Array.from({ length: count }, () => ({
    id: faker.string.uuid(),
    name: faker.person.fullName(),
    email: faker.internet.email(),
    phone: faker.phone.number(),
    address: faker.location.streetAddress(),
  }));
}

// LLM for targeted edge cases — combine both
async function generateCompleteTestDataset(bulkCount: number) {
  const bulkData = generateBulkUsers(bulkCount);
  const edgeCases = await generateUserTestData(20); // LLM-generated edge cases

  return [...bulkData, ...edgeCases];
}
```
Mockaroo
Mockaroo is a web-based tool for generating realistic CSV/JSON test data with custom schemas. While not AI-powered in the LLM sense, it's excellent for generating large volumes of relational test data with consistent referential integrity.
Gretel.ai
Gretel generates synthetic data that statistically mimics real production data without containing actual personal information. This is particularly valuable when you need realistic data distributions (which production data has) without the privacy risk.
PII-Safe Synthetic Data from Production
A common need: you want test data that reflects the real distributions and patterns in your production database, but without containing actual PII.
The workflow:
- Extract a sample of production data
- Use an anonymisation tool or LLM to synthesise statistically similar but entirely fictional data
- Use the synthetic dataset for testing
```python
# Example: anonymise a production user sample
import json

import anthropic

def anonymise_user_sample(production_users: list) -> list:
    """
    Takes real user records and generates synthetic equivalents
    with similar patterns but no real PII.
    """
    client = anthropic.Anthropic()

    # Show the LLM a small sample so it can infer formats and patterns.
    # Note: this sends real records to the API — scrub or mask direct
    # identifiers first if your compliance requirements demand it.
    sample = production_users[:3]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=3000,
        messages=[{
            "role": "user",
            "content": f"""Given this sample of user records (patterns only, ignore the actual values):
{sample}

Generate {len(production_users)} synthetic user records that:
- Follow the same patterns and formats as the sample
- Contain ONLY fictional names, emails, addresses (no real people)
- Maintain similar data distributions (name length, email domain distribution, etc.)
- Include the same fields as the original

Return ONLY a JSON array. No explanation.""",
        }],
    )

    return json.loads(response.content[0].text)
```
Integrating AI-Generated Data into Your Test Suite
The most practical integration pattern is a data factory that uses AI for edge case generation and Faker for volume:
```typescript
// test/factories/dataFactory.ts
import Anthropic from '@anthropic-ai/sdk';
import { faker } from '@faker-js/faker';

export class DataFactory {
  private aiClient: Anthropic;
  private cache = new Map<string, User[]>();

  constructor() {
    this.aiClient = new Anthropic();
  }

  // Fast: Faker for standard data
  user(overrides: Partial<User> = {}): User {
    return {
      id: faker.string.uuid(),
      email: faker.internet.email(),
      name: faker.person.fullName(),
      ...overrides,
    };
  }

  // AI-powered: for edge case coverage
  async userEdgeCases(): Promise<User[]> {
    const cacheKey = 'user-edge-cases';
    // Cache AI-generated data between test runs to avoid repeated API calls
    const cached = this.cache.get(cacheKey);
    if (cached) return cached;

    const data = await this.generateWithAI('users', 20);
    this.cache.set(cacheKey, data);
    return data;
  }
}
```
Cache AI-generated test data between runs — you don't need to regenerate edge cases on every test run, and caching reduces API costs and latency.
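An in-memory cache only survives a single run. One way to persist across runs (a sketch; the file path and the `loadOrGenerate` name are my own) is to write generated data to a JSON file that can also be committed to the repository:

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

// Load a cached dataset from disk, or generate it once and persist it.
async function loadOrGenerate<T>(
  cachePath: string,
  generate: () => Promise<T[]>,
): Promise<T[]> {
  if (existsSync(cachePath)) {
    return JSON.parse(readFileSync(cachePath, 'utf-8')) as T[];
  }

  const data = await generate();
  mkdirSync(dirname(cachePath), { recursive: true });
  writeFileSync(cachePath, JSON.stringify(data, null, 2));
  return data;
}
```

For example: `await loadOrGenerate('test/fixtures/user-edge-cases.json', () => generateUserTestData(20))` hits the API only when the fixture file is missing; deleting the file forces regeneration.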
Key Principles
Generate for coverage, not just volume. A thousand variations of normal inputs are less valuable than 20 carefully chosen edge cases. Use AI to find the edges.
Review AI-generated data. LLMs can generate plausible-looking but logically invalid data. Review a sample before using in production test suites.
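Part of that review can be automated with a structural sanity check before generated records reach your fixtures. A minimal hand-rolled type guard is sketched below (the `GeneratedUser` shape and function name are illustrative; a real suite might use a schema library such as Zod instead):

```typescript
interface GeneratedUser {
  id: string;
  name: string;
  email: string;
}

// Filter out records that are structurally broken before they become
// fixtures. Logical review (do the values make sense?) still needs a
// human pass on a sample of the data.
function validateGeneratedUsers(records: unknown[]): GeneratedUser[] {
  return records.filter((r): r is GeneratedUser => {
    if (typeof r !== 'object' || r === null) return false;
    const u = r as Record<string, unknown>;
    return (
      typeof u.id === 'string' &&
      typeof u.name === 'string' &&
      typeof u.email === 'string' &&
      u.email.includes('@')
    );
  });
}
```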
Version control your generated datasets. Save AI-generated test datasets in your repository so tests are reproducible without making API calls on every run.
Keep PII out of test environments. Even if the data looks realistic, ensure it's provably synthetic. Don't use real names, real addresses, or real phone numbers — even if obfuscated.
For more on the broader AI-in-testing landscape, see our Implementing AI in Software Testing guide. For test data management strategies within a Playwright test suite, see our API Testing Guide.