A Strategy for Automatically Generating Mock Test Data with AI Large-Model Structure Parsing
Published: 2026/6/3 9:29:04
1. Overview

In a microservice architecture, preparing test data for calls between services takes up a large share of developers' time. Traditional mock data is hand-written JSON or XML, which is slow and tends to miss boundary conditions and error scenarios. Worse, whenever an interface definition changes, the existing mock data has to be updated in step, so maintenance is expensive.

Large AI models are good at understanding code and producing structured output, so they can read an interface schema and generate mock test data from it. This article proposes a structure-parsing strategy built on such a model: starting from an OpenAPI/Swagger specification, it runs four steps (schema parsing, prompt construction, AI generation, data validation) to generate mock test data automatically.

2. Core Principles

2.1 Structure-parsing framework

```
Interface definition (Swagger / Protobuf / Java)
        ↓
Schema parser (adapter per type)
        ↓
Intermediate Representation (IR)
        ↓
Prompt template engine
        ↓
AI large-model generation
        ↓
JSON Schema validation → data correction
        ↓
Mock data set
```

2.2 Schema parser adapters

| Interface protocol | Parsing strategy | Core component |
| --- | --- | --- |
| OpenAPI 3.0 | JSON Schema parsing + recursive `$ref` resolution | SwaggerParser |
| gRPC/Protobuf | Message parsing from `.proto` files | ProtobufParser |
| Dubbo/HSF | Java reflection + annotation parsing | JavaAnnotationParser |
| GraphQL | Schema introspection + type parsing | GraphQLParser |

2.3 AI generation strategy

When the model generates mock data, it is held to a set of constraints:

- Type constraint: each value must match the field type (string/integer/boolean/array/object).
- Format constraint: values must satisfy `pattern`, `minLength`, `maxLength`, and similar format rules.
- Enum constraint: values are picked from those listed in `enum`.
- Business constraint: business context is injected through the prompt so the data is semantically plausible.
- Boundary coverage: every field gets at least one boundary value and one invalid value.

3. Hands-on Configuration

3.1 Project layout

```
mock-ai-generator/
├── config.yaml
├── main.py
├── parser/
│   ├── __init__.py
│   ├── base_parser.py
│   ├── swagger_parser.py
│   ├── protobuf_parser.py
│   └── ir_schema.py
├── prompt/
│   ├── __init__.py
│   ├── template.py
│   └── builder.py
├── generator/
│   ├── __init__.py
│   ├── ai_client.py
│   └── fallback_generator.py
├── validator/
│   ├── __init__.py
│   └── schema_validator.py
└── output/
    └── .gitkeep
```

3.2 Configuration and entry point

```yaml
# config.yaml
ai:
  provider: dashscope
  model: qwen-max
  api_key: ${AI_API_KEY}
  temperature: 0.3
  max_tokens: 4096

services:
  - name: order-service
    type: swagger
    url: http://order-service:8080/v3/api-docs
    business_context: E-commerce order system covering order creation, payment, and refunds
  - name: user-service
    type: swagger
    url: http://user-service:8080/v3/api-docs
    business_context: User center covering registration, login, and profile queries

generation:
  data_count: 10
  edge_case_ratio: 0.3
  error_case_ratio: 0.2
  output_format: json
  include_validation_report: true
```

```python
# main.py
import json
import logging
import os
from datetime import datetime
from pathlib import Path

import yaml

from parser.swagger_parser import SwaggerParser
from prompt.builder import PromptBuilder
from generator.ai_client import AIClient
from generator.fallback_generator import FallbackGenerator
from validator.schema_validator import SchemaValidator

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class MockDataPipeline:
    def __init__(self, config_path: str):
        with open(config_path, encoding="utf-8") as f:
            # yaml.safe_load does not expand ${AI_API_KEY}, so substitute
            # environment variables in the raw text before parsing.
            self.config = yaml.safe_load(os.path.expandvars(f.read()))
        self.ai_client = AIClient(self.config["ai"])
        self.validator = SchemaValidator()
        self.fallback = FallbackGenerator()

    def run(self):
        for svc in self.config["services"]:
            logger.info("Processing service: %s", svc["name"])
            parser = SwaggerParser(svc["url"])
            # The rest of the pipeline treats each api as a plain dict
            # (for example the result of dataclasses.asdict on an ApiSchema).
            apis = parser.parse_all()
            for api in apis:
                mock_data = self.process_api(svc, api)
                self.save_output(svc["name"], api, mock_data)

    def process_api(self, svc: dict, api: dict) -> list:
        generation = self.config["generation"]
        prompt_builder = PromptBuilder(
            business_context=svc.get("business_context", ""),
            data_count=generation["data_count"],
            edge_case_ratio=generation["edge_case_ratio"],
            error_case_ratio=generation["error_case_ratio"],
        )
        prompt = prompt_builder.build(api)
        logger.info("  API: %s %s", api["method"], api["path"])

        try:
            data = self.ai_client.generate(prompt)
            validated = []
            for item in data:
                # Note: records deliberately generated as error cases also
                # fail this check and are dropped along with bad output.
                result = self.validator.validate(item, api)
                if result["valid"]:
                    validated.append(item)
                else:
                    logger.warning("Validation failed: %s", result["errors"])
            return validated
        except Exception as e:
            logger.warning("AI generation failed, using fallback: %s", e)
            return self.fallback.generate(api, 5)

    def save_output(self, service_name: str, api: dict, data: list):
        output_dir = Path("output") / service_name
        output_dir.mkdir(parents=True, exist_ok=True)

        path_part = api["path"].replace("/", "_").strip("_")
        filename = f"{api['method'].lower()}_{path_part}.json"
        filepath = output_dir / filename

        with open(filepath, "w", encoding="utf-8") as f:
            json.dump({
                "api": api,
                "mock_data": data,
                "count": len(data),
                # Record the generation time, not the script's mtime
                "generated_at": datetime.now().isoformat(),
            }, f, ensure_ascii=False, indent=2)
        logger.info("  Output: %s (%d records)", filepath, len(data))


if __name__ == "__main__":
    pipeline = MockDataPipeline("config.yaml")
    pipeline.run()
```

4. Advanced Practices

4.1 Intermediate representation design

```python
# parser/ir_schema.py
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class FieldSchema:
    name: str
    field_type: str
    required: bool = False
    description: str = ""
    example: Any = None
    enum_values: List[Any] = field(default_factory=list)
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    pattern: Optional[str] = None
    properties: List["FieldSchema"] = field(default_factory=list)
    items: Optional["FieldSchema"] = None


@dataclass
class ApiSchema:
    path: str
    method: str
    summary: str = ""
    parameters: List[FieldSchema] = field(default_factory=list)
    request_body: Optional[FieldSchema] = None
    response_body: Optional[FieldSchema] = None


class IRSchemaBuilder:
    @staticmethod
    def _constraints(schema: dict) -> dict:
        """Constraint keywords shared by parameters and body fields."""
        return dict(
            example=schema.get("example"),
            enum_values=schema.get("enum", []),
            min_length=schema.get("minLength"),
            max_length=schema.get("maxLength"),
            minimum=schema.get("minimum"),
            maximum=schema.get("maximum"),
            pattern=schema.get("pattern"),
        )

    @staticmethod
    def from_swagger(path: str, method: str, detail: dict, spec: dict) -> ApiSchema:
        api = ApiSchema(
            path=path,
            method=method.upper(),
            summary=detail.get("summary", ""),
        )

        for param in detail.get("parameters", []):
            schema = param.get("schema", {})
            # Named param_field so it does not shadow dataclasses.field
            param_field = FieldSchema(
                name=param["name"],
                field_type=schema.get("type", "string"),
                required=param.get("required", False),
                description=param.get("description", ""),
                **IRSchemaBuilder._constraints(schema),
            )
            api.parameters.append(param_field)

        request_body = detail.get("requestBody", {})
        if request_body:
            content = request_body.get("content", {})
            json_content = content.get("application/json", {})
            body_schema = json_content.get("schema", {})
            api.request_body = IRSchemaBuilder._parse_schema(body_schema, spec)

        # Status codes are string keys in a parsed OpenAPI document
        responses = detail.get("responses", {})
        success = responses.get("200", responses.get("201", {}))
        resp_schema = (
            success.get("content", {}).get("application/json", {}).get("schema", {})
        )
        if resp_schema:
            api.response_body = IRSchemaBuilder._parse_schema(resp_schema, spec)

        return api

    @staticmethod
    def _parse_schema(schema: dict, spec: dict) -> Optional[FieldSchema]:
        if not schema:
            return None

        # Resolve local references such as "#/components/schemas/Order".
        # There is no cycle guard: a self-referencing schema recurses forever.
        ref = schema.get("$ref", "")
        if ref:
            resolved = spec
            for key in ref.replace("#/", "").split("/"):
                resolved = resolved.get(key, {})
            return IRSchemaBuilder._parse_schema(resolved, spec)

        node = FieldSchema(
            name=schema.get("title", "root"),
            field_type=schema.get("type", "object"),
            required=False,
            description=schema.get("description", ""),
            # Carry length/range/pattern constraints for body fields too
            **IRSchemaBuilder._constraints(schema),
        )

        if node.field_type == "object":
            required_names = schema.get("required", [])
            for prop_name, prop_schema in schema.get("properties", {}).items():
                child = IRSchemaBuilder._parse_schema(prop_schema, spec)
                if child:
                    child.name = prop_name
                    child.required = prop_name in required_names
                    node.properties.append(child)
        elif node.field_type == "array":
            node.items = IRSchemaBuilder._parse_schema(schema.get("items", {}), spec)

        return node
```

4.2 Prompt template engine

```python
# prompt/template.py

MOCK_DATA_PROMPT = """You are a professional test-data generation expert. Your task is to generate mock data strictly according to the interface definition.

## Constraint rules
1. Data types must strictly match the schema definition
2. String fields must contain meaningful, realistic data; do not use "test" or "string" as placeholders
3. Numeric fields must fall within a reasonable business range and include boundary values
4. Enum fields are chosen at random from the defined values
5. Time fields use ISO 8601 format
6. Nested objects must be fully populated

## Generation requirements
- Generate {data_count} records in total
- About {normal_count} of them are normal data
- About {edge_count} of them are boundary data
- About {error_count} of them are invalid data
- Every record carries a _case_type field marking its type

## Interface definition
### {method} {path}
{summary}

### Request parameters
{parameters_section}

### Request body schema
{request_body_section}

### Response body schema
{response_body_section}

### Business context
{business_context}

Output a JSON array. The fields of every record must align exactly with the interface definition."""

PARAM_TEMPLATE = "| {name} | {param_in} | {field_type} | {required} | {description} | {constraints} |"

SCHEMA_TEMPLATE = """- {name} ({field_type}, {required})
  Description: {description}
  Constraints: {constraints}
  Example: {example}"""


def format_constraints(field: dict) -> str:
    """Render a field's constraints as one human-readable line."""
    parts = []
    if field.get("enum_values"):
        parts.append(f"enum: {field['enum_values']}")
    if field.get("min_length") is not None:
        parts.append(f"minLen: {field['min_length']}")
    if field.get("max_length") is not None:
        parts.append(f"maxLen: {field['max_length']}")
    if field.get("minimum") is not None:
        parts.append(f"min: {field['minimum']}")
    if field.get("maximum") is not None:
        parts.append(f"max: {field['maximum']}")
    if field.get("pattern"):
        parts.append(f"regex: {field['pattern']}")
    return "; ".join(parts) if parts else "none"
```

4.3 Fallback generation strategy

```python
# generator/fallback_generator.py
import random
import string
from datetime import datetime, timedelta
from typing import Dict, List


def _or_default(field: dict, key: str, default):
    """Return field[key], or default when the key is missing or None."""
    value = field.get(key)
    return default if value is None else value


class FallbackGenerator:
    def generate(self, api: dict, count: int = 5) -> List[Dict]:
        return [self._generate_record(api) for _ in range(count)]

    def _generate_record(self, api: dict) -> Dict:
        record = {}
        for param in api.get("parameters", []):
            record[param["name"]] = self._gen_value(param)
        if api.get("request_body"):
            self._fill_object(record, api["request_body"])
        return record

    def _gen_value(self, field: dict):
        field_type = field.get("field_type", "string")
        enum_values = field.get("enum_values") or []

        if enum_values:
            return random.choice(enum_values)

        if field_type == "string":
            if field.get("pattern"):
                return self._gen_by_pattern(field["pattern"])
            min_len = _or_default(field, "min_length", 1)
            max_len = max(min_len, _or_default(field, "max_length", 20))
            length = random.randint(min_len, max_len)
            return "".join(random.choices(string.ascii_letters, k=length))
        elif field_type == "integer":
            # Bounds may be stored as floats in the IR; randint needs ints
            minimum = int(_or_default(field, "minimum", 0))
            maximum = int(_or_default(field, "maximum", 10000))
            return random.randint(minimum, maximum)
        elif field_type == "number":
            minimum = _or_default(field, "minimum", 0.0)
            maximum = _or_default(field, "maximum", 10000.0)
            return round(random.uniform(minimum, maximum), 2)
        elif field_type == "boolean":
            return random.choice([True, False])
        elif field_type == "array":
            items = field.get("items") or {}
            count = random.randint(0, 3)
            return [self._gen_value(items) for _ in range(count)]
        elif field_type == "object":
            return {
                prop["name"]: self._gen_value(prop)
                for prop in field.get("properties") or []
            }
        return None

    def _fill_object(self, target: Dict, schema: dict):
        for prop in schema.get("properties") or []:
            target[prop["name"]] = self._gen_value(prop)

    def _gen_by_pattern(self, pattern: str) -> str:
        # Heuristics only: this does not generate strings from arbitrary regexes
        if pattern == r"^\d{4}-\d{2}-\d{2}$":
            date = datetime.now() - timedelta(days=random.randint(0, 365))
            return date.strftime("%Y-%m-%d")
        if "email" in pattern.lower():
            return f"user{random.randint(1, 9999)}@example.com"
        if "phone" in pattern.lower():
            # 11 digits: "1", one second digit, then nine random digits
            return f"1{random.choice([3, 5, 7, 8, 9])}{random.randint(100000000, 999999999)}"
        return "generated_" + "".join(random.choices(string.ascii_lowercase, k=8))
```

4.4 Schema validator

```python
# validator/schema_validator.py
import re
from typing import Any, Dict, List

_TYPE_MAP = {"integer": int, "string": str, "number": (int, float), "boolean": bool}


class SchemaValidator:
    def validate(self, data: Any, api: dict) -> Dict:
        if not isinstance(data, dict):
            return {"valid": False, "errors": ["record is not a JSON object"]}

        errors = []
        for param in api.get("parameters") or []:
            name = param["name"]
            if name in data:
                error = self._validate_field(data[name], param)
                if error:
                    errors.append(f"parameter {name}: {error}")
            elif param.get("required"):
                errors.append(f"parameter {name}: required field missing")

        if api.get("request_body"):
            errors.extend(self._validate_schema(data, api["request_body"]))

        return {"valid": len(errors) == 0, "errors": errors}

    def _validate_field(self, value: Any, field: dict) -> str:
        """Return an error message, or an empty string when the value is valid."""
        expected_type = field.get("field_type", "string")
        actual_type = type(value).__name__
        is_bool = isinstance(value, bool)

        py_type = _TYPE_MAP.get(expected_type)
        if py_type is not None:
            # bool is a subclass of int, so reject it for integer/number fields
            ok = isinstance(value, py_type) and (expected_type == "boolean" or not is_bool)
            if not ok:
                return f"type error: expected {expected_type}, got {actual_type}"

        if field.get("enum_values") and value not in field["enum_values"]:
            return f"enum error: {value} is not in {field['enum_values']}"

        if isinstance(value, str):
            if field.get("min_length") and len(value) < field["min_length"]:
                return f"too short: {len(value)} < {field['min_length']}"
            if field.get("max_length") and len(value) > field["max_length"]:
                return f"too long: {len(value)} > {field['max_length']}"
            # JSON Schema patterns are not implicitly anchored, hence search
            if field.get("pattern") and not re.search(field["pattern"], value):
                return f"pattern mismatch: {value} does not match {field['pattern']}"

        if isinstance(value, (int, float)) and not is_bool:
            if field.get("minimum") is not None and value < field["minimum"]:
                return f"below minimum: {value} < {field['minimum']}"
            if field.get("maximum") is not None and value > field["maximum"]:
                return f"above maximum: {value} > {field['maximum']}"

        return ""

    def _validate_schema(self, data: Any, schema: dict, path: str = "") -> List[str]:
        errors = []
        field_type = schema.get("field_type")
        if field_type == "object":
            if not isinstance(data, dict):
                return [f"{path or 'root'}: expected object, got {type(data).__name__}"]
            for prop in schema.get("properties") or []:
                name = prop["name"]
                prop_path = f"{path}.{name}" if path else name
                if name not in data:
                    if prop.get("required"):
                        errors.append(f"{prop_path}: required field missing")
                    continue
                error = self._validate_field(data[name], prop)
                if error:
                    errors.append(f"{prop_path}: {error}")
                # Recurse into nested objects and arrays
                if prop.get("field_type") in ("object", "array"):
                    errors.extend(self._validate_schema(data[name], prop, prop_path))
        elif field_type == "array":
            if not isinstance(data, list):
                return [f"{path or 'root'}: expected array, got {type(data).__name__}"]
            items = schema.get("items") or {}
            for idx, item in enumerate(data):
                item_path = f"{path}[{idx}]"
                error = self._validate_field(item, items)
                if error:
                    errors.append(f"{item_path}: {error}")
        return errors
```

5. Best Practices

| Practice | Description | Rating |
| --- | --- | --- |
| IR layer | Convert every interface protocol (OpenAPI/Protobuf/GraphQL) to the IR before further processing | ⭐⭐⭐⭐⭐ |
| Temperature control | Set temperature to 0.3 when generating mock data to balance determinism and variety | ⭐⭐⭐⭐⭐ |
| Fallback strategy | Use FallbackGenerator when the AI is unavailable so the pipeline never stalls | ⭐⭐⭐⭐⭐ |
| Schema validation | AI output must pass JSON Schema validation, and non-compliant records are corrected automatically (the sample pipeline in 3.2 takes the simpler route of discarding them) | ⭐⭐⭐⭐ |
| Boundary coverage | Require boundary values for every data type, such as empty strings, maximum values, and negative values | ⭐⭐⭐⭐ |
| Incremental updates | When an interface changes, regenerate only the changed parts so a full overwrite does not disturb existing tests | ⭐⭐⭐ |

6. Summary

The strategy rests on three stages working together: schema parsing unifies the various interface definitions into an intermediate representation (IR); prompt engineering turns the IR into structured instructions the model can follow; AI generation plus validation checks the type correctness and boundary coverage of the output. The fallback generator keeps the pipeline running when the AI is unavailable, and the schema validator is the last line of defense for data quality. For a microservice team, this approach can cut mock-data preparation from hours to minutes and noticeably speed up interface integration and automated testing.
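Section 3.2 imports `PromptBuilder` from `prompt/builder.py`, but the article never shows that module. The following is a minimal sketch of what it might look like, not the author's implementation. It assumes the constructor arguments used in `main.py`, assumes each `api` is a plain dict in the IR shape from section 4.1, and uses a shortened, hypothetical `PROMPT_TEMPLATE` in place of `MOCK_DATA_PROMPT` so the block stays self-contained. The helper `split_counts` is also hypothetical; it shows one way to derive the normal/edge/error counts from the two ratios in `config.yaml`.

```python
from dataclasses import dataclass

# Hypothetical, shortened stand-in for MOCK_DATA_PROMPT from section 4.2
PROMPT_TEMPLATE = (
    "Generate {data_count} records for {method} {path}: "
    "{normal_count} normal, {edge_count} edge, {error_count} error.\n"
    "Business context: {business_context}\n"
    "Parameters:\n{parameters_section}"
)


def split_counts(data_count: int, edge_case_ratio: float, error_case_ratio: float):
    """Split the total into (normal, edge, error) counts that sum to data_count."""
    edge = min(round(data_count * edge_case_ratio), data_count)
    # Clamp so rounding can never push edge + error above the total
    error = min(round(data_count * error_case_ratio), data_count - edge)
    return data_count - edge - error, edge, error


@dataclass
class PromptBuilder:
    business_context: str = ""
    data_count: int = 10
    edge_case_ratio: float = 0.3
    error_case_ratio: float = 0.2

    def build(self, api: dict) -> str:
        normal, edge, error = split_counts(
            self.data_count, self.edge_case_ratio, self.error_case_ratio
        )
        # One bullet per request parameter: name, type, required/optional
        lines = [
            f"- {p['name']} ({p.get('field_type', 'string')}, "
            f"{'required' if p.get('required') else 'optional'})"
            for p in api.get("parameters", [])
        ]
        return PROMPT_TEMPLATE.format(
            data_count=self.data_count,
            method=api["method"],
            path=api["path"],
            normal_count=normal,
            edge_count=edge,
            error_count=error,
            business_context=self.business_context or "(none)",
            parameters_section="\n".join(lines) or "(none)",
        )
```

With the values from `config.yaml` (10 records, ratios 0.3 and 0.2), the split comes out as 5 normal, 3 edge, and 2 error records. Deriving the normal count by subtraction keeps the three numbers consistent with `data_count` even when the ratios do not multiply out to whole numbers.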