Turn an AI Requirements Definition Agent into a “Development OS”: A Deep Dive into Technical Design from As-Is/To-Be Generation to Prototyping and Offshore Collaboration

1. Executive Summary (Technical Summary)

The core reason legacy modernization struggles in Japanese enterprises is not the development process itself, but rather the person-dependent nature of requirements definition and the slow pace of consensus building. Generative AI is moving beyond meeting-minutes summarization to provide end-to-end support—from automated As-Is business flow generation to To-Be design, contradiction/exception detection, requirements lists and design document outputs, and even working prototypes—dramatically increasing upstream throughput. This article proposes a reference architecture for institutionalizing an AI requirements-definition agent as a “development OS” (RAG, workflow execution, auditability, permission boundaries, CI integration), along with key design considerations for performance, security, and scalability. ⚙️

2. Technical Background and Challenges (Architecture Diagram Explanation, Existing Pain Points)

As the referenced article indicates, requirements definition is the upstream core that determines “what to build,” and any misalignment amplifies downstream as rework. In large enterprises, business operations are full of complex exceptions, branches, permissions, and audit requirements—so mismatched documentation granularity and tacit knowledge can be fatal. As a result, structural issues remain: (1) dependence on veterans (skills concentrated in the top 1%), (2) reviews that tend to become “gut feel,” (3) slow consensus due to the absence of prototypes, and (4) expanding misalignment across offshore teams and multiple vendors.

The key point here is not “using AI as a summarization tool,” but rather managing requirements deliverables as a machine-readable intermediate representation (IR: Intermediate Representation) and connecting them to design, implementation, and testing. Requirements-definition AI plays a compiler-like role: converting natural language → IR → documents/prototypes/tickets. 🔧

Technical flow diagram (explained):
(1) Ingest audio, minutes, existing design documents, and policies.
(2) Normalize and anonymize.
(3) Use RAG to reference internal standards, past projects, and policies.
(4) The LLM extracts the As-Is business process (BPMN-equivalent) and a domain glossary.
(5) Generate the To-Be process based on change-policy inputs.
(6) Statically analyze exceptions, contradictions, and permission inconsistencies.
(7) Output requirements lists (User Stories/FR/NFR) and design documents (screens/interfaces/data model).
(8) Generate prototypes.
(9) Run the approval workflow and record audit logs.
(10) Sync to Jira/Azure DevOps to start implementation.

[Meeting Audio/Notes]   [Legacy Docs]   [Policies]
        \                |              /
         \               |             /
          --> Ingestion/ETL --> PII Redaction --> Vector Index (RAG)
                                |                    |
                                v                    v
                         LLM Orchestrator  <--> Knowledge Base
                                |
                                v
                 As-Is Process IR --> To-Be Process IR
                                |
                    +-----------+-----------+
                    |                       |
                    v                       v
         Consistency/Exception Analyzer   Prototype Generator
                    |                       |
                    v                       v
       Requirements/Design Artifacts     Clickable UI/API Mock
                    |
                    v
          Approval + Audit + Ticket Sync

3. Technical Section ①: Business Flows and Requirements Model Design as an Intermediate Representation (IR) ⚙️

3.1 Why AI Requirements Definition Breaks Without an IR

If you keep stacking requirements in natural language, you cannot maintain consistency across granularity, terminology, and exceptions, and later phases turn into “battles of interpretation.” Instead, represent As-Is/To-Be in a BPMN 2.0-equivalent form (or proprietary JSON) and structure business events, branching conditions, responsibilities (RACI), and data inputs/outputs. AI outputs can then be managed as true deliverables under diff control (Git), which makes reviews and audits possible. The IR also connects directly to test design (scenario coverage) and authorization design (RBAC/ABAC).

3.2 Reference Model (JSON Schema Example)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ProcessIR",
  "type": "object",
  "properties": {
    "processId": {"type": "string"},
    "version": {"type": "string"},
    "actors": {"type": "array", "items": {"type": "string"}},
    "activities": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {"type": "string"},
          "name": {"type": "string"},
          "type": {"enum": ["task","gateway","event"]},
          "inputs": {"type": "array", "items": {"type": "string"}},
          "outputs": {"type": "array", "items": {"type": "string"}},
          "sla": {"type": "string"},
          "controls": {
            "type": "object",
            "properties": {
              "rbac": {"type": "array", "items": {"type": "string"}},
              "audit": {"type": "boolean"}
            }
          }
        },
        "required": ["id","name","type"]
      }
    },
    "edges": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "from": {"type": "string"},
          "to": {"type": "string"},
          "condition": {"type": "string"}
        },
        "required": ["from","to"]
      }
    }
  },
  "required": ["processId","version","activities","edges"]
}

3.3 Implementation Key Points

Do not accept LLM outputs as-is. Make schema validation plus rule-based checks and correction mandatory (e.g., detecting gateways with missing branch conditions and isolated nodes). This loop of “AI → structuring → validation → diff review” is the key to converting person-dependence into a repeatable process. 🔧
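
The validation step of that loop can be sketched as follows. This is a minimal, standard-library-only illustration, not a full JSON Schema validator: it mirrors the "required" lists and the "type" enum of the schema above and adds one rule (branching gateways need a condition on every outgoing edge). The function name validate_process_ir and the wording of the findings are assumptions made for this example.

```python
from typing import Any

ALLOWED_TYPES = {"task", "gateway", "event"}


def validate_process_ir(process: dict[str, Any]) -> list[str]:
    """Return human-readable findings; an empty list means the IR passed."""
    findings: list[str] = []

    # Structural checks mirroring the top-level "required" list of the schema.
    for key in ("processId", "version", "activities", "edges"):
        if key not in process:
            findings.append(f"missing top-level field: {key}")

    activities = process.get("activities", [])
    edges = process.get("edges", [])

    ids = set()
    for a in activities:
        for key in ("id", "name", "type"):
            if key not in a:
                findings.append(f"activity missing field: {key}")
        if "type" in a and a["type"] not in ALLOWED_TYPES:
            findings.append(f"activity {a.get('id')}: invalid type {a['type']!r}")
        if "id" in a:
            ids.add(a["id"])

    # Every edge must point at known activities.
    for e in edges:
        for end in ("from", "to"):
            if e.get(end) not in ids:
                findings.append(f"edge references unknown activity: {e.get(end)}")

    # Rule: a gateway that branches needs a condition on each outgoing edge.
    for a in activities:
        if a.get("type") != "gateway":
            continue
        outgoing = [e for e in edges if e.get("from") == a.get("id")]
        if len(outgoing) > 1:
            for e in outgoing:
                if not e.get("condition"):
                    findings.append(
                        f"gateway {a.get('id')}: edge to {e.get('to')} has no condition"
                    )
    return findings
```

Findings from a check like this can be fed back to the LLM as a correction prompt or attached to the pull request, so a reviewer sees structural problems before reading the diff.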

4. Technical Section ②: RAG Design—How to Make Internal Standards, Past Projects, and Policies Actually Work 🔧

4.1 Layering Knowledge (Policy/Pattern/Project)

The accuracy of requirements-definition AI depends less on model size and more on the quality of referenced knowledge. At minimum, split RAG into three layers: (1) Policy (internal policies, audit requirements, personal data handling, log retention), (2) Pattern (standard architectures, standard interfaces, standard screens, naming conventions), and (3) Project (existing system specs, current DB, operational procedures). Because update frequency and approval flows differ by layer, separate indexes and control retrieval with boosts/filters.
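
As a rough illustration of controlling retrieval with boosts, the sketch below merges hits from three separately queried indexes into one ranking. The Hit type, the merge_layers function, and the boost values are all hypothetical. A real deployment would take scores from its own vector DB and tune the weights by measurement.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    layer: str    # "policy" | "pattern" | "project"
    score: float  # similarity score from that layer's index, higher is better


# Hypothetical boost values: policy documents outrank equally similar project docs.
LAYER_BOOST = {"policy": 1.3, "pattern": 1.1, "project": 1.0}


def merge_layers(hits_by_layer: dict[str, list[Hit]], top_k: int = 8) -> list[Hit]:
    """Merge per-layer results into a single ranking using layer boosts."""
    boosted = [
        (hit.score * LAYER_BOOST.get(layer, 1.0), hit)
        for layer, hits in hits_by_layer.items()
        for hit in hits
    ]
    boosted.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in boosted[:top_k]]
```

Keeping the layers in separate indexes means each can be re-indexed on its own approval cycle, while the merge step stays the single place where priority between layers is decided.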

4.2 Practical Embedding/Chunking Settings (Example)

Assume an embedding model in the same class as text-embedding-3-large (the implementation is vendor-dependent). Chunk by “requirement unit” rather than by paragraph, for example one policy clause, one API endpoint in a spec, or one item in a screen spec. Always attach metadata (system, module, docType, confidentiality, effectiveDate) so that every retrieval result can be traced and accounted for.

rag:
  embeddingModel: "text-embedding-3-large"
  chunking:
    strategy: "semantic"  # assumed to split on requirement-unit boundaries (clause, endpoint, screen item)
    maxTokens: 800
    overlapTokens: 120
  metadata:
    - system
    - module
    - docType
    - confidentiality  # public/internal/restricted
    - effectiveDate
  retrieval:
    topK: 8
    filters:
      confidentiality: ["internal","restricted"]
    reranker: "bge-reranker-v2-m3"

4.3 Don’t Allow the “Freedom Not to Reference”

Introduce “mandatory reference rules” during generation. For example: “Any business flow that handles personal data must cite the internal PII policy,” or “Approval workflows must attach the audit-log requirements template.” In other words, make RAG a guardrail. This reduces project-to-project quality variance. ⚙️
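
One way to express such rules is as data that a post-generation check evaluates against each artifact. In the sketch below, the artifact fields (kind, handlesPersonalData, citedDocIds) and the document IDs are hypothetical. The point is only that a missing citation becomes a machine-detectable failure rather than a reviewer's judgment call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReferenceRule:
    name: str
    applies: Callable[[dict], bool]  # predicate over the generated artifact
    required_doc_id: str             # hypothetical knowledge-base document ID


RULES = [
    ReferenceRule(
        "pii-policy",
        lambda a: a.get("handlesPersonalData", False),
        "POL-PII-001",
    ),
    ReferenceRule(
        "audit-log-template",
        lambda a: a.get("kind") == "approval-workflow",
        "TPL-AUDIT-LOG",
    ),
]


def missing_references(artifact: dict, rules: list[ReferenceRule] = RULES) -> list[str]:
    """Return the document IDs an artifact was required to cite but did not."""
    cited = set(artifact.get("citedDocIds", []))
    return [
        rule.required_doc_id
        for rule in rules
        if rule.applies(artifact) and rule.required_doc_id not in cited
    ]
```

A non-empty result can block the artifact from entering review, or trigger regeneration with the missing documents forced into the retrieval context.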

5. Technical Section ③: Static Analysis for Exceptions, Contradictions, and Gaps 📊

5.1 The Root Cause of Incidents: “Exceptions” and “Boundary Conditions”

A common pattern in large projects that spiral out of control is that teams quickly agree on the happy path while postponing exceptions. Requirements-definition AI can attack this automatically. Against the To-Be IR, detect via rules: (a) unreachable nodes, (b) missing terminal states, (c) overlapping/missing branch conditions, (d) unassigned roles, (e) audit=false despite being audit-relevant, etc.—and generate follow-up interview questions.

5.2 Rule Engine Example (Simplified)

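def detect_orphan_nodes(process):
    """Return IDs of activities that no edge references (isolated nodes)."""
    ids = {a["id"] for a in process["activities"]}
    connected = set()
    for e in process["edges"]:
        connected.add(e["from"])
        connected.add(e["to"])
    # sorted() keeps the report stable across runs (set order is arbitrary)
    return sorted(ids - connected)

def detect_missing_rbac(process):
    """Return IDs of task activities that have no RBAC roles assigned."""
    missing = []
    for a in process["activities"]:
        # "controls" is optional in the schema; an empty rbac list also counts as missing
        if a.get("type") == "task" and not (a.get("controls") or {}).get("rbac"):
            missing.append(a["id"])
    return missing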
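
Two more of the checks listed in 5.1, unreachable nodes and missing terminal states, can be sketched the same way, together with the step that turns findings into follow-up interview questions. The function names, the start_id parameter, and the question wording are assumptions for this example. Treating every non-event activity without an outgoing edge as a missing terminal state is a simplification.

```python
from collections import deque


def detect_unreachable(process: dict, start_id: str) -> list[str]:
    """Activities that cannot be reached from start_id by following edges."""
    adjacency: dict[str, list[str]] = {}
    for e in process["edges"]:
        adjacency.setdefault(e["from"], []).append(e["to"])

    # Breadth-first traversal from the start node.
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return [a["id"] for a in process["activities"] if a["id"] not in seen]


def detect_dead_ends(process: dict) -> list[str]:
    """Non-event activities with no outgoing edge (the flow stops without an end state)."""
    has_outgoing = {e["from"] for e in process["edges"]}
    return [
        a["id"]
        for a in process["activities"]
        if a["type"] != "event" and a["id"] not in has_outgoing
    ]


def follow_up_questions(process: dict, start_id: str) -> list[str]:
    """Turn rule findings into questions for the next interview."""
    questions = [
        f"How is '{i}' reached? No path leads to it from the start."
        for i in detect_unreachable(process, start_id)
    ]
    questions += [
        f"What happens after '{i}'? The flow has no next step or end state."
        for i in detect_dead_ends(process)
    ]
    return questions
```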

5.3 Benchmark Metrics (Quantifying Quality)

“Good requirements” are often discussed qualitatively, but metrics are essential for AI adoption. Examples: number of exception scenarios, number of unassigned responsibilities, audit-scope coverage, trend of review findings, and rework effort. Automatically aggregate these via IR analysis and operate them as quality gates. 📊
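
A few of these metrics can be computed directly from the IR. The sketch below is illustrative only: it treats the share of tasks with audit=true as a proxy for audit-scope coverage and the number of conditional edges as a proxy for exception scenarios. The metric names and gate thresholds are hypothetical.

```python
def quality_metrics(process: dict) -> dict:
    """Aggregate simple quality indicators from a ProcessIR document."""
    tasks = [a for a in process["activities"] if a.get("type") == "task"]
    unassigned = [a for a in tasks if not (a.get("controls") or {}).get("rbac")]
    audited = [a for a in tasks if (a.get("controls") or {}).get("audit") is True]
    conditional_edges = [e for e in process["edges"] if e.get("condition")]
    return {
        "taskCount": len(tasks),
        "unassignedTasks": len(unassigned),
        # Proxy for audit-scope coverage: share of tasks with audit=true.
        "auditCoverage": len(audited) / len(tasks) if tasks else 1.0,
        # Proxy for exception scenarios: edges that carry a branch condition.
        "branchConditions": len(conditional_edges),
    }


# Hypothetical thresholds; set them per organization from measured baselines.
GATE = {"maxUnassignedTasks": 0, "minAuditCoverage": 0.8}


def passes_gate(metrics: dict, gate: dict = GATE) -> bool:
    """Return True only if every gate condition holds."""
    return (
        metrics["unassignedTasks"] <= gate["maxUnassignedTasks"]
        and metrics["auditCoverage"] >= gate["minAuditCoverage"]
    )
```

Run in CI on every IR change, a gate like this turns “is this good enough to review?” into a pass/fail signal and produces the trend data the metrics above call for.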

6. Technical Section ④: Prototype Generation—Turn Consensus Building into “Executable Specs” 🔧

6.1 Let Users Touch It Before Spec Freeze

The key takeaway from the referenced article is “eliminate misalignment with a working prototype.” Technically, the ideal is to reproduce screen transitions, access control, input validation, and API mocks at minimal cost, and to deliver something business users can actually operate within 48 hours. The critical point is not treating the prototype as disposable. Keep UI definitions (JSON) and OpenAPI as deliverables and carry them forward into implementation.

6.2 Contract-First Example with OpenAPI 3.1

openapi: 3.1.0
info:
  title: Approval API
  version: 0.1.0
paths:
  /approvals:
    post:
      summary: Create approval request
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [requesterId, amount, reason]
              properties:
                requesterId: { type: string }
                amount: { type: number, minimum: 0 }
                reason: { type: string, maxLength: 2000 }
      responses:
        "201":
          description: Created
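
A mock handler for this contract could look like the sketch below. It hand-codes the constraints from the schema above (required fields, amount >= 0, reason up to 2000 characters) instead of using an OpenAPI library, so it stays dependency-free. The contract itself only declares 201. The 400 status and the VALIDATION_ERROR code are borrowed from the acceptance criteria in section 9.2, and the shape of the error body is an assumption.

```python
def validate_approval_request(body: dict) -> list[str]:
    """Check a request body against the constraints in the Approval API contract."""
    errors = []
    for field in ("requesterId", "amount", "reason"):
        if field not in body:
            errors.append(f"{field} is required")

    if "amount" in body:
        amount = body["amount"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            errors.append("amount must be a number")
        elif amount < 0:
            errors.append("amount must be >= 0")

    for field in ("requesterId", "reason"):
        if field in body and not isinstance(body[field], str):
            errors.append(f"{field} must be a string")

    reason = body.get("reason")
    if isinstance(reason, str) and len(reason) > 2000:
        errors.append("reason must be at most 2000 characters")
    return errors


def handle_create_approval(body: dict) -> tuple[int, dict]:
    """Mock POST /approvals: return (status code, response body)."""
    errors = validate_approval_request(body)
    if errors:
        return 400, {"code": "VALIDATION_ERROR", "details": errors}
    return 201, {"status": "created"}
```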

6.3 Boundaries to Prevent “Prototype Artifacts” from Reaching Production

Prototype generation is powerful, but it can also cause incidents. Mitigate by clearly separating the prototype environment from production: use synthetic data only, prohibit external transmission, and require audit logs. Also, automatically enumerate the “gaps” when moving from prototype to production implementation to visualize missing requirements. ⚙️

7. Technical Section ⑤: Security Boundaries—Eliminate the “Data Exfiltration” Risk in Requirements-Definition AI 🔒

7.1 Threat Model (Minimum Set)

Because requirements definition mixes business operations, customers, contracts, and personal data, it is one of the most dangerous areas to apply LLMs. Threats can be organized as: (1) external transmission of confidential information, (2) prompt injection, (3) RAG poisoning, (4) excessive privileges, and (5) lack of auditability. Unless these are addressed at design time, company-wide rollout will stall.

7.2 Technical Controls (Implementation Essentials)

Automatic masking of PII/contract information: Run DLP (regex + NER) at ingestion and hash/tokenize.
Prompt defense: Fix the system prompt and allowlist tool execution. Do not interpret user input as instructions.
RAG signing: Add signatures and approver metadata to knowledge documents; exclude unapproved documents from retrieval.
Audit logs: Track who submitted what, which documents were referenced, and what was output.
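
The masking control can be sketched with regular expressions plus keyed hashing. This shows only the regex half of “regex + NER,” and the two patterns (email addresses and hyphenated phone numbers) are illustrative rather than complete. The function names and the token format are assumptions. Using an HMAC instead of a plain hash means the same value always maps to the same token, but tokens cannot be recomputed without the key.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real DLP needs locale-specific rules plus NER.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b0\d{1,4}-\d{1,4}-\d{4}\b"),
}


def tokenize(value: str, kind: str, key: bytes) -> str:
    """Replace a sensitive value with a stable, keyed token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"<{kind}:{digest}>"


def redact(text: str, key: bytes) -> str:
    """Mask every match of the known patterns before the text reaches the LLM."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: tokenize(m.group(0), k, key), text)
    return text
```

Because tokens are stable, the LLM can still tell that two mentions refer to the same person or contract without ever seeing the underlying value.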

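# Example: reverse proxy in front of the prototype environment (conceptual)
# Note: nginx here only handles inbound traffic. Outbound traffic must be
# blocked separately via FW/egress controls, with an allowlist for exceptions.
server {
  listen 443 ssl;
  # ssl_certificate and ssl_certificate_key are required here (paths omitted)
  location / {
    proxy_pass http://prototype-app;
  }
}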

7.3 Log Design That Can Withstand Audit Requirements

At minimum, record: (a) input artifact hash, (b) prompt template version, (c) model/version, (d) retrieved doc IDs, and (e) output artifact hash. This enables post-hoc tracing of “why the requirements became what they are,” ensuring accountability for audits and litigation. 🔧
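
A record with those five fields can be built as sketched below. The field names, the actor and timestamp fields, and the JSON-lines serialization are assumptions for this example. Only the five items (a) to (e) come from the list above.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_audit_record(
    input_artifact: bytes,
    output_artifact: bytes,
    prompt_template_version: str,
    model: str,
    retrieved_doc_ids: list[str],
    actor: str,
) -> dict:
    """Assemble one audit record for a single generation run."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "inputHash": sha256_hex(input_artifact),            # (a)
        "promptTemplateVersion": prompt_template_version,   # (b)
        "model": model,                                     # (c)
        "retrievedDocIds": list(retrieved_doc_ids),         # (d), retrieval order kept
        "outputHash": sha256_hex(output_artifact),          # (e)
    }


def to_log_line(record: dict) -> str:
    """Serialize deterministically so log lines can be diffed and hashed."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)
```

Storing hashes rather than the artifacts themselves keeps confidential content out of the log while still allowing anyone with the artifact to verify that it matches the record.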

8. Technical Section ⑥: Scalability—Operational Design for Large and Parallel Projects 📊

8.1 The Bottleneck Isn’t the LLM—It’s Human Review

Once you introduce requirements-definition AI, generation speed increases, but review workload becomes the new constraint. The countermeasure is to break deliverables into “change diffs” and reduce the review unit size. Manage IR in Git so PRs can present To-Be diffs, requirement diffs, and API diffs. Also auto-tag review perspectives (audit, authorization, performance, operations) and route to specialists.
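
Auto-tagging can be approximated by mapping changed IR fields to review perspectives. The sketch below compares two IR versions activity by activity. The field-to-perspective mapping and the “operations” fallback are hypothetical and would need to reflect how each organization splits review responsibility.

```python
# Hypothetical mapping from changed IR fields to the specialist who should review.
PERSPECTIVE_BY_FIELD = {
    "controls.rbac": "authorization",
    "controls.audit": "audit",
    "sla": "performance",
}


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted paths, e.g. {"controls.rbac": [...]}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def review_tags(before: dict, after: dict) -> set[str]:
    """Return the review perspectives touched by the diff between two IR versions."""
    old = {a["id"]: flatten(a) for a in before["activities"]}
    new = {a["id"]: flatten(a) for a in after["activities"]}
    tags = set()
    for act_id in old.keys() | new.keys():
        a, b = old.get(act_id, {}), new.get(act_id, {})
        for field in a.keys() | b.keys():
            if a.get(field) != b.get(field):
                # Anything without a dedicated reviewer falls back to "operations".
                tags.add(PERSPECTIVE_BY_FIELD.get(field, "operations"))
    return tags
```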

8.2 Multi-Tenant Design for Parallel Projects

Separate the vector DB, logs, and artifact storage by project (tenant). At minimum, use namespace separation plus KMS key separation. Enforce tenant boundaries during RAG retrieval with mandatory filters, applied at the DB layer so that an application-side omission cannot bypass them. ⚙️
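
The idea of a filter that cannot be omitted can be illustrated with an in-memory stand-in for the index. The TenantScopedIndex class below is hypothetical: the tenant is bound when the handle is created, so no query path exists without it. In a real vector DB the equivalent would be namespaces or row-level security, not application code.

```python
from typing import Callable


class TenantScopedIndex:
    """Index handle bound to one tenant; callers cannot query across tenants."""

    def __init__(self, records: list[dict], tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._records = records
        self._tenant_id = tenant_id

    def search(self, predicate: Callable[[dict], bool]) -> list[dict]:
        # The tenant filter is applied here unconditionally, before the
        # caller-supplied predicate is evaluated.
        return [
            r for r in self._records
            if r["tenant"] == self._tenant_id and predicate(r)
        ]
```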

8.3 Performance Benchmark (Example)

The following is an example benchmark assuming input of “50,000 characters of meeting minutes + ~200 pages of existing design documents,” processing through As-Is → To-Be → requirements list output (environment: Kubernetes v1.29, vector DB uses HNSW, RAG topK=8). The numbers are design targets and should be tuned per organization based on measurement. 📊

Process	p50	p95	Main cause	Mitigation
Ingest + normalization	45s	120s	OCR/formatting	Diff-based ingest, parallel OCR
Embedding generation	90s	240s	Token volume	Requirement-unit chunking, deduplication
RAG retrieval + reranking	180ms	650ms	Vector search	Tune HNSW parameters, caching
As-Is IR generation	35s	80s	LLM inference	Staged generation, structure via function calling
To-Be IR generation	40s	95s	More branches	Module split, partial regeneration
Static analysis (rules)	1.2s	3.5s	Graph traversal	Incremental analysis
Requirements list/design doc output	25s	70s	Template expansion	Precompile templates

9. Technical Section ⑦: Compress “Specification Transfer Cost” with AI in Hybrid Offshore/In-House Delivery 🔧

9.1 The Real Cost of Offshore Is Communication

As referenced article 3 points out, offshore utilization is increasing due to talent shortages and rising costs. However, many failures come down to an “asymmetric understanding of specifications.” The value of requirements-definition AI here is not translation that bridges gaps in English proficiency or culture, but structured specifications that leave less room for ambiguity. If you provide IR, OpenAPI, screen transitions, and acceptance criteria (Given/When/Then) as a set, transfer costs drop.

9.2 Example: Auto-Generated Acceptance Criteria (Gherkin)

Feature: Create approval request
  Scenario: Amount must be non-negative
    Given I am an authenticated requester
    When I submit an approval with amount -1
    Then the API should respond with 400
    And the error code should be "VALIDATION_ERROR"
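
Scenarios like this one can be derived mechanically from the constraints in the request schema. The sketch below handles only two constraint kinds, required fields and numeric minimums, and reuses the step wording from the example above. The boundary_scenarios function and the “minimum minus one” choice of invalid value are assumptions for illustration.

```python
def boundary_scenarios(feature: str, schema: dict) -> str:
    """Generate Gherkin rejection scenarios from a JSON-Schema-style object schema."""
    lines = [f"Feature: {feature}"]

    def scenario(title: str, when: str) -> None:
        lines.extend([
            f"  Scenario: {title}",
            "    Given I am an authenticated requester",
            f"    When {when}",
            "    Then the API should respond with 400",
            '    And the error code should be "VALIDATION_ERROR"',
        ])

    # One scenario per required field: omit it and expect rejection.
    for name in schema.get("required", []):
        scenario(f"{name} is required", f"I submit an approval without {name}")

    # One scenario per numeric minimum: probe just below the boundary.
    for name, spec in schema.get("properties", {}).items():
        if "minimum" in spec:
            scenario(
                f"{name} must be at least {spec['minimum']}",
                f"I submit an approval with {name} {spec['minimum'] - 1}",
            )
    return "\n".join(lines)
```

Because the scenarios are generated from the same schema that drives the API mock, the contract and its acceptance criteria cannot drift apart silently.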

9.3 Carry Change Requests (CR) as “Diffs”

Rework in offshore delivery increases as change requests are thrown over the wall in natural language. If you operationalize CRs by attaching To-Be IR diffs (JSON Patch) and automatically computed impact scope (screen/API/DB/authorization/audit), estimation, implementation, and testing become more stable. ⚙️
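
A CR diff of this kind can be sketched as follows. To keep paths stable when array order changes, the example diffs an id-keyed view of the activities rather than the raw array, so the output is JSON Patch-style (add/remove/replace with JSON Pointer paths) but not directly applicable to the original document. The keyed view, the function names, and the segment-to-impact mapping are all assumptions for illustration.

```python
def keyed(process: dict) -> dict:
    """Re-key activities by id so diff paths do not depend on array order."""
    return {"activities": {a["id"]: a for a in process["activities"]}}


def escape(token: str) -> str:
    """Escape a key for use in a JSON Pointer (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def ir_diff(old, new, path: str = "") -> list[dict]:
    """Recursively diff two values; non-dict values (incl. lists) are replaced whole."""
    if isinstance(old, dict) and isinstance(new, dict):
        ops = []
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{escape(key)}"
            if key not in new:
                ops.append({"op": "remove", "path": child})
            elif key not in old:
                ops.append({"op": "add", "path": child, "value": new[key]})
            else:
                ops.extend(ir_diff(old[key], new[key], child))
        return ops
    return [] if old == new else [{"op": "replace", "path": path, "value": new}]


# Hypothetical mapping from IR path segments to impacted areas.
IMPACT_BY_SEGMENT = {"rbac": "authorization", "audit": "audit",
                     "inputs": "api", "outputs": "api"}


def impact_scope(ops: list[dict]) -> set[str]:
    """Derive the impacted areas from the paths touched by a diff."""
    return {
        IMPACT_BY_SEGMENT[segment]
        for op in ops
        for segment in op["path"].split("/")
        if segment in IMPACT_BY_SEGMENT
    }
```

Attaching both the operations and the derived scope to a change request gives the offshore team a precise list of what changed and which areas need re-estimation and retesting.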

10. Comparative Analysis Table

Options for introducing an AI requirements-definition agent broadly fall into three categories: (1) SaaS requirements-definition AI (integrated, e.g., an all-in-one product like Acsim), (2) general-purpose LLM + in-house orchestration (LangGraph, etc.), and (3) BPM/requirements management tool-centric (AI as an assistant). The best choice depends on use cases and governance requirements. 📊

Option	Strengths	Weaknesses/Risks	Best-fit organizations	Controls (audit/authorization)
SaaS Requirements-Definition AI (integrated)	End-to-end (As-Is/To-Be/prototype/design docs), fast time-to-start	Data exfiltration concerns, features can become a black box	SIs and enterprises that need to run large-scale DX on tight timelines	Depends on vendor capabilities (must verify log granularity)
General-purpose LLM + in-house build (RAG/IR/CI integration)	Optimized for internal standards/policies; boundary controls can be built to your requirements	High initial implementation cost; requires an operations function (evaluation/improvement)	Organizations with an in-house engineering culture that want to assetize upstream productivity long-term	Potentially strongest by design (can embed KMS/audit/signing)
BPM/requirements management tool-centric + AI assistance	Easy to fit into existing processes; audit/governance feels familiar	Weaker end-to-end generation; prototype linkage may be limited	Regulated industries with strict audit requirements and cautious change management	Leverages existing controls, but AI reference/output logging must be built separately

11. Best Practices and Anti-Patterns

✅ Best Practices

Manage IR (business flows/requirements/data) with Git-based diff control and align reviews to PR operations
Separate RAG knowledge into Policy/Pattern/Project and require approval and signing
Use exception/contradiction detection as a quality gate and auto-generate follow-up interview questions
Keep prototypes as non-disposable deliverables (OpenAPI, UI definitions, etc.)
Store “evidence referenced (doc IDs)” in audit logs to ensure explainability

❌ Anti-Patterns

Being satisfied with meeting-minutes summaries and not having structured IR
Mixing unapproved documents into RAG and allowing poisoning
Running prototypes on production data (no DLP/isolation)
Skipping reviews because “AI made it, so it must be correct” (collapses accountability boundaries)
Continuing to send natural-language change requests offshore without managing diffs

12. Implementation Roadmap and Checklist ⚙️

12.1 Weeks 0–4: PoC (Validate with One Requirements Process)

[ ] Build ingestion pipelines (minutes/policies/existing design docs)
[ ] Implement PII masking (DLP) and tenant isolation
[ ] Generate As-Is IR → create ground-truth data manually (for evaluation)
[ ] To-Be generation + static analysis (at least 10 rules)
[ ] Outputs (requirements list, OpenAPI skeleton, Gherkin skeleton)

12.2 Weeks 5–12: Pilot (Parallel Operation Across 2–3 Projects)

[ ] Integrate into Git/PR operations (IR diff reviews)
[ ] Implement audit logs (prompt/version/retrieval IDs)
[ ] Integrate with Jira/Azure DevOps (auto ticket creation)
[ ] Standardize offshore deliverables package (IR + OpenAPI + Gherkin)

12.3 Weeks 13–24: Enterprise Rollout (Governance and SLA)

[ ] Establish knowledge approval flows (Policy/Pattern/Project)
[ ] Turn quality gates (exception coverage/audit-scope coverage) into KPIs
[ ] Change management for models/prompts (versioning)
[ ] Formalize accountability boundaries for generated artifacts (AI/owner/approver)

13. Reference Resources and Next Steps

⚙️ BPMN 2.0 Specification (OMG)
🔧 OpenAPI 3.1 Specification
📊 Gherkin / Cucumber (standardizing acceptance criteria)
🔒 NIST AI RMF (AI risk management framework)

Next steps: Start with processes that have many exceptions and strong audit requirements—such as “approvals,” “requests,” and “master data management”—and run As-Is → To-Be → prototype → acceptance criteria in two weeks. Template the IR and quality metrics you gain there to create cross-project “repeatability.” That is the shortest path to elevating requirements-definition AI from a one-off tool to a true “development OS.” ⚙️