The Problem

Why AIDLC?

Faster delivery
Tens of dollars per feature, hours of elapsed time. Not weeks.

Built-in compliance
Every action audited. Every dollar attributed to an issue.

Institutional memory
The pipeline learns. The org's knowledge stops walking out the door.

Faster onboarding
New engineers ramp on a pipeline that already knows the codebase. 

A pipeline that improves
Each cycle's retro feeds the next cycle's configuration.

Humans stay in control
Agents propose. Humans approve. Every merge is gated.

Observability Standard

OpenTelemetry: Baked in, not bolted on

We instrument every migrated workload with OpenTelemetry from day one, giving you vendor-neutral traces, metrics, and logs across your entire estate.

What we wire up on every engagement

Auto-instrumentation of .NET and Java services → ADOT collector sidecars on EKS → AWS X-Ray for distributed tracing → CloudWatch for metrics and logs → custom dashboards and SLO alerting. All traces, metrics, and logs flow through a single OTel pipeline, so you can swap backends without re-instrumenting.

VM → Container replatforming

Move off bare VMs onto EKS with proper orchestration, autoscaling, and workload isolation.

Custom metrics

Business and technical KPIs exported via OTel metrics SDK to CloudWatch and Managed Prometheus.

Structured logging

JSON-structured logs with trace correlation IDs, shipped to CloudWatch Logs Insights.
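
As a rough illustration of a JSON-structured log line carrying a trace correlation ID, the sketch below uses only the Python standard library. The `JsonFormatter` class, the `format_example` helper, and the field names are assumptions made for this example, not the schema used on engagements; Python is used purely for illustration, and in an OTel-instrumented service the trace ID would normally be read from the active span context rather than passed by hand.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical schema)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id is set on the record by the caller; None if uncorrelated.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def format_example(trace_id: str) -> str:
    """Build one record by hand and return its JSON rendering."""
    record = logging.LogRecord(
        "orders", logging.INFO, "example.py", 0,
        "order %s accepted", ("A-17",), None,
    )
    record.trace_id = trace_id  # correlation ID shared with the trace backend
    return JsonFormatter().format(record)
```

Because every line is a JSON object with a `trace_id` key, a log query can filter on the same ID that appears in the trace view.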

SLO alerting

Composite alarms and SLO burn-rate alerts wired from day one, not as an afterthought.
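
To make "burn rate" concrete: it is the observed error ratio divided by the error budget (one minus the SLO target). The sketch below pages only when both a long and a short window burn faster than a threshold. The function names and the default threshold of 14.4 (a commonly cited value for a one-hour window against a 30-day budget) are assumptions for illustration, not the alarm definitions used on engagements.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'sustainable' the error budget is burning."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_ratio: float, short_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows exceed the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, which avoids paging on stale incidents.
    """
    return (burn_rate(long_window_ratio, slo_target) >= threshold
            and burn_rate(short_window_ratio, slo_target) >= threshold)
```

With a 99.9 percent SLO, a 2 percent error ratio in both windows is a burn rate of roughly 20 and would page; if the short window has already recovered, it would not.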

Before & After

What changes in your architecture

| Before | After |
| --- | --- |
| Monolithic .NET / Java app on VMs | Containerized microservices on EKS |
| Self-managed relational DB on EC2 | Amazon Aurora / RDS with redesigned schema |
| On-prem RabbitMQ / ActiveMQ | Amazon SQS / MSK / EventBridge |
| No distributed tracing or correlation | Full OTel traces, metrics, and logs |
| Single AWS account, manual IAM | AWS Control Tower multi-account with guardrails |
| Manual deployments, no CI/CD | GitOps pipelines with ArgoCD / CodePipeline |

Enterprise Scale

AWS Landing Zone & multi-account governance

Security and compliance are not a phase; they are the foundation. We deploy enterprise-grade account structures that satisfy the most demanding regulatory requirements.

Outcomes

Measurable results our clients achieve

60-80%
infrastructure cost reduction vs on-prem VMs

10x
faster deployments via GitOps pipelines

99.9%+
availability through EKS self-healing & multi-AZ

<5 min
MTTR with full OTel trace-to-log correlation

Automating Letter of Credit Compliance with Multi-Agent AI on AWS

Business Context

A fintech is building AI-driven Letter of Credit (L/C) compliance tooling for trade finance banks, and asked kloia to deliver the document-parser platform at the core of the offering. In the bank operations process the platform replaces, documents arrive physically by DHL or TNT, get scanned, then are examined manually against UCP 600 (ICC Uniform Customs and Practice, 2007 revision) and ISBP 821. Each packet passes a two-stage review (expert, then manager), and a single discrepancy in dates, amounts, party names, or document counts blocks payment. The cycle runs about five days end-to-end, with peak weeks at ten times normal volume.

kloia was asked to deliver a standalone AI service that ingests the packet, classifies and extracts the documents, verifies them against the L/C requirements, and emits a multilingual compliance report, deployed on AWS and operated by kloia.

Key contributing factors:

  • L/C presentations arrive as a single combined PDF mixing the SWIFT MT 700 message with six to ten supporting documents in arbitrary order, with boilerplate pages and faxed scans interleaved.

  • Verification rules span UCP 600 articles plus ISBP 821 banking practice, which makes a rules-only approach brittle.

  • Document formats vary widely: born-digital PDFs, scanned bills of lading, multi-column Word exports, faxes, and stamped originals versus carbon copies.

  • Common reserves are mechanical (date mismatches between SWIFT field 44C and the bill of lading, party-name inconsistencies between the applicant in field 50 and the commercial invoice, missing or miscounted originals) and well suited to extraction-and-comparison rather than judgment.

Constraints: verify against UCP 600 and ISBP 821 (not an invented rule set), produce auditable reasoning for every reserve, support multilingual output for the operator audience, and host the platform on AWS under kloia's operation so the customer does not staff the AI infrastructure itself.

Definition of success (engagement targets, per the project SOW):

  • Compress the manual five-day examination cycle to under one day through automation.

  • Classification accuracy above 95 percent, data extraction above 90 percent on key fields, language detection above 99 percent.

  • Sub-30-second processing per document, availability above 99.5 percent, capacity for 1,000 documents per day at 50-plus concurrency.

  • 70 percent reduction in manual processing time and 90 percent of documents processed without human intervention.

These targets define the engagement; measured outcomes against them are tracked by the customer's operations team.
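
The latency and throughput targets above can be sanity-checked against each other with simple arithmetic. The sketch below assumes the worst case in which every document uses the full 30-second budget and all 50 concurrent slots stay busy; both are simplifying assumptions, and `minutes_to_clear` is a hypothetical helper, not part of the platform.

```python
def minutes_to_clear(docs_per_day: int, seconds_per_doc: float,
                     concurrency: int) -> float:
    """Wall-clock minutes to process a day's volume at a fixed concurrency."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_seconds = docs_per_day * seconds_per_doc
    return total_seconds / concurrency / 60


# Target figures from the list above: 1,000 documents, 30 s each, 50 in flight.
daily_minutes = minutes_to_clear(1000, 30.0, 50)
```

Under these assumptions the daily target volume fits in about ten minutes of fully parallel processing, which suggests the concurrency target is sized for bursts rather than for average load.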

Engagement Scope

kloia delivered the platform in five phases (Discovery, Core Development, AI/ML Training, Integration and Testing, Deployment), with kloia provisioning the AWS environment and the customer providing domain experts, sample documents, and business-rule validation.

In scope:

  • A multi-agent AI pipeline covering page classification, document segmentation, field extraction across 26 document types, L/C field verification, and multilingual report generation.

  • A retrieval-augmented generation (RAG) layer pre-loaded with UCP 600 and ISBP 821, comprising 330 regulatory text chunks indexed into a Milvus vector store.

  • A FastAPI control plane with JWT-authenticated upload, job submission, and webhook subscription, and an async worker tier that runs the agent pipeline with up to five concurrent jobs per worker process, heartbeat-monitored for crash recovery.

  • A regression-test harness with golden snapshots of real L/C transactions for parser and extractor stability.
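
The heartbeat-monitored crash recovery mentioned for the worker tier can be pictured as a periodic sweep that returns stalled jobs to the queue. This is a minimal sketch under stated assumptions: the `Job` shape, the status names, and the 60-second timeout are hypothetical and are not taken from the platform's code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    status: str            # hypothetical states: "queued", "running", "done"
    last_heartbeat: float  # seconds since epoch, refreshed by the worker


def requeue_stale(jobs: list[Job], now: float,
                  timeout_s: float = 60.0) -> list[str]:
    """Re-queue running jobs whose worker has stopped sending heartbeats."""
    requeued = []
    for job in jobs:
        if job.status == "running" and now - job.last_heartbeat > timeout_s:
            job.status = "queued"  # another worker can pick it up again
            requeued.append(job.job_id)
    return requeued
```

A sweep like this only makes sense if jobs are safe to run more than once, since a slow worker and its replacement may briefly overlap.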

Roles and stakeholders:

| Role | Party | Responsibility |
| --- | --- | --- |
| Solution Architect | kloia | Multi-agent pipeline design, RAG architecture, AWS topology |
| AI Engineering Lead | kloia | Agent prompts, tool-call schemas, judge fallback strategy |
| Backend Engineering Lead | kloia | FastAPI surface, async worker, webhook outbox, persistence |
| Trade Finance Domain Expert | Client | UCP 600 / ISBP 821 interpretation, reserve language review |
| Product and Operations Lead | Client | Acceptance testing, sample-document provision, bi-weekly steering |

Solution Architecture

The platform has three planes: an API ingestion plane that authenticates uploads and enqueues jobs, an async agent plane that does the AI work, and a webhook delivery plane that notifies the bank. The two compute tiers (an API container and an async worker container) run as container images on AWS; the Milvus vector store runs on Amazon Elastic Compute Cloud (Amazon EC2) instances.

[Diagram 01: solution architecture]

The API authenticates the uploader, stores the object, and writes a job row. The worker runs a tiered text-extraction strategy (pypdf for born-digital, pdfplumber for multi-column layouts, Azure Document Intelligence for scans), then hands pages to the page classifier agent to detect document boundaries. Each segmented document is routed to a type-specific extractor that combines vision LLM inference with the OCR text. The judge agent walks the SWIFT MT 700 field list and, for each field, calls retrieval tools against the Milvus index and the extracted documents to produce a YES/NO decision with regulation and document evidence attached. The rapporteur agent assembles the multilingual report, and the webhook worker delivers it through an outbox table for at-least-once, replayable delivery.
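
The tiered text-extraction step described above can be sketched as a loop that tries cheaper extractors first and escalates only when a page yields too little text. The extractors are passed in as plain callables so the example stays self-contained; in the real pipeline they would wrap pypdf, pdfplumber, and the OCR service. The function name and the 50-character threshold are assumptions for illustration.

```python
from typing import Callable, Sequence

Extractor = Callable[[bytes], str]


def extract_page_text(page: bytes,
                      tiers: Sequence[tuple[str, Extractor]],
                      min_chars: int = 50) -> tuple[str, str]:
    """Return (tier_name, text) from the first tier that yields enough text.

    If no tier reaches the threshold, the output of the last tier tried
    is returned so the caller still gets the best available text.
    """
    name, text = "none", ""
    for name, extractor in tiers:
        text = extractor(page)
        if len(text.strip()) >= min_chars:
            return name, text
    return name, text
```

Ordering the tiers from cheapest to most expensive means the paid OCR call is only made for pages where the local parsers come back nearly empty.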

Service Selection Rationale

| Service | Why chosen | Alternative considered |
| --- | --- | --- |
| Amazon EC2 for Milvus | A small (330 chunks) read-hot index on the agent's critical path. Self-managed Milvus on dedicated EC2 gave predictable tail latency and full index control. | A managed vector service: rejected in favor of index control. |
| Container compute on AWS for API and worker | The API is stateless and horizontal; the worker holds models, OCR clients, and the RAG cache in process. Container images isolated the scaling profiles. | AWS Lambda (originally proposed in the SOW): rejected once the multi-agent runtime profile (shared models, RAG cache, OCR client) made cold-start and per-invocation memory costs unworkable. |
| PostgreSQL on AWS for jobs, outbox, and metadata | Jobs run seconds to minutes and need transactional creation with an outbox for at-least-once webhook delivery. PostgreSQL gives that in one moving piece. | Amazon DynamoDB (originally proposed in the SOW): rejected because the job queue and outbox patterns rely on relational joins and transactional inserts the agent pipeline depends on. |
| Provider-agnostic vLLM endpoint over HTTPS | One interface selects between models. Currently consumes Gemma 4 31B IT (vision) and GPT-OSS 20B (text). | Embedding the vision model in the worker: rejected on memory grounds. |
| Azure Document Intelligence as OCR tier 3 | Scanned bills of lading and faxed certificates yield no text from pypdf or pdfplumber. Routing only those pages to OCR kept cost proportional to the share of pages that actually need it. | OCR on every page: rejected on cost; a character-density threshold escalates per page. |
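
The transactional outbox behind the at-least-once webhook delivery can be sketched as follows. SQLite from the Python standard library stands in for PostgreSQL so the example runs anywhere; the table layout, column names, and function names are assumptions, not the platform's schema.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical jobs and outbox tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY,
            job_id INTEGER NOT NULL REFERENCES jobs(id),
            payload TEXT NOT NULL,
            delivered INTEGER NOT NULL DEFAULT 0
        );
    """)
    return conn


def complete_job(conn: sqlite3.Connection, job_id: int, report: dict) -> None:
    """Mark the job done and enqueue its webhook in one transaction."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        conn.execute("INSERT INTO outbox (job_id, payload) VALUES (?, ?)",
                     (job_id, json.dumps(report)))


def deliver_pending(conn: sqlite3.Connection, send) -> int:
    """Try to send every undelivered row; failed rows stay for the next pass."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, payload in rows:
        try:
            send(json.loads(payload))
        except Exception:
            continue  # leave the row undelivered so it is retried later
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

Because a row is marked delivered only after `send` returns, a crash between the two steps causes a repeat delivery rather than a lost one, which is the at-least-once guarantee; receivers therefore need to tolerate duplicates.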

Key Architectural Decisions

Multi-agent decomposition, not a monolithic prompt. A single "is this L/C compliant?" prompt is brittle and untraceable. The platform splits the work across specialized agents (page classifier, 26-extractor factory, judge, document validator, cross-check, rapporteur), each emitting structured output that feeds the next so the reasoning chain is preserved for audit.

Retrieval-grounded verification. UCP 600 and ISBP 821 are indexed into Milvus as 330 chunks using google/embeddinggemma-300m embeddings. When the judge justifies a field decision, it retrieves the relevant articles by similarity and cites them in the reserve text. This protects against model drift and lets domain experts update the corpus without retraining.
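
The similarity retrieval described above reduces to ranking stored chunk vectors against a query vector. The sketch below does this in plain Python with cosine similarity; it is not the Milvus API, and the chunk IDs and three-dimensional vectors used with it are toy values chosen for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

Returning the chunk IDs alongside the scores is what allows a reserve to cite the specific regulatory passage it relied on.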

Vision and text models split by task. google/gemma-4-31b-it handles page classification and field extraction from images; openai/gpt-oss-20b runs the judge and rapporteur, where the work is structured reasoning over tool calls. Splitting by task keeps each prompt short and lets the inference fleet route by capability.

Single-process worker with bounded concurrency. The worker runs five concurrent jobs in one async process so the RAG index, the embedding model, and the OCR client load once and are shared, aligned with the Performance Efficiency and Cost Optimization pillars of the AWS Well-Architected Framework.
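
Bounded concurrency inside a single async process is commonly implemented with a semaphore. The sketch below is one such arrangement, assuming a hypothetical `handler` coroutine that processes one job; the limit of five matches the figure above, but the function name and structure are illustrative rather than the worker's actual code.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_jobs(job_ids: Iterable[int],
                   handler: Callable[[int], Awaitable[int]],
                   max_concurrent: int = 5) -> list[int]:
    """Run all jobs in one event loop, never more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job_id: int) -> int:
        async with semaphore:  # waits here while all slots are in use
            return await handler(job_id)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(job_id) for job_id in job_ids))
```

Since every job runs in the same process, anything loaded at startup (index, embedding model, OCR client) is shared by all in-flight jobs instead of being loaded once per job.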

Alternatives considered and rejected:

| Approach | Why rejected |
| --- | --- |
| A pure rules engine, no LLMs | UCP 600 plus ISBP 821 plus the variety of MT 700 free-text fields produces too many edge cases to maintain. The platform still has rule code (port lookups, date arithmetic, amount tolerance), but the document-language reasoning is owned by the agents. |
| One end-to-end multimodal prompt per L/C | Untraceable. The customer requires every reserve to be defensible by citation, and a single opaque prompt cannot produce that. |
| The SOW's original serverless design (AWS Lambda + Amazon DynamoDB + Amazon SQS / Amazon SNS) | Rejected during build. The multi-agent runtime needs the embedding model, the OCR client, and the RAG cache resident in memory; the queue and outbox need transactional inserts. A long-running container tier plus PostgreSQL replaced both. |

The Hardest Problem

Making the judge agent's tool-calling reliable enough to ship. The judge decides each SWIFT MT 700 field by walking a tool-call loop: search documents, read a candidate document, retrieve from the regulatory vector store, and finally call update_lc_field with the YES/NO verdict and reasoning. In early implementations the agent worked on clean fields but occasionally exited the loop without ever calling update_lc_field, leaving the verdict undefined. In a verification platform, an undefined verdict is worse than a wrong one: the operator has nothing to act on.

Raising max iterations and tightening the system prompt reduced the failure rate but did not eliminate it. The working approach was an explicit three-layer decision strategy in the judge code. The main loop runs the tool-call dialogue. If it exits without a decision, the agent re-prompts with tool_choice forcing a call to update_lc_field ("forced finalization"). If that also fails, a last-resort text parser pulls a YES/NO from free-text content ("text fallback"). Each path is tagged on the result so the audit log can distinguish a confident main-loop verdict from a fallback-derived one. The fallback paths are rare in steady state, but their presence is what makes the platform safe behind a payment release decision.
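
The three-layer strategy described above can be sketched as a simple cascade. The three callables stand in for the real LLM interactions and are hypothetical, as are the `Verdict` shape and the choice to raise when even the text fallback finds nothing; the actual judge code may handle that last case differently.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    decision: str  # "YES" or "NO"
    path: str      # "main_loop", "forced_finalization", or "text_fallback"


def parse_yes_no(text: str) -> Optional[str]:
    """Last resort: take the first standalone YES or NO found in free text."""
    match = re.search(r"\b(YES|NO)\b", text.upper())
    return match.group(1) if match else None


def decide_field(run_loop: Callable[[], Optional[str]],
                 force_call: Callable[[], Optional[str]],
                 free_text: Callable[[], str]) -> Verdict:
    """Try each layer in order and tag the result with the path that produced it."""
    decision = run_loop()  # normal tool-call dialogue
    if decision in ("YES", "NO"):
        return Verdict(decision, "main_loop")
    decision = force_call()  # re-prompt with the tool call forced
    if decision in ("YES", "NO"):
        return Verdict(decision, "forced_finalization")
    decision = parse_yes_no(free_text())
    if decision is None:
        raise ValueError("no verdict could be derived")  # assumption: fail loudly
    return Verdict(decision, "text_fallback")
```

Carrying the `path` tag on every result is what lets an audit log separate confident main-loop verdicts from fallback-derived ones.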

Results

Capabilities below are verified against the codebase at the engagement-closing commit.

| Capability | Before | After |
| --- | --- | --- |
| Document types supported | None (manual examination only) | 26 trade and shipping document types including bills of lading, commercial invoices, certificates of origin, insurance policies, and air waybills |
| Regulatory corpus indexed | None | 330 chunks of UCP 600 (48 articles) and ISBP 821, retrieved at decision time |
| Specialized AI agents | None | Seven agents for classification, segmentation, extraction, verification, document validation, cross-check, and multilingual reporting |
| Concurrency per worker | Sequential manual review | Five concurrent L/C jobs per worker process, heartbeat-monitored |
| Regression coverage | None | Golden-snapshot regression suite seeded with real customer SWIFT MT 700 fixtures, validating parser, extractor, and format-detection layers on every change |

The platform replaces manual document examination with a traceable, retrieval-grounded, multi-agent pipeline whose every decision is tied to a UCP 600 or ISBP 821 citation and a specific document field. Measured outcomes against the SOW targets (five-day to under-one-day cycle, classification accuracy above 95 percent, throughput of 1,000 documents per day) are tracked by the customer's operations team during pilot and are outside this case study.

Lessons Learned

Tool-call reliability is a platform feature, not a prompt-tuning afterthought. The three-layer judge fallback (main loop, forced finalization, text fallback) added more code than the loop itself, and it was worth every line. Any team putting open-ended LLM tool calling behind a regulated decision should plan forced-finalization and text-parse paths from the start.

Provider-agnostic LLM interfaces pay off the first time the inference plane changes. A single LLM_PROVIDER enum with vLLM, OpenAI, and Anthropic backends made it cheap to substitute models and providers as the cost and quality landscape shifted during the engagement.
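
A provider switch of this kind can be as small as an enum plus a lookup table. In the sketch below the three backends are stub functions that only echo their input, standing in for real HTTP clients; the class name, the `get_completer` helper, and the string values are assumptions for illustration.

```python
from enum import Enum
from typing import Callable

Completer = Callable[[str], str]


class LLMProvider(Enum):
    VLLM = "vllm"
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


def _stub_backend(name: str) -> Completer:
    """Placeholder for a real client; echoes the prompt with a provider tag."""
    return lambda prompt: f"[{name}] {prompt}"


BACKENDS: dict[LLMProvider, Completer] = {
    provider: _stub_backend(provider.value) for provider in LLMProvider
}


def get_completer(provider_name: str) -> Completer:
    """Resolve a configured provider name to its backend; unknown names raise."""
    return BACKENDS[LLMProvider(provider_name.lower())]
```

Agents that depend only on the `Completer` signature do not change when the configured provider does, which is what keeps a provider swap cheap.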

Real customer documents beat synthetic test data. The golden-snapshot regression harness around live L/C presentations caught format-detection regressions no unit test would surface. Fixtures live in a private bucket with hash pins committed to the repo, keeping the test corpus reproducible without leaking sensitive content.
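
Hash pinning can be sketched as comparing each downloaded fixture against a SHA-256 digest committed to the repository. The function names and the in-memory dictionaries below are assumptions for illustration; a real harness would read the fixtures from the private bucket and the pins from a checked-in file.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_fixtures(fixtures: dict[str, bytes],
                    pins: dict[str, str]) -> list[str]:
    """Return the names of pinned fixtures that are missing or altered."""
    mismatched = []
    for name, expected_digest in pins.items():
        data = fixtures.get(name)
        if data is None or sha256_hex(data) != expected_digest:
            mismatched.append(name)
    return mismatched
```

Only the digests live in the repository, so the test corpus stays reproducible while the documents themselves never leave the private store.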

Conclusion

kloia delivered the multi-agent L/C compliance platform that the customer now embeds in its product offering to trade finance banks. The platform takes a combined SWIFT MT 700 presentation, segments and extracts the supporting documents, verifies each field against UCP 600 and ISBP 821, and returns a multilingual compliance report. The architecture is AI-first: agentic reasoning grounded in a retrieval layer, with rule code reserved for the parts that genuinely are rules. The AWS substrate (container compute for the API and worker, Amazon EC2 for Milvus) lets kloia operate the infrastructure while the customer focuses on banking workflow.

What this unlocks:

  • A second bank can be onboarded by deploying another worker fleet pointed at the same regulatory index: the platform is customer-agnostic by construction and ready for multi-tenant rollout in a later phase.

  • The same pattern (classifier, document extractors, retrieval-grounded judge, multilingual rapporteur) re-targets to adjacent trade finance instruments (standby L/Cs, documentary collections, guarantees) by adding extractors and regulatory chunks, not by rewriting the pipeline.

  • The per-decision audit trail (regulation citation plus document evidence) gives a defensible reserve workflow that a black-box classifier could not, keeping the offering compatible with future regulator scrutiny of AI-assisted decisioning.

A fintech building L/C compliance tooling for trade finance banks replaced manual document examination with a multi-agent AI platform operated by kloia on AWS. The platform indexes 330 regulatory chunks, extracts fields across 26 document types, and runs a seven-agent verification pipeline, targeting a reduction of the examination cycle from five days to under one day.

kloia is an AWS Premier Partner specializing in cloud modernization, AI-driven platform engineering, and AWS Marketplace solutions. To explore our related services or discuss a similar engagement, visit [kloia.com](https://kloia.com) or our [AWS Marketplace listings](https://aws.amazon.com/marketplace/seller-profile?id=kloia).

Let's Work Together

We are happy to help you modernize your platform foundation and accelerate your product delivery.