The Problem
No company context
AI tools write code without knowing your codebase, your specs, your stored procedures, your standards. Generic in, generic out.
Zero audit trail
Nobody knows what changed, by whom, against which spec, at what cost. When something breaks, the log is empty.
Knowledge stays siloed
The senior engineer's head is still the source of truth. The pipeline learns nothing. That person leaves, the knowledge walks out.
Pilot never becomes default
The shiny demo works. Then adoption stalls. The new way of working never replaces the old one.
Why AIDLC?
Faster delivery
Tens of dollars per feature, hours of elapsed time. Not weeks.
Built-in compliance
Every action audited. Every dollar attributed to an issue.
Institutional memory
The pipeline learns. The org's knowledge stops walking out the door.
Faster onboarding
New engineers ramp on a pipeline that already knows the codebase.
A pipeline that improves
Each cycle's retro feeds the next cycle's configuration.
Humans stay in control
Agents propose. Humans approve. Every merge is gated.
OpenTelemetry:
Baked in, not bolted on
Auto-instrumentation of .NET and Java services → ADOT collector sidecars on EKS → AWS X-Ray for distributed tracing → CloudWatch for metrics and logs → custom dashboards and SLO alerting. All traces, metrics, and logs flow through a single OTel pipeline, so you can swap backends without re-instrumenting.
VM → Container replatforming
Move off bare VMs onto EKS with proper orchestration, autoscaling, and workload isolation.
Custom metrics
Business and technical KPIs exported via OTel metrics SDK to CloudWatch and Managed Prometheus.
Structured logging
JSON-structured logs with trace correlation IDs, shipped to CloudWatch Logs Insights.
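As a rough illustration of a JSON log line carrying trace correlation IDs, here is a minimal sketch using only Python's standard library. The `JsonFormatter` class, the `build_logger` helper, and the `trace_id` / `span_id` field names are assumptions made for this example; in an instrumented service the IDs would normally come from the active OTel span context rather than being passed by hand.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (illustrative field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation IDs arrive through `extra=`; None when not supplied.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `log.info("order created", extra={"trace_id": "abc123", "span_id": "def456"})` then yields a line that can be filtered by `trace_id` in a log query tool.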
SLO alerting
Composite alarms and SLO burn-rate alerts wired from day one, not as an afterthought.
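For readers unfamiliar with burn-rate alerting, the arithmetic can be sketched as follows. Burn rate is the observed error ratio divided by the error budget (one minus the SLO target). The function names, the two-window "both must fire" rule, and the 14.4 threshold are assumptions borrowed from common SRE practice, not values taken from this offering.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(
    short_window_ratio: float,
    long_window_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed fast-burn threshold, tune per service
) -> bool:
    """Composite condition: alert only when both windows exceed the threshold."""
    return (
        burn_rate(short_window_ratio, slo_target) >= threshold
        and burn_rate(long_window_ratio, slo_target) >= threshold
    )
```

Requiring both a short and a long window to breach mirrors how a composite alarm combines two underlying alarms, which reduces pages for brief spikes.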
What changes in your architecture
AWS Landing Zone & multi-account governance
AWS Control Tower
Security baseline
Identity & access
Network architecture
FinOps & cost governance
Compliance as code
Measurable results our clients achieve
60-80%
infrastructure cost reduction vs on-prem VMs
10x
faster deployments via GitOps pipelines
99.9%+
availability through EKS self-healing & multi-AZ
<5 min
MTTR with full OTel trace-to-log correlation
Automating Letter of Credit Compliance with Multi-Agent AI on AWS
Business Context
A fintech is building AI-driven Letter of Credit (L/C) compliance tooling for trade finance banks, and asked kloia to deliver the document-parser platform at the core of the offering. In the bank operations process the platform replaces, documents arrive physically by DHL or TNT, get scanned, then are examined manually against UCP 600 (ICC Uniform Customs and Practice, 2007 revision) and ISBP 821. Each packet passes a two-stage review (expert, then manager), and a single discrepancy in dates, amounts, party names, or document counts blocks payment. The cycle runs about five days end-to-end, with peak weeks at ten times normal volume.
kloia was asked to deliver a standalone AI service that ingests the packet, classifies and extracts the documents, verifies them against the L/C requirements, and emits a multilingual compliance report, deployed on AWS and operated by kloia.
Key contributing factors:
- L/C presentations arrive as a single combined PDF mixing the SWIFT MT 700 message with six to ten supporting documents in arbitrary order, with boilerplate pages and faxed scans interleaved.
- Verification rules span UCP 600 articles plus ISBP 821 banking practice, which makes a rules-only approach brittle.
- Document formats vary widely: born-digital PDFs, scanned bills of lading, multi-column Word exports, faxes, and stamped originals versus carbon copies.
- Common reserves are mechanical (date mismatches between SWIFT field 44C and the bill of lading, party-name inconsistencies between applicant 50 and the commercial invoice, missing or miscounted originals) and well-suited to extraction-and-comparison rather than judgment.
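To make "extraction-and-comparison" concrete, the date check mentioned above can be sketched as a plain comparison once both values are extracted. This is an illustration only: it assumes field 44C carries a six-digit YYMMDD date and that the bill of lading date has already been extracted as a `date`; the function names and the result shape are hypothetical and not the platform's actual code.

```python
from datetime import date, datetime


def parse_44c(value: str) -> date:
    """Parse a YYMMDD string (assumed format of SWIFT MT 700 field 44C)."""
    return datetime.strptime(value, "%y%m%d").date()


def check_shipment_date(latest_shipment_44c: str, bl_shipped_on: date) -> dict:
    """Flag a reserve when the bill of lading date is after the latest shipment date."""
    latest = parse_44c(latest_shipment_44c)
    return {
        "field": "44C",
        "compliant": bl_shipped_on <= latest,
        "latest_shipment": latest.isoformat(),
        "bl_date": bl_shipped_on.isoformat(),
    }
```

For example, `check_shipment_date("250315", date(2025, 3, 20))` reports a non-compliant result because the shipment date falls after 15 March 2025.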
Constraints: verify against UCP 600 and ISBP 821 (not an invented rule set), produce auditable reasoning for every reserve, support multilingual output for the operator audience, and host the platform on AWS under kloia's operation so the customer does not staff the AI infrastructure itself.
Definition of success (engagement targets, per the project SOW):
- Compress the manual five-day examination cycle to under one day through automation.
- Classification accuracy above 95 percent, data extraction above 90 percent on key fields, language detection above 99 percent.
- Sub-30-second processing per document, availability above 99.5 percent, capacity for 1,000 documents per day at 50-plus concurrency.
- 70 percent reduction in manual processing time and 90 percent of documents processed without human intervention.
These targets define the engagement; measured outcomes against them are tracked by the customer's operations team.
Engagement Scope
kloia delivered the platform in five phases (Discovery, Core Development, AI/ML Training, Integration and Testing, Deployment), with kloia provisioning the AWS environment and the customer providing domain experts, sample documents, and business-rule validation.
In scope:
- A multi-agent AI pipeline covering page classification, document segmentation, field extraction across 26 document types, L/C field verification, and multilingual report generation.
- A retrieval-augmented generation (RAG) layer pre-loaded with UCP 600 and ISBP 821, comprising 330 regulatory text chunks indexed into a Milvus vector store.
- A FastAPI control plane with JWT-authenticated upload, job submission, and webhook subscription, and an async worker tier that runs the agent pipeline with up to five concurrent jobs per worker process, heartbeat-monitored for crash recovery.
- A regression-test harness with golden snapshots of real L/C transactions for parser and extractor stability.
Roles and stakeholders:
Solution Architecture
The platform has three planes: an API ingestion plane that authenticates uploads and enqueues jobs, an async agent plane that does the AI work, and a webhook delivery plane that notifies the bank. The two compute tiers (an API container and an async worker container) run as container images on AWS; the Milvus vector store runs on Amazon Elastic Compute Cloud (Amazon EC2) instances.
The API authenticates the uploader, stores the object, and writes a job row. The worker runs a tiered text-extraction strategy (pypdf for born-digital, pdfplumber for multi-column layouts, Azure Document Intelligence for scans), then hands pages to the page classifier agent to detect document boundaries. Each segmented document is routed to a type-specific extractor that combines vision LLM inference with the OCR text. The judge agent walks the SWIFT MT 700 field list and, for each field, calls retrieval tools against the Milvus index and the extracted documents to produce a YES/NO decision with regulation and document evidence attached. The rapporteur agent assembles the multilingual report, and the webhook worker delivers it through an outbox table for at-least-once, replayable delivery.
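The outbox-based delivery described above can be sketched with an in-process SQLite table. This is a simplified illustration of the general outbox pattern, not the platform's schema: the table layout, the function names, and the `send` callable are all assumptions. Rows stay pending until a send succeeds, which gives at-least-once semantics and allows replay after a crash.

```python
import sqlite3
from typing import Callable


def init_outbox(conn: sqlite3.Connection) -> None:
    """Create a hypothetical outbox table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS outbox (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               delivered INTEGER NOT NULL DEFAULT 0,
               attempts INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Record a webhook payload; in practice this shares a transaction with the job update."""
    cur = conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def deliver_pending(conn: sqlite3.Connection, send: Callable[[str], bool]) -> int:
    """Try every undelivered row once; failed rows remain pending for the next pass."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, payload in rows:
        try:
            ok = send(payload)
        except Exception:
            ok = False  # treat any transport error as a retryable failure
        conn.execute(
            "UPDATE outbox SET attempts = attempts + 1, delivered = ? WHERE id = ?",
            (1 if ok else 0, row_id),
        )
        conn.commit()
        delivered += 1 if ok else 0
    return delivered
```

Because a row is marked delivered only after `send` returns, a crash between the send and the update can produce a duplicate delivery, so receivers of at-least-once webhooks are usually expected to be idempotent.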
Service Selection Rationale
Key Architectural Decisions
Multi-agent decomposition, not a monolithic prompt. A single "is this L/C compliant?" prompt is brittle and untraceable. The platform splits the work across specialized agents (page classifier, 26-extractor factory, judge, document validator, cross-check, rapporteur), each emitting structured output that feeds the next so the reasoning chain is preserved for audit.
Retrieval-grounded verification. UCP 600 and ISBP 821 are indexed into Milvus as 330 chunks using google/embeddinggemma-300m embeddings. When the judge justifies a field decision, it retrieves the relevant articles by similarity and cites them in the reserve text. This protects against model drift and lets domain experts update the corpus without retraining.
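The retrieval step can be pictured as a nearest-neighbor lookup over chunk embeddings. The sketch below uses plain Python and cosine similarity over toy vectors purely to illustrate the idea; it does not use the Milvus client or the embedding model, and the chunk identifiers and function names are invented for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned identifiers are what a judge-style agent would cite alongside its decision, which is why updating the corpus changes the citations without any retraining.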
Vision and text models split by task. google/gemma-4-31b-it handles page classification and field extraction from images; openai/gpt-oss-20b runs the judge and rapporteur, where the work is structured reasoning over tool calls. Splitting by task keeps each prompt short and lets the inference fleet route by capability.
Single-process worker with bounded concurrency. The worker runs five concurrent jobs in one async process so the RAG index, the embedding model, and the OCR client load once and are shared, aligned with the Performance Efficiency and Cost Optimization pillars of the AWS Well-Architected Framework.
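Bounded concurrency inside one async process is commonly implemented with a semaphore. The following is a minimal sketch under that assumption; `run_jobs`, `process`, and the constant name are hypothetical, and the real worker's job loop and heartbeat handling are not shown.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

MAX_CONCURRENT_JOBS = 5  # matches the five-job limit described above


async def run_jobs(
    job_ids: Iterable[int],
    process: Callable[[int], Awaitable[int]],
    limit: int = MAX_CONCURRENT_JOBS,
) -> List[int]:
    """Run all jobs in one event loop, never more than `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job_id: int) -> int:
        # Jobs beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await process(job_id)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(guarded(j) for j in job_ids)))
```

Since every job runs in the same process, anything loaded at module level (an index handle, an embedding model, an OCR client) is naturally shared across the concurrent jobs.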
Alternatives considered and rejected:
The Hardest Problem
Making the judge agent's tool-calling reliable enough to ship. The judge decides each SWIFT MT 700 field by walking a tool-call loop: search documents, read a candidate document, retrieve from the regulatory vector store, and finally call update_lc_field with the YES/NO verdict and reasoning. In early implementations the agent worked on clean fields but occasionally exited the loop without ever calling update_lc_field, leaving the verdict undefined. In a verification platform, an undefined verdict is worse than a wrong one: the operator has nothing to act on.
Raising max iterations and tightening the system prompt reduced the failure rate but did not eliminate it. The working approach was an explicit three-layer decision strategy in the judge code. The main loop runs the tool-call dialogue. If it exits without a decision, the agent re-prompts with tool_choice forcing a call to update_lc_field ("forced finalization"). If that also fails, a last-resort text parser pulls a YES/NO from free-text content ("text fallback"). Each path is tagged on the result so the audit log can distinguish a confident main-loop verdict from a fallback-derived one. The fallback paths are rare in steady state, but their presence is what makes the platform safe behind a payment release decision.
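The control flow of the three layers can be sketched as below. The three callables stand in for the main tool-call loop, the forced `update_lc_field` call, and the model's free-text output; they, the `Verdict` type, and the path labels are hypothetical placeholders rather than the platform's code. Returning `None` when every layer fails is also an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

VALID = ("YES", "NO")


@dataclass
class Verdict:
    decision: str  # "YES" or "NO"
    path: str      # which layer produced it, kept for the audit log


def parse_yes_no(text: str) -> Optional[str]:
    """Last-resort parser: first standalone YES or NO in the text, if any."""
    match = re.search(r"\b(YES|NO)\b", text.upper())
    return match.group(1) if match else None


def decide_field(
    run_main_loop: Callable[[], Optional[str]],
    force_finalize: Callable[[], Optional[str]],
    free_text: Callable[[], str],
) -> Optional[Verdict]:
    """Try each layer in order and tag the result with the layer that answered."""
    decision = run_main_loop()
    if decision in VALID:
        return Verdict(decision, "main_loop")

    decision = force_finalize()
    if decision in VALID:
        return Verdict(decision, "forced_finalization")

    decision = parse_yes_no(free_text())
    if decision is not None:
        return Verdict(decision, "text_fallback")
    return None  # assumption: the caller escalates to a human reviewer
```

Tagging the path on every result is what lets a reviewer later filter for fallback-derived verdicts and treat them with extra scrutiny.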
Results
Capabilities below are verified against the codebase at the engagement-closing commit.
The platform replaces manual document examination with a traceable, retrieval-grounded, multi-agent pipeline whose every decision is tied to a UCP 600 or ISBP 821 citation and a specific document field. Measured outcomes against the SOW targets (five-day to under-one-day cycle, classification accuracy above 95 percent, throughput of 1,000 documents per day) are tracked by the customer's operations team during pilot and are outside this case study.
Lessons Learned
Tool-call reliability is a platform feature, not a prompt-tuning afterthought. The three-layer judge fallback (main loop, forced finalization, text fallback) added more code than the loop itself, and it was worth every line. Any team putting open-ended LLM tool calling behind a regulated decision should plan forced-finalization and text-parse paths from the start.
Provider-agnostic LLM interfaces pay off the first time the inference plane changes. A single LLM_PROVIDER enum with vLLM, OpenAI, and Anthropic backends made it cheap to substitute models and providers as the cost and quality landscape shifted during the engagement.
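One way such a provider switch might look is an enum plus a small backend registry. This sketch is an assumption about the shape of the idea, not the project's implementation: the `LLMProvider` class, the registry, and the stub backends (which only echo their input instead of calling any real SDK) are invented for illustration.

```python
from enum import Enum
from typing import Callable, Dict

Completion = Callable[[str], str]


class LLMProvider(str, Enum):
    VLLM = "vllm"
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


_BACKENDS: Dict[LLMProvider, Completion] = {}


def register(provider: LLMProvider) -> Callable[[Completion], Completion]:
    """Decorator that binds a completion function to a provider value."""
    def decorator(fn: Completion) -> Completion:
        _BACKENDS[provider] = fn
        return fn
    return decorator


# Stub backends: real ones would wrap the respective client libraries.
@register(LLMProvider.VLLM)
def _vllm(prompt: str) -> str:
    return f"vllm:{prompt}"


@register(LLMProvider.OPENAI)
def _openai(prompt: str) -> str:
    return f"openai:{prompt}"


@register(LLMProvider.ANTHROPIC)
def _anthropic(prompt: str) -> str:
    return f"anthropic:{prompt}"


def get_backend(name: str) -> Completion:
    """Resolve a configured provider name; unknown names raise ValueError."""
    return _BACKENDS[LLMProvider(name.lower())]
```

Callers depend only on `get_backend(...)`, so swapping providers becomes a configuration change rather than a code change in every agent.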
Real customer documents beat synthetic test data. The golden-snapshot regression harness around live L/C presentations caught format-detection regressions no unit test would surface. Fixtures live in a private bucket with hash pins committed to the repo, keeping the test corpus reproducible without leaking sensitive content.
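Hash pinning of this kind can be approximated with a checksum manifest checked before the tests run. The sketch below assumes SHA-256 and a simple name-to-hash mapping; the function names and manifest shape are hypothetical, and downloading from the private bucket is out of scope.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Stream the file in blocks so large PDFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_fixtures(root: Path, pins: Dict[str, str]) -> List[str]:
    """Return the fixture names that are missing or whose hash differs from the pin."""
    problems = []
    for name, expected in pins.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

Only the hashes live in the repository, so the manifest proves which exact files a test run used without exposing their contents.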
Conclusion
kloia delivered the multi-agent L/C compliance platform that the customer now embeds in its product offering to trade finance banks. The platform takes a combined SWIFT MT 700 presentation, segments and extracts the supporting documents, verifies each field against UCP 600 and ISBP 821, and returns a multilingual compliance report. The architecture is AI-first: agentic reasoning grounded in a retrieval layer, with rule code reserved for the parts that genuinely are rules. The AWS substrate (container compute for the API and worker, Amazon EC2 for Milvus) lets kloia operate the infrastructure while the customer focuses on banking workflow.
What this unlocks:
- A second bank can be onboarded by deploying another worker fleet pointed at the same regulatory index: the platform is customer-agnostic by construction and ready for multi-tenant rollout in a later phase.
- The same pattern (classifier, document extractors, retrieval-grounded judge, multilingual rapporteur) re-targets to adjacent trade finance instruments (standby L/Cs, documentary collections, guarantees) by adding extractors and regulatory chunks, not by rewriting the pipeline.
- The per-decision audit trail (regulation citation plus document evidence) gives a defensible reserve workflow that a black-box classifier could not, keeping the offering compatible with future regulator scrutiny of AI-assisted decisioning.
A fintech building L/C compliance tooling for trade finance banks replaced manual document examination with a multi-agent AI platform operated by kloia on AWS. The platform indexes 330 regulatory chunks and extracts fields across 26 document types behind a seven-agent verification pipeline, targeting a reduction of the examination cycle from five days to under one.
kloia is an AWS Premier Partner specializing in cloud modernization, AI-driven platform engineering, and AWS Marketplace solutions. To explore our related services or discuss a similar engagement, visit [kloia.com](https://kloia.com) or our [AWS Marketplace listings](https://aws.amazon.com/marketplace/seller-profile?id=kloia).
Case Studies
Open-Source Observability Transformation on AWS