Case Study - TDC

No company context
AI tools write code without knowing your codebase, your specs, your stored procedures, your standards. Generic in, generic out.

Zero audit trail
Nobody knows what changed, by whom, against which spec, at what cost. When something breaks, the log is empty.

Knowledge stays siloed
The senior engineer's head is still the source of truth. The pipeline learns nothing. That person leaves, the knowledge walks out.

Pilot never becomes default
The shiny demo works. Then adoption stalls. The new way of working never replaces the old one.

Why AIDLC?

Faster delivery
Tens of dollars per feature, hours of elapsed time. Not weeks.

Built-in compliance
Every action audited. Every dollar attributed to an issue.

Institutional memory
The pipeline learns. The org's knowledge stops walking out the door.

Faster onboarding
New engineers ramp on a pipeline that already knows the codebase.

A pipeline that improves
Each cycle's retro feeds the next cycle's configuration.

Humans stay in control
Agents propose. Humans approve. Every merge is gated.


OpenTelemetry: Baked in, not bolted on

VM → Container replatforming

Move off bare VMs onto EKS with proper orchestration, autoscaling, and workload isolation.

Custom metrics

Business and technical KPIs exported via the OTel metrics SDK to CloudWatch and Amazon Managed Service for Prometheus.

Structured logging

JSON-structured logs with trace correlation IDs, shipped to CloudWatch Logs and queried with Logs Insights.
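The sketch below shows what "JSON-structured logs with trace correlation IDs" can look like using only the standard-library `logging` module. It rests on one assumption: the caller passes `trace_id` and `span_id` through `extra`. In a full OpenTelemetry setup these IDs would normally come from the active span context instead. The formatter is illustrative, not the shipped configuration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Assumption: correlation IDs arrive via `extra=`; missing IDs are
            # emitted as null so every line has the same shape for querying.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("worker")
    log.info("job started", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
                                   "span_id": "00f067aa0ba902b7"})
```

Every line carries the same keys. That lets a Logs Insights query filter on `trace_id` to join log lines to the matching trace.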

SLO alerting

Composite alarms and SLO burn-rate alerts wired from day one, not as an afterthought.
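Burn-rate alerting compares the observed error rate with the error budget the SLO allows. The sketch below shows only the arithmetic. The 99.9 percent target and the 14.4 threshold are assumptions taken from the commonly cited multi-window example for a 30-day SLO. In CloudWatch the two windows would typically be separate metric-math alarms combined into a composite alarm.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts observed over one alerting window."""
    total: int
    errors: int


def burn_rate(window: WindowCounts, slo_target: float) -> float:
    """Error-budget consumption speed; 1.0 exhausts the budget exactly at period end."""
    if window.total == 0:
        return 0.0
    error_rate = window.errors / window.total
    error_budget = 1.0 - slo_target
    return error_rate / error_budget


def should_page(long_w: WindowCounts, short_w: WindowCounts,
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only when BOTH windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows it is
    still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_w, slo_target) >= threshold
            and burn_rate(short_w, slo_target) >= threshold)
```

For example, a 2 percent error rate against a 99.9 percent target is a burn rate of 20. At that rate the monthly budget is gone in about a day and a half.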

What changes in your architecture

| Before | After |
|---|---|
| Monolithic .NET / Java app on VMs | Containerized microservices on EKS |
| Self-managed relational DB on EC2 | Amazon Aurora / RDS with redesigned schema |
| On-prem RabbitMQ / ActiveMQ | Amazon SQS / MSK / EventBridge |
| No distributed tracing or correlation | Full OTel traces, metrics, and logs |
| Single AWS account, manual IAM | AWS Control Tower multi-account with guardrails |
| Manual deployments, no CI/CD | GitOps pipelines with ArgoCD / CodePipeline |

AWS Landing Zone & multi-account governance

AWS Control Tower

Automated landing zone with account vending, OU hierarchy, and built-in guardrails.

Security baseline

AWS Security Hub, GuardDuty, Config rules, and CIS/NIST/PCI benchmarks enforced via SCPs.

Identity & access

IAM Identity Center (SSO), ABAC policies, least-privilege roles across all accounts.

Network architecture

Hub-spoke VPC with AWS Transit Gateway, PrivateLink, and Network Firewall for zero-trust connectivity.

FinOps & cost governance

Tagging strategy, AWS Cost Explorer, Budgets alerts, and Savings Plans from day one.

Compliance as code

AWS Config conformance packs, automated remediation, and audit-ready reporting.

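Measurable results our clients achieve

Automating Letter of Credit Compliance with Multi-Agent AI on AWS

Business Context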

A fintech is building AI-driven Letter of Credit (L/C) compliance tooling for trade finance banks and asked kloia to deliver the document-parser platform at the core of the offering.

The platform replaces a manual bank operations process. Documents arrive physically by courier (DHL or TNT) and are scanned. They are then examined manually against UCP 600 (ICC Uniform Customs and Practice for Documentary Credits, 2007 revision) and ISBP 821. Each packet passes a two-stage review (expert, then manager), and a single discrepancy in dates, amounts, party names, or document counts blocks payment. The cycle takes about five days end to end, and peak weeks bring ten times the normal volume.

kloia was asked to deliver a standalone AI service that ingests the packet, classifies and extracts the documents, verifies them against the L/C requirements, and emits a multilingual compliance report, deployed on AWS and operated by kloia.

Key contributing factors:

L/C presentations arrive as a single combined PDF mixing the SWIFT MT 700 message with six to ten supporting documents in arbitrary order, with boilerplate pages and faxed scans interleaved.
Verification rules span UCP 600 articles plus ISBP 821 banking practice, which makes a rules-only approach brittle.
Document formats vary widely: born-digital PDFs, scanned bills of lading, multi-column Word exports, faxes, and stamped originals versus carbon copies.
Common reserves are mechanical and suit extraction and comparison rather than judgment:
- date mismatches between SWIFT field 44C (latest date of shipment) and the bill of lading;
- party-name inconsistencies between the applicant in field 50 and the commercial invoice;
- missing or miscounted originals.
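Two of these mechanical reserves can be sketched as plain comparison code. The first is a bill of lading dated after the latest shipment date in field 44C. The second is an applicant name in field 50 that differs from the invoice party.

All function names are hypothetical. The name normalization is deliberately naive: it uses the first line only, ASCII tokens only, and a small assumed list of legal suffixes. Real matching rules would need review by the domain expert.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_swift_date(yymmdd: str) -> date:
    """Parse an MT 700 date field such as 44C (format YYMMDD)."""
    return datetime.strptime(yymmdd, "%y%m%d").date()


def shipment_date_reserve(field_44c: str, bl_shipment_date: date) -> Optional[str]:
    """Reserve if the bill of lading shows shipment after the latest date in 44C."""
    latest = parse_swift_date(field_44c)
    if bl_shipment_date > latest:
        return (f"Late shipment: B/L date {bl_shipment_date.isoformat()} is after "
                f"latest shipment date {latest.isoformat()} (44C)")
    return None


# Hypothetical suffix list; real name matching needs domain review.
_LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "co", "corp", "gmbh"}


def normalize_party_name(raw: str) -> str:
    """Naive normalization: first line only, lowercase, punctuation and legal suffixes dropped."""
    lines = raw.strip().splitlines()
    first_line = lines[0] if lines else ""
    tokens = re.findall(r"[a-z0-9]+", first_line.lower())
    return " ".join(t for t in tokens if t not in _LEGAL_SUFFIXES)


def party_name_reserve(field_50: str, invoice_party: str) -> Optional[str]:
    """Reserve if the applicant name (first line of field 50) differs from the invoice party."""
    if normalize_party_name(field_50) != normalize_party_name(invoice_party):
        return (f"Applicant name mismatch: field 50 {field_50.strip()!r} "
                f"vs invoice {invoice_party.strip()!r}")
    return None
```

Checks like these stay deterministic. Extraction and citation, by contrast, are left to the agents.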

Constraints: verify against UCP 600 and ISBP 821 (not an invented rule set), produce auditable reasoning for every reserve, support multilingual output for the operator audience, and host the platform on AWS under kloia's operation so the customer does not staff the AI infrastructure itself.

Definition of success (engagement targets, per the project SOW):

Compress the manual five-day examination cycle to under one day through automation.
Classification accuracy above 95 percent, data-extraction accuracy above 90 percent on key fields, and language-detection accuracy above 99 percent.
Sub-30-second processing per document, availability above 99.5 percent, capacity for 1,000 documents per day at 50-plus concurrency.
70 percent reduction in manual processing time and 90 percent of documents processed without human intervention.

These targets define the engagement; measured outcomes against them are tracked by the customer's operations team.

Engagement Scope

kloia delivered the platform in five phases (Discovery, Core Development, AI/ML Training, Integration and Testing, Deployment), with kloia provisioning the AWS environment and the customer providing domain experts, sample documents, and business-rule validation.
In scope:

A multi-agent AI pipeline covering page classification, document segmentation, field extraction across 26 document types, L/C field verification, and multilingual report generation.
A retrieval-augmented generation (RAG) layer pre-loaded with UCP 600 and ISBP 821, comprising 330 regulatory text chunks indexed into a Milvus vector store.
A FastAPI control plane with JWT-authenticated upload, job submission, and webhook subscription, and an async worker tier that runs the agent pipeline with up to five concurrent jobs per worker process, heartbeat-monitored for crash recovery.
A regression-test harness with golden snapshots of real L/C transactions for parser and extractor stability.
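The heartbeat-based crash recovery in the worker-tier item above can be pictured as a reaper. It requeues running jobs whose worker has stopped reporting in.

This is a sketch under assumptions:
- The `Job` fields, the 60-second timeout and the retry cap are illustrative.
- A production version would more likely be a single SQL `UPDATE` against the jobs table than a Python loop.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Job:
    job_id: str
    status: str                # hypothetical states: "queued", "running", "done", "failed"
    worker_id: Optional[str] = None
    last_heartbeat: Optional[datetime] = None
    attempts: int = 0


def heartbeat(job: Job, worker_id: str, now: datetime) -> None:
    """Called periodically by the worker while it processes the job."""
    job.worker_id = worker_id
    job.last_heartbeat = now


def reclaim_stale_jobs(jobs: List[Job], now: datetime,
                       timeout: timedelta = timedelta(seconds=60),
                       max_attempts: int = 3) -> List[str]:
    """Requeue (or fail, after max_attempts) running jobs whose heartbeat went stale.

    A stale heartbeat is treated as a crashed worker. Jobs that never sent a
    heartbeat are skipped here for simplicity.
    """
    reclaimed = []
    for job in jobs:
        if job.status != "running" or job.last_heartbeat is None:
            continue
        if now - job.last_heartbeat > timeout:
            job.attempts += 1
            job.status = "queued" if job.attempts < max_attempts else "failed"
            job.worker_id = None
            reclaimed.append(job.job_id)
    return reclaimed
```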

Roles and stakeholders:

| Role | Party | Responsibility |
|---|---|---|
| Solution Architect | kloia | Multi-agent pipeline design, RAG architecture, AWS topology |
| AI Engineering Lead | kloia | Agent prompts, tool-call schemas, judge fallback strategy |
| Backend Engineering Lead | kloia | FastAPI surface, async worker, webhook outbox, persistence |
| Trade Finance Domain Expert | Client | UCP 600 / ISBP 821 interpretation, reserve language review |
| Product and Operations Lead | Client | Acceptance testing, sample-document provision, bi-weekly steering |

Solution Architecture

The platform has three planes: an API ingestion plane that authenticates uploads and enqueues jobs, an async agent plane that does the AI work, and a webhook delivery plane that notifies the bank. The two compute tiers (an API container and an async worker container) run as container images on AWS; the Milvus vector store runs on Amazon Elastic Compute Cloud (Amazon EC2) instances.

The API authenticates the uploader, stores the object, and writes a job row. The worker runs a tiered text-extraction strategy (pypdf for born-digital, pdfplumber for multi-column layouts, Azure Document Intelligence for scans), then hands pages to the page classifier agent to detect document boundaries. Each segmented document is routed to a type-specific extractor that combines vision LLM inference with the OCR text. The judge agent walks the SWIFT MT 700 field list and, for each field, calls retrieval tools against the Milvus index and the extracted documents to produce a YES/NO decision with regulation and document evidence attached. The rapporteur agent assembles the multilingual report, and the webhook worker delivers it through an outbox table for at-least-once, replayable delivery.
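The tiered text-extraction strategy can be sketched as per-page escalation: try the cheap tiers first and fall through to OCR only when a page yields too little text. Here the pypdf, pdfplumber and Azure Document Intelligence calls are abstracted as hypothetical callables.

Two simplifications are assumptions of this sketch:
- The 200-character threshold and the non-whitespace count used as "density" are illustrative.
- Choosing pdfplumber for multi-column layouts is modelled simply as "the next tier".

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical tier interface: page index in, extracted text out.
Extractor = Callable[[int], str]


@dataclass
class PageText:
    page: int
    text: str
    tier: str  # name of the tier whose output was accepted


def char_density(text: str) -> int:
    """Non-whitespace character count: a crude signal for a usable text layer."""
    return sum(1 for ch in text if not ch.isspace())


def extract_pages(num_pages: int,
                  tiers: Sequence[Tuple[str, Extractor]],
                  min_chars: int = 200) -> List[PageText]:
    """Escalate page by page; the last tier (OCR) is accepted unconditionally.

    OCR cost therefore grows with the number of pages lacking a text layer,
    not with the size of the whole packet.
    """
    results = []
    for page in range(num_pages):
        for i, (name, extractor) in enumerate(tiers):
            text = extractor(page)
            is_last = i == len(tiers) - 1
            if is_last or char_density(text) >= min_chars:
                results.append(PageText(page, text, name))
                break
    return results
```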

Service Selection Rationale

| Service | Why chosen | Alternative considered |
|---|---|---|
| Amazon EC2 for Milvus | A small (330 chunks), read-hot index on the agent's critical path. Self-managed Milvus on dedicated EC2 gave predictable tail latency and full index control. | A managed vector service: rejected in favor of index control. |
| Container compute on AWS for API and worker | The API is stateless and scales horizontally; the worker holds models, OCR clients, and the RAG cache in process. Separate container images isolated the two scaling profiles. | AWS Lambda (originally proposed in the SOW): rejected once the multi-agent runtime profile (shared models, RAG cache, OCR client) made cold-start and per-invocation memory costs unworkable. |
| PostgreSQL on AWS for jobs, outbox, and metadata | Jobs run for seconds to minutes and need transactional creation with an outbox for at-least-once webhook delivery. PostgreSQL provides both in one moving piece. | Amazon DynamoDB (originally proposed in the SOW): rejected because the job-queue and outbox patterns rely on relational joins and transactional inserts that the agent pipeline depends on. |
| Provider-agnostic vLLM endpoint over HTTPS | One interface selects between models; the platform currently consumes Gemma 4 31B IT (vision) and GPT-OSS 20B (text). | Embedding the vision model in the worker: rejected on memory grounds. |
| Azure Document Intelligence as OCR tier 3 | Scanned bills of lading and faxed certificates yield no text from pypdf or pdfplumber. Routing only those pages to OCR kept cost proportional to the share of pages that actually need it. | OCR on every page: rejected on cost; a per-page character-density threshold decides which pages escalate to OCR. |
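The PostgreSQL choice above rests on the transactional outbox pattern. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. The table and column names and the `send` callback are hypothetical.

It shows two properties:
- The job update and the webhook row commit together.
- The webhook worker marks a row delivered only after a successful send. That gives at-least-once delivery, and a row can be replayed by resetting its flag.

```python
import json
import sqlite3
import uuid
from typing import Callable

SCHEMA = """
CREATE TABLE jobs   (id TEXT PRIMARY KEY, status TEXT NOT NULL, report TEXT);
CREATE TABLE outbox (id TEXT PRIMARY KEY, job_id TEXT NOT NULL,
                     payload TEXT NOT NULL, delivered INTEGER NOT NULL DEFAULT 0);
"""


def complete_job(conn: sqlite3.Connection, job_id: str, report: dict) -> None:
    """Mark the job done and enqueue its webhook in ONE transaction.

    Both rows change or neither does, so a crash cannot leave a finished job
    without a pending notification.
    """
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE jobs SET status = 'done', report = ? WHERE id = ?",
                     (json.dumps(report), job_id))
        conn.execute("INSERT INTO outbox (id, job_id, payload) VALUES (?, ?, ?)",
                     (str(uuid.uuid4()), job_id,
                      json.dumps({"job_id": job_id, "status": "done"})))


def deliver_pending(conn: sqlite3.Connection, send: Callable[[dict], bool]) -> int:
    """Send undelivered rows; mark each one only after send() reports success.

    A crash between send() and the UPDATE means the row is sent again later
    (at-least-once), so receivers should deduplicate on the outbox id.
    """
    sent = 0
    rows = conn.execute("SELECT id, payload FROM outbox WHERE delivered = 0").fetchall()
    for outbox_id, payload in rows:
        message = {"id": outbox_id, **json.loads(payload)}
        if send(message):
            with conn:
                conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (outbox_id,))
            sent += 1
    return sent
```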

Key Architectural Decisions

Multi-agent decomposition, not a monolithic prompt. A single "is this L/C compliant?" prompt is brittle and untraceable. The platform splits the work across specialized agents (page classifier, 26-extractor factory, judge, document validator, cross-check, rapporteur), each emitting structured output that feeds the next so the reasoning chain is preserved for audit.
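Here is one way the page classifier's structured output could feed segmentation. It is a sketch that assumes the classifier emits a document type and a boundary flag per page. The `PageLabel` shape, the type names and the rule of dropping boilerplate are assumptions, not the platform's schema. Each resulting `Segment` is what a type-specific extractor would receive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PageLabel:
    """Assumed classifier output: one record per page."""
    page: int
    doc_type: str          # e.g. "MT700", "BILL_OF_LADING", "COMMERCIAL_INVOICE", "BOILERPLATE"
    starts_document: bool  # True when this page opens a new document


@dataclass
class Segment:
    doc_type: str
    pages: List[int] = field(default_factory=list)


def segment(labels: List[PageLabel]) -> List[Segment]:
    """Group consecutive pages into documents, skipping boilerplate pages.

    A new segment starts on a flagged boundary or a change of document type,
    so two originals of the same type stay separate documents.
    """
    segments: List[Segment] = []
    for lbl in sorted(labels, key=lambda x: x.page):
        if lbl.doc_type == "BOILERPLATE":
            continue
        current = segments[-1] if segments else None
        if current is None or lbl.starts_document or current.doc_type != lbl.doc_type:
            segments.append(Segment(lbl.doc_type, [lbl.page]))
        else:
            current.pages.append(lbl.page)
    return segments
```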

Retrieval-grounded verification. UCP 600 and ISBP 821 are indexed into Milvus as 330 chunks using google/embeddinggemma-300m embeddings. When the judge justifies a field decision, it retrieves the relevant articles by similarity and cites them in the reserve text. This protects against model drift and lets domain experts update the corpus without retraining.
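The retrieve-then-cite step can be illustrated with an in-memory cosine-similarity ranking. In the platform, Milvus performs the search over embeddinggemma vectors. Here both the embedding and the index are replaced by plain lists.

The `Chunk` metadata fields (`source`, `reference`) are an assumption about how citations could be carried alongside the text.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source: str             # e.g. "UCP 600" or "ISBP 821"
    reference: str          # assumed metadata: article or paragraph label
    text: str
    vector: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Return the k chunks most similar to the query embedding."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)[:k]


def cite(hits: List[Chunk]) -> str:
    """Format retrieved chunks as the citation string attached to a reserve."""
    return "; ".join(f"{c.source} {c.reference}" for c in hits)
```

Because the citation is built from retrieved metadata rather than generated text, updating the corpus changes what can be cited without touching the models.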

Vision and text models split by task. google/gemma-4-31b-it handles page classification and field extraction from images; openai/gpt-oss-20b runs the judge and rapporteur, where the work is structured reasoning over tool calls. Splitting by task keeps each prompt short and lets the inference fleet route by capability.
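The task-based split can be expressed as a static routing table. The model identifiers are the ones named in this case study. The `Task` enum and `build_request` helper are hypothetical.

The request body follows the OpenAI-compatible chat-completions shape, which vLLM serves. It is shown as a plain dict so no client library is assumed.

```python
from enum import Enum
from typing import Dict


class Task(Enum):
    PAGE_CLASSIFICATION = "page_classification"
    FIELD_EXTRACTION = "field_extraction"
    JUDGE = "judge"
    RAPPORTEUR = "rapporteur"


VISION_MODEL = "google/gemma-4-31b-it"
TEXT_MODEL = "openai/gpt-oss-20b"

# Illustrative mapping: image work goes to the vision model, tool-call reasoning to the text model.
MODEL_FOR_TASK: Dict[Task, str] = {
    Task.PAGE_CLASSIFICATION: VISION_MODEL,
    Task.FIELD_EXTRACTION: VISION_MODEL,
    Task.JUDGE: TEXT_MODEL,
    Task.RAPPORTEUR: TEXT_MODEL,
}


def build_request(task: Task, messages: list, **params) -> dict:
    """Build an OpenAI-compatible chat-completions body for the model that owns this task."""
    return {"model": MODEL_FOR_TASK[task], "messages": messages, **params}
```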

Single-process worker with bounded concurrency. The worker runs five concurrent jobs in one async process so the RAG index, the embedding model, and the OCR client load once and are shared, aligned with the Performance Efficiency and Cost Optimization pillars of the AWS Well-Architected Framework.
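Bounded concurrency in a single async process maps naturally onto `asyncio.Semaphore`. This is a sketch, not the worker's actual code. `SharedResources` is a hypothetical stand-in for the RAG index, embedding model and OCR client. A real worker would pull jobs from the jobs table rather than take a fixed list.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

MAX_CONCURRENT_JOBS = 5  # the per-process limit described above


class SharedResources:
    """Stand-in for the RAG index, embedding model, and OCR client: built once per process."""


async def run_worker(job_ids: Iterable[str],
                     handle: Callable[[str, SharedResources], Awaitable[str]],
                     limit: int = MAX_CONCURRENT_JOBS) -> List[str]:
    resources = SharedResources()      # loaded once, shared by every job
    gate = asyncio.Semaphore(limit)

    async def guarded(job_id: str) -> str:
        async with gate:               # at most `limit` jobs run inside this block at once
            return await handle(job_id, resources)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(j) for j in job_ids))
```

Because every job awaits I/O-bound model and OCR calls, five jobs can overlap in one process. No job pays the memory cost of its own copy of the shared resources.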

Alternatives considered and rejected:

| Approach | Why rejected |
|---|---|
| A pure rules engine, no LLMs | UCP 600 plus ISBP 821 plus the variety of MT 700 free-text fields produces too many edge cases to maintain. The platform still has rule code (port lookups, date arithmetic, amount tolerance), but the document-language reasoning is owned by the agents. |
| One end-to-end multimodal prompt per L/C | Untraceable. The customer requires every reserve to be defensible by citation, and a single opaque prompt cannot produce that. |
| The SOW's original serverless design (AWS Lambda + Amazon DynamoDB + Amazon SQS / Amazon SNS) | Rejected during build. The multi-agent runtime needs the embedding model, the OCR client, and the RAG cache resident in memory; the queue and outbox need transactional inserts. A long-running container tier plus PostgreSQL replaced both. |
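Amount tolerance is the kind of rule code that stays deterministic even in an agent-first design. The sketch below assumes the MT 700 field 39A (percentage credit amount tolerance) arrives as a plus/minus pair such as "10/10". The function names are hypothetical, and only the upper bound is checked.

`Decimal` avoids binary floating-point rounding on money.

```python
from decimal import Decimal
from typing import Optional, Tuple


def parse_tolerance(field_39a: Optional[str]) -> Tuple[Decimal, Decimal]:
    """Parse a plus/minus percentage pair such as '10/10' (format assumed: n/n).

    An absent field means no tolerance in either direction.
    """
    if not field_39a:
        return Decimal(0), Decimal(0)
    plus, minus = field_39a.strip().split("/")
    return Decimal(plus), Decimal(minus)


def amount_reserve(credit_amount: Decimal, invoice_amount: Decimal,
                   field_39a: Optional[str] = None) -> Optional[str]:
    """Reserve if the invoice exceeds the credit amount plus its positive tolerance."""
    plus, _minus = parse_tolerance(field_39a)
    ceiling = credit_amount * (1 + plus / 100)
    if invoice_amount > ceiling:
        return f"Invoice amount {invoice_amount} exceeds permitted maximum {ceiling}"
    return None
```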

The Hardest Problem

Making the judge agent's tool-calling reliable enough to ship. The judge decides each SWIFT MT 700 field by walking a tool-call loop: search documents, read a candidate document, retrieve from the regulatory vector store, and finally call update_lc_field with the YES/NO verdict and reasoning. In early implementations the agent worked on clean fields but occasionally exited the loop without ever calling update_lc_field, leaving the verdict undefined. In a verification platform, an undefined verdict is worse than a wrong one: the operator has nothing to act on.

Raising max iterations and tightening the system prompt reduced the failure rate but did not eliminate it. The working approach was an explicit three-layer decision strategy in the judge code. The main loop runs the tool-call dialogue. If it exits without a decision, the agent re-prompts with tool_choice forcing a call to update_lc_field ("forced finalization"). If that also fails, a last-resort text parser pulls a YES/NO from free-text content ("text fallback"). Each path is tagged on the result so the audit log can distinguish a confident main-loop verdict from a fallback-derived one. The fallback paths are rare in steady state, but their presence is what makes the platform safe behind a payment release decision.
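The three-layer decision strategy can be sketched as an ordered chain of attempts, each tagged with the path that produced the verdict. The layers are injected as hypothetical callables:
- `main_loop` stands for the tool-call dialogue.
- `forced` stands for the re-prompt with tool_choice pinned to update_lc_field.
- `last_text` returns the model's final free text.

The `UNDECIDED` outcome for when even the text parser finds nothing is an assumption added here, for example to route the field to a human. The text parser is deliberately crude.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class DecisionPath(Enum):
    MAIN_LOOP = "main_loop"
    FORCED_FINALIZATION = "forced_finalization"
    TEXT_FALLBACK = "text_fallback"
    UNDECIDED = "undecided"  # assumption: not described in the case study


@dataclass
class FieldDecision:
    field: str
    verdict: Optional[str]  # "YES", "NO", or None if every layer failed
    reasoning: str
    path: DecisionPath      # recorded so the audit log can tell the paths apart


# Hypothetical layer interface: returns update_lc_field arguments, or None if no call was made.
ToolLayer = Callable[[str], Optional[dict]]

_VERDICT_RE = re.compile(r"\b(YES|NO)\b")


def parse_verdict_from_text(text: str) -> Optional[str]:
    """Last resort: take the last standalone YES/NO in the model's free text."""
    matches = _VERDICT_RE.findall(text.upper())
    return matches[-1] if matches else None


def decide_field(field: str, main_loop: ToolLayer, forced: ToolLayer,
                 last_text: Callable[[str], str]) -> FieldDecision:
    # Layers 1 and 2: accept only a well-formed update_lc_field call.
    for layer, path in ((main_loop, DecisionPath.MAIN_LOOP),
                        (forced, DecisionPath.FORCED_FINALIZATION)):
        args = layer(field)
        if args and args.get("verdict") in ("YES", "NO"):
            return FieldDecision(field, args["verdict"], args.get("reasoning", ""), path)
    # Layer 3: parse free text, and say so in the result.
    text = last_text(field)
    verdict = parse_verdict_from_text(text)
    if verdict:
        return FieldDecision(field, verdict, text, DecisionPath.TEXT_FALLBACK)
    return FieldDecision(field, None, text, DecisionPath.UNDECIDED)
```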

Results

| Capability | Before | After |
|---|---|---|
| Document types supported | None (manual examination only) | 26 trade and shipping document types, including bills of lading, commercial invoices, certificates of origin, insurance policies, and air waybills |
| Regulatory corpus indexed | None | 330 chunks of UCP 600 (48 articles) and ISBP 821, retrieved at decision time |
| Specialized AI agents | None | Seven agents for classification, segmentation, extraction, verification, document validation, cross-check, and multilingual reporting |
| Concurrency per worker | Sequential manual review | Five concurrent L/C jobs per worker process, heartbeat-monitored |
| Regression coverage | None | Golden-snapshot regression suite seeded with real customer SWIFT MT 700 fixtures, validating parser, extractor, and format-detection layers on every change |

The platform replaces manual document examination with a traceable, retrieval-grounded, multi-agent pipeline. Every decision is tied to a UCP 600 or ISBP 821 citation and to a specific document field.

The SOW targets include a cycle cut from five days to under one day, classification accuracy above 95 percent and throughput of 1,000 documents per day. The customer's operations team tracks measured outcomes against these targets during the pilot, and those outcomes are outside the scope of this case study.

Lessons Learned

Tool-call reliability is a platform feature, not a prompt-tuning afterthought. The three-layer judge fallback (main loop, forced finalization, text fallback) added more code than the loop itself, and it was worth every line. Any team putting open-ended LLM tool calling behind a regulated decision should plan forced-finalization and text-parse paths from the start.

Provider-agnostic LLM interfaces pay off the first time the inference plane changes. A single LLM_PROVIDER enum with vLLM, OpenAI, and Anthropic backends made it cheap to substitute models and providers as the cost and quality landscape shifted during the engagement.
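A provider switch of this kind might look like the sketch below. It is hypothetical throughout: the `LLMProvider` enum and `ProviderConfig` dataclass are illustrative, and `VLLM_BASE_URL` is an invented variable name. The only assumption taken from the text is that the provider is selected from an `LLM_PROVIDER` value.

The OpenAI and Anthropic base URLs are their public API hosts. vLLM serves an OpenAI-compatible API, while Anthropic's native Messages API has a different request shape. That difference is why the flag is carried alongside the URL.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class LLMProvider(str, Enum):
    VLLM = "vllm"
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


@dataclass(frozen=True)
class ProviderConfig:
    provider: LLMProvider
    base_url: str
    openai_compatible: bool  # False means a provider-specific request adapter is needed


def load_provider_config(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Resolve the backend from LLM_PROVIDER (defaults to vLLM); unknown values raise ValueError."""
    env = os.environ if env is None else env
    provider = LLMProvider(env.get("LLM_PROVIDER", "vllm").lower())
    if provider is LLMProvider.VLLM:
        # VLLM_BASE_URL is a hypothetical variable for the self-hosted endpoint.
        return ProviderConfig(provider, env["VLLM_BASE_URL"], openai_compatible=True)
    if provider is LLMProvider.OPENAI:
        return ProviderConfig(provider, "https://api.openai.com/v1", openai_compatible=True)
    return ProviderConfig(provider, "https://api.anthropic.com", openai_compatible=False)
```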

Real customer documents beat synthetic test data. The golden-snapshot regression harness around live L/C presentations caught format-detection regressions no unit test would surface. Fixtures live in a private bucket with hash pins committed to the repo, keeping the test corpus reproducible without leaking sensitive content.
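The hash-pin idea can be sketched in two steps. First, verify each downloaded fixture against a SHA-256 committed to the repo. Then compare extractor output with a golden JSON snapshot.

The pin format (a filename-to-hash mapping), the flat snapshot shape and the injected `extract` callable are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List


def sha256_file(path: Path) -> str:
    """Stream the file in 64 KiB blocks so large PDFs are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def verify_fixture(path: Path, pins: Dict[str, str]) -> None:
    """Fail fast if a fixture fetched from the private bucket differs from its pinned hash."""
    expected = pins[path.name]
    actual = sha256_file(path)
    if actual != expected:
        raise ValueError(f"{path.name}: hash {actual} does not match pinned {expected}")


def check_golden(fixture: Path, golden: Path, extract: Callable[[Path], dict]) -> List[str]:
    """Run the extractor and return the keys whose values drifted from the snapshot."""
    expected = json.loads(golden.read_text())
    actual = extract(fixture)
    return sorted(k for k in expected.keys() | actual.keys()
                  if expected.get(k) != actual.get(k))
```

Only hashes and snapshots live in the repo, so a reviewer can see exactly which fixture a failing test ran against without the document itself being committed.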

Conclusion

kloia delivered the multi-agent L/C compliance platform that the customer now embeds in its product offering to trade finance banks. The platform takes a combined SWIFT MT 700 presentation, segments and extracts the supporting documents, verifies each field against UCP 600 and ISBP 821, and returns a multilingual compliance report. The architecture is AI-first: agentic reasoning grounded in a retrieval layer, with rule code reserved for the parts that genuinely are rules. The AWS substrate (container compute for the API and worker, Amazon EC2 for Milvus) lets kloia operate the infrastructure while the customer focuses on banking workflow.

What this unlocks:

A second bank can be onboarded by deploying another worker fleet pointed at the same regulatory index: the platform is customer-agnostic by construction and ready for multi-tenant rollout in a later phase.
The same pattern (classifier, document extractors, retrieval-grounded judge, multilingual rapporteur) re-targets to adjacent trade finance instruments (standby L/Cs, documentary collections, guarantees) by adding extractors and regulatory chunks, not by rewriting the pipeline.
The per-decision audit trail (regulation citation plus document evidence) gives a defensible reserve workflow that a black-box classifier could not, keeping the offering compatible with future regulator scrutiny of AI-assisted decisioning.

A fintech building L/C compliance tooling for trade finance banks replaced manual document examination with a multi-agent AI platform operated by kloia on AWS. The platform covers 26 document types and grounds its decisions in 330 indexed regulatory chunks. It runs a seven-agent verification pipeline that aims to cut the examination cycle from five days to under one day.

kloia is an AWS Premier Partner specializing in cloud modernization, AI-driven platform engineering, and AWS Marketplace solutions. To explore our related services or discuss a similar engagement, visit [kloia.com](https://kloia.com) or our [AWS Marketplace listings](https://aws.amazon.com/marketplace/seller-profile?id=kloia).

