Invoice parsing and telecom billing automation for multi-format vendor invoices and CDR reconciliation.
Context
Telecom billing workflows requiring itemized verification before downstream processing.
Timeline: Feb 2024 to Oct 2025.
Problem
Invoice layouts varied by vendor and document type, so manual validation was expensive and deterministic extraction alone could not cover every layout.
What we delivered
- Initial deterministic text-based PDF parsing for known invoice formats.
- OCR and LLM-assisted extraction path for new or previously unseen layouts.
- Normalization and validation of extracted invoice data against CDR datasets.
- Billing checks covering itemized charges, usage validation, and discrepancy detection.
- Controlled AI adoption with monitoring and acceptable-accuracy thresholds rather than full unsupervised automation.
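The routing between the deterministic path and the OCR/LLM-assisted path can be sketched roughly as follows. This is a minimal illustration, not the project's code: the names (`extractInvoice`, `knownLayouts`, `parseVendorA`), the line-item regex, and the `0.9` acceptance threshold are all assumptions made for the example, and the assisted extractor is passed in as a plain function so that no real OCR or LLM API is implied.

```typescript
// Illustrative sketch only: every name, the regex, and the threshold are assumptions.
interface LineItem {
  description: string;
  quantity: number;
  amount: number;
}

interface Extraction {
  items: LineItem[];
  confidence: number; // 0..1
  source: "deterministic" | "assisted";
}

interface Decision {
  status: "accepted" | "needs-review";
  extraction: Extraction;
}

type Parser = (text: string) => LineItem[];
type AssistedExtractor = (text: string) => { items: LineItem[]; confidence: number };

// Hypothetical acceptance threshold for assisted extractions.
const ACCEPT_THRESHOLD = 0.9;

// Hypothetical deterministic parser for one known layout:
// each line looks like "<description> <quantity> <amount with 2 decimals>".
function parseVendorA(text: string): LineItem[] {
  const items: LineItem[] = [];
  for (const line of text.split("\n")) {
    const m = line.trim().match(/^(.+?)\s+(\d+)\s+(\d+\.\d{2})$/);
    if (m) {
      items.push({ description: m[1], quantity: Number(m[2]), amount: Number(m[3]) });
    }
  }
  return items;
}

const knownLayouts: Record<string, Parser | undefined> = {
  "vendor-a": parseVendorA,
};

function extractInvoice(layoutId: string, text: string, assisted: AssistedExtractor): Decision {
  // Known layout: prefer the deterministic parser when it yields line items.
  const parser = knownLayouts[layoutId];
  if (parser) {
    const items = parser(text);
    if (items.length > 0) {
      return { status: "accepted", extraction: { items, confidence: 1, source: "deterministic" } };
    }
  }
  // Unknown layout or empty parse: fall back to the assisted path and
  // send anything below the threshold to manual review.
  const guess = assisted(text);
  const extraction: Extraction = { items: guess.items, confidence: guess.confidence, source: "assisted" };
  return { status: guess.confidence >= ACCEPT_THRESHOLD ? "accepted" : "needs-review", extraction };
}
```

Keeping the threshold check outside the extractor means the acceptance rule can be tuned and monitored without touching either extraction path.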
Outcomes
- Reduced manual effort for invoice validation and onboarding of new document layouts.
- Higher confidence in discrepancy detection before downstream billing workflows.
- Safer integration of AI-assisted extraction into an existing billing process.
Stack
Node.js, .NET, PDF parsing, OCR, LLM integration, workflow automation
Sanitized artifact
A high-level view of the workflow/architecture used in this project, with sensitive details removed.
Invoices -> Extract -> Normalize -> Validate vs CDR -> Discrepancies
            |-> OCR/LLM assist for unknown layouts
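
The "Normalize -> Validate vs CDR -> Discrepancies" stages could look something like the sketch below. It assumes, purely for illustration, that usage is billed in minutes per service code and that CDRs carry a duration in seconds; the field names, the `reconcile` function, and the half-minute tolerance are hypothetical and not taken from the project.

```typescript
// Illustrative sketch only: field names, units, and tolerance are assumptions.
interface InvoiceUsageLine {
  serviceCode: string;
  billedMinutes: number;
}

interface CdrRecord {
  serviceCode: string;
  durationSeconds: number;
}

interface Discrepancy {
  serviceCode: string;
  billedMinutes: number;
  cdrMinutes: number;
  deltaMinutes: number; // billed minus CDR-derived usage
}

// Normalize service codes so invoice and CDR keys compare equal.
function normalizeCode(code: string): string {
  return code.trim().toUpperCase();
}

function reconcile(
  lines: InvoiceUsageLine[],
  cdrs: CdrRecord[],
  toleranceMinutes: number = 0.5
): Discrepancy[] {
  // Aggregate billed minutes per normalized service code.
  const billed: Record<string, number> = {};
  for (const line of lines) {
    const key = normalizeCode(line.serviceCode);
    billed[key] = (billed[key] || 0) + line.billedMinutes;
  }

  // Aggregate CDR usage per service code, converting seconds to minutes.
  const used: Record<string, number> = {};
  for (const rec of cdrs) {
    const key = normalizeCode(rec.serviceCode);
    used[key] = (used[key] || 0) + rec.durationSeconds / 60;
  }

  // Compare every service code seen on either side.
  const keys = Object.keys(billed);
  for (const key of Object.keys(used)) {
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  keys.sort();

  const discrepancies: Discrepancy[] = [];
  for (const key of keys) {
    const billedMinutes = billed[key] || 0;
    const cdrMinutes = used[key] || 0;
    const deltaMinutes = billedMinutes - cdrMinutes;
    if (Math.abs(deltaMinutes) > toleranceMinutes) {
      discrepancies.push({ serviceCode: key, billedMinutes, cdrMinutes, deltaMinutes });
    }
  }
  return discrepancies;
}
```

For example, an invoice billing 95 minutes on a service whose CDRs sum to 5400 seconds (90 minutes) would surface as a single discrepancy with a delta of 5 minutes, while services within tolerance produce no entry.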