Published: May 10, 2026 · 17 min read

How to automate invoice retrieval (e-invoicing portals + email)

E-invoicing APIs, vendor portal scraping, and PDF invoices parsed from email, all pushed to accounting without manual downloading.

Intermediate · KSeF API · IMAP · pdf-parse · Playwright · PostgreSQL · OpenAI / Anthropic

Invoice retrieval is a classic time sink in accounting. Invoices arrive from three sources:

  1. E-invoicing system (mandatory for B2B in a growing number of EU countries): fetched via API
  2. Vendor portals (telecom, utilities, etc.): login + download
  3. Email (PDF attachment): the most chaotic source

This tutorial builds a unified pipeline that collects invoices from all three sources, parses them into a common schema, and pushes them to the accounting system. Realistic savings: 4-12h/week of accounting work for a mid-sized company.
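
One way the common schema could look is sketched below. The UnifiedInvoice type, the source tag and the isConsistent check are assumptions made for illustration; the field names mirror the ones used in the steps that follow.

```typescript
// Hypothetical common schema shared by all three sources.
type InvoiceSource = 'einvoice-api' | 'vendor-portal' | 'email';

interface UnifiedInvoice {
  source: InvoiceSource;
  invoiceNumber: string;
  issueDate: string; // YYYY-MM-DD
  sellerVatId: string;
  sellerName: string;
  totalNet: number;
  vat: number;
  totalGross: number;
  currency: string; // ISO 4217 code, e.g. "EUR"
}

// Sanity check: net + VAT should equal gross, allowing one cent of rounding.
function isConsistent(inv: UnifiedInvoice): boolean {
  const expectedCents = Math.round((inv.totalNet + inv.vat) * 100);
  const grossCents = Math.round(inv.totalGross * 100);
  return Math.abs(expectedCents - grossCents) <= 1;
}
```

Running the check right after parsing catches extraction errors before they reach the accounting system.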

What you need
  • Node.js 20+ or Python 3.11+
  • Access tokens for the relevant e-invoicing system
  • Access to the accounting email inbox (Gmail / IMAP)
  • PDF parser (pdf-parse for Node.js, pdfplumber for Python)
  • Accounting system with API
Steps
  1. 01

    E-invoicing API integration

    The e-invoicing system varies by country, e.g. KSeF in Poland or SdI (with the FatturaPA format) in Italy. The government exposes a production endpoint, and authentication uses a token (in KSeF, obtained via a trusted profile or a qualified certificate).

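    // einvoice/client.ts
    import { z } from 'zod';
    // Hypothetical helper: session setup differs per national system.
    import { authenticate } from './auth';

    // Base URL of your national e-invoicing API, supplied via configuration.
    const EINVOICE_URL = process.env.EINVOICE_URL;

    const Invoice = z.object({
      referenceNumber: z.string(),
      invoiceNumber: z.string(),
      issueDate: z.string(),
      sellerVatId: z.string(),
      sellerName: z.string(),
      totalGross: z.number(),
      totalNet: z.number(),
      vat: z.number(),
    });

    // Request and response shapes are illustrative: field names differ between
    // systems and API versions, so check them against the official specification.
    export async function fetchInvoices(since: Date) {
      const session = await authenticate();

      const res = await fetch(`${EINVOICE_URL}/online/Invoice/Query`, {
        method: 'POST',
        headers: {
          'SessionToken': session.token,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          queryCriteria: {
            // In KSeF 1.x, subject2 selects invoices where you are the buyer
            // (received invoices); subject1 returns the ones you issued.
            subjectType: 'subject2',
            type: 'incremental',
            invoicingDateRange: {
              from: since.toISOString(),
              to: new Date().toISOString(),
            },
          },
        }),
      });

      if (!res.ok) {
        throw new Error(`E-invoicing query failed: ${res.status} ${res.statusText}`);
      }

      const data = (await res.json()) as { invoiceHeaderList: unknown[] };
      // Validate every header so malformed records fail loudly here
      return data.invoiceHeaderList.map(h => Invoice.parse(h));
    }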
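
    For incremental queries, the since argument usually comes from the previous run. A minimal sketch, assuming you persist the timestamp of the last successful sync yourself; nextSince, dropSeen, the 10-minute overlap and the 30-day first-run window are all hypothetical and not part of any e-invoicing API. Re-reading a short overlap guards against invoices registered late, and the duplicates it produces are filtered by reference number.

```typescript
// Start of the next query window: last successful sync minus a safety overlap.
function nextSince(lastSync: Date | null, overlapMinutes = 10): Date {
  if (lastSync === null) {
    // First run (assumption): look back 30 days.
    return new Date(Date.now() - 30 * 24 * 3600 * 1000);
  }
  return new Date(lastSync.getTime() - overlapMinutes * 60 * 1000);
}

// Keep only invoices not seen in an earlier window, keyed by referenceNumber.
function dropSeen<T extends { referenceNumber: string }>(batch: T[], seen: Set<string>): T[] {
  const fresh: T[] = [];
  for (const inv of batch) {
    if (seen.has(inv.referenceNumber)) continue;
    seen.add(inv.referenceNumber);
    fresh.push(inv);
  }
  return fresh;
}
```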
  2. 02

    Vendor portal scraping with Playwright

    Telecom and utility vendors each run their own invoice portal. The standard flow: log in → open invoices → download PDF.

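    // portals/vendor1.ts
    import { mkdir, writeFile } from 'node:fs/promises';
    import { chromium } from 'playwright';

    export async function downloadVendor1Invoices(creds: { login: string; password: string }) {
      const browser = await chromium.launch({ headless: true });
      try {
        const context = await browser.newContext();
        const page = await context.newPage();

        // Login
        await page.goto('https://www.vendor1.com/login');
        await page.locator('#login').fill(creds.login);
        await page.locator('#password').fill(creds.password);
        await page.locator('button[type=submit]').click();
        await page.waitForURL('**/account');

        // Invoices section
        await page.goto('https://www.vendor1.com/account/invoices');
        await page.waitForLoadState('networkidle');

        // Collect date, amount and PDF URL from each invoice row
        const rows = await page.$$eval('[data-test="invoice-row"]', els =>
          els.map(r => ({
            date: r.querySelector('.date')?.textContent?.trim() ?? '',
            amount: r.querySelector('.amount')?.textContent?.trim() ?? '',
            downloadUrl: r.querySelector<HTMLAnchorElement>('a.download')?.href ?? '',
          }))
        );

        // Fetch each PDF through the context's request API, which reuses the
        // login cookies. page.goto() on a URL that starts a download typically fails.
        await mkdir('./invoices', { recursive: true });
        const invoices = [];
        for (const inv of rows) {
          if (!inv.downloadUrl) continue; // row without a download link
          const response = await context.request.get(inv.downloadUrl);
          if (!response.ok()) throw new Error(`Download failed: ${inv.downloadUrl}`);
          // Dates may contain slashes or spaces; keep the file name safe
          const localPath = `./invoices/vendor1-${inv.date.replace(/[^\w.-]+/g, '_')}.pdf`;
          await writeFile(localPath, await response.body());
          invoices.push({ ...inv, localPath });
        }
        return invoices;
      } finally {
        await browser.close(); // always release the browser, even on errors
      }
    }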

    Save the session with context.storageState({ path }) and load it via browser.newContext({ storageState }) so you do not have to log in on every run.
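
    Scraped rows carry display text, not numbers. The sketch below turns the .amount text into a number before it enters the common schema. parseAmount is a hypothetical helper and assumes amounts have at most two decimal places, so a separator followed by three digits is read as a thousands separator.

```typescript
function parseAmount(text: string): number {
  // Keep digits, separators and a minus sign; drop currency symbols and spaces.
  const cleaned = text.replace(/[^\d.,-]/g, '').replace(/[.,]+$/, '');
  const lastSep = Math.max(cleaned.lastIndexOf(','), cleaned.lastIndexOf('.'));
  if (lastSep === -1) return Number(cleaned);

  const fracPart = cleaned.slice(lastSep + 1);
  if (fracPart.length > 2) {
    // e.g. "1,234": the separator groups thousands, there is no decimal part
    return Number(cleaned.replace(/[.,]/g, ''));
  }
  const intPart = cleaned.slice(0, lastSep).replace(/[.,]/g, '');
  return Number(`${intPart}.${fracPart}`);
}
```

    Both "1 234,56 zł" and "€1,234.56" come out as 1234.56, which matters once the same vendor's invoices also arrive by email in a different format.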

  3. 03

    Email parsing — IMAP + PDF attachments

    The most chaotic source. Filter IMAP messages by "invoice" in the subject and a PDF attachment:

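    import { ImapFlow } from 'imapflow';
    import pdfParse from 'pdf-parse';

    const client = new ImapFlow({
      host: 'imap.gmail.com',
      port: 993, secure: true,
      auth: { user: 'accounting@company.com', pass: process.env.IMAP_PASS },
    });

    async function fetchEmailInvoices() {
      await client.connect();
      const lock = await client.getMailboxLock('INBOX');
      try {
        // Collect matching messages first: running other IMAP commands inside
        // the fetch() iterator can deadlock the connection.
        const candidates = [];
        const messages = client.fetch(
          { since: new Date(Date.now() - 7*24*3600*1000) }, // last 7 days
          { envelope: true, bodyStructure: true }
        );
        for await (const msg of messages) {
          const subject = (msg.envelope?.subject ?? '').toLowerCase();
          if (/invoice|faktura|rechnung/.test(subject)) candidates.push(msg);
        }

        for (const msg of candidates) {
          // Find PDF attachments (findPdfAttachments is a helper you provide;
          // it returns the bodyStructure nodes of type application/pdf)
          const pdfParts = findPdfAttachments(msg.bodyStructure);
          for (const part of pdfParts) {
            const { content } = await client.download(String(msg.uid), part.part, { uid: true });
            // download() returns a stream, while pdf-parse expects a Buffer
            const chunks: Buffer[] = [];
            for await (const chunk of content) chunks.push(chunk as Buffer);
            const parsed = await pdfParse(Buffer.concat(chunks));
            // ... extract data from parsed.text
          }
        }
      } finally {
        lock.release();
        await client.logout();
      }
    }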
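
    The loop above calls findPdfAttachments, which this tutorial does not define. One possible sketch, assuming the bodyStructure tree uses the type, part, parameters, dispositionParameters and childNodes fields that ImapFlow documents; the BodyNode interface is a local, simplified stand-in for the library's own type.

```typescript
// Simplified stand-in for an ImapFlow bodyStructure node (assumption).
interface BodyNode {
  part?: string;
  type?: string;
  parameters?: Record<string, string>;
  dispositionParameters?: Record<string, string>;
  childNodes?: BodyNode[];
}

// Walks the MIME tree depth-first and returns every node that looks like a PDF.
function findPdfAttachments(node: BodyNode | undefined): BodyNode[] {
  if (!node) return [];
  const found: BodyNode[] = [];
  const type = (node.type ?? '').toLowerCase();
  const name = (node.dispositionParameters?.filename ?? node.parameters?.name ?? '').toLowerCase();

  // Some mailers send PDFs as generic binary, so fall back to the file name.
  const isPdf =
    type === 'application/pdf' ||
    (type === 'application/octet-stream' && name.endsWith('.pdf'));
  if (isPdf && node.part) found.push(node);

  for (const child of node.childNodes ?? []) {
    found.push(...findPdfAttachments(child));
  }
  return found;
}
```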
  4. 04

    AI extraction from PDF — when regex fails

    PDF invoices have varied layouts. Regex handles roughly 60-70% of them; the rest needs AI:

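    import Anthropic from '@anthropic-ai/sdk';
    import { z } from 'zod';

    const claude = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    const InvoiceData = z.object({
      invoiceNumber: z.string(),
      issueDate: z.string(),
      sellerVatId: z.string(),
      sellerName: z.string(),
      totalNet: z.number(),
      totalGross: z.number(),
      vat: z.number(),
      currency: z.enum(['EUR', 'USD', 'GBP', 'PLN']), // extend with the currencies you receive
      dueDate: z.string().nullish(), // the model may return null when no due date is printed
    });

    async function extractInvoiceData(pdfText: string) {
      const res = await claude.messages.create({
        // Model aliases get retired over time, so keep the ID configurable
        model: process.env.ANTHROPIC_MODEL ?? 'claude-3-5-sonnet-latest',
        max_tokens: 1024,
        messages: [{
          role: 'user',
          content: `Extract structured data from this invoice. Return ONLY JSON matching schema:
            { invoiceNumber, issueDate (YYYY-MM-DD), sellerVatId, sellerName, totalNet, totalGross, vat, currency, dueDate }
            
            Invoice text:
            ${pdfText}`,
        }],
      });

      // The first content block should be text; strip code fences the model may add
      const block = res.content[0];
      if (block?.type !== 'text') throw new Error('Expected a text response');
      const raw = block.text.replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, '').trim();
      return InvoiceData.parse(JSON.parse(raw)); // validate
    }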

    Cost: ~$0.005 per invoice with Claude Sonnet, so 200 invoices/month comes to about $1. Marginal.
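
    The regex-first pass mentioned at the top of this step could look like the sketch below. The patterns and the extractWithRegex name are illustrative assumptions tuned to one simple English-language layout; a null result means the text should go to the AI extraction instead.

```typescript
interface RegexFields {
  invoiceNumber: string;
  issueDate: string; // YYYY-MM-DD
  totalGross: number;
}

function extractWithRegex(text: string): RegexFields | null {
  const invoiceNumber = text.match(
    /invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9][A-Z0-9\/-]*)/i
  )?.[1];
  const issueDate = text.match(
    /(?:issue\s*date|date\s*of\s*issue)\s*:?\s*(\d{4}-\d{2}-\d{2})/i
  )?.[1];
  const totalText = text.match(
    /(?:total\s*gross|total\s*due|amount\s*due)\s*:?\s*(\d+(?:[.,]\d{2})?)/i
  )?.[1];

  // Any missing field means the layout is unknown: let the caller fall back.
  if (!invoiceNumber || !issueDate || !totalText) return null;

  return {
    invoiceNumber,
    issueDate,
    totalGross: Number(totalText.replace(',', '.')),
  };
}
```

    Trying the cheap path first keeps the AI bill proportional to the share of invoices with unfamiliar layouts.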

  5. 05

    Push to accounting

    Accounting software typically exposes a REST API. Each external cost invoice becomes one POST /invoices request (the exact path depends on the vendor).

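    import { readFile } from 'node:fs/promises';
    import path from 'node:path';

    async function pushToAccounting(invoice: z.infer<typeof InvoiceData>, pdfPath: string) {
      const formData = new FormData();
      formData.append('invoice', JSON.stringify({
        sellerName: invoice.sellerName,
        sellerVatId: invoice.sellerVatId,
        issueDate: invoice.issueDate,
        totalNet: invoice.totalNet,
        vat: invoice.vat,
        totalGross: invoice.totalGross,
        currency: invoice.currency,
        documentNumber: invoice.invoiceNumber,
      }));
      // FormData file parts must be Blobs, not Buffers
      const pdf = new Blob([await readFile(pdfPath)], { type: 'application/pdf' });
      formData.append('pdf', pdf, path.basename(pdfPath));

      const res = await fetch('https://api.accounting-system.com/invoices', {
        method: 'POST',
        headers: { 'Authorization': `Bearer ${process.env.API_KEY}` },
        body: formData,
      });

      // Fail loudly so the invoice is not marked as uploaded by mistake
      if (!res.ok) {
        throw new Error(`Accounting API rejected ${invoice.invoiceNumber}: ${res.status}`);
      }
      return res.json();
    }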

    After each successful upload, mark the invoice as "uploaded", store the reference ID returned by the accounting system, and log the operation for audit.
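
    A sketch of that bookkeeping, with an in-memory Map standing in for a PostgreSQL table. UploadLedger and its method names are hypothetical; referenceId is whatever ID the accounting system returns.

```typescript
interface LedgerEntry {
  status: 'uploaded';
  referenceId: string;
  uploadedAt: string; // ISO timestamp, kept for the audit trail
}

class UploadLedger {
  private entries = new Map<string, LedgerEntry>();

  // True when this external ID was already sent, so the caller can skip it.
  isUploaded(externalId: string): boolean {
    return this.entries.has(externalId);
  }

  // Records a successful upload; refuses to overwrite an existing record.
  markUploaded(externalId: string, referenceId: string, now: Date = new Date()): void {
    if (this.entries.has(externalId)) {
      throw new Error(`Invoice ${externalId} is already marked as uploaded`);
    }
    this.entries.set(externalId, {
      status: 'uploaded',
      referenceId,
      uploadedAt: now.toISOString(),
    });
  }

  // Flat list of all recorded uploads, suitable for an audit export.
  auditLog(): Array<{ externalId: string } & LedgerEntry> {
    return Array.from(this.entries.entries()).map(
      ([externalId, entry]) => ({ externalId, ...entry })
    );
  }
}
```

    Call isUploaded before pushToAccounting and markUploaded only after it returns. In a real database, a unique constraint on the external ID gives the same guarantee across concurrent runs.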

What it costs to run

Run cost for 200 invoices/month (typical mid-sized company):

  • VPS (Hetzner CX22): €5/month
  • Anthropic API (AI extraction): ~$1-3/month
  • Storage (PDFs + DB): ~$5/month
  • E-invoicing portals, vendor APIs: $0 (free)

Total: ~$15/month, with some headroom over the items listed. Savings: 4-12h of accounting work per week × ~€30/h × 4 weeks = €480-1440/month.
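
The savings figure is plain arithmetic. The sketch below makes the assumptions explicit: a €30 hourly rate and 4 working weeks per month, as in the estimate above. monthlySavings is a hypothetical helper.

```typescript
// Monthly savings in EUR for a given number of hours saved per week.
function monthlySavings(hoursPerWeek: number, hourlyRateEur = 30, weeksPerMonth = 4): number {
  return hoursPerWeek * weeksPerMonth * hourlyRateEur;
}

const lowEstimate = monthlySavings(4);   // 4 h × 4 weeks × €30 = 480
const highEstimate = monthlySavings(12); // 12 h × 4 weeks × €30 = 1440
```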

Common pitfalls
  • No idempotency — uploading the same invoice 3× to accounting creates 3 duplicates. Track sent invoices by external ID.
  • OCR vs text-layer PDF — some PDFs are images (scans). Require OCR (Tesseract, Google Vision) before parsing.
  • Email security — IMAP credentials in plain text. Use app-specific passwords + secrets manager.
  • Currency conversion — EUR/USD invoices need conversion at transaction-day exchange rate. Automate it.
  • Cross-source duplicates — the same invoice may arrive via email AND e-invoicing API. Dedup by invoice number + seller VAT.
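
Deduplicating by invoice number plus seller VAT ID only works if both are normalized first, since the same invoice can carry different casing, stray spaces or a dashed VAT ID depending on the source. A sketch with hypothetical helpers dedupKey and mergeSources; the rule that the first batch passed in wins on conflict is an assumption.

```typescript
interface SourcedInvoice {
  invoiceNumber: string;
  sellerVatId: string;
  source: string;
}

// Normalized key: VAT ID without punctuation, invoice number without spaces.
function dedupKey(inv: SourcedInvoice): string {
  const vat = inv.sellerVatId.toUpperCase().replace(/[^A-Z0-9]/g, '');
  const num = inv.invoiceNumber.toUpperCase().replace(/\s+/g, '');
  return `${vat}|${num}`;
}

// Merges batches in priority order; the first occurrence of a key is kept.
function mergeSources<T extends SourcedInvoice>(...batches: T[][]): T[] {
  const byKey = new Map<string, T>();
  for (const batch of batches) {
    for (const inv of batch) {
      const key = dedupKey(inv);
      if (!byKey.has(key)) byKey.set(key, inv);
    }
  }
  return Array.from(byKey.values());
}
```

Passing the e-invoicing batch first means the structured record is kept and the email copy is dropped.
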
Build yourself or hire?

A pipeline like the above removes 80-90% of manual invoice processing. The remaining 10-20% are edge cases (foreign vendors, corrections, proforma invoices) that require review.

This is a very common use case we deliver for B2B companies. Write to us if you want an audit of your current workflow.

Frequently asked questions
Is e-invoicing mandatory in 2026?
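In Poland, KSeF becomes mandatory in stages: from Feb 1, 2026 for large taxpayers (2024 sales above PLN 200 million) and from April 1, 2026 for all other VAT taxpayers, with a longer transition for the smallest sellers. Other EU countries have their own timelines: Germany has required businesses to be able to receive e-invoices since 2025, Belgium started in January 2026, and France starts in September 2026. An invoice issued outside the mandatory system after the deadline can be challenged for VAT purposes and may expose the issuer to penalties; the exact consequences depend on the country.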
What about foreign vendor invoices?
Invoices from foreign vendors (e.g. Stripe, AWS) do not go through the national e-invoicing system; they arrive by email or through a portal. The pipeline must handle email parsing (IMAP), per-vendor portal scraping, currency conversion (daily rates), and VAT handling (reverse charge for EU B2B).
How much does this actually save?
Mid-sized company with 200 invoices/month: an accountant typically spends 4-8h/week on receive → categorize → enter. Automation handles 80-90% and leaves 10-20% as manual edge cases. Savings: 3-7h/week × €30/h × 4 weeks = €360-840/month. Automation cost: $15-25/month. ROI: 15-30× on running costs from the first month, not counting the time spent building the pipeline.
Will the accounting system accept automated invoices?
Yes. Most accounting systems (Xero, QuickBooks, SAP, local equivalents) expose a REST API that supports invoice ingestion. You need deduplication by external ID, PDF attachment retention (typically 5-10 years), and an audit log of each operation.