The Agent Lab.

We build AI agents. In the safest sandboxes for regulated industries.

Vertical agents that run in real operations. Auditable, evaluated before go-live, built for real bottlenecks in health insurance, general insurance, and banking.

Clients & partners

OpenEuroLLM, JAAI, Bitmarck, hkk, ITZBund, T-Systems

Where AI actually delivers and where the effort isn't worth it.

In regulated industries (insurance, banking, public health funds) complexity grows faster than your team can. More cases, data, and compliance pressure. The first reflex is to plug in AI somewhere. The better path is to figure out together where AI sustainably delivers results and where a script, a process change, or no automation at all is the right answer. Not every customer knows up front where the real problem sits. That's where we start.

Frontier Research

Custom development at the state of the art.

The Agent Lab - Vertical agents as a finished product.

What fits a packaged product comes from The Agent Lab: production-ready vertical agents, from claims processing to dispute hearings to product data maintenance.

Well known in industries like:
Health Insurance
Insurance
Banks & Finance
Public Sector
Construction
Retail

What changes when AI agents are deployed in the real world?

Only 14% of agent pilots reach production (Gartner 2026). The gap is rarely the model; more often it is data quality, backlog pressure, workflow complexity, and audit readiness. Here are the four bottlenecks where Agent Lab delivers.

Heterogeneous input formats

Problem

Operations staff translate between formats and systems.

Entry errors.

Time lost before the actual work begins.

Solution

Emails, PDFs, scans, and voice messages are normalized.

Fed directly into the case workflow.

Growing Compliance Complexity

Problem

The EU sets binding rules for how AI must behave.

And hard deadlines for proving compliance.

Additional documentation requirements.

Complex case volumes no team can oversee

Problem

Long processing times.

Partial decisions on incomplete context.

Knowledge loss at handovers.

Solution

Full case context (records, policies, history) is evaluated.

The agent proposes a decision for the caseworker to approve.

Poor data quality & inconsistency

Problem

Wrong decisions and weak customer experience.

Constant manual corrections.

Growing compliance risk.

Solution

Data is normalized, validated, and classified in the workflow.

Before it flows into core operations.

When does an agent make sense for your unit?

Four questions we clarify at the start:

More than one month of backlog in the process?

Stable process (no major overhaul planned in 12 months)?

Can you define today what "correctly processed" means?

Is there an owner in Operations (not IT) who will drive the rollout?

3 of 4 with "Yes" → the agent pays off. We clarify the exact fit in the ellaverse workshop.
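
The "3 of 4" rule of thumb above can be written down as a tiny check. This is only an illustrative sketch: the class, field, and function names are hypothetical, and the real fit is clarified in the workshop, not by a script.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReadinessAnswers:
    """Yes/no answers to the four kickoff questions (field names are hypothetical)."""
    backlog_over_one_month: bool        # more than one month of backlog?
    process_stable_12_months: bool      # no major overhaul planned in 12 months?
    correctness_definable_today: bool   # can "correctly processed" be defined today?
    operations_owner_named: bool        # is there an owner in Operations (not IT)?


def yes_count(answers: ReadinessAnswers) -> int:
    """Count how many of the four questions were answered with yes."""
    return sum(bool(getattr(answers, f.name)) for f in fields(answers))


def agent_pays_off(answers: ReadinessAnswers, threshold: int = 3) -> bool:
    """Apply the rule of thumb: at least `threshold` yes answers out of four."""
    return yes_count(answers) >= threshold


if __name__ == "__main__":
    unit = ReadinessAnswers(
        backlog_over_one_month=True,
        process_stable_12_months=True,
        correctness_definable_today=False,
        operations_owner_named=True,
    )
    print(yes_count(unit), agent_pays_off(unit))
```

The threshold is a parameter so that a stricter bar (for example, all four answers) can be tried without changing the logic.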

Start with a workshop

Where Agent Lab works today.

GOÄ-based claim review

End-to-end review of medical claims against the German GOÄ fee schedule, no human in the loop.

  • Visual verification against the original PDF scan instead of blind trust in OCR
  • Complex GOÄ exclusion and capping rules (e.g. code 70 excludes codes 1 and 2 on the same day)
  • Full reasoning traces for every individual decision
Quantified impact

Case correctness more than doubled, from 40% to 93%, on a 30-claim evaluation. Same model, structured domain instructions.
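
To make the exclusion example from this case concrete (code 70 excludes codes 1 and 2 on the same day), here is a minimal sketch of a same-day exclusion check. It is not the production agent: the rule table holds only that one example, the function name and the data layout (date, code) are assumptions for illustration, and real GOÄ rules, including capping, are considerably richer.

```python
from collections import defaultdict
from datetime import date

# Illustrative rule table: billed code -> codes it excludes on the same day.
# Only the single example from the text is encoded here.
SAME_DAY_EXCLUSIONS = {"70": {"1", "2"}}


def find_exclusion_conflicts(line_items):
    """Return (day, excluding_code, excluded_code) for each same-day conflict.

    line_items: iterable of (treatment_date, code) tuples (hypothetical layout).
    """
    # Group billed codes by treatment day.
    codes_by_day = defaultdict(set)
    for day, code in line_items:
        codes_by_day[day].add(code)

    conflicts = []
    for day in sorted(codes_by_day):
        codes = codes_by_day[day]
        for excluding, excluded in SAME_DAY_EXCLUSIONS.items():
            if excluding in codes:
                # Every excluded code billed on the same day is a conflict.
                for code in sorted(excluded & codes):
                    conflicts.append((day, excluding, code))
    return conflicts


if __name__ == "__main__":
    claim = [
        (date(2025, 3, 4), "70"),
        (date(2025, 3, 4), "1"),
        (date(2025, 3, 5), "2"),  # different day, so not a conflict
    ]
    for conflict in find_exclusion_conflicts(claim):
        print(conflict)
```

In this sample claim only code 1 on 4 March conflicts with code 70; code 2 was billed on a different day and passes the check.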

Three vertical agents with an anchor customer

Three vertical agents in joint development with a GKV (statutory health insurance) anchor customer. Prototypes are running; production go-live is planned for the coming months.

  • Hearing procedures: Document analysis and automated responses for disputes with hospitals
  • Self-payer claims: Contribution calculation and notice generation
  • WUV automation: Efficiency and inefficiency review procedures
Status

Concrete output numbers will be published together with the customer once production goes live.

Multimodal product-listing remediation

Key account managers face supplier catalogs with missing attributes: material, dimensions, GPSR mandatory disclosures. The information is there, but buried in manufacturer texts and product images.

  • Several hundred SKUs blocked from sale per mid-sized vendor portfolio
  • Several hundred more products held back by visibility defects
  • Dozens fail GPSR mandatory disclosures (new EU rule since Dec 2024)
Impact

Manufacturer texts and product images are evaluated multimodally; missing attributes are extracted and normalized before the key account manager ever steps in.

What our agents are built on.

We're not a sales funnel with an AI wrapper. We do the research that EU and federal programs fund and put the same engineering depth into every agent we ship.

Every agent ships through ellarun.

AI Act Art. 12 audit trail, credential brokering, policy enforcement. The video shows how an Agent Lab agent runs securely inside an ellarun sandbox.

Where does your organization stand on production AI? Find out in 2 minutes.

Take the AI readiness assessment

EU AI Act ready

Full audit trails, compliance documentation, and evidence-based reporting. Ready for August 2026.

Open-source foundation

Built on NVIDIA OpenShell and open standards. No vendor lock-in, no black boxes.

Made in Germany

German company, German data centers. Your data never leaves the EU. Built and operated under European data protection standards.

Model agnostic

Works with Claude, GPT, Mistral, Llama, or your own models. Switch providers anytime without retooling your workflows or losing evaluation history.

Careers at ellamind

We're hiring people who want to build AI that stands up to reality: regulation, scale, and responsibility. Open positions available across engineering, AI, product, and sales.

Most asked questions

Find answers to frequently asked questions. If you can't find your question here, feel free to contact us.

What is The Agent Lab?
The Agent Lab builds vertical AI agents for regulated industries, primarily private and statutory health insurance, banking, and public health funds. We turn real operational bottlenecks like claims processing, dispute hearings, and product data maintenance into production-ready agents that are evaluated before go-live and auditable in production. Packaged vertical agents come from Agent Lab, custom development runs through Frontier Research, and every engagement starts with an ellaverse workshop.
What kinds of agents has ellamind already built?
Three concrete examples exist today. For a private health insurer, our agent reviews medical claims end-to-end against the German GOÄ fee schedule, improving case correctness from 40% to 93% on a 30-claim evaluation. For a statutory health insurance anchor customer, three vertical agents are in joint development, with prototypes running and production go-live planned for the coming months: hearing procedures with hospitals, self-payer contribution calculation, and WUV efficiency reviews. For an e-commerce marketplace, a multimodal agent extracts missing product attributes, including GPSR mandatory disclosures, directly from manufacturer texts and product images.
How does an engagement with ellamind start?
Every engagement begins with an ellaverse workshop. Your domain experts and our engineering team work through candidate use cases and reach an honest buy-vs-build decision per case. We are explicit about where an agent pays off and where a script, a process change, or no automation is the right answer. The four questions we clarify up front: is there a real backlog, is the process stable, can you define what 'correctly processed' means, and is there an owner in Operations to drive the rollout.
Can your agents help with EU AI Act and other compliance?
Yes. Every Agent Lab agent ships through ellarun with EU AI Act Article 12 audit trails by default, and elluminate generates the technical documentation and evidence-based compliance reports aligned with EU AI Act requirements. For high-risk AI systems, we cover the documentation obligations so your team can move agents from experiment to regulated production with confidence.
Do I need technical expertise to work with ellamind agents?
No. Our agents are designed so that domain experts and engineering teams can both contribute where it matters. Subject-matter experts define evaluation criteria, review agent decisions, and shape the workflow without writing code, while engineering teams get full API access and integration flexibility for the production rollout.

Unlock the power of AI

See how our products can help you evaluate, deploy, and monitor AI agents with confidence.