Industry & compliance

Building a HIPAA-Ready Architecture for Clinical Decision Support

Umair Khan · 10 min read
HIPAA · BAA · LLM APIs · Compliance · Architecture

Last week a CTO at a regional diagnostic lab emailed me the shortlist his procurement team was evaluating. Six AI vendors pitching some version of "automated NGS interpretation."

For each one he asked the same question: Can you produce an executed BAA that covers the LLM call itself?

Four of them went vague. Two said yes but couldn't name the API tier or the counterparty. Zero produced the actual document before the follow-up call.

That's the state of HIPAA compliance in AI-for-healthcare in April 2026. It's why lab procurement teams are (rightly) skeptical, and it's why this post exists. Consider it a reference you can hand to your compliance officer when they ask what "HIPAA-ready LLM pipeline" should actually mean.

What makes an LLM pipeline HIPAA-ready?

A HIPAA-ready LLM pipeline requires four simultaneous conditions: a BAA with the customer, BAAs with every downstream subprocessor including the LLM provider, contractually enforced zero-retention on LLM calls, and US-only data residency pinned at the configuration layer. Miss any one of them and you are, at best, HIPAA-adjacent.

Spelled out, the four conditions:

  1. A signed BAA between you (the vendor) and your customer (the covered entity or upstream business associate).
  2. A signed BAA between you and every downstream service that touches PHI — which in an LLM pipeline always includes the LLM provider itself, plus hosting, plus any database that stores request metadata.
  3. A contractually enforced zero-retention agreement with the LLM provider. "We don't train on it" is not sufficient. The retention clause governs whether PHI persists in provider logs, caches, or abuse-monitoring pipelines after your request completes.
  4. US-only data residency, verifiable at the configuration layer — not assumed from the provider's marketing copy.

Fail on any of the four and you're building on sand. The rest of this post is about how to get each one right in practice.

Which LLM API tiers support HIPAA compliance?

Only enterprise-tier LLM APIs with signed Business Associate Agreements support HIPAA-ready workflows. As of April 2026, this means OpenAI Enterprise / Scale Tier, Anthropic Claude Enterprise, Azure OpenAI with limited-access approval, AWS Bedrock (model-specific), and Google Vertex AI Enterprise. Consumer ChatGPT, default Claude API access, and Gemini Basic do not qualify.

The uncomfortable truth: the consumer tiers of ChatGPT, Claude, and Gemini are not HIPAA-ready. Neither is the default OpenAI API without enterprise enrollment. Neither is Anthropic's default API without a signed BAA.

Here's the lay of the land as of April 2026. Verify current terms with your counterparty before signing — these policies move.

OpenAI Enterprise / Scale Tier. BAA available. Zero-retention on completions when requested via the API (store: false). Abuse-monitoring retention has a separate, shorter window and a separate legal basis. Data residency is US by contract.

Anthropic Claude Enterprise. BAA available. Zero-retention on API traffic. Models run in customer-isolated regions. Data residency US (and EU regions for GDPR-adjacent work, though that's outside scope for a US-only diagnostic lab).

Azure OpenAI Service. BAA via Microsoft. Lets you pin region. Zero-retention requires approval through the limited-access program — it doesn't flip on automatically with signup.

AWS Bedrock. BAA via AWS. Model-level data-use policies differ. Claude on Bedrock inherits Anthropic's zero-retention stance; other models vary. Read the specific model's data-usage terms separately, not just the Bedrock BAA.

Google Vertex AI / Gemini Enterprise. BAA via Google Workspace or Vertex Enterprise. Zero-retention requires the no-data-training option; the default is 55 days of retention for abuse monitoring unless explicitly disabled.

The practical takeaway: no single "enterprise checkbox" makes a consumer LLM HIPAA-ready. You need to do four things explicitly:

  1. Sign the BAA with the provider.
  2. Confirm the specific retention terms for your tier and endpoint.
  3. Pass the correct flags on every API call to suppress retention.
  4. Build failover to a second BAA-backed provider for outage handling.

The flag in step 3 is not optional when PHI is in the prompt. Code sketch:

# OpenAI Enterprise (as of April 2026; verify current docs)
from openai import OpenAI

client = OpenAI()  # enterprise credentials under a signed BAA
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    store=False,  # suppress completion retention
    # BAA-backed accounts have model training disabled by default
)

# Anthropic Enterprise equivalent
import anthropic

client = anthropic.Anthropic()  # enterprise API key under a signed BAA
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,  # required parameter on the Messages API
    messages=[...],
    # BAA-backed accounts are zero-retention by default on enterprise tier;
    # no per-call flag is required, but the BAA is what makes it enforceable.
)

If your code doesn't set store=False on every PHI-bearing OpenAI call, you are technically retaining. Audit your codepath.
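Step 4 (failover) deserves its own sketch. A minimal version, assuming both providers sit under signed BAAs; the function and the callables are hypothetical placeholders for your real, BAA-covered client calls:

```python
# Illustrative failover between two BAA-backed providers. The
# callables stand in for your real, BAA-covered client calls.

def complete_with_failover(prompt, primary, fallback):
    """Try the primary provider; on outage, retry against the fallback.

    Both callables must be BAA-covered: failing over to a non-BAA
    provider during an outage is itself a compliance violation.
    """
    try:
        return primary(prompt)
    except Exception:  # real code: catch the provider SDK's error types
        return fallback(prompt)
```

In production you would narrow the except clause to the provider SDK's transient-error types and emit a PHI-redacted failover event to the audit log before retrying.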

What infrastructure BAAs does a HIPAA LLM pipeline need?

A HIPAA-ready LLM pipeline needs BAAs with every service that handles PHI: the hosting provider, the database and storage layer, the LLM provider, and any observability tool that ingests request content. UNMIRI's active BAA chain covers Vercel, AWS, and Anthropic.

The LLM is not the only service in the pipeline. Anywhere PHI touches, you need a BAA. For UNMIRI's stack:

  • Vercel — hosts the API edge on a HIPAA-covered tier. US regions pinned via edge-config. Do not run PHI workloads on non-covered tiers; they don't carry a BAA.
  • AWS — UNMIRI's architecture is built on AWS for the primary PHI path. RDS Postgres (Multi-AZ, encrypted at rest) holds structured clinical data, variant annotations, and audit logs. Encrypted S3 (SSE-KMS, access-logged, versioned) is the persistent document store for source NGS reports and generated outputs. A separate transient S3 bucket serves as Textract input and auto-deletes via an S3 Lifecycle rule after extraction completes. Textract handles PDF extraction. All AWS services operate under a single AWS BAA with us-east-1 pinned.
  • Anthropic — the one LLM in the pipeline. Used narrowly for extraction edge cases and long-tail variant fallback on Anthropic's HIPAA-ready API tier with a signed BAA. Anthropic does not train on customer inputs or outputs on that tier.
  • Self-managed Neo4j cluster — runs in UNMIRI's own US-pinned VPC. No external BAA needed because UNMIRI controls the infrastructure end-to-end.

Each is a named counterparty on UNMIRI's BAA chain, and each relationship has a documented BAA on file. Before a report is sent to a customer, the full chain is executed.

Notable omission: UNMIRI does not use OpenAI. Clinical-path reasoning happens in the knowledge graph, and the final 2-page cheat sheet is rendered by deterministic templates — no LLM in the output path. A single LLM provider (Anthropic) is sufficient for the narrow extraction role, and reducing the vendor surface reduces the BAA surface.
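The transient Textract bucket's auto-delete, mentioned in the AWS bullet above, comes down to a single S3 Lifecycle rule. A sketch, with a hypothetical bucket name; applying it requires boto3 and live AWS credentials, so the call is shown as a comment:

```python
# Illustrative Lifecycle rule for the transient Textract input bucket.
# One day is the minimum Lifecycle granularity, so treat this as the
# hard backstop; application code should still delete objects
# explicitly as soon as extraction completes.
TRANSIENT_BUCKET = "unmiri-textract-transient"  # hypothetical name

LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "expire-transient-phi",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # every object in the bucket
            "Expiration": {"Days": 1},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
        }
    ]
}

# Applied once per environment:
#   boto3.client("s3", region_name="us-east-1").put_bucket_lifecycle_configuration(
#       Bucket=TRANSIENT_BUCKET, LifecycleConfiguration=LIFECYCLE_CONFIG)
```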

What does a HIPAA-ready LLM data flow look like?

A HIPAA-ready clinical AI data flow has six internal stages — authentication, normalization, knowledge-graph traversal, narrow LLM extraction (PHI-minimized), deterministic template rendering, and audit-logged response — with every external service boundary governed by an executed BAA. The diagram below shows UNMIRI's production pipeline.

Here's what the pipeline actually looks like. Every arrow crossing a service boundary is BAA-governed.

 ┌────────────────┐      ┌─────────────────────────────────┐
 │  Lab's LIMS    │      │          UNMIRI API             │
 │                │      │  ┌───────────────────────────┐  │
 │ (covered       │ POST │  │ 1. Auth + request log     │  │
 │  entity)       │─────▶│  │    (PHI-minimized log)    │  │
 │                │      │  └──────────────┬────────────┘  │
 │ BAA with       │      │                 ▼               │
 │ UNMIRI ─────┐  │      │  ┌───────────────────────────┐  │
 └────────────┼───┘      │  │ 2. Normalization          │  │
              │          │  │    (in-memory only)       │  │
              │          │  └──────────────┬────────────┘  │
              │          │                 ▼               │
              │          │  ┌───────────────────────────┐  │
              │          │  │ 3. Neo4j graph traversal  │  │
              │          │  │    (US-pinned VPC, ours)  │  │
              │          │  └──────────────┬────────────┘  │
              │          │                 ▼               │
              │          │  ┌───────────────────────────┐  │       ┌───────────────┐
              │          │  │ 4. Narrow LLM extraction  │──┼──────▶│ Anthropic     │
              │          │  │    (PHI-minimized prompt) │  │       │ HIPAA-ready   │
              │          │  │    edge-case + long-tail  │  │       │ BAA · 0-train │
              │          │  └──────────────┬────────────┘  │       └───────────────┘
              │          │                 ▼               │
              │          │  ┌───────────────────────────┐  │
              │          │  │ 5. Deterministic template │  │
              │          │  │    renders final 2-pager  │  │
              │          │  └──────────────┬────────────┘  │
              │          │                 ▼               │
              │          │  ┌───────────────────────────┐  │       ┌───────────────┐
              │          │  │ 6. Response + audit write │──┼──────▶│ AWS RDS + S3  │
              │          │  │    (PHI redacted from log)│  │       │ BAA · us-east-1│
              │          │  └──────────────┬────────────┘  │       └───────────────┘
              │ response │                 ▼               │
              └──────────┴─────────────────────────────────┘

Six stages inside our API. Three external counterparties on the diagram, each with an active BAA and zero-retention terms. No dotted lines, no handwaving.

How should PHI be handled inside an LLM pipeline?

PHI inside an LLM pipeline should be processed in-memory only (never persisted), logged with identifiers redacted, decoupled from your internal patient IDs, and stripped from LLM prompts at the payload layer — so the LLM call contains clinical concepts (variants, drugs, tiers) but no patient identifiers.

Once PHI is inside our infrastructure, the handling rules are conceptually simple and operationally strict.

In-memory processing. The raw NGS report, extracted variant profile, and intermediate computations exist only in request-scoped process memory. No disk writes. No cache persistence. When the request completes, the worker's memory is the last place the data existed on our infrastructure.

PHI-redacted audit logs. We log every request: timestamp, authenticated principal, endpoint, status, latency, insight ID, hash of the input payload. We do not log the variant profile, the source patient ID, or any free-text content. The audit log tells you that a report was processed and by whom, not what was in it. This is deliberate — an audit log containing PHI is itself a PHI surface area with its own BAA requirements.
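A sketch of that audit record, with illustrative field names; the point is as much what's absent as what's present:

```python
import hashlib
from datetime import datetime, timezone

def audit_record(principal, endpoint, status, latency_ms, insight_id,
                 payload: bytes) -> dict:
    """PHI-redacted audit entry: only a hash of the input payload is
    kept, never the payload itself (field names are illustrative)."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "principal": principal,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": latency_ms,
        "insight_id": insight_id,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        # Deliberately absent: variant profile, patient ID, free text.
    }
```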

Provenance separate from content. The insight_id we return to the lab is an opaque reference. The lab's own systems hold the mapping between insight_id and their patient record. We never need to know that mapping, and we don't store it.
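Generating that opaque reference is trivial, but worth showing because the temptation is to derive it from something meaningful. A sketch; the prefix and token length are arbitrary choices:

```python
import secrets

def new_insight_id() -> str:
    """Opaque, unguessable reference returned to the lab. Derived from
    nothing: no patient ID, no timestamp, no sequence number that
    could be correlated across requests."""
    return "ins_" + secrets.token_urlsafe(16)
```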

Prompt-level PHI minimization. The prompt sent to the LLM is stripped to only what the formatter needs: variant nomenclature, drug, evidence tier, citations. Patient identifiers never reach the LLM call. This is belt-and-suspenders — the BAA would cover it anyway — but defense in depth matters when the consequence of a leak is a breach notification.

# Illustrative — prompt construction with PHI stripped before send
prompt_input = {
    "variant": graph_result["variant"],     # "EGFR L858R"
    "drug": graph_result["drug"],           # "Osimertinib"
    "evidence_tier": graph_result["tier"],  # "Level 1"
    "citations": graph_result["citations"], # ["OncoKB:EGFR-L858R", "FDA label", "FLAURA 2018"]
    # Not included: patient_id, name, DOB, MRN, report_date, source_file_id
}

If your prompt construction includes a patient identifier — even a hashed one — you're relying solely on your BAA with the LLM provider. Remove identifiers at the prompt layer and you have two independent controls.
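One way to enforce that at the payload layer is an allowlist check immediately before the call. The helper is hypothetical; the pattern is the point:

```python
# Illustrative allowlist guard run immediately before the LLM call.
ALLOWED_PROMPT_KEYS = {"variant", "drug", "evidence_tier", "citations"}

def assert_phi_stripped(prompt_input: dict) -> dict:
    """Raise if any key outside the clinical-concept allowlist appears,
    so an accidental identifier fails loudly instead of leaking."""
    extra = set(prompt_input) - ALLOWED_PROMPT_KEYS
    if extra:
        raise ValueError(f"unexpected keys in prompt payload: {sorted(extra)}")
    return prompt_input
```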

What encryption, residency, and audit logging does HIPAA require for an LLM pipeline?

HIPAA expects encryption in transit (TLS 1.3 in practice), AES-256 at rest, US-only data residency pinned at configuration, and immutable audit logging covering authentication and PHI-access events, retained for at least the six years the Security Rule requires for documentation (UNMIRI keeps seven). These are the technical safeguards that the Security Rule operationalizes.

The easier parts, listed for completeness because they still need to hold.

Encryption in transit. TLS 1.3 on every external edge. Internal service-mesh traffic also TLS; no cleartext between services.

Encryption at rest. AES-256 everywhere PHI could conceivably land. That includes AWS RDS encrypted volumes, AWS S3 buckets (SSE-KMS), Neo4j cluster volumes, and any temporary processing queue. Key management through AWS KMS with customer-managed keys. Rotation schedule documented and audited.

US-only data residency. Vercel: US region pinned via config. AWS: us-east-1 pinned across RDS, S3, and Textract. Anthropic HIPAA-ready API: US inference zones. This is a configuration detail that gets verified on every new environment we spin up. We do not assume; we check.
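"We do not assume; we check" can be literal code. A startup sketch; the verification calls are shown as comments because they need boto3 and live credentials:

```python
# Illustrative fail-fast residency check run at service startup.
EXPECTED_REGION = "us-east-1"

def assert_us_residency(region_name: str) -> None:
    """Refuse to start if any client's configured region has drifted."""
    if region_name != EXPECTED_REGION:
        raise RuntimeError(
            f"data residency violation: got {region_name!r}, "
            f"expected {EXPECTED_REGION!r}"
        )

# At startup, for each AWS client in the PHI path:
#   client = boto3.client("s3", region_name=EXPECTED_REGION)
#   assert_us_residency(client.meta.region_name)
```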

Audit logging. Every PHI-relevant event — authentication, report ingestion, graph query, LLM call, response delivery — is logged with principal, timestamp, correlation ID, and result code. Retention is seven years, comfortably past the six-year documentation retention the HIPAA Security Rule requires (45 CFR 164.316). Logs are append-only and stored separately from the application database. Access to audit logs is itself logged.

How do you evaluate an AI vendor's HIPAA compliance?

Evaluate an AI vendor's HIPAA posture with ten specific questions: executed BAA on demand, named LLM provider and tier, zero-retention terms in writing, data-flow diagram, BAA chain including hosting and database, US residency verification, audit-log content and exclusions, outage failover also BAA-backed, breach-notification SLA, and SOC-2 status with evidence.

If you're a lab CTO evaluating AI vendors, or an engineer building one, this is the checklist I would hand to procurement. Copy it into your vendor questionnaire. Ten questions.

  1. Can you produce an executed BAA — not a template — on request?
  2. Can you name the specific LLM provider and tier you use, and produce that provider's BAA?
  3. Can you show the zero-retention term in writing, including the separate abuse-monitoring retention policy?
  4. Can you produce a data-flow diagram labeling every service that touches PHI, with BAA status per hop?
  5. Does your BAA chain cover hosting (Vercel / AWS / GCP) and database + storage (AWS RDS + S3 / Supabase / Cloud SQL)?
  6. Is data residency US-only, pinned at configuration, and verifiable in your audit logs?
  7. What does your request-level audit log contain — and what does it specifically not log?
  8. How do you handle LLM provider outages? Is the failover also BAA-backed?
  9. What is your incident-response SLA for a suspected PHI breach? Who notifies whom, in what window?
  10. What is on your SOC-2 roadmap? If you claim SOC-2 today, provide the Type II report; don't accept "we're compliant" as shorthand.

If a vendor hedges on more than one of these, walk. The ones who can answer all ten have already done the work. The ones who can't are hoping you won't ask.

Does UNMIRI have SOC-2 Type II compliance?

No — UNMIRI's SOC-2 Type II audit is on the roadmap with a target completion of Q4 2026. The underlying controls are already in place, but the Type II report validating those controls has not yet been issued. We label this honestly because procurement teams should know the difference between implemented controls and externally audited controls.

UNMIRI's SOC-2 Type II audit is on the roadmap for Q4 2026. We're not SOC-2 Type II compliant today. We say that clearly on our security page and in this post because I would rather lose a deal over the timing than win one on a misrepresentation.

What's already in place: BAA-backed infrastructure, zero-retention LLM agreements, US-only residency, audit logging, encryption at rest and in transit, access controls. The Type II audit is a validation of controls we've already built — not a gate we're pretending we've already passed.

If your procurement team requires Type II before pilot, we're not the right fit yet. If your team will accept the interim security package (BAA, data-flow diagram, zero-retention attestations, Type II roadmap with dates), we can be ready for a pilot this quarter. Both answers are honest.

What does "HIPAA-ready" mean for an AI vendor?

"HIPAA-ready" means the architecture, contracts, and operational controls required to handle PHI are in place, documented, and executable today — but it is not a formal certification (HHS does not certify AI vendors). The term is deliberately narrower than "HIPAA-compliant" because compliance is a continuously-demonstrated property, not a point-in-time attestation.

The term I use is HIPAA-ready, and I'm deliberate about what that word is doing.

HIPAA-certified isn't a real status — HHS doesn't certify AI vendors. HIPAA-compliant is colloquially fine but typically means "we've implemented controls we believe meet the Security and Privacy Rules," which is itself a claim that needs substantiation. Ready means: the architecture, contracts, and operational controls to handle PHI are in place, documented, and executable today. That's the claim I'm comfortable defending.

If you're building one of these pipelines, use this post as a starting framework, verify every provider-specific term against the current documentation, and get your BAAs executed before the first byte of PHI crosses the wire. That order matters.

Questions, corrections, or counterexamples: email me at pilots@unmiri.com. If your lab is evaluating UNMIRI specifically, the Book a Pilot form is the fastest path in.

Frequently asked questions

Is ChatGPT HIPAA-compliant?
No — the consumer ChatGPT tier does not come with a Business Associate Agreement and may retain prompts and completions for service improvement. Only the OpenAI Enterprise tier supports a signed BAA with zero-retention terms, and even then every API request must explicitly set store: false to suppress retention of PHI-bearing content.
What is a Business Associate Agreement (BAA) for LLM APIs?
A BAA is a HIPAA-required contract between a covered entity (or upstream business associate) and any downstream vendor that handles PHI. For LLM APIs, the BAA must cover zero retention of requests and responses, no use of data for model training, and documented security controls at the provider level.
Does zero-retention mean the LLM provider never sees the data?
No. The provider processes each request to generate a response. Zero-retention means the request and response are not persisted in provider storage after the response is returned, and are not used for model training. Transient in-memory processing during response generation is still required and is covered by the BAA.
Can a diagnostic lab start an AI pilot before SOC-2 Type II is complete?
Yes, if the lab's procurement policy allows risk-based vendor acceptance. Most labs accept an interim security package — executed BAAs, documented controls, zero-retention attestations, and a credible SOC-2 roadmap — under a pilot agreement. If procurement strictly requires Type II before engagement, wait for the audit to complete.

Umair Khan

Founder, UNMIRI

Building UNMIRI — a GraphRAG-based NGS interpretation engine for regional diagnostic labs. Previously: software engineer working on data-intensive systems. Writing here on architecture, clinical data, and HIPAA-ready AI.

Clinical advisor: UNMIRI is advised by a practicing oncologist with experience in molecular tumor boards at a regional cancer center. Full advisor profile coming with our About page.


Want to see this architecture in action?

UNMIRI is recruiting regional diagnostic labs in PA and NJ for Q2 2026 pilot integrations. Applications open.