Healthcare coordination is one of the most promising — and most regulated — applications for AI agents. A patient transitioning from hospital to home care requires coordination between multiple specialists, scheduling follow-ups, verifying medication compatibility, and flagging risk factors. Today, this is done manually through phone calls and faxes.
In this tutorial, we build a care coordination workflow using LangGraph, Google’s Gemma 3 1B-IT model, and tool calling. The result is a stateful graph that routes patient cases through triage, specialist assignment, and follow-up scheduling — with human-in-the-loop checkpoints for high-risk decisions.
Why LangGraph for Healthcare
LangGraph’s graph-based orchestration is a natural fit for clinical workflows because:
- Explicit control flow. Every decision point is a named node in the graph. You can audit exactly what happened and why.
- Interrupt nodes. High-risk actions (prescriptions, referrals) can pause for human review before execution.
- State persistence. The workflow state is checkpointed at every step — if the system crashes, it resumes from the last checkpoint, not from scratch.
- Deterministic routing. Unlike chain-of-thought prompting, the graph structure ensures certain paths are always followed regardless of LLM output.
The Workflow
Our care coordination workflow has four core nodes, plus a human-review checkpoint for high-risk cases:

```
[Intake] → [Triage] → [Assign Specialist] → [Schedule Follow-up]
               ↓ (high-risk)
         [Human Review]
```
Node 1: Intake
Parses the patient referral and extracts structured data.
```python
from typing import Literal, TypedDict

from langgraph.graph import StateGraph, START, END


class PatientState(TypedDict):
    referral_text: str
    patient_id: str
    diagnosis: str
    risk_level: Literal["low", "medium", "high"]
    assigned_specialist: str
    follow_up_date: str
    notes: list[str]


def intake_node(state: PatientState) -> PatientState:
    """Extract structured patient data from referral text."""
    # `llm` is assumed to be initialized elsewhere, e.g. a locally served
    # Gemma 3 1B-IT instance behind a LangChain chat-model interface
    response = llm.invoke(
        f"Extract diagnosis and risk level from this referral:\n{state['referral_text']}"
    )
    parsed = parse_intake_response(response)
    return {
        **state,
        "diagnosis": parsed["diagnosis"],
        "risk_level": parsed["risk_level"],
        "notes": [f"Intake processed: {parsed['diagnosis']}"],
    }
```
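The `parse_intake_response` helper above is left undefined. One possible sketch, assuming the prompt asks the model to answer in a `diagnosis: ...` / `risk: ...` line format (the format is an illustrative assumption, adapt it to your actual prompt):

```python
import re

def parse_intake_response(response) -> dict:
    """Parse the LLM's reply into a {diagnosis, risk_level} dict.

    Assumes the prompt instructs the model to answer in the form:
        diagnosis: <text>
        risk: <low|medium|high>
    """
    text = getattr(response, "content", response)  # AIMessage or plain str
    diagnosis_match = re.search(r"diagnosis:\s*(.+)", text, re.IGNORECASE)
    risk_match = re.search(r"risk(?:\s*level)?:\s*(low|medium|high)", text, re.IGNORECASE)

    diagnosis = diagnosis_match.group(1).strip() if diagnosis_match else "unknown"
    # Fail safe: if the model's risk label can't be parsed, default to "high"
    # so the case is routed to human review rather than silently auto-processed
    risk = risk_match.group(1).lower() if risk_match else "high"
    return {"diagnosis": diagnosis, "risk_level": risk}
```

Defaulting unparseable output to `"high"` is a deliberate fail-safe: in a clinical workflow, an ambiguous case should land in front of a human, not slip through.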
Node 2: Triage
Routes the patient based on risk level. High-risk cases go to human review.
```python
def triage_router(state: PatientState) -> str:
    """Route based on risk level."""
    if state["risk_level"] == "high":
        return "human_review"
    return "assign_specialist"
```
This is a conditional edge in LangGraph — pure Python logic, no LLM involved. The routing is deterministic.
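Because the router is plain Python, it can be unit-tested without any model in the loop. A quick check (the router is repeated here so the snippet stands alone):

```python
def triage_router(state: dict) -> str:
    """Route based on risk level (same logic as above)."""
    if state["risk_level"] == "high":
        return "human_review"
    return "assign_specialist"

# The same input always yields the same route: no sampling, no prompt drift
assert triage_router({"risk_level": "high"}) == "human_review"
assert triage_router({"risk_level": "medium"}) == "assign_specialist"
assert triage_router({"risk_level": "low"}) == "assign_specialist"
```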
Node 3: Specialist Assignment (with Tool Calling)
This is where tool calling comes in. The agent queries an external scheduling system to find available specialists.
```python
from langchain_core.tools import tool


@tool
def find_available_specialist(
    specialty: str,
    location: str,
    within_days: int = 7,
) -> dict:
    """Find an available specialist in the scheduling system."""
    # In production: API call to the hospital scheduling system
    return {
        "specialist_name": "Dr. Martin",
        "specialty": specialty,
        "next_available": "2025-04-15",
        "location": location,
    }


@tool
def check_medication_interactions(
    current_medications: list[str],
    proposed_treatment: str,
) -> dict:
    """Check for medication interactions."""
    # In production: API call to a drug interaction database
    return {
        "interactions_found": False,
        "safe_to_proceed": True,
    }
```
```python
def assign_specialist_node(state: PatientState) -> PatientState:
    """Use tool calling to find and assign a specialist."""
    llm_with_tools = llm.bind_tools([
        find_available_specialist,
        check_medication_interactions,
    ])
    response = llm_with_tools.invoke(
        f"Find a specialist for {state['diagnosis']} near the patient's location."
    )
    # Process tool calls from the response
    specialist = process_tool_response(response)
    return {
        **state,
        "assigned_specialist": specialist["specialist_name"],
        "notes": state["notes"] + [f"Assigned to {specialist['specialist_name']}"],
    }
```
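`process_tool_response` is left undefined above. One possible sketch: execute the tool calls the model requested and return the specialist lookup result. It assumes the LangChain message convention that `response.tool_calls` is a list of `{"name": ..., "args": ...}` dicts:

```python
def process_tool_response(response, tools=None) -> dict:
    """Execute the tool calls an LLM response requested.

    Assumes the LangChain message format: `response.tool_calls` is a
    list of {"name": ..., "args": ...} dicts.
    """
    # Default registry maps tool names to the tools defined earlier
    tools = tools or {
        "find_available_specialist": find_available_specialist,
        "check_medication_interactions": check_medication_interactions,
    }
    for call in getattr(response, "tool_calls", []):
        if call["name"] == "find_available_specialist":
            return tools[call["name"]].invoke(call["args"])
    # Fail loudly rather than silently assigning no specialist
    raise ValueError("Model did not request a specialist lookup")
```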
Node 4: Human Review (Interrupt)
For high-risk patients, the workflow pauses and waits for a clinician’s approval.
```python
from langgraph.types import interrupt


def human_review_node(state: PatientState) -> PatientState:
    """Pause for human review of high-risk cases."""
    decision = interrupt(
        {
            "message": f"High-risk patient: {state['diagnosis']}",
            "patient_id": state["patient_id"],
            "proposed_action": f"Assign to specialist for {state['diagnosis']}",
            "options": ["approve", "modify", "escalate"],
        }
    )
    return {
        **state,
        "notes": state["notes"] + [f"Human review: {decision}"],
    }
```
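The graph assembly below also wires in a `schedule_followup_node` that the walkthrough doesn't show. A minimal sketch, picking a follow-up window based on risk level (the 2/7/14-day thresholds are purely illustrative, not clinical guidance):

```python
from datetime import date, timedelta

# Illustrative follow-up windows per risk level, not clinical guidance
FOLLOW_UP_DAYS = {"high": 2, "medium": 7, "low": 14}


def schedule_followup_node(state: dict) -> dict:
    """Book a follow-up date based on the patient's risk level."""
    days = FOLLOW_UP_DAYS.get(state["risk_level"], 7)
    follow_up = date.today() + timedelta(days=days)
    return {
        **state,
        "follow_up_date": follow_up.isoformat(),
        "notes": state["notes"] + [f"Follow-up scheduled for {follow_up.isoformat()}"],
    }
```

In production this node would call the same scheduling API as `find_available_specialist` rather than just computing a date.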
Assembling the Graph
```python
from langgraph.checkpoint.memory import MemorySaver

workflow = StateGraph(PatientState)

# Add nodes. Triage is a pass-through node: LangGraph nodes must return
# state updates, so the router itself can't be a node. The actual routing
# happens in the conditional edge below.
workflow.add_node("intake", intake_node)
workflow.add_node("triage", lambda state: state)
workflow.add_node("human_review", human_review_node)
workflow.add_node("assign_specialist", assign_specialist_node)
workflow.add_node("schedule_followup", schedule_followup_node)

# Add edges
workflow.add_edge(START, "intake")
workflow.add_edge("intake", "triage")
workflow.add_conditional_edges(
    "triage",
    triage_router,
    {"human_review": "human_review", "assign_specialist": "assign_specialist"},
)
workflow.add_edge("human_review", "assign_specialist")
workflow.add_edge("assign_specialist", "schedule_followup")
workflow.add_edge("schedule_followup", END)

# Compile with a checkpointer for state persistence
app = workflow.compile(checkpointer=MemorySaver())
```
Why Gemma 3 1B-IT
We use Google’s Gemma 3 1B-IT model for this workflow. Why a small model?
- Latency. Healthcare workflows need fast responses. A 1B parameter model runs in < 200ms on a single GPU.
- Cost. At scale (1000s of patients/day), inference costs matter. A small model is 10-50x cheaper than GPT-4 or Gemini Pro.
- Self-hosted. For healthcare data, keeping everything on-premise or in your VPC is often a regulatory requirement. Gemma 3 runs locally.
- Sufficient for structured tasks. Intake parsing, triage classification, and tool selection don’t require frontier model reasoning. A well-prompted 1B model handles these reliably.
For complex medical reasoning (differential diagnosis, treatment planning), you’d route to a larger model. The graph structure makes this easy — different nodes can use different models.
Governance Considerations
This workflow handles sensitive health data. Before deploying to production:
- Audit tool calls with `diplomat-agent` — ensure `find_available_specialist` and `check_medication_interactions` have proper input validation and rate limits.
- Log everything — every LLM call, tool invocation, and human decision must be traceable (HDS/HIPAA compliance).
- Test the interrupt — verify that high-risk cases actually pause for human review and don’t accidentally bypass the checkpoint.
- Validate outputs — add guardrails on specialist assignment (is this specialist actually qualified for this diagnosis?).
Next Steps
The full source code is available on GitHub. To adapt this for your use case:
- Replace mock tools with your actual hospital APIs
- Adjust triage thresholds for your clinical protocols
- Add more nodes (insurance verification, pharmacy notification)
- Deploy on Cloud Run with a VPC Connector to your healthcare infrastructure
Healthcare is one of the highest-stakes domains for AI agents. The graph-based approach gives you the control and auditability that the regulatory environment demands — without sacrificing the flexibility that makes agents useful.