Healthcare coordination is one of the most promising — and most regulated — applications for AI agents. A patient transitioning from hospital to home care requires coordination between multiple specialists, scheduling follow-ups, verifying medication compatibility, and flagging risk factors. Today, this is done manually through phone calls and faxes.
In this tutorial, we build a care coordination workflow using LangGraph, Google’s Gemma 3 1B-IT model, and tool calling. The result is a stateful graph that routes patient cases through triage, specialist assignment, and follow-up scheduling — with human-in-the-loop checkpoints for high-risk decisions.
Why LangGraph for Healthcare
LangGraph’s graph-based orchestration is a natural fit for clinical workflows because:
- Explicit control flow. Every decision point is a named node in the graph. You can audit exactly what happened and why.
- Interrupt nodes. High-risk actions (prescriptions, referrals) can pause for human review before execution.
- State persistence. The workflow state is checkpointed at every step — if the system crashes, it resumes from the last checkpoint, not from scratch.
- Deterministic routing. Unlike chain-of-thought prompting, the graph structure ensures certain paths are always followed regardless of LLM output.
The Workflow
Our care coordination workflow has four core nodes, plus a human-review checkpoint for high-risk cases:

```
[Intake] → [Triage] → [Assign Specialist] → [Schedule Follow-up]
               ↓ (high-risk)
         [Human Review]
```
Node 1: Intake
Parses the patient referral and extracts structured data.
```python
from typing import Literal, TypedDict

from langgraph.graph import StateGraph, START, END


class PatientState(TypedDict):
    referral_text: str
    patient_id: str
    diagnosis: str
    risk_level: Literal["low", "medium", "high"]
    assigned_specialist: str
    follow_up_date: str
    notes: list[str]


def intake_node(state: PatientState) -> PatientState:
    """Extract structured patient data from referral text."""
    # `llm` is assumed to be initialized elsewhere, e.g. a locally served
    # Gemma 3 1B-IT instance behind a LangChain chat-model interface
    response = llm.invoke(
        f"Extract diagnosis and risk level from this referral:\n{state['referral_text']}"
    )
    parsed = parse_intake_response(response)
    return {
        **state,
        "diagnosis": parsed["diagnosis"],
        "risk_level": parsed["risk_level"],
        "notes": [f"Intake processed: {parsed['diagnosis']}"],
    }
```
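The `parse_intake_response` helper above is left undefined. One possible sketch, assuming the prompt asks the model to answer in a `diagnosis: ...` / `risk: ...` line format (the format is an illustrative assumption, adapt it to your actual prompt):

```python
import re

def parse_intake_response(response) -> dict:
    """Parse the LLM's reply into a {diagnosis, risk_level} dict.

    Assumes the prompt instructs the model to answer in the form:
        diagnosis: <text>
        risk: <low|medium|high>
    """
    text = getattr(response, "content", response)  # AIMessage or plain str
    diagnosis_match = re.search(r"diagnosis:\s*(.+)", text, re.IGNORECASE)
    risk_match = re.search(r"risk(?:\s*level)?:\s*(low|medium|high)", text, re.IGNORECASE)

    diagnosis = diagnosis_match.group(1).strip() if diagnosis_match else "unknown"
    # Fail safe: if the model's risk label can't be parsed, default to "high"
    # so the case is routed to human review rather than silently auto-processed
    risk = risk_match.group(1).lower() if risk_match else "high"
    return {"diagnosis": diagnosis, "risk_level": risk}
```

Defaulting unparseable output to `"high"` is a deliberate fail-safe: in a clinical workflow, an ambiguous case should land in front of a human, not slip through.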
Node 2: Triage
Routes the patient based on risk level. High-risk cases go to human review.
```python
def triage_router(state: PatientState) -> str:
    """Route based on risk level."""
    if state["risk_level"] == "high":
        return "human_review"
    return "assign_specialist"
```
This is a conditional edge in LangGraph — pure Python logic, no LLM involved. The routing is deterministic.
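Because the router is plain Python, it can be unit-tested without any model in the loop. A quick check (the router is repeated here so the snippet stands alone):

```python
def triage_router(state: dict) -> str:
    """Route based on risk level (same logic as above)."""
    if state["risk_level"] == "high":
        return "human_review"
    return "assign_specialist"

# The same input always yields the same route: no sampling, no prompt drift
assert triage_router({"risk_level": "high"}) == "human_review"
assert triage_router({"risk_level": "medium"}) == "assign_specialist"
assert triage_router({"risk_level": "low"}) == "assign_specialist"
```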
Node 3: Specialist Assignment (with Tool Calling)
This is where tool calling comes in. The agent queries an external scheduling system to find available specialists.
```python
from langchain_core.tools import tool


@tool
def find_available_specialist(
    specialty: str,
    location: str,
    within_days: int = 7,
) -> dict:
    """Find an available specialist in the scheduling system."""
    # In production: API call to the hospital scheduling system
    return {
        "specialist_name": "Dr. Martin",
        "specialty": specialty,
        "next_available": "2025-04-15",
        "location": location,
    }


@tool
def check_medication_interactions(
    current_medications: list[str],
    proposed_treatment: str,
) -> dict:
    """Check for medication interactions."""
    # In production: API call to a drug interaction database
    return {
        "interactions_found": False,
        "safe_to_proceed": True,
    }
```
```python
def assign_specialist_node(state: PatientState) -> PatientState:
    """Use tool calling to find and assign a specialist."""
    llm_with_tools = llm.bind_tools([
        find_available_specialist,
        check_medication_interactions,
    ])
    response = llm_with_tools.invoke(
        f"Find a specialist for {state['diagnosis']} near the patient's location."
    )
    # Process tool calls from the response
    specialist = process_tool_response(response)
    return {
        **state,
        "assigned_specialist": specialist["specialist_name"],
        "notes": state["notes"] + [f"Assigned to {specialist['specialist_name']}"],
    }
```
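`process_tool_response` is left undefined above. One possible sketch: execute the tool calls the model requested and return the specialist lookup result. It assumes the LangChain message convention that `response.tool_calls` is a list of `{"name": ..., "args": ...}` dicts:

```python
def process_tool_response(response, tools=None) -> dict:
    """Execute the tool calls an LLM response requested.

    Assumes the LangChain message format: `response.tool_calls` is a
    list of {"name": ..., "args": ...} dicts.
    """
    # Default registry maps tool names to the tools defined earlier
    tools = tools or {
        "find_available_specialist": find_available_specialist,
        "check_medication_interactions": check_medication_interactions,
    }
    for call in getattr(response, "tool_calls", []):
        if call["name"] == "find_available_specialist":
            return tools[call["name"]].invoke(call["args"])
    # Fail loudly rather than silently assigning no specialist
    raise ValueError("Model did not request a specialist lookup")
```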
Node 4: Human Review (Interrupt)
For high-risk patients, the workflow pauses and waits for a clinician’s approval.
```python
from langgraph.types import interrupt


def human_review_node(state: PatientState) -> PatientState:
    """Pause for human review of high-risk cases."""
    decision = interrupt(
        {
            "message": f"High-risk patient: {state['diagnosis']}",
            "patient_id": state["patient_id"],
            "proposed_action": f"Assign to specialist for {state['diagnosis']}",
            "options": ["approve", "modify", "escalate"],
        }
    )
    return {
        **state,
        "notes": state["notes"] + [f"Human review: {decision}"],
    }
```
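The graph assembly below also wires in a `schedule_followup_node` that the walkthrough doesn't show. A minimal sketch, picking a follow-up window based on risk level (the 2/7/14-day thresholds are purely illustrative, not clinical guidance):

```python
from datetime import date, timedelta

# Illustrative follow-up windows per risk level, not clinical guidance
FOLLOW_UP_DAYS = {"high": 2, "medium": 7, "low": 14}


def schedule_followup_node(state: dict) -> dict:
    """Book a follow-up date based on the patient's risk level."""
    days = FOLLOW_UP_DAYS.get(state["risk_level"], 7)
    follow_up = date.today() + timedelta(days=days)
    return {
        **state,
        "follow_up_date": follow_up.isoformat(),
        "notes": state["notes"] + [f"Follow-up scheduled for {follow_up.isoformat()}"],
    }
```

In production this node would call the same scheduling API as `find_available_specialist` rather than just computing a date.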
Assembling the Graph
```python
from langgraph.checkpoint.memory import MemorySaver

workflow = StateGraph(PatientState)

# Add nodes. Triage is a pass-through node: LangGraph nodes must return
# state updates, so the router itself can't be a node. The actual routing
# happens in the conditional edge below.
workflow.add_node("intake", intake_node)
workflow.add_node("triage", lambda state: state)
workflow.add_node("human_review", human_review_node)
workflow.add_node("assign_specialist", assign_specialist_node)
workflow.add_node("schedule_followup", schedule_followup_node)

# Add edges
workflow.add_edge(START, "intake")
workflow.add_edge("intake", "triage")
workflow.add_conditional_edges(
    "triage",
    triage_router,
    {"human_review": "human_review", "assign_specialist": "assign_specialist"},
)
workflow.add_edge("human_review", "assign_specialist")
workflow.add_edge("assign_specialist", "schedule_followup")
workflow.add_edge("schedule_followup", END)

# Compile with a checkpointer for state persistence
app = workflow.compile(checkpointer=MemorySaver())
```
Why Gemma 3 1B-IT
We use Google’s Gemma 3 1B-IT model for this workflow. Why a small model?
- Latency. Healthcare workflows need fast responses. A 1B parameter model runs in < 200ms on a single GPU.
- Cost. At scale (1000s of patients/day), inference costs matter. A small model is 10-50x cheaper than GPT-4 or Gemini Pro.
- Self-hosted. For healthcare data, keeping everything on-premise or in your VPC is often a regulatory requirement. Gemma 3 runs locally.
- Sufficient for structured tasks. Intake parsing, triage classification, and tool selection don’t require frontier model reasoning. A well-prompted 1B model handles these reliably.
For complex medical reasoning (differential diagnosis, treatment planning), you’d route to a larger model. The graph structure makes this easy — different nodes can use different models.
Governance Considerations
This workflow handles sensitive health data. Before deploying to production:
- Audit tool calls with `diplomat-agent` — ensure `find_available_specialist` and `check_medication_interactions` have proper input validation and rate limits.
- Log everything — every LLM call, tool invocation, and human decision must be traceable (HDS/HIPAA compliance).
- Test the interrupt — verify that high-risk cases actually pause for human review and don’t accidentally bypass the checkpoint.
- Validate outputs — add guardrails on specialist assignment (is this specialist actually qualified for this diagnosis?).
Next Steps
The full source code is available on GitHub. To adapt this for your use case:
- Replace mock tools with your actual hospital APIs
- Adjust triage thresholds for your clinical protocols
- Add more nodes (insurance verification, pharmacy notification)
- Deploy on Cloud Run with a VPC Connector to your healthcare infrastructure
Healthcare is one of the highest-stakes domains for AI agents. The graph-based approach gives you the control and auditability that the regulatory environment demands — without sacrificing the flexibility that makes agents useful.