Human-in-the-Loop Invoice Approval with the OpenAI Agents API

Relayna Team • April 05, 2026 • 5 min read

In our LangGraph tutorial we built a human-in-the-loop invoice workflow using a hardcoded graph: each step was explicitly wired, and the LLM only filled in data at fixed points.

This tutorial takes a different approach. Instead of a graph, we give GPT-4o a set of tools and a task description, and let the model decide what to call, in what order, and when it’s done. The result is a true agent — one that can make autonomous decisions like auto-approving small invoices, while routing larger ones through Relayna for human review.

The full source is on GitHub: github.com/redlin/relayna-examples

Workflow vs. Agent

The key difference is where the business logic lives:

	LangGraph Workflow	OpenAI Agent
Execution order	Hardcoded graph edges	LLM decides
Branching logic	Conditional edge functions	System prompt rules
LLM role	Data extraction only	Orchestration + decision-making
Auto-approve	Explicit graph node	Instruction in system prompt

Neither is better in all cases. Workflows are more predictable and auditable. Agents are more flexible and require less upfront design. For invoice processing — where the rules are clear but exceptions exist — the agent approach works well.

The Agent Loop

The core is a simple while loop around chat.completions.create. On each turn the model receives the full message history, including the results of earlier tool calls, and either requests more tool calls or returns a final text response.

import json

from openai import OpenAI

client = OpenAI()


def run_agent(invoice_path: str, system_prompt: str) -> str:
    """Run the tool-calling loop until the model returns a final answer."""
    # TOOLS is defined below; execute_tool dispatches to the per-tool executors.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Process the invoice at: {invoice_path}"},
    ]
    context = {}  # shared mutable state between tool calls

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            tools=TOOLS,
            messages=messages,
        )
        choice = response.choices[0]
        messages.append(choice.message)

        # No tool calls → agent is done
        if choice.finish_reason == "stop" or not choice.message.tool_calls:
            return choice.message.content

        # Execute tools and feed results back into the message history
        for tool_call in choice.message.tool_calls:
            result = execute_tool(
                name=tool_call.function.name,
                arguments_json=tool_call.function.arguments,
                context=context,
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

The System Prompt as Business Logic

Instead of conditional edges in a graph, the agent’s decision rules live in the system prompt. This includes the auto-approve threshold: invoices at or below it are approved without human review.

def build_system_prompt(max_revisions: int, review_threshold: float) -> str:
    return f"""\
You are an invoice review agent that processes PDF invoices through a human approval workflow.

AUTO-APPROVE RULE:
If the invoice total is ${review_threshold:,.2f} or less, you may approve it automatically
without human review — skip the upload and checkpoint steps entirely.
If the total exceeds ${review_threshold:,.2f}, you must route it through human review.

YOUR RESPONSIBILITIES:
1. Call extract_pdf_text to read the invoice and determine the total amount.
2. If total <= ${review_threshold:,.2f}: auto-approve and stop.
3. If total > ${review_threshold:,.2f}:
   a. Call upload_invoice_pdf to store the PDF securely.
   b. Call create_review_checkpoint with specific instructions based on what you read.
   c. Call poll_checkpoint_status to wait for the human's decision.
   d. On needs_changes: apply corrections and create a new checkpoint (max {max_revisions} revisions).
   e. On approved/rejected/expired: report the outcome and stop.
"""

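To make the threshold rule concrete, here is a minimal sketch of how the value renders into the prompt and how the same rule could be mirrored on the Python side. The helper names format_threshold and needs_human_review are hypothetical and do not come from the example repo; they only restate the "or less" rule from the prompt above.

```python
def format_threshold(review_threshold: float) -> str:
    """Render the threshold the same way the prompt's f-string does."""
    return f"${review_threshold:,.2f}"


def needs_human_review(total: float, review_threshold: float) -> bool:
    """Mirror the prompt rule: totals at or below the threshold are auto-approved."""
    return total > review_threshold


print(format_threshold(500))               # $500.00
print(format_threshold(12538.8))           # $12,538.80
print(needs_human_review(500.00, 500))     # False (auto-approve)
print(needs_human_review(12538.80, 500))   # True (human review)
```

A check like this could also run inside a tool executor as a cross-check on the model's decision, which keeps the threshold enforceable even if the model misreads the prompt.
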
The Tools

Five tools cover the full invoice lifecycle:

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "extract_pdf_text",
            "description": "Extract raw text from a PDF invoice. Call this first.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pdf_path": {"type": "string", "description": "Absolute path to the PDF."},
                },
                "required": ["pdf_path"],
            },
        },
    },
    # upload_invoice_pdf — uploads to Relayna asset storage, returns asset_id
    # create_review_checkpoint — creates magic-link review page
    # poll_checkpoint_status — blocks until human decides
    # cancel_checkpoint — cancels a pending review
]
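
The four elided entries follow the same shape as extract_pdf_text. As an illustration, here is one way the create_review_checkpoint schema might look. The property names are taken from the create_checkpoint arguments shown later in this post plus asset_id; the descriptions and the required list are assumptions, and items is left out because its shape is not shown here.

```python
import json

# Hypothetical schema for create_review_checkpoint (not copied from the repo).
CREATE_REVIEW_CHECKPOINT_TOOL = {
    "type": "function",
    "function": {
        "name": "create_review_checkpoint",
        "description": "Create a human review checkpoint for an uploaded invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Short title for the review page."},
                "instructions": {"type": "string", "description": "What the reviewer should verify."},
                "summary": {"type": "string", "description": "One-paragraph summary of the invoice."},
                "asset_id": {"type": "string", "description": "ID returned by upload_invoice_pdf."},
            },
            # Assumed: asset_id is optional because the executor can supply it itself.
            "required": ["title", "instructions", "summary"],
        },
    },
}

# The schema must be JSON-serializable to be sent in the `tools` parameter.
print(json.dumps(CREATE_REVIEW_CHECKPOINT_TOOL["function"]["parameters"]["required"]))
```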

Each tool has a corresponding executor function. The executors receive a shared context dict — a critical detail explained below.
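
The agent loop calls execute_tool(name=..., arguments_json=..., context=...), but the dispatcher itself is not shown in this post. A minimal sketch of what it could look like follows. The EXECUTORS registry and the stub executor are assumptions made so the example runs offline; errors are returned as data so the model can see them and react.

```python
import json
from typing import Callable

Executor = Callable[[dict, dict], dict]


def execute_extract_pdf_text(args: dict, context: dict) -> dict:
    """Stub executor (hypothetical): a real one would read the PDF at pdf_path."""
    return {"text": f"(contents of {args['pdf_path']})"}


# Hypothetical registry mapping tool names to executor functions.
EXECUTORS: dict[str, Executor] = {
    "extract_pdf_text": execute_extract_pdf_text,
}


def execute_tool(name: str, arguments_json: str, context: dict) -> dict:
    """Decode the model's JSON arguments and route to the matching executor."""
    executor = EXECUTORS.get(name)
    if executor is None:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON arguments: {exc}"}
    try:
        return executor(args, context)
    except Exception as exc:  # report failures to the model instead of crashing
        return {"error": f"{type(exc).__name__}: {exc}"}


print(execute_tool("extract_pdf_text", '{"pdf_path": "/tmp/invoice.pdf"}', {}))
print(execute_tool("no_such_tool", "{}", {}))
```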

The `context` Dict: Working Around LLM Drift

When GPT-4o calls upload_invoice_pdf, it gets back an asset_id UUID. It must then pass that exact UUID to create_review_checkpoint. In practice, models sometimes substitute a placeholder value or slightly misremember the UUID from their context window.

We avoid this by storing the asset_id in a context dict on the Python side during upload, then reading it back in the checkpoint executor regardless of what the LLM passes:

# `relayna` is a RelaynaClient instance (see "Calling the Relayna API"),
# not the OpenAI client used in the agent loop.
def execute_upload_invoice_pdf(args: dict, context: dict) -> dict:
    asset_id = relayna.upload_asset(file_path=args["pdf_path"])
    context["asset_id"] = asset_id  # store on Python side
    return {"asset_id": asset_id}

def execute_create_review_checkpoint(args: dict, context: dict) -> dict:
    # Trust our stored value over what the LLM passed
    asset_id = context.get("asset_id") or args.get("asset_id")
    ...

This pattern — using a side-channel dict to propagate values between tools reliably — is worth keeping in any multi-step agent that passes data between tool calls.
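
To see the pattern in action, the sketch below simulates the failure mode: the model passes a placeholder instead of the real UUID, and the checkpoint executor still uses the stored value. FakeRelaynaClient and the hard-coded UUID are stand-ins invented for this illustration so it runs without network access.

```python
class FakeRelaynaClient:
    """Offline stand-in for the real client (hypothetical)."""

    def upload_asset(self, file_path: str) -> str:
        return "3f2b8c1e-0000-4000-8000-123456789abc"


relayna = FakeRelaynaClient()


def execute_upload_invoice_pdf(args: dict, context: dict) -> dict:
    asset_id = relayna.upload_asset(file_path=args["pdf_path"])
    context["asset_id"] = asset_id  # remember the real value on the Python side
    return {"asset_id": asset_id}


def execute_create_review_checkpoint(args: dict, context: dict) -> dict:
    # Prefer the stored value; fall back to the model's argument only if needed.
    asset_id = context.get("asset_id") or args.get("asset_id")
    if asset_id is None:
        return {"error": "no asset has been uploaded yet"}
    return {"asset_id_used": asset_id, "llm_passed": args.get("asset_id")}


context: dict = {}
execute_upload_invoice_pdf({"pdf_path": "/tmp/invoice.pdf"}, context)

# The model "misremembers" the UUID and sends a placeholder instead.
result = execute_create_review_checkpoint({"asset_id": "<asset_id>"}, context)
print(result["asset_id_used"])  # 3f2b8c1e-0000-4000-8000-123456789abc
```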

Calling the Relayna API

The Relayna client is identical to the one in the LangGraph example: a thin httpx wrapper with trust_env=False to prevent proxy interference with localhost traffic.

from pathlib import Path

import httpx

class RelaynaClient:
    def __init__(self, base_url: str, api_key: str):
        # Stored because every request method builds its URL from it
        self.base_url = base_url.rstrip("/")
        self._http = httpx.Client(
            headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
            trust_env=False,  # ignore proxy settings from the environment
            timeout=120.0,
        )

    def upload_asset(self, file_path: str) -> str:
        with open(file_path, "rb") as f:
            r = self._http.post(
                f"{self.base_url}/api/assets/upload",
                files={"file": (Path(file_path).name, f, "application/pdf")},
                data={"purpose": "invoice", "ttl_seconds": "86400"},
            )
        r.raise_for_status()
        return r.json()["asset"]["id"]

    def create_checkpoint(self, title, instructions, summary, items, **kwargs):
        r = self._http.post(f"{self.base_url}/api/checkpoints",
                            json={"title": title, "instructions": instructions,
                                  "summary": summary, "items": items, **kwargs})
        r.raise_for_status()
        data = r.json()
        return data["checkpoint"]["id"], data["review_url"]
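
The client methods above cover upload and checkpoint creation. The poll_checkpoint_status tool also needs a loop that waits for the human's decision. The status endpoint is not shown in this post, so the sketch below takes a fetch_status callable instead of guessing at a URL. The terminal status names come from the system prompt; the "pending" value, the "timeout" return value, and the default intervals are assumptions.

```python
import time
from typing import Callable

# Statuses named in the system prompt; anything else is treated as still pending.
TERMINAL_STATUSES = {"approved", "rejected", "expired", "needs_changes"}


def poll_until_decided(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            return "timeout"  # assumed sentinel; the caller decides what to do next
        sleep(interval_seconds)


# Usage with a scripted sequence of statuses instead of a real HTTP call.
scripted = iter(["pending", "pending", "approved"])
print(poll_until_decided(lambda: next(scripted), sleep=lambda _: None))  # approved
```

Injecting fetch_status and sleep keeps the loop testable without a running Relayna instance.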

What the Agent Generates

Because the system prompt instructs the agent to write specific instructions based on what it actually read from the invoice, each checkpoint is tailored to that invoice — not a generic template. For a $12,538.80 invoice from Acme Consulting, the agent might write:

“Please verify the invoice from Acme Consulting Ltd (INV-2026-0042) for USD 12,538.80 due April 30. Check that the 4 line items (consulting hours, LangGraph design, documentation, and integration setup) match your purchase orders. Confirm the tax amount of $928.80 is correct at 8%.”

Compare this to the LangGraph approach, where instructions are a static template filled in with extracted values. The agent version reads the actual invoice content and writes instructions specific to what it found.

Running It

git clone https://github.com/redlin/relayna-examples.git
cd relayna-examples/openai-agent-invoice-review

uv venv && uv pip install -e .
source .venv/bin/activate

cp .env.example .env
# Fill in RELAYNA_API_KEY and OPENAI_API_KEY

python main.py --invoice invoice.pdf
# With auto-approve threshold (invoices ≤ $500 skip human review):
python main.py --invoice invoice.pdf --threshold 500
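
For readers wondering how the two flags above might be parsed, here is a minimal argparse sketch. It is not the repo's main.py. Only the --invoice and --threshold flag names come from the commands above; the default of 0.0 (which would send every invoice with a positive total to human review) and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Process an invoice through the review agent."
    )
    parser.add_argument("--invoice", required=True, help="Path to the invoice PDF.")
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.0,  # assumed default: nothing with a positive total is auto-approved
        help="Totals at or below this amount skip human review.",
    )
    return parser.parse_args(argv)


args = parse_args(["--invoice", "invoice.pdf", "--threshold", "500"])
print(args.invoice, args.threshold)  # invoice.pdf 500.0
```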

When to Use Each Approach

Use the OpenAI agent approach when:

- Your rules can be expressed in natural language and may evolve
- You want the LLM to write context-specific review instructions
- Flexibility matters more than strict auditability

Use the LangGraph workflow approach when:

- You need a clear, auditable execution path
- The workflow has complex conditional logic that’s hard to prompt-engineer
- You want LangGraph’s built-in state persistence and checkpointing

Both connect to Relayna the same way — the REST API is identical. The difference is purely in how you orchestrate the agent around it.

Try It Yourself

The complete example is at github.com/redlin/relayna-examples. The repo also includes the LangGraph version if you want to compare the two approaches side by side.