Human-in-the-Loop Invoice Approval with the OpenAI Agents API
In our LangGraph tutorial we built a human-in-the-loop invoice workflow using a hardcoded graph: each step was explicitly wired, and the LLM only filled in data at fixed points.
This tutorial takes a different approach. Instead of a graph, we give GPT-4o a set of tools and a task description, and let the model decide what to call, in what order, and when it’s done. The result is a true agent — one that can make autonomous decisions like auto-approving small invoices, while routing larger ones through Relayna for human review.
The full source is on GitHub: github.com/redlin/relayna-examples
Workflow vs. Agent
The key difference is where the business logic lives:
| | LangGraph Workflow | OpenAI Agent |
|---|---|---|
| Execution order | Hardcoded graph edges | LLM decides |
| Branching logic | Conditional edge functions | System prompt rules |
| LLM role | Data extraction only | Orchestration + decision-making |
| Auto-approve | Explicit graph node | Instruction in system prompt |
Neither is better in all cases. Workflows are more predictable and auditable. Agents are more flexible and require less upfront design. For invoice processing — where the rules are clear but exceptions exist — the agent approach works well.
The Agent Loop
The core is a simple while loop around chat.completions.create. On each iteration the model receives the full message history, including every earlier tool result, and decides whether to call another tool or return a final text response.
import json

from openai import OpenAI

client = OpenAI()

# TOOLS and execute_tool are defined elsewhere in the example.
def run_agent(invoice_path: str, system_prompt: str) -> str:
    """Run the tool-calling loop until the model returns a final answer."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Process the invoice at: {invoice_path}"},
    ]
    context = {}  # shared mutable state between tool calls

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            tools=TOOLS,
            messages=messages,
        )
        choice = response.choices[0]
        messages.append(choice.message)

        # No tool calls → agent is done
        if choice.finish_reason == "stop" or not choice.message.tool_calls:
            return choice.message.content

        # Execute tools and feed results back into the message history
        for tool_call in choice.message.tool_calls:
            result = execute_tool(
                name=tool_call.function.name,
                arguments_json=tool_call.function.arguments,
                context=context,
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
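The loop calls execute_tool but never shows it. A minimal sketch of what such a dispatcher could look like follows, assuming each executor takes (args, context) and returns a dict. The EXECUTORS registry and the stand-in extract_pdf_text executor are hypothetical, not taken from the repo.

```python
import json
from typing import Any, Callable


# Hypothetical stand-in executor; a real one would read the PDF.
def execute_extract_pdf_text(args: dict, context: dict) -> dict:
    return {"text": f"(text of {args['pdf_path']})"}


# Maps tool names (as declared in TOOLS) to Python executors.
EXECUTORS: dict[str, Callable[[dict, dict], dict]] = {
    "extract_pdf_text": execute_extract_pdf_text,
}


def execute_tool(name: str, arguments_json: str, context: dict) -> dict[str, Any]:
    """Decode the model's JSON arguments and run the matching executor.

    Failures are returned as data instead of raised, so the model sees
    the error in the tool result and can try again.
    """
    executor = EXECUTORS.get(name)
    if executor is None:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON arguments: {exc}"}
    try:
        return executor(args, context)
    except Exception as exc:  # keep the agent loop alive on executor failures
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Returning errors as tool results is a design choice: the loop keeps running and the model gets a chance to correct a bad argument on the next turn.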
The System Prompt as Business Logic
Instead of conditional edges in a graph, the agent’s decision rules live in the system prompt. That includes the auto-approve threshold. Invoices at or below it are approved without human review:
def build_system_prompt(max_revisions: int, review_threshold: float) -> str:
    return f"""\
You are an invoice review agent that processes PDF invoices through a human approval workflow.
AUTO-APPROVE RULE:
If the invoice total is ${review_threshold:,.2f} or less, you may approve it automatically
without human review — skip the upload and checkpoint steps entirely.
If the total exceeds ${review_threshold:,.2f}, you must route it through human review.
YOUR RESPONSIBILITIES:
1. Call extract_pdf_text to read the invoice and determine the total amount.
2. If total <= ${review_threshold:,.2f}: auto-approve and stop.
3. If total > ${review_threshold:,.2f}:
a. Call upload_invoice_pdf to store the PDF securely.
b. Call create_review_checkpoint with specific instructions based on what you read.
c. Call poll_checkpoint_status to wait for the human's decision.
d. On needs_changes: apply corrections and create a new checkpoint (max {max_revisions} revisions).
e. On approved/rejected/expired: report the outcome and stop.
"""
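A prompt rule is an instruction, not a guarantee. If you want a deterministic backstop for the threshold, one option is to repeat the comparison in Python. The sketch below assumes the total arrives as a formatted string; needs_human_review is a hypothetical helper, not part of the example repo.

```python
from decimal import Decimal


def needs_human_review(total: str, review_threshold: float) -> bool:
    """Return True when the invoice total exceeds the auto-approve threshold.

    Mirrors the prompt rule: totals at or below the threshold may be
    auto-approved, anything above goes to a human.
    """
    # Strip currency formatting such as "$12,538.80" before comparing.
    amount = Decimal(total.replace("$", "").replace(",", "").strip())
    return amount > Decimal(str(review_threshold))


print(f"${500.0:,.2f}")                          # $500.00 (how the prompt renders the threshold)
print(needs_human_review("$12,538.80", 500.0))   # True
print(needs_human_review("500.00", 500.0))       # False
```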
The Tools
Five tools cover the full invoice lifecycle:
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "extract_pdf_text",
            "description": "Extract raw text from a PDF invoice. Call this first.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pdf_path": {"type": "string", "description": "Absolute path to the PDF."},
                },
                "required": ["pdf_path"],
            },
        },
    },
# upload_invoice_pdf — uploads to Relayna asset storage, returns asset_id
# create_review_checkpoint — creates magic-link review page
# poll_checkpoint_status — blocks until human decides
# cancel_checkpoint — cancels a pending review
]
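The other four schemas follow the same shape. As an illustration only, here is how create_review_checkpoint might be declared. The field names are borrowed from the create_checkpoint client method shown later in this post; the structure of each item is an assumption, and the real schema in the repo may differ.

```python
# Hypothetical schema: field names mirror the Relayna client's
# create_checkpoint arguments; the shape of each item is assumed.
CREATE_REVIEW_CHECKPOINT_TOOL = {
    "type": "function",
    "function": {
        "name": "create_review_checkpoint",
        "description": "Create a human review checkpoint for an uploaded invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Short title for the reviewer."},
                "instructions": {"type": "string", "description": "What the reviewer should verify."},
                "summary": {"type": "string", "description": "Brief summary of the invoice."},
                "items": {
                    "type": "array",
                    "description": "Line items to show the reviewer.",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"},
                        },
                        "required": ["description", "amount"],
                    },
                },
            },
            "required": ["title", "instructions", "summary", "items"],
        },
    },
}
```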
Each tool has a corresponding executor function. The executors receive a shared context dict — a critical detail explained below.
The context Dict: Working Around LLM Drift
When GPT-4o calls upload_invoice_pdf, it gets back an asset_id UUID. It must then pass that exact UUID to create_review_checkpoint. In practice, models sometimes substitute a placeholder value or slightly misremember the UUID from their context window.
We avoid this by storing the asset_id in a context dict on the Python side during upload, then reading it back in the checkpoint executor regardless of what the LLM passes:
def execute_upload_invoice_pdf(args: dict, context: dict) -> dict:
    # `client` here is the Relayna client, not the OpenAI client from the agent loop
    asset_id = client.upload_asset(file_path=args["pdf_path"])
    context["asset_id"] = asset_id  # store on Python side
    return {"asset_id": asset_id}

def execute_create_review_checkpoint(args: dict, context: dict) -> dict:
    # Trust our stored value over what the LLM passed
    asset_id = context.get("asset_id") or args.get("asset_id")
    ...
This pattern — using a side-channel dict to propagate values between tools reliably — is worth keeping in any multi-step agent that passes data between tool calls.
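To see the pattern in isolation, here is a small self-contained simulation. Both executors are fakes (no Relayna calls), and the placeholder string stands in for whatever the model might wrongly pass.

```python
import uuid


def fake_upload(args: dict, context: dict) -> dict:
    """Stand-in for the upload executor: generates an ID and remembers it."""
    asset_id = str(uuid.uuid4())
    context["asset_id"] = asset_id
    return {"asset_id": asset_id}


def fake_create_checkpoint(args: dict, context: dict) -> dict:
    """Stand-in for the checkpoint executor: prefers the stored ID."""
    asset_id = context.get("asset_id") or args.get("asset_id")
    if asset_id is None:
        return {"error": "no asset uploaded yet"}
    return {"used_asset_id": asset_id}


context: dict = {}
uploaded = fake_upload({"pdf_path": "/tmp/invoice.pdf"}, context)

# Simulate the model passing a placeholder instead of the real UUID.
result = fake_create_checkpoint({"asset_id": "<asset_id>"}, context)
print(result["used_asset_id"] == uploaded["asset_id"])  # True
```

The placeholder never reaches the API because the stored value wins whenever one exists.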
Calling the Relayna API
The Relayna client is identical to the one in the LangGraph example: a thin httpx wrapper with trust_env=False to prevent proxy interference with localhost traffic.
from pathlib import Path

import httpx

class RelaynaClient:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url  # used to build request URLs below
        self._http = httpx.Client(
            headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
            trust_env=False,  # ignore proxy settings from environment variables
            timeout=120.0,
        )

    def upload_asset(self, file_path: str) -> str:
        with open(file_path, "rb") as f:
            r = self._http.post(
                f"{self.base_url}/api/assets/upload",
                files={"file": (Path(file_path).name, f, "application/pdf")},
                data={"purpose": "invoice", "ttl_seconds": "86400"},
            )
        r.raise_for_status()
        return r.json()["asset"]["id"]

    def create_checkpoint(self, title, instructions, summary, items, **kwargs):
        r = self._http.post(
            f"{self.base_url}/api/checkpoints",
            json={"title": title, "instructions": instructions,
                  "summary": summary, "items": items, **kwargs},
        )
        r.raise_for_status()
        data = r.json()
        return data["checkpoint"]["id"], data["review_url"]
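The excerpt above leaves out the polling side that poll_checkpoint_status relies on. A generic sketch of a blocking wait is shown below. It deliberately assumes nothing about the Relayna endpoint: fetch_status is a caller-supplied function that would wrap whatever status request the API provides. The terminal status names come from the system prompt; the "pending" and "timeout" values are assumptions.

```python
import time
from typing import Callable

# Terminal statuses named in the system prompt. Anything else
# (for example an assumed "pending") means "keep waiting".
TERMINAL_STATUSES = {"approved", "rejected", "needs_changes", "expired"}


def wait_for_decision(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            return "timeout"  # assumed sentinel for "gave up waiting"
        sleep(interval_seconds)
```

Passing sleep as a parameter keeps the helper easy to test without real delays.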
What the Agent Generates
Because the system prompt instructs the agent to write specific instructions based on what it actually read from the invoice, each checkpoint is tailored to that invoice — not a generic template. For a $12,538.80 invoice from Acme Consulting, the agent might write:
“Please verify the invoice from Acme Consulting Ltd (INV-2026-0042) for USD 12,538.80 due April 30. Check that the 4 line items — consulting hours, LangGraph design, documentation, and integration setup — match your purchase orders. Confirm the tax amount of $927.32 is correct at 8%.”
Compare this to the LangGraph approach, where instructions are a static template filled in with extracted values. The agent version reads the actual invoice content and writes instructions specific to what it found.
Running It
git clone https://github.com/redlin/relayna-examples.git
cd relayna-examples/openai-agent-invoice-review
uv venv && uv pip install -e .
source .venv/bin/activate
cp .env.example .env
# Fill in RELAYNA_API_KEY and OPENAI_API_KEY
python main.py --invoice invoice.pdf
# With auto-approve threshold (invoices ≤ $500 skip human review):
python main.py --invoice invoice.pdf --threshold 500
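main.py itself is not reproduced in this post. As a rough sketch, the two flags above could be parsed with argparse like this; the default threshold of 0 (nothing auto-approved unless a threshold is given) is an assumption, not confirmed by the repo.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the --invoice and --threshold flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Review an invoice with an LLM agent.")
    parser.add_argument("--invoice", required=True, help="Path to the invoice PDF.")
    # Assumed default: 0 means no auto-approval unless a threshold is passed.
    parser.add_argument("--threshold", type=float, default=0.0,
                        help="Auto-approve invoices at or below this total.")
    return parser.parse_args(argv)


args = parse_args(["--invoice", "invoice.pdf", "--threshold", "500"])
print(args.invoice, args.threshold)  # invoice.pdf 500.0
```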
When to Use Each Approach
Use the OpenAI agent approach when:
- Your rules can be expressed in natural language and may evolve
- You want the LLM to write context-specific review instructions
- Flexibility matters more than strict auditability
Use the LangGraph workflow approach when:
- You need a clear, auditable execution path
- The workflow has complex conditional logic that’s hard to prompt-engineer
- You want LangGraph’s built-in state persistence and checkpointing
Both connect to Relayna the same way: the REST API is identical. The difference is purely in how you orchestrate the calls around it.
Try It Yourself
The complete example is at github.com/redlin/relayna-examples. The repo also includes the LangGraph version if you want to compare the two approaches side by side.