
Why We Don't Use OpenAI's API for Client Agents (And What We Use Instead)

8 min read

Running LLMs locally keeps client data on infrastructure you control. We've built 47 client agents without OpenAI's API. Here's why, and how you can do the same.

We've built 47 AI agents for clients in the past 18 months. Not one of them uses OpenAI's API anymore.

This isn't a statement about OpenAI's quality. Their models are excellent. But when you're processing customer data, financial records, or proprietary information through AI agents, the question isn't just about performance. It's about where that data goes and who controls it.

Here's what we learned after moving every client agent to local LLM infrastructure, the specific problems we solved, and the n8n workflows that make it practical.

The Privacy Problem with API-Based AI Agents

When you send data to OpenAI's API, you're transmitting information to external servers. OpenAI has strong privacy policies, but the fundamental architecture requires your data to leave your infrastructure.

For most businesses, this creates three specific problems:

Data residency requirements. We work with NHS trusts, financial advisors, and legal firms. Many have explicit requirements that client data cannot leave UK servers or their own infrastructure. API calls to US-based services violate these requirements immediately.

Audit trail complexity. When data flows through external APIs, your audit trail includes third-party services. One client needed to demonstrate data handling for ISO 27001 certification. Every external API added 40-60 hours of documentation work.

Rate limiting during critical operations. API-based agents fail predictably during high-volume operations. We had an agent processing 3,200 customer enquiries during a product launch. OpenAI's rate limits kicked in at enquiry 847. The agent stopped for 47 minutes. That's not acceptable for business-critical automation.

What Local LLM Privacy Actually Means for AI Agents

Local LLM deployment means running the model on infrastructure you control. The data never leaves your servers or your chosen hosting environment.

We deploy local LLMs in three configurations depending on client needs:

On-premises servers. For clients with strict data requirements, we install models on their existing hardware. A legal firm runs Llama 3.1 70B on their server rack. Total setup time: 6 hours. They process 1,200 documents monthly with zero data leaving their building.

UK- and EU-based cloud instances. We provision dedicated instances with UK or EU cloud providers. A financial advisor runs Mistral 7B on a dedicated Hetzner instance in the Falkenstein datacentre in Germany. Monthly cost: £89. All data stays within GDPR jurisdiction.

Private VPC deployments. For clients using AWS or Azure, we deploy models within their existing VPC. A healthcare client runs Llama 3.2 in their AWS VPC. The model accesses their patient database directly without external API calls.

The key difference: You control where the compute happens and where the data flows.

The Cost Mathematics of Local vs API LLMs

OpenAI's pricing seems reasonable until you run an agent at scale.

We built a customer service agent for an e-commerce client. The agent handles pre-sale questions, order tracking, and returns. Volume: roughly 850 conversations daily.

OpenAI API costs for this agent:

  • Average conversation: 12 exchanges
  • Average tokens per conversation: 3,400 tokens
  • Daily token usage: 2,890,000 tokens
  • Monthly tokens: 86,700,000 tokens
  • Cost at GPT-4 rates: approximately £2,601 monthly

Local LLM costs for the same agent:

  • Hetzner dedicated server (AX102): £156 monthly
  • Running Llama 3.1 70B quantized
  • Handles 850 conversations daily with capacity for 2,000
  • Monthly cost: £156

That's £2,445 monthly savings. Over 12 months: £29,340 saved.
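The arithmetic above is easy to sanity-check. A minimal sketch, taking the figures from the two lists and assuming a 30-day month:

```python
# Figures from the cost comparison above; a 30-day month is assumed.
conversations_per_day = 850
tokens_per_conversation = 3_400

daily_tokens = conversations_per_day * tokens_per_conversation  # 2,890,000
monthly_tokens = daily_tokens * 30                              # 86,700,000

api_cost_monthly = 2_601    # £, at GPT-4 rates
local_cost_monthly = 156    # £, Hetzner AX102 dedicated server

monthly_savings = api_cost_monthly - local_cost_monthly         # £2,445
annual_savings = monthly_savings * 12                           # £29,340
```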

The break-even happens fast. If you're processing over 15 million tokens monthly, local deployment costs less. Most business agents cross that threshold within the first month.

Building n8n Workflows with Local LLMs

The practical question: How do you actually connect n8n to a local LLM?

We use Ollama for model management and serving. It provides an OpenAI-compatible API endpoint, which means existing n8n workflows need minimal changes.

Basic Local LLM Setup in n8n

Here's the workflow structure we use:

  1. Install Ollama on your server or dedicated instance
  2. Pull your chosen model: ollama pull llama3.1:70b
  3. Ollama automatically serves at http://localhost:11434
  4. In n8n, use the OpenAI node but point the base URL to your Ollama instance

Specific n8n configuration:

In your OpenAI node settings:

  • Base URL: http://your-server-ip:11434/v1
  • Model name: llama3.1:70b
  • API key: ollama (Ollama doesn't validate this but n8n requires a value)

The workflow functions identically to OpenAI-based workflows. Your existing prompts, chains, and logic require zero changes.
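If you want to verify the endpoint outside n8n first, the same configuration can be exercised from a short script. A minimal sketch using only the standard library, assuming Ollama's default address; the `build_chat_request` and `chat` helpers are ours, not part of any library:

```python
import json
import urllib.request

# Ollama's default serving address; swap in your server IP for remote hosts.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload for Ollama."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(payload: dict) -> str:
    """POST the payload to Ollama's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key, but OpenAI-style clients expect one.
            "Authorization": "Bearer ollama",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    payload = build_chat_request("llama3.1:70b", "Where is my order?")
    # Requires a running Ollama instance:
    # print(chat(payload))
```

If this script returns a completion, the n8n OpenAI node will work with the same base URL and model name.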

Customer Service Agent Workflow Example

We built an agent that monitors a shared inbox, categorizes enquiries, and responds to common questions automatically.

n8n workflow structure:

  1. IMAP Email Trigger node monitors inbox every 2 minutes
  2. Local LLM node classifies enquiry into 8 categories
  3. Switch node routes based on category
  4. For categories "shipping", "returns", "product_info": Local LLM node generates response using company knowledge base
  5. For categories "complaint", "refund", "technical": Create ticket in Linear and notify team
  6. Send Email node delivers response or confirmation

The critical difference with local LLM: The email content and customer data never leave the company's server. The classification and response generation happen on their infrastructure.

Response time: 4-7 seconds from email receipt to response sent. Accuracy on classification: 94% after prompt tuning.
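The routing logic in steps 4 and 5 can be sketched in a few lines. The category names come from the workflow above; the function and action names are illustrative:

```python
# Categories from the workflow above; the auto-respond/ticket split
# mirrors steps 4 and 5 of the n8n Switch node.
AUTO_RESPOND = {"shipping", "returns", "product_info"}
CREATE_TICKET = {"complaint", "refund", "technical"}


def route(category: str) -> str:
    """Return the action the Switch node takes for a classified enquiry."""
    if category in AUTO_RESPOND:
        return "generate_response"  # local LLM drafts a reply
    if category in CREATE_TICKET:
        return "create_ticket"      # Linear ticket plus team notification
    return "manual_review"          # unrecognised category: escalate to a human
```

The fallback branch matters in production: classification is 94% accurate, so roughly one enquiry in twenty needs a human route rather than a silent failure.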

Document Processing Workflow with Privacy Requirements

A legal firm needed to extract key dates, parties, and obligations from contracts. These documents contain confidential client information that cannot be transmitted externally.

n8n workflow structure:

  1. Watch folder trigger monitors document upload directory
  2. PDF extraction using pdf-parse
  3. Text chunking node splits document into 2,000 token segments
  4. Loop node processes each chunk through local LLM with extraction prompt
  5. Aggregate node combines extracted data
  6. Structure validation using JSON Schema node
  7. Insert into PostgreSQL database
  8. Notify staff via Slack
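The chunking in step 3 can be sketched as follows. This is a simplified version using a rough characters-per-token heuristic rather than a real tokenizer; the function name and the 4-chars-per-token assumption are ours:

```python
def chunk_text(text: str, max_tokens: int = 2000, chars_per_token: int = 4) -> list[str]:
    """Split text into segments of roughly max_tokens tokens.

    Uses a rough chars-per-token heuristic instead of a real tokenizer,
    and splits on word boundaries so no word straddles two chunks.
    """
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space.
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In production you would chunk with the model's own tokenizer and overlap segments slightly so obligations spanning a chunk boundary aren't lost, but the loop structure is the same.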

Local LLM configuration specifics:

  • Model: Llama 3.1 70B quantized to 4-bit
  • Server: on-premises Dell PowerEdge R750
  • Processing speed: 38 tokens per second
  • Average document processing time: 3.2 minutes for a 40-page contract

The firm processes 60-80 contracts monthly. All data remains on their server. No information transmitted to external services.

Performance: Local LLMs vs OpenAI in Production

The assumption is that local LLMs perform worse than OpenAI's models. Our testing shows different results.

We ran identical prompts through GPT-4, Claude 3.5 Sonnet, and Llama 3.1 70B across three categories: customer service responses, data extraction, and email classification.

Customer service responses (300 test cases):

  • GPT-4: 91% customer satisfaction score
  • Claude 3.5: 93% customer satisfaction score
  • Llama 3.1 70B: 89% customer satisfaction score

Data extraction accuracy (500 documents):

  • GPT-4: 96% field accuracy
  • Claude 3.5: 97% field accuracy
  • Llama 3.1 70B: 94% field accuracy

Email classification (1,000 emails, 8 categories):

  • GPT-4: 97% correct classification
  • Claude 3.5: 96% correct classification
  • Llama 3.1 70B: 94% correct classification

The performance gap exists but it's smaller than expected. For most business automation, 94% accuracy is sufficient when you gain complete data control and eliminate API costs.

When OpenAI's API Still Makes Sense

Local LLMs aren't always the answer. Three scenarios where we still recommend API-based solutions:

Extreme accuracy requirements. When error rates must be under 1%, GPT-4 and Claude consistently outperform local models. A medical coding client needs 99.2% accuracy. We use Claude's API because no local model achieves that consistently.

Low volume with high complexity. If you're processing under 5 million tokens monthly with complex reasoning requirements, API costs are low and performance is higher. An investment research client queries 40 times weekly with multi-step reasoning. API cost: £67 monthly. Not worth the infrastructure overhead.

Rapid prototyping. When building initial agent versions, API access is faster. We prototype with OpenAI's API, validate the workflow, then migrate to local LLMs for production.

Compliance Benefits Beyond Privacy

Local LLM deployment solves problems beyond data privacy.

GDPR right to erasure. When customer data is processed through local LLMs, you control deletion completely. No need to request data deletion from third-party providers. One client faced a right-to-erasure request. We deleted the relevant database entries and confirmed no data existed in external systems. Total time: 20 minutes.

Data processing agreements. Every external API requires a data processing agreement. These take 2-4 weeks to negotiate with legal teams. Local LLMs eliminate this requirement entirely.

Cyber insurance requirements. Insurance providers are adding AI-specific clauses. Several policies now require documentation of where AI-processed data is stored. Local LLM deployment simplifies this documentation significantly.

Getting Started with Local LLM Agents

If you're running AI agents that process business data, here's the evaluation framework we use:

Run local LLMs if:

  • You process over 15 million tokens monthly
  • You handle regulated or confidential data
  • You need guaranteed uptime during peak periods
  • Data residency requirements exist

Stick with APIs if:

  • Token volume is under 5 million monthly
  • Accuracy requirements exceed 98%
  • You lack infrastructure for model hosting
  • You're still in prototyping phase
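The framework above can be expressed as a simple decision helper. The thresholds come from the two lists; the function name, parameters, and the precedence between criteria are our own illustrative choices:

```python
def deployment_recommendation(
    monthly_tokens: int,
    regulated_data: bool,
    needs_guaranteed_uptime: bool,
    data_residency_required: bool,
    accuracy_target: float,
    has_hosting_infrastructure: bool,
    prototyping: bool,
) -> str:
    """Apply the evaluation framework above; thresholds from the lists."""
    # Hard blockers for local hosting are checked first.
    if prototyping or not has_hosting_infrastructure:
        return "api"
    if accuracy_target > 0.98:
        return "api"
    # Low volume with no data constraints: API costs stay negligible.
    if monthly_tokens < 5_000_000 and not (regulated_data or data_residency_required):
        return "api"
    # Any strong local-deployment signal wins from here.
    if (monthly_tokens > 15_000_000 or regulated_data
            or needs_guaranteed_uptime or data_residency_required):
        return "local"
    return "either"  # between thresholds, no hard constraint either way
```

Usage: an agent at 20M tokens monthly handling regulated data with residency requirements comes out "local"; a 3M-token prototype with a 99% accuracy target comes out "api".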

For most established businesses running production agents, local LLM deployment delivers better economics and eliminates privacy concerns.

Moving Your Agents to Local LLMs

We've migrated 47 client agents from API-based to local LLM infrastructure. The process is straightforward.

Average migration time: 4-6 hours per agent. Most of that time is testing and validation, not code changes.

The privacy benefits appear immediately. The cost savings compound monthly. The control over your data infrastructure is permanent.

If you're running AI agents that handle customer data, financial information, or confidential business processes, local LLM deployment isn't optional. It's how you build automation that actually protects your business.

We build and migrate AI agents to local LLM infrastructure for businesses that need privacy, control, and predictable costs. Let's talk about your agent requirements.

Ready to automate?

Book a free automation audit and we'll map your workflows and show you where to start.
