Connect an agent

One local API. Any assistant.

Agent Memory exposes a small HTTP API on http://127.0.0.1:8765. Anything that can speak HTTP can read from the same memory — Claude Desktop, ChatGPT Desktop, Cursor, Continue.dev, n8n, Zapier, plain shell scripts. This page is the full contract, plus copy-paste examples for each.

Last updated: 28 April 2026 · prices and integration details verified against current product docs.

Why local memory

Why local memory beats cloud memory.

Hosted vector databases and built-in chatbot memory are convenient — until you read the privacy policy, the per-query bill, or try to use them offline. Three reasons we built Agent Memory the way we did:

1. Your project knowledge is your project knowledge.

Decision logs, runbooks, internal docs, source code and PR notes are some of the most sensitive content a company has. The right home for them is your disk, not someone else’s vector database. Agent Memory stores chunks and embeddings under your user-data folder, and the API only listens on 127.0.0.1.

2. The cost curve is wrong.

Hosted vector DBs scale per record and per query. Built-in chatbot memory is bundled into a $20+ / month subscription per assistant — and only helps inside that one product. Agent Memory is a one-off $29 install, free to run, and serves every agent on your machine from the same store. The optional $5 / month Updates Pass adds new skills and is cancellable any time.

3. Memory should outlive any one assistant.

If you switch from ChatGPT to Claude, or add Cursor for coding, you don’t want to lose a year of accumulated project context. Agent Memory is intentionally model-agnostic. The same JSON index works for whichever assistant you use this quarter.

Bottom line: local memory is cheaper, more private, and more portable. The only thing the cloud does better is host the service for you — and Agent Memory does that on the machine you already own.

API contract

The local API contract.

All endpoints live under http://127.0.0.1:8765. If that port is already in use, Agent Memory automatically moves to the next available local port and reports it via GET /stats. Agents should read the active port at runtime rather than hard-coding 8765.

GET /health

Lightweight liveness check. Returns { "status": "healthy", "app": "Agent Memory", "updatedAt": "..." }.

GET /stats

Returns the active API port, app version, embedding model name, list of indexed sources, chunk count, whether a guidance profile is set, and the data directory.
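Because the port can move, an agent should read it from /stats once at startup and build its base URL from the response. A minimal sketch in Python; the apiPort field name follows the pseudocode later on this page, and the other sample fields are purely illustrative:

```python
def active_base_url(stats: dict) -> str:
    """Build the API base URL from a /stats response, falling back to the default port."""
    port = stats.get("apiPort", 8765)
    return f"http://127.0.0.1:{port}"

# Illustrative /stats payload -- real field names and values may differ.
sample_stats = {"apiPort": 8766, "chunkCount": 1200, "hasProfile": True}
print(active_base_url(sample_stats))  # http://127.0.0.1:8766
```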

POST /search

The main endpoint agents call. Body:

{ "query": "What did we decide about pdfaa.ai staging?",
  "limit": 8 }

Response:

{
  "query": "...",
  "profilePrompt": "Agent Memory user context. Treat this as persistent user guidance before interpreting search results.\n\n## About Me\n...",
  "results": [
    {
      "id": "...", "score": 0.81,
      "sourceId": "...", "sourceName": "PDF-AA V3",
      "filePath": "C:\\...\\HANDOVER.md",
      "fileName": "HANDOVER.md",
      "chunkIndex": 4,
      "text": "Use pdfaa.ai as V3 staging. ..."
    }
  ]
}

Agents should place profilePrompt in their context before the snippets, treating it like a system-prompt-style preface. Then read each text with its filePath and score.

POST /ingest

Index folders or files. Body:

{ "paths": ["C:\\Path\\To\\Repo"],
  "sourceName": "Project Name" }
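The same call sketched in Python with only the standard library; the helper names are mine, not part of the product, and there is no retry or error handling:

```python
import json
import urllib.request

def build_ingest_body(paths: list, source_name: str) -> bytes:
    """Serialize the /ingest request body shown above."""
    return json.dumps({"paths": paths, "sourceName": source_name}).encode("utf-8")

def ingest(paths: list, source_name: str, base: str = "http://127.0.0.1:8765"):
    """POST /ingest and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base}/ingest",
        data=build_ingest_body(paths, source_name),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

For example: `ingest(["C:\\Path\\To\\Repo"], "Project Name")`.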

POST /documents

Add a manual note (decision, deployment record, ad-hoc context).

{ "title": "Deployment decision",
  "sourceName": "Manual notes",
  "text": "Use pdfaa.ai as V3 staging." }

GET /profile · POST /profile

Read or write the About Me / How I Like to Work / Very Important guidance that becomes the profilePrompt on every /search response.


Connection pattern

The recommended connection pattern.

An assistant should call /search at the start of any non-trivial task, with a query that mirrors what the user just asked. Then:

  • Place profilePrompt in the assistant context before anything else.
  • Append the top N snippets, with filePath and score, as quoted reference material.
  • Treat snippets as guidance — open the actual files for exact details before making changes.
  • Optionally, offer to run a Skill (e.g. Project Briefing, Launch Brief, Onboarder) and pull the matching skill recipe from local memory before answering.
// Pseudocode (JavaScript): read the active port, then search
const stats   = await fetch("http://127.0.0.1:8765/stats").then(r => r.json());
const port    = stats.apiPort ?? 8765; // fall back to the default port
const search  = await fetch(`http://127.0.0.1:${port}/search`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: userTask, limit: 8 })
}).then(r => r.json());

const context = [
  search.profilePrompt,
  ...search.results.map(r =>
    `// ${r.fileName} (score ${r.score.toFixed(2)})\n${r.text}`
  )
].filter(Boolean).join("\n\n");

// Send `context` + user task to your LLM of choice.

Claude Desktop

Claude Desktop.

Claude Desktop supports MCP (Model Context Protocol) servers. You can wrap Agent Memory’s HTTP API as a small MCP server, or — for the simplest setup — use Claude’s file-system tools to call curl against the local API at the start of a task.

Quick-start tool prompt to paste into a Claude project:

At the start of every task, run:

  curl -s -X POST http://127.0.0.1:8765/search \
    -H "Content-Type: application/json" \
    -d "{\"query\": \"<TASK_SUMMARY>\", \"limit\": 8}"

Then place the returned `profilePrompt` at the top of your
context, and treat each `text` field as quoted reference
material with its `filePath` and `score`.

ChatGPT Desktop

ChatGPT Desktop.

The ChatGPT desktop app can run shell commands when given the appropriate tool. Add a custom GPT or system prompt that calls Agent Memory at the start of each task:

Whenever the user starts a new task, before answering:

  POST http://127.0.0.1:8765/search
  body: {"query": "<TASK_SUMMARY>", "limit": 8}

Use `profilePrompt` as system context. Use the ranked snippets
as project memory. Cite `filePath` when you quote a snippet.

If your ChatGPT setup cannot make HTTP calls directly, run a small bridge — for example a minimal Node or Python script that polls the clipboard or watches a folder, calls Agent Memory, and writes the result to a file ChatGPT can read.
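One way to sketch that bridge in Python: a watcher that polls a query file and writes search results next to it. The file names, the polling approach, and every helper name here are my assumptions, not part of the product:

```python
import json
import time
import urllib.request
from pathlib import Path

def search_memory(query: str, base: str = "http://127.0.0.1:8765") -> dict:
    """POST /search and return the parsed JSON response."""
    body = json.dumps({"query": query, "limit": 8}).encode("utf-8")
    req = urllib.request.Request(f"{base}/search", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def bridge_once(query_file: Path, result_file: Path, do_search=search_memory,
                last_query=None):
    """One poll step: if query_file holds a new query, search and write the result.
    Returns the query that was handled, so the caller can skip repeats."""
    if not query_file.exists():
        return last_query
    query = query_file.read_text(encoding="utf-8").strip()
    if not query or query == last_query:
        return last_query
    result_file.write_text(json.dumps(do_search(query), indent=2), encoding="utf-8")
    return query

def run_bridge(query_file: Path, result_file: Path, poll_seconds: float = 1.0):
    """Run forever, polling once per poll_seconds."""
    last = None
    while True:
        last = bridge_once(query_file, result_file, last_query=last)
        time.sleep(poll_seconds)
```

ChatGPT then reads the result file (e.g. `memory-result.json`) as project memory at the start of each task.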


Cursor

Cursor.

Cursor’s .cursorrules file is the cleanest place to wire Agent Memory in. Add a rule that tells the agent to fetch local memory before edits:

# .cursorrules

Before making non-trivial edits, query local memory:

  curl -s -X POST http://127.0.0.1:8765/search \
    -H "Content-Type: application/json" \
    -d '{"query": "<short summary of the task>", "limit": 8}'

Treat `profilePrompt` as a system-prompt-style preface.
Use the ranked `text` snippets to ground your edits before
opening the actual files referenced in `filePath`.

Cursor’s shell tool runs the curl command and the model gets the response in context.


Continue.dev

Continue.dev.

Continue supports custom context providers. A minimal HTTP context provider that calls Agent Memory:

// continue/config.ts (sketch; adapt the provider shape to your Continue version)
{
  name: "agent-memory",
  description: "Local Agent Memory search",
  type: "submenu",
  query: async (q) => {
    const res = await fetch("http://127.0.0.1:8765/search", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: q, limit: 8 })
    });
    const data = await res.json();

    const header = data.profilePrompt
      ? `${data.profilePrompt}\n\n---\n\n` : "";
    return header + data.results
      .map(x => `// ${x.fileName} (score ${x.score.toFixed(2)})\n${x.text}`)
      .join("\n\n");
  }
}

n8n & Zapier

n8n & Zapier.

For automation pipelines, use a regular HTTP Request node:

  • Method: POST
  • URL: http://127.0.0.1:8765/search
  • Body type: JSON
  • Body: { "query": "{{$json.task}}", "limit": 8 }

Wire the response into the next node — typically an LLM node — passing profilePrompt as system context and results[*].text as memory snippets. Because Agent Memory only listens on 127.0.0.1, the n8n / Zapier desktop runner must run on the same machine as Agent Memory; cloud-only runs cannot reach the local port.
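That wiring amounts to a small transform from the /search response into the LLM node's two inputs. Sketched here in Python, with field names taken from the /search response shown earlier (the function name is mine):

```python
def to_llm_inputs(search_response: dict, top_n: int = 5):
    """Split a /search response into (system context, memory snippets) for an LLM step."""
    system = search_response.get("profilePrompt") or ""
    snippets = [
        f"{r['filePath']} (score {r['score']:.2f}):\n{r['text']}"
        for r in search_response.get("results", [])[:top_n]
    ]
    return system, "\n\n".join(snippets)

# Illustrative response shaped like the /search example above.
sample = {
    "profilePrompt": "Agent Memory user context. ...",
    "results": [{"filePath": "C:\\...\\HANDOVER.md", "score": 0.81,
                 "text": "Use pdfaa.ai as V3 staging."}],
}
system, memory = to_llm_inputs(sample)
```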


Your own scripts

Your own scripts.

cURL

curl -s -X POST http://127.0.0.1:8765/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Windows installer reading-order decisions", "limit": 8}'

PowerShell

Invoke-RestMethod `
  -Method Post `
  -Uri http://127.0.0.1:8765/search `
  -ContentType 'application/json' `
  -Body '{"query":"Windows installer reading order","limit":8}'

Python

import requests

r = requests.post("http://127.0.0.1:8765/search",
                  json={"query": "Cloudflare deployment", "limit": 8},
                  timeout=10)
data = r.json()

print(data.get("profilePrompt") or "")
for hit in data["results"]:
    print(f"- {hit['fileName']}  score={hit['score']:.2f}")
    print(f"  {hit['text'][:200]}...")

Node.js

const res = await fetch("http://127.0.0.1:8765/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "deployment decisions", limit: 8 })
});
const data = await res.json();

Vs. cloud memory

Side-by-side comparison.

The same comparison shown on the home page, in more detail. Indicative figures only — hosted vector database pricing changes often, and chatbot memory features differ between products.

| Capability | Hosted vector DB (Pinecone, Weaviate Cloud, Chroma Cloud) | Built-in chatbot memory (ChatGPT, Claude Projects, Gemini) | Agent Memory |
| --- | --- | --- | --- |
| Where your data lives | Hosted vector DB in someone else’s cloud | Vendor cloud | JSON index on your disk |
| Works offline | No — every query needs internet | No — every query needs internet | Yes — after first model download |
| Recurring cost | $70 – $500+ / month for production tiers | Bundled in $20+ / month assistant subs | $0 / month (optional $5 / mo Updates Pass) |
| Per-query fee | Per request, plus storage tier | Per token, in your assistant subscription | None |
| Indexes raw repos & folders | You build the ingestion pipeline | No — manual upload only | Built in — point at folders, walk the tree |
| Works across multiple agents | Yes, if you build glue code | Locked to one vendor | Any agent that can call HTTP |
| Built-in skills library | None | Vendor-specific | 18 skills · 3 categories · editable |
| User guidance / system prompt | Not included | Vendor-specific, not portable | Returned as profilePrompt on every /search |
| Setup time | Hours — index, schema, auth, embeddings, glue | Minutes inside one assistant | Minutes — point at folders |
| Licence model | Hosted SaaS subscription | Hosted SaaS subscription | $29 one-off lifetime · permissive components |

Sources: each provider’s public pricing pages and product docs as of 28 April 2026. Agent Memory has no commercial relationship with the providers listed.


Ready when you are

One install. One local API.
Every agent on your machine, finally on the same page.

Lifetime licence for $29. Eighteen skills built in. 30-day money-back. Optional Updates Pass at $5 / month.