Connect an agent
One local API. Any assistant.
Agent Memory exposes a small HTTP API on http://127.0.0.1:8765. Anything that can speak HTTP can read from the same memory — Claude Desktop, ChatGPT Desktop, Cursor, Continue.dev, n8n, Zapier, plain shell scripts. This page is the full contract, plus copy-paste examples for each.
Last updated: 28 April 2026 · prices and integration details verified against current product docs.
Why local memory
Why local memory beats cloud memory.
Hosted vector databases and built-in chatbot memory are convenient — until you read the privacy policy, the per-query bill, or try to use them offline. Three reasons we built Agent Memory the way we did:
1. Your project knowledge is your project knowledge.
Decision logs, runbooks, internal docs, source code and PR notes are some of the most sensitive content a company has. The right home for them is your disk, not someone else’s vector database. Agent Memory stores chunks and embeddings under your user-data folder, and the API only listens on 127.0.0.1.
2. The cost curve is wrong.
Hosted vector DBs scale per record and per query. Built-in chatbot memory is bundled into a $20+ / month subscription per assistant — and only helps inside that one product. Agent Memory is a one-off $29 install, free to run, and serves every agent on your machine from the same store. The optional $5 / month Updates Pass adds new skills and is cancellable any time.
3. Memory should outlive any one assistant.
If you switch from ChatGPT to Claude, or add Cursor for coding, you don’t want to lose a year of accumulated project context. Agent Memory is intentionally model-agnostic. The same JSON index works for whichever assistant you use this quarter.
API contract
The local API contract.
All endpoints live under http://127.0.0.1:8765. If that port is already in use, Agent Memory automatically moves to the next available local port and reports it via GET /stats. Agents should read the active port at runtime rather than hard-coding 8765.
GET /health
Lightweight liveness check. Returns { "status": "healthy", "app": "Agent Memory", "updatedAt": "..." }.
GET /stats
Returns the active API port, app version, embedding model name, list of indexed sources, chunk count, whether a guidance profile is set, and the data directory.
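A quick smoke test from a shell. The jq call is an optional convenience; apiPort is the same field the connection-pattern pseudocode reads further down this page:
curl -s http://127.0.0.1:8765/health
curl -s http://127.0.0.1:8765/stats | jq -r .apiPort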
POST /search
The main endpoint agents call. Body:
{ "query": "What did we decide about pdfaa.ai staging?",
"limit": 8 }
Response:
{
  "query": "...",
  "profilePrompt": "Agent Memory user context. Treat this as persistent user guidance before interpreting search results.\n\n## About Me\n...",
  "results": [
    {
      "id": "...",
      "score": 0.81,
      "sourceId": "...",
      "sourceName": "PDF-AA V3",
      "filePath": "C:\\...\\HANDOVER.md",
      "fileName": "HANDOVER.md",
      "chunkIndex": 4,
      "text": "Use pdfaa.ai as V3 staging. ..."
    }
  ]
}
Agents should place `profilePrompt` in their context before the snippets, treating it like a system-prompt-style preface. Then read each `text` field alongside its `filePath` and `score`.
POST /ingest
Index folders or files. Body:
{ "paths": ["C:\\Path\\To\\Repo"],
"sourceName": "Project Name" }
POST /documents
Add a manual note (decision, deployment record, ad-hoc context).
{ "title": "Deployment decision",
"sourceName": "Manual notes",
"text": "Use pdfaa.ai as V3 staging." }
GET /profile · POST /profile
Read or write the About Me / How I Like to Work / Very Important guidance that becomes the `profilePrompt` on every /search response.
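Reading the current guidance is a one-liner. For writes, the field names below are illustrative assumptions, not a documented schema; mirror whatever shape GET /profile returns:
curl -s http://127.0.0.1:8765/profile
# Field names are illustrative assumptions: mirror the shape GET /profile returns
curl -s -X POST http://127.0.0.1:8765/profile \
  -H "Content-Type: application/json" \
  -d '{"aboutMe": "...", "howILikeToWork": "...", "veryImportant": "..."}'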
Connection pattern
The recommended connection pattern.
An assistant should call /search at the start of any non-trivial task, with a query that mirrors what the user just asked. Then:
- Place `profilePrompt` in the assistant context before anything else.
- Append the top N snippets, with `filePath` and `score`, as quoted reference material.
- Treat snippets as guidance — open the actual files for exact details before making changes.
- Optionally, offer to run a Skill (e.g. Project Briefing, Launch Brief, Onboarder) and pull the matching skill recipe from local memory before answering.
// Pseudocode. `userTask` is a short summary of the user's current request.
const stats = await fetch("http://127.0.0.1:8765/stats").then(r => r.json());
const port = stats.apiPort; // read the active port instead of hard-coding 8765
const search = await fetch(`http://127.0.0.1:${port}/search`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ query: userTask, limit: 8 })
}).then(r => r.json());
const context = [
search.profilePrompt,
...search.results.map(r =>
`// ${r.fileName} (score ${r.score.toFixed(2)})\n${r.text}`
)
].filter(Boolean).join("\n\n");
// Send `context` + user task to your LLM of choice.
Claude Desktop
Claude Desktop.
Claude Desktop supports MCP (Model Context Protocol) servers. You can wrap Agent Memory’s HTTP API as a small MCP server, or — for the simplest setup — use Claude’s file-system tools to call curl against the local API at the start of a task.
Quick-start tool prompt to paste into a Claude project:
At the start of every task, run:
curl -s -X POST http://127.0.0.1:8765/search \
-H "Content-Type: application/json" \
-d "{\"query\": \"<TASK_SUMMARY>\", \"limit\": 8}"
Then place the returned `profilePrompt` at the top of your
context, and treat each `text` field as quoted reference
material with its `filePath` and `score`.
ChatGPT Desktop
ChatGPT Desktop.
The ChatGPT desktop app can run shell commands when given the appropriate tool. Add a custom GPT or system prompt that calls Agent Memory at the start of each task:
Whenever the user starts a new task, before answering:
POST http://127.0.0.1:8765/search
body: {"query": "<TASK_SUMMARY>", "limit": 8}
Use `profilePrompt` as system context. Use the ranked snippets
as project memory. Cite `filePath` when you quote a snippet.
If your ChatGPT setup cannot make HTTP calls directly, run a small bridge — for example a minimal Node or Python script that polls the clipboard or watches a folder, calls Agent Memory, and writes the result to a file ChatGPT can read.
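A minimal sketch of that bridge in Node. The folder and file names (bridge/query.txt in, bridge/memory.json out) are illustrative choices, not part of the product:
// bridge.mjs: watch a folder for a query file and answer it from Agent Memory.
// Run with: node bridge.mjs (Node 18+, which ships global fetch).
import { watch, readFileSync, writeFileSync, mkdirSync } from "node:fs";

const DIR = "./bridge"; // illustrative: point this at a folder ChatGPT can read
mkdirSync(DIR, { recursive: true });

watch(DIR, async (_event, file) => {
  if (file !== "query.txt") return; // ignore everything but the query file
  let query = "";
  try { query = readFileSync(`${DIR}/query.txt`, "utf8").trim(); } catch { return; }
  if (!query) return;
  const res = await fetch("http://127.0.0.1:8765/search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, limit: 8 })
  });
  // profilePrompt plus ranked snippets, ready for the assistant to read back in
  writeFileSync(`${DIR}/memory.json`, JSON.stringify(await res.json(), null, 2));
});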
Cursor
Cursor.
Cursor’s .cursorrules file is the cleanest place to wire Agent Memory in. Add a rule that tells the agent to fetch local memory before edits:
# .cursorrules
Before making non-trivial edits, query local memory:
curl -s -X POST http://127.0.0.1:8765/search \
-H "Content-Type: application/json" \
-d '{"query": "<short summary of the task>", "limit": 8}'
Treat `profilePrompt` as a system-prompt-style preface.
Use the ranked `text` snippets to ground your edits before
opening the actual files referenced in `filePath`.
Cursor’s shell tool runs the curl command and the model gets the response in context.
Continue.dev
Continue.dev.
Continue supports custom context providers. A minimal HTTP context provider that calls Agent Memory:
// continue/config.ts
{
name: "agent-memory",
description: "Local Agent Memory search",
type: "submenu",
query: async (q) => {
const r = await fetch("http://127.0.0.1:8765/search", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ query: q, limit: 8 })
}).then(r => r.json());
const header = r.profilePrompt
? `${r.profilePrompt}\n\n---\n\n` : "";
return header + r.results
.map(x => `// ${x.fileName} (score ${x.score.toFixed(2)})\n${x.text}`)
.join("\n\n");
}
}
n8n & Zapier
n8n & Zapier.
For automation pipelines, use a regular HTTP Request node:
- Method: POST
- URL: http://127.0.0.1:8765/search
- Body type: JSON
- Body: { "query": "{{$json.task}}", "limit": 8 }
Wire the response into the next node — typically an LLM node — passing `profilePrompt` as system context and `results[*].text` as memory snippets. Because Agent Memory only listens on 127.0.0.1, the n8n / Zapier desktop runner must run on the same machine as Agent Memory; cloud-only runs cannot reach the local port.
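Inside the downstream LLM node, the mapping can stay in plain n8n expressions. A sketch (adapt the field names to the node you use):
- System prompt: {{ $json.profilePrompt }}
- Memory snippets: {{ $json.results.map(r => r.fileName + ": " + r.text).join("\n\n") }}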
Your own scripts
Your own scripts.
cURL
curl -s -X POST http://127.0.0.1:8765/search \
-H "Content-Type: application/json" \
-d '{"query": "Windows installer reading-order decisions", "limit": 8}'
PowerShell
Invoke-RestMethod `
-Method Post `
-Uri http://127.0.0.1:8765/search `
-ContentType 'application/json' `
-Body '{"query":"Windows installer reading order","limit":8}'
Python
import requests

r = requests.post(
    "http://127.0.0.1:8765/search",
    json={"query": "Cloudflare deployment", "limit": 8},
    timeout=10,
)
data = r.json()
print(data.get("profilePrompt") or "")
for hit in data["results"]:
    print(f"- {hit['fileName']} score={hit['score']:.2f}")
    print(f"  {hit['text'][:200]}...")
Node.js
const res = await fetch("http://127.0.0.1:8765/search", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ query: "deployment decisions", limit: 8 })
});
const data = await res.json();
Vs. cloud memory
Side-by-side comparison.
The same comparison shown on the home page, in more detail. Indicative figures only — hosted vector database pricing changes often, and chatbot memory features differ between products.
| Capability | Hosted vector DB (Pinecone, Weaviate Cloud, Chroma Cloud) | Built-in chatbot memory (ChatGPT, Claude Projects, Gemini) | Agent Memory |
|---|---|---|---|
| Where your data lives | Hosted vector DB in someone else’s cloud | Vendor cloud | JSON index on your disk |
| Works offline | No — every query needs internet | No — every query needs internet | Yes — after first model download |
| Recurring cost | $70 – $500+ / month for production tiers | Bundled in $20+ / month assistant subs | $0 / month (optional $5 / mo Updates Pass) |
| Per-query fee | Per request, plus storage tier | Per token, in your assistant subscription | None |
| Indexes raw repos & folders | You build the ingestion pipeline | No — manual upload only | Built in — point at folders, walk the tree |
| Works across multiple agents | Yes, if you build glue code | Locked to one vendor | Any agent that can call HTTP |
| Built-in skills library | None | Vendor-specific | 18 skills · 3 categories · editable |
| User guidance / system prompt | Not included | Vendor-specific, not portable | Returned as profilePrompt on every /search |
| Setup time | Hours — index, schema, auth, embeddings, glue | Minutes inside one assistant | Minutes — point at folders |
| Licence model | Hosted SaaS subscription | Hosted SaaS subscription | $29 one-off lifetime · permissive components |
Sources: each provider’s public pricing pages and product docs as of 28 April 2026. Agent Memory has no commercial relationship with the providers listed.
Ready when you are
One install. One local API.
Every agent on your machine, finally on the same page.
Lifetime licence for $29. Eighteen skills built in. 30-day money-back. Optional Updates Pass at $5 / month.