Think about the last time you started coding at a new job. On day one, you read the README, skimmed the coding guidelines, and asked a lot of questions. But you didn’t become productive overnight. Real understanding came from working in the codebase – discovering unwritten conventions, learning which files need to stay in sync, and building a mental model of how everything fits together.
AI coding assistants face the same challenge. Every time you open a new chat session, the assistant has no memory of your previous conversations. It doesn’t remember that your team uses conventional commits, that the API version must match across three files, or that you spent an hour yesterday explaining a tricky database migration. You’re back to square one, every single time.
Agentic memory is the solution to this problem. It gives AI agents the ability to retain and recall information across interactions, allowing them to accumulate knowledge over time rather than starting fresh with each conversation. The result is an assistant that gets progressively better at helping you – much like a new team member who gradually learns the ropes.
Why memory matters
Without memory, every session starts from zero. You re-explain your project’s conventions, constraints, and quirks each time you open a new chat. The obvious fix is to capture that knowledge somewhere the AI can find it – files that describe your coding standards, folder structure, and architectural decisions. That’s a basic form of memory, and it works. But it only captures what you’ve thought to write down, and keeping it current is entirely on you. The AI follows what you’ve documented but can’t learn anything new on its own.
The real promise of agentic memory goes further. When an AI agent can store and recall what it discovers while working, it captures knowledge that may not make it into documentation – which files need to stay in sync, which logging format the team uses, and how the code is structured. Memory also unlocks cross-agent learning. When one AI agent discovers a pattern in your codebase – say, that certain configuration files must stay synchronized – other agents working in the same repository can benefit from that insight. The knowledge doesn’t stay siloed in a single conversation; it becomes shared understanding.
These ideas may sound intuitive, but the industry is just starting to explore what’s possible. The impact is measurable. GitHub’s own evaluation of its Copilot Memory system found a 7-point increase in pull request merge rates when the coding agent had access to memories (90% vs. 83% without) and a 2% improvement in positive feedback on code review comments. These are statistically significant gains that compound over time as the memory pool grows. In my own work, I’ve seen the right memory setup cut the number of tokens needed to reach a solution in half, shaving minutes off my AI sessions.
The four types of agentic memory
Not all memory is created equal. Different kinds of information need different lifespans and scopes. It helps to think of agentic memory as a spectrum, ranging from the most persistent to the most ephemeral, with a different approach suited to each point along it.
Long-term memory
The most persistent form of memory is the set of instructions that shape every interaction with an AI assistant. These are files you create and maintain – they persist indefinitely and apply automatically without the AI needing to “learn” them. Think of them as the house rules you’d post on the wall of a professional kitchen.
In the GitHub Copilot ecosystem, these include repository-wide custom instructions (.github/copilot-instructions.md), path-specific instruction files (.github/instructions/*.instructions.md) that apply only to matching files, prompts, and skills. I covered these in detail in a previous post about customizing Copilot.
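For instance, a repository-wide instructions file is just markdown. Here’s a minimal sketch – the conventions and file names below are illustrative, not prescriptive:

```markdown
<!-- .github/copilot-instructions.md -->
# Project conventions

- Use conventional commits (feat:, fix:, chore:) for all commit messages.
- The API version string must match across package.json, openapi.yaml,
  and src/version.ts. Update all three together.
- Prefer async/await over raw promise chains in TypeScript.
```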
The strength of long-term memory is its reliability – it’s always there, always applied. The trade-off is that it requires manual maintenance. When your conventions change, you need to update these files yourself. The AI won’t do it for you (unless you ask), and stale content can actively mislead.
Medium-term memory
Medium-term memory occupies the space between permanent instructions and disposable session notes. This is information that persists across conversations, often scoped to a specific user or repository, but is more dynamic than instruction files. The key difference is that AI agents can create and update these memories on their own, without you editing configuration files.
For example, you might tell your AI assistant “remember that I prefer tabs over spaces and always use single quotes in JavaScript.” The agent stores this as a note in a local file, and it is automatically loaded into future conversations – even in different workspaces. Similarly, an agent working in a specific repository might note that the project uses the repository pattern for data access, storing that observation where future sessions in the same workspace can pick it up.
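Under the hood, these memories are typically just small note files. As an illustrative sketch (the exact storage format varies by tool), that preference might be saved as:

```markdown
<!-- a user-scoped memory note (illustrative format) -->
- Prefers tabs over spaces for indentation
- Always uses single quotes in JavaScript
```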
This type of memory bridges the gap between “things you’ve explicitly documented” and “things the AI has figured out on its own.” It’s flexible enough to capture details that are often missed in official documentation and specifications, like your personal coding style or a team convention that everyone follows but nobody wrote down.
Short-term memory
Short-term memory is scoped to a single conversation and intentionally disposable. It’s useful when an agent is working through a multi-step task and needs to track progress, intermediate results, or a plan that guides its work. These notes are saved and loaded as one or more agents work, and they’re tailored to the task at hand.
Consider an agent that’s helping you refactor a large module. As it analyzes the code, it might create notes about which functions depend on each other, which tests need updating, and which files it has already modified. These notes help the agent stay organized across multiple tool calls within the same conversation, but they’re not worth preserving long-term. When the session ends, the notes are cleared automatically.
Short-term memory also helps with compaction, the process of freeing up space in the context window – the limited amount of text a Large Language Model (LLM) can hold in its active working memory at one time. When the context fills up, the system asks the LLM to summarize the conversation, and many specifics get lost in that summary. By stashing key findings in short-term memory first, the agent can restore those details if and when they’re needed.
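For instance, the agent’s session notes for that refactor might look something like this sketch (the file name and contents are illustrative):

```markdown
<!-- memories/session/refactor-plan.md (illustrative) -->
## Refactor plan: split the auth module
- Done: extracted token helpers into src/auth/token.ts
- Todo: update imports in src/api/login.ts
- Tests needing updates: auth.test.ts, session.test.ts
```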
In other words, short-term memory is like the sticky note on the monitor – useful right now, in the trash by tomorrow.
Dynamic memory through retrieval
The first three types of memory all involve storing and loading information – either before the work begins or throughout the session as the agent discovers things. Dynamic memory works differently. Instead of being “remembered,” it’s queried on demand from external sources when the agent needs it. This type of memory goes deeper – it augments what the LLM “knows” with facts and details pulled in exactly when they’re needed.
The most common form is Retrieval-Augmented Generation (RAG), where an agent searches a knowledge base for information relevant to the current task and injects the results into its context. Model Context Protocol (MCP) servers are another example – they expose structured resources, tools, and prompt templates that agents can access as needed. While some allow writes, most act as read-only knowledge stores.
Dynamic memory shines when the knowledge base is too large to load upfront or changes frequently. Instead of trying to remember everything, the agent can look things up as needed. The trade-off is latency – retrieving information takes time – and the quality depends heavily on how well the retrieval system matches queries to relevant content.
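To make the mechanics concrete, here’s a minimal sketch of the RAG flow in TypeScript. The searchKnowledgeBase and callModel functions are hypothetical stand-ins for a real vector index and LLM client:

```typescript
// A minimal sketch of retrieval-augmented generation (RAG).
// searchKnowledgeBase and callModel are hypothetical stubs, not a real API.

interface Doc {
  path: string;
  snippet: string;
  score: number; // similarity to the query
}

async function searchKnowledgeBase(query: string, topK: number): Promise<Doc[]> {
  // A real system would embed the query and run a similarity
  // search against an index; stubbed out here.
  return [];
}

async function callModel(prompt: string): Promise<string> {
  return `(model response for: ${prompt.slice(0, 40)}...)`; // stub
}

async function answerWithRetrieval(question: string): Promise<string> {
  // 1. Query the external knowledge base on demand.
  const docs = await searchKnowledgeBase(question, 3);
  // 2. Inject the retrieved snippets into the prompt's context.
  const context = docs
    .map((d) => `Source: ${d.path}\n${d.snippet}`)
    .join("\n---\n");
  const prompt =
    `Use the following project knowledge if relevant:\n${context}\n\n` +
    `Question: ${question}`;
  // 3. Let the model answer with the augmented context.
  return callModel(prompt);
}
```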
A perfect example of this is referring to an issue or work item. By making those searchable or retrievable by the agent, it can find important facts if and when it needs them. They augment what the agent knows when necessary. It’s worth mentioning that there are a lot of tools being created in this space (such as Graphify) that use this approach to improve how the agent reasons about and discovers (or implements) code.
How VS Code and Copilot implement memory
The concepts above aren’t just theoretical. VS Code and GitHub Copilot have shipped two complementary memory systems that put these ideas into practice. They take different approaches to storage, scope, and lifecycle, and understanding both helps you get the most out of each.
The VS Code local memory tool
VS Code includes a built-in memory tool (currently in preview) that stores notes as local files on your machine. It organizes memories into three scopes:
- User memory (/memories/) - Persists across all workspaces and conversations. The first 200 lines are automatically loaded into the agent’s context at the start of every session. This is where personal preferences and general insights live.
- Repository memory (/memories/repo/) - Scoped to the current workspace. Persists across conversations but only applies when you’re working in that specific project. Ideal for codebase conventions, architecture decisions, and build commands.
- Session memory (/memories/session/) - Scoped to the current conversation. Automatically cleared when the chat session ends. Used for task-specific notes and in-progress plans.
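On disk, these scopes map to a simple folder layout. The note file names below are hypothetical:

```
memories/
├── preferences.md        user scope: loaded at the start of every session
├── repo/
│   └── conventions.md    repository scope: this workspace only
└── session/
    └── refactor-plan.md  session scope: cleared when the chat ends
```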
You can explicitly ask the agent to remember something (“remember that our team uses conventional commits”), or the agent may store observations on its own as it works. Everything stays local – no data leaves your machine. You can view all memory files through the Chat: Show Memory Files command and clear them with Chat: Clear All Memory Files.
GitHub Copilot cloud memory
Copilot Memory is a separate, GitHub-hosted system that takes a fundamentally different approach: it’s automatic, and its memories are shared across everyone working in the repository. It’s currently in preview.
The most interesting aspect of Copilot Memory is its architecture. As agents work in your repository – reviewing pull requests, implementing features, or running CLI commands – they automatically capture tightly scoped insights. Each insight is stored as an observation tied to specific code locations – a file and line number. These citations are the key to making the system work.
Before any agent uses a stored memory, it verifies the citations against the current codebase in real time. If the code at those locations has changed in a way that contradicts the memory, or the files no longer exist, the memory is discarded or corrected. GitHub calls this “just-in-time verification,” and it helps to eliminate staleness. Rather than building complex offline curation pipelines to keep memories fresh, the system simply checks whether each memory still holds true at the moment it’s needed.
Memories also auto-expire after 28 days. When an agent verifies a memory and finds it accurate, the memory is recreated, extending its life. Observations that are no longer used gradually fade away. The result is a self-curating knowledge base of the facts that are used most often.
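Here’s a rough sketch of how just-in-time verification and the expiry refresh could work in principle. The data shapes and matching logic are my assumptions for illustration, not GitHub’s actual implementation:

```typescript
// A sketch of just-in-time verification plus expiry refresh.
// The Memory/Citation shapes and matching logic are assumptions,
// not GitHub's actual implementation.
import { existsSync, readFileSync } from "node:fs";

interface Citation {
  file: string;    // path the observation refers to
  line: number;    // 1-indexed line number recorded with the memory
  excerpt: string; // snippet of code the observation was based on
}

interface Memory {
  observation: string;
  citations: Citation[];
  createdAt: Date; // memories auto-expire 28 days after this
}

function isStillValid(memory: Memory): boolean {
  return memory.citations.every((c) => {
    if (!existsSync(c.file)) return false; // file gone: discard the memory
    const lines = readFileSync(c.file, "utf8").split("\n");
    // Tolerate small drift: look for the cited excerpt near the recorded line.
    const nearby = lines.slice(Math.max(0, c.line - 6), c.line + 5);
    return nearby.some((l) => l.includes(c.excerpt));
  });
}

function verifyAndRefresh(memory: Memory): Memory | null {
  if (!isStillValid(memory)) return null; // stale or contradicted: drop it
  // A verified memory is recreated, resetting its 28-day expiry clock.
  return { ...memory, createdAt: new Date() };
}
```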
Copilot Memory is currently used by the cloud coding agent, code review, Copilot CLI, and VS Code. Unlike VS Code’s local memory, that means these insights are available to several types of agents, not just your editor. To keep sessions focused, Copilot retrieves the most recent memories for the repository rather than pushing everything the repository knows into a single prompt.
Choosing between local and cloud memory
The two systems complement each other rather than competing. In practice, you’ll use both. Local memory captures your personal preferences and the working notes for a specific task. Copilot Memory captures shared, verifiable facts like “this repository uses OpenTelemetry for logging, and the details start on line 57 of logger.ts.”
Looking forward
Agentic memory transforms AI coding assistants from tools that forget everything between sessions into collaborators that build genuine understanding over time. The combination of approaches gives you and your AI control over the details it remembers and for how long. As these systems mature – with broader agent integration, smarter retrieval, and richer memory structures – the distinction between “assistant” and “team member” will continue to blur. And that’s an exciting shift for how you build software.
