Cache
Semantic caching layer for LLM responses — exact match, semantic similarity, and hybrid strategies.
Installation
pnpm add @lov3kaizen/agentsea-cache
For semantic matching, also install the embeddings package:
pnpm add @lov3kaizen/agentsea-embeddings
Quick Start
Create a SemanticCache with a store and a match strategy, then wrap your LLM calls. Identical requests hit the cache via hash-based exact matching:
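import OpenAI from 'openai';
import {
SemanticCache,
MemoryCacheStore,
ExactMatchStrategy,
} from '@lov3kaizen/agentsea-cache';
// OpenAI client used by the wrapped call below (reads OPENAI_API_KEY from the environment)
const openai = new OpenAI();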
// Create cache with memory store and exact matching
const cache = new SemanticCache(
{
defaultTTL: 3600, // 1 hour
matchStrategy: 'exact',
},
new MemoryCacheStore({ type: 'memory', maxEntries: 10000 }),
new ExactMatchStrategy(),
);
// Wrap your LLM call
const response = await cache.wrap(
{
model: 'gpt-5.5',
messages: [{ role: 'user', content: 'What is the capital of France?' }],
},
async (request) => {
// Your LLM call here
return await openai.chat.completions.create(request);
},
);
console.log('Cached:', response._cache?.hit);
Exact Match Caching
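Conceptually, exact matching turns a request into a deterministic key. The following is a hypothetical, dependency-free sketch, not the library's ExactMatchStrategy. `normalizeRequest`, `fnv1a32`, and `exactKey` are illustrative names, and the normalization rules (trimming and collapsing whitespace, fixing property order) are assumptions.

```typescript
// Hypothetical sketch: not the library's ExactMatchStrategy.
interface ChatMessage { role: string; content: string }
interface ChatRequest { model: string; messages: ChatMessage[]; temperature?: number }

// Normalize details that should not change the cache key:
// trim and collapse whitespace, and serialize properties in a fixed order.
function normalizeRequest(req: ChatRequest): string {
  return JSON.stringify({
    model: req.model,
    messages: req.messages.map((m) => ({
      role: m.role,
      content: m.content.trim().replace(/\s+/g, ' '),
    })),
    temperature: req.temperature ?? null,
  });
}

// 32-bit FNV-1a over UTF-16 code units. A production cache would use a
// stronger hash (for example SHA-256) to make collisions negligible.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function exactKey(req: ChatRequest): string {
  return `exact:${fnv1a32(normalizeRequest(req))}`;
}
```

Whitespace-only differences map to the same key. Any difference that survives normalization, such as a different question or model, produces a different key and therefore a miss.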
ExactMatchStrategy uses a hash of the normalized request so identical requests return the cached response instantly. Pair it with any store and set matchStrategy to 'exact':
import {
SemanticCache,
MemoryCacheStore,
ExactMatchStrategy,
} from '@lov3kaizen/agentsea-cache';
const cache = new SemanticCache(
{ defaultTTL: 3600, matchStrategy: 'exact' },
new MemoryCacheStore({ type: 'memory', maxEntries: 10000 }),
new ExactMatchStrategy(),
);
Semantic Similarity Caching
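Semantic matching compares embedding vectors instead of keys. This sketch assumes cosine similarity as the score and a linear scan over stored entries. `cosineSimilarity` and `findSemanticMatch` are hypothetical helpers, not the library's SimilarityEngine API.

```typescript
// Hypothetical sketch of semantic lookup; the library's SimilarityEngine
// and SemanticMatchStrategy may score and search differently.
type Vector = number[];

function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface SemanticEntry<T> { embedding: Vector; value: T }

// Linear scan for the most similar entry. It is returned only if its score
// reaches the threshold. Vector databases typically replace this scan with
// an approximate nearest-neighbor index.
function findSemanticMatch<T>(
  query: Vector,
  entries: SemanticEntry<T>[],
  threshold: number,
): { value: T; similarity: number } | undefined {
  let best: { value: T; similarity: number } | undefined;
  for (const entry of entries) {
    const similarity = cosineSimilarity(query, entry.embedding);
    if (!best || similarity > best.similarity) best = { value: entry.value, similarity };
  }
  return best && best.similarity >= threshold ? best : undefined;
}
```

A lookup returns the best-scoring entry only when its score reaches the threshold. Raising similarityThreshold therefore trades hit rate for precision.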
Use SemanticMatchStrategy with a SimilarityEngine backed by an embedding provider to match semantically similar queries. Set a similarityThreshold to control how close a match must be:
import {
SemanticCache,
MemoryCacheStore,
SemanticMatchStrategy,
SimilarityEngine,
} from '@lov3kaizen/agentsea-cache';
import { OpenAIProvider } from '@lov3kaizen/agentsea-embeddings';
// Create embedding provider
const embeddingProvider = new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
// Create similarity engine
const similarity = new SimilarityEngine({
provider: embeddingProvider,
cacheEmbeddings: true,
});
const cache = new SemanticCache(
{
defaultTTL: 3600,
similarityThreshold: 0.92, // minimum similarity score for a cache hit
matchStrategy: 'semantic',
},
new MemoryCacheStore({ type: 'memory' }),
new SemanticMatchStrategy(),
similarity,
);
Hybrid Strategy
HybridMatchStrategy tries exact matching first and falls back to semantic similarity. This combines the low latency of exact hits with the higher hit rate of semantic matching. Inspect _cache on the response for hit status and similarity score:
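The exact-then-semantic order can be sketched as below. This is a hypothetical illustration, not HybridMatchStrategy. Here the exact key is the raw prompt string, and `toyEmbed` is a keyword-count stand-in for a real embedding model, chosen only so the example runs offline.

```typescript
// Hypothetical sketch of exact-then-semantic lookup; not the library's
// HybridMatchStrategy. `embed` stands in for an embedding model.
type Embed = (text: string) => number[];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface HybridHit<T> { value: T; kind: 'exact' | 'semantic'; similarity: number }

class HybridLookup<T> {
  private exact = new Map<string, T>();
  private vectors: { embedding: number[]; value: T }[] = [];
  constructor(private embed: Embed, private threshold: number) {}

  set(prompt: string, value: T): void {
    this.exact.set(prompt, value);
    this.vectors.push({ embedding: this.embed(prompt), value });
  }

  get(prompt: string): HybridHit<T> | undefined {
    // 1) Exact lookup first: no embedding call is needed on a hit.
    if (this.exact.has(prompt)) {
      return { value: this.exact.get(prompt) as T, kind: 'exact', similarity: 1 };
    }
    // 2) Fall back to the most similar stored prompt at or above the threshold.
    const query = this.embed(prompt);
    let best: HybridHit<T> | undefined;
    for (const entry of this.vectors) {
      const similarity = cosine(query, entry.embedding);
      if (similarity >= this.threshold && (!best || similarity > best.similarity)) {
        best = { value: entry.value, kind: 'semantic', similarity };
      }
    }
    return best;
  }
}

// Toy embedding for illustration only: counts a few fixed keywords.
const VOCAB = ['capital', 'france', 'paris', 'weather', 'today'];
const toyEmbed: Embed = (text) => {
  const words = text.toLowerCase().split(/[^a-z]+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
};
```

With this toy embedder, "What's France's capital city?" maps to the same vector as "What is the capital of France?" and returns as a semantic hit. An unrelated question about the weather scores 0 and misses.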
import {
SemanticCache,
MemoryCacheStore,
HybridMatchStrategy,
SimilarityEngine,
} from '@lov3kaizen/agentsea-cache';
import { OpenAIProvider } from '@lov3kaizen/agentsea-embeddings';
const similarity = new SimilarityEngine({
provider: new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
}),
cacheEmbeddings: true,
});
const cache = new SemanticCache(
{
defaultTTL: 3600,
similarityThreshold: 0.92,
matchStrategy: 'hybrid', // Try exact first, then semantic
},
new MemoryCacheStore({ type: 'memory' }),
new HybridMatchStrategy(),
similarity,
);
// llmCall is your (request) => Promise<response> function, e.g. the OpenAI call from Quick Start
const response1 = await cache.wrap(
{
model: 'gpt-5.5',
messages: [{ role: 'user', content: 'What is the capital of France?' }],
},
llmCall,
);
// Different wording, same meaning: expected to hit via semantic similarity
const response2 = await cache.wrap(
{
model: 'gpt-5.5',
messages: [{ role: 'user', content: "What's France's capital city?" }],
},
llmCall,
);
console.log('Second call cached:', response2._cache?.hit); // true if similarity >= threshold
console.log('Similarity:', response2._cache?.similarity); // e.g. ~0.95
Wrap Options
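A read-through wrapper with per-call options might look like the following hypothetical sketch. The semantics of skipCache and forceRefresh used here are assumptions, not confirmed library behavior: skipCache neither reads nor writes the cache, while forceRefresh skips the read but stores the fresh result. `ReadThroughCache`, `WrapOptionsSketch`, and the injectable clock are illustrative names.

```typescript
// Hypothetical read-through wrapper illustrating per-call options.
// Assumed semantics (not confirmed for the library): skipCache neither
// reads nor writes; forceRefresh skips the read but stores the new result.
interface WrapOptionsSketch { ttl?: number; skipCache?: boolean; forceRefresh?: boolean }
interface StoredEntry<V> { value: V; expiresAt: number }

class ReadThroughCache<V> {
  private entries = new Map<string, StoredEntry<V>>();
  // defaultTTL is in seconds; `now` returns milliseconds and is injectable for tests.
  constructor(private defaultTTL: number, private now: () => number = Date.now) {}

  async wrap(
    key: string,
    compute: () => Promise<V>,
    opts: WrapOptionsSketch = {},
  ): Promise<{ value: V; hit: boolean }> {
    if (opts.skipCache) return { value: await compute(), hit: false };

    const cached = this.entries.get(key);
    if (!opts.forceRefresh && cached && cached.expiresAt > this.now()) {
      return { value: cached.value, hit: true };
    }

    const value = await compute();
    const ttl = opts.ttl ?? this.defaultTTL; // per-call TTL overrides the default
    const expiresAt = ttl > 0 ? this.now() + ttl * 1000 : Infinity; // 0 = no expiry
    this.entries.set(key, { value, expiresAt });
    return { value, hit: false };
  }
}
```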
The wrap method accepts per-call WrapOptions to override the TTL, tag entries for grouped invalidation, scope an entry to a namespace, bypass the cache entirely, or force a refresh:
interface WrapOptions {
ttl?: number; // TTL in seconds (overrides defaultTTL)
tags?: string[]; // Tags for grouped invalidation
namespace?: string; // Namespace override
skipCache?: boolean; // Bypass the cache for this call
forceRefresh?: boolean; // Refresh the cached entry
}
const response = await cache.wrap(request, llmCall, {
ttl: 1800,
tags: ['support', 'faq'],
});
Streaming Replay
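Record-and-replay can be sketched with async generators. On a miss, chunks pass through to the caller while being buffered, and the buffer is stored only after the stream finishes. This hypothetical `StreamRecorder` mimics the spirit of minLengthToCache and cacheIncomplete: false. It is not StreamCache's implementation, and it keys streams by a caller-supplied string.

```typescript
// Hypothetical record-and-replay sketch, not StreamCache itself.
interface Chunk { content: string }

class StreamRecorder {
  private recorded = new Map<string, Chunk[]>();
  constructor(private minLengthToCache = 50) {}

  async *wrap(key: string, source: () => AsyncIterable<Chunk>): AsyncGenerator<Chunk> {
    const cached = this.recorded.get(key);
    if (cached) {
      // Replay: yield the stored chunks in their original order.
      for (const chunk of cached) yield chunk;
      return;
    }
    const buffer: Chunk[] = [];
    let length = 0;
    for await (const chunk of source()) {
      buffer.push(chunk);
      length += chunk.content.length;
      yield chunk; // pass through to the caller while recording
    }
    // Reached only if the source completed and the consumer read it to the
    // end, so incomplete or aborted streams are never stored.
    if (length >= this.minLengthToCache) this.recorded.set(key, buffer);
  }
}
```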
StreamCache records streaming LLM responses and replays cached streams transparently chunk by chunk:
import { StreamCache, MemoryCacheStore } from '@lov3kaizen/agentsea-cache';
const store = new MemoryCacheStore({ type: 'memory' });
const streamCache = new StreamCache(store, {
minLengthToCache: 50,
cacheIncomplete: false,
streamTtl: 3600,
});
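// Wrap streaming calls (llm and messages are placeholders for your
// streaming client and chat messages)
const stream = streamCache.wrapStream('gpt-5.5', messages, async function* () {
for await (const chunk of llm.stream({ model: 'gpt-5.5', messages })) {
yield chunk;
}
});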
// Cached streams are replayed transparently
for await (const chunk of stream) {
process.stdout.write(chunk.content);
}
Backends & Stores
Swap the store to change where cache entries live. Memory, Redis, SQLite, and Pinecone backends are available, each configured with its own store config object:
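Swapping backends works when every store satisfies the same contract. The `KVStore` interface below is a hypothetical, string-valued minimum (the library's store interface is not shown in this section). It comes with an in-memory implementation and a wrapper that namespaces keys, similar in spirit to the keyPrefix option in the Redis config.

```typescript
// Hypothetical minimal store contract; the library's real store
// interface may differ.
interface KVStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<boolean>;
}

// In-memory backend.
class MapStore implements KVStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key); }
  async set(key: string, value: string) { this.data.set(key, value); }
  async delete(key: string) { return this.data.delete(key); }
}

// Wraps any KVStore and prefixes its keys, so several logical caches can
// share one backend without colliding.
class PrefixedStore implements KVStore {
  constructor(private inner: KVStore, private prefix: string) {}
  private k(key: string) { return `${this.prefix}:${key}`; }
  get(key: string) { return this.inner.get(this.k(key)); }
  set(key: string, value: string) { return this.inner.set(this.k(key), value); }
  delete(key: string) { return this.inner.delete(this.k(key)); }
}
```

Code written against `KVStore` does not change when the backend is swapped; only the constructor call does.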
import {
MemoryCacheStore,
RedisCacheStore,
PineconeCacheStore,
} from '@lov3kaizen/agentsea-cache';
// In-memory with LRU eviction
const memoryStore = new MemoryCacheStore({
type: 'memory',
maxEntries: 10000,
maxSizeBytes: 1024 * 1024 * 1024, // 1GB
evictionPolicy: 'lru',
});
// Redis
const redisStore = new RedisCacheStore({
type: 'redis',
url: 'redis://localhost:6379',
keyPrefix: 'llm-cache',
});
// Pinecone (for semantic vector search)
const pineconeStore = new PineconeCacheStore({
type: 'pinecone',
apiKey: process.env.PINECONE_API_KEY,
index: 'llm-cache',
namespace: 'production',
});
Multi-Tier Caching
TieredCacheStore composes multiple stores into an L1/L2/L3 hierarchy with write-through and promotion on hit. A typical setup keeps a fast in-memory L1 in front of a shared Redis L2:
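The lookup and write paths of a tiered cache can be sketched as follows. This hypothetical `TieredLookup` uses plain Map objects as tiers and omits per-tier TTLs. It illustrates only promotion on hit (copying a lower-tier hit into faster tiers) and write-through (writing to every tier), not TieredCacheStore's actual behavior.

```typescript
// Hypothetical sketch of tiered lookup, not TieredCacheStore itself.
// Tiers are ordered fastest-first (lowest priority number first).
interface Tier { name: string; priority: number; data: Map<string, string> }

class TieredLookup {
  private tiers: Tier[];
  constructor(tiers: Tier[], private writeThrough = true, private promoteOnHit = true) {
    this.tiers = [...tiers].sort((a, b) => a.priority - b.priority);
  }

  get(key: string): { value: string; tier: string } | undefined {
    for (let i = 0; i < this.tiers.length; i++) {
      const value = this.tiers[i].data.get(key);
      if (value === undefined) continue;
      // Promotion: copy the hit into every faster tier so the next read is cheaper.
      if (this.promoteOnHit) {
        for (let j = 0; j < i; j++) this.tiers[j].data.set(key, value);
      }
      return { value, tier: this.tiers[i].name };
    }
    return undefined;
  }

  set(key: string, value: string): void {
    // Write-through writes every tier; otherwise only the fastest tier is written.
    const targets = this.writeThrough ? this.tiers : this.tiers.slice(0, 1);
    for (const tier of targets) tier.data.set(key, value);
  }
}
```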
import {
TieredCacheStore,
MemoryCacheStore,
RedisCacheStore,
} from '@lov3kaizen/agentsea-cache';
const memoryStore = new MemoryCacheStore({ type: 'memory' });
const redisStore = new RedisCacheStore({
type: 'redis',
url: 'redis://localhost:6379',
});
const tieredStore = new TieredCacheStore({
type: 'tiered',
tiers: [
{ name: 'l1-memory', priority: 1, store: memoryStore, ttl: 300 },
{ name: 'l2-redis', priority: 2, store: redisStore, ttl: 3600 },
],
writeThrough: true,
promoteOnHit: true,
});
TTL & Eviction
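TTL expiry and LRU eviction can be combined in a few lines because a JavaScript Map iterates keys in insertion order. This hypothetical `LruTtlCache` is a sketch, not MemoryCacheStore. It bounds the entry count only (no byte-size limit) and checks expiry lazily on read.

```typescript
// Hypothetical sketch, not MemoryCacheStore's implementation.
// Deleting and re-inserting a key on access moves it to the end of the
// Map's iteration order, so the first key is always the least recently used.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(
    private maxEntries: number,
    private defaultTTL: number, // seconds; 0 = no expiry
    private now: () => number = Date.now, // milliseconds, injectable for tests
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily drop expired entries
      return undefined;
    }
    this.entries.delete(key);
    this.entries.set(key, entry); // mark as most recently used
    return entry.value;
  }

  set(key: string, value: V, ttl = this.defaultTTL): void {
    this.entries.delete(key);
    const expiresAt = ttl > 0 ? this.now() + ttl * 1000 : Infinity;
    this.entries.set(key, { value, expiresAt });
    // Evict least recently used entries until we are back under the limit.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```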
Set a defaultTTL in seconds on the cache config (0 = no expiry) and override per call via WrapOptions.ttl. The memory store evicts entries with its configured evictionPolicy when maxEntries or maxSizeBytes is exceeded:
import {
SemanticCache,
MemoryCacheStore,
ExactMatchStrategy,
} from '@lov3kaizen/agentsea-cache';
const cache = new SemanticCache(
{
defaultTTL: 3600, // entries expire after 1 hour
maxEntries: 10000,
matchStrategy: 'exact',
},
new MemoryCacheStore({
type: 'memory',
maxEntries: 10000,
maxSizeBytes: 1024 * 1024 * 1024,
evictionPolicy: 'lru',
}),
new ExactMatchStrategy(),
);
Invalidation
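Pattern and tag invalidation can be sketched with a glob-to-regex conversion and a tag-to-keys index. `TaggedStore` and `globToRegExp` are hypothetical. The sketch assumes `*` matches any run of characters and supports no other glob syntax.

```typescript
// Hypothetical sketch, not the library's invalidateByPattern/invalidateByTags.
// Escape regex metacharacters (except '*'), then turn '*' into '.*'.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

class TaggedStore {
  private values = new Map<string, string>();
  private tagIndex = new Map<string, Set<string>>(); // tag -> keys

  set(key: string, value: string, tags: string[] = []): void {
    this.values.set(key, value);
    for (const tag of tags) {
      let keys = this.tagIndex.get(tag);
      if (!keys) {
        keys = new Set<string>();
        this.tagIndex.set(tag, keys);
      }
      keys.add(key);
    }
  }

  has(key: string): boolean {
    return this.values.has(key);
  }

  invalidateByPattern(pattern: string): number {
    const re = globToRegExp(pattern);
    const doomed = Array.from(this.values.keys()).filter((k) => re.test(k));
    doomed.forEach((k) => this.values.delete(k));
    return doomed.length;
  }

  // Stale keys may linger in other tags' sets; deleting them later is a no-op.
  invalidateByTags(tags: string[]): number {
    let removed = 0;
    for (const tag of tags) {
      this.tagIndex.get(tag)?.forEach((k) => {
        if (this.values.delete(k)) removed++;
      });
      this.tagIndex.delete(tag);
    }
    return removed;
  }
}
```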
Invalidate stale entries by key pattern or by the tags you attached at write time, or clear the cache entirely:
// Remove entries matching a pattern
await cache.invalidateByPattern('user:123:*');
// Remove all entries tagged 'faq'
await cache.invalidateByTags(['faq']);
// Delete a single entry by its cache key (key is one you already hold), or clear everything
await cache.delete(key);
await cache.clear();
Analytics
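The arithmetic behind a hit-rate and cost-savings report can be sketched as follows. This assumes a cache hit avoids the entire LLM call (embedding costs for semantic lookups are ignored) and uses placeholder per-million-token prices. `summarize` and `CallRecord` are hypothetical names; the library's getStats and getCostSavingsReport may compute these figures differently.

```typescript
// Hypothetical sketch of how hit rate and savings can be derived.
// Prices are per million tokens and are placeholders, not real model prices.
interface CallRecord { hit: boolean; promptTokens: number; completionTokens: number }

function summarize(calls: CallRecord[], inputPricePerM: number, outputPricePerM: number) {
  const hits = calls.filter((c) => c.hit);
  const costOf = (c: CallRecord) =>
    (c.promptTokens * inputPricePerM + c.completionTokens * outputPricePerM) / 1e6;
  const totalCost = calls.reduce((sum, c) => sum + costOf(c), 0); // cost with no cache at all
  const savedCost = hits.reduce((sum, c) => sum + costOf(c), 0); // cost avoided by cache hits
  return {
    hitRate: calls.length ? hits.length / calls.length : 0,
    tokensSaved: hits.reduce((sum, c) => sum + c.promptTokens + c.completionTokens, 0),
    costSavingsUSD: savedCost,
    reductionPercent: totalCost ? (savedCost / totalCost) * 100 : 0,
  };
}
```

For example, two hits out of four calls give a 50% hit rate. The cost reduction is 50% only if every call costs the same; in general it is weighted by each call's token count.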
Track hit rate, tokens saved, and cost reduction with getStats and the detailed getAnalytics report:
// Get statistics
const stats = cache.getStats();
console.log(`Hit Rate: ${(stats.hitRate * 100).toFixed(1)}%`);
console.log(`Tokens Saved: ${stats.tokensSaved.toLocaleString()}`);
console.log(`Cost Savings: $${stats.costSavingsUSD.toFixed(2)}`);
// Get detailed analytics
const analytics = cache.getAnalytics();
const report = analytics.getCostSavingsReport();
console.log(`Reduction: ${report.reductionPercent.toFixed(1)}%`);
AgentSea Integration
Wrap any provider with CachedProvider, or add caching to an agent pipeline with CacheMiddleware:
import { CachedProvider, CacheMiddleware } from '@lov3kaizen/agentsea-cache';
// Transparent caching around a provider (anthropicProvider and semanticCache are created elsewhere)
const cachedProvider = new CachedProvider({
provider: anthropicProvider,
cache: semanticCache,
skipModels: ['gpt-4-vision'], // Don't cache vision models
});
const response = await cachedProvider.complete({
model: 'claude-sonnet-4-6',
messages: [{ role: 'user', content: 'Hello' }],
});
// Caching middleware for agent pipelines
const middleware = new CacheMiddleware({
cache: semanticCache,
skipToolRequests: true, // Don't cache tool-using requests
defaultTTL: 1800,
});
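The decorator idea behind CachedProvider can be sketched against a hypothetical `Provider` interface. The wrapper exposes the same `complete` method, bypasses the cache for listed models and for requests that carry tools, and otherwise memoizes by model and messages. None of these type or class names come from the library.

```typescript
// Hypothetical sketch of a caching provider decorator; the interfaces are
// illustrative, not the library's types.
interface CompletionRequest {
  model: string;
  messages: { role: string; content: string }[];
  tools?: unknown[];
}
interface CompletionResponse { text: string }
interface Provider { complete(req: CompletionRequest): Promise<CompletionResponse> }

class CachingProvider implements Provider {
  private cache = new Map<string, CompletionResponse>();
  constructor(
    private inner: Provider,
    private skipModels: string[] = [],
    private skipToolRequests = true,
  ) {}

  async complete(req: CompletionRequest): Promise<CompletionResponse> {
    // Bypass for excluded models and (optionally) tool-using requests.
    const usesTools = !!req.tools && req.tools.length > 0;
    if (this.skipModels.includes(req.model) || (this.skipToolRequests && usesTools)) {
      return this.inner.complete(req);
    }
    const key = JSON.stringify([req.model, req.messages]);
    const cached = this.cache.get(key);
    if (cached) return cached;
    const res = await this.inner.complete(req);
    this.cache.set(key, res);
    return res;
  }
}
```

Because the wrapper implements the same interface as the provider it wraps, calling code does not change when caching is added or removed.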