Retrieval, not copy-paste. Your documents as a second source for every AI answer.
Contracts, policies, case files, product documentation, research notes – put them into a knowledge base once, and the AI pulls from them in every conversation. Intelligently and selectively: only the passages that are relevant right now. No manual uploads per chat. No forgotten context. All GDPR-compliant, shareable across your team, and usable with our own European models too.
What is a knowledge base?
A knowledge base in anymize is a store for your documents, prepared for AI use. Technically it's built on Retrieval-Augmented Generation (RAG) – an architecture that has become the standard in the AI world for handling large internal bodies of knowledge.
The three core building blocks
Chunking. Your documents are split into semantically meaningful sections (typically a few hundred to a few thousand characters each) – single paragraphs, sections of a contract, or entries in a protocol.
Embeddings. Each chunk is translated into a mathematical representation (a vector) that captures its meaning. Chunks with similar meaning sit close to each other in vector space.
Vector database. It stores all chunks and their embeddings. When you later ask a question, your question is turned into a vector too, and the database returns the chunks closest to it in meaning.
The result: The AI only receives the passages that match your question. Not everything at once. Not irrelevant pages. Targeted.
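To make the mechanics concrete, here is a minimal, self-contained Python sketch of the pipeline. It is a toy illustration, not anymize's implementation: the hash-based embed function stands in for a real embedding model, and a plain list stands in for the vector database.

```python
# Toy sketch of the three building blocks: chunking, embedding, retrieval.
# The hash-based "embedding" is a crude stand-in for a real embedding model,
# and a plain list stands in for the vector database -- this is NOT how
# anymize implements any of these steps.
import hashlib
import math

def chunk(document: str, max_chars: int = 500) -> list[str]:
    """Split a document into paragraph-based chunks of bounded size."""
    chunks, current = [], ""
    for para in (p.strip() for p in document.split("\n\n") if p.strip()):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n\n{para}".strip()
    if current:
        chunks.append(current)
    return chunks

def embed(text: str, dims: int = 64) -> list[float]:
    """Hash each word into a vector slot -- a stand-in for a real model."""
    vec = [0.0] * dims
    for word in text.lower().split():
        slot = int(hashlib.md5(word.strip(".,?!").encode()).hexdigest(), 16) % dims
        vec[slot] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query: str, index: list[tuple[str, list[float]]], k: int = 3):
    """Return the k indexed chunks closest in meaning to the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in index]
    return sorted(scored, reverse=True)[:k]

doc = "Termination requires 30 days written notice.\n\nPayment is due net 14."
index = [(c, embed(c)) for c in chunk(doc, max_chars=60)]
for score, text in retrieve("Which notice period applies for termination?", index):
    print(f"{score:.2f}  {text}")
```

The same round trip happens behind the scenes on every question: embed the query, compare it against the index, and hand only the top-scoring chunks to the model.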
In practice you don't see any of this. You upload documents, switch on the knowledge base in chat, ask questions. anymize does the rest.
RAG vs. context window
Frontier models have impressively large context windows – a hundred thousand tokens, sometimes over a million. In theory you could stuff an entire company archive into a single prompt. In practice, this runs into three hard problems:

Cost: you pay for every token in the prompt on every single request – including the pages that have nothing to do with your question.
Quality: answer precision degrades as contexts grow very long; the relevant passage gets diluted by thousands of irrelevant ones.
Scale: a real company archive quickly outgrows even a million-token window.

Retrieval sidesteps all three, because the model only ever sees the handful of passages that matter.
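A back-of-envelope comparison makes the cost point concrete. Every number below is an illustrative assumption, not a quoted price:

```python
# Back-of-envelope: full archive in the prompt vs. retrieval of a few chunks.
# All figures are illustrative assumptions, not anymize's or any vendor's prices.
ARCHIVE_PAGES = 5_000
TOKENS_PER_PAGE = 500             # rough average for dense text
PRICE_PER_1M_INPUT_TOKENS = 3.0   # assumed USD price, varies by model

archive_tokens = ARCHIVE_PAGES * TOKENS_PER_PAGE   # 2,500,000 tokens per question
retrieved_tokens = 5 * 1_000                       # 5 chunks of ~1k tokens each

full_cost = archive_tokens / 1e6 * PRICE_PER_1M_INPUT_TOKENS
rag_cost = retrieved_tokens / 1e6 * PRICE_PER_1M_INPUT_TOKENS
print(f"full prompt: ${full_cost:.2f} per question")   # $7.50
print(f"retrieval:   ${rag_cost:.4f} per question")    # $0.0150
```

Note that the full-archive prompt in this example would also exceed most context windows outright, which is the scale problem in miniature.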
The rule of thumb: individual documents you open for a single analysis – upload them directly into the chat. Anything you need again and again – put it into a knowledge base.
How you use them
Every knowledge base appears as a toggle in the chat interface. One click – and from now on the AI pulls fitting passages from your database for every answer. Several databases can be active in parallel – e.g. "Client XY" + "Case-law archive" + "Firm standards".
Working regularly on a specific case, client or topic? Attach the fitting knowledge bases to your project. New chats inside that project have them active automatically. No switching on, nothing to forget.
For automated workflows: the anymize API offers retrieval straight from a knowledge base – including source metadata, chunk scores and an optional AI answer in a single request. Integrates with your own apps, CRM systems, agent workflows (n8n, Make.com, Zapier, Flowise, MCP servers).
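A hypothetical call, sketched in Python. The endpoint path, parameter names and response fields below are assumptions for illustration – consult the anymize API reference for the actual contract:

```python
# Hypothetical retrieval request -- endpoint, fields and auth header are
# illustrative assumptions, not the documented anymize API.
import requests

resp = requests.post(
    "https://api.anymize.example/v1/knowledge-bases/kb_123/retrieve",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "query": "What does the works agreement say about home office?",
        "top_k": 5,               # how many chunks to return
        "generate_answer": True,  # also request an AI-written answer
    },
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

for hit in data["chunks"]:
    # each hit is assumed to carry its relevance score and source metadata
    print(f'{hit["score"]:.2f}  {hit["source"]["document"]}')
print(data.get("answer"))
```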
Anonymization in knowledge bases
When uploading each document, you choose how it should be stored:
With anonymization: the document is automatically anonymized before indexing – over 40 categories of personal and business-sensitive data are replaced with placeholders (names, addresses, IBANs, case numbers, etc.). Chunks and embeddings in the database contain only placeholder versions. When the AI later cites this database, bidirectional anonymization maps the placeholders back, so you get the answer with your original data.
Consequence: Even when an international frontier model (GPT, Claude, Gemini) is used for the answer, it only ever sees placeholders. No personal data leaves the anymize platform.
Without anonymization: internal handbooks with no personal data, public studies, product documentation, company policies – for content of this kind you can skip anonymization. That saves processing time and avoids unnecessary placeholders in contexts where they would distort meaning.
You decide per upload, not per database. Inside the same database, some documents may be anonymized while others are not – depending on content.
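To illustrate the bidirectional principle, here is a toy sketch with a hard-coded mapping. anymize's real detection covers 40+ categories and is context-aware; only the round-trip idea carries over:

```python
# Toy illustration of bidirectional anonymization with a static mapping.
# Real detection spans 40+ entity categories; this sketch only shows the
# round trip: placeholders go out, original data comes back.
import re

mapping = {
    "Dr. Anna Meier": "[PERSON_1]",
    "DE89 3704 0044 0532 0130 00": "[IBAN_1]",   # fictional example IBAN
}
reverse = {placeholder: original for original, placeholder in mapping.items()}

def anonymize(text: str) -> str:
    """Replace known sensitive strings with placeholders before indexing."""
    for original, placeholder in mapping.items():
        text = text.replace(original, placeholder)
    return text

def deanonymize(text: str) -> str:
    """Map placeholders in a model answer back to the original data."""
    return re.sub(r"\[[A-Z]+_\d+\]",
                  lambda m: reverse.get(m.group(), m.group()), text)

indexed = anonymize("Dr. Anna Meier holds account DE89 3704 0044 0532 0130 00.")
print(indexed)               # only this placeholder version reaches the model
answer = "The account holder is [PERSON_1]."   # what the model sends back
print(deanonymize(answer))   # original name restored before you see the answer
```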
With our own models
Knowledge bases work with every model in anymize – international frontier models (GPT, Claude, Gemini, Mistral, Perplexity, Kimi) as well as our own models anymize Waterfall and anymize Fountain.
For the most sensitive scenarios, this produces a setup that has been hard to find in Europe until now:
Data in the EU.
Retrieval in the EU.
Model in the EU.
Answer in the EU.
When you combine your knowledge base with Waterfall or Fountain, not a byte of your data leaves the EU. No anonymization needed, because the models run on our own infrastructure anyway. No third-country transfer, no additional DPA, no compliance grey zone. For professional-secrecy holders, sensitive industries and high-security compliance, this is the strictest standard available.
Use cases
Six prototypical use cases – drawn from the real working contexts of our customers:
Client files, contract templates, case law
"How did we argue this the last time …?" – instantly with citations from your own briefs.
Earlier due-diligence reports, market studies, interview transcripts
Pattern analyses across multiple client projects without rereading every single report.
Treatment guidelines, internal standards, specialist publications
"What is our standard protocol for …?" – answers with a pointer to the internal SOP.
Internal research reports, compliance rules, regulatory updates
Same-day assessments that draw on your entire internal knowledge.
Employment contracts, works agreements, policies
"What does our works agreement say about home office?" – answer with the exact clause.
API documentation, internal standards, post-mortems
Code reviews grounded in your own conventions; debugging informed by past failure patterns.
The pattern: everywhere the model's general knowledge isn't enough, because the answer lies in your specific context – client history, company standards, internal processes. That is exactly what knowledge bases deliver.
Frequently asked questions
What is a knowledge base?

A collection of your documents, prepared for AI use. Technically it is based on Retrieval-Augmented Generation (RAG): documents are split into chunks, stored as embeddings in a vector database, and retrieved selectively when you ask a question. The AI receives only the relevant passages as context – not the entire archive. This saves cost, improves answer quality, and turns your company knowledge into a second source for every conversation.
We stand behind anymize. And we know: when an AI tool touches client, patient or employee data, a demo video isn't enough. That's why we give you 14 days of full access – all models, all features, no credit card. Enough time to be certain before you trust us.
Your AI workplace awaits.