Live text as you speak
The transcribed text appears right during the recording – you see what was understood and can correct or refine as soon as the meeting pauses. No waiting on a batch result.
Live transcription with speaker recognition. GDPR-compliant, in 13 languages.
Start the recording, talk to colleagues, clients, patients or yourself – anymize transcribes live, detects speakers after the recording ends and files the finished transcript as a document in your account. From then on: reusable in chats, knowledge bases, projects. Without a single word going to a US service.
What you get
The moment you stop the recording, speaker recognition runs across the full transcript. Every passage is assigned to the right speaker. No participant cap – two people, five, twelve or more, the model separates as many voices as are distinguishable.
The finished transcript lives as a document inside anymize – just like any uploaded PDF. From there it becomes a building block of your further workflow: dropped into chats, added to knowledge bases, linked to projects, summarized as an artifact.
How it works
The standard path for the office. Pick your microphone from the dropdown, click “Start recording” – transcription is live from that moment. Pause button for interruptions, stop button ends the session and kicks off speaker recognition.
Perfect for meeting rooms, bedside rounds, interviews at the customer's site. anymize shows a QR code on your desktop – scan it with your phone and the same recording surface opens in the mobile browser. You record with the phone, the transcript appears in parallel on both devices. No app download, no account switch, no separate recording software.
Meeting in the conference room, laptop out of reach or fan too loud. Phone out, scan the QR, recording runs. Back at your desk the finished transcript is ready in your anymize account.
The speech stack
Speech recognition runs on Voxtral Transcribe 2 – the open speech model from Mistral, a European provider. We operate it ourselves inside our infrastructure: no audio stream goes to US services like Otter.ai or Rev, no transcript lands with any third party.
Mixed-language meetings (e.g. DE + EN alternating) are recognized without manual switching.
Open model from Mistral, operated by us.
No routing to Otter.ai, Rev or any other US provider.
Language switches within the same meeting without manual toggles.
Why self-hosted
Voice recordings are among the most sensitive documents of any company: client meetings, patient rounds, investor conversations, internal strategy sessions. With most transcription services, the audio lands with the provider – often in the US, often with unclear privacy commitments. With anymize, processing happens exclusively in our European infrastructure (EU, hosted at Hetzner in Germany). No audio, no transcript, no metadata leaves to third parties.
Speaker recognition
After the recording ends, speaker recognition runs once across the full transcript. The result: every passage carries a speaker label – “Speaker 1”, “Speaker 2” and so on.
Two people, five, twelve – the model separates as many speakers as it can distinguish vocally. Ideal for round-table meetings, board sessions, conferences with many participants.
You rename the labels (“Speaker 1”, “Speaker 2”) once at the top of the transcript. The new name is automatically propagated throughout the entire document. “Speaker 1” becomes “Dr. Schmidt”, “Speaker 2” becomes “Client” or “Anna”. In one step, not passage by passage.
Speaker separation becomes noticeably more accurate when the model can analyze the full conversation: each voice is characterized better from many samples than from its first three sentences. That is why recognition runs once at the end – with clearly better results than any live variant could deliver.
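The idea behind the post-hoc pass can be sketched as a toy clustering step: each transcript segment gets a voice embedding, and a segment joins whichever existing speaker it most resembles, otherwise it opens a new one. The embeddings, the cosine threshold and the greedy strategy below are invented for illustration – this is not anymize's actual diarization model.

```python
# Toy sketch of post-hoc speaker assignment via embedding similarity.
# Embedding values and threshold are invented illustrative numbers.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def assign_speakers(embeddings, threshold=0.9):
    """Greedy clustering: a segment joins the most similar existing
    speaker, or opens a new speaker if nothing is similar enough."""
    centroids, labels = [], []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)) + 1)
        else:
            centroids.append(emb)          # new, previously unseen voice
            labels.append(len(centroids))
    return [f"Speaker {i}" for i in labels]

# Four segments: voices 1, 1, 2, 1 (toy two-dimensional "embeddings").
segments = [(0.9, 0.1), (0.88, 0.15), (0.1, 0.95), (0.92, 0.08)]
print(assign_speakers(segments))
# → ['Speaker 1', 'Speaker 1', 'Speaker 2', 'Speaker 1']
```

The sketch also shows why the batch pass wins: with the whole recording available, every voice contributes many segments, so the clusters are far better supported than after the first few sentences.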
Anonymization
Transcripts typically contain a lot of personal data: names of participants, addresses, phone numbers, medical conditions, company references. Whether these appear in clear text or as placeholders in the document is your choice.
If you enable anonymization, the finished transcript is run once through the anymize anonymization pipeline after the recording ends: names, addresses, IBANs, case numbers and the remaining 40+ categories become placeholders. The anonymized transcript can then safely be handed to international frontier models (for summarization, analysis, translation) – without any participant being identifiable.
Thanks to bidirectional anonymization, the original data remains accessible: in the preview you see it in clear text. In chats based on the transcript, you get answers back with the real names. On export you decide per file whether originals or placeholders are written.
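The placeholder round trip can be illustrated with a minimal sketch. The placeholder format, the hard-coded entity list and both helper functions are invented for this example – they are not anymize's real pipeline, which detects entities automatically across 40+ categories.

```python
# Minimal sketch of reversible ("bidirectional") pseudonymization.
# Placeholder format and entity list are illustrative assumptions.

def anonymize(text, entities):
    """Replace each known entity with a stable placeholder; keep the mapping."""
    mapping = {}
    for i, name in enumerate(entities, start=1):
        placeholder = f"[PERSON_{i}]"
        mapping[placeholder] = name
        text = text.replace(name, placeholder)
    return text, mapping

def deanonymize(text, mapping):
    """Restore the original names, e.g. in an AI answer shown to the user."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text

masked, mapping = anonymize(
    "Dr. Schmidt meets Anna on Tuesday.", ["Dr. Schmidt", "Anna"]
)
print(masked)                        # → [PERSON_1] meets [PERSON_2] on Tuesday.
print(deanonymize(masked, mapping))  # → Dr. Schmidt meets Anna on Tuesday.
```

Only the masked text ever leaves the trusted environment; the mapping stays local, which is what makes the preview, chat answers and exports able to show real names again.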
For internal meetings without sensitive content, or for audio from public sources (lectures, podcasts), you can skip anonymization – that saves processing steps, and the transcript stays in clear text.
Reuse
A finished transcript is rarely the end product. You want a summary, a to-do list, an anonymization for sending, a structure for filing. In anymize this happens without media breaks:
Chats
Start a chat, point to the transcript document. The AI reads it in and you ask: “Summarize the core decisions in 5 points.” · “Extract all to-dos with owner.” · “What did Dr. Schmidt say about topic X?”
Knowledge bases
For recurring topics (weekly standups, client meetings, clinical rounds) set up a knowledge base and collect all relevant transcripts there. The AI pulls from them when you ask later – with citations.
Projects
Does the transcript belong to an ongoing mandate, advisory project or product launch? Link it with the matching project. All participants in the project now have the transcript as context – without you manually sharing it.
Artifacts
Let the AI produce a finished artifact from the transcript: a formal meeting report, a decision protocol, a to-do list as a table, a client memo as a letter. Edit in the WYSIWYG editor, export as Word or PDF.
Use cases
Six realistic deployments – drawn from the working realities of our customers:
Complete conversation protocol with speaker assignment client / firm
Anonymized for peer review, original for the file
Transcript with speaker assignment physician / patient / care
Directly used as a source for the discharge letter, § 203 StGB (professional secrecy) preserved
Protocol with all to-dos and decisions, labeled per participant
Highlights in an artifact for colleagues who did not attend
Conversation transcript with reference to the CRM contact
Extraction of objections, commitments, next steps
Verbatim transcript, speakers clearly separated
Reused as a citable source in research projects
Full-text script for post-processing
Generation of summaries, flashcards, glossaries
Wherever post-processing is required – meeting report, discharge letter, case note, meeting summary – live transcription plus AI post-processing saves the main work: the manual typing and structuring.
Frequently asked questions
Which languages are supported?
Thirteen languages: German, English, French, Italian, Spanish, Portuguese, Dutch, Russian, Arabic, Hindi, Chinese, Japanese, Korean. Mixed-language meetings (e.g. German/English alternating) are recognized without manual switching.
Can I try anymize first?
We stand behind anymize. And we know – when an AI tool touches client, patient or employee data, a demo video isn't enough. That's why we give you 14 days of full access – all models, all features, no credit card. Enough time to be certain before you trust us.
Your AI workplace awaits.