VANTA
Target speaker extraction. Upload a 5-second reference clip of one voice and a messy recording — the model isolates that voice and returns it without everything else.
01 · reference voice
Drop audio file or click to browse
≈5 seconds, clean audio of the target speaker
02 · noisy recording
Drop audio file or click to browse
the messy audio you want cleaned up (up to 30s)
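The two upload slots above imply simple duration limits: roughly 5 seconds for the reference voice and at most 30 seconds for the noisy recording. A minimal sketch of how a client might check these before uploading is shown here. It assumes WAV input and uses only Python's standard `wave` module. The function names, the 2-second tolerance around the 5-second target, and the error messages are illustrative assumptions, not part of VANTA itself.

```python
import io
import wave

REFERENCE_TARGET_S = 5.0     # from the page: about 5 seconds of reference audio
REFERENCE_TOLERANCE_S = 2.0  # assumption: how far from 5 s is still acceptable
NOISY_MAX_S = 30.0           # from the page: noisy recording up to 30 s


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_inputs(reference: bytes, noisy: bytes) -> list[str]:
    """Return a list of human-readable problems; empty if both clips fit."""
    problems = []
    ref_s = wav_duration_seconds(reference)
    noisy_s = wav_duration_seconds(noisy)
    if abs(ref_s - REFERENCE_TARGET_S) > REFERENCE_TOLERANCE_S:
        problems.append(f"reference clip is {ref_s:.1f}s, expected about 5s")
    if noisy_s > NOISY_MAX_S:
        problems.append(f"noisy recording is {noisy_s:.1f}s, limit is 30s")
    return problems


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV in memory (helper for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    # A 5 s reference and a 20 s recording should pass with no problems.
    print(check_inputs(make_silent_wav(5.0), make_silent_wav(20.0)))
    # A 1 s reference and a 31 s recording should each be flagged.
    print(check_inputs(make_silent_wav(1.0), make_silent_wav(31.0)))
```

Checking durations on the client side avoids sending clips the backend would likely reject; whether the real service enforces the same thresholds is not stated on the page.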