STT: clean temp audio files on transcription failure

STTService._transcribe_local writes the audio to a NamedTemporaryFile
(delete=False) and only unlinks it on the success path, before the except.
If model.transcribe() raises (corrupt audio, model/runtime error, etc.) the
function logs, returns None, and leaves the .webm temp file behind — so
every failed local transcription leaks a file in the system temp dir.

Initialize tmp_path = None up front and move the unlink into a finally
block so the temp file is cleaned up whether transcription succeeds or
raises.
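The cleanup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual STTService code: `transcribe_with_cleanup` and its `transcribe` callable are hypothetical stand-ins for `_transcribe_local` and `model.transcribe()`, and the logging call is an assumption about how the failure is reported.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def transcribe_with_cleanup(audio_data, transcribe):
    """Hypothetical stand-in for _transcribe_local.

    `transcribe` is any callable that takes a file path and returns text.
    """
    tmp_path = None  # initialized up front so the finally block can always check it
    try:
        with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as tmp:
            tmp_path = tmp.name  # record the path before writing, in case write() raises
            tmp.write(audio_data)
        return transcribe(tmp_path)
    except Exception:
        logger.exception("Local transcription failed")
        return None
    finally:
        # Runs on both the success path and the failure path.
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

Because the unlink lives in the finally block, the early `return` statements in both the try and except branches still trigger the cleanup.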

tests/test_stt_leak.py stubs the whisper model to raise during transcribe,
runs _transcribe_local, and asserts it returns None and leaves no new .webm
file in the temp dir. Fails before this change.
Tatlatat
2026-06-02 18:43:24 +07:00
committed by GitHub
parent f8e3bfeaff
commit 3885f9fa90
2 changed files with 34 additions and 3 deletions
@@ -0,0 +1,30 @@
import os
import tempfile

from services.stt.stt_service import STTService


def test_stt_local_transcribe_leak_on_error():
    service = STTService()

    class MockWhisper:
        def transcribe(self, *args, **kwargs):
            raise ValueError("Simulated transcribe error")

    service._get_whisper = lambda: MockWhisper()

    # Track WebM files in the temp directory before running transcription
    temp_dir = tempfile.gettempdir()
    webm_before = {f for f in os.listdir(temp_dir) if f.endswith(".webm")}

    # Run transcription, which will raise ValueError internally
    result = service._transcribe_local(b"dummy_audio_data")

    # Track WebM files in the temp directory after running transcription
    webm_after = {f for f in os.listdir(temp_dir) if f.endswith(".webm")}

    # Assert that it returned None (failure)
    assert result is None

    # Assert that no new temp files were leaked
    leaked = webm_after - webm_before
    assert len(leaked) == 0, f"Leaked files: {leaked}"