The backend accepted social media URLs, uploaded files, and video-plus-SRT inputs through FastAPI. Each request was stored as a task row in PostgreSQL that recorded target languages, enabled pipeline flags, status, step outputs, failure reason, retry count, lease timing, and per-step timing metadata. Workers acquired jobs directly from PostgreSQL by leasing task rows, moved them through explicit statuses, refreshed leases during long GPU/API calls, and saved progress after every step.
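The leasing behavior described above can be sketched in memory. This is a minimal illustration under stated assumptions, not the actual worker code: the `Task` fields, the `acquire` and `refresh_lease` helpers, and the default lease length and retry limit are all hypothetical. In PostgreSQL the claim step would more likely be a single `UPDATE` over a row selected with `FOR UPDATE SKIP LOCKED`, which is also an assumption here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical subset of the task row described in the text.
    id: int
    status: str = "pending"  # pending -> running -> done / failed
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0
    retry_count: int = 0


def acquire(tasks: List[Task], worker: str, now: float,
            lease_seconds: float = 60.0, max_retries: int = 3) -> Optional[Task]:
    """Claim the first pending task, or a running task whose lease expired."""
    for task in tasks:
        expired = task.status == "running" and task.lease_expires_at <= now
        if task.status != "pending" and not expired:
            continue
        if expired:
            # The previous worker stopped refreshing: count this as a retry.
            task.retry_count += 1
            if task.retry_count > max_retries:
                task.status = "failed"
                task.lease_owner = None
                continue
        task.status = "running"
        task.lease_owner = worker
        task.lease_expires_at = now + lease_seconds
        return task
    return None


def refresh_lease(task: Task, worker: str, now: float,
                  lease_seconds: float = 60.0) -> bool:
    """Extend the lease only if this worker still holds an unexpired lease."""
    if (task.status != "running" or task.lease_owner != worker
            or task.lease_expires_at <= now):
        return False
    task.lease_expires_at = now + lease_seconds
    return True
```

A worker would call `refresh_lease` periodically while a long GPU or API call is in flight. If it crashes, the lease lapses and another worker can pick the task up on its next `acquire`.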
The pipeline could download social media audio/video and fetch existing subtitles, falling back to SRT extraction when none were available. For subtitle generation and repair, it integrated an ASR-based SRT extractor plus an SRT corrector that used the vocal audio to adjust subtitle timing, merge or split short segments, respect speaker similarity, and better align text with real speech. Together these covered diarization, segmentation, timestamp correction, and audio-aware subtitle cleanup before translation or TTS.
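As an illustration of the merge step only, here is a minimal sketch that joins a short cue to its neighbor when both come from the same speaker and the gap between them is small. The `Cue` type, the thresholds, and the use of a discrete `speaker` label as a stand-in for speaker similarity are assumptions. The real corrector worked from the vocal audio, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float   # seconds
    end: float     # seconds
    text: str
    speaker: str   # hypothetical diarization label


def merge_short_cues(cues: List[Cue], min_duration: float = 1.0,
                     max_gap: float = 0.3) -> List[Cue]:
    """Merge adjacent same-speaker cues when either one is too short."""
    merged: List[Cue] = []
    for cue in cues:
        prev = merged[-1] if merged else None
        too_short = (
            prev is not None
            and (cue.end - cue.start < min_duration
                 or prev.end - prev.start < min_duration)
        )
        if (prev is not None and too_short
                and prev.speaker == cue.speaker
                and cue.start - prev.end <= max_gap):
            # Extend the previous cue instead of emitting a new one.
            # Joining with a space suits space-delimited languages only.
            merged[-1] = Cue(prev.start, cue.end,
                             f"{prev.text} {cue.text}", prev.speaker)
        else:
            merged.append(Cue(cue.start, cue.end, cue.text, cue.speaker))
    return merged
```

Cues from different speakers are never merged here, however short they are, which keeps each cue usable as a single-speaker unit in later steps.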
For translation, I built an LLM subtitle translator that chunked SRT files, supported many target languages, retried failed chunks, split difficult chunks into smaller pieces, preserved subtitle order, and validated model output. The translator checked line counts, timestamp structure, forbidden meta prefixes, no-dub markers, and target-language Unicode/codepoint ranges for languages like Japanese, Korean, and Thai so wrong-script or malformed outputs could be caught before downstream TTS.
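A minimal sketch of the validation idea, covering three of the checks listed above: line count, forbidden meta prefixes, and target-script detection. The function names, the `META_PREFIXES` values, and the `min_ratio` threshold are hypothetical. The code point ranges are the standard Unicode blocks for Hiragana, Katakana, CJK Unified Ideographs, Hangul, and Thai. Timestamp and no-dub marker checks are omitted.

```python
from typing import Dict, List, Tuple

# Unicode blocks per target language (inclusive code point ranges).
SCRIPT_RANGES: Dict[str, List[Tuple[int, int]]] = {
    "ja": [(0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF)],
    "ko": [(0xAC00, 0xD7AF), (0x1100, 0x11FF)],
    "th": [(0x0E00, 0x0E7F)],
}

# Hypothetical examples of model chatter that must not reach TTS.
META_PREFIXES = ("Translation:", "Here is", "Note:")


def script_ratio(text: str, lang: str) -> float:
    """Fraction of alphabetic characters that fall in the target script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # digits or punctuation only: nothing to judge
    ranges = SCRIPT_RANGES[lang]
    hits = sum(1 for ch in letters
               if any(lo <= ord(ch) <= hi for lo, hi in ranges))
    return hits / len(letters)


def validate_chunk(source_lines: List[str], translated_lines: List[str],
                   lang: str, min_ratio: float = 0.5) -> List[str]:
    """Return a list of problems; an empty list means the chunk passed."""
    if len(source_lines) != len(translated_lines):
        return [f"line count mismatch: expected {len(source_lines)}, "
                f"got {len(translated_lines)}"]
    errors: List[str] = []
    for i, line in enumerate(translated_lines, start=1):
        stripped = line.strip()
        if not stripped:
            errors.append(f"line {i}: empty")
        elif stripped.startswith(META_PREFIXES):
            errors.append(f"line {i}: meta prefix")
        elif lang in SCRIPT_RANGES and script_ratio(stripped, lang) < min_ratio:
            errors.append(f"line {i}: wrong script for {lang}")
    return errors
```

A non-empty result would be the signal to retry the chunk or split it into smaller pieces. Using a ratio rather than an all-or-nothing script check leaves room for brand names and Latin acronyms inside an otherwise correct line.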
The TTS pipeline generated zero-shot speech from reference vocal audio. It sliced reference audio per subtitle, dispatched line-level or batch TTS jobs to deployed TTS services, reconstructed a full timeline from generated lines, aligned output to subtitle slots, handled reruns for edited lines, uploaded line artifacts, and then ran post-TTS processing. Finalization mixed generated speech with accompaniment from the separation service, optionally muxed audio back into video, converted final audio, and uploaded final SRT/audio/video outputs to object storage. Reusable artifacts were cached, keyed by video, language, separation service, and duration limits.
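One way the artifact cache key could be derived from those four dimensions is sketched below. The function name, parameter names, and the choice of SHA-256 over canonical JSON are assumptions for illustration, not the scheme the system actually used.

```python
import hashlib
import json


def artifact_cache_key(video_id: str, language: str,
                       separation_service: str, max_duration_s: int) -> str:
    """Build a deterministic cache key from the four cache dimensions."""
    payload = {
        "video": video_id,
        "lang": language.lower(),  # treat "JA" and "ja" as the same target
        "sep": separation_service,
        "max_duration_s": max_duration_s,
    }
    # Sorted keys and fixed separators make the JSON byte-stable, so the
    # same inputs always hash to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because every dimension feeds the hash, changing the separation service or the duration limit produces a different key, so artifacts from an incompatible run are never reused by mistake.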