Tech Lead ASR / TTS / Speech LLM (IC Mentor)
Company: OutcomesAI
Location: Boston
Posted on: February 14, 2026
Job Description:

OutcomesAI is a healthcare technology company building an AI-enabled nursing platform designed to augment clinical teams, automate routine workflows, and safely scale nursing capacity. Our solution combines AI voice agents and licensed nurses to handle patient communication, symptom triage, remote monitoring, and post-acute care, reducing administrative burden and enabling clinicians to focus on direct patient care.

Our core product suite includes:

- Glia Voice Agents – multimodal conversational agents capable of answering patient calls, triaging symptoms using evidence-based protocols (e.g., Schmitt-Thompson), scheduling visits, and delivering education and follow-ups.
- Glia Productivity Agents – AI copilots for nurses that automate charting, scribing, and clinical decision support by integrating directly into EHR systems such as Epic and Athena.
- AI-Enabled Nursing Services – a hybrid care delivery model in which AI and licensed nurses work together to deliver virtual triage, remote patient monitoring, and specialty patient support programs (e.g., oncology, dementia, dialysis).

Our AI infrastructure leverages multimodal foundation models, incorporating speech recognition (ASR), natural language understanding, and text-to-speech (TTS), fine-tuned for healthcare environments to ensure safety, empathy, and clinical accuracy. All models operate within a HIPAA-compliant and SOC 2–certified framework. OutcomesAI partners with leading health systems and virtual care organizations to deploy and validate these capabilities at scale. Our goal is to create the world’s first AI nurse hybrid workforce, improving access, safety, and efficiency across the continuum of care.

As a Tech Lead specializing in ASR, TTS, and Speech LLM, you will lead the end-to-end technical development of speech models, from architecture, training strategy, and evaluation through production deployment. The role is a blend of individual contribution and mentorship: you will guide a small team working on model training, synthetic data generation, active learning, and inference optimization for healthcare applications.

What You’ll Do

- Own the
technical roadmap for STT/TTS/Speech LLM model training: from model selection → fine-tuning → deployment.
- Evaluate and benchmark open-source models (Parakeet, Whisper, etc.) using internal test sets for WER, latency, and entity accuracy.
- Design and review data pipelines for synthetic and real data generation (text selection, speaker selection, voice synthesis, noise/distortion augmentation).
- Architect and optimize training recipes (LoRA/adapters, RNN-T, multi-objective CTC/MWER).
- Lead integration with Triton Inference Server (TensorRT/FP16) and ensure Kubernetes autoscaling for 1,000 concurrent streams.
- Implement language-model biasing APIs, WFST grammars, and context biasing for domain accuracy.
- Guide evaluation cycles, drift monitoring, and model switcher/failover strategies.
- Mentor engineers on data curation, fine-tuning, and model-serving best practices.
- Collaborate with backend/ML-ops teams on production readiness, observability, and health metrics.

Desired Skills

- Deep
expertise in speech models (ASR, TTS, Speech LLM) and training frameworks (PyTorch, NeMo, ESPnet, Fairseq).
- Proven experience with streaming RNN-T / CTC architectures, LoRA/adapters, and TensorRT optimization.
- Telephony robustness: codec augmentation (G.711 μ-law, Opus, packet loss/jitter), AGC/loudness normalization, band-limiting (300–3400 Hz), and far-field/noise simulation. Strong understanding of telephony noise, codecs, and real-world audio variability.
- Experience with speaker diarization, turn-detection models, and smart voice activity detection.
- Evaluation: WER/latency curves, entity F1 (names/DOB/meds), and confidence metrics.
- TTS: VITS/FastPitch/Glow-TTS/Grad-TTS/StyleTTS2, CosyVoice/NaturalSpeech-3 style transfer, BigVGAN/UnivNet vocoders, zero-shot cloning.
- Speech LLM: model development and integration with the voice-agent pipeline.
- Experience deploying models with Triton Inference Server, Kubernetes, and GPU scaling.
- Hands-on with evaluation metrics (WER, F1 on entities, latency p50/p95).
- Familiarity with LM biasing, WFST grammars, and context injection.
- Strong mentorship and code-review discipline.

Qualifications

- M.S. / Ph.D. in Computer Science, Speech Processing,
or related field.
- 7–10 years of experience in applied ML, with at least 3 in speech or multimodal AI.
- Track record of shipping production ASR/TTS models or inference systems at scale.

We may use artificial
intelligence (AI) tools to support parts of the hiring process,
such as reviewing applications, analyzing resumes, or assessing
responses. These tools assist our recruitment team but do not
replace human judgment. Final hiring decisions are ultimately made
by humans. If you would like more information about how your data
is processed, please contact us.
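For candidates curious about the evaluation work this role involves, here is a minimal, self-contained sketch of two of the metrics named above: corpus-level WER (edit distance over reference words) and latency p50/p95. The transcripts, latencies, and function names are invented placeholders for illustration, not OutcomesAI code or data:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    r, h = ref.split(), hyp.split()
    # DP table: d[i][j] = edits to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution (free if equal)
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # vs. deletion/insertion
    return d[len(r)][len(h)] / max(len(r), 1)

def percentile(values, p):
    """Nearest-rank percentile for p in [0, 100]."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

# Hypothetical results: (reference transcript, model hypothesis, latency in ms)
results = [
    ("patient reports mild chest pain", "patient reports mild chest pain", 180.0),
    ("schedule a follow up visit", "schedule follow up visit", 210.0),
    ("taking ten milligrams of lisinopril", "taking ten milligrams of lisinopril", 450.0),
]
# Corpus WER weights each utterance by its reference word count (total edits / total words)
corpus_wer = (
    sum(wer(r, h) * len(r.split()) for r, h, _ in results)
    / sum(len(r.split()) for r, h, _ in results)
)
lat = [t for _, _, t in results]
print(f"WER={corpus_wer:.3f}  p50={percentile(lat, 50)}ms  p95={percentile(lat, 95)}ms")
```

In practice the entity-level metrics the posting mentions (F1 on names, DOBs, medications) need an entity tagger on both reference and hypothesis; the word-level WER above is only the starting point.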