Description
This role is for one of Weekday's clients
Salary range: Rs 24,00,000 - Rs 35,00,000 (i.e., INR 24-35 LPA)
Min Experience: 6 years
Location: Bangalore
Job Type: Full-time
We are seeking a skilled Speech Data Scientist to design, develop, and optimize advanced speech analytics and automatic speech recognition (ASR) solutions. The ideal candidate will work on end-to-end speech pipelines, multilingual audio processing, and model deployment in production environments. You will also drive research and innovation in speech processing, contributing to model enhancement and high-impact technical solutions.
Requirements
Key Responsibilities
Core Development & Implementation
- Design and implement end-to-end speech analytics pipelines for production.
- Develop ASR engines using models such as wav2vec, Whisper, and DeepSpeech, built with PyTorch or TensorFlow.
- Build and optimize speaker diarization, language identification (LID), and text post-processing systems.
- Focus on multilingual audio processing and domain adaptation strategies.
- Lead data selection and preprocessing for improved model performance.
Model Development & Enhancement
- Develop and analyze objective measures for speech quality evaluation and enhancement.
- Implement speaker-conditioned personalization techniques to improve ASR accuracy in noisy environments.
- Optimize on-device ASR models, emphasizing multi-language scenarios.
- Guide teams on best practices for model accuracy and performance optimization.
Research & Innovation
- Conduct research on advanced speech processing and neural speech enhancement techniques.
- Develop novel solutions for multi-speaker and complex audio scenarios.
- Contribute to patents, publications, and technical thought leadership in speech technology.
- Stay updated on transformer models, attention mechanisms, and foundation models.
Technical Integration & Deployment
- Design integration architectures for speech-to-text services and related technologies.
- Implement MLOps processes and CI/CD pipelines for speech model deployment.
- Deploy and scale speech solutions on cloud platforms (AWS, GCP).
- Develop production-ready applications using Python, C++, and Java.
Required Qualifications
Education
- Ph.D./M.S./M.Tech in Computer Science, Signal Processing, or related field preferred.
- B.Tech/B.E in ECE, CSE, or related technical field required.
Technical Expertise
- Speech Processing: 3–6 years of hands-on experience in ASR and speech analytics. Strong knowledge of HMMs, GMMs, ANNs, language modeling, CNNs, RNNs, LSTMs, CTC, and attention mechanisms.
- Machine Learning / Deep Learning: Proficiency in PyTorch and TensorFlow; experience with transformer models (BERT, wav2vec 2.0, Whisper) and end-to-end ASR implementation.
- Programming & Tools: Strong Python skills (NumPy, pandas, scikit-learn), experience with C++/Java for production, Bash scripting, and Git.
- Cloud & Deployment: Hands-on experience with AWS/GCP, containerization (Docker, Kubernetes), MLOps, CI/CD pipelines, and scalable model serving.
Skills
ASR, Speech Recognition, Speech Analytics, Multilingual Audio Processing, Python, PyTorch, TensorFlow, Deep Learning