
Review the role details and submit your application.





MISSION
Lead the R&D of the AI models that power Shizuku’s voice and intelligence. With TTS (Text-to-Speech) as the core pillar, push the boundaries across NLP, speech recognition, and — looking ahead — computer vision and humanoid robotics, evolving Shizuku’s expressive capabilities across multiple modalities.
Balance continuous improvement of production TTS models with exploration and development of next-generation architectures, while owning the MLOps cycle to drive Shizuku’s ongoing evolution.
Lead the R&D of the AI models that power Shizuku’s voice and intelligence. With TTS (Text-to-Speech) as the core pillar, push the boundaries across NLP, speech recognition, and — looking ahead — computer vision and humanoid robotics, evolving Shizuku’s expressive capabilities across multiple modalities.
Balance continuous improvement of production TTS models with exploration and development of next-generation architectures, while owning the MLOps cycle to drive Shizuku’s ongoing evolution.
Shizuku is a Japan-born AI companion actively engaging audiences on YouTube and X (formerly Twitter). Already running live streams and cultivating a growing community, Shizuku is now entering its next phase of rapid scale.
As the first Japanese startup to receive investment from a16z, we closed our seed round and are on a mission to bring Japanese entertainment × AI to the global stage.
TEAM STRUCTURE
You will work directly alongside co-founder Aki — an ML engineer and researcher with experience at Meta and Luma AI — to drive Shizuku’s model development. Expect daily sparring sessions on research direction and architecture design with a founder who brings firsthand experience at the frontier. Initially, you’ll handle lightweight MLOps pipeline work yourself; as we hire a dedicated MLOps engineer, responsibilities will gradually separate.
DEVELOPMENT ENVIRONMENT & RESOURCES
Existing Models: A TTS model is already in production. You’ll drive improvements in parallel with next-gen model exploration
Training Data: Shizuku’s publicly available YouTube data serves as a foundational dataset. You’ll be involved from collection pipeline design onward
Evaluation Infrastructure: TTS quality evaluation framework is greenfield — you’ll design evaluation criteria (MOS, PESQ, etc.) from scratch
Submit your application for this role at Shizuku AI.