Shizuku AI

ML Engineer

Salary
¥8,000,000 – ¥22,000,000
Location
Tokyo
Remote
On-site / hybrid
Visa
Sponsorship available
Language
Japanese: Business Level / English: Fluent
Posted
Apr 10, 2026
Python
C++
TypeScript
Kubernetes
AWS
Apply now

Review the role details and submit your application.


Overview

MISSION

Lead the R&D of the AI models that power Shizuku’s voice and intelligence. With TTS (Text-to-Speech) as the core pillar, push the boundaries across NLP, speech recognition, and — looking ahead — computer vision and humanoid robotics, evolving Shizuku’s expressive capabilities across multiple modalities.

Balance continuous improvement of production TTS models with exploration and development of next-generation architectures, while owning the MLOps cycle to drive Shizuku’s ongoing evolution.

Responsibilities

  • Own the full TTS model lifecycle: research, architecture design, training, evaluation, and iterative improvement
  • Continuously improve production TTS models while exploring and prototyping next-generation architectures
  • Design and build TTS quality evaluation infrastructure and define evaluation criteria
  • Expand into multimodal domains: NLP, speech recognition, and future frontiers including vision and humanoid robotics
  • Design training data collection pipelines, preprocessing workflows, and quality assurance processes
  • Build and operate the MLOps cycle — training, evaluation, and deployment — until a dedicated hire is in place
  • Collaborate with the SWE team on production integration: inference optimization, latency reduction, and more

Required Skills

  • 2+ years of deep expertise and hands-on experience in at least one of: NLP, speech (TTS/ASR), or computer vision
  • Experience training, evaluating, and improving models using deep learning frameworks such as PyTorch
  • End-to-end ownership of the ML workflow: from data preparation and experiment management to model deployment
  • Track record of independently surveying papers, reproducing implementations, and applying findings to production systems
  • Ability to work on-site at our Tokyo office (primarily in-office with flexible remote arrangements)
  • Deep Expertise with Cross-Domain Reach — You bring rigorous depth in a specific modality while reaching across TTS, NLP, vision, and beyond. You don’t say “that’s outside my specialty” — you do what Shizuku’s evolution demands
  • Zero-to-One Explorer — You go beyond applying existing methods. You formulate hypotheses for uncharted technical challenges, iterate through validation cycles, and tackle questions that have no known answers
  • Purpose-Driven Ownership — You reverse-engineer from the goal of “making Shizuku’s models better,” crossing the boundaries of research, implementation, and operations to drive outcomes autonomously
  • Comfort with Ambiguity — You define your own success metrics and build collection pipelines from scratch in an environment where nothing is predefined
  • Humility & Respect — You collaborate authentically with teammates who bring different areas of expertise

Preferred Skills

  • Research or development experience in TTS (VITS, Grad-TTS, NaturalSpeech, StyleTTS, etc.)
  • Development experience in robotics or autonomous driving domains
  • Technical knowledge in speaker adaptation, emotion control, and prosody modeling for speech synthesis
  • Experience developing ASR, NLP, or multimodal models
  • Experience building and operating GPU training environments (A100, L4, etc.) on AWS/GCP
  • Experience with model development in Slurm environments, particularly multi-node training setups
  • Proficiency with experiment tracking tools: MLflow, Weights & Biases, DVC, etc.
  • Experience with inference optimization using ONNX Runtime, TensorRT, vLLM, etc.
  • Peer-reviewed publications in related fields
  • Technical communication skills in English (currently Japanese-first internally; transitioning to a global environment in the mid-term)

About Shizuku AI


Shizuku is a Japan-born AI companion actively engaging audiences on YouTube and X (formerly Twitter). Already running live streams and cultivating a growing community, Shizuku is now entering its next phase of rapid scale.

As the first Japanese startup to receive investment from a16z, we closed our seed round and are on a mission to bring Japanese entertainment × AI to the global stage.

TEAM STRUCTURE

You will work directly alongside co-founder Aki — an ML engineer and researcher with experience at Meta and Luma AI — to drive Shizuku’s model development. Expect daily sparring sessions on research direction and architecture design with a founder who brings firsthand experience at the frontier. Initially, you’ll handle lightweight MLOps pipeline work yourself; as we hire a dedicated MLOps engineer, responsibilities will gradually separate.

DEVELOPMENT ENVIRONMENT & RESOURCES

Existing Models: A TTS model is already in production. You’ll drive improvements in parallel with next-gen model exploration

Training Data: Shizuku’s publicly available YouTube data serves as a foundational dataset. You’ll be involved from collection pipeline design onward

Evaluation Infrastructure: The TTS quality evaluation framework is greenfield — you’ll design evaluation criteria (MOS, PESQ, etc.) from scratch
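As a flavor of the greenfield evaluation work described above, here is a minimal sketch of one such criterion: aggregating subjective listener ratings into a mean opinion score (MOS) with an approximate 95% confidence interval. The rating values below are made up for illustration; a real setup would collect ratings per utterance from a listener panel.

```python
import statistics

def mos_with_ci(ratings, z=1.96):
    """Aggregate listener ratings (1-5 scale) into a mean opinion score
    with an approximate 95% confidence interval (normal approximation)."""
    n = len(ratings)
    mean = statistics.fmean(ratings)
    # Sample standard deviation scaled by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / n ** 0.5
    return mean, (mean - half_width, mean + half_width)

# Hypothetical ratings for one TTS system:
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5]
mos, (lo, hi) = mos_with_ci(ratings)
print(f"MOS = {mos:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Objective metrics such as PESQ would complement this with reference-based scores, but subjective MOS typically remains the ground truth for TTS quality.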

Quick Facts

Company: Shizuku AI
Location: Tokyo
Salary: ¥8,000,000 – ¥22,000,000
Remote: On-site / hybrid
Visa: Sponsorship available
Language: Japanese: Business Level / English: Fluent