Research Scientist, Foundation Model
Pika · Palo Alto HQ
Posted May 16, 2026
Job Description
About the Role
At Pika, we are pioneering the next generation of creative infrastructure built around real-time, multimodal generation and intelligent agentic platforms. We are seeking accomplished Research Scientists in Foundation Models with expertise in pre-training and mid-training large-scale multimodal foundation models to advance our mission of making agentic, real-time generative technology accessible and transformative for millions of creators. This is a staff- or lead-level opportunity.
As a key member of our research team, you will design and implement core technologies, develop new methodologies for large-scale multimodal pre-training and mid-training (text, image, audio, and video), and drive innovative approaches to foundation model architecture. You will collaborate closely with engineering and product teams, shaping the future of real-time creative and agentic platforms at scale.
What You’ll Do
Lead research and development on pre-training and mid-training of multimodal foundation models at scale.
Design and prototype novel algorithms and architectures for high-fidelity, real-time synthesis and interaction across modalities.
Develop scalable data curation pipelines and model training strategies for broad, diverse, and sensory-rich datasets.
Advance state-of-the-art techniques in diffusion, autoregressive, and other generative models for large-scale pre-training and fine-tuning.
Identify, create, and leverage large, high-quality cross-modal datasets.
Bring research advancements into production-ready systems in collaboration with engineering and product teams.
Publish work in top-tier conferences and journals, and clearly communicate research both internally and externally.
Stay at the forefront of foundation model and real-time multimodal AI research.
What We’re Looking For
5+ years of research experience in large-scale pre-training/mid-training of multimodal foundation models (LLMs, VLMs, Audio LMs, or similar), ideally at the staff or lead scientist level.
Track record as a first author on major publications in top conferences or journals (e.g., NeurIPS, ICML, ICLR).
Extensive hands-on experience with large-scale multimodal model design, training, and deployment.
Deep understanding of, and implementation experience with, generative architectures (diffusion, autoregressive, cross-modal, etc.).
Expertise in high-throughput, scalable dataset curation and model pipeline optimization for multimodal applications.
Strong programming and prototyping skills (Python, PyTorch, TensorFlow, etc.) and experience deploying research into production systems.
Excellent communication and collaboration skills, and a passion for building creativity-enabling technology.
What We Offer
Competitive salary and substantial equity in a high-growth startup
Full health benefits, 401(k) matching, and more
Collaborative, mission-driven team environment with major growth opportunities
Flexible hybrid of on-site and remote work (HQ in Palo Alto, CA)
About Pika
Pika empowers creators by building state-of-the-art agentic and multimedia platforms. Our vision is to break down technical barriers to creativity, making real-time generation and intelligent orchestration accessible to all. Join us and help shape the next evolution of creative technology!
If you are a leading researcher excited to build and scale real-time multimodal foundation models, we want to hear from you.