Founding Engineer
Klavis AI · San Francisco, CA, US / Remote (US)
Job Description
About Klavis AI
Klavis AI is building high-quality agentic and coding data for frontier AI post-training.
Frontier models are increasingly bottlenecked not just by compute, but by the quality of the coding environments, trajectories, rubrics, rewards, and verification data used to train them. We build that data layer: long-horizon coding tasks, terminal-based software engineering environments, hidden-test verification, expert rubrics, gold trajectories, dockerized environments, and agentic tool-use workflows ready for RL and SFT. We already work with multiple frontier AI labs on production coding and agentic data.
Klavis is founded by Xiangkai Zeng and Zihao Lin. Xiangkai was a Senior Software Engineer on Google Gemini at Google DeepMind, where he built function-calling infrastructure, shipped agentic features, and co-authored the Gemini paper. Zihao was a Senior Software Engineer and Tech Lead at Lyft Recommendations ML and Nordstrom Data Infra, where he built products and infrastructure serving millions of users.
The role
We’re hiring a founding engineer who is genuinely exceptional at using LLMs and coding agents to build, test, debug, and ship real software.
You should be the kind of engineer who can make Claude Code, Codex, MCP tools, shell environments, custom evals, Docker, and agent pipelines feel like an extension of your hands. We are not looking for someone who has only tried basic ChatGPT prompts or simple API wrappers. We are looking for someone who already uses AI agents to build, debug, refactor, test, and ship faster than traditional engineering teams.
You’ll work directly with the founders to build the systems and datasets that help frontier labs train better coding and tool-use agents. In your first 30 days, you’ll onboard into our internal systems, ship improvements to them, and personally produce high-quality tasks end-to-end. By the end of the first month, you should understand what makes a task valuable for frontier post-training. Within 90 days, you’ll own a core product or infrastructure area from design to production. You’ll help define what “excellent” agentic and coding data means, build systems that scale data production, and directly influence how frontier labs train their next-generation agents.
What you’ll work on
- Build high-quality long-horizon coding datasets for frontier AI post-training
- Create coding tasks, hidden tests, gold solutions, rubrics, dockerized environments, and agent trajectories
- Design realistic software engineering workflows across terminals, repos, APIs, databases, and developer tools
- Use LLMs and coding agents aggressively to accelerate engineering and data production
- Build infrastructure for data generation, environment orchestration, verification, evaluation, and QA
- Work with human experts to define difficult, realistic, and verifiable coding tasks
- Design agentic tool-use workflows across SaaS apps, APIs, MCP servers, and external tools where needed
- Turn ambiguous customer needs into reliable, scalable data products
What we’re looking for
- You have 3+ years of professional software engineering experience
- You are extremely strong with LLMs, coding agents, and AI-assisted engineering workflows
- You can build across Python, TypeScript, shell, Docker, Git, APIs, databases, and modern dev tools
- You have strong taste for realistic long-horizon coding tasks, test design, agent workflows, evals, and edge cases
- You care deeply about correctness, verification, reproducibility, and data quality
- You move fast without accepting sloppy work
- You want the ownership, ambiguity, and intensity of joining at the founding stage
Strong signals
- You have created long-horizon coding benchmarks, hidden tests, task environments, coding-agent evals, or data pipelines
- You have built custom coding-agent workflows, MCP servers, eval systems, or LLM orchestration tools
- You use Claude Code, Codex, Cursor, or similar tools daily and deeply understand their failure modes
- You have strong open-source, infra, systems, ML engineering, devtools, or competitive programming experience
- You can show examples of agents helping you ship real software, not just demos
Why join
- Work on a core bottleneck for frontier AI: post-training data quality
- Build products already used by frontier AI labs
- Join a YC-backed company at the founding stage
- Work directly with technical founders with deep agentic AI, infra, and ML systems experience
- Own important engineering and product decisions from day one
- Help define how future AI coding and tool-use agents are trained