Founding Engineer

Klavis AI · San Francisco, CA, US / Remote (US)

$150k - $200k

Remote

Full-time

Mid

Job Description

About Klavis AI

Klavis AI is building high-quality agentic and coding data for frontier AI post-training.

Frontier models are increasingly bottlenecked not just by compute, but by the quality of the coding environments, trajectories, rubrics, rewards, and verification data used to train them. We build that data layer: long-horizon coding tasks, terminal-based software engineering environments, hidden-test verification, expert rubrics, gold trajectories, dockerized environments, and agentic tool-use workflows ready for RL and SFT. We already work with multiple frontier AI labs on production coding and agentic data.

Klavis was founded by Xiangkai Zeng and Zihao Lin. Xiangkai was a Senior Software Engineer on Google Gemini at Google DeepMind, where he built function-calling infrastructure, shipped agentic features, and co-authored the Gemini paper. Zihao was a Senior Software Engineer and Tech Lead at Lyft Recommendations ML and Nordstrom Data Infra, where he built products and infrastructure serving millions of users.

The role

We’re hiring a founding engineer who is genuinely exceptional at using LLMs and coding agents to build, test, debug, and ship real software.

You should be the kind of engineer who can make Claude Code, Codex, MCP tools, shell environments, custom evals, Docker, and agent pipelines feel like an extension of your hands. We are not looking for someone who has only tried basic ChatGPT prompts or simple API wrappers. We are looking for someone who already uses AI agents to build, debug, refactor, test, and ship faster than traditional engineering teams.

You’ll work directly with the founders to build the systems and datasets that help frontier labs train better coding and tool-use agents. In your first 30 days, you’ll onboard into our internal systems, ship improvements to them, and personally produce high-quality tasks end-to-end, so that by the end of that first month you understand what makes a task valuable for frontier post-training. Within 90 days, you’ll own a core product or infrastructure area from design to production. Along the way, you’ll help define what “excellent” agentic and coding data means, build systems that scale production, and directly influence how frontier labs train their next-generation agents.

What you’ll work on

Build high-quality long-horizon coding datasets for frontier AI post-training
Create coding tasks, hidden tests, gold solutions, rubrics, dockerized environments, and agent trajectories
Design realistic software engineering workflows across terminals, repos, APIs, databases, and developer tools
Use LLMs and coding agents aggressively to accelerate engineering and data production
Build infrastructure for data generation, environment orchestration, verification, evaluation, and QA
Work with human experts to define difficult, realistic, and verifiable coding tasks
Design agentic tool-use workflows across SaaS apps, APIs, MCP servers, and external tools where needed
Turn ambiguous customer needs into reliable, scalable data products

What we’re looking for

You:

Have 3+ years of professional software engineering experience
Are extremely strong with LLMs, coding agents, and AI-assisted engineering workflows
Can build across Python, TypeScript, shell, Docker, Git, APIs, databases, and modern dev tools
Have strong taste for realistic long-horizon coding tasks, test design, agent workflows, evals, and edge cases
Care deeply about correctness, verification, reproducibility, and data quality
Move fast without accepting sloppy work
Want the ownership, ambiguity, and intensity of joining at the founding stage

Strong signals

You have created long-horizon coding benchmarks, hidden tests, task environments, coding-agent evals, or data pipelines
You have built custom coding-agent workflows, MCP servers, eval systems, or LLM orchestration tools
You use Claude Code, Codex, Cursor, or similar tools daily and deeply understand their failure modes
You have strong open-source, infra, systems, ML engineering, devtools, or competitive programming experience
You can show examples of agents helping you ship real software, not just demos

Why join

Work on a core bottleneck for frontier AI: post-training data quality
Build products already used by frontier AI labs
Join a YC-backed company at the founding stage
Work directly with technical founders with deep agentic AI, infra, and ML systems experience
Own important engineering and product decisions from day one
Help define how future AI coding and tool-use agents are trained
