Full Time

Harness Engineer (AI Agent Systems) - Truewind - San Francisco, CA

180K–200K a year
Posted 14 days ago

What this role is

We build AI agents that do real work.

Not assistants. Not demos.

Agents that execute workflows end-to-end and produce correct outcomes.

Your job is to:
• build those agents
• build the systems that make them reliable

This is not prompt engineering.

This is making AI work in production.

What you’ll do
• Build agents that execute multi-step workflows
• Design systems for validation, retry, and failure handling
• Define constraints (schemas, invariants, contracts)
• Add feedback loops (detect → debug → improve)
• Turn failures into reusable systems
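
As a rough illustration of what "validation, retry, and failure handling" can look like in practice, here is a minimal Python sketch. It is not Truewind's code or stack. Every name in it (`StepResult`, `validate_invoice`, `run_step`, `make_flaky_agent`) and the invoice example itself are hypothetical, chosen only to show one step of a workflow being checked against a contract, retried with feedback, and handed to a fallback when it still fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    ok: bool                      # True only if the output passed validation
    value: Optional[dict] = None  # validated output, or the fallback's output
    error: Optional[str] = None   # last validation or execution error, if any
    attempts: int = 0


def validate_invoice(record: dict) -> Optional[str]:
    """Hypothetical contract: return an error message, or None if valid."""
    vendor = record.get("vendor")
    if not isinstance(vendor, str) or not vendor:
        return "vendor must be a non-empty string"
    items = record.get("line_items")
    if not isinstance(items, list) or not items:
        return "line_items must be a non-empty list"
    if not all(isinstance(x, (int, float)) for x in items):
        return "line_items must be numbers"
    total = record.get("total")
    if not isinstance(total, (int, float)):
        return "total must be a number"
    # Invariant: the line items must add up to the stated total.
    if abs(sum(items) - total) > 0.005:
        return "line_items do not sum to total"
    return None


def run_step(
    action: Callable[[Optional[str]], dict],
    validate: Callable[[dict], Optional[str]],
    max_attempts: int = 3,
    fallback: Optional[Callable[[], dict]] = None,
) -> StepResult:
    """Run one step, feeding the previous error back into each retry."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            output = action(last_error)
        except Exception as exc:  # treat a crash like any other failed attempt
            last_error = f"action raised: {exc}"
            continue
        last_error = validate(output)
        if last_error is None:
            return StepResult(True, output, None, attempt)
    # Retries exhausted: use the fallback (for example, queue for human review).
    value = fallback() if fallback is not None else None
    return StepResult(False, value, last_error, max_attempts)


def make_flaky_agent() -> Callable[[Optional[str]], dict]:
    """Stand-in for a model call: wrong on the first try, right afterwards."""
    calls = {"n": 0}

    def agent(feedback: Optional[str]) -> dict:
        calls["n"] += 1
        items = [60.0, 30.0] if calls["n"] == 1 else [60.0, 40.0]
        return {"vendor": "Acme", "total": 100.0, "line_items": items}

    return agent


if __name__ == "__main__":
    result = run_step(make_flaky_agent(), validate_invoice)
    print(result.ok, result.attempts)
```

In this sketch the caller never sees an unvalidated output marked as successful: a result is either `ok` with a value that passed the contract, or not `ok` with the last error attached for debugging.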

What this role is NOT
• Not prompt engineering
• Not one-shot demos
• Not feature-heavy product work

You are building agents that do the work, and the systems that ensure they do it correctly.

Note: This is different from “vibe coding.” You won’t just prompt and accept outputs. You’ll build systems so results are reliable and repeatable.

What we’re looking for
• Strong systems thinking
• Background in infrastructure, backend, or data systems, or in developer tools or internal platforms
• Experience building reliable systems (not just features)
• Comfort with debugging complex, ambiguous problems

Important:

LLM experience alone is not enough.

We care about how you make systems reliable.

Good fit if you:
• Think in constraints, invariants, and feedback loops
• Care about correctness, not just output quality
• Have automated real workflows end-to-end
• Prefer building systems over features

Not a fit if you:
• Mostly prompt models and accept outputs
• Have only built demos or prototypes
• Avoid debugging or failure handling

Application (required)

1. Project (GitHub)

An agent system that:
• performs a multi-step task
• includes validation
• handles failures (retry, fallback, etc.)

2. Short answer (5–10 sentences)

Describe a system where an AI agent failed.

What caused it, and how would you fix it?

How we measure success
• Agents complete real workflows with minimal human input
• Outputs a