Full Time

AI/LLM Evaluation & Alignment Software Engineer - LeoTech - Austin, TX

Posted 14 days ago

At LeoTech, we are passionate about building software that solves real-world problems in the public safety sector. Our software has been used to help fight continuing criminal enterprises and drug trafficking organizations, identify financial fraud, disrupt sex and human trafficking rings, and address mental health matters, to name a few.

Role
• This is a remote, WFH role.
• As an AI/LLM Evaluation & Alignment Engineer on our Data Science team, you will play a critical role in ensuring that our Large Language Model (LLM) and agentic AI solutions are accurate, safe, and aligned with the unique requirements of public safety and law enforcement workflows.
• You will design and implement evaluation frameworks, guardrails, and bias-mitigation strategies that give our customers confidence in the reliability and ethical use of our AI systems.
• This is an individual contributor (IC) role that combines hands-on technical engineering with a focus on responsible AI deployment.
• You will work closely with AI engineers, product managers, and DevOps teams to establish standards for evaluation, design test harnesses for generative models, and operationalize quality assurance processes across our AI stack.

Core Responsibilities
• Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases.
• Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.
• Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability).
• Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems.
• Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios.
• Research and integrate third-party evaluation frameworks.