Full Time

Data & LLM Systems Engineer - AllSpice - Boston, MA

Posted 18 days ago

About the position

We’re looking for a Data & LLM Systems Engineer to help us design, build, and operate the systems that sit at the intersection of hardware design, data, applications, and large language models (LLMs).

In this role, you’ll own how data flows from raw inputs into structured systems, how that data is exposed through our suite of applications, and how LLM interactions are instrumented, analyzed, and improved over time. You’ll work closely with our GenAI, Platform, and Infrastructure teams to ensure DRCY is reliable, observable, and continuously getting better.

This is not a research-only role and not a frontend-only role. It’s a hands-on engineering position focused on building real systems that people and products depend on.

Responsibilities

Design, build, and maintain data pipelines for ingesting, cleaning, transforming, and storing data
Define and evolve data schemas that support analytics, applications, and LLM workflows
Work with relational databases, analytical data stores, and vector databases
Ensure data reliability, performance, and cost efficiency
Implement best practices around data versioning, lineage, and access control
Build backend services and APIs that expose data to internal tools and user-facing applications
Develop applications and internal tooling for managing datasets, experiments, prompts, and configurations
Collaborate with product and design to ensure tools are usable, safe, and scalable
Support both real-time and batch processing workflows
Design systems that track prompt versions, context construction, and model configurations
Instrument LLM interactions to capture inputs, outputs, metadata, latency, and cost
Help establish standards for monitoring, debugging, and evaluating LLM behavior in production
Analyze LLM outputs and user interactions to identify failure modes, drift, and quality issues, and help ensure overall reliability and consistency
Define and track metrics related to response quality, task success, and user outcomes
Run