LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer - Qualcomm - San Diego, CA

Company: Qualcomm Technologies, Inc.

Job Area: Engineering Group, Engineering Group > Machine Learning Engineering

General Summary:

LLM Serving Engineer (Cloud AI Engineering)

Qualcomm is utilizing its traditional strengths in digital wireless technologies to play a central role in the evolution of Cloud AI. We are investing in several supporting technologies including Deep Learning. The Qualcomm Cloud AI team is developing hardware and software solutions for Inference Acceleration.

We are hiring LLM Serving Engineers at multiple levels to join our dynamic, collaborative team. This role spans the full product lifecycle—from cutting-edge research and development to commercial deployment—and demands strategic thinking, strong execution, and excellent communication skills.

This role involves the following activities:

Build a scalable LLM inference platform using inference techniques (e.g. disaggregated serving and KV-cache management, advanced parallelism, speculative algorithms, model optimization, specialized kernels).

Contribute to the development of LLM serving packages (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).

Work closely with customers to drive solutions by collaborating with internal compiler, firmware and platform teams.

Work at the forefront of Gen AI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.

Drive efficient serving through smart autoscaling, load balancing and routing.

Engage with open-source serving communities to evolve the framework.

Candidates for this position will demonstrate the following:

Hands-on experience in one or more of the following LLM serving/orchestration packages (Triton