LLMOps Lead
Cognichip Inc.
At Cognichip, we are building a next-generation enterprise product suite that empowers semiconductor design engineers to achieve a 10x productivity boost through proprietary AI/ML models and modern cloud technologies.
We are seeking a Staff LLMOps Engineer to architect, deploy, and optimize our large language model (LLM) infrastructure in the cloud. This role focuses on taking trained models to production, scaling them efficiently across GPU clusters, and driving innovation in inference optimization. You will work closely with AI scientists, DevOps, and platform teams to ensure low-latency, high-throughput model serving for our enterprise SaaS product.
Core Responsibilities
● Design and implement production-ready LLM deployment pipelines on AWS and Kubernetes/EKS.
● Build and scale LLM inference infrastructure (multi-GPU, multi-node) for high availability, low latency, and cost efficiency (see the scaling sketch after this list).
● Optimize inference performance using vLLM, SGLang, or similar frameworks.
● Implement advanced serving techniques: continuous batching, speculative decoding, KV-cache management, paged attention, and distributed scheduling (see the serving sketch after this list).
● Collaborate with AI researchers to operationalize model training outputs into production-grade services.
● Establish monitoring and observability for LLM serving: latency, throughput, GPU utilization, and failure recovery (see the observability sketch after this list).
● Drive automation of infrastructure provisioning, scaling, and updates using IaC (Terraform) and CI/CD pipelines.
● Partner with security and compliance teams to ensure secure multi-tenant model hosting aligned with enterprise-grade requirements.
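To give a flavor of the serving work described above, here is a minimal vLLM sketch. The model name, parallelism degree, and memory/batching settings are placeholders, not our production configuration; continuous batching and PagedAttention-based KV-cache management are handled internally by the vLLM engine, and the knobs below only bound how aggressively it packs requests.

```python
# Minimal vLLM serving sketch. Model name and GPU settings are placeholders.
# vLLM applies continuous batching and PagedAttention KV-cache management
# internally; these options only shape how the engine schedules requests.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,          # shard weights across 2 GPUs (multi-GPU inference)
    gpu_memory_utilization=0.90,     # fraction of GPU memory for weights + KV cache
    max_num_seqs=256,                # upper bound on concurrently batched sequences
)

sampling = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=256)

prompts = [
    "Summarize the timing constraints for this RTL block:",   # illustrative prompts only
    "Suggest a lint waiver rationale for the following violation:",
]

# generate() runs all prompts through the continuously batched engine.
for request_output in llm.generate(prompts, sampling):
    print(request_output.outputs[0].text)
```

In production this would typically sit behind vLLM's OpenAI-compatible server rather than the offline LLM class, with speculative decoding and distributed scheduling layered on via engine options that vary by version.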
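For the deployment and scaling side, here is a sketch using the official Kubernetes Python client to scale out a hypothetical vLLM serving Deployment on EKS; the "vllm-server" Deployment and "inference" namespace are assumed names, not an existing setup.

```python
# Sketch: horizontally scale a (hypothetical) vLLM serving Deployment on EKS.
# Requires the official `kubernetes` Python client and a kubeconfig with access
# to the cluster; "vllm-server" and "inference" are assumed names.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() when running in-cluster
apps = client.AppsV1Api()

# Bump replicas, e.g. in response to rising queue depth or latency SLO pressure.
apps.patch_namespaced_deployment_scale(
    name="vllm-server",
    namespace="inference",
    body={"spec": {"replicas": 4}},
)
```

In practice this kind of scaling decision would live in an autoscaler or CI/CD-driven pipeline rather than an ad hoc script; the snippet only illustrates the underlying API call.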
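On the observability side, a minimal probe that measures end-to-end request latency and token throughput against an OpenAI-compatible serving endpoint and samples GPU utilization via NVML; the endpoint URL and model name are placeholders.

```python
# Sketch: probe serving latency/throughput and GPU utilization.
# Assumes an OpenAI-compatible endpoint (e.g. a vLLM server) at ENDPOINT and
# the `requests` and `pynvml` packages; URL and model name are placeholders.
import time
import requests
import pynvml

ENDPOINT = "http://vllm-server.inference.svc:8000/v1/completions"  # placeholder URL
PAYLOAD = {"model": "placeholder-model", "prompt": "ping", "max_tokens": 32}

def probe_latency() -> tuple[float, int]:
    """Return (end-to-end latency in seconds, completion tokens generated)."""
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
    resp.raise_for_status()
    latency = time.perf_counter() - start
    tokens = resp.json().get("usage", {}).get("completion_tokens", 0)
    return latency, tokens

def gpu_utilization(index: int = 0) -> int:
    """Sample instantaneous GPU utilization (%) for one device via NVML."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    pynvml.nvmlShutdown()
    return util

if __name__ == "__main__":
    latency, tokens = probe_latency()
    tput = tokens / latency if latency > 0 else 0.0
    print(f"latency={latency:.2f}s  throughput={tput:.1f} tok/s  gpu_util={gpu_utilization()}%")
```

In a real deployment these signals would come from the serving framework's Prometheus metrics and GPU exporters scraped into the cluster's monitoring stack; the probe above is only a self-contained illustration.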
Required Qualifications
● 5+ years of experience in DevOps/AI infrastructure, with 2+ years focused on LLMOps (production deployment & optimization).
● Proven track record of deploying and scaling LLMs in production environments.
● Hands-on experience with GPU-accelerated inference and distributed AI serving.
● Strong understanding of cloud-native architectures and secure enterprise SaaS deployment.
What We Offer
● Opportunity to own and scale LLM infrastructure at a disruptive AI startup.
● Competitive compensation package, including equity participation.
● A team of high-caliber collaborators at the intersection of AI, cloud, and semiconductor design.
● A culture of innovation, precision, and impact, where your work directly shapes the future of engineering.