Machine Learning Systems Engineer
Inception Labs
Software Engineering
San Francisco, CA, USA
Posted on Aug 7, 2025
Bay Area
Engineering
In office
Full-time
About Us:
Inception is a generative AI startup. Leveraging breakthrough AI research, we are training next-generation large language models (LLMs) powered by diffusion. Unlike existing auto-regressive models, which output only one token at a time, diffusion LLMs can output many tokens in parallel. This makes them several times faster and lets them use additional test-time compute to improve quality. They also enable fine-grained control over their outputs, so that outputs adhere to specific schemas and semantic constraints, and they provide a unified paradigm for combining language with other data modalities, including audio, images, and video.
Our team is led by Stefano Ermon (co-inventor of diffusion models, flash attention, and DPO; faculty at Stanford), Aditya Grover (co-inventor of node2vec and decision transformers; faculty at UCLA), and Volodymyr Kuleshov (prev. co-founder and CTO at Afresh Technologies; faculty at Cornell), and includes engineers from Google DeepMind, Meta AI, Microsoft AI, and OpenAI. We are in the process of deploying our models at Fortune 500 companies.
Role Overview:
We are looking for ML Systems Engineers with a strong background in distributed systems, infrastructure engineering, and machine learning operations. In this role, you will design and implement the infrastructure that powers our ML training and inference systems. You will collaborate with ML researchers and engineers to build efficient, reliable, and scalable systems that enable the development and deployment of state-of-the-art LLMs.
Key Responsibilities:
- Design and implement distributed training infrastructure for large-scale machine learning models
- Build and optimize high-performance model serving systems for low-latency inference
- Develop automated pipelines for data preprocessing, model training, and deployment
- Create monitoring and observability solutions for ML systems in production
- Optimize infrastructure costs and resource utilization across GPU clusters
- Design and implement efficient data storage and retrieval systems for ML workloads
- Collaborate with ML researchers to translate theoretical requirements into practical system designs
Qualifications:
- BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience)
- Strong grasp of software engineering fundamentals and systems design principles
- Extensive experience with distributed systems and cloud computing platforms (AWS/GCP/Azure)
- Proficiency in Python and at least one systems programming language (C++/Rust/Go)
- Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines
- Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective
- Familiarity with high-performance computing and GPU programming (CUDA)
Preferred Skills:
- Experience building and maintaining large-scale ML training clusters
- Knowledge of ML serving frameworks (vLLM, TensorRT, ONNX Runtime)
- Familiarity with distributed training techniques (data, model, and pipeline parallelism)
- Experience with ML workflow orchestration tools (Kubeflow, Airflow)
- Background in performance optimization and profiling of ML systems
- Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.)
- Experience with MLOps practices and tooling
Why Join Us:
- Impact: Build the infrastructure that enables next-generation AI development
- Innovation: Solve complex distributed systems challenges in the ML domain
- Growth: Shape the architecture of our ML platform from the ground up
Perks & Benefits:
- Competitive salary and equity in a rapidly growing startup
- Flexible vacation and paid time off (PTO)
- Health, dental, and vision insurance
- Professional development budget for conferences and courses
- Access to the latest GPU hardware and cloud resources
- A collaborative and inclusive culture where your voice matters
This is an exciting opportunity to join a startup at the forefront of LLM development! If you're ready to build the systems that power the future of AI, apply today.
We are an equal opportunity employer and encourage candidates of all backgrounds to apply.
Req ID: R3