Raleigh, North Carolina, United States
Principal Software Engineer, Model Inference
About the Company:
Our client's scalable artificial intelligence (AI) and machine learning (ML) platform enables enterprises to create and deliver AI-enabled applications at scale across hybrid cloud environments. Built on open-source technologies, OpenShift AI provides trusted, operationally consistent capabilities for teams to experiment, serve models, and deliver innovative applications.
Role Overview:
The OpenShift AI team seeks a Principal Software Engineer with Kubernetes and model inference runtime experience to join its rapidly growing engineering team. The team focuses on making machine learning model deployment and monitoring seamless and scalable across the hybrid cloud and the edge. This is a fascinating opportunity to build and shape the next generation of hybrid cloud MLOps platforms.
What will you do?
- Develop and maintain a high-quality, high-performing ML inference runtime platform for multi-modal and distributed model serving.
- Contribute directly to upstream inference runtime communities such as vLLM, TGI, PyTorch, and OpenVINO.
- Maintain CI/CD build pipelines for container images, enabling faster, more secure, reliable, and frequent releases.
- Coordinate and communicate with various stakeholders.
- Apply a growth mindset by staying up to date with AI and ML advancements.
What will you bring?
- Extensive programming experience with Python and PyTorch.
- Familiarity with model parallelization, quantization, and memory optimization using vLLM, TGI, and other inference libraries.
- Experience with Python packaging, such as publishing PyPI libraries.
- Development experience with C++, especially the CUDA APIs, is a big plus.
- Solid understanding of the fundamentals of model inference architectures.
- Experience with Jenkins, Git, shell scripting, and related technologies.
- Experience developing containerized applications on Kubernetes.
- Experience with Agile development methodologies.
- Experience with at least one of the following cloud infrastructures: AWS, GCP, Azure, or IBM Cloud.
- Ability to work across a large, distributed hybrid engineering team.
- Experience with open-source development is a plus.
This position follows a hybrid work model, requiring 3 days per week on-site in Raleigh, NC.
This is a fantastic opportunity to work on cutting-edge AI/ML technology and contribute to innovative solutions in hybrid cloud environments. If this role excites you and matches your background, we encourage you to apply!