THE FUTURE OF TECH IS YOURS TO BUILD

Learn more about opportunities in Alkeon’s VC Portfolio companies
AI ML Software Engineering PMTS

Own Company

Software Engineering, Data Science
Bengaluru, Karnataka, India
Posted on Dec 8, 2025

Description

Role Description:
As a Principal Engineer on the Agentforce Deployment Platform team, you will own the end-to-end architecture, strategy, and execution of our AI/ML deployment and operationalization systems. You’ll collaborate closely with software engineers, data scientists, product managers, and data teams to turn cutting-edge architecture and research into scalable, highly available, and compliant production systems.
You are not just a coder: you are a thought leader, innovator, builder, and mentor who thrives on ownership and on pushing boundaries in production MLOps, AI infrastructure, and reliable delivery in a rapidly changing, cutting-edge space.

Key Responsibilities:

* Lead the architectural vision for our global-scale ML serving, inference, and model management platform.
* Design and optimize low-latency, high-throughput model serving infrastructure and data flow for training and inference at scale.
* Define the strategy for and implement an AI-assisted migration platform that combines proactive governance with reactive, autonomous remediation by enforcing policies at every stage of the deployment lifecycle.
* Work with product and business teams to translate user needs into technical requirements, focusing on platform capabilities for rapid iteration and secure deployment.
* Set long-term technical strategy and direction, serving as a top-tier technical mentor for engineers across teams.
* Drive adoption of cutting-edge MLOps best practices for model training, secure and automated deployment, proactive monitoring, and robust governance.
* Innovate not just in model building, but in how models are packaged, delivered, and operated in a mission-critical environment.
* Make strategic technical decisions on build vs. buy, model selection, and core platform infrastructure to ensure scalability and cost-efficiency.

Required Skills:

* 15+ years of software engineering experience; 7+ years building and operating AI/ML systems at scale.
* Demonstrable Principal-level impact on, and ownership of, large-scale engineering initiatives.
* Expertise in at least one object-oriented programming language (Java, C++, or Go) and one ML-native language (Python).
* Strong experience in Applied AI, specifically focusing on the infrastructure and platform services required to operationalize deployment vehicles effectively.
* Deep experience with high-scale ML serving frameworks (e.g., TorchServe, TensorFlow Serving, NVIDIA Triton).
* Familiarity with LLMs, vector databases, and applied generative AI deployment patterns (e.g., containerization, traffic management, and cost optimization of RAG pipelines).
* Deep mastery of system design, distributed systems, and cloud-native architectures (AWS/GCP, Kubernetes, Service Mesh).
* Exceptional track record in building and scaling ML serving pipelines, real-time inference systems, and API platforms.
* Proven ability to influence and drive technical consensus across cross-functional teams and mentor senior engineers.
* Strong communication and collaboration skills across technical and non-technical teams.
* Ability to translate complex AI concepts into pragmatic and compliant engineering decisions.
* Experience in startups or high-growth tech companies.

Preferred Skills:

* Contributions to open-source AI/ML infrastructure or MLOps projects.
* Patents, papers, blogs, or other external publications related to large-scale ML deployment, observability, or governance.
* Strong platform and product-centric mindset, demonstrated by high-leverage infrastructure projects.