The Service Network team plays a critical role in ensuring the fast, secure, and reliable delivery of Slack to more than 14M daily active users worldwide. At the heart of our work is the design and operation of robust service-to-service networking, powered by advanced service mesh technologies and service discovery. This enables secure, scalable, and resilient communication between internal services, supporting high availability and enforcing strong security boundaries.

Slack’s infrastructure is always evolving to support our fast-growing business. Service Network’s roadmap is aimed at making our infrastructure easier to use by providing our developers with features such as blue-green deployments out of the box. We are a small team making a large impact. We iterate rapidly and work closely with other engineering teams to ensure our systems are resilient and built to scale. We have a strong commitment to quality and understand that simplicity and reliability should be primary aspects of the systems that we build.

Reliability is Slack’s most critical feature! Accordingly, the Service Network team is responsible for systems vital to Slack’s availability. We work to make our systems scalable and efficient, and to keep them operating according to our high standards in production. We also partner with other engineering teams to find solutions that improve the end-to-end customer experience in Slack.

What you will be doing:

Contribute to the design, development, and operation of scalable, reliable, and secure service mesh infrastructure that enables service-to-service communication across the platform.
Implement and support service mesh capabilities including service discovery, observability, traffic routing, and security features such as mTLS and policy enforcement.
Assist in troubleshooting production issues across distributed systems, Kubernetes environments, networking, and Linux-based infrastructure.
Participate in improving platform reliability, performance, and operational efficiency through automation and tooling.
Contribute enhancements and fixes to internal tooling and, where appropriate, open-source technologies such as Envoy.
Support incident response and operational excellence efforts to help maintain platform availability and service-level objectives (SLOs).
Stay current with emerging cloud-native and service mesh technologies and apply best practices to improve the platform.

What you should have:

U.S. Citizenship. We are unable to provide visa sponsorship for this role.
4+ years of experience in software engineering, infrastructure engineering, or site reliability engineering.
Ability to independently drive projects and contribute to technical design discussions.
Hands-on experience with Kubernetes and cloud platforms such as AWS or GCP.
Experience working with distributed systems, microservices, or cloud-native applications.
Strong collaboration and communication skills with the ability to work effectively across engineering teams.

Qualifications:

Proficiency in one or more programming or scripting languages such as Go, Python, Ruby, or C/C++.
Experience configuring and operating a service mesh in large-scale production environments, with a focus on the stability, scalability, and performance limits of web services.
Experience with TCP/IP, DNS, and network-related protocols.
Experience operating Linux/Unix on high-volume systems at scale.
Experience with algorithms, data structures, complexity analysis, distributed systems, and software development.
