Description
About the Team
The Service Network team is at the core of what makes Slack work quickly, securely, and reliably for over 14 million daily active users worldwide. We design and operate the service-to-service networking fabric that powers Slack's platform, built on advanced service mesh technologies and intelligent service discovery. Our work enables secure, scalable, and resilient communication across internal services, enforcing strong security boundaries and maintaining the high availability that our users depend on.
Slack's infrastructure is constantly evolving to match our fast-growing business. The Service Network roadmap is focused on making infrastructure easier to use for the engineers who build on top of it — delivering capabilities like blue-green deployments out of the box, so developers can move faster without compromising stability. We're a small team with an outsized impact: we iterate quickly, collaborate closely across engineering, and hold ourselves to a high standard of simplicity and reliability in everything we ship.
Reliability isn't just a metric for us — it's Slack's most critical feature. The systems we own are central to platform availability, and we take that responsibility seriously. We work to make our infrastructure scalable, efficient, and production-ready, while partnering with other engineering teams to improve the end-to-end customer experience.
Slack has a positive, diverse, and supportive culture. We look for people who are curious, inventive, and committed to growing a little better every day. We work to be smart, humble, hardworking — and above all, collaborative. If that sounds like you, we'd love to connect.
About the Role
We're hiring a Software Engineer I to join the Service Network team in a full-time, U.S.-based role. This is a high-impact position at the intersection of distributed systems, cloud-native infrastructure, and platform reliability.
What You'll Be Doing
Design, build, and operate scalable, secure, and reliable service mesh infrastructure that underpins service-to-service communication across Slack's platform.
Implement and maintain core service mesh capabilities — including service discovery, observability, traffic routing, mTLS, and policy enforcement.
Troubleshoot and resolve production issues spanning distributed systems, Kubernetes environments, networking layers, and Linux-based infrastructure.
Drive improvements in platform reliability, performance, and operational efficiency through thoughtful automation and tooling.
Contribute enhancements and fixes to internal tooling and open-source technologies, including Envoy.
Play an active role in incident response and operational excellence, helping uphold platform availability and service-level objectives (SLOs).
Stay ahead of emerging cloud-native and service mesh technologies and bring best practices into the team's work.
What You Should Have
Must have lawful permanent residency in the U.S.
2+ years of experience in software engineering, infrastructure engineering, or site reliability engineering.
Hands-on experience with Kubernetes and cloud platforms such as AWS or GCP.
Proven ability to work within distributed systems, microservices, or cloud-native environments.
Strong collaboration and communication skills — you work well across teams and can make complex technical topics accessible.
Nice to Have
Proficiency in Go, Python, Ruby, C, or C++.
Experience configuring and operating service mesh at production scale, with a focus on stability, scalability, and performance.
Solid understanding of TCP/IP, DNS, and network-related protocols.
Experience operating Linux/Unix systems at high volume and scale.
Strong foundation in algorithms, data structures, complexity analysis, and distributed systems design.
For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.