
Resilience Engineer



Milan, Italy
Posted on Tuesday, April 16, 2024

Your opportunity:

Mollie is one of the fastest-growing companies in Europe, and we are facing many exciting technical and organisational challenges. You will play a key role in shaping the Production environment experience and augmenting our best practices, and you’ll have the opportunity to shape the rest of your career at Mollie.

As a Resilience Engineer within Mollie's Developer Enablement domain, you'll be instrumental in building and maintaining robust processes and systems designed to meet the highest standards of reliability and operational excellence. You will steward our production readiness, support engineers in best practices, and advocate for an improved developer experience in creating resilient services.

What you'll be doing:

  • Champion production readiness across the company, setting technical direction and automating processes for sustainable production practices.

  • Conduct readiness reviews for new services, collaborating across teams to embed best practices for resilience and operational excellence.

  • Run training workshops on Site Reliability Engineering concepts and author reliability bulletins to address engineering challenges.

  • Implement and advocate for resilience patterns such as Feature Flags, Circuit Breakers, and graceful degradation, to dynamically manage system dependencies and enable safer deployment practices.

  • Identify and advocate for solutions to reduce friction in production service management, collaborating with infrastructure teams to enhance developer tools and practices.

  • Facilitate and engage in Incident Reviews, and participate in cross-functional squads to address complex reliability issues. Collaborate with Chaos Engineering on game days and disaster recovery exercises.

  • Foster a culture of continuous learning and improvement by mentoring peers and contributing to a collaborative work environment.

What you'll bring:

  • Around 4 years of experience with distributed systems, and a knack for quickly understanding unfamiliar services and production patterns.

  • Prior experience facilitating Incident Reviews, conducting root cause analysis, and writing post-mortem documents for continuous improvement.

  • The heart of an educator, with experience teaching others or creating developer-focused documentation.

  • Good coding skills in languages relevant to our stack (e.g., PHP, Java, Python, Go), with a taste for automating manual processes to scale impact.

  • An empathetic, collaborative approach, with excellent verbal and written communication skills and the ability to work effectively across teams in a remote setting.

  • A drive to take on challenges, an eagerness to dive into new codebases and systems, and the determination to navigate and clarify ambiguity.