This position needs a highly skilled SRE with in-depth knowledge of Unix, Python and Kubernetes to join a team of production engineers who are taking reliability to the next level. The team is focussed on improving the automation, testing and monitoring of systems and processes.
Standardisation of monitoring methodologies, systems, tools, libraries
Automation of operational processes to improve reliability and efficiency and to reduce alert fatigue
Owning and evolving our systems by pushing for changes that improve resilience and reliability
Developing, and enabling the development of, high-quality, resilient, scalable and secure systems
Bringing a strategic resilience and reliability perspective to architecture and design discussions
Maintaining the highest levels of availability for systems across the enterprise, most of which are proprietary applications
A passion for automation and continual improvement, with a track record of identifying high-value automation opportunities
Intense focus on improving system availability and resilience through testing, standardisation and automation
Ability to build positive and collaborative relationships with colleagues across teams and geographies
Broad technical knowledge and strong communication skills, credible across the full technology stack
Systematic and methodical approach to problem-solving and debugging