This position needs a highly skilled SRE with in-depth knowledge of Unix, Python and Kubernetes to join a team of production engineers who are taking reliability to the next level. The team is focussed on improving the automation, testing and monitoring of systems and processes.
- Standardisation of monitoring methodologies, systems, tools, libraries
- Automation of operational processes to improve reliability and efficiency and to reduce alert fatigue
- Owning and evolving our systems by pushing for changes that improve resilience and reliability
- Developing, and enabling the development of, high-quality, resilient, scalable and secure systems
- Wearing a strategic resilience and reliability hat in architecture and design discussions
- Maintaining the highest levels of availability for systems across the enterprise, most of which are proprietary applications
- A passion for automation and continual improvement, with a track record of identifying high-value automation opportunities
- Intense focus on improving system availability and resilience through testing, standardisation and automation
- Ability to build positive and collaborative relationships with colleagues across teams and geographies
- Broad technical knowledge and strong communication skills, credible across the full technology stack
- Systematic and methodical approach to problem-solving and debugging
- Knowledge of cyber security risks