About This Role – Site Reliability Engineer (SRE)
We are seeking a talented Site Reliability Engineer with experience designing and building systems, monitors, tools, frameworks, and methodologies to ensure the reliability of our trading platforms. You will join the SRE team, which works closely with software development and engineering teams as the steward of our production systems.
Design and implement a wide variety of systems that support our codebase, with a primary focus on cloud-native and Kubernetes systems
Define and manage meaningful and actionable SLI/SLO metrics
Recommend and execute platform changes to improve service levels
Build infrastructure-as-code templates that allow DevOps to deploy
Manage existing continuous integration pipelines and build new ones
Maintain all environments via automated patching systems
Participate in releases and rotating on-call schedules
Own production incident response
Design and manage alerting to react to breaches of SLOs
Automate platform/system recovery
Excellent presentation skills and strong negotiation skills
Superior time management skills
Proven track record of strong scope and change control
Bachelor's degree in computer science or a related discipline, or equivalent work experience, required
3+ years of experience in SRE, DevOps, software engineering (SWE), or cloud architecture roles
Hands-on experience with public cloud, Terraform/Ansible/CloudFormation, PagerDuty/Opsgenie, Kubernetes, Linux, and modern monitoring platforms: