About TRUMID XT
Trumid XT is a financial technology company bringing efficiency, connectivity, and access to credit trading through innovative technology and product design. The Trumid XT product ecosystem leverages data and the power of the network effect to create transparency, liquidity, and efficient trade execution. Our electronic trading platform connects corporate bond market professionals to a broad network of liquidity and provides a range of trading protocols to access it. For more information, visit www.trumidxt.com.
We count on our site reliability engineers (SREs) to give our users the rich feature set, high availability, and stellar performance they need to pursue their missions. As we expand our customer deployments, we are seeking an experienced SRE with deep expertise in designing and supporting highly scalable, highly available infrastructure and applications in Kubernetes, and in promoting microservice design patterns in complex cloud environments. This role will serve as a subject matter expert on all aspects of our containerized deployments, including deployment, configuration, scaling, and upgrades. The ideal candidate will be passionate about collaborating with a cross-functional team on the adoption of new technologies and design principles, and about promoting a culture of DevOps and collaboration. This role will also work closely with that team to ensure deployments are successful in both production and non-production environments.
Objectives of this Role
- Be accountable for the engineering, reliability, and overall SRE enablement across our APAC-based platforms.
- Troubleshoot complicated, cross-platform issues spanning the OS, AWS, networking, and databases.
- Work closely with Development, QA and Production Support teams to make sure releases are on time and successful.
- Ensure the reliability and security of the infrastructure while building proactive dynamic monitoring, alerting and metrics solutions to make sure each environment is meeting the SLA requirements.
- Build infrastructure in both AWS and GCP using Terraform.
- Minimize or eliminate manual hand-offs, and link all automated workflows.
- Support the Kubernetes application/infrastructure in both production and non-production environments.
- Establish and test disaster recovery policies and procedures.
- Own the resiliency and scalability of the infrastructure.
- Track and apply all required patches.
- Create and maintain technical documentation.
- Enable SecOps across APAC, collaborating between our development and core-SRE teams to implement region-specific cybersecurity requirements.
Required Skills and Qualifications
- BA in Computer Science or Information Systems, or a combination of education and related work experience
- 5 years of Site Reliability Engineering (SRE) experience
- 5 years of DevOps experience
- 2 years of Kubernetes experience
- 3 years of cloud platform experience (AWS and GCP)
- 5 years of production infrastructure experience
- Strong coding experience in Ruby, Python, Perl or similar languages
- Proven experience automating routine, repeatable tasks
- Strong sense of ownership, ability to work independently and recommend variations to support the regional business, and a proven track record of driving products and changes
- Strong experience in production support and operations
- Strong experience in monitoring technologies, including infrastructure performance and availability monitoring, using SLOs and SLIs, and presenting metrics for management use.
- Strong experience in Terraform, Ansible, Jenkins, Linux, Docker, Helm, Elasticsearch, Prometheus
- Strong automation and problem-solving skills, and the ability to follow through to completion.
- Ability to wear multiple hats and multitask effectively in a fast-paced environment.
- Capable of working independently as well as part of a group.
- Experience in security hardening and cybersecurity compliance.