AVP, SRE, Observability Automation and Orchestration, Technology & Operations
Business Function
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Responsibilities
- Design and architect a highly resilient application monitoring infrastructure based on an open-source stack. Enhance, optimize, and migrate to new solutions if required.
- Develop and deploy Spring Boot microservices, build applications for administrative automation, and build APIs for integrations.
- Lead the application teams in migrating to the latest OpenShift versions, deploy stateful/stateless apps, and troubleshoot issues in Kubernetes/OpenShift platforms.
- Implement best practices for OpenShift deployments and StatefulSets, such as pod anti-affinity, security context at the Pod/Container level, liveness and readiness probes, secret management, Guaranteed QoS for Pods, reduced Docker image size, image pull policy, and least-privilege RBAC with explicitly defined resources and verbs.
- Work with application developers to implement application instrumentation libraries and frameworks. Educate the dev community on the different types of instrumentation and profiling. Follow the OpenTracing and OpenTelemetry dev communities to adopt the latest implementations.
- Design the deployment pattern and set up metrics data stores using TSDBs like Prometheus, VictoriaMetrics, TimescaleDB, or InfluxDB. Perform administration and tuning, such as cardinality optimisation and resource optimisation.
- Set up and maintain distributed tracing infrastructure like Jaeger, Zipkin, etc. Perform administrative functions and tuning, such as defining the sampling strategy. Troubleshoot distributed tracing in microservices.
- Implement and provide production support for enterprise logging platforms like the ELK stack, Grafana Loki, etc. Work on index lifecycle management in Elasticsearch.
- Implement alerting infrastructure and integrate it with PagerDuty, MS Teams, and any other software that needs alert-based mitigation/action. Assist the application support team in defining alerting rules for enterprise business apps.
- Deploy and administer visualization tools like Grafana, Kibana, and SkyWalking. Create reusable dashboard templates. Implement RBAC for the entire user base.
- Educate the dev community on observability and foster an observability culture. Assist teams in identifying golden signals, defining SLIs and SLOs for enterprise applications, and calculating error budgets, MTTD, and MTTR.
- Implement and integrate proprietary monitoring tools like Dynatrace or AppDynamics. Certification in either is preferred.
- Troubleshoot infrastructure issues in the observability stack on Linux VMs and Kubernetes Pods. Set up and secure reverse proxies, secure all application endpoints with TLS, and enable MFA, LDAPS, and OAuth as required.
- Configure CI/CD pipelines for all the monitoring infrastructure and services. Modify and extend existing pipelines to cater to multiple environments/regions.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.
Requirements
- Experience in designing and architecting a highly resilient application monitoring infrastructure, based on an open-source stack, in enterprise container platforms and VMs.
- Development skills with experience in real-time, distributed, and highly secured environments. Experience in Java and Spring Boot is preferred.
- Strong hands-on experience in administrative configurations, deployment of stateful/stateless apps, and troubleshooting in Kubernetes/OpenShift platforms.
- Experience in application instrumentation, including familiarity with instrumentation libraries, frameworks, and modes of instrumentation. Familiarity with the OpenTracing and OpenTelemetry standards.
- Experience in building custom applications to automate and orchestrate workflows, and in integrating interfacing applications via APIs.
- Experience in implementing, administering, and tuning metrics data stores like Prometheus, VictoriaMetrics, TimescaleDB, or InfluxDB.
- Experience in implementing, administering, and tuning distributed tracing infrastructure like Jaeger, Zipkin, etc. Familiarity with stitching different applications together to provide an end-to-end tracing view.
- Experience in implementing and administering enterprise logging platforms like the ELK stack, Grafana Loki, etc.
- Experience in implementing alerting infrastructure, integrating with PagerDuty and webhooks, and defining alerting rules for enterprise business apps.
- Hands-on experience in implementing and administering visualization tools like Grafana, Kibana, and SkyWalking.
- Experience in identifying golden signals, defining SLIs and SLOs for enterprise applications, and calculating error budgets, MTTD, and MTTR.
- Experience in implementing and administering proprietary monitoring tools like Dynatrace or AppDynamics.
- Strong hands-on experience with Linux, Docker containers, reverse proxies, SSL/TLS certificates, and authentication configurations for MFA, LDAPS, OAuth, etc.
- Experience with CI/CD pipelines and toolsets like Bitbucket, Jenkins, SonarQube, Jira, Nexus, etc.
- Self-driven, strong, committed, and reliable team player. Ability to contribute to discussions on design and strategy. Good written and oral communication skills.
- Minimum of 8 years of technology experience (preferably in the financial industry).