AVP/Senior Associate, Site Reliability Engineer, Group Consumer Banking and Big Data Analytics Technology, Technology & Operations

DBS Bank Limited
in Singapore
Permanent, Full time
Competitive
Business Function

Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.

Key Accountabilities

• Build and maintain Production monitoring and automation solutions
• Implement Site Reliability Engineering principles with regard to performance, reliability, monitoring, alerting and maintenance in the Production environment
• Capacity monitoring & Observability of production Infrastructure, automated alerting, performance monitoring and reporting tools
• Build and implement Service improvements and Machine Learning models
• Manage identified Production applications; identify, measure and periodically report performance trends and KPIs, including SLI, SLO and SLA measures; and improve system performance and the associated performance KPIs
• Production systems performance and KPIs monitoring
• Deployment automation and allied improvements

Responsibilities

• Conceptualize, design, develop and maintain production monitoring and Machine Learning-based predictive automation solutions/applications in a CORE Banking Production environment.
• Production automation: automate manual activities and processes for Production teams.
• SRE: implement Site Reliability Engineering principles regarding performance, reliability, monitoring and alerting in the Production environment
• Capacity monitoring & Observability of production Infrastructure, automated alerting, performance monitoring and reporting tools. Conduct periodic reviews of system performance for capacity planning and identification of system improvements
• Develop auto-healing solutions in the production environment to enable efficient and timely service restoration of critical processes through auto-escalation of incidents, non-performant KPIs and underlying remedial actions
• Data handling - ingestion, cleansing, storage, visualization, monitoring & alerting and analytics
• Data analysis: use tools to find patterns in data and devise optimum solutions that are predictive and provide insights
• Build and implement Service improvements. Identify, measure and report performance trends - SLIs/ SLOs/ SLAs periodically and improve systems performance and associated performance KPIs
• Production batch and incidents trending and measuring systems performance against KPIs
• Ensure preventive and detective measures for the applications are identified and implemented.

Requirements

• 5 - 10 years of total IT experience in SRE and Production automation
• Good understanding of SRE concepts and principles regarding performance, reliability, monitoring, alerting.
• 3+ years of experience in a professional production environment as a developer working with Python, Java and ELK.
• Proven experience implementing, conceptualizing or maintaining an ELK-based application (or an equivalent central logging/monitoring/predictive application) in a production environment would be an added advantage.
• Production automation: automation of manual activities/processes for Production teams (automation experience required).
• Good experience in running automation and service improvements
• Capacity monitoring & Observability: good command of production infrastructure, performance monitoring and reporting tools
• Hands-on Engineering/ Development experience working on production systems - architecture design, development, integration, customization & implementation.
• Ability to write clear and concise documentation (such as requirements, design and testing procedures)
• Strong technical/programming skills. Knowledge of additional languages and technologies such as Python, Java, Kafka, PCF, Spark and ELK is an added advantage.
• Knowledge of Data handling tools and Software version control tools (Git) and Jenkins
• Experience using and optimizing monitoring and trending systems (Prometheus, Grafana), log aggregation systems (ELK, Splunk) and their agents
• Ability to work with stakeholders to stretch the role in depth and breadth
• Present facts and recommendations effectively in oral and written form
• Good knowledge of development practices
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognizes your achievements.