AVP, Site Reliability Engineer (Core Banking), Group Consumer Banking and Big Data Analytics Technology, Technology & Operations

DBS Bank Limited
in Singapore
Permanent, Full time
Competitive
Business Function
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Key Accountabilities
  • Build and maintain Production monitoring and automation solutions
  • Automation of manual tasks in a CORE Banking ecosystem
  • Implement Site Reliability Engineering principles with regard to performance, reliability, monitoring, alerting and maintenance in the Production environment
  • Capacity monitoring & Observability of production Infrastructure, automated alerting, performance monitoring and reporting tools
  • Build and implement Service improvements and Machine Learning models
  • Manage identified Production applications; identify, measure and periodically report performance trends and KPIs, including SLI, SLO and SLA measures; and improve systems performance and associated performance KPIs
  • Production systems performance and KPIs monitoring
  • Deployment automation and allied improvements
Responsibilities
  • Conceptualize, design, develop and maintain production monitoring and Machine Learning based predictive automation solutions/ applications in a CORE Banking Production environment.
  • Production automation. Automation of manual activities/processes for Production teams.
  • SRE. Implement Site Reliability Engineering principles regarding performance, reliability, monitoring and alerting in the Production environment
  • Capacity monitoring & Observability of production Infrastructure, automated alerting, performance monitoring and reporting tools. Conduct periodic review of system performance for capacity planning and identification of system improvements
  • Build, monitor and maintain Machine Learning models from scratch.
  • Develop auto-healing solutions in the production environment to enable efficient and timely service restoration of critical processes through auto-escalation of incidents, non-performant KPIs and underlying remedial actions
  • Data handling - ingestion, cleansing, storage, visualization, monitoring & alerting and analytics
  • Data analysis: use appropriate tools to find patterns in data and devise optimal solutions that are predictive and provide insights
  • Build and implement Service improvements. Identify, measure and report performance trends - SLIs/ SLOs/ SLAs periodically and improve systems performance and associated performance KPIs
  • Trend Production batch runs and incidents, and measure systems performance against KPIs
  • Automation of system health check and monitoring of production system SLIs and SLOs to ensure SLA is met
  • Provide continuous monitoring and improvement of systems - job automation, performance tuning, capacity planning.
  • Identify persistent or recurring problems and recommend creative solutions.
  • Communicate proactively and provide regular updates to stakeholders. Proven ability to communicate with peers and mentor junior developers.
  • Ensure preventive and detective measures for the applications are identified and implemented.
Requirements
  • 6 - 12 years of total IT experience, including SRE and Production automation experience in a Banking and Financial services environment. Experience gained in an SRE team, with a good understanding of SRE concepts and principles regarding performance, reliability, monitoring and alerting.
  • 3+ years of experience in a professional production environment as a developer in Python and allied libraries such as Pandas/ Matplotlib/ Seaborn/ Scikit-learn.
  • Proven record of having conceptualised, developed and implemented 2 end-to-end Predictive Machine Learning models using algorithms like Regression, Decision Trees/ Random Forest, Bagging and Boosting algorithms, Unsupervised learning algorithms, Time-series etc. in a production environment.
  • Proven experience of having implemented, conceptualized or maintained an ELK-based solution (or an equivalent central logging/ monitoring/ predictive application) in a production environment would be an added advantage.
  • Production automation. Automation of manual activities/processes for Production teams. (Automation experience required)
  • Good experience in running automation and service improvements
  • Capacity monitoring & Observability. Good command of production infrastructure, performance monitoring and reporting tools
  • Hands-on Engineering/ Development experience working on production systems automation in Banking systems - architecture design, development, integration, customization & implementation.
  • Ability to write clear and concise documentation (such as requirements, design and testing procedures)
  • Strong technical/ programming skills. Knowledge of additional languages and technologies (NoSQL, Java, Python) is an added advantage.
  • Data handling tools
  • Software version control tools (Git)
  • Experience using and optimizing monitoring and trending systems (Prometheus, Grafana), log aggregation systems (ELK, Splunk) and their agents
  • Expert-level experience in conceptualization, design, development, testing, implementation and maintenance of Elasticsearch, Logstash, Grafana/ Kibana, NoSQL and Java applications in a production environment
  • Familiar with MariaDB, an application server such as JBoss, any cloud platform, shell scripting and SQL
  • Good to have: working knowledge of ELK and Java development practices, Java, .NET, Oracle, Tivoli, WebSphere MQ, web services, XML, AIX, Linux
  • Familiar with applications such as Xcelerate/TBMS systems, StreamServe, WAS, Oracle PL/SQL, MS SQL, Java, Apache Tomcat, AIX, Linux.
  • Ability to work with stakeholders to stretch the role in depth and breadth
  • Present facts and recommendations effectively in oral and written form
  • Good knowledge of development practices and ability to write clear and concise documentation for requirements, design and testing procedures
  • Proactive, independent and resourceful; a strong team player who communicates effectively across international teams and is used to working closely with remote teams and peers.
  • High attention to detail, with a focus on understanding issues and finding solutions
  • Possess excellent verbal and written communication skills
  • Demonstrate ownership and responsibility in all assignments
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.