Our client is an international bank in Hong Kong, currently looking for an IT Incident Manager to join their team.
- Receive and respond to critical alerts from Client Journey monitoring systems
- Trigger the Major Incident Management process if necessary
- Review dashboard and monitoring effectiveness
- Document areas for improvement of monitoring tools and processes
- Keep abreast of planned system changes, business campaigns and economic, political, social and environmental events that may impact system usage or stability
- Use situational knowledge to correlate system anomalies with potential situational causes
- Build rapport with business and technology stakeholders
- Triage incident reports to assess actual or potential client / business impact
- Trigger the Major Incident Management process for incidents impacting clients
- Assess the Priority of incidents according to the agreed Priority Matrix
- Act as an overall Situation Manager to ensure the right resources are mobilized and that incident investigation and resolution are progressing effectively
- Manage incident bridges to ensure technology responders are able to work effectively towards resolution and non-technology stakeholders are given proper updates on impact, workarounds, status and progress without interrupting resolution activities
- Communicate effectively to key stakeholders across the organization including business, country, risk, and technology stakeholders to keep them informed about the impact and status of ongoing technology incidents
- Operate an Incident Dashboard to provide on-demand status updates for ongoing technology incidents
- Gather impact details and communicate business impact to technology stakeholders on incident bridges
- Operate a group chat channel and facilitate a Business Bridge to provide real-time updates to key stakeholders
- Knowledge Management
- Initiate the RCA (Root Cause Analysis) process for relevant incidents
- Ensure lessons learned are recorded, particularly with regard to monitoring, mobilization, response, and recovery action improvements
- Collect business impact details for sharing with relevant stakeholders
- Ensure outage and impact details are recorded accurately in source systems such as Remedy, so that reporting is timely and accurate
- Facilitate reporting on incident trends and thematic analysis
- SRE / Problem Management
- Trigger the RCA process for relevant incidents
- Identify areas of improvement in monitoring, housekeeping and capacity planning to proactively avoid incidents, and where applicable, assist in the development, testing and implementation of software solutions
- Contribute to the identification and documentation of failure points using tools like FMEA (Failure Mode and Effects Analysis), Jira, etc.
Interested applicants please contact me at 6377 1286 or send your resume to Cay Li: firstname.lastname@example.org
**For more job opportunities, please visit our website: www.pplesearch.com**
- Bachelor's degree with knowledge of information technology
- Solid IT experience; banking domain experience is desirable
- Hands-on production support experience, preferably in the banking industry
- Good oral and written communication skills, with the ability to interact with business representatives and senior management
- Proven experience in coordinating many dependencies and multiple demanding stakeholders in a complex, large-scale international environment
- Familiarity with Agile methodologies and tools such as Jira
- Knowledge of monitoring tools such as ITRS, BMC, Splunk, AppDynamics, etc.
- Knowledge of Java, J2EE, Oracle, WaaS, MQ and Unix technologies is a plus
- Experience with Remedy or ServiceNow, knowledge management tools, and documentation is a plus
- Immediate availability is highly preferred