Production Management - Head of Site Reliability Engineering (SRE)

Citi
in Singapore
Permanent, Full time
Salary: Competitive
The Production Management - Head of Site Reliability Engineering (SRE) will play a key role in enabling and leading a Global Consumer-wide program toward the adoption of SRE principles and practices. The role will establish the foundations of the key SRE pillars: Error Budgeting, Reliability Models, Toil Elimination, Measurement & Monitoring, and Blameless Postmortems. It will create and showcase the value of SRE from a customer-focused perspective across the key principles, including Availability, Latency, Performance, Efficiency, Change Management, Monitoring, Emergency Response, and Capacity Planning. Finally, the role will drive results through Automation, Self-Healing and Resiliency Engineering, reducing the mean time to detect and restore services and helping avoid incidents, which leads to cost reductions.

This leader will have a background in leading large, cross-functional teams across Infrastructure, Application Development and Architecture, and must be someone who can influence through others, deliver results, and quickly earn the trust of the organization and its stakeholders.

As a manager, this leader is expected to define strategies and roadmaps to improve productivity while weighing in on the financial budget and vendor management resourcing. They will also build a high-performing team of Site Reliability Engineers, recruiting and retaining top talent capable of operating under extreme pressure through empowerment, development and engaging experiences.

Job Responsibilities:
  • Management - Build a high-performing team of Site Reliability Engineers, including recruiting and retaining top talent capable of operating under extreme pressure through empowerment, development and engaging experiences.
  • Leadership - Define strategies and roadmaps to improve productivity while weighing in on the financial budget and vendor management resourcing.
  • Establish the Site Reliability Engineering (SRE) foundations, including creating a charter, reducing toil, and establishing and managing the error budget for critical services.
  • Introduce Chaos Engineering concepts, including the creation and execution of perturbation models designed to build confidence in a system's capability to withstand unexpected conditions.
  • Permit-to-design, permit-to-build and permit-to-deploy project work - prioritization, planning of projects and features, stakeholder management and tracking of external commitments.
  • Efficiencies & Driving Results - Use smarter monitoring, resiliency engineering and automation to optimize the customer experience and assess change risk.
  • Instrument and stream SLIs, SLOs, error budgets and actionable alerts in partnership with product owners, development and production management.
  • Eliminate Toil - Identify, measure and track the reduction and automation of manual, repetitive work.
  • Measure Simplicity - Identify, prevent and fix complexities found in software design, system architecture, configuration, deployment processes, or elsewhere.
  • Partner with AIOps, production management, engineering, application development and other groups to address customer pain points through automation.
  • Ensure teams are building with modern practices and tools. Recognize and act on your responsibility for driving change across the entire development community.


Qualifications:
  • At least 15 years of demonstrated work experience, great analytical skills, strong business judgment, superb communication skills, excellent interpersonal skills, the ability to resolve conflicts and set priorities, and outstanding customer insight.
  • The ideal candidate is a leader of leaders with a background in leading large, cross-functional teams across Infrastructure, Application Development and Architecture.
  • Experience in a related service management field, with an understanding of both infrastructure and software architecture, as well as software development skills.
  • An understanding of both private and public cloud technology would also be helpful.
  • Knowledge of, or certification in, Site Reliability Engineering (SRE); a cloud certification (AWS or Google).


Education:
  • Bachelor's/University degree; Master's degree preferred (MBA, or a Master's in Computer Science, Architecture, Engineering or another technical specialty). Broad overall understanding of the technology functions within the enterprise.


Job Family Group:
Technology

Job Family:
Technology Management

Time Type:
Full time

Citi is an equal opportunity and affirmative action employer.

Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View the "EEO is the Law" poster. View the EEO is the Law Supplement.

View the EEO Policy Statement.

View the Pay Transparency Posting.