Site Reliability Engineer, Google Cloud Platform, Vice President
As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll be focused on running better production applications and systems.
In this role as an SRE, you will provide production support to the JPMC Public Cloud team on the public cloud, specifically Google Cloud Platform (GCP). You will work with cloud engineers to build the platform and pipelines and to monitor systems, ensuring the application landscape is designed to take full advantage of JPMC's global cloud solutions.
Responsibilities:
- Implement SRE frameworks to support global multi-cloud environments, and uphold the highest SLA standards through operational excellence
- Provide failure analysis / root cause analysis when required
- Help develop and improve the quality of technical engineering documentation
- Provide support to drive the maturity of the software development lifecycle
- Provide quality control of engineering deliverables
- Provide technical consultation to product management
- Perform deployment, administration, management, configuration, testing, and integration tasks related to the platforms in the cloud environment
- Help to develop new cloud engineering strategies and implementations for the firm
- Champion a DevOps model so that services are automated and elastic across all platforms
- Help coach and mentor less experienced team members
- Write operational documentation and maintain a knowledge base of known issues and their solutions
- Participate in 24x7 SRE on-call rotations and escalation workflows as needed, including occasional weekends
Qualifications:
- Bachelor's Degree or equivalent experience
- 8 or more years of IT experience with expertise in Enterprise Cloud infrastructure (GCP, AWS or Azure) in a mission-critical environment
- In-depth OS experience (RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills
- Hands-on experience with cloud-based technologies and tools, especially for deployment, monitoring and operations, such as Datadog, Prometheus, Splunk, Elasticsearch and Grafana
- Strong working knowledge of modern development technologies and tools such as Agile, CI/CD, Git, Terraform and Jenkins
- Good understanding of networking protocols and cybersecurity best practices in cloud environments
- A cloud certification in AWS, Azure or GCP is required; GCP certification is highly preferred
- Experience in PowerShell, shell scripting or Go is highly desirable