RESPONSIBILITIES AND QUALIFICATIONS
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, the SRE Platforms team is responsible for designing and developing large-scale distributed systems, both on premises and in the public cloud. These systems provide an observability platform for the firm's most critical platform services and help ensure that those services meet the requirements of our internal and external users. We look for engineers who are motivated to collaborate with our businesses to develop and run sustainable production systems that can evolve and adapt to changes in our fast-paced, global business environment.
HOW YOU WILL FULFILL YOUR POTENTIAL
Design and develop large-scale distributed systems.
Balance feature development velocity and reliability with well-defined service level objectives (SLOs).
Run the production environment by monitoring availability and taking a holistic view of system health.
Create sustainable systems and services through automation and uplifts.
Participate in system design consulting, platform management, and capacity planning.
SKILLS AND EXPERIENCE WE ARE LOOKING FOR
BS degree in Computer Science or a related technical field involving coding and/or systems engineering.
Proficiency in one or more of the following: Java, Go, C++, Python, C.
Hands-on experience with developing, debugging, and optimizing code, as well as with automation.
Experience with algorithms, data structures and software design.
Experience with distributed systems design, maintenance, and troubleshooting.
Experience with distributed data stores such as MongoDB, Hadoop, Cassandra, or Elasticsearch.
Experience with open-source messaging systems such as Kafka or RabbitMQ.
Knowledge of cloud-native solutions in AWS or GCP.
Strong interpersonal skills, drive, and ownership.
Coding beyond simple scripts.
Solving novel problems from first principles.