Cloud Engineer/SRE - Bloomberg Law
Bloomberg Law SRE combines software and systems engineering to champion the use of sound engineering principles, operational discipline, and automation. We focus on improving Bloomberg Law (BLAW) product reliability, stability, and scaling, with an interest in fault-tolerant distributed system design. Our culture of diversity, intellectual curiosity, methodical problem solving, and openness in a blameless environment is key to our success.
What's in it for you:
As a Site Reliability Engineer (SRE) at Bloomberg Law, your mission is to improve the reliability, scalability, and performance of the BLAW Platform running in a hybrid environment (on-premises and AWS). You will be empowered to promote and implement industry-wide SRE best practices. You will have the opportunity to work alongside application engineers across the full stack that uses modern open source web and data processing technologies.
We'll trust you to:
- Implement systems that are highly available, scalable, and self-healing in Bloomberg data centers and on AWS
- Design infrastructure and implement automation using infrastructure-as-code solutions (Terraform)
- Improve overall observability by implementing monitoring, metrics, logs, and Service Level Objectives (SLOs)
- Work alongside application engineers as they build/migrate applications on your infrastructure
- Troubleshoot production problems as they occur, and drive the post-mortem process
- Measure current capacity, predict future capacity needs, and make recommendations accordingly
You need to have:
- 3+ years of experience working on highly available, fault-tolerant distributed systems
- Experience in developing automated infrastructure in AWS or other cloud providers
- A mindset focused on the stability of the production environment, applying software engineering solutions to run/manage applications
- Understanding of Linux operating systems and networking
- BS/MS/PhD in Computer Science, Engineering, or a related technology field
We'd love to see:
- Prior experience in AWS infrastructure and related DevOps practices
- A deep understanding of stability and reliability engineering (SRE) principles and practices
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
- Ability to create project ideas and implement them through effective collaboration and communication
- Familiarity with Kubernetes, Docker, and containers
- Ability to work with diverse teams and personalities
Bloomberg is an equal opportunities employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.