- New York, NY, USA
- Contract, Full time
- Apex Systems, LLC
- 2017-12-06
The Public Cloud Site Reliability Engineer will develop software tools and collaborate heavily with support, software development, and operations teams to solve complex problems in our production systems, while building software systems that identify and mitigate issues before they impact end users. The role is a blend of software development and operations.
- As a software developer, engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes.
- Troubleshoot issues across the entire stack: hardware, software, application and network.
- BS degree in Computer Science or equivalent practical experience.
- Experience in one or more of the following: Java, Python, Go, Perl or Ruby.
- Experience with Unix/Linux operating systems internals and administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
- Strong public cloud experience (AWS, Azure, GCE) with production workloads, not just a certification.
If interested, please email a Word version of your resume to Brianna at BSchauer@apexsystems.com.