Bloomberg's internal Data Science Platform was established to support development efforts around data-driven compute, machine learning, and business analytics. The platform aims to provide scalable compute, specialized hardware, and first-class support for a variety of workloads such as Spark, PyTorch, and Jupyter. By using containerization, container orchestration, and a cloud-native design, Bloomberg created a data science product with portable and composable infrastructure building blocks. This has enabled Bloomberg to rapidly evolve, scale, and adapt to new challenges and opportunities in the data science space.
As one of the most critical stages of machine learning, distributed ETL processing has become the largest contributor to the compute footprint of Bloomberg's data science platforms, especially through its Apache Spark offering. To effectively manage Apache Spark on Kubernetes, the Spark Runtimes team uses its in-house Spark expertise and best practices around infrastructure management, scalability, and security to provide support for a powerful, flexible, and centralized Spark product.
As the needs of distributed compute, machine learning, data exploration, and analysis advance, so do the demands on the Spark runtime that underpins them. Our runtime is poised for continued growth to accommodate the many products across Bloomberg that rely on a robust ETL offering. Highlights from our upcoming roadmap include pluggable batch scheduling capabilities, increased resiliency when running in a pre-emptive environment, deeper integration with cloud-based Spark environments, and enhanced dynamic allocation support.
The Role:
As a member of the multi-disciplinary Spark Runtimes team, you'll have the opportunity to make key technical decisions to keep this platform moving forward. Our team makes extensive use of open source (e.g. Spark, Kubernetes, Buildpacks, Kubeflow, Jupyter) and is deeply involved in a number of upstream communities. We collaborate widely with the industry, contribute back to open source projects, and present at conferences. While working on the runtime, the backbone for many of Bloomberg's upcoming products, you will have the opportunity to collaborate with engineers across the company and learn from one another about the technology that delivers products from the news to financial instruments. If you are a software engineer who is passionate about building resilient, highly available, and elastic infrastructure and seamless, usable full-stack solutions, we'd like to talk to you about an opening on our team.
We'll trust you to:
- Design and develop Kubernetes-based products running as a SaaS offering and on customer-managed clusters
- Interact with data engineers and Spark experts across the company to understand their workflows and requirements to inform the next set of features for the platform
- Design distributed systems and develop solutions for problems such as elastic load distribution, effective resource management, and fair scheduling
- Automate operations and improve telemetry in our Kubernetes infrastructure stack
- Regularly present and explain your work to peers, senior stakeholders (including our CTO), and clients
You'll need to have:
- Proficiency in two or more languages (Go, Java, Scala, Python) and willingness to learn more as needed
- A strong sense of curiosity to solve new problems and keep learning new technologies
- A passion for providing reliable and scalable infrastructure
- Experience with distributed systems, e.g. Kubernetes, Spark, Kafka, ZooKeeper
Nice to haves (not required):
- Experience building and scaling Docker-based systems using Kubernetes, Swarm or Mesos
- Experience with Kubebuilder and Kubernetes operator-based frameworks
- Experience with open source projects in the Spark infrastructure ecosystem, such as Delta Lake or Iceberg
- Experience working with authentication & authorization systems such as SPIFFE and SPIRE
- Open source involvement such as a well-curated blog, accepted contribution, or community presence
- Experience with cloud providers such as AWS, GCP, or Azure
- Ability to identify and perform OS and hardware-level optimizations
- Experience with configuration management systems (Chef, Puppet, Ansible, or Salt)
- Experience with continuous integration tools and technologies (Jenkins, Git, ChatOps)
- Experience working with GPU compute software and hardware
- Passion for education, e.g. providing workshops for tenants
Learn more about our work: https://linktr.ee/bloombergdnainfra
Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or maternity/parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.
Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email firstname.lastname@example.org