APAC Application Resiliency Operation Lead
As a Site Reliability Engineer / Application Resiliency Operation Engineer, you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of the support and development work will focus on existing systems, building infrastructure, and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment you'll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow. As an SRE you'll be focused on running better production applications and systems.
Responsibilities
- Design, code, debug, test, and deliver software to automate manual operational work
- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
- Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
- Build and drive adoption for greater self-healing and resiliency patterns.
- Lead and participate in performance tests; identify bottlenecks, opportunities for optimization and capacity demands.
- Engage with development teams throughout the life cycle to help build software for reliability and scale, minimizing later refactoring or changes
- Design automated software and product upgrades, change management, and release management solutions
- Coach or manage teams as applicable
- Participate in the 24x7 support coverage as needed
Qualifications
- Bachelor's degree or equivalent experience in a software engineering discipline
- Mastery of at least two programming languages (e.g. Python, Java, Go) in designing, coding, testing, and delivering software.
- 7+ years of software engineering experience, using one or more object-oriented programming languages and/or scripting languages.
- Adept in the development of automated tools, systems, and services in multiple technology domains, with excellent debugging and troubleshooting skills
- Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Advanced knowledge of one or more infrastructure components (e.g. networks, cloud services, orchestration tools, containerization, compute and storage systems)
- Advanced knowledge of telemetry and triage systems such as Splunk, Dynatrace Managed, and Geneos.
- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm.
- Adept in Agile development practices.
- 5+ years of experience in two or more of the following areas (you are encouraged to apply if you have good core experience in at least two areas):
- Hands-on experience with cloud-based applications, technologies and tools, deployment, monitoring and operations, such as Kubernetes, Prometheus, FluentD, Slack, Elasticsearch, Grafana, Kibana, etc.
- Relational and NoSQL databases; developing and managing operations leveraging key event streaming, messaging and DB services such as Cassandra, MQ/JMS/Kafka, Aurora, RDS, Cloud SQL, BigTable, DynamoDB, MongoDB, Cloud Spanner, Kinesis, Cloud Pub/Sub, etc.
- Networking (security, load balancing, network routing protocols, etc.)
- Language: Fluent oral and written Japanese and English.
- Ability to work on own initiative and overcome hurdles
- Independent, reliable, and creative
- Team player and people manager, with experience interacting with local and overseas vendors
- Passionate, self-motivated and driven with a hunger for learning and growth.
- Strong communicator with a high degree of written and verbal communication skills.
- Strong interpersonal and collaboration skills, with the ability to interact with people in different countries both within APAC and globally
- Results-oriented, with the energy and passion to deliver and to achieve and exceed stretch objectives