Cloud Operations Lead
Job Summary
This is an opportunity for a self-starter to create the Cloud Operations function at a US/European business management software vendor, prioritizing operational resilience, service reliability, and data security, and to build the team that will manage the platform supporting ERP customers across verticals.
Main Responsibilities
Leadership Role:
- Own the cloud service design, roadmap and delivery
- Enhance customer experience for cloud customers
- Resolve critical issues in production
- Lead and motivate team members within the department and beyond
- Advise on and drive process improvements with the help of coordinators
Technical Role:
- Work with product engineers and the design team to maintain operationally resilient and secure cloud environments at the lowest cost of ownership
- Develop cloud automation for reliable, continuous delivery
- Manage operations in all public and private cloud environments
- Incident Management and support
- Logging, monitoring and event management
- Overall health management of all environments
- Be available on call during US/UK hours as part of the team maintaining 24×7 SaaS availability
Management Role:
- Capacity Management
- Change Management
- Organization and People Management
- Hire and train a high-performing team
- Develop a succession plan; delegate and elevate
- Ensure that the departmental goals and scorecard metrics are met
- Establish and effectively communicate department goals
Required Knowledge & Experience
- 8+ years of progressive experience in the design and implementation of IT technologies and operations
- 5+ years designing/managing/leading cloud-based environments supporting IaaS and SaaS users, preferably within an IT services organisation
- Strong expertise in managing cloud-based infrastructure
- Experience in troubleshooting hardware, OS and application issues
- Public cloud knowledge: AWS, Azure, SoftLayer or GCP
- Experience in security management is a plus
- Expert knowledge of Linux- or Unix-based system administration
- Good knowledge of compute, storage and network architecture
- Deployment & Configuration management: Puppet, Chef, Salt or Ansible
- Scripting languages: Shell, Python, PHP, Go or Perl
- Log Management: Elasticsearch, Syslog or Logstash
- Monitoring: Graphite, Nagios or Ganglia