Cloud Operations Lead

Job Summary

This is an opportunity for a self-starter to create the Cloud Operations function for a US/European business management software vendor, prioritizing operational resilience, service reliability, and data security, and to build the team that will manage the platform supporting ERP customers across verticals.


Main Responsibilities

Leadership Role:

  • Control the cloud service design, roadmap and delivery
  • Enhance customer experience for cloud customers
  • Resolve critical issues in production
  • Lead and motivate team members within the department and collaborators outside it
  • Advise on and drive process improvements with the help of coordinators

Technical Role:

  • Work with product engineers and the design team to maintain operationally resilient and secure cloud environments with the lowest cost of ownership
  • Develop cloud automation for reliable, continuous delivery
  • Manage operations in all public and private cloud environments
  • Incident management and support
  • Logging, monitoring and event management
  • Overall health management of all environments
  • Be available on call during US/UK hours as part of the team maintaining 24×7 SaaS availability

Management Role:

  • Capacity Management
  • Change Management
  • Organization and People Management
  • Hire and train a high-performing team
  • Develop a succession plan; delegate and elevate
  • Ensure that the departmental goals and scorecard metrics are met
  • Establish and effectively communicate department goals

Required Knowledge & Experience

  • 8+ years of progressive experience in the design and implementation of IT technologies and operations
  • 5+ years designing/managing/leading cloud-based environments supporting IaaS and SaaS users, preferably within an IT services organisation
  • Strong expertise in managing cloud-based infrastructure
  • Experience in troubleshooting hardware, OS and application issues
  • Public cloud knowledge: AWS, Azure, SoftLayer or GCP
  • Experience in security management is desirable
  • Expert knowledge of Linux or Unix based system administration
  • Good knowledge of compute, storage and network architecture
  • Deployment & Configuration management: Puppet, Chef, Salt or Ansible
  • Scripting languages: Shell, Python, PHP, Go or Perl
  • Log Management: Elasticsearch, Syslog or Logstash
  • Monitoring: Graphite, Nagios or Ganglia