Site Reliability Engineer:
The Site Reliability Engineer will be part of our client’s SaaS Operations team, which is responsible for the day-to-day operations of SaaS offerings, including servers, operating systems, storage, and supporting systems. You will manage the availability, latency, scalability, and efficiency of our client’s online platform; automate their infrastructure and operations to maximize reliability, availability, scalability, and performance; solve live performance and stability issues and prevent their recurrence; evaluate hardware and software technologies to improve efficiency and performance; and participate in performance and capacity monitoring and planning.
Key Responsibilities:
• Manage an international, 24x7, multi-site infrastructure powering our client’s customer-facing service offerings
• Participate in the design, implementation, and operation of a scalable and reliable systems infrastructure supporting a fast-growth SaaS environment
• Ensure proper security, monitoring, alerting, and reporting for the infrastructure
• Participate in evaluation of new software, hardware and infrastructure solutions
• Participate in a follow-the-sun on-call rotation
Required Skills, Knowledge & Experience:
• BS/BA degree in Computer Science or Information Systems, or equivalent experience
• Strong working knowledge of Linux systems and applications, including Java and Apache
• Minimum of 5 years’ hands-on Linux administration experience
• Ability to automate common tasks. You use automation to make your job more efficient (Ruby, Python, Java, shell, and PowerShell scripting, etc.)
• In-depth understanding of web operations best practices.
• Strong networking fundamentals. You understand TCP/IP, VLANs, DNS, load balancing, and firewalling.
• Thorough understanding of vSphere
• Experience with monitoring tools such as Zabbix, Graylog, Grafana, Bosun, etc.
• Solid communicator with great customer service skills.
Preferred Qualifications:
• Working knowledge of Windows Server setup and administration, including IIS and the deployment and support of .NET applications
• Experience with configuration management. You have managed an infrastructure with hundreds or thousands of servers using configuration management and deployment tools such as Puppet, Chef, Foreman, Perforce, Jenkins, Git, etc.
• Experience with storage systems like NetApp and Hitachi
• Experience with AWS and EC2 instances
