This job has expired, please see additional jobs below
Senior Site Reliability Manager
iHeartMedia
San Antonio, TX, United States
Job Details
Job Summary:
iHeartMedia is the No. 1 all-in-one digital audio service with over a billion downloads; we reached our first 20 million registered users faster than any digital service in Internet history and reached 110 million registered users faster than any other radio or digital music service and even faster than Facebook. Our areas of operations include radio broadcasting, online, mobile, digital and social media, live concerts and events, syndication, music research services and independent media representation.
iHeartMedia IT has begun a massive shift in the way we operate away from the traditional on-prem, dedicated infrastructure approach and towards the more flexible world of public cloud, agile development and CI/CD. This is a profound shift for us that has included big changes in our culture, skills and toolset. It affects the way we think about development, procurement, project/integration and operations. Along with the huge opportunities to increase our speed, flexibility and efficiency, this shift also presents challenges in rethinking the way we work.
The SRE Manager will build and lead a team of talented site reliability engineers responsible for building solutions to enhance the availability, performance and stability of iHeartMedia’s services. We are looking for a strong leader who is passionate about automating away repetitive work, encourages collaboration, goes big, takes risks, fosters a blame-free culture, and promotes self-direction among the SREs. The SRE Manager will work cross-functionally with every software engineering team at iHeartMedia, ensuring we design and instrument tools and integrations that are of value from day one, driving towards simplicity and ultimately reliability.
What You Will Be Doing:
• Coach, mentor and provide direction and leadership for the SRE team.
• Identify and implement non-functional requirements to improve service outcomes and business objectives.
• Develop, implement and maintain SRE Centers of Excellence.
• Partner with customers to collaboratively investigate and resolve incidents, and establish blameless post-mortems to drive service improvements.
• Provide SREs with the tools they need to meaningfully monitor and alert on the services and features they develop, using tools like Prometheus, Alertmanager, Grafana, EFK (ELK) Stack, and PagerDuty.
• Analyze systems, find bottlenecks and apply solutions that help you and the SRE team deliver services that improve the performance of critical production systems.
• Partner with other engineering leaders to guide conversations and best practices in application architecture and cloud infrastructure.
• Advocate for performance, security, global scale, and automation.
• Drive decisions based on data; define and be held accountable for process, metrics, and SLAs for our operations and infrastructure.
Skills needed:
• 6+ years of software development experience, partly as a manager of an engineering team.
• Experience in the fields of DevOps, infrastructure engineering, systems engineering, or site reliability engineering.
• Functional leader who is willing to work hand in hand with the SRE team.
• You are a creative problem solver who understands and can articulate the tradeoffs involved in technical approaches and can demonstrate simplicity and resourcefulness when confronted with constraints.
• You continuously demonstrate empathy with non-technical audiences and engineers whose area of interest or experience lies outside of SRE and Infrastructure.
• You have strong opinions informed by experience on subjects like infrastructure-as-code, containerization, microservice orchestration, distributed systems, incident response, and AWS bric-a-brac, and are willing to experiment to refine those opinions.
Education:
• BS degree in Computer Science, Engineering or a related subject.
• MS degree preferred.
• AWS Certified Solutions Architect and similar certifications preferred.