This job has expired, please see additional jobs below
Principal Site Reliability Engineer
Pearson
Iowa City, IA, United States
Job Details
Description
At Pearson, we’re committed to a world that’s always learning and to our talented team who makes it all possible. From bringing lectures vividly to life to turning textbooks into laptop lessons, we are always re-examining the way people learn best, whether it’s one child in our own backyard or an education community across the globe. We are bold thinkers and standout innovators who motivate each other to explore new frontiers in an environment that supports and inspires us to always be better. By pushing the boundaries of technology — and each other to surpass these boundaries — we create seeds of learning that become the catalyst for the world’s innovations, personal and global, large and small.
Pearson is an Equal Opportunity and Affirmative Action Employer and a member of E-Verify. All qualified applicants, including minorities, women, protected veterans, and individuals with disabilities, are encouraged to apply.
This role provides technical leadership and architectural oversight to an operationally focused team, leveraging DevOps concepts to develop solutions for automated provisioning, configuration management, automated deployments, and other operational needs. The work of this position and this team consists primarily of applying software development practices to operational use cases. This position will play a key role in our organization’s evolution towards contemporary application and infrastructure management practices, and will be expected to both guide and support the team’s growth and learning.
Specific responsibilities include:
• Provide technical leadership and mentorship to a growing team focused on applying software engineering practices to operations at scale.
• Monitor and report on service level objectives for a given application's services. Work with business and product owners to establish key performance indicators.
• Lead and develop a team by organizing and conducting training events, game day scenarios, and focused engineering spikes.
• Design and architect operational solutions for managing applications and infrastructure, with the specific goal of increasing the automation, repeatability, and consistency of operational tasks.
• Create and maintain monitoring technologies and processes that improve visibility into our applications' performance and business metrics and keep operational workload reasonable.
• Sponsor healthy software development practices – including complying with the chosen software development methodology (Agile, or alternatives), building standards for code reviews, work packaging, etc.
• Persistently test application and infrastructure resiliency across a variety of error conditions.
• Partner with security engineers to develop plans and automation to aggressively and safely respond to new risks and vulnerabilities.
• Collaborate with internal teams to ensure that operational development solutions meet business requirements.
• Provide architectural and practical guidance to software development to improve resiliency, efficiency, performance, and costs.
• Develop, communicate, and monitor standard processes to promote the long-term health and sustainability of operational development tasks.
Qualifications:
• Bachelor's degree or appropriate combination of education and experience is required. Master's degree preferred.
• 8+ years of experience in IT operations and/or software development/architecture.
• Prior experience in architecting cloud-based solutions on AWS.
• Prior experience in managing EC2, EBS, and S3 on AWS.
• Proficiency in any of the following: Java, Golang, Python, Ruby.
• A strong understanding of diverse infrastructure platforms and concepts is required.
• Versatility, as demonstrated by troubleshooting diverse sets of hosting technologies (web server platforms, Java application platforms, operating systems, network components, virtualization technologies, database platforms), is strongly desired.
• Understanding of general networking concepts and protocols is desired.
• Experience in a production environment supporting mission-critical applications is desired.
• Knowledge of standard IT management practices including change management and incident management is desired.