This job has expired, please see additional jobs below
Service Reliability Engineer
Yahoo!
Sunnyvale, CA, United States
Job Details
Description
Yahoo’s Service Reliability Engineering (SRE) team is seeking a talented “Rapid Response Engineer” to play a vital role in a team that runs critical site-up operations and systems engineering for Yahoo’s most popular internet sites, including Mail, Messenger, Sports, Finance, Games, News, Entertainment and many others. SRE’s mission is to increase the uptime of Yahoo’s properties by providing embedded subject matter experts to restore Yahoo services as soon as possible. The SREs are fast-paced and work together to solve complex problems before they impact the amazing experiences our customers have on a daily basis.
About the Role
Do you...
• Have a passion for solving technical problems, from the network layer to the application?
• Spend time trying to figure out how something works?
• Want to make real web applications and back-end systems faster, more reliable, and more efficient?
This position requires an aggressive troubleshooter who can multitask on problems of varying difficulty, priority, and time-sensitivity in order to keep Yahoo’s sites up. This versatile position requires familiarity with all the support concepts of busy web sites: systems and database administration, networking, process troubleshooting, and QA and rollout automation.
Responsibilities
• Serve as technical and project leader for projects with limited scope and complexity
• Assist in system builds, configuration, patches, and upgrades
• Troubleshoot complex components across distributed systems
• Interface with the product development team to understand how the application works and runs in production
• Lead weekly incident review meetings across all support teams
• Identify automation opportunities and implement them where feasible
• Mentor other Operations Engineers in their technical work and leadership
• Resolve escalated system issues within SLA
• Work across multiple verticals to resolve critical incidents
• Propose solutions for repetitive issues and collaborate across support teams on their implementation
• Consult on postmortems and own action items to avoid repeated outages
• Document incident resolution and triage thoroughly
• Give technical presentations
Minimum Qualifications
• Two to five years of systems engineering experience
• A degree in computer science or a related field
• Familiarity with networked/distributed computing environment concepts
• Ability to write scripts (e.g., Perl, Ruby, Python, shell)
• Solid knowledge of various operating systems (e.g., Linux, UNIX, Windows)