Site Reliability Engineer
Entertainment & Media Industry Company
Philadelphia, PA, United States
Job Summary:
Company's Technology & Product organization works at the intersection of media and technology. Our innovative teams are continually developing and delivering products that transform the customer experience. From creating apps like TVGo to new features such as the Talking Guide on the X1 platform, we work every day to make a positive impact through innovation in the pursuit of building amazing products that are enjoyable, easy to use and accessible across all platforms. The team also develops and supports our evolving network architecture, including next-generation consumer systems and technologies, infrastructure and engineering, network integration and management tools, and technical standards.
As a Site Reliability Engineer with the Xfinity Home Operations team, you will work with other DevOps practitioners to produce mission-critical infrastructure, tools, and processes that enable our systems to scale at a rapid pace. One day might involve performance tuning of a Java web application; the next may be building tools to enable continuous delivery. You'll investigate and create new systems for scaling development and production.
You're right for the job if you're comfortable with deep technical Linux and networking topics and with distributed architectures. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization.
You will develop, deploy, and operate web and mobile properties and their supporting APIs at scale. You will work directly with Software Engineering teams to build our next-generation, cloud-based architecture.
Where are we headed?
Our goal is to build, scale, and guard the systems that delight customers. To do so, you will need strong skills in the following areas:
Software Development
• Build tools and alarms that warn of potential problems or customer issues
• Adapt what exists and build what doesn't to scale the system
• Build tools and develop processes for continuous integration and delivery of services
• Build and drive the automation systems that maintain system health
Site Reliability Engineering / Operations
• Root-cause complex problems involving multiple parties, networks, hardware and software that relate to scaling and performance
• Participate in on-call rotation
• Engender reliability and availability starting with metrics and measurements
• Enable scaling by providing tools, developing training or augmenting processes
• Secure the system from issues, be they real, perceived or notional
• Obsess over collecting and digesting metrics
• Experience with configuration management tools such as Ansible, CFEngine, Chef and Puppet
• Experience building tools for automation (building, testing, releasing, monitoring and alarming)
Qualifications:
• 6+ years in a software development role, operations role, or closely related position
• Experience administering Linux systems in a production environment
• Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell
• Bachelor's Degree in Computer Science or a related field, or relevant work experience
• Excellent problem-solving skills with strong attention to detail
• Experience with distributed version control systems like Git or Mercurial
• Ability to dive deep into complex technical problems
• Experience with IaaS and PaaS providers such as AWS, OpenStack, Heroku, and CloudFoundry
• A sense of ownership, initiative, and drive
• Experience with enterprise monitoring solutions like InfluxDB, AppDynamics, Graphite, Racon, Grafana, Nagios, and Splunk
• Familiarity with continuous integration/deployment processes and tools such as Artifactory, Gerrit, Git, Jenkins, Maven and Nexus
Company is an EOE/Veterans/Disabled/LGBT employer