This job has expired, please see additional jobs below
Software Engineer - Data Infrastructure
Twitter
San Francisco, CA, United States
Who We Are:
Twitter’s globally distributed, real-time communications network generates trillions of events and petabytes of data per day. As engineers on the Data Platform team, our mission is to build the fastest, most reliable, and largest-scale data processing technologies in the world - able to cope with ever-increasing volumes of data in real time - and then apply them to the company’s most critical and fundamental data problems.
What You’ll Do:
As a member of the team, you will build systems to manage hundreds of petabytes of data and process tens of millions of events per second in real time. The services you build will integrate directly with Twitter’s products, opening the door to new and cutting-edge features. You will work with open-source technologies such as Hadoop, Scalding, Heron, and Presto and be an active member of the open-source community. You will empower dozens of engineering teams, hundreds of co-workers, and millions of users to dream of new insights and new possibilities.
Data Platform is hiring for the following infrastructure teams:
• Hadoop Infrastructure - one of the industry’s largest-scale deployments, with over 36,000 total nodes, 700 internal users, 500 petabytes of storage, and a track record of active contributions to the Apache Hadoop code base
• Data I/O - technologies to ingest, archive, and replicate source data at scales of trillions of messages per day
• Real-Time Compute Infrastructure - cutting-edge streaming and interactive compute technologies, including Presto and Twitter Heron
• Core Data Libraries - open source abstraction libraries, such as Scalding, Summingbird, and Parquet, that power almost all of Twitter’s batch and streaming data applications
• Data Pipeline - tools and services that simplify data discovery, data management, and job scheduling for engineers and data scientists
Help us solve some of our biggest challenges!
• Scale Hadoop clusters beyond 10,000 nodes
• Integrate Scalding with next-generation frameworks such as Spark and Tez
• Optimize our batch and real-time compute stacks to improve efficiency and reliability
• Understand how Twitter’s data is used and “what it all means”
• Work with hardware, network, and SiteOps teams to design next-generation storage and compute platforms
Who You Are:
You want to be part of a community of the most talented, forward-thinking engineers in the industry. You want to optimize existing real-time and batch applications and save the company millions of dollars. You want to learn, work with, and contribute to cutting-edge open-source technologies. The ideal candidate has experience with and/or a history of contributions to Hadoop, Spark, Hive, Scalding, Parquet, or similar technologies. You are a strong Java, Scala, or C++ developer. You have experience in distributed systems, database internals, Linux and networking fundamentals, or performance analysis.
Requirements:
• BS, MS, or PhD in computer science or a related field, or equivalent work experience
We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status, or any other legally protected status.
San Francisco applicants: Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.