Data Engineer / Data Science
Hearst Magazines
New York, NY, United States
Job Details
Position at Corporate Technology
The Data Services team at Hearst, part of Corporate Technology, serves as a central repository for data that touches businesses throughout the company. We ingest many terabytes of data each day from a wide variety of internal and external sources. The Data Engineers (2 roles) will be key members of the team, responsible for data ingestion including ETL and quality assurance. Data will often need to be replicated or moved internally, and aggregated or transformed for consumption by BI tools and front-end applications. The engineers will write scripts to gather data from sources such as REST and SOAP APIs, flat files delivered in a variety of formats, and transports such as AWS Kinesis. The language of choice in Data Services is Python, and we work with many elements of the AWS and Google Cloud stacks, including Redshift and BigQuery.
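As an illustration of the kind of ingestion script described above — pulling JSON records from a REST API and normalizing them into tabular rows — here is a minimal sketch using only the standard library. The endpoint, pagination scheme, and field names are hypothetical:

```python
# Minimal ingestion sketch: fetch paginated JSON from a REST API and
# flatten each record into a row suitable for loading into Redshift/BigQuery.
# The endpoint and record schema below are hypothetical examples.
import json
import urllib.request

def fetch_page(base_url: str, page: int) -> list:
    """Fetch one page of JSON records (network call; URL is hypothetical)."""
    with urllib.request.urlopen(f"{base_url}?page={page}") as resp:
        return json.load(resp)

def normalize(record: dict) -> dict:
    """Flatten one raw API record into a tabular row with clean types."""
    return {
        "id": record["id"],
        "title": (record.get("title") or "").strip(),
        "views": int(record.get("metrics", {}).get("views", 0)),
    }
```

In practice a script like this would loop over pages until the API signals the end, normalize each batch, and stage the rows (e.g., as flat files on S3 or GCS) for a bulk load.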
Skills & Requirements
• Minimum 3-5 years' experience with one or more scripting languages. Must be comfortable with Python; if it is not your primary language, you must demonstrate the ability to quickly become an advanced user
• Advanced SQL skills, including significant Postgres or Redshift experience. Experience with MySQL and/or Google BigQuery SQL is a plus. Must understand how to apply advanced techniques such as regex matches, date transformations/truncations, and window functions in one or more SQL variants
• 3+ years experience with the Linux shell; must understand the crontab and how to automate jobs
• 1-2+ years' experience writing scripts to retrieve data from REST and/or SOAP APIs. Past experience with OAuth authentication is a plus
• Must be able to independently track down data sources and the contacts required to get access to them. Must have good interpersonal skills and be able to navigate the corporate structure.
• Must have experience writing ETL/ELT procedures and transforming large datasets, preferably “from scratch” using a scripting language.
• Must have significant experience working in a cloud environment; comfort with AWS and/or Google Cloud is preferred. Experience with AWS Kinesis & Lambda and/or Google Pub/Sub, Dataflow, & Cloud Functions is a plus.
• Experience with NoSQL databases such as DynamoDB or Cassandra is a plus
• Experience with Spark, Hadoop (Hive/Pig), Java, or Scala is a plus
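To make the SQL requirement concrete, here is a small example of the window-function and date-truncation techniques it refers to. It uses an in-memory SQLite database (not Postgres/Redshift, and it assumes SQLite 3.25+ for window-function support) with made-up sample data:

```python
# Illustration of two of the SQL techniques named above: truncating dates
# to month grain and computing a per-group total with a window function.
# SQLite in-memory is used for portability; table and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pageviews (site TEXT, ts TEXT, views INTEGER);
INSERT INTO pageviews VALUES
  ('site_a', '2019-01-03', 10),
  ('site_a', '2019-01-15', 20),
  ('site_b', '2019-01-07', 5);
""")
rows = conn.execute("""
SELECT site,
       strftime('%Y-%m-01', ts)            AS month,       -- date truncation
       views,
       SUM(views) OVER (PARTITION BY site) AS site_total   -- window function
FROM pageviews
ORDER BY site, ts
""").fetchall()
```

In Postgres or Redshift the truncation would instead be written as `date_trunc('month', ts)`; the window-function syntax is the same across these SQL variants.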