This job has expired, please see additional jobs below
Director of Production Engineering
Internet Industry Company
Toronto, Ontario, Canada
Job Details
Production Engineering at Company encompasses the disciplines of site reliability engineering, infrastructure engineering, and developer productivity. Our team ensures that Company infrastructure can scale massively while also delivering resilient systems, amazing performance, and impactful tools for our entire engineering team.
We are in need of a Director who will focus on our transactional data processing and storage systems for Company. The current focus is on sharded MySQL, memcache, and Redis. You’ll be responsible for evolving this into a team that delivers these processing and storage systems as a service to our entire engineering team and our diverse set of services. Initially, this means growing the group from a single team to 2-4 teams of 6-12 people owning this focus. You may lead a team directly from time to time, and you will be expected to work with the other leads to groom new leads and to build and develop new teams. The team is currently split between Ottawa, Montreal, and remote, and we also have Production Engineering members in Toronto and Waterloo.
We run some of the largest flash sales in the world, and more than 375,000 merchants have trusted us with their businesses. We also strive to make Company the most productive place to be a developer by empowering our entire R&D team with continuous delivery and great developer tooling, which lets us deploy over 50 times a day.
You will own growing the software engineering discipline within the team in order to tackle problems like:
◦ Automatically balancing merchants across hundreds of shards based on size and volume
◦ Applying intelligent back pressure (e.g. query “QoS scheduler”)
◦ Scaling our orchestration tooling to 1000 shards
You will evaluate the scaling and resiliency of our current architecture including:
◦ Datastore proxies
◦ Replication methods and clustering options
◦ Containerized stateful services
◦ ActiveRecord ORM improvements
◦ Exploring, prototyping, and delivering additional distributed data stores as first-class platform offerings
You will continue to delight our developers with great self-service tooling to accomplish:
◦ Online schema migrations
◦ Automatic backup verification
◦ Query performance analysis