Engineering Management for Data Science Systems & Teams: Battle-Tested Best Practices

A lot of progress has been made over the past decade on process & tooling for managing large-scale, multi-tier, multi-cloud apps & APIs. In contrast, there is far less common knowledge of best practices for managing machine-learned models (classifiers, forecasters, etc.), especially beyond the modeling, optimization & deployment process, once they are in production.

This talk summarizes best practices & lessons learned across the entire life cycle of such systems, based on nearly a decade of experience building & operating them at Fortune 500 companies across several industries. The talk is intended for engineering leaders, architects & ops managers, and covers seven areas:

Development process for fast, scalable & reproducible experimentation
Iterative development process for unsupervised & semi-supervised learning projects
Online versus offline accuracy metrics, and why both are needed
Common mistakes in measuring the accuracy of machine-learned models
Handling feedback for non-stationary and adversarial learning projects
Staffing models, roles & responsibilities during the development of data science projects
DevOps, maintenance & support for models and systems in production

David Talby is Atigeo’s senior vice president of engineering. David has extensive experience in building & operating web-scale big data and data science platforms, as well as building world-class, agile, distributed teams. Previously he was with Microsoft’s Bing group, where he led business operations for Bing Shopping in the US and Europe, and earlier he worked at Amazon in both Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in Computer Science along with two master’s degrees, in Computer Science and Business Administration.