Machine Learning Operations
Stealth Startup
May 2022 – Nov 2023
Toronto
- Owned production code and infrastructure deployment for all ML operations
- Reduced core compute pipeline execution time by a factor of 15 while also reducing cost by 70%
- Set up all AWS infrastructure for the company and evaluated new ML developments such as AWS Inferentia 2 in collaboration with teams at AWS
- Procured, set up, and administered a new compute cluster with fully composable GPU infrastructure using Dell and Liqid
- Established a company-wide configuration system based on Hydra and Git
- Trained coworkers to transition from pure research roles to modern software engineering, following best practices such as CI, testing, code reviews, and version control