Job Description
At Bell, we do more than build world-class networks, develop innovative services and create original multiplatform media content – we’re revolutionizing how Canadians communicate.
If you’re ready to bring game-changing ideas to life and join a community that values bold ideas, professional growth and employee wellness, we want you on the Bell team.
About the Role:
We are looking for a Machine Learning Engineer, MLOps/DevOps who has experience productionizing, maintaining, and optimizing cloud-native ML and AI applications. The successful candidate will help us define, build, and maintain a machine-learning-as-a-service environment at Bell. Our mission is to enable the customer operations team at Bell to iterate quickly on hypotheses, scale their experiments to enterprise datasets, and efficiently productionize highly available, fault-tolerant machine learning applications. In this role, you will work cross-functionally with data scientists and data engineers across the organization to enable Bell to use machine learning to drive business decisions.
Job Duties/Accountabilities
- Research and assess new technologies and methodologies
- Collaborate with the product team to understand business and technical needs
- Participate in the design, development, testing and deployment of new ML services in on-prem and Cloud environments
- Participate in code review sessions
- Participate in the acceleration of GCP infrastructure adoption and best practices (MLOps, Managed Notebooks, maintenance and deployment)
- Automate and manage machine learning models, applications, dependencies and artifacts from dev to testing to production in the Cloud
- Develop and maintain Cloud services using CI/CD pipelines and Infrastructure-as-Code
- Participate in capacity and infrastructure planning
- Support application upgrades, patches, testing and release management activities across on-prem and Cloud environments
- Create and maintain documentation on procedures, standards and best practices
Critical Skills/Competencies:
- BA/BS degree in Computer Science or a related engineering field, or equivalent practical experience
- 3-5 years of experience with software development in Python and/or Java
- 3-5 years of technical or operations experience in cloud/enterprise settings
- Comfortable with ambiguity and able to adapt in a fast-evolving environment
- Experience with containerization technologies such as Docker and Kubernetes/OpenShift
- Experience with version control systems such as Git
- Experience with at least one of the following data processing technologies: Dataflow, Dataproc, Apache Spark, Flink or Kafka
- Experience with at least one of the following storage technologies: Hadoop, BigQuery, Redis, PostgreSQL, Kafka or Pub/Sub
- Experience building and deploying ML models with deep learning frameworks such as TensorFlow and ecosystem tools such as TFX and/or Kubeflow
- Experience with workflow orchestration tools such as Airflow
- Experience with Infrastructure-as-Code tools such as Ansible and Terraform
- Experience with Cloud environments (GCP, Azure, AWS)
- Experience building and maintaining CI/CD pipelines (GitLab CI/CD, Jenkins, TeamCity) and applying best practices
- Strong understanding of DevOps concepts and practices
- Experience delivering and maintaining software-as-a-service applications via cloud-native architecture
- Experience operationalizing machine learning projects using at least one of the popular frameworks or platforms (e.g. Kubeflow, AWS SageMaker, Google Vertex AI, Azure Machine Learning, DataRobot, MLflow)
Preferred Skills:
- Strong understanding of ML and AI concepts, workflows and processes, preferably with an operations background
- Strong understanding of micro-services architecture
- Understanding of the latency and throughput trade-offs of ML systems
- Solid knowledge of computer/cloud networking
- Experience with complex distributed architecture
Job ID: 103788