Site Reliability Engineer Data & Platform

Job Overview

Location

Mississauga, Ontario

Job Type

Full Time Job

Job ID

121413

Date Posted

1 year ago

Recruiter

Raymond Catherine

Job Views

171

Job Description

Now is an extremely exciting time to join a newly formed group within Citi. The Institutional Clients Group (ICG) Engineering and Architecture Practice (EAP) is responsible for defining and building the core architecture and technology strategy for the ICG.

This position is in the Kafka-as-a-Service team, which sits under Common Platform Engineering (CPE). CPE is a department within the EAP group. Its mission is to provide engineering for common platform capabilities in the ICG, to engineer solutions that codify the firm's data strategy into frameworks and tools, and to define 'Common Product' standards so that common components can be adopted efficiently.

We are looking for an SRE with a software engineering background who is passionate about running large-scale, multi-tenant distributed data systems for customers who expect a very high level of availability. In this role, you will be responsible for the availability, performance, monitoring, emergency response, and capacity planning of these data systems.

If you love the hum of big data systems, enjoy thinking about how to make them run as smoothly as possible, and want to have a big influence on their architecture and operational design, you will fit right in. Your solutions will be used by tens of thousands of developers across Citi, supporting applications used by hundreds of thousands of internal and client users.

**What you'll be doing:**

- Design and build observability solutions for distributed systems
- Contribute to the continuous automation of toil, and drive and evangelize the four key DORA metrics
- Establish Service Level Objectives (SLOs) for core services, monitor their Service Level Indicators (SLIs), and implement error-budget-based alerting
- Help the operations team by building solutions that let them identify and resolve health issues in the data systems as quickly as possible
- Automate the deployment of infrastructure and applications for data systems such as Kafka
- Support the rapid growth of the platform by expanding its deployment strategy to OpenShift and to cloud Kubernetes environments (AWS EKS, Google GKE)
- Design and implement service improvements for performance and security, relentlessly improve reliability, and facilitate effective incident response, mitigation, and resolution
- Write and review technical documents, including design, requirements, and process documentation
- Advocate for a culture of platform automation built on an "everything as code" approach

**What we are looking for:**

- 4+ years of experience in Site Reliability Engineering, creating scalable and highly reliable systems
- Strong fundamentals in distributed systems design and operation, with experience building automation to operate large-scale data systems
- Experience designing and implementing observability solutions for data systems that provide a holistic view of system health
- Strong understanding of modern site reliability engineering practices and the ability to apply them to improve system reliability
- Experience creating, deploying, and managing the lifecycle of containerised applications on Kubernetes
- Experience in an agile development environment with a modern programming language such as Python, Go (Golang), Java, Kotlin, or Scala

**What gives you an edge:**

- Experience working with distributed systems and stream-processing solutions; hands-on experience with Apache Kafka is highly desirable
- Strong grasp of DevSecOps practices and the ability to contribute to improving system reliability, quality, and time-to-market
- Experience designing and implementing multiple automated deployment pipelines at both the application and infrastructure levels; ideally, experience with Ansible and Terraform on multiple projects
- Experience with the HashiCorp tool set, specifically Vault for secrets management and Consul for service discovery
- Experience deploying applications and infrastructure to the cloud

Similar Jobs

Meta

Site reliability engineer data & platform

Meta is embarking on the most transformative change to its business and technolo...

Full Time Job

Deloitte

Site reliability engineer data & platform

Deloitte's Enterprise Performance professionals are leaders in optimizing...

Full Time Job

Labcorp

Site reliability engineer data & platform

Job Duties/Responsibilities: Determine the acceptability of specimens for testing...

Full Time Job

Braintrust

Site reliability engineer data & platform

• JOB TYPE: Direct Hire Position (no agencies/C2C - see notes below)...

Full Time Job
