Job Description
What you will be responsible for
As a Cloud Data Engineer on the Cyber Data Science team, you will:
- Use your understanding of large-scale data processing and analytics to wrangle our unique cybersecurity data and create analyses and tools that surface the most significant business, governance, and risk management impacts.
- Build data warehousing and business intelligence systems to empower engineers, data scientists, and analysts to extract insights from our data.
- Work on our data lake, data warehouse, and stream processing systems to create a unified query engine, multi-model databases, analytics extracts and reports, and dashboards and visualizations.
- Design and build petabyte-scale systems for high availability, high throughput, data consistency, security, and end-user privacy, defining our next generation of data analytics tooling.
- Build data modeling and ELT workflows to produce Raw, Rationalized, co-Related, and Reporting data flows for graph, time-series, structured, and semi-structured cybersecurity data.
- Mentor other engineers and promote software engineering best practices across the organization, designing systems with monitoring, auditing, reliability, and security at their core.
- Devise solutions for scaling data systems to meet varied business needs, collaborating in a dynamic, consultative environment.
Education & Qualifications
Minimum Qualifications
- B.S., M.S., or Ph.D. in Computer Science, or equivalent work experience.
- 5+ years of experience with fundamental CS concepts and object-oriented languages such as Java and Python.
- Experience working with data warehouses or databases such as Snowflake, Redshift, Postgres, or Cassandra.
- Experience with big data technologies such as Presto/Trino, Spark, Hadoop, Airflow, Kafka, Flink, and dbt.
- Experience writing and optimizing complex SQL, developing ETL pipelines, and designing and building data warehouse, data lake, or lakehouse solutions.
- Experience with distributed systems, distributed data storage, and large-scale data warehousing solutions such as BigQuery, Athena, Snowflake, Redshift, and Presto.
- Experience working with large datasets and best-in-class data processing technologies for stream and batch processing, graph and time-series data, and notebook and analytic visualization environments.
- Strong communication and collaboration skills, particularly across teams and with partner functions such as data scientists and business analysts.
Preferred Experience
- 8+ years of experience with Python, Java, or similar languages; with cloud infrastructure (e.g., AWS, GCP, Azure); and deep experience with big data processing infrastructure and ELT orchestration.
- Experience designing for data lineage, federation, governance, compliance, security, and privacy.
- Experience developing batch and real-time feature stores; coordinating batch, streaming, and online model execution workflows; and building and optimizing large-scale data processing jobs in Spark, GraphX/GraphFrames, and Spark Structured Streaming, including graph- and time-series-native operations.
- Experience with data quality monitoring, building continuous data pipelines, and implementing history and time travel using modern data lake storage layers such as Delta Lake, Iceberg, and LakeFS.
- Experience with MLOps and iterative cycles of end-to-end development, MRM (model risk management) coordination, deployment, and monitoring of production-grade ML models in a regulated, high-growth tech environment.
Job ID: 123342