Data Engineer - Pune, India - The Citco Group Limited



    Description

    About Citco

    Since the 1940s Citco has provided specialist financial services to alternative investment funds, investors, multinationals and private clients worldwide. With over 6,000 employees in 45 countries we pioneer innovative solutions that meet our clients' evolving needs and deliver exceptional service.

    About the Role:

    The Data Engineer will be a key member of the Operations Data Lake (ODL) team, responsible for overseeing the design, development, and optimization of data pipelines that ingest structured and unstructured data into the lake. The incumbent will play a pivotal role in supporting multiple domain teams as they create organizational data assets, while ensuring data integrity and security and implementing centralized governance.

    Job Duties in Brief:

    • Lead and manage multiple data projects leveraging Databricks, from project initiation to completion, ensuring adherence to project timelines, budget, and quality standards.
    • Collaborate closely with cross-functional teams including Subject Matter Experts, data scientists, engineers, analysts, and other stakeholders to define project requirements, scope, and deliverables.
    • Define and implement scalable and robust data lake architectures leveraging Databricks Delta Lake technology.
    • Design data ingestion, transformation, and storage strategies to ensure efficient and reliable data management.
    • Oversee the development of data pipelines to ingest, process, and transform data from various sources into Databricks Delta Lake.
    • Define data models and schemas to support analytical and reporting needs.
    • Optimize data structures, partitioning strategies, and storage formats for efficient query performance.
    • Implement ML pipelines and workflows for model training, validation, and deployment using Databricks MLflow and related tools to support real-time and batch inference.
    • Work closely with BI Analysts and Data Visualization specialists to design and optimize data schemas and structures for BI reporting and analytics.
    • Establish monitoring and alerting mechanisms to proactively detect issues and optimize data lake performance.
    • Stay abreast of industry trends, best practices, and emerging technologies in data engineering and Databricks Delta Lake.
    • Provide technical guidance and leadership on Databricks best practices, methodologies, and implementation strategies.
    • Manage Databricks clusters and resources efficiently to optimize performance, scalability, and cost-effectiveness.
    • Develop and maintain metadata management solutions to capture, organize, and govern data assets across the organization.

    About You:

    • Bachelor's degree in Computer Science, Information Systems, Data Science, or a related field. Advanced degree preferred.
    • Excellent communication, interpersonal, and leadership skills, with the ability to effectively collaborate with diverse teams and stakeholders.
    • Strong analytical and problem-solving abilities, with a focus on delivering innovative and impactful data-driven solutions.
    • 3-8 years of hands-on experience leading data engineering projects on Databricks.
    • Deep expertise in Apache Spark, the Databricks runtime environment, and Databricks Delta Lake.
    • Professional- or Associate-level Databricks certification is required.
    • Strong understanding of master data management principles, metadata management, and data cataloging concepts and best practices.
    • Strong background in data modeling, ETL/ELT development, and data warehousing.
    • Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform and big data technologies.

    Assets:

    • Knowledge of financial products and hedge fund administration.
    • Experience setting up and managing a Data Center of Excellence (CoE) is highly desirable.
    • Experience creating interactive reports in Qlik, Tableau, Power BI, or Alteryx.
    • Experience integrating machine learning models and algorithms into data pipelines (experience with Databricks MLflow is a plus).
    • Experience working in an Agile environment, with knowledge of Jira, Confluence, etc.