Data Engineer - Bengaluru, India - EXL

    Description

    Job Location: Bangalore/Gurgaon

    Shift Timing: 12:00 PM IST – 10:30 PM IST

    Experience: 5+ years

    Job Summary:

    The Data Engineer (DE) is responsible for designing, developing, and maintaining data assets and data-related products by liaising with multiple stakeholders.

    Responsibilities:

    • Collaborate with project stakeholders (client) to identify product and technical requirements. Conduct analysis to determine integration needs.
    • Apply data warehousing concepts to build a data warehouse for reporting purposes.
    • Build data pipelines to ingest and transform data into the data platform.
    • Create and maintain data models, including schema design and optimization.
    • Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
    • Apply best-practice approaches for large-scale data movement, capture data changes, and apply incremental data load strategies (a merge-based sketch follows this list).
    • Develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data.
    • Assist Data Science / Modelling teams in setting up data pipelines and monitoring daily jobs.
    • Develop and test ETL components to high standards of data quality and act as hands-on development lead.
    • Oversee and contribute to the creation and maintenance of relevant data artifacts (data lineages, source-to-target mappings, high-level designs, interface agreements, etc.).
    • Ensure developer responsibilities are met by mentoring, reviewing code and test plans, and verifying adherence to design best practices as well as coding and architectural guidelines, standards, and frameworks.
    • Work with stakeholders to understand data requirements and to design, develop, and maintain complex ETL processes.
    • Lead data validation, UAT, and regression testing for new data asset creation.
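
    For illustration of the incremental data load strategies referenced above, the following is a minimal PySpark sketch of a merge-based (upsert) load into a Delta table. The table names, business key, watermark value, and the use of Delta Lake are assumptions made for the example, not details taken from this job description.

        from delta.tables import DeltaTable
        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("incremental_load").getOrCreate()

        # Hypothetical watermark: only pull source rows changed since the last successful load.
        # In practice this value would be read from a control/audit table.
        last_load_ts = "2024-01-01 00:00:00"
        changes = (
            spark.read.table("staging.orders")            # hypothetical staging table
                 .filter(F.col("updated_at") > last_load_ts)
        )

        # Upsert the captured changes into the target Delta table on the business key.
        target = DeltaTable.forName(spark, "warehouse.orders")   # hypothetical target table
        (
            target.alias("t")
                  .merge(changes.alias("s"), "t.order_id = s.order_id")
                  .whenMatchedUpdateAll()
                  .whenNotMatchedInsertAll()
                  .execute()
        )

    In a real pipeline the watermark would be written back to the control table after each successful run, so the next execution processes only new changes.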

    Qualifications (Must have):

    • Degree in Data Science, Statistics, Computer Science, or other related fields, or an equivalent combination of education and experience.
    • 5+ years as a Data Engineer with proficiency in SQL, Python, and PySpark programming.
    • Strong knowledge of Microsoft Fabric and related services/functionalities such as OneLake, ADF, and Synapse, and how to utilize them across the DE & Analytics spectrum.
    • Good experience with Airflow and with ETL tools such as Glue, Fivetran, Talend, Matillion, etc.
    • Ability to implement security protocols and access controls to protect sensitive data.
    • Good exposure to and hands-on knowledge of data warehouse and data lake solutions, master data management, and data modelling.
    • Very good knowledge of data quality management and the ability to perform data validation.
    • Proficiency in utilizing unit testing tools and practices to validate ETL processes, data pipelines, and other components of data engineering workflows (a minimal pytest example appears after this section).
    • Strong knowledge of Hadoop, Hive, Databricks, and RDBMSs such as Oracle, Teradata, SQL Server, etc.:
    1. Write SQL to query metadata and tables from different data management systems such as Oracle, Hive, Databricks, and Greenplum.
    2. Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
    3. Use Hue to run Hive SQL queries and schedule Apache Oozie jobs to automate data workflows.
    • Proficiency in at least one cloud platform (AWS, Azure, GCP) and in developing ETL processes using ETL tools, big data processing, and analytics with Databricks.

    • Expertise in building data pipelines on big data platforms; good understanding of data warehousing concepts.

    • Strong communication, problem-solving, and analytical skills, with the ability to manage time and multi-task with attention to detail and accuracy.
    • Strong business acumen and a demonstrated aptitude for analytics that drive action.
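
    As referenced in the unit-testing bullet above, the following is a minimal pytest sketch that validates a small PySpark transformation. The transformation, column names, and sample data are hypothetical and purely illustrative of the practice, not part of the role's actual codebase.

        import pytest
        from pyspark.sql import SparkSession, functions as F
        from pyspark.sql.window import Window


        def dedupe_latest(df, key_col, ts_col):
            """Keep only the most recent record per key (illustrative ETL transformation)."""
            w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
            return (df.withColumn("_rn", F.row_number().over(w))
                      .filter(F.col("_rn") == 1)
                      .drop("_rn"))


        @pytest.fixture(scope="module")
        def spark():
            # Small local session so the test runs without a cluster.
            return SparkSession.builder.master("local[1]").appName("etl-tests").getOrCreate()


        def test_dedupe_latest_keeps_newest_row(spark):
            df = spark.createDataFrame(
                [(1, "2024-01-01", "old"), (1, "2024-02-01", "new"), (2, "2024-01-15", "only")],
                ["id", "updated_at", "value"],
            )
            result = {r["id"]: r["value"] for r in dedupe_latest(df, "id", "updated_at").collect()}
            assert result == {1: "new", 2: "only"}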

    Qualifications (Preferred):

    • Good experience with Databricks, Snowflake, etc.
    • Good experience building real-time streaming data pipelines with Kafka, Kinesis, etc. (a minimal sketch follows this list).
    • Knowledge of Jinja/YAML templating in Python is a plus.
    • Knowledge and experience in designing and developing RESTful services.
    • Working knowledge of DevOps methodologies, including designing CI/CD pipelines.
    • Experience building distributed architecture-based systems, especially handling large data volumes and real-time distribution.
    • Good working experience communicating with stakeholders and collaborating effectively with the business team on data testing.
    • Strong problem-solving and troubleshooting skills.
    • Ability to establish comprehensive data quality test cases and procedures and to implement automated data validation processes.
    • Initiative and problem-solving skills when working independently.
    • Familiarity with Big Data Design Patterns, modelling, and architecture.
    • Exposure to NoSQL databases and cloud-based data transformation technologies.
    • Understanding of object-oriented design principles.
    • Knowledge of enterprise integration patterns.
    • Experience with messaging middleware, including queues, pub-sub channels, and streaming technologies.
    • Expertise in building high-performance, highly scalable, cloud-based applications.
    • Experience with SQL and NoSQL databases.
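
    As referenced in the real-time streaming bullet above, the following is a minimal PySpark Structured Streaming sketch that reads from a Kafka topic and appends to a Delta location. The broker address, topic name, and paths are placeholders, and running it requires the Spark Kafka connector and Delta Lake packages; it is a sketch of the technique, not a prescribed implementation.

        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("orders_stream").getOrCreate()

        # Read the raw event stream from Kafka (broker and topic are placeholders).
        events = (
            spark.readStream.format("kafka")
                 .option("kafka.bootstrap.servers", "broker:9092")
                 .option("subscribe", "orders")
                 .option("startingOffsets", "latest")
                 .load()
        )

        # Kafka delivers key/value as binary; cast the value to string for downstream parsing.
        parsed = events.select(
            F.col("key").cast("string").alias("key"),
            F.col("value").cast("string").alias("payload"),
            F.col("timestamp"),
        )

        # Continuously append micro-batches to a Delta path, with checkpointing for recovery.
        query = (
            parsed.writeStream.format("delta")
                  .option("checkpointLocation", "/tmp/checkpoints/orders")
                  .outputMode("append")
                  .start("/tmp/delta/orders")
        )
        query.awaitTermination()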