Data Engineer - Bengaluru, India - EXL
Description
Job Location: Bangalore/Gurgaon
Shift Timing: 12:00 PM IST – 10:30 PM IST
Experience: 5+ years
Job Summary:
The Data Engineer (DE) is responsible for designing, developing, and maintaining data assets and data-related products by liaising with multiple stakeholders.
Responsibilities:
- Collaborate with project stakeholders (client) to identify product and technical requirements. Conduct analysis to determine integration needs.
- Build data pipelines to ingest and transform data into our data platform.
- Apply best approaches for large-scale data movement, capture data changes, and apply incremental data load strategies.
- Develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data.
- Apply data warehousing concepts to build a data warehouse for reporting purposes.
- Create and maintain data models, including schema design and optimization.
- Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
- Assist Data Science/Modelling teams in setting up data pipelines and monitoring daily jobs.
- Develop and test ETL components to high standards of data quality and act as hands-on development lead.
- Oversee and contribute to the creation and maintenance of relevant data artifacts (data lineages, source-to-target mappings, high-level designs, interface agreements, etc.).
- Ensure that developer responsibilities are met by mentoring, reviewing code and test plans, and verifying that design best practices, coding and architectural guidelines, standards, and frameworks are followed.
- Work with stakeholders to understand data requirements and design, develop, and maintain complex ETL processes.
- Lead data validation, UAT, and regression testing for new data asset creation.
Qualifications (Must have):
- Expertise in building data pipelines on big data platforms; good understanding of data warehousing concepts.
- Degree in Data Science, Statistics, Computer Science, or a related field, or an equivalent combination of education and experience.
- 5+ years as a Data Engineer with proficiency in SQL, Python, and PySpark programming.
- Strong knowledge of MS Fabric and related services/functionalities such as OneLake, ADF, and Synapse, and how to utilize them across the DE and analytics spectrum.
- Good experience with Airflow and with ETL tools such as Glue, Fivetran, Talend, and Matillion.
- Ability to implement security protocols and access controls to protect sensitive data.
- Good exposure to and hands-on knowledge of data warehouse and data lake solutions, master data management, and data modelling.
- Very good knowledge of data quality management and the ability to perform data validation.
- Proficiency in utilizing unit testing tools and practices to validate ETL processes, data pipelines, and other components of data engineering workflows.
- Strong knowledge of Hadoop, Hive, Databricks, and RDBMSs such as Oracle, Teradata, and SQL Server.
- Expected to write SQL to query metadata and tables from different data management systems such as Oracle, Hive, Databricks, and Greenplum.
- Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
- Expected to use Hue to run Hive SQL queries and schedule Apache Oozie jobs to automate data workflows.
- Proficiency in at least one cloud platform (AWS, Azure, GCP), developing ETL processes using ETL tools, and performing big data processing and analytics with Databricks.
- Strong communication, problem-solving, and analytical skills, with the ability to manage time and multi-task with attention to detail and accuracy.
- Strong business acumen and a demonstrated aptitude for analytics that drive action.
Qualifications (Preferred):
- Good experience with Databricks, Snowflake, etc.
- Good experience building real-time streaming data pipelines with Kafka, Kinesis, etc.
- Knowledge of Jinja/YAML templating in Python is a plus.
- Knowledge and experience in designing and developing RESTful services.
- Working knowledge of DevOps methodologies, including designing CI/CD pipelines.
- Experience building distributed architecture-based systems, especially handling large data volumes and real-time distribution.
- Good working experience communicating with stakeholders and collaborating effectively with the business team on data testing.
- Expected to have strong problem-solving and troubleshooting skills.
- Expected to establish comprehensive data quality test cases and procedures, and implement automated data validation processes.
- Initiative and problem-solving skills when working independently.
- Familiarity with Big Data Design Patterns, modelling, and architecture.
- Exposure to NoSQL databases and cloud-based data transformation technologies.
- Understanding of object-oriented design principles.
- Knowledge of enterprise integration patterns.
- Experience with messaging middleware, including queues, pub-sub channels, and streaming technologies.
- Expertise in building high-performance, highly scalable, cloud-based applications.
- Experience with SQL and NoSQL databases.