Spark/PySpark Developer - Bihar, India - ATech
Description
Job Profile: Spark (PySpark) Developer
Industry Type: IT Services
Job description:
- The developer must have sound knowledge of Apache Spark and Python programming.
- Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading it into target data destinations (a minimal PySpark sketch follows this list).
- Experience in deploying and operationalizing code is an added advantage.
- Knowledge of and skills in DevOps, version control, and containerization.
- Deployment knowledge is preferable.
- Create Spark jobs for data transformation and aggregation
- Produce unit tests for Spark transformations and helper methods (see the unit-test sketch after this list)
- Write Scaladoc-style documentation for all code
- Design data processing pipelines to perform batch and real-time/stream analytics on structured and unstructured data
- Spark query tuning and performance optimization
- Good understanding of different file formats (ORC, Parquet, Avro) and compression techniques to optimize queries and processing
- SQL database integration (Microsoft SQL Server, Oracle, PostgreSQL, and/or MySQL)
- Experience working with storage and database systems such as HDFS, S3, Cassandra, and/or DynamoDB
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
- Experience building scalable, high-performance data lake solutions in the cloud
- Hands-on expertise in cloud services such as AWS and/or Microsoft Azure.
- As a Spark developer, you will manage the development of the scalable distributed architecture defined by the architect or tech lead on our team.
- Analyse and assemble large data sets designed to meet functional and non-functional requirements.
- You will develop ETL scripts for big data sources.
- Identify, design, and optimise data processing automation for reports and dashboards.
- You will be responsible for workflow, data, and ETL optimization as per the requirements defined by the team.
- Work with stakeholders such as product managers, technical leads, and service-layer engineers to ensure end-to-end requirements are addressed.
- Strong team player who adheres to the Software Development Life Cycle (SDLC) and produces the documentation needed to represent every stage of the SDLC.
- Hands-on working experience with any of the data engineering/analytics platforms (Hortonworks, Cloudera, MapR, AWS); AWS preferred
- Hands-on experience with data ingestion tools such as Apache NiFi, Apache Airflow, Sqoop, and Oozie
- Hands-on working experience with data processing at scale using event-driven systems and message queues (Kafka, Flink, Spark Streaming); see the streaming sketch after this list
- Hands-on working experience with AWS services such as EMR, Kinesis, S3, CloudFormation, Glue, API Gateway, and Lake Formation
- Hands-on working experience with AWS Athena
- Data warehouse exposure with Apache NiFi, Apache Airflow, and Kylo
- Operationalization of ML models on AWS (e.g. deployment, scheduling, model monitoring)
- Feature engineering and data processing to be used for model development
- Experience gathering and processing raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.)
- Experience building data pipelines for structured/unstructured data, real-time/batch processing, and synchronous/asynchronous events using MQ, Kafka, and stream processing
- Hands-on working experience in analysing source system data and data flows, working with structured and unstructured data
- Must be very strong in writing SQL queries
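
The bullets above describe a typical PySpark ETL flow: read from an external source, enrich by joining reference data, aggregate, and load the result into a target destination in a columnar, compressed format. Below is a minimal sketch of such a job; every path, dataset, and column name is hypothetical and used purely for illustration.

```python
# Minimal illustrative ETL sketch; all paths, datasets, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Read raw data from an external source (a CSV landing area, used only as an example).
orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Enrich by joining against a reference dataset.
customers = spark.read.parquet("s3://example-bucket/reference/customers/")
enriched = orders.join(customers, on="customer_id", how="left")

# Transform and aggregate.
daily_totals = (
    enriched
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date", "customer_region")
    .agg(
        F.sum("order_amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Load into the target destination as compressed Parquet, partitioned for downstream queries.
(
    daily_totals.write
    .mode("overwrite")
    .partitionBy("order_date")
    .option("compression", "snappy")
    .parquet("s3://example-bucket/curated/daily_totals/")
)
```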
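
For the unit-testing bullet, a common pattern is to run a local SparkSession inside pytest and assert on the output of a transformation helper. The sketch below assumes a hypothetical helper add_order_total; it is not part of any existing codebase.

```python
# Minimal illustrative unit-test sketch; the add_order_total helper is hypothetical.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_order_total(df):
    """Example transformation: order_total = quantity * unit_price."""
    return df.withColumn("order_total", F.col("quantity") * F.col("unit_price"))


@pytest.fixture(scope="session")
def spark():
    # A small local SparkSession is usually enough for transformation tests.
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()


def test_add_order_total(spark):
    input_df = spark.createDataFrame([(2, 10.0), (3, 5.0)], ["quantity", "unit_price"])
    result = add_order_total(input_df).collect()
    assert [row.order_total for row in result] == [20.0, 15.0]
```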
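
For the event-driven/streaming bullet, the sketch below reads a Kafka topic with Spark Structured Streaming, parses the JSON payload, and writes to a checkpointed sink. The broker address, topic, schema, and output paths are hypothetical, and the job assumes the spark-sql-kafka connector is available on the cluster.

```python
# Minimal illustrative Structured Streaming sketch; broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("example-stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read an event stream from Kafka and parse the JSON payload in the message value.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("event"))
    .select("event.*")
)

# Write the parsed stream to a Parquet sink with checkpointing for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streaming/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .start()
)
query.awaitTermination()
```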