
    Spark/PySpark Developer - Bihar, India - ATech

    ATech
    Bihar, India

    3 days ago

    Permanent · Technology / Internet
    Description

    Job Profile: Spark (PySpark) Developer

    Industry Type: IT Services

    Job description:

    - The developer must have sound knowledge of Apache Spark and Python programming.

    - Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging datasets, performing data enrichment, and loading into target data destinations (a sketch of such a pipeline follows).
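
    A minimal PySpark sketch of the read/merge/enrich/load pattern this bullet describes. The bucket paths, table layout, and column names (customer_id, order_ts, amount) are illustrative assumptions, not details from the posting:

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Read from external sources (paths and format are assumed).
    orders = spark.read.parquet("s3://example-bucket/raw/orders/")
    customers = spark.read.parquet("s3://example-bucket/raw/customers/")

    # Merge: join order facts with customer attributes on a shared key.
    merged = orders.join(customers, on="customer_id", how="left")

    # Enrich: derive new columns from the merged data.
    enriched = (merged
                .withColumn("order_date", F.to_date("order_ts"))
                .withColumn("is_high_value", F.col("amount") > 1000))

    # Load into the target destination (again, an assumed location).
    (enriched.write.mode("overwrite")
             .partitionBy("order_date")
             .parquet("s3://example-bucket/curated/orders_enriched/"))
    ```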

    - Experience in deploying and operationalizing code is an added advantage.

    - Knowledge and skills in DevOps, version control, and containerization.

    - Deployment knowledge is preferred.

    - Create Spark jobs for data transformation and aggregation

    - Produce unit tests for Spark transformations and helper methods (see the sketch below)
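
    A pytest-style sketch of such a unit test; the transformation under test (add_total_column) is a hypothetical helper invented for illustration:

    ```python
    import pytest
    from pyspark.sql import SparkSession, functions as F

    def add_total_column(df):
        """Hypothetical transformation: total = quantity * unit_price."""
        return df.withColumn("total", F.col("quantity") * F.col("unit_price"))

    @pytest.fixture(scope="session")
    def spark():
        # A small local session is enough for transformation tests.
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

    def test_add_total_column(spark):
        df = spark.createDataFrame([(2, 5.0), (3, 1.5)], ["quantity", "unit_price"])
        result = add_total_column(df).collect()
        assert [row["total"] for row in result] == [10.0, 4.5]
    ```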

    - Write Scaladoc-style documentation with all code

    - Design data processing pipelines to perform batch and real-time/stream analytics on structured and unstructured data

    - Spark query tuning and performance optimization
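
    A few common tuning levers, shown as a self-contained sketch; the dataset sizes and the shuffle-partition setting are illustrative, not recommendations from the posting:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    # Fewer shuffle partitions for modest data volumes (the default of 200
    # produces many tiny tasks on small datasets).
    spark.conf.set("spark.sql.shuffle.partitions", "64")

    facts = spark.range(1_000_000).withColumnRenamed("id", "key")
    dim = spark.createDataFrame([(i, f"label_{i}") for i in range(10)], ["key", "label"])

    # Broadcast the small side of a join to avoid shuffling the large side.
    joined = facts.join(broadcast(dim), "key")

    # Inspect the physical plan to confirm the broadcast and spot costly stages.
    joined.explain(mode="formatted")

    # Cache results that are reused across multiple actions.
    joined.cache()
    ```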

    - Good understanding of different file formats (ORC, Parquet, Avro) and compression techniques to optimize queries and processing; a short sketch follows.
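
    For example, writing the same DataFrame in each format with an explicit codec (paths are placeholders; Avro requires the external spark-avro package):

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("formats-sketch").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # Parquet and ORC are columnar: good for analytic scans of a few columns.
    df.write.mode("overwrite").option("compression", "snappy").parquet("/tmp/out_parquet")
    df.write.mode("overwrite").option("compression", "zlib").orc("/tmp/out_orc")

    # Avro is row-oriented: better suited to whole-record pipelines.
    df.write.mode("overwrite").format("avro").save("/tmp/out_avro")
    ```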

    - SQL database integration (Microsoft SQL Server, Oracle, Postgres, and/or MySQL); a JDBC sketch follows
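
    A sketch of reading and writing over JDBC; the URL, table names, and credentials are placeholders, and the matching JDBC driver jar must be on the Spark classpath:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-sketch").getOrCreate()

    # Read a table over JDBC (Postgres shown; the pattern is the same for
    # SQL Server, Oracle, or MySQL with a different URL and driver).
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/analytics")
          .option("dbtable", "public.orders")
          .option("user", "reporting_user")
          .option("password", "***")
          .option("fetchsize", "10000")   # rows fetched per round trip
          .load())

    # Write results back to a staging table.
    (df.write.format("jdbc")
       .option("url", "jdbc:postgresql://db-host:5432/analytics")
       .option("dbtable", "public.orders_stage")
       .option("user", "reporting_user")
       .option("password", "***")
       .mode("append")
       .save())
    ```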

    - Experience working with storage systems (HDFS, S3, Cassandra, and/or DynamoDB)

    - Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)

    - Experience building scalable, high-performance data lake solutions in the cloud

    - Hands-on expertise in cloud services such as AWS and/or Microsoft Azure.

    - As a Spark developer, you will manage the development of the scalable distributed architecture defined by the architect or tech lead in our team.

    - Analyse and assemble large data sets designed to meet functional and non-functional requirements.

    - You will develop ETL scripts for big data sources.

    - Identify, design, and optimise data processing automation for reports and dashboards.

    - You will be responsible for workflow, data, and ETL optimization as per the requirements set out by the team.

    - Work with stakeholders such as product managers, technical leads, and service-layer engineers to ensure end-to-end requirements are addressed.

    - Strong team player who adheres to the Software Development Life Cycle (SDLC) and produces the documentation needed to represent every stage of it.

    - Hands-on working experience on any of the data engineering/analytics platforms (Hortonworks, Cloudera, MapR, AWS); AWS preferred

    - Hands-on experience with data ingestion tools: Apache NiFi, Apache Airflow, Sqoop, and Oozie

    - Hands-on working experience with data processing at scale using event-driven systems and message queues (Kafka, Flink, Spark Streaming); a Kafka consumption sketch follows
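
    A Structured Streaming sketch that consumes a Kafka topic; the broker, topic, and output paths are assumptions, and the spark-sql-kafka connector package must be on the classpath:

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-sketch").getOrCreate()

    # Subscribe to a topic as a streaming DataFrame.
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events")
              .load())

    # Kafka delivers key/value as binary; cast before processing.
    parsed = events.select(F.col("value").cast("string").alias("payload"))

    # Write micro-batches to files, with a checkpoint for fault tolerance.
    query = (parsed.writeStream.format("parquet")
             .option("path", "/tmp/events_out")
             .option("checkpointLocation", "/tmp/events_chk")
             .start())
    query.awaitTermination()  # block until the stream is stopped
    ```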

    - Hands-on working experience with AWS services such as EMR, Kinesis, S3, CloudFormation, Glue, API Gateway, and Lake Formation

    - Hands-on working experience with AWS Athena

    - Data warehouse exposure with Apache NiFi, Apache Airflow, and Kylo

    - Operationalization of ML models on AWS (e.g. deployment, scheduling, model monitoring)

    - Feature engineering and data processing to be used for model development

    - Experience gathering and processing raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.); an ingestion sketch follows
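
    A small sketch of pulling raw records from a REST API into Spark; the endpoint is hypothetical and the response is assumed to be a flat JSON list:

    ```python
    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

    # Fetch raw records from a (hypothetical) REST endpoint.
    resp = requests.get("https://api.example.com/v1/records", timeout=30)
    resp.raise_for_status()
    records = resp.json()  # assumed: a list of flat JSON objects

    # Parallelize into a DataFrame for downstream processing at scale.
    df = spark.createDataFrame(records)
    df.show()
    ```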

    - Experience building data pipelines for structured/unstructured data, real-time/batch, and synchronous/asynchronous events using MQ, Kafka, and stream processing

    - Hands-on working experience in analysing source system data and data flows, working with structured and unstructured data

    - Must be very strong in writing SQL queries
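
    Since strong SQL is called out alongside Spark, a minimal example of running plain SQL against a registered view (table and columns invented for illustration):

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

    sales = spark.createDataFrame(
        [("north", 120.0), ("south", 75.5), ("north", 30.0)],
        ["region", "amount"],
    )
    sales.createOrReplaceTempView("sales")

    # The aggregation expressed as plain SQL against the temp view.
    top_regions = spark.sql("""
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
        HAVING SUM(amount) > 100
        ORDER BY total DESC
    """)
    top_regions.show()
    ```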
