PySpark - Mumbai - confidential

    confidential
    confidential Mumbai

    2 days ago

    Full time
    Description
    • Key Responsibilities:
    • PySpark Development:
    • Design, implement, and optimize PySpark solutions for large-scale data processing and analysis.
    • Develop data pipelines using Spark to handle data transformations, aggregations, and other complex operations efficiently.
    • Write and optimize Spark SQL queries for big data analytics and reporting.
    • Handle data extraction, transformation, and loading (ETL) processes from various sources into a unified data warehouse or data lake (an illustrative ETL sketch follows this list).
    • Data Pipeline Design & Optimization:
    • Build and maintain ETL pipelines using PySpark, ensuring high scalability and performance.
    • Implement batch and streaming processing to handle both real-time and historical data (a streaming sketch appears after this list).
    • Optimize the performance of PySpark applications by applying best practices and techniques such as partitioning, caching, and broadcast joins (a tuning sketch appears after this list).
    • Data Storage & Management:
    • Work with large datasets and integrate them into storage solutions such as HDFS, S3, Azure Blob Storage, or Google Cloud Storage.
    • Ensure efficient data storage, access, and retrieval through Spark and other tools (e.g., Parquet, ORC).
    • Maintain data quality, consistency, and integrity throughout the pipeline lifecycle.
    • Cloud Platforms & Big Data Frameworks:
    • Deploy Spark-based applications on cloud platforms such as AWS (Amazon EMR), Azure HDInsight, or Google Dataproc.
    • Work with cloud-native services such as AWS Lambda, S3, Google Cloud Storage, and Azure Data Lake to handle and process big data.
    • Leverage cloud data processing tools and frameworks to scale and optimize PySpark jobs.
    • Collaboration & Integration:
    • Collaborate with cross-functional teams (data scientists, analysts, product managers) to understand business requirements and develop appropriate data solutions.
    • Integrate data from multiple sources and platforms (e.g., databases, external APIs, flat files) into a unified system.
    • Provide support for downstream applications and data consumers by ensuring timely and accurate delivery of data.
    • Performance Tuning & Troubleshooting:
    • Identify bottlenecks and optimize Spark jobs to improve performance.
    • Conduct performance tuning of both the cluster and individual Spark jobs, leveraging Spark's built-in monitoring tools.
    • Troubleshoot and resolve issues related to data processing, application failures, and cluster resource utilization.
    • Documentation & Reporting:
    • Maintain clear and comprehensive documentation of data pipelines, architectures, and processes.
    • Create technical documentation to guide future enhancements and troubleshooting.
    • Provide regular updates on the status of ongoing projects and data processing tasks.
    • Continuous Improvement:
    • Stay up to date with the latest trends, technologies, and best practices in big data processing and PySpark.
    • Contribute to improving development processes, testing strategies, and code quality.
    • Share knowledge and provide mentoring to junior team members on PySpark best practices.
    • Required Qualifications:
    • 2-4 years of professional experience working with PySpark and big data technologies.
    • Strong expertise in Python programming with a focus on data processing and manipulation.
    • Hands-on experience with Apache Spark, particularly with PySpark for distributed computing.
    • Proficiency in Spark SQL for data querying and transformation.
    • Familiarity with cloud platforms like AWS, Azure, or Google Cloud, and experience with cloud-native big data tools.
    • Knowledge of ETL processes and tools.
    • Experience with data storage technologies like HDFS, S3, or Google Cloud Storage.
    • Knowledge of data formats such as Parquet, ORC, Avro, or JSON.
    • Experience with distributed computing and cluster management.
    • Familiarity with Linux/Unix and command-line operations.
    • Strong problem-solving skills and ability to troubleshoot data processing issues.
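
    To make the ETL and Spark SQL responsibilities above concrete, here is a minimal, illustrative PySpark sketch. The bucket paths, column names, and table name are hypothetical placeholders, not details from this posting.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("orders_etl").getOrCreate()

        # Extract: read raw CSV files from a (hypothetical) landing zone.
        orders = spark.read.csv("s3://raw-zone/orders/", header=True, inferSchema=True)

        # Transform: derive a date column and aggregate with DataFrame operations.
        daily_revenue = (
            orders
            .withColumn("order_date", F.to_date("order_ts"))
            .groupBy("order_date", "region")
            .agg(F.sum("amount").alias("revenue"),
                 F.count("*").alias("order_count"))
        )

        # The same transformation expressed as a Spark SQL query.
        orders.createOrReplaceTempView("orders")
        daily_revenue_sql = spark.sql("""
            SELECT to_date(order_ts) AS order_date,
                   region,
                   SUM(amount) AS revenue,
                   COUNT(*)    AS order_count
            FROM orders
            GROUP BY to_date(order_ts), region
        """)

        # Load: write partitioned Parquet into the data lake.
        (daily_revenue.write
             .mode("overwrite")
             .partitionBy("order_date")
             .parquet("s3://data-lake/daily_revenue/"))

    Writing the result as Parquet partitioned by a date column keeps downstream reads selective; ORC would be a drop-in alternative via the .orc() writer.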
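
    For the batch-plus-streaming responsibility, a rough Structured Streaming sketch follows; the schema, paths, window size, and watermark are likewise assumptions made for illustration.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("events_streaming").getOrCreate()

        # Read a stream of JSON files as they land in a (hypothetical) directory.
        events = (
            spark.readStream
            .schema("user_id STRING, event_type STRING, event_ts TIMESTAMP")
            .json("s3://raw-zone/events/")
        )

        # Windowed aggregation over event time, with a watermark to bound late data.
        counts = (
            events
            .withWatermark("event_ts", "10 minutes")
            .groupBy(F.window("event_ts", "5 minutes"), "event_type")
            .count()
        )

        # Write results incrementally to Parquet; the checkpoint makes the query restartable.
        query = (
            counts.writeStream
            .outputMode("append")
            .format("parquet")
            .option("path", "s3://data-lake/event_counts/")
            .option("checkpointLocation", "s3://data-lake/_checkpoints/event_counts/")
            .start()
        )
        query.awaitTermination()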
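
    Finally, the partitioning, caching, and broadcast-join techniques listed under performance tuning might look roughly like this (the table paths, join key, and partition count are placeholders):

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("tuning_example").getOrCreate()

        facts = spark.read.parquet("s3://data-lake/events/")            # large fact table
        dim_country = spark.read.parquet("s3://data-lake/dim_country/") # small dimension table

        # Broadcast join: ship the small dimension table to every executor
        # so the large fact table does not get shuffled.
        enriched = facts.join(F.broadcast(dim_country), on="country_code", how="left")

        # Repartition on the key used by later aggregations to spread work evenly.
        enriched = enriched.repartition(200, "country_code")

        # Cache a DataFrame that several downstream actions will reuse.
        enriched.cache()
        enriched.count()  # the first action materializes the cache

        enriched.groupBy("country_code").agg(F.countDistinct("user_id")).show()

    The Spark UI's SQL and Storage tabs are the usual places to confirm that the broadcast actually occurred and that cached data fits in memory.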

  • confidential Mumbai Full time

    A PySpark developer with expertise in designing ETL pipelines, handling large datasets using CDC operations, and performance tuning Spark jobs. · ...

  • Pyspark

    1 month ago

    Mumbai, Maharashtra

    We are a global leader in the technology arena and there's nothing that can stop us from growing together. · Pyspark · ...

  • Mumbai

    Collaborate with and manage the team to perform. · Expected to be an SME. · Facilitate knowledge-sharing sessions. · Must-Have Skills: Proficiency in PySpark. · ...

  • Mumbai Metropolitan Region

    We are seeking a highly skilled Technical Lead with 4 to 9 years of experience. · We will work on cutting-edge projects during day shifts with no travel required. · Lead the design and implementation of scalable data solutions using Databricks Workflows. · Mentor and guide junior te ...

  • Mumbai

    Role: Azure Databricks, PySpark. Required Technical Skill Set: · Experience with Microsoft Azure (Databricks, Data Factory, SQL, Storage, Web apps, web roles, worker roles, Service Fabric), PySpark · ...

  • Mumbai Full time

    We are seeking a PySpark developer to join our team at Citi. As one of the world's most global banks, we're changing how the world does business. · ...

  • Mumbai Full time

    We're hiring a Spark Scala Developer who has real-world experience working in Big Data environments, both on-prem and/or in the cloud. · Design and develop scalable data pipelines using Apache Spark and Scala. · Optimize and troubleshoot Spark jobs for performance (e.g. memory mana ...

  • Mumbai

    The Data Engineer is responsible for designing, building and optimizing our data ecosystem, with a focus on delivering high-quality and trustworthy data. · Expertise and experience in Python and Pyspark (at least 4 years of experience) · Experience with BI tools, SQL queries, and ...

  • Mumbai

    The Data Engineer will help us generate insights by leveraging the latest Artificial Intelligence (AI) and Analytics techniques to deliver value to our clients. · ...

  • Mumbai, MH, India

    Accelerate your career with PradeepIT · ...

  • Mumbai

    Ingest data from disparate sources (structured, unstructured, and semi-structured) and develop ETL jobs using the above skills. · ...

  • Mumbai, Maharashtra

    Must have strong knowledge of ETL processes using frameworks like Azure Data Factory, Synapse, or Databricks; establishing cloud connectivity between different systems like ADLS, ADF, Synapse, Databricks, etc. · ...

  • Mumbai

    We are looking for an experienced Senior Python & Spark Developer with strong expertise in Python (Django, Flask) and PySpark for large-scale data processing. · ...

  • Mumbai City

    Tata Consultancy Services is hiring for a Big Data with Pyspark/Spark Scala role in Mumbai. The job requires 5+ years of experience and skills in Pyspark, Hive, and HBase. · ...

  • Mumbai

    The Azure Data Engineer role involves working with Azure Databricks, PySpark, and ETL processes using frameworks like Azure Data Factory or Synapse. · Design and develop ETL processes based on functional and non-functional requirements in Python/PySpark within the Azure platform. · Candi ...

  • Mumbai

    You're ready to gain the skills and experience needed to grow within your role and advance your career. · ...

  • Mumbai, Maharashtra

    TCS invites applications for Big Data with Pyspark/Spark Scala experience. The role involves ingesting data from disparate sources and developing ETL jobs. · ...

  • Mumbai City

    You're ready to gain the skills and experience needed to grow within your role and advance your career, and we have the perfect software engineering opportunity for you. · ...

  • Mumbai Full time

    You're ready to gain the skills and experience needed to grow within your role and advance your career — and we have the perfect software engineering opportunity for you. · ...

  • Mumbai

    You will be aligned with the Insights & Intelligence vertical and help us generate insights by leveraging the latest Artificial Intelligence (AI) and Analytics techniques to deliver value to our clients. · ...
