Spark Python - Chennai, India - Cognizant

Cognizant

Chennai, India

1 week ago

Posted by:

Deepika Kaur

beBee Recruiter

Strong expertise in Big Data ecosystem tools such as Spark, Hive, Sqoop, HDFS, MapReduce, Kafka, Oozie, YARN, HBase, and NiFi.

In-depth knowledge of the architecture of distributed systems and parallel computing.
Experience implementing end-to-end data pipelines that serve reporting and data science capabilities.
Strong experience tuning Spark configurations, such as broadcast thresholds, shuffle partition counts, caching, and repartitioning, to improve job performance.
In-depth knowledge of importing and exporting data from databases using Sqoop.
Well versed in writing complex Hive queries using analytical functions.
Knowledge of writing custom UDFs in Hive to support custom business requirements.
Solid experience using file formats such as CSV, TSV, Parquet, ORC, JSON, and Avro.
Experience using compression techniques such as Gzip and Snappy within Hadoop.
Strong knowledge of NoSQL databases, having worked with HBase, Cassandra, and MongoDB.
Experience using cloud services such as Amazon EMR, S3, EC2, Redshift, and Athena.
Experience automating end-to-end data pipelines with strong resilience and recoverability.
Worked on Spark Streaming and Spark Structured Streaming, including Kafka, for real-time data processing.
Strong knowledge of version control systems such as SVN and GitHub.
Involved in production monitoring using a workflow monitor, with experience in development and support environments.
Experienced in using Waterfall, Agile, and Scrum software development process models.