Big Data Engineer - ATech
Goa/Mumbai/Jammu & Kashmir/Jammu/Srinagar/Pondicherry/Jaipur/Lucknow/Varanasi/Banaras/Patna/Ranchi, India

Designation: BIG DATA ENGINEER

Job Description:

Your Role and Responsibilities:
- Understand data warehousing solutions and be able to work independently in such an environment
- Take responsibility for project development and delivery, with experience delivering several sizable projects
- Design, build, optimize, and support new and existing data models and ETL processes based on our clients' business requirements
- Build, deploy, and manage data infrastructure that can adequately handle the needs of a rapidly growing, data-driven organization
- Coordinate data access and security so that data scientists and analysts can easily access data whenever they need to
- Experience developing scalable Big Data applications or solutions on distributed platforms
- Able to partner with others in solving complex problems by taking a broad perspective to identify innovative solutions
- Strong skills in building positive relationships across Product and Engineering
- Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders
- Able to quickly pick up new programming languages, technologies, and frameworks
- Experience working in Agile and Scrum development processes
- Experience working in a fast-paced, results-oriented environment
- Experience in Amazon Web Services (AWS) or other cloud platform tools
- Experience working with data warehousing tools, including DynamoDB, SQL, Amazon Redshift, and Snowflake
- Experience architecting data products on streaming, serverless, and microservices architectures and platforms
- Experience working with data platforms, including EMR, Databricks, etc.
- Experience working with distributed technology tools, including Spark, Presto, Scala, Python, Databricks, and Airflow (see the orchestration sketch below)
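With Airflow, pipelines of this kind are expressed as DAGs of tasks. A minimal sketch, assuming Airflow 2.x; the DAG id, schedule, and bash commands are hypothetical placeholders, not part of this role's actual stack:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal ETL DAG: three placeholder tasks run in sequence once a day.
with DAG(
    dag_id="daily_sales_etl",            # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",          # Airflow 2.x scheduling argument
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    # Task dependencies: extract -> transform -> load
    extract >> transform >> load
```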
- Developed PySpark code for AWS Glue jobs and for EMR; worked on scalable distributed data systems using the Hadoop ecosystem on AWS EMR and the MapR distribution
- Developed Python and PySpark programs for data analysis; good working experience using Python to develop a custom framework for generating rules (much like a rules engine)
- Developed Hadoop streaming jobs using Python for integrating Python-API-supported applications
- Developed Python code to gather data from HBase and designed the solution to implement it using PySpark; Apache Spark DataFrames/RDDs were used to apply business transformations, and Hive context objects were used to perform read/write operations (see the PySpark sketch below)
- Rewrote some Hive queries in Spark SQL to reduce the overall batch time
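A minimal PySpark sketch of the pattern described above: read a Hive table, apply a business transformation on DataFrames, and write the result back. Assumes a Hive-enabled Spark setup; all table and column names are hypothetical placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark read/write tables registered in the Hive metastore
# (the modern equivalent of the older HiveContext objects mentioned above).
spark = (
    SparkSession.builder
    .appName("orders-batch")             # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Read a Hive table into a DataFrame (table name is a placeholder).
orders = spark.table("raw_db.orders")

# Business transformation expressed on DataFrames instead of a Hive query.
daily_totals = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("order_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("customer_id").alias("unique_customers"),
    )
)

# Write the result back to the warehouse.
daily_totals.write.mode("overwrite").saveAsTable("mart_db.daily_order_totals")
```

Expressing an old Hive query as DataFrame operations or `spark.sql(...)` lets Spark's Catalyst optimizer plan the whole batch, which is typically where the batch-time reduction mentioned above comes from.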

Required Technical and Professional Expertise:
- First and most important: a sound understanding of data structures and SQL concepts, and experience writing complex SQL, especially around OLAP systems (see the SQL sketch after this list)
- Sound knowledge of an ETL tool such as Informatica (5+ years of experience) and of Big Data technologies: the Hadoop ecosystem and its various components, along with tools including Spark, Hive, Sqoop, etc.
- In-depth knowledge of MPP/distributed systems
- The ability to write precise, scalable, and high-performance code
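As an illustration of the kind of OLAP-style SQL referred to above, here is a sketch of a window-function query run through Spark SQL; the table and column names are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("olap-sql-demo")            # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Window functions are the bread and butter of OLAP-style analysis:
# a per-region 7-day rolling total plus a rank of days by revenue.
result = spark.sql("""
    SELECT
        region,
        order_date,
        SUM(total_amount) OVER (
            PARTITION BY region
            ORDER BY order_date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7d_amount,
        RANK() OVER (
            PARTITION BY region
            ORDER BY total_amount DESC
        ) AS revenue_rank
    FROM mart_db.daily_order_totals
""")
result.show(truncate=False)
```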

Preferred Technical and Professional Expertise:
- Knowledge of/exposure to data modeling with OLAP