Job Description
Our client, a leading global IT service provider, is recruiting for a Data Engineer - Streaming to join their business in the UK.
Key Responsibilities:
Participate in requirements gathering, analysis, and solution design to build big data engineering applications on AWS data services such as AWS EMR and Snowflake, using Spark and AWS Glue as the ETL framework.
Migrate and build Cloudera-based Hadoop, Hive, Impala, Kafka, Sqoop, and Spark data, jobs, and security policies on AWS EMR, S3, Snowflake, and other AWS services such as AWS Glue, DMS, and IAM.
Hands-on involvement in low-level design, development, and architecture of large data projects, leading developer and testing teams.
Job Scheduling and Automation.
Data validation, quality checks, profiling, and data reconciliation testing.
Work as an individual contributor as well as lead teams when required.
Mentor junior team members to improve their skills and knowledge, and drive delivery.
Work with both senior and junior team members, such as the Project Manager, Hadoop Architect, other data engineers, data modelers, report developers, and testers, to complete tasks.
Set up security and governance policies for data, users, and data pipelines on AWS data services.
Troubleshoot application errors and ensure they do not recur.
Apply agile and CI/CD methodologies and tools for development and deployment.
Must have:
Streaming data engineering and analytics: Kafka, AWS data services, ETL, Spark/Scala, Java, Python, EMR, AWS Glue, AWS Athena.
Good to have: Python, Spark, Hive, HDFS, Impala, Sqoop, Informatica.
Qualifications:
Bachelor’s degree and 9+ years of experience in the IT industry.
5+ years' experience as a Big Data Engineer on Hadoop, AWS EMR, and their ecosystems.
Experience in the banking domain is a plus.
Experience with AWS EMR, AWS Glue, Athena, S3, DMS, SCT, and Cloudera CDH-based Hadoop.
Extensive experience on big data projects using AWS EMR-based Hadoop, HDFS/S3, Spark, Hive, and Impala, leading teams and interacting with architect roles and clients.
Working experience with SQL, RDBMS, and complex queries.
Understanding of and experience in data warehousing and data modeling concepts.
Understanding of large-scale batch and stream processing.
Knowledge of quality assurance methodologies and exposure to all facets of Extract, Transform, and Load (ETL) processes.
Proficiency with the Linux/Unix command line.
Experience applying agile and CI/CD tools and methodologies for deployment and related automation.
Experience with performance optimization techniques for Hadoop, HDFS, Hive, Spark, and file formats, and providing technical guidance to other application developers.
Banking domain experience is mandatory.
Fluency in English is required.