Job Description:
- Create a low-level implementation plan for the ingestion pipeline.
- Develop the ingestion pipeline based on HLD/LLD specifications.
- Define the storage directories in S3 and script the compaction and partitioning based on consumption patterns.
- Resolve issues identified in the data pipeline.
- Integrate the pipeline jobs with the enterprise scheduler and the CI/CD process.
- Work with support teams to promote the pipeline to production.
Skill Set:
- 10+ years of experience as a Data Engineer.
- Expertise in Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
- Experience working with Oracle and DB2, and batch processing their data using distributed computing.
- Good knowledge of High-Availability, Fault Tolerance, Scalability, Database Concepts, System and Software Architecture, Security and IT Infrastructure.
- Expertise in Spark SQL, and in tuning and debugging Spark clusters.
- Familiarity with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
- Knowledge of tools such as Datameer.