Job Description
- In-depth understanding of Hadoop and Spark architecture and of RDD transformations.
- Proven experience developing solutions with the Spark architecture and PySpark for data engineering pipelines that transform and aggregate data from a variety of sources into the data lake.
- At least 3 years of relevant experience developing PySpark programs using the Spark APIs. Expertise in file formats such as Parquet and ORC.
- Experience troubleshooting and fine-tuning Spark and Python-based applications for scalability and performance.
- Experience designing Hive tables to handle high-velocity, high-variety, and high-volume data.
- Experience ingesting, processing, and analyzing data from disparate sources using Spark/SQL.
- Knowledge of spark-submit and the Spark UI. Experience creating Spark RDDs and performing operations on them (see the first sketch after this list).
- Experience creating Spark DataFrames from RDDs, Hive tables, and Parquet files, and performing joins and aggregations on those DataFrames (see the second sketch after this list).
- Experience processing data with Python and other API modules.
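As a minimal sketch of the RDD work referenced above, the snippet below creates an RDD, applies a filter and a reduceByKey transformation, and collects the result. The application name and the sample data are placeholders, not part of the role's actual codebase.

```python
from pyspark.sql import SparkSession

# Entry point for a PySpark application; the app name is a placeholder.
spark = SparkSession.builder.appName("rdd-transformation-sketch").getOrCreate()
sc = spark.sparkContext

# Create a pair RDD from an in-memory collection and apply common transformations.
events = sc.parallelize([("web", 3), ("mobile", 5), ("web", 2)])
totals = (events
          .filter(lambda kv: kv[1] > 0)      # narrow transformation
          .reduceByKey(lambda a, b: a + b))  # wide transformation (triggers a shuffle)

print(totals.collect())  # e.g. [('web', 5), ('mobile', 5)]
spark.stop()
```

A script like this would typically be launched with spark-submit (for example, `spark-submit --master yarn rdd_sketch.py`), and its jobs and stages inspected in the Spark UI.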
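The second sketch illustrates DataFrame creation from an RDD, a Hive table, and a Parquet file, followed by a join and an aggregation. The paths, database, table, and column names are hypothetical and would differ in a real pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# enableHiveSupport() is required to read Hive tables; the app name is a placeholder.
spark = (SparkSession.builder
         .appName("dataframe-join-agg-sketch")
         .enableHiveSupport()
         .getOrCreate())

# DataFrame from a Parquet file and from a Hive table (both names are placeholders).
orders = spark.read.parquet("/data/lake/orders.parquet")
customers = spark.table("sales_db.customers")

# A DataFrame can also be built from an RDD by supplying column names.
rdd = spark.sparkContext.parallelize([(1, "A"), (2, "B")])
lookup = spark.createDataFrame(rdd, ["customer_id", "segment_code"])

# Join on a shared key, then aggregate order amounts per customer segment.
result = (orders.join(customers, on="customer_id", how="inner")
          .groupBy("segment")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("order_count")))

# Write the aggregated output back to the data lake as Parquet.
result.write.mode("overwrite").parquet("/data/lake/segment_totals")

spark.stop()
```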