· Use analytical tools including Hive and Spark with the Hortonworks distribution.
· Import large datasets from SQL Server and Netezza into HDFS using Sqoop (see the Sqoop sketch after this list).
· Perform analytics in Hive using HiveQL queries, views, partitioning, and bucketing (see the Hive sketch after this list).
· Convert Hive/SQL queries into Spark transformations using Spark RDDs with PySpark and Scala (see the PySpark sketch after this list).
· Develop Sqoop jobs to import data in text format from SQL Server and Netezza, and create Hive tables on top of the imported data.
· Develop Spark Core and Spark SQL scripts in Scala for faster data processing.
· Transform data using Spark applications for analytics consumption.
· Convert Python scripts into Scala.
· Troubleshoot errors in the HBase shell/API and in Hive.
· Work with Spark components such as Spark SQL, RDDs, DataFrames, and Datasets.
· Create ETL pipelines from the on-premises cluster to Azure and automate them with Airflow.
· Develop Airflow DAGs to automate the ETL process (a DAG sketch follows this list).
· Create an Airflow plugin to log the ETL process running on the on-premises cluster.
· Develop Python scripts to validate data after it is loaded into SQL Server and Snowflake (see the validation sketch after this list).
· Tune long-running Spark jobs (see the tuning sketch after this list).
· Migrate data from Netezza to Azure Data Lake using Sqoop and Spark jobs.
· Create incremental Spark jobs to move data from Azure to Snowflake (see the incremental-load sketch after this list).
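The Sqoop imports above (from SQL Server and Netezza into HDFS, and later into Azure Data Lake) could be driven from Python roughly as in this sketch. It assumes a sqoop client on the PATH; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders.

```python
import subprocess

# Sketch of a Sqoop import from Netezza into HDFS as text files.
# The connection details below are placeholders, not real endpoints.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:netezza://nz-host:5480/SALESDB",  # hypothetical host/db
    "--username", "etl_user",
    "--password-file", "/user/etl/.nz_password",         # keeps the secret out of argv
    "--table", "ORDERS",
    "--target-dir", "/data/raw/orders",                  # HDFS landing directory
    "--as-textfile",                                     # import in text format
    "--num-mappers", "4",                                # parallel map tasks
]

result = subprocess.run(sqoop_cmd, capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f"Sqoop import failed: {result.stderr}")
```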
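The Hive analytics with partitioning and bucketing might look like the HiveQL below, issued through a Hive-enabled SparkSession; the database, table, columns, and bucket count are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-analytics")
         .enableHiveSupport()          # requires a configured Hive metastore
         .getOrCreate())

# Partitioned by date, bucketed by customer for efficient joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")

# Filtering on the partition column prunes all other partitions.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM sales.orders
    WHERE order_date = '2023-01-01'
    GROUP BY customer_id
""").show()
```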
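Converting a Hive/SQL aggregate into Spark RDD transformations could look like this PySpark sketch; the input path and the comma-separated record layout are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-to-rdd").getOrCreate()
sc = spark.sparkContext

# Hypothetical input: text records of "customer_id,amount".
lines = sc.textFile("/data/raw/orders")

# RDD equivalent of:
#   SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
totals = (lines
          .map(lambda line: line.split(","))
          .map(lambda cols: (cols[0], float(cols[1])))  # (key, value) pairs
          .reduceByKey(lambda a, b: a + b))             # aggregate per key

for customer_id, total in totals.take(10):
    print(customer_id, total)
```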
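The Airflow automation could be wired up roughly as below, using Airflow 2.x operators; the dag_id, schedule, and shell commands are placeholders rather than the actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Sketch of a daily DAG chaining a Sqoop extract with a Spark load.
with DAG(
    dag_id="onprem_to_azure_etl",          # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="sqoop_extract",
        bash_command=(
            "sqoop import --connect jdbc:netezza://nz-host:5480/SALESDB "
            "--table ORDERS --target-dir /data/raw/orders --as-textfile"
        ),
    )
    load = BashOperator(
        task_id="spark_load",
        bash_command="spark-submit /opt/etl/load_to_adls.py",  # hypothetical script
    )
    extract >> load  # run the extract before the load
```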
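The post-load validation scripts could start with a simple row-count comparison like this sketch, assuming the pyodbc and snowflake-connector-python packages; the DSN, account, credentials, and table names are placeholders.

```python
import os

import pyodbc
import snowflake.connector

def sqlserver_count(table: str) -> int:
    # "sqlserver_dsn" is a placeholder ODBC data source name.
    conn = pyodbc.connect("DSN=sqlserver_dsn")
    try:
        # Table names here are trusted constants, not user input.
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()

def snowflake_count(table: str) -> int:
    # Account and user are placeholders; the password comes from the environment.
    with snowflake.connector.connect(
        account="my_account", user="etl_user",
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ETL_WH", database="SALES", schema="PUBLIC",
    ) as conn:
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

src, dst = sqlserver_count("dbo.ORDERS"), snowflake_count("ORDERS")
if src != dst:
    raise ValueError(f"Row-count mismatch: source={src}, target={dst}")
print(f"Validation passed: {src} rows in both systems")
```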
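Tuning long-running Spark jobs often starts with shuffle width, executor sizing, and adaptive execution; the settings below are illustrative values, not recommendations from the source.

```python
from pyspark.sql import SparkSession

# Illustrative tuning knobs for a long-running job; values must be sized
# against the actual cluster and data volume.
spark = (SparkSession.builder
         .appName("tuned-etl-job")
         .config("spark.sql.shuffle.partitions", "400")  # widen shuffles for large joins
         .config("spark.executor.memory", "8g")          # reduce spill on wide aggregations
         .config("spark.executor.cores", "4")
         .config("spark.sql.adaptive.enabled", "true")   # let AQE coalesce small partitions
         .getOrCreate())
```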
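The incremental Azure-to-Snowflake jobs could follow a high-watermark pattern like this sketch; it assumes the spark-snowflake connector is on the classpath, and the ADLS path, watermark column, and Snowflake connection options are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-to-snowflake").getOrCreate()

# High watermark: normally read from a control table, hard-coded here.
last_loaded = "2023-01-01 00:00:00"

# Only rows changed since the last run; path and column are hypothetical.
df = (spark.read.parquet("abfss://raw@mystorage.dfs.core.windows.net/orders")
      .where(F.col("updated_at") > F.lit(last_loaded)))

sf_options = {  # placeholder connection options
    "sfURL": "my_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "...",  # placeholder; use a secret store in practice
    "sfDatabase": "SALES",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

(df.write.format("net.snowflake.spark.snowflake")
   .options(**sf_options)
   .option("dbtable", "ORDERS")
   .mode("append")     # append only the new increment
   .save())
```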
This position requires a minimum of a bachelor's degree in Computer Science, Computer Information Systems, or Information Technology, or a combination of education and experience equating to the U.S. equivalent of a bachelor's degree in one of the aforementioned fields.