Careers


Big Data Engineer

Job Description

· Use analytical tools including Hive and Spark on the Hortonworks distribution.

· Import large datasets from SQL Server and Netezza into HDFS using Sqoop.

· Perform analytics in Hive using HiveQL queries, views, partitioning, and bucketing.

· Convert Hive/SQL queries into Spark transformations using Spark RDDs with PySpark and Scala (an illustrative sketch follows this list).

· Develop Sqoop jobs to import data in text format from SQL Server and Netezza, and create Hive tables on top of the imported files.

· Develop Spark Core and Spark SQL scripts in Scala for faster data processing.

· Transform data with Spark applications for analytics consumption.

· Convert Python scripts into Scala.

· Troubleshoot errors in the HBase shell/API and Hive.

· Work with Spark components such as Spark SQL, RDDs, DataFrames, and Datasets.

· Create an ETL pipeline from the on-premises cluster to Azure and automate it with Airflow (a sketch follows this list).

· Develop Airflow DAGs to automate the ETL process.

· Create an Airflow plugin to log the ETL process running on the on-premises cluster.

· Develop Python scripts to validate data after it is loaded into SQL Server and Snowflake (a sketch follows this list).

· Tune long-running Spark jobs.

· Migrate data from Netezza to Azure Data Lake using Sqoop and Spark jobs.

· Create incremental Spark jobs to move data from Azure to Snowflake (a sketch follows this list).
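To illustrate the Hive-to-Spark conversion work above, here is a minimal PySpark sketch that rewrites a HiveQL aggregate as DataFrame transformations. The table "orders" and its columns are hypothetical.

# A minimal sketch, assuming a Hive table named "orders" with
# order_date, customer_id, and amount columns.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent HiveQL:
#   SELECT customer_id, SUM(amount) AS total
#   FROM orders WHERE order_date >= '2020-01-01'
#   GROUP BY customer_id;
totals = (spark.table("orders")
          .filter(F.col("order_date") >= "2020-01-01")
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total")))
totals.show()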
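For the Airflow automation items, a minimal sketch of a DAG that chains a Sqoop import from Netezza into HDFS with a distcp copy to Azure Data Lake. Airflow 2.x imports are assumed; the connection string, paths, schedule, and storage account are illustrative assumptions, not the actual pipeline.

# A minimal sketch, assuming Airflow 2.x; hosts, paths, and the
# storage account below are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="onprem_to_azure_etl",
    start_date=datetime(2020, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Import the source table from Netezza into HDFS as text files.
    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import --connect jdbc:netezza://nz-host:5480/sales "
            "--table ORDERS --as-textfile --target-dir /staging/orders"
        ),
    )
    # Copy the staged files from HDFS to Azure Data Lake Storage.
    copy_to_adls = BashOperator(
        task_id="copy_to_adls",
        bash_command=(
            "hadoop distcp /staging/orders "
            "abfs://lake@account.dfs.core.windows.net/orders"
        ),
    )
    sqoop_import >> copy_to_adls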
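For the post-load validation item, a minimal sketch that compares row counts between SQL Server and Snowflake. The drivers (pyodbc, snowflake-connector-python), credentials, and table name are assumptions; a real check would also compare checksums or sampled values.

# A minimal sketch, assuming pyodbc and snowflake-connector-python
# are installed; all connection details are hypothetical.
import pyodbc
import snowflake.connector

TABLE = "ORDERS"

mssql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql-host;"
    "DATABASE=sales;UID=etl_user;PWD=secret"
)
src_count = mssql.cursor().execute(
    f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]

sf = snowflake.connector.connect(
    account="my_account", user="etl_user", password="secret",
    warehouse="ETL_WH", database="SALES", schema="PUBLIC",
)
dst_count = sf.cursor().execute(
    f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]

# Fail loudly if the load dropped or duplicated rows.
if src_count != dst_count:
    raise ValueError(
        f"{TABLE}: source has {src_count} rows, target has {dst_count}")
print(f"{TABLE}: row counts match ({src_count})")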
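For the incremental Azure-to-Snowflake item, a minimal sketch that filters new rows by a watermark column and appends them through the Spark Snowflake connector. The Data Lake path, the updated_at watermark column, and the connection options are illustrative assumptions.

# A minimal sketch, assuming the Spark Snowflake connector is on the
# classpath and the source data carries an updated_at timestamp.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-to-snowflake").getOrCreate()

# In practice the watermark would be read from a control table.
last_watermark = "2020-06-01 00:00:00"

# Read only the rows that arrived since the last successful load.
df = (spark.read.parquet(
          "abfs://lake@account.dfs.core.windows.net/orders")
      .filter(F.col("updated_at") > last_watermark))

sf_options = {
    "sfURL": "my_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "secret",
    "sfDatabase": "SALES",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

# Append the increment to the Snowflake target table.
(df.write.format("net.snowflake.spark.snowflake")
   .options(**sf_options)
   .option("dbtable", "ORDERS")
   .mode("append")
   .save())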


Qualification:

 

This position requires a minimum of a bachelor's degree in Computer Science, Computer Information Systems, or Information Technology, or a combination of education and experience equating to the U.S. equivalent of a bachelor's degree in one of those fields.



Job Tags

Big Data Engineer, SQL, RDD, DataFrames, Datasets, Python, ETL
 Job Location : Princeton
 Job Type : Full Time
 Job Creation Date : 06/09/20 1:16 PM