We have an opening for a Data Warehousing, Big Data Technical Specialist / TS4 with the following details:
Detailed Day-To-Day Job Duties to be performed:
• Participate in team activities, design discussions, stand-up meetings and planning reviews with the team.
• Perform data analysis, data profiling, data quality and data ingestion in various layers using big data/Hadoop/Hive/Impala queries, PySpark programs and UNIX shell scripts.
• Follow the organization's coding standards document; create mappings, sessions and workflows as per the mapping specification document.
• Perform gap and impact analysis of ETL and IOP jobs for new requirements and enhancements.
• Create jobs in Hadoop using Sqoop, PySpark and StreamSets to meet business user needs.
• Create mock-up data, perform unit testing and capture the result sets for the jobs developed in lower environments.
• Update the production support run book and Control-M schedule document as per each production release.
• Create and update design documents, and provide detailed descriptions of workflows after every production release.
• Continuously monitor production data loads, fix issues, record them in the tracker document and identify performance issues.
• Tune the performance of long-running ETL/ELT jobs by creating partitions, enabling full load and applying other standard approaches.
• Perform quality assurance checks and reconciliation after data loads, and communicate with the vendor to receive corrected data.
• Participate in ETL/ELT code review and design re-usable frameworks.
• Create Remedy/ServiceNow tickets to fix production issues, and create Support Requests to deploy Database, Hadoop, Hive, Impala, UNIX, ETL/ELT and SAS code to the UAT environment.
• Create Remedy/ServiceNow tickets and/or incidents to trigger Control-M jobs for FTP and ETL/ELT jobs on an ad hoc, daily, weekly, monthly and quarterly basis as needed.
• Model and create STAGE / ODS / Data warehouse Hive and Impala tables as and when needed.
• Create change requests, work plans, test results and BCAB checklist documents for code deployment to the production environment, and perform code validation post deployment.
• Work with Hadoop Admin, ETL and SAS admin teams for code deployments and health checks.
• Create re-usable UNIX shell scripts for file archival, file validations and Hadoop workflow looping.
• Create a re-usable framework for Audit Balance Control to capture reconciliation, mapping parameters and variables, serving as a single point of reference for workflows.
• Create PySpark programs to ingest historical and incremental data.
• Create Sqoop scripts to ingest historical data from the EDW Oracle database to Hadoop IOP, and create Hive table and Impala view creation scripts for dimension tables.
• Participate in meetings to continuously upgrade the functional and technical expertise.
REQUIRED Skill Sets:
• 8+ years of experience with Big Data, Hadoop on Data Warehousing or Data Integration projects.
• Analysis, design, development, support and enhancement of ETL/ELT in a data warehouse environment with Cloudera Big Data technologies (with a minimum of 8 years’ experience in Hadoop, MapReduce, Sqoop, PySpark, Spark, HDFS, Hive, Impala, StreamSets, Kudu, Oozie, Hue, Kafka, YARN, Python, Flume, ZooKeeper, Sentry, Cloudera Navigator) along with Oracle SQL/PL-SQL, Unix commands and shell scripting.
• Strong development experience (minimum of 8 years) in creating Sqoop scripts, PySpark programs, HDFS commands, HDFS file formats (Parquet, Avro, ORC, etc.), StreamSets pipeline creation, job scheduling, Hive/Impala queries, Unix commands and shell scripting.
• Writing Hadoop/Hive/Impala scripts (minimum of 8 years’ experience) for gathering stats on tables after data loads.
• Strong SQL experience (Oracle and Hadoop (Hive/Impala etc.)).
• Writing complex SQL queries and performing tuning based on Hadoop/Hive/Impala explain plan results.
• Proven ability to write high quality code.
• Experience building data sets and familiarity with PHI and PII data.
• Expertise implementing complex ETL/ELT logic.
• Develop and enforce a strong reconciliation process.
• Accountable for ETL/ELT design documentation.
• Good knowledge of Big Data, Hadoop, Hive, Impala database, data security and dimensional model design.
• Basic knowledge of UNIX/LINUX shell scripting.
• Utilize ETL/ELT standards and practices towards establishing and following a centralized metadata repository.
• Good experience in working with Visio, Excel, PowerPoint, Word, etc.
• Effective communication, presentation, & organizational skills.
• Familiar with project management methodologies such as Waterfall and Agile.
• Ability to establish priorities & follow through on projects, paying close attention to detail with minimal supervision.
• Required Education: BS/BA degree or combination of education & experience
DESIRED Skill Sets:
• Demonstrate effective leadership, analytical and problem-solving skills
• Excellent written and oral communication skills with technical and business teams.
• Ability to work independently, as well as part of a team
• Stay abreast of current technologies in area of IT assigned
• Establish facts and draw valid conclusions
• Recognize patterns and opportunities for improvement throughout the entire organization
• Ability to discern critical from minor problems and innovate new solutions
Please note: This position has a remote option, with two exceptions: 1. Candidates are required to be present on-site to collect equipment at onboarding, and 2. Candidates are required to be available on-site upon agency request.
Due to the above-mentioned requirements, local candidates are preferred.
If you are looking for an opportunity like this and meet or exceed the skill sets and experience required, please apply to discuss this opportunity in more detail!