Hadoop Training






 

Course Objective:

By the end of the course, you will:

  • Master the concepts of HDFS and MapReduce framework
  • Understand Hadoop 2.x Architecture
  • Set up a Hadoop cluster and write complex MapReduce programs
  • Learn data loading techniques using Sqoop and Flume
  • Perform data analytics using Pig, Hive and YARN
  • Implement HBase and MapReduce integration
  • Implement Advanced Usage and Indexing
  • Implement best practices for Hadoop development
  • Work on a real-life project

 

Who should go for Hadoop:

Hadoop has become a cornerstone technology for business technology professionals. To stay ahead in the game, professionals in this field need to know it.

Introduction to Hadoop and Architecture

Hadoop 1.0 Architecture

  • Introduction to Hadoop & Big Data
  • Hadoop Evolution
  • Hadoop Architecture
  • Networking Concepts
  • Use cases – where Hadoop fits in

Hadoop 2.0 Architecture

  • Limitations of Hadoop 1.0 Architecture
  • Features of Hadoop 2.0 Architecture
  • HDFS Federation
  • High Availability of Name Node
  • YARN – Yet Another Resource Negotiator
  • Non-MR applications on top of YARN

Quiz on Architecture Concepts

Prerequisites for Hadoop Developer/Data Analysts/Admins

Linux

  • Introduction to Linux
  • Commands & Shell Scripts
  • Vi & Vim editor features

Case Study to develop a Shell Script

Java

  • Introduction to OOP & Java
  • Discussion on Object, Class & Methods

 

 

 

Case Study to develop a Java Code with the concepts learnt

Python

  • Introduction to concepts of Python
  • How different is Python from other Programming Languages
  • Complex data Types in Python (Tuple, List, Dictionary)
  • Inbuilt Modules available in Python
  • File handling functions using Python

Case Study to develop a Python Code with the concepts learnt

Cluster Installation

Hadoop Cluster Installation

  • Types of Hadoop Cluster
  • Installing Pseudo Mode Cluster
  • Walkthrough of built-in scripts, directories, configuration files and ports
  • Discussion on Real Time Cluster Size

Detailed documentation on Installation Procedure

Distributed File System – HDFS

HDFS Commands

  • Introduction to HDFS Commands
  • Discussion on scenarios where specific commands are applicable
  • Introduction to Advanced HDFS Commands including fine tuning of cluster
  • Features & Concepts of Core Java for developing MR jobs
  • Familiarizing Eclipse

Detailed documentation on all the HDFS Commands

Custom Script building using HDFS & Unix commands

Quiz on HDFS Commands

 

 

Map Reduce – MR

Map Reduce using Java

  • Introduction to Map Reduce Architecture
  • Detailed discussion on different phases of MR

➢ Mapper ➢ Reducer ➢ Splitting ➢ Sorting ➢ Shuffling ➢ Combiner ➢ Spilling ➢ Partitioning ➢ Merging

  • Developing Map Reduce Application from Scratch using different use cases
  • Discussion of difference between Old MR API & New MR API
  • Introduction to different file formats and their internal features (Sequential, Binary, etc.)
  • Developing MR code for Image Analytics
  • Case Study on Map Reduce (Customer Sentiment Analyser)
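
To make the phases listed above concrete, here is a minimal pure-Python sketch of a word count that walks through mapping, combining, partitioning, shuffling, sorting and reducing in a single process. It is only an illustration and does not use Hadoop itself; the function names, the byte-sum partitioner and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combiner: pre-aggregate a single mapper's output before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Partitioning: choose a reducer for a key (illustrative byte-sum hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce phase: add up all counts collected for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Run the phases in order over a list of input splits (lists of lines)."""
    # One bucket per reducer; each bucket groups values by key (the shuffle).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            partitions[partitioner(key, num_reducers)][key].append(value)

    result = {}
    for partition in partitions:
        # Sorting: each reducer sees its keys in sorted order.
        for key in sorted(partition):
            word, total = reducer(key, partition[key])
            result[word] = total
    return result


# Hypothetical input: two splits, as if the file had been divided in two.
splits = [["big data big cluster"], ["data node", "big data"]]
counts = run_job(splits)
```

In a real cluster each split is handled by a separate map task and the shuffle moves data across the network; the sketch keeps everything in memory so the order of the phases is easy to follow.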

Map Reduce using Python – Streaming

  • Developing Map Reduce Application using Python
  • Discussion of different features available in Streaming
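
As a rough illustration of the Streaming model, the sketch below shows a word-count mapper and reducer written in Python. Hadoop Streaming feeds input lines to the script on standard input and reads tab-separated key/value lines from standard output, and the reducer receives its input sorted by key. The function names and the single-script "map"/"reduce" argument are assumptions made for this example, not part of the course material.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: turn each input line into 'word<TAB>1' records."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Relies on the input already being sorted by key, as Streaming provides.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(argv, stdin, stdout):
    """Run as mapper or reducer depending on the first argument."""
    step = map_lines if argv[1] == "map" else reduce_lines
    for record in step(stdin):
        stdout.write(record + "\n")


# Only run when a role ("map" or "reduce") is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv, sys.stdin, sys.stdout)
```

Assuming the script is saved under a name such as wordcount.py (a hypothetical file name), it can be checked locally by piping a text file through the map step, a plain sort, and the reduce step before it is handed to Hadoop Streaming through the -mapper and -reducer options.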

Quiz on Map Reduce

Hadoop Eco System Components

Hive (Data Warehouse on top of HDFS)

  • Introduction to Hive Architecture
  • Configuring Hive Metadata store in different ways
  • Basic Queries in Hive (DDL,DML)
  • Advanced features of Hive

Case Study on Map Reduce Streaming (Analytics on Temperature Datasets)

 

 

Quiz on Hive

PIG (Data Flow Language)

  • Introduction to Pig Latin
  • Basic Commands in Pig
  • Explanation of advanced features of Pig with real-time scenarios
  • Different ways of using PigStorage
  • Dealing with Unstructured data
  • Developing Regular Expressions
  • Developing User Defined Functions (UDFs) in Java & Python

Quiz on Pig

SQOOP (Import – Export utility)

  • Introduction to Sqoop
  • Basic Sqoop Commands
  • Advanced Import Features
  • Advanced Export Features

➢ Upsert calls ➢ EVAL ➢ Compressed formats

➢ Partitioning ➢ Bucketing ➢ Sampling ➢ Multi Table load Queries ➢ Serialize & De Serialize

  • Dealing with different formats of data (Flat file, JSON, CSV, etc.)
  • Query optimization using Hive
  • Developing User Defined Functions (UDFs) in Java & Python

Case Study (Analytics on Telecom Datasets)

Case Study (Analytics on Books Datasets)

Case Study (Analytics on Telecom Datasets)

 

 

Quiz on Sqoop

HBASE (Versioned Database)

  • Introduction to HBASE & NOSQL
  • Basic difference in Row Oriented and Column Oriented storage
  • Basic HBASE Commands
  • Advanced HBASE Features

➢ Versions ➢ Compression Techniques ➢ Bloom Filters ➢ Sequential Scans

  • Bulk Loads to HBASE Features

Quiz on HBASE

Flume

  • Flume Architecture
  • Configuring Flume Components

➢ Source ➢ Sink ➢ Channel ➢ Agents

  • Building Flume Config files for different scenarios

➢ Basic Config File building ➢ Config file for connecting to different File Servers ➢ Config file for connecting to Web Servers

Quiz on Flume

Scheduler (OOZIE & Autosys)

  • Introduction to Oozie
  • Introduction to Autosys
  • Using Schedulers for Batch Processing

Case Study on HBASE

Quiz on OOZIE

Finally, this series of practical sessions ends with a quiz on the entire course.

 

  • Can I get recorded sessions of a live class?

    Yes. Once your batch starts, every class is recorded and the recordings are made available to you.

  • How will I execute the Practicals?

    We will help you set up the required environment for the practicals.

  • I have a Windows system. Can that be used to work on the assignments?

    Yes, you can use Windows to work on the assignments. Our 24x7 support team will guide you in getting the setup ready.

Vidhyalive certified ‘Hadoop Expert’ based on your project performance, reviewed by our expert panel


Online Classroom

Any time
Any Day
Any time
$500

Course Features

Online Classes: 40 Hrs

40 live classes of 1 hr each by industry practitioners

Assignments: 20 Hrs

Personal assistance/installation guides for setting up the required environment for Assignments / Projects

Project: 15 Hrs

Live project based on any of the selected use cases, involving Big Data Analytics using MapReduce, Pig, Hive, Flume and Sqoop

Lifetime Access: Lifetime

Lifetime access to the learning management system including Class recordings, presentations, sample code and projects

24 x 7 Support

Lifetime access to the support team (available 24/7) to resolve queries during and after the course

Get Certified

Vidhyalive certified ‘Hadoop Expert’ based on your project performance, reviewed by our expert panel