Octus offers Big Data Hadoop training in Noida. Big Data is a term for large volumes of data, both structured and unstructured, that flow into organizations in every industry day to day. Hadoop has emerged as one of the most popular open-source MapReduce frameworks for processing large volumes of data, and industry leaders have adopted the technology.
Introduction to Big Data and Hadoop
- What is Big Data? Different Sources of Big Data
- 3V’s of Big Data
- What are the challenges associated with big data?
- What is Hadoop and its Architecture with core components?
- Hadoop features and the complete Hadoop big data cycle: capture, storage, analysis and visualization
- History of Hadoop
- Different Platforms for Hadoop
- Batch Vs Real-Time Processing (OLTP and OLAP)
- Big data life cycle (storage, processing and visualization)
Hadoop Core Components: HDFS & MapReduce
HDFS (Hadoop Distributed File System)
- Introduction to HDFS
- Features of HDFS
- Hadoop Cluster Environment
- HDFS internal mechanism for storing and managing datasets in a distributed and scalable manner
- HDFS Storage Aspects
- HDFS blocks
- Why is the HDFS block size so large?
- Design Principles of block size
- HDFS Architecture – 5 Daemons of Hadoop
- Name Node and its functionality
- Data Node and its functionality
- Secondary Name Node and its functionality
- Task Tracker and its functionality
- Job Tracker and its functionality
- Data Replication in Hadoop
- Data Storage in Data Nodes
- Failover Mechanism in Hadoop: Replication
- Replication Configuration
- Custom Replication
- Design Constraints with Replication Factor
- Different Modes of Accessing HDFS
- Command Line Interface (CLI) and HDFS Commands
- Metadata, FS image, Edit log, Secondary Name Node
- Start and Stop Nodes
- Data Locality, Data Integrity and Rack Awareness
- Nodes Heartbeat Mechanism
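
As a rough illustration of the block-size and replication topics listed above, the following Python sketch estimates how many blocks and block replicas a file would occupy in HDFS. It is not part of the course material. The function name `hdfs_footprint` is hypothetical, and the 128 MB block size and replication factor of 3 are assumptions taken from the usual Hadoop 2.x+ defaults; a real cluster may be configured differently.

```python
import math

BLOCK_SIZE_MB = 128      # assumed block size (the usual dfs.blocksize default)
REPLICATION_FACTOR = 3   # assumed replication factor (the usual dfs.replication default)

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the Data Nodes.
    replicas = blocks * replication
    # A partial last block only uses as much disk as the data it holds,
    # so raw storage is file size times replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)  # prints: 8 24 3000
```

Under these assumptions a 1000 MB file needs 8 blocks, so the Name Node tracks 24 block replicas in total.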
MAP REDUCE (Processing)
- Introduction to Map Reduce
- Map Reduce architecture
- Functional Programming Basics.
- Map and Reduce Basics
- How Map Reduce Works
- Anatomy of a Map Reduce Job Run
- Job Completion, Failures
- Shuffling and Sorting
- Input Splits, Blocks, Record reader, Partitioner and Combiner
- Optimization Techniques: Speculative Execution
- YARN
- Sequence Files and Map Files
- Enabling Compression Codecs
- Map-side Join with Distributed Cache
- Types of I/O Formats: MultipleOutputs, NLineInputFormat
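
The map, shuffle-and-sort, and reduce steps named in the topics above can be sketched in a few lines of plain Python. This is a single-process simulation for illustration only: it does not use the Hadoop Java API, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_and_sort(pairs):
    """Shuffle: group values by key, then return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in order over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle_and_sort(mapped))

counts = word_count(["big data hadoop", "hadoop stores big data"])
print(counts)  # prints: {'big': 2, 'data': 2, 'hadoop': 2, 'stores': 1}
```

In a real cluster the mappers and reducers run as separate tasks on different nodes, and the framework performs the shuffle over the network; here everything happens in memory to keep the data flow visible.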
Map/Reduce Programming – Java Programming
- Hands-on "Word Count" in Map/Reduce in standalone and pseudo-distributed mode
- Different Input and Output File Formats Supported by MR
- Practical discussion of how to write code for structured and unstructured data sets
- Input Format API discussion
- Input Split API discussion
Apache HIVE
- Introduction
- Hive Architecture
- OLTP Vs OLAP
- Hive Query Language
- Differences between HQL and SQL
- Hive Built-in Functions
- Loading Data from Local files to Hive tables
- Loading Data from HDFS files to Hive tables
- Table Types
- Internal (Managed) Tables
- External Tables
- Partitioned Tables
- Non-Partitioned Tables
- Dynamic Partitions in Hive
- Concept of Bucketing
- Hive Views
- Hive Unions
- Hive Joins
- Array Operations in Hive
- How to load semi-structured data sets into Hive using the Array, Map and Struct data types
Apache PIG
- Introduction to Apache PIG
- Introduction to PIG Data Flow Engine
- Data Types in PIG
- Basic PIG programming
- Modes of Execution in PIG
- Local Mode and MapReduce Mode
- Grunt Shell
- Script
- Operators/Transformations in PIG
- Word Count Example in PIG
- The difference between MapReduce and PIG
- Using PIG on structured and unstructured data
- Joins in PIG
- Grouping, Co-grouping and Union in PIG
Apache SQOOP
- Introduction to SQOOP
- SQOOP Import
- SQOOP Export
- Importing Data from RDBMS to HDFS
- Importing Data from RDBMS to HIVE
- Exporting Data from HBase to RDBMS
- Exporting Data from HIVE to RDBMS
- Exporting Data from HDFS to RDBMS
- Transformation while importing
HBASE (Column-Based NoSQL Database)
- Introduction to Bigtable
- What is NoSQL and a Columnar Store Database?
- HBase Introduction
- HBase use cases
- HBase Basics
- Column Families
- HBase CRUD operations and the Java approach
Apache OOZIE and Kafka
- Introduction to OOZIE
- Use of OOZIE
- Where to use OOZIE?
- Introduction to the web UI "HUE"
- Introduction to the messaging service "Kafka"