👉🏻 Configure Hadoop Using Ansible Playbook
Hello folks 🙋🏻♂️,
I hope you all are doing well… I am back with another interesting article. In this article, I am going to write a playbook to automate our Hadoop cluster setup. Sounds cool, huh? Let's get our hands dirty ✍️
🔰 Prerequisites :
- Three Virtual Machines (Controller node, master node, slave node)
- Hadoop and JDK software (rpm files) on the controller node's local filesystem
- Good network connectivity
🔰 What is Ansible?
Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration. Ansible was written by Michael DeHaan and acquired by Red Hat in 2015. Ansible is agentless, temporarily connecting remotely via SSH or Windows Remote Management (allowing remote PowerShell execution) to do its tasks.
So basically, Ansible is an automation tool that makes configuration management easy and time-efficient.
🔰 What is an Ansible playbook?
Playbooks are the files where Ansible code is written. Playbooks are written in YAML format (YAML stands for "YAML Ain't Markup Language"). Playbooks are one of the core features of Ansible and tell Ansible what to execute. They are like a to-do list for Ansible that contains a list of tasks.
Playbooks contain the steps which the user wants to execute on a particular machine. Playbooks are run sequentially. Playbooks are the building blocks for all the use cases of Ansible.
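For example, a minimal playbook is just a YAML list of plays, each naming the hosts to target and the tasks to run. The sketch below only pings a group of nodes; the group name "webservers" is purely illustrative and not part of our Hadoop setup:

- hosts: webservers
  tasks:
    - name: check that the managed nodes are reachable
      ping: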
🔰 Steps to configure Hadoop using an Ansible playbook
#1 — SOFTWARE CONFIGURATION ON TARGET NODES
- STEP I : UPDATE INVENTORY WITH MASTER IP & SLAVE IP
- STEP II : COPY & INSTALL SOFTWARE IN BOTH THE MASTER & SLAVE
- COMMANDS
# ansible-playbook hadoop.yml : to run ansible playbook
# java -version : to check installed java version
# hadoop version: to check installed hadoop version
As you can see, after running the playbook the Hadoop & JDK software are successfully installed on the target nodes.
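For reference, the inventory update in Step I simply adds the master and slave IPs under the group names that the playbook targets. A typical entry might look like this (the IPs below are placeholders, not the actual addresses used in the demo):

[master]
192.168.1.10

[slave]
192.168.1.11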
#2 — MASTER NODE CONFIGURATION USING PLAYBOOK
- STEP I : CREATE DIRECTORY ON THE MASTER NODE
To create a directory on the target nodes, the file module is used.
Directory has been successfully created…
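In the playbook, this step boils down to a single file-module task (the same task appears in the full playbook at the end of the article):

- file:
    state: directory
    path: "/NN"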
- STEP II : COPY THE MASTER CONFIGURATION FILES FROM THE MASTER NODE TO THE CONTROLLER NODE, EDIT THEM AS FOLLOWS & START THE SERVICES WITH THE HELP OF THE SHELL MODULE
After editing the configuration files, save them in the current workspace. Now we will write code to copy these files into the master node's Hadoop configuration directory so that the master gets configured.
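To give an idea of what these edited files look like, here is an illustrative sketch. The values below (the /NN directory, port 9001, and the placeholder master IP 192.168.1.10 reused from the inventory example above) are assumptions, not necessarily the exact contents used in this demo.

hdfs-site-master.xml :

<configuration>
  <property>
    <!-- directory where the NameNode stores its metadata -->
    <name>dfs.name.dir</name>
    <value>/NN</value>
  </property>
</configuration>

core-site.xml :

<configuration>
  <property>
    <!-- HDFS address the slaves use to reach the NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>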
The shell module helps to run commands directly on the target nodes…
#3 — SLAVE NODE CONFIGURATION USING PLAYBOOK
- To configure the slave we just have to follow the same steps as for the master, so I have written the code directly below with just a few changes.
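The only real differences are the directory (/DN instead of /NN) and the hdfs-site file. An illustrative hdfs-site-slave.xml could look like this (again a sketch, not the exact file from the demo):

<configuration>
  <property>
    <!-- directory where the DataNode stores HDFS blocks -->
    <name>dfs.data.dir</name>
    <value>/DN</value>
  </property>
</configuration>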
🔰 PLAYBOOK : hadoop.yml
- hosts: all
  tasks:
    # copy the Hadoop & JDK rpm files from the controller node to every target node
    - copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/hadoop-1.rpm"
    - copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/jdk-1.rpm"
    # install JDK first, then Hadoop
    - command: "rpm -i /root/jdk-1.rpm --force"
    - command: "rpm -i /root/hadoop-1.rpm --force"

- hosts: master
  tasks:
    # directory used by the NameNode to store its metadata
    - file:
        state: directory
        path: "/NN"
    - copy:
        src: "/ansible-hadoop-ws/hdfs-site-master.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - copy:
        src: "/ansible-hadoop-ws/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    # format the NameNode and start the service
    - shell: "echo Y | hadoop namenode -format"
    - shell: "hadoop-daemon.sh start namenode"

- hosts: slave
  tasks:
    # directory used by the DataNode to store HDFS blocks
    - file:
        state: directory
        path: "/DN"
    - copy:
        src: "/ansible-hadoop-ws/hdfs-site-slave.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - copy:
        src: "/ansible-hadoop-ws/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    # start the DataNode service
    - shell: "hadoop-daemon.sh start datanode"
🔰 FINAL DEMO