Ansible deployment for AIDMOIt

This project aims to deploy a datalake and its ecosystem using Ansible and/or Vagrant.

Currently, this project can:

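  1. Deploy a mono-node HDFS on:
    • Your own computer, using a virtual machine with Vagrant and Ansible provisioning
    • A server, using Ansible
  2. Deploy an HDFS cluster on:
    • Your own computer, using multiple virtual machines with Vagrant and Ansible provisioning
    • Servers (work in progress)
  3. Deploy GeoNetwork as a metadata management system on:
    • Your own computer, using a virtual machine with Vagrant and Ansible provisioning
    • A server (work in progress)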

  • Prod environment: deploy a mono-node HDFS on a server using Ansible
  • Sandbox environment: deploy a mono-node HDFS on a VM using Vagrant and Ansible

Prerequisites

  • Ansible
    • Install Ansible on your own computer (for Ubuntu or Debian):
     apt-get install ansible
    • If you connect to your remote machine with a password instead of an SSH key (using an SSH key is recommended), you also need to install the sshpass package:
     apt-get install sshpass
    • Configure Ansible so that becoming an unprivileged user does not fail with the error "Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user":
     sed -i 's/.*pipelining.*/pipelining = True/' /etc/ansible/ansible.cfg
     sed -i 's/.*allow_world_readable_tmpfiles.*/allow_world_readable_tmpfiles = True/' /etc/ansible/ansible.cfg
  • Vagrant with VirtualBox
    • Install VirtualBox on your own computer (for Ubuntu or Debian):
     apt-get install virtualbox
    • Install Vagrant on your own computer (for Ubuntu or Debian):
     apt-get install vagrant
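
Once the packages above are installed, a quick check can confirm that the commands are on your PATH. The helper below is a minimal sketch and not part of this project; the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): print the name of every
# command given as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, running missing_tools ansible sshpass vagrant VBoxManage prints nothing when all four commands are available (VBoxManage is the command-line tool shipped with VirtualBox).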

Ansible

Description of Ansible

Wikipedia definition:

Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.

Ansible is a good tool to deploy and maintain IT systems. Based on YAML configuration files, Ansible makes it easy to describe your configuration and share it with your collaborators. You can then deploy it to your infrastructure; the only requirement is SSH access to your servers.

Getting started

An Ansible playbook is a list of system instructions to be sent to a machine. That is why you only need two things:

  • Ansible installed on your own computer
  • A remote machine (physical, VMware, VirtualBox, Docker, LXC, ...) with an SSH server running

Vagrant & Ansible

Description of Vagrant

Wikipedia definition:

Vagrant is an open-source software product for building and maintaining portable virtual software development environments, e.g. for VirtualBox, KVM, Hyper-V, Docker containers, VMware, and AWS. It tries to simplify the software configuration management of virtualizations in order to increase development productivity. Vagrant is written in the Ruby language, but its ecosystem supports development in a few languages.

Vagrant manages your virtual machines (VMs) from the command line. The benefits are:

  • Quickly create a VM with a known and controlled environment
  • Restore your VM to a known state
  • Distribute your VMs easily

Vagrant and Ansible can be combined to create, deploy, and maintain your VMs, as we do in this project.

Getting started

You only need VirtualBox and Vagrant installed on your computer. This project creates the VMs that you need for your datalake.

Deploy a mono-node HDFS

Deploy mono-node HDFS on a VM

  1. In CLI: go to the directory that contains the Vagrantfile
  2. In CLI: start your VM:
vagrant up

Deploy mono-node HDFS on a server

First, set your server's IP address in the inventory file.

Then run the script ansible-launch.sh:

/bin/bash ansible-launch.sh
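
Before launching the script, it can be useful to check that the inventory file actually lists a host. The following is a rough sketch only: it assumes an INI-style inventory, the function name inventory_has_hosts is hypothetical, and the check is coarse (any line that is not blank, a comment, or a [group] header counts as a host).

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): succeed only if the given
# INI-style inventory file exists and contains at least one line that is
# not blank, a comment (# or ;), or a [group] header.
inventory_has_hosts() {
    inventory="$1"
    [ -f "$inventory" ] || return 1
    grep -Ev '^[[:space:]]*($|#|;|\[)' "$inventory" | grep -q .
}
```

Assuming the inventory file is named inventory (an assumption, adjust to your file), a possible use is:

     inventory_has_hosts inventory && ansible all -i inventory -m ping

The ansible ping module only checks that Ansible can connect to each host and run a module there; it changes nothing on the server.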

Deploy a HDFS cluster

Deploy cluster HDFS with multiple VMs

  1. Set your nodes' IP addresses in the Vagrantfile. In the same file, edit your network settings, in particular the DNS nameservers: if your host machine is on a corporate network, your network administrator may allow only your company's DNS servers. In that case, set them in the settings section and in the provisioning shell script of the Vagrantfile. If there are no such DNS rules, assign "false" to the COMPANY_NETWORK_DNS_RULE variable.
  2. Declare those IP addresses for the Ansible provisioning in vars. If you did not change the IP settings, skip this step.
  3. Configure your own computer to reach your nodes by hostname (needed to access the Hadoop web UI):
    vim /etc/hosts
    If you have not edited the cluster settings since you pulled the repository, you can use these default entries:
    10.0.0.10 namenode
    10.0.0.11 datanode1
    10.0.0.12 datanode2
  4. In CLI: start your VMs from the vagrant/cluster directory:
    vagrant up
  5. Format HDFS:
    • SSH into the namenode
    • In CLI: as the hadoop user, change directory and format HDFS
    sudo su hadoop
    cd /usr/local/hadoop/bin/
    hdfs namenode -format
  6. Start the HDFS daemon on your cluster:
    • SSH into the namenode
    • In CLI: as root, start the hadoop service
    sudo systemctl start hadoop
    • Work in progress: systemd reports that something went wrong, but the cluster works anyway.
  7. Verify your cluster is up:
    • On your own computer, open a web browser
    • Go to [IP-of-your-namenode]:9870 (with the default settings: http://10.0.0.10:9870)
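
If the web UI cannot be reached by hostname, the hosts entries from step 3 are a likely cause. The helper below is a minimal sketch, not part of this project (the name missing_hosts is hypothetical): it prints each hostname that has no active entry in a given hosts file.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): print every hostname
# (arguments 2..n) that has no uncommented entry in the hosts file (argument 1).
missing_hosts() {
    hosts_file="$1"
    shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[ \t]*#/ { next }                # skip comment lines
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break       # ignore trailing comments
                    if ($i == n) found = 1     # hostname listed on this line
                }
            }
            END { exit !found }
        ' "$hosts_file" || printf '%s\n' "$name"
    done
}
```

With the default settings, missing_hosts /etc/hosts namenode datanode1 datanode2 prints nothing when all three entries are present.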

Deploy cluster HDFS on servers

work in progress

Deploy GeoNetwork

With Vagrant and ansible

  1. Set your nodes' IP addresses in the Vagrantfile. In the same file, edit your network settings, in particular the DNS nameservers: if your host machine is on a corporate network, your network administrator may allow only your company's DNS servers. In that case, set them in the settings section and in the provisioning shell script of the Vagrantfile. If there are no such DNS rules, assign "false" to the COMPANY_NETWORK_DNS_RULE variable.
  2. In CLI: start a GeoNetwork VM from the vagrant/geonetwork directory:
    vagrant up
  3. On your own computer, open a web browser and go to http://10.0.0.9:8080/geonetwork (if you kept the default IP address)

On server

work in progress

License

Aidmoit's Collect is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses