Ansible collection to deploy the components of TDP
- hadoop: deploys the Hadoop TDP Release (HDFS + YARN + MapReduce)
- hbase: deploys the HBase TDP Release (HBase Master + HBase RegionServer), Phoenix and Phoenix Query Server
- hive: deploys the Hive TDP Release (Hiveserver2 + Tez)
- knox: deploys the Knox TDP Release (Knox Gateway)
- ranger: deploys the Ranger TDP Release (Ranger Admin + Ranger plugins)
- spark: deploys the Spark TDP Release (Spark Client + Spark History Server)
- zookeeper: deploys the Apache ZooKeeper Release

The best way to get started with TDP and the Ansible roles is to go through the Trunk Data Platform website.
Ansible-core 2.16 does not handle installing a collection from a Git repository with ansible-galaxy. Instead, clone the repository in the correct folder.
For example, set the property collections_path in your ansible.cfg:
[defaults]
collections_path=collections

Then create the folder structure and clone:
mkdir -p collections/ansible_collections/tosit
git clone https://github.com/TOSIT-IO/ansible-tdp-roles collections/ansible_collections/tosit/tdp

The project structure should look like this:
.
├── ansible.cfg
├── collections
│   └── ansible_collections
│       └── tosit
│           └── tdp
│               ├── galaxy.yml
│               ├── README.md
│               └── roles
│                   ├── hadoop
│                   ├── hive
│                   ├── ranger
│                   ├── spark
│                   ├── ...
│                   └── zookeeper
├── roles
└── test.yml

Note that the top-level roles folder does not contain the roles from this collection; it holds any other roles the project has. The collections folder is the one set in ansible.cfg.
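One way to confirm that Ansible can resolve the collection from this layout is a small smoke-test playbook. The sketch below is only an illustration of what test.yml could contain; the play and task names are hypothetical, and it assumes the access_fqdn filter described later in this README can be called by its fully qualified name tosit.tdp.access_fqdn from a playbook outside the collection. The rendered value depends on which variables are defined for the host.

```yaml
# test.yml: hypothetical smoke test, not part of the collection.
# Run from the project root so that ansible.cfg (collections_path) is picked up:
#   ansible-playbook test.yml
- name: Check that the tosit.tdp collection is resolvable
  hosts: localhost
  gather_facts: false
  tasks:
    # If the collection is not found, templating fails because the filter is unknown.
    - name: Render the local host name through the access_fqdn filter
      ansible.builtin.debug:
        msg: "{{ inventory_hostname | tosit.tdp.access_fqdn(hostvars) }}"
```

If the task fails because the filter cannot be found, the usual suspects are the collections_path value in ansible.cfg and the folder names under collections/ansible_collections.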
hdfs_file module: file and directory handling in HDFS
Example usage:
- name: Add directory for spark logs
  delegate_to: "{{ groups['hdfs_nn'][0] }}"
  tosit.tdp.hdfs_file:
    hdfs_conf: "{{ hadoop_conf_dir }}"
    path: "{{ item.path }}"
    state: "{{ item.state | default(omit) }}"
    owner: "{{ item.owner | default(omit) }}"
    group: "{{ item.group | default(omit) }}"
    mode: "{{ item.mode | default(omit) }}"
  become: yes
  become_user: "{{ hdfs_user }}"
  loop:
    - path: /spark3-logs
      state: directory
      owner: "{{ spark_user }}"
      group: "{{ hadoop_group }}"
      mode: '777'

access_fqdn filter plugin: returns access_fqdn, or access_sn + domain, or inventory_hostname + domain (checking if variables exist for the host in this order)
Example usage:
- debug:
    msg: "{{ groups['hdfs_nn'][0] | access_fqdn(hostvars) }}"
- debug:
    msg: "{{ groups['hdfs_jn'] | map('access_fqdn', hostvars) | list }}"

The best way to use the roles from the collection is to import the corresponding file from the collection's playbooks directory into another playbook.
Examples:
- name: Deploy ZooKeeper
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/zookeeper.yml
- name: Deploy Hadoop
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/hadoop.yml
- name: Deploy Hive
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/hive.yml

- Python >= 3.12 with virtual env package
Please follow the guidelines at contributing and respect the code of conduct.