Overview
Deployment of the LB Cluster solution is implemented with Ansible playbooks. Parameters that matter for deployment are defined in configuration files, and the playbook commands are set in the makefile. To start deployment, the user runs the command "make deploy" (see User rights for the required permissions). "make" is a utility that executes commands from the makefile, and "deploy" is the main deployment target: "make" finds the "deploy" target in the makefile and runs the ansible-playbook command defined for it.
Distribution Package
Files included in the distribution package:
File description | Example |
---|---|
FIXEdge distribution archive | FIXEdge-6.6.1-lb-cluster.138-FA-2.25.0.138-Linux-3.10.0-gcc-4.8.5-x86_64.tar.gz |
Configuration Service distribution archive | configuration-service-1.0.0-SNAPSHOT.tar |
REST API key file | rapi.key |
REST API certificate | rapi.crt |
Playbook files | yml configuration files, readme files, etc. |
Files not included in the distribution package: license files, TLS certificates, Consul distribution packages.
Playbook
The playbook and all deployment-dependent files are stored on the deployment workstation, a Linux-based machine that may be a separate machine or one of the hosts configured for cluster components.
The playbook requires:
- Ansible installed on the deployment workstation with all dependent packages (Ansible version 2.5.1 or higher)
- passwordless sudo configured on all hosts
- Python installed on all hosts (version 2.7 or higher)
- SSH access to all hosts
- the "make" utility available on the deployment workstation
The user runs the Ansible playbook with the "make" utility on the deployment workstation.
Example of a makefile:
APB=ansible-playbook
INVENTORY=hosts.yml
PB=deploy.yml
SSH_ARGS=-o StrictHostKeyChecking=no

# Default target
all: deploy

# Target to deploy the services to the configured machines
deploy: ${INVENTORY} ${PB}
	${APB} -i ${INVENTORY} ${PB} --ssh-common-args="${SSH_ARGS}" -vv
User rights
The deployment process requires superuser rights on all hosts for the user that runs the playbook.
Parameters Configuration
Directory structure
Name | Description |
---|---|
PLAYBOOK_ROOT_DIR | Root directory for deployment files. Contains: ansible.cfg (Ansible configuration file); deploy.yml (steps of deployment); hosts.yml (hardware unit configuration); makefile ("make" command targets); Readme.MD (root readme document) |
PLAYBOOK_ROOT_DIR/doc, PLAYBOOK_ROOT_DIR/doc/quickstart | Directories for documentation files. Contain: readme files |
PLAYBOOK_ROOT_DIR/files | directory for distribution packages, license, key files |
PLAYBOOK_ROOT_DIR/group_vars | directory for files containing global variables |
PLAYBOOK_ROOT_DIR/roles | directory for deployment-related files for roles |
Roles
For convenience, all deployment jobs are divided into parts according to the functionality that is deployed. These parts are called "roles". Parameters for roles are defined in PLAYBOOK_ROOT_DIR/roles/<role_name>/defaults/main.yml. These parameters can be overridden by global parameters with the same name. For example, the user can define the "fe_destdir" parameter in the group_vars/all.yml file, and the Ansible playbook will use it instead of the parameter defined in PLAYBOOK_ROOT_DIR/roles/fixedge/defaults/main.yml.
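As a minimal sketch of such an override, the fragment below sets "fe_destdir" globally. The path value is a hypothetical placeholder chosen for illustration, not a value taken from the distribution package.

```yaml
# PLAYBOOK_ROOT_DIR/group_vars/all.yml (fragment)
# Overrides the default defined in roles/fixedge/defaults/main.yml.
# The path below is an illustrative assumption; use the directory that suits your hosts.
fe_destdir: /opt/fixedge
```

This works because Ansible gives role defaults the lowest variable precedence, so a variable with the same name in group_vars/all.yml takes priority.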
Hosts
Hardware unit parameters are stored in hosts.yml
Parameter | Description |
---|---|
ansible_host | IP address of the hardware unit |
ansible_port | port used for SSH access to the hardware unit |
ansible_user | user login used by Ansible to change configuration on the hardware unit |
ansible_ssh_pass | SSH password used by Ansible to change configuration on the hardware unit |
ansible_connection | used for the Docker implementation of Oracle DB in the test configuration |
ansible_connection_args | used for the Docker implementation of Oracle DB in the test configuration |
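The parameters above might be combined in hosts.yml roughly as follows. This is a sketch in Ansible's standard YAML inventory layout, not the file shipped with the playbook: the host names and the address of fixedge_1 are taken from the sample output in Troubleshooting, while the group name, the user name, the password placeholder and the second address are assumptions.

```yaml
# hosts.yml: illustrative inventory sketch.
# Group name "fixedge", user "deploy" and the address of fixedge_2 are hypothetical.
all:
  children:
    fixedge:
      hosts:
        fixedge_1:
          ansible_host: 10.6.223.22
          ansible_port: 22
          ansible_user: deploy
        fixedge_2:
          ansible_host: 10.6.223.23
          ansible_port: 22
          ansible_user: deploy
          # Only needed when key-based SSH login is not set up; placeholder value.
          ansible_ssh_pass: "CHANGE_ME"
```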
Global variables
variable | component | description |
---|---|---|
dba_user | Oracle DB | DB Administrator login. A database user with administrator rights is needed to add the DB user and to create and configure the database for the solution. |
dba_password | Oracle DB | DB Administrator password |
db_user | Oracle DB | DB user login |
db_password | Oracle DB | DB user password |
db_address | Oracle DB | IP address for connection to DB |
db_port | Oracle DB | IP port for connection to DB |
db_sid | Oracle DB | DB SID |
fe_cluster_id | Cluster | unique identity of LB Cluster |
fe_lic_dnl | FIXEdge nodes | path to license file |
fe_archive_dest | FIXEdge nodes | path to FIXEdge distribution archive |
file_cs | Configuration Service | path to Configuration Service distribution archive |
file_consul | Consul | path to Consul Agent distribution archive |
file_ctemplate | Consul | path to Consul template software package (ZIP archive). |
fe_rapi_key | FIXEdge nodes | path to REST-API key file |
fe_rapi.crt | FIXEdge nodes | path to REST-API certificate file |
fe_rapi_port | FIXEdge nodes | IP port for REST-API |
fe_splunk_host | Splunk | IP address for connection to Splunk system |
fe_splunk_port | Splunk | IP port for connection to Splunk system |
consul_deploy_dir | Consul | path for Consul deployment |
Global variables are stored in the group_vars/all.yml file.
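For orientation, a few of these variables could look like this in group_vars/all.yml. Every value is a hypothetical placeholder; only the variable names come from the table above, and the Consul path merely mirrors the directory seen in the sample output in Troubleshooting.

```yaml
# PLAYBOOK_ROOT_DIR/group_vars/all.yml (fragment): illustrative values only.
fe_cluster_id: lbc-test-1          # hypothetical cluster identity
fe_rapi_port: 8903                 # hypothetical REST API port
fe_splunk_host: 192.0.2.10         # documentation-range address, replace with the real one
fe_splunk_port: 8089               # hypothetical Splunk port
consul_deploy_dir: /srv/consul-agent   # assumed to match the directory shown in the sample logs
```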
Troubleshooting
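The deployment process prints the result of every step to the terminal. When all steps finish successfully, the output ends with a PLAY RECAP section in which every host shows unreachable=0 and failed=0.
When deployment fails, at least one host shows a non-zero unreachable or failed count, or the run stops with an ERROR! message, and "make" reports that the recipe for target 'deploy' failed. Typical failures are described below.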
Host is unreachable
Problem: the user gets an error message similar to:
PLAY [Debug variables] ********************************************************************************************************************************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
task path: /home/egor/work/ansible-play-lbc/deploy.yml:3
ok: [cs_1]
ok: [haproxy]
ok: [fixedge_2]
ok: [consul_server_1]
fatal: [fixedge_1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.6.223.22 port 22: Connection timed out\r\n", "unreachable": true}
Description: the host is probably unreachable over the network.
Solution:
- Check network connectivity to the affected host (its IP address is shown in the error message). Restore network connectivity if it was lost.
Role not found
Problem: the user gets an error message similar to:
env DOCKER_HOST="tcp://10.6.221.187:2375" ansible-playbook -i hosts.yml deploy.yml --ssh-common-args="-o StrictHostKeyChecking=no" -vv
ansible-playbook 2.5.1
config file = /home/user/work/ansible-play-lbc/ansible.cfg
configured module search path = [u'/home/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.15rc1 (default, Nov 12 2018, 14:31:15) [GCC 7.3.0]
Using /home/user/work/ansible-play-lbc/ansible.cfg as config file
ERROR! the role 'ha-proxy' was not found in /home/user/work/ansible-play-lbc/roles:/home/user/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/home/user/work/ansible-play-lbc
The error appears to have been in '/home/user/work/ansible-play-lbc/deploy.yml': line 39, column 7, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- consul-agent
- ha-proxy
^ here
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 1
user@lbc:~/work/ansible-play-lbc$
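Description: the role name defined in PLAYBOOK_ROOT_DIR/deploy.yml is probably wrong: because of a human mistake it differs from the role directory name in PLAYBOOK_ROOT_DIR/roles.
Solution:
- Change the role name in PLAYBOOK_ROOT_DIR/deploy.yml so that it matches the role directory name in PLAYBOOK_ROOT_DIR/roles.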
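A corrected roles list might look like the sketch below. The host pattern and the corrected name "haproxy" are assumptions made for illustration; the name to use is the exact directory name found under PLAYBOOK_ROOT_DIR/roles.

```yaml
# PLAYBOOK_ROOT_DIR/deploy.yml (fragment): illustrative sketch only.
# "haproxy" as host pattern and as role name is hypothetical.
- hosts: haproxy
  become: true
  roles:
    - consul-agent
    - haproxy   # was "ha-proxy", for which no directory exists under roles/
```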
Wrong Oracle access parameters
Problem: the user gets an error message similar to:
TASK [setup-oracle-db : Cleanup Oracle Database contents] *********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/setup-oracle-db/tasks/main.yml:22
failed: [oracle] (item=database-cleanup.sql) => {"changed": true, "cmd": "sqlplus admin/oracle@10.6.221.187/XE < /tmp/database-cleanup.sql", "delta": "0:00:02.290971", "end": "2019-02-22 11:44:03.956940", "item": "database-cleanup.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 11:44:01.665969", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:02 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-01017: invalid username/password; logon denied\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:02 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
failed: [oracle] (item=database-data.sql) => {"changed": true, "cmd": "sqlplus admin/oracle@10.6.221.187/XE < /tmp/database-data.sql", "delta": "0:00:03.318344", "end": "2019-02-22 11:44:14.010157", "item": "database-data.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 11:44:10.691813", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:12 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-01017: invalid username/password; logon denied\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:12 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
to retry, use: --limit @/home/user/work/ansible-play-lbc/deploy.retry
PLAY RECAP ********************************************************************************************************************************************************************************************************
consul_server_1 : ok=2 changed=0 unreachable=0 failed=0
cs_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_2 : ok=2 changed=0 unreachable=0 failed=0
haproxy : ok=2 changed=0 unreachable=0 failed=0
oracle : ok=4 changed=3 unreachable=0 failed=1
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 2
user@lbc:~/work/ansible-play-lbc$
Description: the database access parameters are probably wrong (ORA-01017: invalid username/password; logon denied).
Solution:
- Correct the database access parameters in PLAYBOOK_ROOT_DIR/group_vars/all.yml.
- Check that access to the database works using database client software.
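The database parameters to review could look like the following sketch. The address and SID mirror the sample sqlplus command in the output above; the port is the usual Oracle listener default and is an assumption, and all logins and passwords are placeholders.

```yaml
# PLAYBOOK_ROOT_DIR/group_vars/all.yml (fragment): illustrative values only.
dba_user: system             # placeholder administrator login
dba_password: "CHANGE_ME"    # placeholder
db_user: fixedge             # hypothetical DB user for the solution
db_password: "CHANGE_ME"     # placeholder
db_address: 10.6.221.187     # address seen in the sample sqlplus command
db_port: 1521                # assumed: common Oracle listener default
db_sid: XE                   # SID seen in the sample sqlplus command
```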
Database (Oracle) unavailable
Problem: the user gets an error message similar to:
TASK [setup-oracle-db : Cleanup Oracle Database contents] *********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/setup-oracle-db/tasks/main.yml:22
failed: [oracle] (item=database-cleanup.sql) => {"changed": true, "cmd": "sqlplus system/oracle@10.6.221.187/XE < /tmp/database-cleanup.sql", "delta": "0:00:01.033081", "end": "2019-02-22 07:08:42.678543", "item": "database-cleanup.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 07:08:41.645462", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:42 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-12514: TNS:listener does not currently know of service requested in connect\ndescriptor\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:42 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-12514: TNS:listener does not currently know of service requested in connect", "descriptor", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
failed: [oracle] (item=database-data.sql) => {"changed": true, "cmd": "sqlplus system/oracle@10.6.221.187/XE < /tmp/database-data.sql", "delta": "0:00:01.000175", "end": "2019-02-22 07:08:49.490630", "item": "database-data.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 07:08:48.490455", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:49 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-12514: TNS:listener does not currently know of service requested in connect\ndescriptor\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:49 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-12514: TNS:listener does not currently know of service requested in connect", "descriptor", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
to retry, use: --limit @/home/user/work/ansible-play-lbc/deploy.retry
PLAY RECAP ********************************************************************************************************************************************************************************************************
consul_server_1 : ok=2 changed=0 unreachable=0 failed=0
cs_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_2 : ok=2 changed=0 unreachable=0 failed=0
haproxy : ok=2 changed=0 unreachable=0 failed=0
oracle : ok=4 changed=4 unreachable=0 failed=1
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 2
user@lbc:~/work/ansible-play-lbc$
Description: the problem is probably on the database side: the database may be unreachable or may not work properly (ORA-12514: the listener does not know the requested service).
Solution:
- Check network connectivity to the database host. Restore it if needed.
- Try to access the database with a database client using the credentials configured in the deployment configuration files.
- Check database health.
Lack of user rights
Problem: the user gets an error message similar to:
TASK [consul-agent : Create the Consul Agent deployment directory] ************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/consul-agent/tasks/main.yml:11
ok: [fixedge_2] => (item=bin) => {"changed": false, "gid": 0, "group": "root", "item": "bin", "mode": "0755", "owner": "root", "path": "/srv/consul-agent/bin", "size": 20, "state": "directory", "uid": 0}
failed: [fixedge_1] (item=bin) => {"changed": false, "item": "bin", "module_stderr": "Shared connection to 10.6.223.22 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}
ok: [fixedge_2] => (item=etc) => {"changed": false, "gid": 0, "group": "root", "item": "etc", "mode": "0755", "owner": "root", "path": "/srv/consul-agent/etc", "size": 6, "state": "directory", "uid": 0}
failed: [fixedge_1] (item=etc) => {"changed": false, "item": "etc", "module_stderr": "Shared connection to 10.6.223.22 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}
failed: [fixedge_1] (item=var) => {"changed": false, "item": "var", "module_stderr": "Shared connection to 10.6.223.22 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}
ok: [fixedge_2] => (item=var) => {"changed": false, "gid": 0, "group": "root", "item": "var", "mode": "0755", "owner": "root", "path": "/srv/consul-agent/var", "size": 90, "state": "directory", "uid": 0}
TASK [consul-agent : Install unzip package if missing] ************************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/consul-agent/tasks/main.yml:21
ok: [fixedge_2] => {"changed": false, "msg": "", "rc": 0, "results": ["unzip-6.0-19.el7.x86_64 providing unzip is already installed"]}
TASK [consul-agent : set_fact] ************************************************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/consul-agent/tasks/main.yml:28
ok: [fixedge_2] => {"ansible_facts": {"consul_agent_dst": "/home/user/work/ansible-play-lbc/files/consul_1.4.0_linux_amd64.zip"}, "changed": false}
Description: passwordless sudo is not configured for the user on one of the hosts.
Solution:
- Configure passwordless sudo on the host (its IP address is shown in the error message).
File is not found
Problem: the user gets an error message similar to:
TASK [cs : Create the Configuration Service destination directory] ************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/cs/tasks/main.yml:17
ok: [cs_1] => {"changed": false, "gid": 0, "group": "root", "mode": "0755", "owner": "root", "path": "/srv/configuration-service", "size": 61, "state": "directory", "uid": 0}
TASK [cs : Unpack the Configuration Service] **********************************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/cs/tasks/main.yml:23
fatal: [cs_1]: FAILED! => {"changed": false, "msg": "Could not find or access '/home/user/work/ansible-play-lbc/files/configuration-service-1.0.0-SNAPSHOT.tar'"}
to retry, use: --limit @/home/user/work/ansible-play-lbc/deploy.retry
PLAY RECAP ********************************************************************************************************************************************************************************************************
consul_server_1 : ok=2 changed=0 unreachable=0 failed=0
cs_1 : ok=14 changed=1 unreachable=0 failed=1
fixedge_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_2 : ok=2 changed=0 unreachable=0 failed=0
haproxy : ok=2 changed=0 unreachable=0 failed=0
oracle : ok=6 changed=5 unreachable=0 failed=0
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 2
user@lbc:~/work/ansible-play-lbc$
Description: the deployment script cannot find or access one of the archive files.
Solution:
- Check that the file mentioned in the error message exists in the proper directory. If it does not, put the file there.
- Check that the path is defined properly in the configuration file. If the path is wrong, correct it in the configuration file.
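As a sketch of how the archive paths might be defined, the fragment below builds them from "playbook_dir", a built-in Ansible variable that holds the directory of the playbook being run. The file names are the ones that appear in this document; whether the shipped all.yml builds paths this way is an assumption.

```yaml
# PLAYBOOK_ROOT_DIR/group_vars/all.yml (fragment): illustrative sketch.
# Paths resolve to PLAYBOOK_ROOT_DIR/files/... on the deployment workstation.
file_cs: "{{ playbook_dir }}/files/configuration-service-1.0.0-SNAPSHOT.tar"
file_consul: "{{ playbook_dir }}/files/consul_1.4.0_linux_amd64.zip"
```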