Overview
The LB Cluster deployment is implemented with Ansible playbooks. Parameters that are important for deployment are defined in configuration files, and the playbook commands are defined in a makefile. To start deployment, the user runs "make" with one of the two main deployment targets, "deployFE" or "deployFEJ" (see User rights for the required permissions). The "make" utility finds the requested target in the makefile and runs the corresponding ansible-playbook command:
- Run "make deployFE" to deploy a cluster with FIXEdge C++
- Run "make deployFEJ" to deploy a cluster with FIXEdge Java
Distribution Package
Some of the required files are included in the distribution package:
File description | Example |
---|---|
FIXEdge distribution archive | FIXEdge-6.6.1-lb-cluster.138-FA-2.25.0.138-Linux-3.10.0-gcc-4.8.5-x86_64.tar.gz |
Configuration Service distribution archive | configuration-service*.tar |
Scheduler service distribution archive | scheduler-service*.tar |
Playbook files | yml configuration files, readme files, etc. |
The following files are not included in the distribution package and must be provided separately: license files, TLS certificates, and Consul distribution packages.
Playbook
The playbook and all deployment-related files are stored on the deployment workstation: a Linux-based machine that may be either a separate machine or one of the hosts used for cluster components.
The playbook requires:
- Ansible 2.5.1 or later installed on the deployment workstation, together with all dependent packages
- passwordless sudo configured on all hosts
- Python 2.7 or later installed on all hosts
- SSH access to all hosts
- the "make" utility available on the deployment workstation
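The workstation side of these requirements can be checked with a small shell script before the first run. The sketch below is an assumption, not part of the distribution: `version_ge`, `first_version` and `check_workstation` are hypothetical helper names, and the version comparison relies on GNU `sort -V`.

```shell
# version_ge A B: succeed when version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Extract the first dotted version number from stdin,
# e.g. "ansible 2.9.27" -> 2.9.27, "ansible [core 2.14.1]" -> 2.14.1.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n1
}

# Check that make and ansible-playbook exist and that Ansible is >= 2.5.1.
check_workstation() {
  for tool in make ansible-playbook; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
  done
  ver=$(ansible --version | head -n1 | first_version)
  version_ge "$ver" 2.5.1 || { echo "ansible $ver is older than 2.5.1" >&2; return 1; }
  echo "workstation OK (ansible $ver)"
}
```

On the hosts themselves, `ansible all -i hosts.yml -m ping` exercises SSH access and the Python interpreter on every host; adding `-b` (become) also exercises sudo, which fails if a sudo password is still required.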
The user runs the Ansible playbook on the deployment workstation through the "make" utility.
Example makefile:
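```makefile
APB=ansible-playbook
INVENTORY=hosts.yml
PB=deploy.yml
SSH_ARGS=-o StrictHostKeyChecking=no
ANSIBLE_FLAGS_FIXEDGE=--skip-tags

.PHONY: all deployFE deployFEJ

# Default target (the example defines no "deploy" target, so point at deployFE)
all: deployFE

# Targets to deploy the services to the configured machines.
# Recipe lines must start with a tab character.
# FIXEdge C++: skip tasks tagged "java"
deployFE: ${INVENTORY} ${PB}
	${APB} -i ${INVENTORY} ${ANSIBLE_FLAGS_FIXEDGE} java ${PB} --ssh-common-args="${SSH_ARGS}"

# FIXEdge Java: skip tasks tagged "c++"
deployFEJ: ${INVENTORY} ${PB}
	${APB} -i ${INVENTORY} ${ANSIBLE_FLAGS_FIXEDGE} c++ ${PB} --ssh-common-args="${SSH_ARGS}"
```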
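`make -n deployFE` prints the command that a target would run without executing it. The same target-to-command mapping is sketched below as a shell function, assuming the variable values from the example makefile; `fe_deploy_cmd` is a hypothetical helper and only prints the command.

```shell
# Print the ansible-playbook command that the example makefile runs
# for a given target (values copied from the makefile variables).
fe_deploy_cmd() {  # fe_deploy_cmd deployFE|deployFEJ
  case "$1" in
    deployFE)  skip=java ;;  # FIXEdge C++: skip Java-tagged tasks
    deployFEJ) skip=c++ ;;   # FIXEdge Java: skip C++-tagged tasks
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
  echo "ansible-playbook -i hosts.yml --skip-tags $skip deploy.yml --ssh-common-args=\"-o StrictHostKeyChecking=no\""
}
```

Printing rather than running keeps the sketch side-effect free; the printed command can be run from PLAYBOOK_ROOT_DIR to call ansible-playbook without make.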
User rights
The user who runs the playbook must have superuser rights on all hosts.
Parameters Configuration
Directory structure
Name | Description |
---|---|
PLAYBOOK_ROOT_DIR | Root directory for deployment files. Contains: ansible.cfg (Ansible configuration file); deploy.yml (deployment steps); hosts.yml (hardware unit configuration); makefile (targets and arguments for the "make" command); Readme.MD (root readme document) |
PLAYBOOK_ROOT_DIR/doc, PLAYBOOK_ROOT_DIR/doc/quickstart | Directories for documentation files (readme files) |
PLAYBOOK_ROOT_DIR/files | Directory for distribution packages, license files, and key files |
PLAYBOOK_ROOT_DIR/group_vars | Directory for files containing global variables |
PLAYBOOK_ROOT_DIR/roles | Directory for role-specific deployment files |
Roles
For convenience, deployment jobs are divided into parts according to the functionality they deploy. These parts are called "roles". Role parameters are defined in PLAYBOOK_ROOT_DIR/roles/<role_name>/defaults/main.yml and can be overridden by global parameters with the same name, because role defaults have the lowest variable precedence in Ansible. For example, if the user defines the "fe_destdir" parameter in group_vars/all.yml, the playbook uses that value instead of the one defined in PLAYBOOK_ROOT_DIR/roles/fixedge/defaults/main.yml.
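The override rule can be illustrated with a deliberately simplified model that looks only at the two files mentioned above and understands only flat `key: value` lines. `yaml_value` and `effective_var` are hypothetical helpers; real Ansible precedence also involves host_vars, play variables, extra variables and more, so Ansible itself remains the authority.

```shell
# Print the value of KEY from a flat YAML FILE; fail if absent or empty.
yaml_value() {  # yaml_value FILE KEY
  [ -f "$1" ] || return 1
  sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n1 | grep .
}

# Simplified model: group_vars/all.yml wins over the role default.
effective_var() {  # effective_var PLAYBOOK_ROOT_DIR ROLE KEY
  yaml_value "$1/group_vars/all.yml" "$3" ||
    yaml_value "$1/roles/$2/defaults/main.yml" "$3"
}
```

Run from PLAYBOOK_ROOT_DIR, `effective_var . fixedge fe_destdir` prints the group_vars value if one is set and the role default otherwise.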
Hosts
Hardware unit parameters are stored in hosts.yml
Parameter | Description |
---|---|
ansible_host | IP address of the hardware unit |
ansible_port | Port used for SSH access to the hardware unit |
ansible_user | User login that Ansible uses to change the configuration of the hardware unit |
ansible_ssh_pass | SSH password of that user |
ansible_connection | Used for the Docker-based Oracle DB in the test configuration |
ansible_connection_args | Used for the Docker-based Oracle DB in the test configuration |
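The sketch below writes an example inventory in Ansible's YAML inventory format to show where these parameters go. Host names are taken from the logs in the Troubleshooting section; the addresses (from the 192.0.2.0/24 documentation range), the `deploy` user and the flat `all: hosts:` layout are placeholders. The hosts.yml shipped with the playbook may group hosts differently, so compare with it rather than replacing it. `write_example_inventory` is a hypothetical helper.

```shell
# Write a placeholder inventory to OUTPUT_FILE (not to hosts.yml itself).
write_example_inventory() {  # write_example_inventory OUTPUT_FILE
  cat > "$1" <<'EOF'
all:
  vars:
    ansible_port: 22        # SSH port, shared by all hosts here
    ansible_user: deploy    # needs passwordless sudo on every host
  hosts:
    consul_server_1:
      ansible_host: 192.0.2.10
    cs_1:
      ansible_host: 192.0.2.11
    haproxy:
      ansible_host: 192.0.2.12
    fixedge_1:
      ansible_host: 192.0.2.21
    fixedge_2:
      ansible_host: 192.0.2.22
    oracle:
      ansible_host: 192.0.2.30
EOF
}
```

`ansible-inventory -i hosts.yml --graph` shows how Ansible parsed the inventory, and `ansible all -i hosts.yml -m ping` checks that every listed host is reachable.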
Global variables
Variable | Component | Required for FIXEdge C++ | Required for FIXEdge Java | Description |
---|---|---|---|---|
dba_user | Oracle DB | + | + | DB administrator login. A database user with administrator rights is needed to add users and to create and configure the database for the solution. |
dba_password | Oracle DB | + | + | DB administrator password |
db_user | Oracle DB | + | + | DB user login |
db_password | Oracle DB | + | + | DB user password |
db_address | Oracle DB | + | + | IP address for connecting to the DB |
db_port | Oracle DB | + | + | IP port for connecting to the DB |
db_sid | Oracle DB | + | + | DB SID |
file_consul | Consul | + | + | path to the Consul Agent distribution archive |
file_ctemplate | Consul | + | + | path to the Consul Template software package (ZIP archive) |
fe_cluster_id | Cluster | + | + | unique identifier of the FIXEdge C++ LB Cluster |
file_cs | Configuration Service | + | + | path to the Configuration Service distribution archive |
fe_splunk_host | Splunk | + | + | IP address for connecting to the Splunk system |
fe_splunk_port | Splunk | + | + | IP port for connecting to the Splunk system |
consul_deploy_dir | Consul | + | + | path for Consul deployment |
fe_archive_dest_scheduler | Scheduler Service | + | | path to the Scheduler Service distribution archive |
fe_lic_dnl | FIXEdge C++ nodes | + | | path to the FIXEdge C++ license file |
fe_archive_dest | FIXEdge C++ nodes | + | | path to the FIXEdge C++ distribution archive |
fe_rapi_key | FIXEdge C++ nodes | + | | path to the REST API key file |
fe_rapi.crt | FIXEdge C++ nodes | + | | path to the REST API certificate file |
fe_rapi_port | FIXEdge C++ nodes | + | | IP port for the REST API |
fe_lic_dnl_java | FIXEdge Java nodes | | + | path to the FIXEdge Java license file |
fe_archive_dest_java | FIXEdge Java nodes | | + | path to the FIXEdge Java distribution archive |
fe_archive_dest_fo | FIXEdge Java nodes | | + | path to the FO Java distribution archive |
Global variables are stored in the group_vars/all.yml file.
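A skeleton of group_vars/all.yml for a FIXEdge C++ cluster might look like the file written by the sketch below. Only the variable names come from the table; every value is a placeholder (`CHANGE_ME`, documentation-range addresses, Oracle's default listener port 1521, the `XE` SID seen in the logs). Whether file paths should be absolute or relative to PLAYBOOK_ROOT_DIR/files depends on the roles; `{{ playbook_dir }}` is Ansible's built-in variable for the playbook directory. `write_example_all_yml` is a hypothetical helper.

```shell
# Write a placeholder group_vars/all.yml skeleton to OUTPUT_FILE.
write_example_all_yml() {  # write_example_all_yml OUTPUT_FILE
  cat > "$1" <<'EOF'
# Oracle DB
dba_user: "CHANGE_ME"
dba_password: "CHANGE_ME"
db_user: "CHANGE_ME"
db_password: "CHANGE_ME"
db_address: "192.0.2.30"
db_port: 1521
db_sid: "XE"
# Consul
file_consul: "{{ playbook_dir }}/files/consul_1.4.0_linux_amd64.zip"
file_ctemplate: "CHANGE_ME.zip"
consul_deploy_dir: "/srv/consul-agent"
# Cluster, Configuration Service, Splunk
fe_cluster_id: "CHANGE_ME"
file_cs: "CHANGE_ME.tar"
fe_splunk_host: "192.0.2.40"
fe_splunk_port: "CHANGE_ME"
# FIXEdge C++ nodes; the remaining C++ variables follow the same pattern.
# For FIXEdge Java set fe_lic_dnl_java, fe_archive_dest_java, fe_archive_dest_fo.
fe_archive_dest: "CHANGE_ME.tar.gz"
fe_lic_dnl: "CHANGE_ME"
fe_rapi_port: "CHANGE_ME"
EOF
}
```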
Troubleshooting
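The deployment process prints the result of every step to the terminal. When all steps finish successfully, the PLAY RECAP at the end of the output shows unreachable=0 and failed=0 for every host.
If deployment ends with an error, at least one host in the PLAY RECAP has a non-zero unreachable or failed counter, and make reports that the target failed (for example, make: *** [deploy] Error 2). Typical cases are described below.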
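Because both outcomes are summarized in the PLAY RECAP block, a saved deployment log can be checked automatically. The sketch assumes the recap line format shown in the examples below (`host : ok=N changed=N unreachable=N failed=N`); `recap_ok` is a hypothetical helper.

```shell
# Read deployment output on stdin; succeed only if at least one recap line
# was seen and no host reports unreachable or failed tasks.
recap_ok() {
  awk '
    /: ok=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++)
        if ($i ~ /^(unreachable|failed)=[1-9]/) bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

`make deployFE 2>&1 | tee deploy.log` keeps a copy of the output, and `recap_ok < deploy.log` then checks it. When no log is needed, the exit status of make itself is the simpler signal.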
Host is unreachable
Problem: the user gets an error message similar to:
PLAY [Debug variables]
*********************************************************************************************************************************************************
TASK [Gathering Facts]
*********************************************************************************************************************************************************
task path: /home/egor/work/ansible-play-lbc/deploy.yml:3
ok: [cs_1]
ok: [haproxy]
ok: [fixedge_2]
ok: [consul_server_1]
fatal: [fixedge_1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.6.223.22 port 22: Connection timed out\r\n", "unreachable": true}
Description: the host is probably unreachable over the network.
Solution:
- Check network connectivity to the affected host (its IP address is shown in the error message) and restore it if it was lost.
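Before retrying the deployment, it can help to check whether the SSH port of the affected host accepts TCP connections at all. The sketch uses bash's `/dev/tcp` redirection and the coreutils `timeout` command; `port_open` is a hypothetical helper, and its default port 22 matches the error message above.

```shell
# Succeed if HOST accepts a TCP connection on PORT within TIMEOUT seconds.
# HOST and PORT are passed as positional arguments to avoid quoting issues.
port_open() {  # port_open HOST [PORT] [TIMEOUT_SECONDS]
  timeout "${3:-5}" bash -c ': < "/dev/tcp/$0/$1"' "$1" "${2:-22}" 2>/dev/null
}
```

For the host in the example: `port_open 10.6.223.22 22 || echo 'SSH port not reachable'`. If the port is open but deployment still fails, `ansible fixedge_1 -i hosts.yml -m ping` repeats the SSH login the way the playbook does.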
Wrong oracle access parameters
Problem: the user gets an error message similar to:
TASK [setup-oracle-db : Cleanup Oracle Database contents] *********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/setup-oracle-db/tasks/main.yml:22
failed: [oracle] (item=database-cleanup.sql) => {"changed": true, "cmd": "sqlplus admin/oracle@10.6.221.187/XE < /tmp/database-cleanup.sql", "delta": "0:00:02.290971", "end": "2019-02-22 11:44:03.956940", "item": "database-cleanup.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 11:44:01.665969", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:02 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-01017: invalid username/password; logon denied\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:02 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
failed: [oracle] (item=database-data.sql) => {"changed": true, "cmd": "sqlplus admin/oracle@10.6.221.187/XE < /tmp/database-data.sql", "delta": "0:00:03.318344", "end": "2019-02-22 11:44:14.010157", "item": "database-data.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 11:44:10.691813", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:12 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-01017: invalid username/password; logon denied\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 11:44:12 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
to retry, use: --limit @/home/user/work/ansible-play-lbc/deploy.retry
PLAY RECAP
*********************************************************************************************************************************************************
consul_server_1 : ok=2 changed=0 unreachable=0 failed=0
cs_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_2 : ok=2 changed=0 unreachable=0 failed=0
haproxy : ok=2 changed=0 unreachable=0 failed=0
oracle : ok=4 changed=3 unreachable=0 failed=1
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 2
user@lbc:~/work/ansible-play-lbc$
Description: the database access parameters are probably wrong (ORA-01017: invalid username/password; logon denied).
Solution:
- Correct the database access parameters in PLAYBOOK_ROOT_DIR/group_vars/all.yml.
- Check that the database can be accessed with these parameters using database client software.
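The database login can be reproduced outside the playbook with SQL*Plus, using the same values as in group_vars/all.yml. The sketch builds an EZConnect string of the form `user/password@host:port/service`, as in the failing command in the log; `ora_connect` and `ora_login_test` are hypothetical helpers. EZConnect expects a service name, which for an XE database is typically the same as the SID; treat that as an assumption for other setups. A password passed on the command line is visible to other local users in the process list.

```shell
# Build an EZConnect logon string from the group_vars values.
ora_connect() {  # ora_connect USER PASSWORD HOST PORT SID
  printf '%s/%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Try to log on once (-L) in silent mode (-S) and run a trivial query.
ora_login_test() {  # same arguments as ora_connect
  echo 'select 1 from dual;' | sqlplus -S -L "$(ora_connect "$@")"
}
```

For example, `ora_login_test admin 'secret' 10.6.221.187 1521 XE`; an ORA-01017 here confirms that the credentials themselves are wrong.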
Database (Oracle) is unavailable
Problem: the user gets an error message similar to:
TASK [setup-oracle-db : Cleanup Oracle Database contents]
*********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/setup-oracle-db/tasks/main.yml:22
failed: [oracle] (item=database-cleanup.sql) => {"changed": true, "cmd": "sqlplus system/oracle@10.6.221.187/XE < /tmp/database-cleanup.sql", "delta": "0:00:01.033081", "end": "2019-02-22 07:08:42.678543", "item": "database-cleanup.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 07:08:41.645462", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:42 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-12514: TNS:listener does not currently know of service requested in connect\ndescriptor\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:42 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-12514: TNS:listener does not currently know of service requested in connect", "descriptor", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
failed: [oracle] (item=database-data.sql) => {"changed": true, "cmd": "sqlplus system/oracle@10.6.221.187/XE < /tmp/database-data.sql", "delta": "0:00:01.000175", "end": "2019-02-22 07:08:49.490630", "item": "database-data.sql", "msg": "non-zero return code", "rc": 1, "start": "2019-02-22 07:08:48.490455", "stderr": "", "stderr_lines": [], "stdout": "\nSQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:49 2019\n\nCopyright (c) 1982, 2014, Oracle. All rights reserved.\n\nERROR:\nORA-12514: TNS:listener does not currently know of service requested in connect\ndescriptor\n\n\nEnter user-name: SP2-0306: Invalid option.\nUsage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]\nwhere <logon> ::= <username>[/<password>][@<connect_identifier>]\n <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]\nEnter user-name: ERROR:\nORA-01017: invalid username/password; logon denied\n\n\nSP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus", "stdout_lines": ["", "SQL*Plus: Release 12.1.0.2.0 Production on Fri Feb 22 07:08:49 2019", "", "Copyright (c) 1982, 2014, Oracle. All rights reserved.", "", "ERROR:", "ORA-12514: TNS:listener does not currently know of service requested in connect", "descriptor", "", "", "Enter user-name: SP2-0306: Invalid option.", "Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM|SYSBACKUP|SYSDG|SYSKM}] [edition=value]]", "where <logon> ::= <username>[/<password>][@<connect_identifier>]", " <proxy> ::= <proxyuser>[<username>][/<password>][@<connect_identifier>]", "Enter user-name: ERROR:", "ORA-01017: invalid username/password; logon denied", "", "", "SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus"]}
to retry, use: --limit @/home/user/work/ansible-play-lbc/deploy.retry
PLAY RECAP
*********************************************************************************************************************************************************
consul_server_1 : ok=2 changed=0 unreachable=0 failed=0
cs_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_2 : ok=2 changed=0 unreachable=0 failed=0
haproxy : ok=2 changed=0 unreachable=0 failed=0
oracle : ok=4 changed=4 unreachable=0 failed=1
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 2
user@lbc:~/work/ansible-play-lbc$
Description: the database is probably unreachable or not working properly (ORA-12514: the listener does not know the service requested in the connect descriptor).
Solution:
- Check network connectivity to the database host and restore it if needed.
- Try to access the database with a database client, using the credentials configured in the deployment configuration files.
- Check the database health.
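ORA-12514 means the listener answered but does not know the requested service. On the database host, `lsnrctl status` lists the services registered with the listener; the sketch greps that output for the configured SID. `service_registered` is a hypothetical helper, and the match assumes the service name equals db_sid (a configured DB domain would add a suffix).

```shell
# Read "lsnrctl status" output on stdin; succeed if SID is listed as a service,
# e.g. a line like: Service "XE" has 1 instance(s).
service_registered() {  # service_registered SID
  grep -qi "^[[:space:]]*Service \"$1\" has"
}
```

Usage on the database host: `lsnrctl status | service_registered XE || echo 'XE is not registered with the listener'`. `lsnrctl` needs the Oracle environment (ORACLE_HOME, PATH) of that host.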
Lack of user rights
Problem: the user gets an error message similar to:
TASK [consul-agent : Create the Consul Agent deployment directory]
*********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/consul-agent/tasks/main.yml:11
ok: [fixedge_2] => (item=bin) => {"changed": false, "gid": 0, "group": "root", "item": "bin", "mode": "0755", "owner": "root", "path": "/srv/consul-agent/bin", "size": 20, "state": "directory", "uid": 0}
failed: [fixedge_1] (item=bin) => {"changed": false, "item": "bin", "module_stderr": "Shared connection to 10.6.223.22 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}
ok: [fixedge_2] => (item=etc) => {"changed": false, "gid": 0, "group": "root", "item": "etc", "mode": "0755", "owner": "root", "path": "/srv/consul-agent/etc", "size": 6, "state": "directory", "uid": 0}
failed: [fixedge_1] (item=etc) => {"changed": false, "item": "etc", "module_stderr": "Shared connection to 10.6.223.22 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}
failed: [fixedge_1] (item=var) => {"changed": false, "item": "var", "module_stderr": "Shared connection to 10.6.223.22 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}
ok: [fixedge_2] => (item=var) => {"changed": false, "gid": 0, "group": "root", "item": "var", "mode": "0755", "owner": "root", "path": "/srv/consul-agent/var", "size": 90, "state": "directory", "uid": 0}
TASK [consul-agent : Install unzip package if missing]
*********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/consul-agent/tasks/main.yml:21
ok: [fixedge_2] => {"changed": false, "msg": "", "rc": 0, "results": ["unzip-6.0-19.el7.x86_64 providing unzip is already installed"]}
TASK [consul-agent : set_fact]
*********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/consul-agent/tasks/main.yml:28
ok: [fixedge_2] => {"ansible_facts": {"consul_agent_dst": "/home/user/work/ansible-play-lbc/files/consul_1.4.0_linux_amd64.zip"}, "changed": false}
Description: passwordless sudo is not configured for the user on one of the hosts ("sudo: a password is required").
Solution:
- Configure passwordless sudo on the affected host (its IP address is shown in the error message).
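One common way to configure passwordless sudo is a drop-in file under /etc/sudoers.d, validated with `visudo` before it is installed. The sketch assumes the host's /etc/sudoers includes that directory (the default on many distributions) and uses the placeholder user name `deploy`; `sudoers_entry` is a hypothetical helper. Granting `ALL` matches the superuser requirement in User rights and can be narrowed if your security policy requires it.

```shell
# Print a sudoers line granting USER passwordless sudo for all commands.
sudoers_entry() {  # sudoers_entry USER
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# On the affected host, as root (validate the syntax before installing):
#   sudoers_entry deploy > /tmp/90-deploy
#   visudo -cf /tmp/90-deploy && install -m 0440 /tmp/90-deploy /etc/sudoers.d/90-deploy
# From the workstation; "sudo -n" fails instead of prompting if a password is needed:
#   ssh deploy@10.6.223.22 'sudo -n true' && echo "passwordless sudo OK"
```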
File is not found
Problem: the user gets an error message similar to:
TASK [cs : Create the Configuration Service destination directory]
*********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/cs/tasks/main.yml:17
ok: [cs_1] => {"changed": false, "gid": 0, "group": "root", "mode": "0755", "owner": "root", "path": "/srv/configuration-service", "size": 61, "state": "directory", "uid": 0}
TASK [cs : Unpack the Configuration Service]
*********************************************************************************************************************************************************
task path: /home/user/work/ansible-play-lbc/roles/cs/tasks/main.yml:23
fatal: [cs_1]: FAILED! => {"changed": false, "msg": "Could not find or access '/home/user/work/ansible-play-lbc/files/configuration-service-1.0.0-SNAPSHOT.tar'"}
to retry, use: --limit @/home/user/work/ansible-play-lbc/deploy.retry
PLAY RECAP
*********************************************************************************************************************************************************
consul_server_1 : ok=2 changed=0 unreachable=0 failed=0
cs_1 : ok=14 changed=1 unreachable=0 failed=1
fixedge_1 : ok=2 changed=0 unreachable=0 failed=0
fixedge_2 : ok=2 changed=0 unreachable=0 failed=0
haproxy : ok=2 changed=0 unreachable=0 failed=0
oracle : ok=6 changed=5 unreachable=0 failed=0
makefile:11: recipe for target 'deploy' failed
make: *** [deploy] Error 2
user@lbc:~/work/ansible-play-lbc$
Description: the deployment script cannot find one of the archive files.
Solution:
- Check that the file mentioned in the error message exists in the proper directory. If it does not, put the file there.
- Check that the path is defined correctly in the configuration file. If the path is wrong, correct it in the configuration file.
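Both checks can be run on the deployment workstation before starting make. The sketch reports every path that is missing or unreadable; `check_files` is a hypothetical helper, and the paths passed to it should be the ones configured in group_vars/all.yml.

```shell
# Report each path that does not exist or cannot be read;
# succeed only if all paths are readable.
check_files() {  # check_files PATH...
  missing=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_files files/configuration-service*.tar files/consul_*.zip`. A glob that matches nothing is passed through literally by the shell and is therefore reported as missing, which is the desired behavior here.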