Issue Starting MariaDB as a Slave



  • I have a MariaDB / MySQL database running on one node, no problems (as far as I am aware.) On a second node I have a MariaDB slave that used to be fine but now cannot start. I have not seen these database errors before and I am attempting to debug this to figure out what is wrong.

    150615 23:31:33 mysqld_safe Starting mysqld daemon with databases from /db/mysql
    150615 23:31:33 mysqld_safe WSREP: Running position recovery with --log_error='/db/mysql/wsrep_recovery.KrXayq' --pid-file='/db/mysql/mysql_slave-app_slave00-recover.pid'
    150615 23:31:40 mysqld_safe WSREP: Recovered position f373d3de-c577-11e3-9ccf-fe8f0d75d59c:3896780
    150615 23:31:40 [Note] WSREP: wsrep_start_position var submitted: 'f373d3de-c577-11e3-9ccf-fe8f0d75d59c:3896780'
    150615 23:31:40 [Note] WSREP: Read nil XID from storage engines, skipping position init
    150615 23:31:40 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
    150615 23:31:40 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy info@codership.com loaded successfully.
    150615 23:31:40 [Note] WSREP: CRC-32C: using hardware acceleration.
    150615 23:31:40 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    150615 23:31:40 [Note] WSREP: Passing config to GCS: base_host = x.x.x.x; base_port = 4567; cert.log_conflicts = no; evs.inactive_check_period = PT3S; evs.inactive_timeout = PT60S; evs.install_timeout = PT60S; evs.keepalive_period = PT1S; evs.suspect_timeout = PT10S; gcache.dir = /db/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /db/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 5G; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
    150615 23:31:40 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    150615 23:31:40 [Note] WSREP: wsrep_sst_grab()
    150615 23:31:40 [Note] WSREP: Start replication
    150615 23:31:40 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    150615 23:31:40 [Note] WSREP: protonet asio version 0
    150615 23:31:40 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
    150615 23:31:40 [Note] WSREP: backend: asio
    150615 23:31:40 [Note] WSREP: GMCast version 0
    150615 23:31:40 [Note] WSREP: (aca0e458-13b6-11e5-92a0-b2b8427ffb6c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    150615 23:31:40 [Note] WSREP: (aca0e458-13b6-11e5-92a0-b2b8427ffb6c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    150615 23:31:40 [Note] WSREP: EVS version 0
    150615 23:31:40 [Note] WSREP: PC version 0
    150615 23:31:40 [Note] WSREP: gcomm: connecting to group 'xxxxx', peer 'x.x.x.x:,x.x.x.x:'
    150615 23:31:40 [Warning] WSREP: (aca0e458-13b6-11e5-92a0-b2b8427ffb6c, 'tcp://0.0.0.0:4567') address 'tcp://x.x.x.x:4567' points to own listening address, blacklisting
    150615 23:31:40 [Note] WSREP: (aca0e458-13b6-11e5-92a0-b2b8427ffb6c, 'tcp://0.0.0.0:4567') address 'tcp://x.x.x.x:4567' pointing to uuid aca0e458-13b6-11e5-92a0-b2b8427ffb6c is blacklisted, skipping
    150615 23:31:41 [Note] WSREP: declaring 1d809986-1390-11e5-868f-2b0449a68232 stable
    150615 23:31:41 [Note] WSREP: Node 1d809986-1390-11e5-868f-2b0449a68232 state prim
    150615 23:31:41 [Note] WSREP: view(view_id(PRIM,1d809986-1390-11e5-868f-2b0449a68232,34) memb {
    1d809986-1390-11e5-868f-2b0449a68232,0
    aca0e458-13b6-11e5-92a0-b2b8427ffb6c,0
    } joined {
    } left {
    } partitioned {
    })
    150615 23:31:41 [Note] WSREP: gcomm: connected
    150615 23:31:41 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    150615 23:31:41 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    150615 23:31:41 [Note] WSREP: Opened channel 'xxxxx'
    150615 23:31:41 [Note] WSREP: Waiting for SST to complete.
    150615 23:31:41 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    150615 23:31:41 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    150615 23:31:41 [Note] WSREP: STATE EXCHANGE: sent state msg: acee0164-13b6-11e5-bca2-6a93a1c3e829
    150615 23:31:41 [Note] WSREP: STATE EXCHANGE: got state msg: acee0164-13b6-11e5-bca2-6a93a1c3e829 from 0 (mysql00)
    150615 23:31:41 [Note] WSREP: STATE EXCHANGE: got state msg: acee0164-13b6-11e5-bca2-6a93a1c3e829 from 1 (mysql_slave-app_slave00)
    150615 23:31:41 [Note] WSREP: Quorum results:
    version = 3,
    component = PRIMARY,
    conf_id = 33,
    members = 1/2 (joined/total),
    act_id = 3903514,
    last_appl. = -1,
    protocols = 0/5/2 (gcs/repl/appl),
    group UUID = f373d3de-c577-11e3-9ccf-fe8f0d75d59c
    150615 23:31:41 [Note] WSREP: Flow-control interval: [23, 23]
    150615 23:31:41 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3903514)
    150615 23:31:41 [Note] WSREP: State transfer required:
    Group state: f373d3de-c577-11e3-9ccf-fe8f0d75d59c:3903514
    Local state: 00000000-0000-0000-0000-000000000000:-1
    150615 23:31:41 [Note] WSREP: New cluster view: global state: f373d3de-c577-11e3-9ccf-fe8f0d75d59c:3903514, view# 34: Primary, number of nodes: 2, my index: 1, protocol version 2
    150615 23:31:41 [Warning] WSREP: Gap in state sequence. Need state transfer.
    150615 23:31:43 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address 'x.x.x.x' --auth 'no:nono' --datadir '/db/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '16465''
    WSREP_SST: [INFO] Streaming with tar (20150615 23:31:43.615)
    WSREP_SST: [INFO] Note: Advanced xtrabackup features - encryption,compression etc. not available with tar. (20150615 23:31:43.619)
    WSREP_SST: [INFO] However, xbstream requires manual cleanup of data directory before SST - lp:1193240. (20150615 23:31:43.623)
    150615 23:31:43 [Note] WSREP: Prepared SST request: xtrabackup|x.x.x.x:4444/xtrabackup_sst
    150615 23:31:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    150615 23:31:43 [Note] WSREP: REPL Protocols: 5 (3, 1)
    150615 23:31:43 [Note] WSREP: Assign initial position for certification: 3903514, protocol version: 3
    150615 23:31:43 [Note] WSREP: Service thread queue flushed.
    150615 23:31:43 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f373d3de-c577-11e3-9ccf-fe8f0d75d59c): 1 (Operation not permitted)
    at galera/src/replicator_str.cpp:prepare_for_IST():445. IST will be unavailable.
    150615 23:31:43 [Note] WSREP: Node 1.0 (mysql_slave-app_slave00) requested state transfer from 'any'. Selected 0.0 (mysql00)(SYNCED) as donor.
    150615 23:31:43 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3903514)
    150615 23:31:43 [Note] WSREP: Requesting state transfer: success, donor: 0
    WSREP_SST: [ERROR] xtrabackup process ended without creating '/db/mysql//xtrabackup_galera_info' (20150615 23:31:45.340)
    WSREP_SST: [ERROR] Cleanup after exit with status:32 (20150615 23:31:45.340)
    150615 23:31:45 [Warning] WSREP: 0.0 (mysql00): State transfer to 1.0 (mysql_slave-app_slave00) failed: -1 (Operation not permitted)
    150615 23:31:45 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: Will never receive state. Need to abort.
    150615 23:31:45 [Note] WSREP: gcomm: terminating thread
    150615 23:31:45 [Note] WSREP: gcomm: joining thread
    150615 23:31:45 [Note] WSREP: gcomm: closing backend
    WSREP_SST: [ERROR] Killing nc pid (20150615 23:31:45.363)
    WSREP_SST: [INFO] Removing the sst_in_progress file (20150615 23:31:45.369)
    150615 23:31:45 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address 'x.x.x.x' --auth 'no:no' --datadir '/db/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '16465': 32 (Broken pipe)
    150615 23:31:45 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    150615 23:31:45 [ERROR] WSREP: SST failed: 32 (Broken pipe)
    150615 23:31:45 [ERROR] Aborting

    150615 23:31:46 [Note] WSREP: view(view_id(NON_PRIM,1d809986-1390-11e5-868f-2b0449a68232,34) memb {
    aca0e458-13b6-11e5-92a0-b2b8427ffb6c,0
    } joined {
    } left {
    } partitioned {
    1d809986-1390-11e5-868f-2b0449a68232,0
    })
    150615 23:31:46 [Note] WSREP: view((empty))
    150615 23:31:46 [Note] WSREP: gcomm: closed
    150615 23:31:46 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    150615 23:31:46 mysqld_safe mysqld from pid file /db/mysql/mysqld.pid ended



  • This appears to be the most relevant section:

    WSREP_SST: [ERROR] xtrabackup process ended without creating '/db/mysql//xtrabackup_galera_info' (20150615 23:31:45.340)
    WSREP_SST: [ERROR] Cleanup after exit with status:32 (20150615 23:31:45.340)
    150615 23:31:45 [Warning] WSREP: 0.0 (mysql00): State transfer to 1.0 (mysql_slave-app_slave00) failed: -1 (Operation not permitted)
    150615 23:31:45 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: Will never receive state. Need to abort.


  • Although this could definitely be something:

    150615 23:48:22 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f373d3de-c577-11e3-9ccf-fe8f0d75d59c): 1 (Operation not permitted)


  • Could this be an issue related to Percona XtraBackup?



  • Trying anything I can think of, this part looks promising to maybe be the issue...

    WSREP_SST: [INFO] Removing the sst_in_progress file (20150615 23:31:45.369)
    150615 23:31:45 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address 'x.x.x.x' --auth 'no:no' --datadir '/db/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '16465': 32 (Broken pipe)
    150615 23:31:45 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    150615 23:31:45 [ERROR] WSREP: SST failed: 32 (Broken pipe)
    150615 23:31:45 [ERROR] Aborting