Channel: Percona Database Performance Blog

CommitQuorum in Index Creation From Percona Server for MongoDB 4.4

Before Percona Server for MongoDB (PSMDB) 4.4, the best practice for creating an index was to do it in a rolling manner. Many users instead created indexes directly on the Primary, in which case the index was first built on the Primary and then replicated to the Secondary nodes.

Starting with PSMDB 4.4, the createIndex command accepts a new parameter, commitQuorum. If you do not pass this parameter explicitly, createIndex uses the default setting on a replica set or sharded cluster and starts building the index simultaneously across all data-bearing voting replica set members.

Below is the command used to create an index using commitQuorum as the majority:

db.getSiblingDB("acme").products.createIndex({ "airt" : 1 }, { }, "majority")

With the above command, the Primary marks the index as ready once a majority of data-bearing voting replica set members have finished building it. commitQuorum accepts other values too:

  1. "votingMembers" – The default: the index is created on all data-bearing voting replica set members. A "voting" member is any replica set member where votes are greater than 0.
  2. "majority" – A simple majority of data-bearing voting replica set members.
  3. <int> – A specific number of data-bearing replica set members. Specify an integer greater than 0.
  4. A tag name – A replica set tag name of a node is used.
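
As a rough illustration of the options above, the sketch below computes how many members would have to finish the build before the Primary commits the index. The function name requiredReadyMembers is hypothetical (it is not part of the shell), the "majority" formula assumes a simple majority of data-bearing voting members, and tag-based quorums are left out because they depend on the replica set's tag configuration.

```javascript
// Hypothetical helper, not a MongoDB API: number of data-bearing voting
// members that must be ready to commit for a given commitQuorum value.
function requiredReadyMembers(commitQuorum, votingDataBearingMembers) {
  if (commitQuorum === "votingMembers") {
    return votingDataBearingMembers; // default: every such member
  }
  if (commitQuorum === "majority") {
    return Math.floor(votingDataBearingMembers / 2) + 1; // simple majority
  }
  if (Number.isInteger(commitQuorum) && commitQuorum > 0) {
    return commitQuorum; // explicit member count
  }
  // Tag names depend on the replica set configuration; not modeled here.
  throw new Error("unsupported commitQuorum in this sketch: " + commitQuorum);
}

// For the three-member replica set used in this post:
// requiredReadyMembers("votingMembers", 3) -> 3
// requiredReadyMembers("majority", 3)      -> 2
// requiredReadyMembers(1, 3)               -> 1
```

In the shell, the chosen value goes in as the third argument of createIndex, for example db.products.createIndex({ "airt" : 1 }, { }, 2).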

Now let's look at what happens when the index is created with the default commitQuorum and with commitQuorum set to majority.

  1. When all data-bearing replica set members are available, and the index is created with default commitQuorum, below are the details from the Primary and the Secondary nodes.

    Create index:

    rs1:PRIMARY> db.products.createIndex({ "airt" : 1 })

    Primary logs:
    {"t":{"$date":"2023-06-26T12:33:18.417+00:00"},"s":"I",  "c":"INDEX",    "id":20384,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: starting","attr":{"namespace":"acme.products","buildUUID":{"uuid":{"$uuid":"58f4e7bf-7b8f-4eb6-8de0-0ad774c4b51f"}},"properties":{"v":2,"key":{"airt":1.0},"name":"airt_1"},"method":"Hybrid","maxTemporaryMemoryUsageMB":200}}

    Secondary logs:
    {"t":{"$date":"2023-06-26T12:33:18.417+00:00"},"s":"I",  "c":"INDEX",    "id":20384,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: starting","attr":{"namespace":"acme.products","buildUUID":{"uuid":{"$uuid":"58f4e7bf-7b8f-4eb6-8de0-0ad774c4b51f"}},"properties":{"v":2,"key":{"airt":1.0},"name":"airt_1"},"method":"Hybrid","maxTemporaryMemoryUsageMB":200}}

    Secondary logs:
    {"t":{"$date":"2023-06-26T12:33:28.445+00:00"},"s":"I",  "c":"INDEX",    "id":20384,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: starting","attr":{"namespace":"acme.products","buildUUID":{"uuid":{"$uuid":"58f4e7bf-7b8f-4eb6-8de0-0ad774c4b51f"}},"properties":{"v":2,"key":{"airt":1.0},"name":"airt_1"},"method":"Hybrid","maxTemporaryMemoryUsageMB":200}}

    We can see the above index was created simultaneously on all the data-bearing voting replica set members.
  2. When one secondary is down, and the index is created with default commitQuorum, below are the details from the Primary and the Secondary nodes.

    Status of nodes:

    rs1:PRIMARY> rs.status().members.forEach(function (d) {print(d.name) + " " + print(d.stateStr)});
    127.0.0.1:27017
    PRIMARY
    localhost:27018
    SECONDARY
    localhost:27019
    (not reachable/healthy)
    rs1:PRIMARY>

    Index command:

    rs1:PRIMARY> db.products.createIndex({ "airt" : 1 })

    Replication status:

    rs1:PRIMARY> db.printSecondaryReplicationInfo()
    source: localhost:27018
            syncedTo: Mon Jun 26 2023 17:56:30 GMT+0000 (UTC)
            0 secs (0 hrs) behind the primary
    source: localhost:27019
            syncedTo: Thu Jan 01 1970 00:00:00 GMT+0000 (UTC)
            1687802190 secs (468833.94 hrs) behind the primary
    rs1:PRIMARY>

    Index status:

    rs1:PRIMARY> db.currentOp(true).inprog.forEach(function(op){ if(op.msg!==undefined) print(op.msg) })
    Index Build: draining writes received during build
    rs1:PRIMARY> Date()
    Mon Jun 26 2023 18:07:26 GMT+0000 (UTC)
    rs1:PRIMARY>

    CurrentOp:

    "active" : true,
    "currentOpTime" : "2023-06-26T19:04:33.175+00:00",
    "opid" : 329147,
    "lsid" : {
            "id" : UUID("dd9672f8-4f56-47ce-8ceb-31caf5e8baf8"),
            "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
    },
    "secs_running" : NumberLong(4214),
    "microsecs_running" : NumberLong("4214151233"),
    "op" : "command",
    "ns" : "acme.products",
    "command" : {
            "createIndexes" : "products",
            "indexes" : [
                    {
                            "key" : {
                                    "airt" : 1
                            },
                            "name" : "airt_1"
                    }
            ],
            "lsid" : {
                    "id" : UUID("dd9672f8-4f56-47ce-8ceb-31caf5e8baf8")
            },
            "$clusterTime" : {
                    "clusterTime" : Timestamp(1687801980, 1),
                    "signature" : {
                            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                            "keyId" : NumberLong(0)
                    }
            },
            "$db" : "acme"
    }

    Logs from Primary node:

    {"t":{"$date":"2023-06-26T17:54:21.419+00:00"},"s":"I",  "c":"STORAGE",  "id":3856203, "ctx":"IndexBuildsCoordinatorMongod-1","msg":"Index build: waiting for next action before completing final phase","attr":{"buildUUID":{"uuid":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"}}}}

    Logs from up-and-running Secondary node:

    {"t":{"$date":"2023-06-26T17:54:21.424+00:00"},"s":"I",  "c":"STORAGE",  "id":3856203, "ctx":"IndexBuildsCoordinatorMongod-1","msg":"Index build: waiting for next action before completing final phase","attr":{"buildUUID":{"uuid":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"}}}}

    As shown above, when one node is down and the index is created with the default commitQuorum, the createIndex command keeps running until the third data-bearing voting node comes back up. Now we can check whether the index has been created on the Primary:

    rs1:PRIMARY> db.products.getIndexes()
    [
            {
                    "v" : 2,
                    "key" : {
                            "_id" : 1
                    },
                    "name" : "_id_"
            },
            {
                    "v" : 2,
                    "key" : {
                            "airt" : 1
                    },
                    "name" : "airt_1"
            }
    ]
    rs1:PRIMARY>

    The index is listed, but it cannot be used yet because the build has not been marked as completed.

    Below is the explain plan of a query, where we can see the query is doing COLLSCAN instead of IXSCAN:

    rs1:PRIMARY> db.products.find({"airt" : 1.9869362536440427}).explain()
    {
            "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "acme.products",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                            "airt" : {
                                    "$eq" : 1.9869362536440427
                            }
                    },
                    "queryHash" : "65E2F79D",
                    "planCacheKey" : "AA490985",
                    "winningPlan" : {
                            "stage" : "COLLSCAN",
                            "filter" : {
                                    "airt" : {
                                            "$eq" : 1.9869362536440427
                                    }
                            },
                            "direction" : "forward"
                    },
                    "rejectedPlans" : [ ]
            },
            "serverInfo" : {
                    "host" : "ip-172-31-82-235.ec2.internal",
                    "port" : 27017,
                    "version" : "4.4.22-21",
                    "gitVersion" : "be7a5f4a1000bed8cf1d1feb80a20664d51503ce"
    }

    Now I will bring up the third node, and we will see that the index op completes.

    Index status:

    rs1:PRIMARY> db.products.createIndex({ "airt" : 1 })
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "commitQuorum" : "votingMembers",
            "ok" : 1,
            "$clusterTime" : {
                    "clusterTime" : Timestamp(1687806737, 3),
                    "signature" : {
                            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                            "keyId" : NumberLong(0)
                    }
            },
            "operationTime" : Timestamp(1687806737, 3)
    }
    rs1:PRIMARY>

    Now we will run the same query, and we can see that the index (IXSCAN) is used, as the index build completed successfully above:

    rs1:PRIMARY> db.products.find({"airt" : 1.9869362536440427}).explain()
    {
            "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "acme.products",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                            "airt" : {
                                    "$eq" : 1.9869362536440427
                            }
                    },
                    "queryHash" : "65E2F79D",
                    "planCacheKey" : "AA490985",
                    "winningPlan" : {
                            "stage" : "FETCH",
                            "inputStage" : {
                                    "stage" : "IXSCAN",
                                    "keyPattern" : {
                                            "airt" : 1
                                    },
                                    "indexName" : "airt_1",
                                    "isMultiKey" : false,
                                    "multiKeyPaths" : {
                                            "airt" : [ ]
                                    },
                                    "isUnique" : false,
                                    "isSparse" : false,
                                    "isPartial" : false,
                                    "indexVersion" : 2,
                                    "direction" : "forward",
                                    "indexBounds" : {
                                            "airt" : [
                                                    "[1.986936253644043, 1.986936253644043]"
                                            ]
                                    }
                            }
                    },
                    "rejectedPlans" : [ ]
            },
            "serverInfo" : {
                    "host" : "ip-172-31-82-235.ec2.internal",
                    "port" : 27017,
                    "version" : "4.4.22-21",
                    "gitVersion" : "be7a5f4a1000bed8cf1d1feb80a20664d51503ce"
            }

    Primary logs once the third node came up and the index was created successfully:

    {"t":{"$date":"2023-06-26T19:12:17.450+00:00"},"s":"I",  "c":"STORAGE",  "id":3856201, "ctx":"conn40","msg":"Index build: commit quorum satisfied","attr":{"indexBuildEntry":{"_id":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"},"collectionUUID":{"$uuid":"a963b7e7-1054-4a5f-a935-a5be8995cff0"},"commitQuorum":"votingMembers","indexNames":["airt_1"],"commitReadyMembers":["127.0.0.1:27017","localhost:27018","localhost:27019"]}}}
    
    {"t":{"$date":"2023-06-26T19:12:17.450+00:00"},"s":"I",  "c":"STORAGE",  "id":3856204, "ctx":"IndexBuildsCoordinatorMongod-1","msg":"Index build: received signal","attr":{"buildUUID":{"uuid":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"}},"action":"Commit quorum Satisfied"}}
    
    {"t":{"$date":"2023-06-26T19:12:17.451+00:00"},"s":"I",  "c":"INDEX",    "id":20345,   "ctx":"IndexBuildsCoordinatorMongod-1","msg":"Index build: done building","attr":{"buildUUID":{"uuid":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"}},"namespace":"acme.products","index":"airt_1","commitTimestamp":{"$timestamp":{"t":1687806737,"i":2}}}}
    
    {"t":{"$date":"2023-06-26T19:12:17.452+00:00"},"s":"I",  "c":"STORAGE",  "id":20663,   "ctx":"IndexBuildsCoordinatorMongod-1","msg":"Index build: completed successfully","attr":{"buildUUID":{"uuid":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"}},"namespace":"acme.products","uuid":{"uuid":{"$uuid":"a963b7e7-1054-4a5f-a935-a5be8995cff0"}},"indexesBuilt":1,"numIndexesBefore":1,"numIndexesAfter":2}}
    
    {"t":{"$date":"2023-06-26T19:12:17.554+00:00"},"s":"I",  "c":"INDEX",    "id":20447,   "ctx":"conn34","msg":"Index build: completed","attr":{"buildUUID":{"uuid":{"$uuid":"46451b37-141f-4312-a219-4b504736ab5b"}}}}
    
    {"t":{"$date":"2023-06-26T19:12:17.554+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn34","msg":"Slow query","attr":{"type":"command","ns":"acme.products","appName":"MongoDB Shell","command":{"createIndexes":"products","indexes":[{"key":{"airt":1.0},"name":"airt_1"}],"lsid":{"id":{"$uuid":"dd9672f8-4f56-47ce-8ceb-31caf5e8baf8"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1687801980,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"acme"},"numYields":0,"reslen":271,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":3}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1,"w":4}},"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":1,"w":4}},"Database":{"acquireCount":{"w":3}},"Collection":{"acquireCount":{"r":1,"w":1,"W":1}},"Mutex":{"acquireCount":{"r":3}}},"flowControl":{"acquireCount":3,"timeAcquiringMicros":7},"storage":{"data":{"bytesRead":98257,"timeReadingMicros":3489}},"protocol":"op_msg","durationMillis":4678530}}

    Above, the durationMillis value (4678530 ms, roughly 78 minutes) shows how long the index build took to complete; the op kept running for as long as the third node was down.

  3. When one secondary is down, and the index is created with commitQuorum as the majority, below are the details from the Primary and the Secondary nodes.

    Status of nodes:

    rs1:PRIMARY> rs.status().members.forEach(function (d) {print(d.name) + " " + print(d.stateStr)});
    127.0.0.1:27017
    PRIMARY
    localhost:27018
    SECONDARY
    localhost:27019
    (not reachable/healthy)
    rs1:PRIMARY>

    Index command:

    rs1:PRIMARY> db.products.createIndex({ "airt" : 1 }, { }, "majority")
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "commitQuorum" : "majority",
            "ok" : 1,
            "$clusterTime" : {
                    "clusterTime" : Timestamp(1687808148, 4),
                    "signature" : {
                            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                            "keyId" : NumberLong(0)
                    }
            },
            "operationTime" : Timestamp(1687808148, 4)
    }
    rs1:PRIMARY>

    Logs from Primary node:

    {"t":{"$date":"2023-06-26T19:35:48.821+00:00"},"s":"I",  "c":"STORAGE",  "id":3856201, "ctx":"conn7","msg":"Index build: commit quorum satisfied","attr":{"indexBuildEntry":{"_id":{"$uuid":"5f8f75ee-aa46-42a6-b4c2-59a68fea47a7"},"collectionUUID":{"$uuid":"a963b7e7-1054-4a5f-a935-a5be8995cff0"},"commitQuorum":"majority","indexNames":["airt_1"],"commitReadyMembers":["127.0.0.1:27017","localhost:27018"]}}}
    
    {"t":{"$date":"2023-06-26T19:35:48.821+00:00"},"s":"I",  "c":"STORAGE",  "id":3856204, "ctx":"IndexBuildsCoordinatorMongod-3","msg":"Index build: received signal","attr":{"buildUUID":{"uuid":{"$uuid":"5f8f75ee-aa46-42a6-b4c2-59a68fea47a7"}},"action":"Commit quorum Satisfied"}}
    
    {"t":{"$date":"2023-06-26T19:35:48.822+00:00"},"s":"I",  "c":"INDEX",    "id":20345,   "ctx":"IndexBuildsCoordinatorMongod-3","msg":"Index build: done building","attr":{"buildUUID":{"uuid":{"$uuid":"5f8f75ee-aa46-42a6-b4c2-59a68fea47a7"}},"namespace":"acme.products","index":"airt_1","commitTimestamp":{"$timestamp":{"t":1687808148,"i":3}}}}
    
    {"t":{"$date":"2023-06-26T19:35:48.824+00:00"},"s":"I",  "c":"STORAGE",  "id":20663,   "ctx":"IndexBuildsCoordinatorMongod-3","msg":"Index build: completed successfully","attr":{"buildUUID":{"uuid":{"$uuid":"5f8f75ee-aa46-42a6-b4c2-59a68fea47a7"}},"namespace":"acme.products","uuid":{"uuid":{"$uuid":"a963b7e7-1054-4a5f-a935-a5be8995cff0"}},"indexesBuilt":1,"numIndexesBefore":1,"numIndexesAfter":2}}
    
    {"t":{"$date":"2023-06-26T19:35:48.923+00:00"},"s":"I",  "c":"INDEX",    "id":20447,   "ctx":"conn34","msg":"Index build: completed","attr":{"buildUUID":{"uuid":{"$uuid":"5f8f75ee-aa46-42a6-b4c2-59a68fea47a7"}}}}
    
    {"t":{"$date":"2023-06-26T19:35:48.923+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn34","msg":"Slow query","attr":{"type":"command","ns":"acme.products","appName":"MongoDB Shell","command":{"createIndexes":"products","indexes":[{"key":{"airt":1.0},"name":"airt_1"}],"commitQuorum":"majority","lsid":{"id":{"$uuid":"dd9672f8-4f56-47ce-8ceb-31caf5e8baf8"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1687808123,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"acme"},"numYields":0,"reslen":266,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":3}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1,"w":4}},"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":1,"w":4}},"Database":{"acquireCount":{"w":3}},"Collection":{"acquireCount":{"r":1,"w":1,"W":1}},"Mutex":{"acquireCount":{"r":3}}},"flowControl":{"acquireCount":3,"timeAcquiringMicros":7},"storage":{},"protocol":"op_msg","durationMillis":2469}}

    Above, we can see that when one node was down and we used commitQuorum majority while creating the index, the index op completed as expected, because two voting nodes (a majority) were up and running.
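
The difference between scenarios 2 and 3 can be sketched as a pre-check to run before createIndex: count the members that are currently PRIMARY or SECONDARY and compare that with what the chosen commitQuorum needs. The helper name quorumReachable is hypothetical, the member objects mimic the name and stateStr fields printed from rs.status().members above, and the sketch assumes every member is data-bearing and voting (no arbiters, no members with votes set to 0).

```javascript
// Hypothetical pre-check, not a MongoDB API. Assumes every listed member is
// data-bearing and voting; adjust the input if the set has arbiters.
function quorumReachable(members, commitQuorum) {
  const total = members.length;
  const healthy = members.filter(function (m) {
    return m.stateStr === "PRIMARY" || m.stateStr === "SECONDARY";
  }).length;

  let needed;
  if (commitQuorum === "votingMembers") {
    needed = total; // default: all members must be ready
  } else if (commitQuorum === "majority") {
    needed = Math.floor(total / 2) + 1; // simple majority
  } else {
    needed = commitQuorum; // a positive integer member count
  }
  return { healthy: healthy, needed: needed, reachable: healthy >= needed };
}

// Node states copied from the rs.status() output in scenarios 2 and 3.
const members = [
  { name: "127.0.0.1:27017", stateStr: "PRIMARY" },
  { name: "localhost:27018", stateStr: "SECONDARY" },
  { name: "localhost:27019", stateStr: "(not reachable/healthy)" }
];

const withDefault = quorumReachable(members, "votingMembers");
const withMajority = quorumReachable(members, "majority");
// withDefault.reachable is false: the build would wait for localhost:27019
// withMajority.reachable is true: two of the three members are enough
```

In the shell, the same function could be fed rs.status().members directly instead of the hard-coded array.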

So far, we have discussed how to use commitQuorum and when to use it. Now consider a scenario where one voting node is down for any reason and someone creates an index with the default commitQuorum. The op will keep running, and you want to kill it.

I created the index with the default commitQuorum while one node was down.

Status of nodes:

rs1:PRIMARY> rs.status().members.forEach(function (d) {print(d.name) + " " + print(d.stateStr)});
127.0.0.1:27017
PRIMARY
localhost:27018
SECONDARY
localhost:27019
(not reachable/healthy)
rs1:PRIMARY>

CurrentOp:

"active" : true,
"currentOpTime" : "2023-06-26T21:27:41.304+00:00",
"opid" : 536535,
"lsid" : {
        "id" : UUID("dd9672f8-4f56-47ce-8ceb-31caf5e8baf8"),
        "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
},
"secs_running" : NumberLong(264),
"microsecs_running" : NumberLong(264345444),
"op" : "command",
"ns" : "acme.products",
"command" : {
        "createIndexes" : "products",
        "indexes" : [
                {
                        "key" : {
                                "airt" : 1
                        },
                        "name" : "airt_1"
                }
        ],
        "lsid" : {
                "id" : UUID("dd9672f8-4f56-47ce-8ceb-31caf5e8baf8")
        },
        "$clusterTime" : {
                "clusterTime" : Timestamp(1687814589, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "$db" : "acme"
}
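
Picking the opid out of a long currentOp listing by eye is error-prone, so a small filter can help. The sketch below keeps only the operations whose command is createIndexes. findIndexBuildOps is a hypothetical helper name, and the sample input is a trimmed copy of the currentOp entry above with plain numbers in place of NumberLong, so only the fields shown there are assumed to exist.

```javascript
// Hypothetical helper: keep only createIndexes operations from an inprog array.
function findIndexBuildOps(inprog) {
  return inprog
    .filter(function (op) {
      return Boolean(op.command) && op.command.createIndexes !== undefined;
    })
    .map(function (op) {
      return {
        opid: op.opid,
        ns: op.ns,
        secsRunning: Number(op.secs_running),
        // Index names requested by the command (empty if the field is absent).
        indexes: (op.command.indexes || []).map(function (ix) { return ix.name; })
      };
    });
}

// Trimmed copy of the currentOp entry shown above.
const sampleInprog = [
  {
    active: true,
    opid: 536535,
    secs_running: 264,
    op: "command",
    ns: "acme.products",
    command: {
      createIndexes: "products",
      indexes: [{ key: { airt: 1 }, name: "airt_1" }],
      $db: "acme"
    }
  }
];

const builds = findIndexBuildOps(sampleInprog);
// builds[0].opid is 536535, the value to pass to db.killOp()
```

In the shell, the helper could be fed live data with findIndexBuildOps(db.currentOp(true).inprog).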

Now kill the above opid to release the op:

rs1:PRIMARY> db.killOp(536535)
{
        "info" : "attempting to kill op",
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1687815189, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1687815189, 2)
}
rs1:PRIMARY>
rs1:PRIMARY> db.products.createIndex({ "airt" : 1 })
{
        "operationTime" : Timestamp(1687815192, 2),
        "ok" : 0,
        "errmsg" : "operation was interrupted",
        "code" : 11601,
        "codeName" : "Interrupted",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1687815192, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs1:PRIMARY>
rs1:PRIMARY> db.products.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
rs1:PRIMARY>

Above, we can see that once we killed the op, the createIndex command was interrupted and the index was not created.

Conclusion

We have seen how commitQuorum works while creating indexes from PSMDB 4.4. Still, the best practice is to create indexes in a rolling manner.

We recommend checking out our products for Percona Server for MongoDB, Percona Backup for MongoDB, and Percona Operator for MongoDB. We also recommend checking out our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

