---
id: c892e028-7f6c-4136-9afa-21a2db2f6013
---

# MongoDB Replication Lag Incident
---

MongoDB replication lag is the delay between a write being applied on the primary and the same write being applied on a secondary. While a secondary lags, reads served from it return stale data, and a secondary that falls further behind than the oplog window covers must be fully resynced. Identify and resolve the cause quickly so that the application keeps reading consistent data and the replica set can still fail over without rolling back writes.

### Parameters
```shell
# Environment Variables
export HOST="PLACEHOLDER"
export PORT="PLACEHOLDER"
export DATABASE="PLACEHOLDER"
export PASSWORD="PLACEHOLDER"
export USERNAME="PLACEHOLDER"
# Name of the member to inspect as it appears in rs.status(), typically host:port
export MEMBER_NAME="PLACEHOLDER"

```

## Debug

### Connect to the MongoDB instance at `${HOST}:${PORT}`
```shell
mongo ${HOST}:${PORT}
```

### Check the status of the replica set
```shell
mongo "${HOST}:${PORT}/${DATABASE}" -u "${USERNAME}" -p "${PASSWORD}" --authenticationDatabase admin --eval "printjson(rs.status())"
```

### Check the replication lag for all members of the replica set
```shell
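# The helper prints its report itself; newer shells (including mongosh) name it rs.printSecondaryReplicationInfo()
mongo "${HOST}:${PORT}" --eval "db.printSlaveReplicationInfo()"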
```
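To turn that report into something you can act on, a small filter can keep only the members whose lag exceeds a threshold. The sketch below assumes the legacy shell's text layout (a `source:` line per member, followed by a line of the form `N secs (H hrs) behind the primary`). The function name `lagging_members` and the 10-second threshold in the usage comment are illustrative and not part of MongoDB.

```shell
# Print "<member> <lag seconds>" for each member whose lag exceeds the threshold.
# Reads printSlaveReplicationInfo-style text on stdin (assumed layout, see above).
lagging_members() {
  local threshold="$1"
  awk -v max="$threshold" '
    $1 == "source:" { member = $2 }
    /secs .*behind the primary/ { if ($1 + 0 > max) print member, $1 }
  '
}

# Example usage (needs a reachable replica set member):
#   mongo --quiet "${HOST}:${PORT}" --eval "db.printSlaveReplicationInfo()" | lagging_members 10
```

An empty result means no member is above the threshold; adjust the patterns if your shell version prints a different layout.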

### Check the replication lag for a specific member of the replica set
```shell
# Each member is printed as a "source:" line followed by its syncedTo time and its lag
mongo "${HOST}:${PORT}/${DATABASE}" --eval "db.printSlaveReplicationInfo()" | grep -A 2 "${MEMBER_NAME}"
```

### Check the oplog size for all members of the replica set
```shell
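# Reports the oplog of the member you connect to; repeat with each member's host and port
mongo "${HOST}:${PORT}/admin" --eval "printjson(db.getSiblingDB('local').oplog.rs.stats())"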
```

### Check the oplog size for a specific member of the replica set
```shell
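# Assumes MEMBER_NAME holds the member's host:port; prints the on-disk oplog size in bytes
mongo "${MEMBER_NAME}/admin" --quiet --eval "print(db.getSiblingDB('local').oplog.rs.stats().storageSize)"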
```
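The raw oplog size matters mostly in relation to the time window it covers, which `rs.printReplicationInfo()` reports as "log length start to end". As a rough sizing aid, the sketch below scales the current size proportionally to reach a target window. It assumes the write rate stays constant, and the function name `oplog_mb_for_window` is hypothetical.

```shell
# Estimate the oplog size (MB) needed to cover a target window, assuming the
# write rate stays the same as during the currently observed window.
# Arguments: current size in MB, current window in hours, target window in hours.
oplog_mb_for_window() {
  local current_mb="$1" current_hours="$2" target_hours="$3"
  # Integer ceiling of current_mb * target_hours / current_hours
  echo $(( (current_mb * target_hours + current_hours - 1) / current_hours ))
}

# Example: a 1024 MB oplog that currently covers 6 hours, sized for 24 hours
oplog_mb_for_window 1024 6 24
```

Treat the result as a starting point only, since bursty write workloads shrink the window well below the average.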

### Check the slow queries log for any queries that may be causing replication lag
```shell
# Case-insensitive, because the message text differs between MongoDB log formats
grep -i "slow query" /var/log/mongodb/mongod.log
```
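When the log contains many slow operations, it can help to keep only the slowest ones. The sketch below assumes the legacy plain-text log format, in which each slow operation line ends with its duration (for example `1500ms`). Structured JSON logs record the duration in a separate field and would need a different filter. The function name `slow_ops_over` is illustrative.

```shell
# Keep only log lines whose trailing duration (e.g. "1500ms") is at least the
# given number of milliseconds. Assumes legacy plain-text mongod log lines.
slow_ops_over() {
  local min_ms="$1"
  awk -v min="$min_ms" '$NF ~ /^[0-9]+ms$/ && $NF + 0 >= min'
}

# Example usage:
#   grep -i "slow query" /var/log/mongodb/mongod.log | slow_ops_over 1000
```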

### Check the system logs for any errors or warnings related to replication
```shell
grep -i "repl" /var/log/mongodb/mongod.log
```

### Check the network latency between the replica set members
```shell
# Send a fixed number of probes so the command exits and prints its summary
ping -c 5 "${HOST}"
```
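For scripted checks, the average round-trip time can be extracted from the summary line that `ping` prints when it finishes. This is a minimal sketch that assumes the common `min/avg/max/...` summary layout used by Linux and macOS `ping`; the function name `ping_avg_ms` is hypothetical.

```shell
# Print the average round-trip time in milliseconds from ping's summary line,
# e.g. "rtt min/avg/max/mdev = 0.045/0.052/0.060/0.006 ms".
ping_avg_ms() {
  awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Example usage:
#   ping -c 5 "${HOST}" | ping_avg_ms
```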

### Check the network throughput between the replica set members
```shell
# Requires an iperf server (iperf -s) already running on the target host
iperf -c "${HOST}"
```

## Repair

### Define the hostnames or IP addresses of the primary and secondary nodes
```shell
PRIMARY="PLACEHOLDER"

SECONDARY="PLACEHOLDER"
```

### Check the status of the primary node
```shell
# Checks network reachability only, not whether mongod is running
if ping -c 1 "$PRIMARY" &> /dev/null
then
    echo "Primary node is up."
else
    echo "Primary node is down."
fi
```

### Check the status of the secondary node
```shell
# Checks network reachability only, not whether mongod is running
if ping -c 1 "$SECONDARY" &> /dev/null
then
    echo "Secondary node is up."
else
    echo "Secondary node is down."
fi
```
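A successful `ping` shows only that the host answers ICMP. To check that something is actually listening on the MongoDB port, a TCP connection test is more telling. The sketch below relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command; the function name `port_open`, the 3-second timeout, and port 27017 in the usage comment are assumptions for illustration.

```shell
# Return success if a TCP connection to host:port can be opened within 3 seconds.
# Uses bash's /dev/tcp redirection, so it must run under bash.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage:
#   if port_open "$PRIMARY" 27017; then echo "Primary accepts connections."; fi
#   if port_open "$SECONDARY" 27017; then echo "Secondary accepts connections."; fi
```

An open port still does not prove the member is healthy, so follow up with `rs.status()` on any node that accepts connections.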

### Reduce the write concern settings on the MongoDB cluster to allow for faster replication.
```shell


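#!/bin/bash

# Set the MongoDB URI (must point at the primary)
MONGODB_URI="PLACEHOLDER"

# Set the new write concern value for w, for example 1 or majority
NEW_WRITE_CONCERN="PLACEHOLDER"

# db.getMongo().setWriteConcern() affects only the current shell session, so
# set the cluster-wide default instead (setDefaultRWConcern needs MongoDB 4.4+).
# A lower write concern stops writes from waiting on lagging secondaries; it
# does not speed up replication itself and it weakens durability guarantees.
mongo "$MONGODB_URI" --eval "
  var w = '$NEW_WRITE_CONCERN';
  w = isNaN(w) ? w : Number(w);
  printjson(db.adminCommand({setDefaultRWConcern: 1, defaultWriteConcern: {w: w}}));
"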


```

### Increase the replication buffer size to allow for more data to be replicated between the primary and secondary nodes.
```shell
#!/bin/bash

# This script treats the oplog as the replication buffer: it is what
# secondaries replicate from, and its size decides how far they can lag before
# needing a resync. replSetResizeOplog needs MongoDB 3.6+ with WiredTiger,
# applies only to the member it runs on, and does not require a restart.

# Set the new oplog size in megabytes (minimum 990)
NEW_SIZE="PLACEHOLDER"

# Get the current maximum oplog size in megabytes
CURRENT_SIZE=$(mongo --quiet --eval "print(Math.floor(db.getSiblingDB('local').oplog.rs.stats().maxSize / 1048576))")

# Check if the new size is greater than the current size
if [ "$NEW_SIZE" -gt "$CURRENT_SIZE" ]
then
  # Resize the oplog on the connected member
  mongo --quiet --eval "printjson(db.adminCommand({replSetResizeOplog: 1, size: Number($NEW_SIZE)}))"

  # Output success message
  echo "Oplog resize to ${NEW_SIZE} MB requested."
else
  # Output error message
  echo "New size must be greater than the current size (${CURRENT_SIZE} MB)."
fi


```



