Symptoms

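Production services reported a large number of API timeouts and then recovered on their own. The service logs showed they could not connect to Redis, and it turned out Redis had restarted by itself. Checking the Redis logs: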

Liveness probe failed:

The container log shows: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis
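Besides the probe failures, it is useful to know how often this warning actually fired. A minimal sketch, assuming the Redis log has been saved to a plain text file (for example with kubectl logs): count_aof_fsync_warnings is a hypothetical helper that counts log lines containing the warning text. Redis also keeps a running total of these events as aof_delayed_fsync in the INFO persistence section, which is usually the quicker check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed prefix of the NOTICE that flushAppendOnlyFile() logs when it stops
 * postponing the AOF write and calls write(2) while fsync is still running. */
static const char *AOF_FSYNC_WARNING =
    "Asynchronous AOF fsync is taking too long";

/* Hypothetical helper: count lines in `log` that contain the warning.
 * Lines longer than the buffer are read in several fgets() chunks and each
 * chunk is checked on its own; Redis log lines are short, so this is fine
 * for a quick diagnostic. */
int count_aof_fsync_warnings(FILE *log) {
    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, log) != NULL) {
        if (strstr(line, AOF_FSYNC_WARNING) != NULL)
            count++;
    }
    return count;
}
```

Usage: open the saved log with fopen(path, "r") and pass the FILE * in; a count that jumps around the incident time points at the disk rather than at Redis itself.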

Conclusion

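In this setup Redis runs in Kubernetes on NFS-backed storage, and the underlying disks are not high-IOPS SSDs. Redis has AOF enabled with appendfsync everysec. When a background fsync is still running and Redis has already postponed the AOF write for 2 seconds, it stops waiting and calls write(2), which then blocks behind the fsync. While it is blocked Redis cannot process any commands, so it logs this message and requests from the business services time out (and, presumably, the liveness probe times out too, which is why the container was restarted).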

Check the StorageClass:

Check the Redis configuration:

How it works

Searching the Redis source for this log message leads to flushAppendOnlyFile() in src/aof.c.

----------- Function comment -----------
/* Write the append only file buffer on disk.
 *
 * Since we are required to write the AOF before replying to the client,
 * and the only way the client socket can get a write is entering when the
 * the event loop, we accumulate all the AOF writes in a memory
 * buffer and write it on disk using this function just before entering
 * the event loop again.
 *
 * About the 'force' argument:
 *
 * When the fsync policy is set to 'everysec' we may delay the flush if there
 * is still an fsync() going on in the background thread, since for instance
 * on Linux write(2) will be blocked by the background fsync anyway.
 * When this happens we remember that there is some aof buffer to be
 * flushed ASAP, and will try to do that in the serverCron() function.
 *
 * However if force is set to 1 we'll write regardless of the background
 * fsync. */

In plain terms, the first paragraph says: this function flushes the contents of the AOF buffer to disk. The AOF has to be written before Redis replies to a client, and replies only go out once the event loop runs again, so Redis accumulates all AOF writes in an in-memory buffer and calls this function to write them to disk just before entering the next event-loop iteration.
----------- Function -----------
#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    if (sdslen(server.aof_buf) == 0) return;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(BIO_AOF_FSYNC) != 0;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                /* Note: no write has been postponed yet, so record when the
                 * postponement started and return without calling write
                 * or fsync. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                /* Note: the write was already postponed because of fsync,
                 * but for less than 2 seconds, so return again without
                 * calling write or fsync. */
                return;
            }
            /* Otherwise fall trough, and go write since we can't wait
             * over two seconds. */
            /* Note: a background fsync is still running and the write has
             * been postponed for >= 2 seconds, so write now (the write
             * will block). */
            server.aof_delayed_fsync++;
            serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* ... rest of the function (the actual write(2) of aof_buf) omitted ... */
}

What makes a Redis fsync take more than 2 s?

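1. appendfsync is everysec and no-appendfsync-on-rewrite is no. Redis then keeps fsyncing the AOF every second even while an AOF rewrite is running, and the rewrite child is writing the whole dataset to disk at the same time. With heavy write traffic on a slow disk, the fsync waits become severe.
2. The write volume alone is more than the disk can sustain.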
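To check whether the NFS-backed volume can keep up, one can time fdatasync(2) directly on it. A rough sketch, assuming a Linux/POSIX environment: measure_fdatasync_ms is a hypothetical helper, and the 1 MiB test size used below is an arbitrary choice, not something Redis does. It uses fdatasync(2) because that is what Redis's background thread calls for the AOF on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1000.0 +
           (double)(b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Hypothetical probe: append `bytes` bytes to `path`, then time a single
 * fdatasync(2) on it. Returns the latency in ms, or -1.0 on any error. */
double measure_fdatasync_ms(const char *path, size_t bytes) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1.0;

    char *buf = malloc(bytes > 0 ? bytes : 1);
    if (buf == NULL) { close(fd); return -1.0; }
    memset(buf, 'x', bytes);

    size_t done = 0;
    while (done < bytes) {                 /* handle short writes */
        ssize_t n = write(fd, buf + done, bytes - done);
        if (n < 0) { free(buf); close(fd); return -1.0; }
        done += (size_t)n;
    }
    free(buf);

    /* Only the flush is timed: the write above just fills the page cache. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = fdatasync(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return rc == 0 ? elapsed_ms(t0, t1) : -1.0;
}
```

For example, calling measure_fdatasync_ms("/data/fsync_probe.dat", 1 << 20) inside the Redis pod (with /data standing in, hypothetically, for wherever the Redis volume is mounted) and comparing the result with a local disk gives a feel for the gap. If a single fdatasync regularly takes hundreds of milliseconds or more under load, the 2-second postponement window becomes much easier to exceed.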

Why does Redis block writes once the wait exceeds 2 s?

The function comment contains this sentence: When the fsync policy is set to 'everysec' we may delay the flush if there is still an fsync() going on in the background thread, since for instance on Linux write(2) will be blocked by the background fsync anyway.

In other words, with the everysec policy Redis postpones the flush while the background thread is still inside fsync(), because on Linux the write(2) would just block behind that fsync anyway.

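Redis is built on I/O multiplexing: a single thread serves many client connections. It turns network requests into events and handles them one at a time, and just before entering the next event-loop iteration it calls write(2) to append the AOF buffer to the file, which copies the data into the kernel's page cache. If that write(2) blocks, the single thread cannot move on, and no further events are processed.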

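On Linux, a write(2) to a file can be blocked while an fdatasync(2) is running on the same file. If system I/O is busy, for example because other applications are writing to the disk, or because Redis itself is doing an AOF rewrite or an RDB snapshot (these write to a separate temporary file and each one writes sequentially, but on spinning disks switching between the two files lengthens head seek time), fdatasync(2) can take a long time. The write(2) then blocks behind it, and because Redis is single-threaded, the whole of Redis blocks with it.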
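This blocking can be probed with a small experiment. The sketch swaps Redis's background thread for a child process (to avoid a pthread dependency): the child repeatedly calls fdatasync(2) on a file while the parent times its own write(2) appends to the same file. The function name is hypothetical, and how much the writes actually stall depends on kernel version, filesystem and storage; on fast local disks or tmpfs the numbers will likely be tiny.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double ms_between(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1000.0 +
           (double)(b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Hypothetical experiment: a child calls fdatasync(2) `syncs` times on
 * `path` while the parent times `writes` appends of `chunk` bytes
 * (chunk <= 64 KiB). Returns the worst write(2) latency in ms, or -1.0. */
double worst_write_ms_during_fdatasync(const char *path, int writes,
                                       size_t chunk, int syncs) {
    static char buf[64 * 1024];
    if (chunk > sizeof buf) return -1.0;
    memset(buf, 'a', chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1.0;

    pid_t pid = fork();
    if (pid < 0) { close(fd); return -1.0; }
    if (pid == 0) {
        /* Child: stands in for Redis's background fsync thread. */
        for (int i = 0; i < syncs; i++)
            (void)fdatasync(fd);
        _exit(0);
    }

    /* Parent: stands in for the main thread appending to the AOF. */
    double worst = 0.0;
    for (int i = 0; i < writes; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ssize_t n = write(fd, buf, chunk);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n < 0) { worst = -1.0; break; }
        double ms = ms_between(t0, t1);
        if (ms > worst) worst = ms;
    }
    waitpid(pid, NULL, 0);
    close(fd);
    return worst;
}
```

Running it against a file on the Redis volume and on a local disk, with the same arguments, is a cheap way to compare how long a single append can stall while a flush is in flight.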

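Redis does have a safeguard: when it sees an fdatasync(2) still in progress, it skips write(2) and keeps the data in its own buffer (aof_buf). If the fsync is still running after 2 seconds, though, Redis calls write(2) anyway; the main thread blocks, the log message above is printed, and aof_delayed_fsync is incremented.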
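The postpone-then-force behavior can be modeled as a small state machine. This is a simplified sketch, not Redis code: the type and function names (aof_model, aof_flush_decide) are hypothetical, only appendfsync everysec with force == 0 is covered, and time is counted in whole seconds as with server.unixtime.

```c
#include <assert.h>
#include <time.h>

/* Possible outcomes of one flush attempt. */
enum aof_flush_action {
    AOF_WRITE_NOW,      /* no fsync pending: write(2) right away */
    AOF_POSTPONE,       /* fsync pending for < 2 s: keep data in aof_buf */
    AOF_WRITE_DELAYED   /* fsync pending for >= 2 s: write anyway, may block */
};

/* Hypothetical model state; field names echo the Redis server fields. */
struct aof_model {
    time_t flush_postponed_start; /* 0 means "not currently postponing" */
    long delayed_fsync;           /* counterpart of server.aof_delayed_fsync */
};

/* Decide what one call to the flush path would do at time `now`. */
enum aof_flush_action aof_flush_decide(struct aof_model *m,
                                       int sync_in_progress, time_t now) {
    if (!sync_in_progress) {
        m->flush_postponed_start = 0;    /* reset after performing the write */
        return AOF_WRITE_NOW;
    }
    if (m->flush_postponed_start == 0) {
        m->flush_postponed_start = now;  /* first postponement */
        return AOF_POSTPONE;
    }
    if (now - m->flush_postponed_start < 2)
        return AOF_POSTPONE;             /* still inside the 2 s window */
    m->delayed_fsync++;                  /* Redis logs the NOTICE here */
    m->flush_postponed_start = 0;        /* reset after performing the write */
    return AOF_WRITE_DELAYED;
}
```

With an fsync stuck in the background, calls at seconds 100, 101 and 102 return AOF_POSTPONE, AOF_POSTPONE and then AOF_WRITE_DELAYED, and delayed_fsync goes up by one, matching the 2-second window in the flushAppendOnlyFile() excerpt.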

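So with appendfsync everysec, an unexpected Redis crash while fsync is lagging can lose up to about 2 seconds of writes (the data still held in aof_buf). If fdatasync keeps up, a crash of the Redis process alone loses essentially nothing, because the data has already been handed to the kernel with write(2); only an operating-system crash can lose data, and then less than about 1 second's worth.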

Solutions

1. Switch to SSD disks

Whether this is possible depends on the resources available in the environment.

2. Tune the configuration

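When Redis calls write(2), the data lands in the page cache and the operating system decides on its own when to write those dirty pages back to disk. To make the kernel flush earlier and more often, you can tune a kernel parameter.

Check the current dirty-page byte limit; 0 means no byte limit is set and the kernel uses vm.dirty_ratio instead:
sysctl -a | grep vm.dirty_bytes

Set it to 32 MB (33554432 bytes) so that the kernel forces writeback once that much dirty data has accumulated. Flushing more often in smaller batches keeps any single fsync from having a large backlog to push out, which is what causes the long blocking:
echo "vm.dirty_bytes=33554432" >> /etc/sysctl.conf
sysctl -p   # apply the new value without rebooting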
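To confirm which limit is in effect before and after the change, a small reader for the sysctl file can help. This is a sketch with hypothetical helper names; it relies on vm.dirty_bytes and vm.dirty_ratio being counterparts, where a dirty_bytes value of 0 means the percentage in dirty_ratio applies. Note that vm.* parameters are not namespaced, so in Kubernetes they are set on the node and affect every pod on it, not just Redis.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a sysctl file such as /proc/sys/vm/dirty_bytes.
 * Returns -1 if the file cannot be opened or does not start with a number. */
long read_sysctl_long(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    long value = 0;
    int ok = fscanf(f, "%ld", &value) == 1;
    fclose(f);
    return ok ? value : -1;
}

/* Returns 1 if a byte limit (vm.dirty_bytes > 0) is in effect, 0 if the
 * kernel is falling back to vm.dirty_ratio, or -1 if the file is unreadable. */
int dirty_limit_is_bytes(const char *dirty_bytes_path) {
    long v = read_sysctl_long(dirty_bytes_path);
    if (v < 0) return -1;
    return v > 0;
}
```

Once the new setting has been applied, read_sysctl_long("/proc/sys/vm/dirty_bytes") should return 33554432, which is 32 × 1024 × 1024.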

3. Disable RDB or AOF

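This is not an option in production unless Redis is used purely as a cache. A compromise is to enable AOF on the replica and disable it on the master.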
