Dealing With A MySQL Deadlock Problem Caused By Concurrent DML

2020/12/06

记一次由并发DML引起的MySQL死锁场景

一次出差中,在客户现场遇到了由并发DML引起的MySQL死锁场景

场景再现

首先把场景重现并简化,触发条件:①事务隔离级别为InnoDB默认的REPEATABLE-READ,②有多个Session同时进行事务操作,而事务的内容是先删除一行特定记录然后重新插入该记录。

如下图所示:

数据库show engine innodb status吐出的死锁信息如下:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-12-06 22:14:23 0x70000ca32000
*** (1) TRANSACTION:
TRANSACTION 1563, ACTIVE 34 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 21, OS thread handle 123145523712000, query id 86 localhost root update
insert into tb values(1)

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 2 page no 4 n bits 72 index PRIMARY of table `test`.`tb` trx id 1563 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2 page no 4 n bits 72 index PRIMARY of table `test`.`tb` trx id 1563 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (2) TRANSACTION:
TRANSACTION 1564, ACTIVE 21 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 29, OS thread handle 123145523408896, query id 87 localhost root update
insert into tb values(2)

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2 page no 4 n bits 72 index PRIMARY of table `test`.`tb` trx id 1564 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2 page no 4 n bits 72 index PRIMARY of table `test`.`tb` trx id 1564 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (2)

从死锁信息中可以看出,两个事务持有的锁是锁在相同的一块区域上:

0: len 8; hex 73757072656d756d; asc supremum;;

死锁分析

上面已经指出了不同的SQL语句会加什么锁,更加详细的内容可以在这里看:15.7.3 Locks Set by Different SQL Statements in InnoDB

我们只看本次场景涉及到的DELETE和INSERT语句。

DELETE语句的锁操作

DELETE FROM ... WHERE ... sets an exclusive next-key lock on every record the search encounters. 
However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.

DELETE FROM ... WHERE ...会在所有满足条件的记录上都设置一个排他的next-key锁,即锁住之前的Gap和待删除的记录。
如果删除的数据比当前最大数据max还大,就会锁住(max, +∞)这个GAP;如果删除的数据比当前最小数据min还小,则会锁住(-∞, min)这个GAP。

不过要注意:如果待搜索字段是用唯一键搜索的,那么只会加一个Index Record Lock。

至于为何会出现+∞和-∞两个概念,请看这里22.2.1.3 The Infimum and Supremum Records

INSERT语句的锁操作

插入语句由于引入意向锁(Insert Intetion Lock),会稍微复杂一些。

INSERT sets an exclusive lock on the inserted row. This lock is an index-record lock, not a next-key lock (that is, there is no gap lock) and does not prevent other sessions from inserting into the gap before the inserted row.

INSERT 会在待插入记录上加排他锁,它是行锁而非GAP锁,并不会阻塞其他会话在该行之前的GAP插入数据。

Prior to inserting the row, a type of gap lock called an insert intention gap lock is set. This lock signals the intent to insert in such a way that multiple transactions inserting into the same index gap need not wait for each other if they are not inserting at the same position within the gap. 

但在插入之前,一个名为Insert Intention Gap Lock的GAP锁会加在待插入区域,本次场景中即为(5,+∞)区域。

GAP锁的作用

GAP锁作用是,防止其他事务在这个gap内的插入,但不排斥其他事务在同一个gap内加GAP锁,因此Gap X锁和Gap S锁效果相同。

解决

直接解决方法:将事务隔离级别由REPEATABLE-READ改为READ-COMMITTED。

因为在RR隔离级别中,读会加排他锁,而RC隔离级别中读只会加共享锁,这样可以避免出现本次场景中的互锁现象。