BDB deadlocks


8000009f dd= 0 locks held 2    write locks 2    pid/thread 65927/0
8000009f WRITE         1 WAIT    history.db                page          1
8000009f WRITE         1 HELD    history.db                page          3
8000009f WRITE         1 HELD    data.db                   page          7
800000a4 dd= 0 locks held 2    write locks 1    pid/thread 65927/0
800000a4 WRITE         1 WAIT    history.db                page          3
800000a4 READ          1 HELD    history.db                page          1
800000a4 WRITE         1 HELD    data.db                   page          9

Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
8000009f WRITE         1 HELD    data.db                   page          7
800000a4 READ          1 HELD    history.db                page          1
8000009f WRITE         1 WAIT    history.db                page          1
       1 READ          1 HELD    data.db                   handle        0
       3 READ          1 HELD    history.db                handle        0
8000009f WRITE         1 HELD    history.db                page          3
800000a4 WRITE         1 WAIT    history.db                page          3

800000a4 WRITE         1 HELD    data.db                   page          9


That's a very fun dump. First, because my code does not take read locks at all: I use the read-modify-write flag (DB_RMW), which should make the read operation take a write lock. Second, because to read and then update an entry I apparently have to take two locks for the same entry, on pages 1 and 3, and the two threads take them in a different order.
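To make the cycle in the dump explicit, here is a minimal sketch that encodes the six page-lock rows above as a table and follows the "waits for" edges between lockers. The struct and the function names (`blocked_on`, `in_deadlock`) are made up for illustration and are not part of BDB. It also assumes each locker waits on at most one lock, which is the case in this dump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One page-lock row from the "grouped by locker" part of the dump. */
struct lock_row {
    unsigned locker;
    char mode;          /* 'R' = READ, 'W' = WRITE */
    int held;           /* 1 = HELD, 0 = WAIT */
    const char *file;
    int page;
};

static const struct lock_row rows[] = {
    { 0x8000009f, 'W', 0, "history.db", 1 },
    { 0x8000009f, 'W', 1, "history.db", 3 },
    { 0x8000009f, 'W', 1, "data.db",    7 },
    { 0x800000a4, 'W', 0, "history.db", 3 },
    { 0x800000a4, 'R', 1, "history.db", 1 },
    { 0x800000a4, 'W', 1, "data.db",    9 },
};
#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* Two lock modes conflict unless both are reads. */
int conflicts(char a, char b)
{
    return a == 'W' || b == 'W';
}

/* Return the locker that `waiter` is blocked on, or 0 if it is not waiting. */
unsigned blocked_on(unsigned waiter)
{
    for (size_t i = 0; i < NROWS; i++) {
        if (rows[i].locker != waiter || rows[i].held)
            continue;   /* only look at this locker's WAIT rows */
        for (size_t j = 0; j < NROWS; j++) {
            if (rows[j].held && rows[j].locker != waiter &&
                rows[j].page == rows[i].page &&
                strcmp(rows[j].file, rows[i].file) == 0 &&
                conflicts(rows[i].mode, rows[j].mode))
                return rows[j].locker;
        }
    }
    return 0;
}

/* Follow wait-for edges from `start`; coming back to `start` means deadlock. */
int in_deadlock(unsigned start)
{
    unsigned cur = start;

    assert(start != 0);     /* 0 is reserved as the "not blocked" marker */
    for (size_t hops = 0; hops < NROWS; hops++) {
        cur = blocked_on(cur);
        if (cur == 0)
            return 0;
        if (cur == start)
            return 1;
    }
    return 0;
}
```

Calling `in_deadlock(0x8000009f)` walks 8000009f -> 800000a4 -> 8000009f and reports a cycle: each locker waits on a history.db page the other one holds.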

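The code in question is rather trivial:
    /* Step 1: find the current record length, asking for a write lock. */
    memset(&key, 0, sizeof(DBT));
    memset(&data, 0, sizeof(DBT));

    key.data = cmd->id;
    key.size = DNET_ID_SIZE;

    /* Zero-length user buffer: the get should copy no data and only report the record length in data.size. */
    data.size = 0;
    data.flags = DB_DBT_USERMEM;

    /* DB_RMW requests a write lock instead of a read lock for this read. */
    err = e->db->get(e->db, txn, &key, &data, DB_RMW);

    offset = data.size;


    /* Step 2: append one struct dnet_io_attr at the end of the record with a partial put. */
    memset(&key, 0, sizeof(DBT));
    memset(&data, 0, sizeof(DBT));

    key.data = cmd->id;
    key.size = DNET_ID_SIZE;

    data.data = io;
    data.doff = offset;
    data.ulen = sizeof(struct dnet_io_attr);
    data.dlen = sizeof(struct dnet_io_attr);
    data.size = sizeof(struct dnet_io_attr);
    data.flags = DB_DBT_PARTIAL | DB_DBT_USERMEM;

    err = e->db->put(e->db, txn, &key, &data, 0);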

No other code is running.
My explanation (and it somehow correlates with the read locks above, which I never asked for) is related to the read-modify-write flag, and the problem can likely be observed with pure reads too. Page 1 above likely contains the index or some other metadata that has to be checked when reading, so it gets locked read-only. But the data entry itself, on page 3, is write-locked (because of the read-modify-write flag). Another thread has already checked the index and now wants to put a new entry into the btree (also on page 3), and thus has to write-lock the index, which deadlocks: thread 1 holds read lock A and waits to write-lock B, thread 2 holds write lock B and waits to write-lock A.
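The acquisition order guessed at above can be replayed against a toy lock table. This is only a sketch of the hypothesis, not of what BDB does internally: the table layout, `try_lock()` and `replay_scenario()` are invented for illustration, lockers are numbered 1 and 2 as in the paragraph above, and a request that would block simply returns 0 instead of sleeping.

```c
#include <assert.h>
#include <string.h>

enum lock_mode { LOCK_READ, LOCK_WRITE };

#define NPAGES      8
#define MAX_HOLDERS 4

/* Granted locks on one page. */
struct page_lock {
    int nholders;
    int holder[MAX_HOLDERS];
    enum lock_mode mode[MAX_HOLDERS];
};

static struct page_lock table[NPAGES];

/* Grant the lock and return 1, or return 0 if the request would block. */
int try_lock(int locker, int page, enum lock_mode want)
{
    struct page_lock *pl;

    assert(page >= 0 && page < NPAGES);
    pl = &table[page];

    for (int i = 0; i < pl->nholders; i++) {
        if (pl->holder[i] == locker)
            continue;   /* a locker does not conflict with itself */
        if (want == LOCK_WRITE || pl->mode[i] == LOCK_WRITE)
            return 0;   /* only READ + READ can share a page */
    }

    assert(pl->nholders < MAX_HOLDERS);
    pl->holder[pl->nholders] = locker;
    pl->mode[pl->nholders] = want;
    pl->nholders++;
    return 1;
}

/* Replay the guessed order; returns how many requests ended up blocked. */
int replay_scenario(void)
{
    int blocked = 0;

    memset(table, 0, sizeof(table));

    /* Thread 1: RMW get, index page 1 is only read-locked. */
    blocked += !try_lock(1, 1, LOCK_READ);
    /* Thread 2: put, already holds the data page 3 for writing. */
    blocked += !try_lock(2, 3, LOCK_WRITE);
    /* Thread 2: now needs to update the index on page 1. */
    blocked += !try_lock(2, 1, LOCK_WRITE);
    /* Thread 1: RMW wants the data page 3 for writing. */
    blocked += !try_lock(1, 3, LOCK_WRITE);

    return blocked;
}
```

Under these assumptions the first two requests are granted and the last two both block, each on a lock held by the other thread, which is the same shape as the dump.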

I cannot say whether BDB 4.7.25 really has this logic inside, but commenting out the RMW read fixes the problem.
Thinking...