50633: kv: unset CanForwardReadTimestamp flag on batches that span ranges r=nvanbenschoten a=nvanbenschoten

Fixes #50202.

In #50202, we saw that we would accidentally allow batches to be split on range boundaries and continue to carry the `CanForwardReadTimestamp` flag. This can lead to serializability violations under very specific conditions:
1. the first operation that the transaction performs is a batch of locking read requests (due to implicit or explicit SFU)
2. this batch of locking reads spans multiple ranges
3. this batch of locking reads is issued in parallel by the DistSender
4. this batch of locking reads hits contention and is bumped on at least one of the ranges due to a WriteTooOld error
5. an unreplicated lock from one of the non-refreshed sub-batches is lost during a lease transfer

It turns out that the `kv/contention` roachtest meets these requirements perfectly when implicit SFU support is added to UPSERT statements: #50180. It creates a tremendous amount of contention and issues a batch of locking ScanRequests during a LookupJoin as its first operation. This materializes as ConditionFailedErrors (which should be impossible) in the CPuts that the UPSERT issues to maintain the table's secondary index.

This PR fixes this bug by ensuring that if a batch is going to be split across ranges and any of its requests would need to refresh on read timestamp bumps, it does not have its `CanForwardReadTimestamp` flag set. It would be incorrect to allow part of a batch to perform a server-side refresh if another part of the batch might have returned a different result at the higher timestamp, which is a fancy way of saying that it needs to refresh because it is using optimistic locking. Such behavior could cause a transaction to observe an inconsistent snapshot and violate serializability.

It then adds support for locking scans to kvnemesis, which would have caught the bug fairly easily.

Finally, it fixes a KV API UX issue around locking scans and retry errors. Before this change, it was possible for a non-transactional locking scan (which itself doesn't make much sense) to hit a WriteTooOld retry error. This was caused by eager propagation of WriteTooOld errors from MVCC when FailOnMoreRecent was enabled for an MVCCScan.

I'd appreciate it if @itsbilal could give that last commit a review.

Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR UPDATE statement containing an IN clause could fail to observe a consistent snapshot and violate serializability.

Co-authored-by: Nathan VanBenschoten <[email protected]>
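
For illustration only, the rule described above (drop `CanForwardReadTimestamp` when a batch is split across ranges and any of its requests would need a refresh) can be sketched in a few lines of Go. This is not the code from this PR: `Request`, `Batch`, `NeedsRefresh`, `rangeFor`, and `splitByRange` are hypothetical names invented for the sketch, and ranges are modeled as a simple sorted list of split keys.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is a hypothetical stand-in for a KV request.
// NeedsRefresh marks requests whose result could differ if the
// read timestamp were bumped (e.g. a locking read).
type Request struct {
	Key          string
	NeedsRefresh bool
}

// Batch is a hypothetical stand-in for a KV batch.
type Batch struct {
	Requests                []Request
	CanForwardReadTimestamp bool
}

// rangeFor returns the index of the range holding key, where
// splits is a sorted list of range start keys (an assumption of
// this sketch, not how range lookup actually works).
func rangeFor(key string, splits []string) int {
	id := 0
	for _, s := range splits {
		if key >= s {
			id++
		}
	}
	return id
}

// splitByRange divides b into one sub-batch per range. If the
// batch ends up on more than one range and any request needs a
// refresh, no sub-batch may forward its read timestamp on its own.
func splitByRange(b Batch, splits []string) []Batch {
	byRange := make(map[int][]Request)
	needsRefresh := false
	for _, req := range b.Requests {
		id := rangeFor(req.Key, splits)
		byRange[id] = append(byRange[id], req)
		needsRefresh = needsRefresh || req.NeedsRefresh
	}

	canForward := b.CanForwardReadTimestamp
	if len(byRange) > 1 && needsRefresh {
		canForward = false
	}

	// Emit sub-batches in range order for deterministic output.
	ids := make([]int, 0, len(byRange))
	for id := range byRange {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	out := make([]Batch, 0, len(ids))
	for _, id := range ids {
		out = append(out, Batch{
			Requests:                byRange[id],
			CanForwardReadTimestamp: canForward,
		})
	}
	return out
}

func main() {
	splits := []string{"m"} // two ranges: keys below "m" and keys from "m" up
	b := Batch{
		Requests: []Request{
			{Key: "a", NeedsRefresh: true},
			{Key: "z", NeedsRefresh: true},
		},
		CanForwardReadTimestamp: true,
	}
	for i, sub := range splitByRange(b, splits) {
		fmt.Printf("sub-batch %d: canForward=%t\n", i, sub.CanForwardReadTimestamp)
	}
}
```

In this sketch a batch that stays on a single range keeps whatever flag value it arrived with. Only the combination of a multi-range split and at least one refresh-requiring request clears it, which matches the condition stated in the description.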
Showing with 242 additions and 28 deletions
This diff was suppressed by a .gitattributes entry.