This project is mirrored from https://github.com/cockroachdb/cockroach. Pull mirroring updated .
  1. 30 Jun, 2020 22 commits
    • craig[bot]'s avatar
      Merge #50768 #50831 #50839 · 7fa77eb4
      craig[bot] authored
      50768: colexec: use Get methods on native type slices r=jordanlewis a=jordanlewis
      
      Also, add //gcassert:inline assertions on all Get methods but the
      DatumVec ones and the gcassert linter, to prove that all of these Get
      methods are inlined.
      
      Here are some benchmarks. I didn't run them all, because I believe that
      the guaranteed inlining proven by gcassert implies that any benchmark
      differences must be due to "binary layout" or other ghosts.
      
      ```
      Distinct/Ordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=2-24     468µs ± 0%     462µs ± 0%  -1.26%  (p=0.000 n=10+10)
      
      name                                                                                      old speed      new speed      delta
      Distinct/Ordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=2-24  2.24GB/s ± 0%  2.27GB/s ± 0%  +1.28%  (p=0.000 n=10+10)
      
      CastOp/useSel=true/hasNulls=true/int_to_float-24      5.18µs ± 2%    4.60µs ± 1%  -11.21%  (p=0.000 n=9+9)
      CastOp/useSel=true/hasNulls=false/int_to_float-24     1.06µs ± 7%    1.03µs ± 5%     ~     (p=0.101 n=10+10)
      CastOp/useSel=false/hasNulls=true/int_to_float-24     5.33µs ± 1%    5.47µs ± 2%   +2.53%  (p=0.000 n=9+10)
      CastOp/useSel=false/hasNulls=false/int_to_float-24    1.08µs ± 0%    1.08µs ± 0%     ~     (p=0.136 n=10+10)
      
      name                                                old speed      new speed      delta
      CastOp/useSel=true/hasNulls=true/int_to_float-24    1.58GB/s ± 2%  1.78GB/s ± 1%  +12.62%  (p=0.000 n=9+9)
      CastOp/useSel=true/hasNulls=false/int_to_float-24   7.73GB/s ± 7%  7.93GB/s ± 5%     ~     (p=0.089 n=10+10)
      CastOp/useSel=false/hasNulls=true/int_to_float-24   1.54GB/s ± 1%  1.50GB/s ± 2%   -2.46%  (p=0.000 n=9+10)
      CastOp/useSel=false/hasNulls=false/int_to_float-24  7.60GB/s ± 0%  7.59GB/s ± 0%     ~     (p=0.143 n=10+10)
      ```
      
      Release note: None
      
      50831: kv/kvserver: Remove buffering from SSTSnapshotStorageFile r=itsbilal a=itsbilal
      
      When investigating a durability violation around ingested sstables
      that disappeared after a power outage, we realized that
      SSTSnapshotStorageFile does not flush its buffer before doing
      a Sync(). When used with a pebble SSTWriter, we expect Sync()
      to ensure all bytes written to that file are synced to disk,
      which is not what was happening here for the last couple bytes
      written to the buffer.
      
      This change removes buffering from SSTSnapshotStorageFile
      as pebble's SST writer already does its own buffering. This
      makes Sync() do what it is expected to do.
      
      Release note (bug fix): Fix a bug where a badly timed power outage
      or system crash could result in an error upon process restart.
      
      50839: partitionccl,kvserver: skip some flaky tests r=rytaft a=rytaft
      
      This commit skips 4 tests as part of the test-infra-team flaky
      test cleanup:
      
      TestGossipHandlesReplacedNode
      TestStoreRangeMergeConcurrentRequests
      TestCheckConsistencyInconsistent
      TestRepartitioning
      
      Informs #50024
      Informs #50795
      Informs #50830
      Informs #49112
      
      Release note: None
      Co-authored-by: default avatarYahor Yuzefovich <[email protected]>
      Co-authored-by: default avatarJordan Lewis <[email protected]>
      Co-authored-by: default avatarBilal Akhtar <[email protected]>
      Co-authored-by: default avatarRebecca Taft <[email protected]>
      7fa77eb4
    • Bilal Akhtar's avatar
      kv/kvserver: Remove buffering from SSTSnapshotStorageFile · 26f8a572
      Bilal Akhtar authored
      When investigating a durability violation around ingested sstables
      that disappeared after a power outage, we realized that
      SSTSnapshotStorageFile does not flush its buffer before doing
      a Sync(). When used with a pebble SSTWriter, we expect Sync()
      to ensure all bytes written to that file are synced to disk,
      which is not what was happening here for the last couple bytes
      written to the buffer.
      
      This change removes buffering from SSTSnapshotStorageFile
      as pebble's SST writer already does its own buffering. This
      makes Sync() do what it is expected to do.
      
      Release note (bug fix): Fix a bug where a badly timed power outage
      or system crash could result in an error upon process restart.
      26f8a572
    • Rebecca Taft's avatar
      partitionccl,kvserver: skip some flaky tests · 8f478453
      Rebecca Taft authored
      This commit skips 4 tests as part of the test-infra-team flaky
      test cleanup:
      
      TestGossipHandlesReplacedNode
      TestStoreRangeMergeConcurrentRequests
      TestCheckConsistencyInconsistent
      TestRepartitioning
      
      Informs #50024
      Informs #50795
      Informs #50830
      Informs #49112
      
      Release note: None
      8f478453
    • Jordan Lewis's avatar
      colexec: use Get methods on native type slices · 49780f3f
      Jordan Lewis authored
      Also, add //gcassert:inline assertions on all Get methods but the
      DatumVec ones and the gcassert linter, to prove that all of these Get
      methods are inlined.
      
      Here are some benchmarks. I didn't run them all, because I believe that
      the guaranteed inlining proven by gcassert implies that any benchmark
      differences must be due to "binary layout" or other ghosts.
      
      ```
      Distinct/Ordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=2-24     468µs ± 0%     462µs ± 0%  -1.26%  (p=0.000 n=10+10)
      
      name                                                                                      old speed      new speed      delta
      Distinct/Ordered/hasNulls=false/newTupleProbability=0.010/rows=65536/cols=2/ordCols=2-24  2.24GB/s ± 0%  2.27GB/s ± 0%  +1.28%  (p=0.000 n=10+10)
      
      CastOp/useSel=true/hasNulls=true/int_to_float-24      5.18µs ± 2%    4.60µs ± 1%  -11.21%  (p=0.000 n=9+9)
      CastOp/useSel=true/hasNulls=false/int_to_float-24     1.06µs ± 7%    1.03µs ± 5%     ~     (p=0.101 n=10+10)
      CastOp/useSel=false/hasNulls=true/int_to_float-24     5.33µs ± 1%    5.47µs ± 2%   +2.53%  (p=0.000 n=9+10)
      CastOp/useSel=false/hasNulls=false/int_to_float-24    1.08µs ± 0%    1.08µs ± 0%     ~     (p=0.136 n=10+10)
      
      name                                                old speed      new speed      delta
      CastOp/useSel=true/hasNulls=true/int_to_float-24    1.58GB/s ± 2%  1.78GB/s ± 1%  +12.62%  (p=0.000 n=9+9)
      CastOp/useSel=true/hasNulls=false/int_to_float-24   7.73GB/s ± 7%  7.93GB/s ± 5%     ~     (p=0.089 n=10+10)
      CastOp/useSel=false/hasNulls=true/int_to_float-24   1.54GB/s ± 1%  1.50GB/s ± 2%   -2.46%  (p=0.000 n=9+10)
      CastOp/useSel=false/hasNulls=false/int_to_float-24  7.60GB/s ± 0%  7.59GB/s ± 0%     ~     (p=0.143 n=10+10)
      ```
      
      Release note: None
      49780f3f
    • craig[bot]'s avatar
      Merge #50650 #50770 #50815 #50832 · a6a58f4b
      craig[bot] authored
      50650: colexec: do not generate some dead code r=jordanlewis a=yuzefovich
      
      6 comparison operators that are handled by `proj_const_ops_tmpl.go` are
      normalized so that the constant is always on the right side. This
      renders the code for these comparison operators useless for the case
      when the constant is on the left, so we omit generating it.
      
      Release note: None
      
      50770: roachtest: update version map and create fixtures r=jlinder a=asubiotto
      
      This commit adds the recently released 19.1.10, 19.2.8, and 20.1.3 to the
      version map in PredecessorVersion.
      
      Release note: None (testing change)
      
      50815: opt: fix ResolvedType() for aggregateInfo in the optbuilder r=rytaft a=rytaft
      
      Prior to this commit, it was possible that calling `ResolvedType()`
      on an `aggregateInfo` object returned the wrong type. This was because
      `aggregateInfo` did not implement `ResolvedType()`, and was therefore
      passing the call to the embedded `tree.FuncExpr`, which may have been
      stripped of its original type information. This commit fixes the
      problem by adding an implementation of `ResolvedType()` to `aggregateInfo`,
      which simply returns the type of the aggregation column represented by
      the struct.
      
      Fixes #46914
      
      Release note (bug fix): Fixed an internal error that could happen
      during planning for some queries with aggregate functions embedded in
      complex scalar expressions.
      
      50832: Add [email protected] to AUTHORS r=jreut a=jreut
      
      Release note: None
      Co-authored-by: default avatarYahor Yuzefovich <[email protected]>
      Co-authored-by: default avatarAlfonso Subiotto Marques <[email protected]>
      Co-authored-by: default avatarRebecca Taft <[email protected]>
      Co-authored-by: default avatarjordan ryan reuter <[email protected]>
      a6a58f4b
    • jordan ryan reuter's avatar
      Add [email protected] to AUTHORS · cd1f021e
      jordan ryan reuter authored
      Release note: None
      cd1f021e
    • Rebecca Taft's avatar
      opt: fix ResolvedType() for aggregateInfo in the optbuilder · bcc89ba9
      Rebecca Taft authored
      Prior to this commit, it was possible that calling ResolvedType()
      on an aggregateInfo object returned the wrong type. This was because
      aggregateInfo did not implement ResolvedType(), and was therefore
      passing the call to the embedded tree.FuncExpr, which may have been
      stripped of its original type information. This commit fixes the
      problem by adding an implementation of ResolvedType() to aggregateInfo,
      which simply returns the type of the aggregation column represented by
      the struct.
      
      Fixes #46914
      
      Release note (bug fix): Fixed an internal error that could happen
      during planning for some queries with aggregate functions embedded in
      complex scalar expressions.
      bcc89ba9
    • craig[bot]'s avatar
      Merge #50678 · 733b32eb
      craig[bot] authored
      50678: sql: remove Impure flag r=RaduBerinde a=RaduBerinde
      
      This set of commits changes all uses of Impure to use volatilities instead, and removes the flag altogether.
      
      #### sql: use volatility when sanitizing expressions
      
      As part of work toward removing the coarse (and unreliable) `Impure` flag, this
      change modifies the expression sanitizers to take a `maxVolatility` argument
      instead of an `allowImpure` boolean. Currently we only support either Stable or
      Volatile as max-volatility. There are some TODO cases where we may want to
      support Immutable (but note that we would have to check operators and casts as
      well).
      
      Release note: None
      
      #### sql: use volatility in normalization / IsConst
      
      Release note: None
      
      #### sqlsmith: use volatility in sqlsmith check
      
      Update the impure check in sqlsmith to check the Volatility. We want only
      immutable functions so the results of any given query are consistent across
      servers.
      
      Release note: None
      
      #### sql: remove Impure flag
      
      Release note: None
      Co-authored-by: default avatarRadu Berinde <[email protected]>
      733b32eb
    • Yahor Yuzefovich's avatar
      colexec: do not generate some dead code · 0eedb3e1
      Yahor Yuzefovich authored
      6 comparison operators that are handled by `proj_const_ops_tmpl.go` are
      normalized so that the constant is always on the right side. This
      renders the code for these comparison operators useless for the case
      when the constant is on the left, so we omit generating it.
      
      Release note: None
      0eedb3e1
    • craig[bot]'s avatar
      Merge #50654 #50702 #50794 · e4b3bb90
      craig[bot] authored
      50654: geomfn: apply bounding box checks for DWithin/DFullyWithin r=sumeerbhola a=otan
      
      We are able to optimize DWithin/DFullyWithin by extended the bounding
      box for geometry types by distance units in each direction and
      checking if they intersect / covers.
      
      Also implement cartesian bounding box covers and use it for other
      relevant operations.
      
      Release note: None
      
      
      
      50702: localmetrics: add local Grafana timeseries tooling r=petermattis a=tbg
      
      
      
      Add a docker-compose setup that starts a local Grafana backed by a local
      Postgres along with a helper that can import timeseries data into the
      Postgres instance which the Grafana instance is configured to display.
      
      Consult scripts/localmetrics/README.md for a quickstart.
      
      This isn't a valuable debug tool just yet, but with a bit of elbow
      grease, I believe that it will become an invaluable tool to avoid
      the many back-and-forth round-trips we have these days with customers
      to exchange screenshots of the Admin UI.
      
      To make it truly useful, we need
      
      1. [timeseries in debug.zip](#50432)
      2. auto-generate dashboards from `./pkg/ts/catalog`.
      
      Both are totally doable, and even without 2) there's already some
      utility as it's easy to make ad-hoc panels in Grafana thanks to the
      built-in query builder.
      
      Finally, here's a screenshot of the one panel included here right now,
      the rate of DistSender batches, taken from `sample.csv`.
      
      
      ![image](https://user-images.githubusercontent.com/5076964/85878083-78756580-b7d8-11ea-88b4-fb515ef26186.png)
      
      
      Release note: None
      
      50794: sql: disallow creation of schemas with the "pg_" prefix r=arulajmani a=rohany
      
      This is disallowed by Postgres, and will be useful when integrating user
      defined schema resolution with the temporary schema (which isn't backed
      by a descriptor).
      
      Release note: None
      Co-authored-by: default avatarOliver Tan <[email protected]>
      Co-authored-by: default avatarTobias Schottdorf <[email protected]>
      Co-authored-by: default avatarRohan Yadav <[email protected]>
      e4b3bb90
    • Tobias Schottdorf's avatar
      localmetrics: add local Grafana timeseries tooling · 296fb4d9
      Tobias Schottdorf authored
      Add a docker-compose setup that starts a local Grafana backed by a local
      Postgres along with a helper that can import timeseries data into the
      Postgres instance which the Grafana instance is configured to display.
      
      Consult scripts/localmetrics/README.md for a quickstart.
      
      This isn't a valuable debug tool just yet, but with a bit of elbow
      grease, I believe that it will become an invaluable tool to avoid
      the many back-and-forth round-trips we have these days with customers
      to exchange screenshots of the Admin UI.
      
      To make it truly useful, we need
      
      1. [timeseries in debug.zip](https://github.com/cockroachdb/cockroach/pull/50432)
      2. auto-generate dashboards from `./pkg/ts/catalog`.
      
      Both are totally doable, and even without 2) there's already some
      utility as it's easy to make ad-hoc panels in Grafana thanks to the
      built-in query builder.
      
      Release note: None
      296fb4d9
    • Oliver Tan's avatar
      geomfn: apply bounding box checks for DWithin/DFullyWithin · da6cd48b
      Oliver Tan authored
      We are able to optimize DWithin/DFullyWithin by extended the bounding
      box for geometry types by distance units in each direction and
      checking if they intersect / covers.
      
      Also implement cartesian bounding box covers and use it for other
      relevant operations.
      
      Release note: None
      da6cd48b
    • craig[bot]'s avatar
      Merge #49997 #50802 · a9ff82f3
      craig[bot] authored
      49997: kvclient: merge RangeDescriptorCache and LeaseholderCache r=andreimatei a=andreimatei
      
      Each patch works towards integrating leaseholder infomation in the RangeDescriptorCache, culminating with the last commit which deletes the LeaseholderCache.
      
      The motivation for this is that having these two separate datastructures doesn't make sense: all the users want both descriptor and leaseholder information, and having incoherence between the two caches (i.e. a leaseholder that's not part of the cached descriptor, or a leaseholder with a missing descriptor) is very awkward.
      
      I've got patches coming that update these caches more, and having a single cache to deal with makes things much easier.
      
      50802: Makefile: fix vendor modvendor r=petermattis a=otan
      
      Since modvendor requires installation during vendor rebuilding, but a
      dependency may have changed, `go mod` complains about inconsistent
      versioning. To fix this, use -mod=mod when installing modvendor.
      
      Release note: None
      Co-authored-by: default avatarAndrei Matei <[email protected]>
      Co-authored-by: default avatarOliver Tan <[email protected]>
      a9ff82f3
    • craig[bot]'s avatar
      Merge #50759 · 955ccb95
      craig[bot] authored
      50759: backupccl: fix bug when restoring cluster with empty db as highest id r=pbardea a=pbardea
      
      Previously, there was a bug where cluster restore would fail when the
      largest descriptor in the backup was a database. This commit fixes the
      bug.
      
      Fixes #50561.
      
      Release note (bug fix): Previously, there was a bug where cluster
      restore would fail when the largest descriptor in the backup was a
      database. This is typically seen when the last action in backing up
      cluster was a database creation.
      Co-authored-by: default avatarPaul Bardea <[email protected]>
      955ccb95
    • Oliver Tan's avatar
      Makefile: fix vendor modvendor · 1895f708
      Oliver Tan authored
      Since modvendor requires installation during vendor rebuilding, but a
      dependency may have changed, `go mod` complains about inconsistent
      versioning. To fix this, use -mod=mod when installing modvendor.
      
      Release note: None
      1895f708
    • craig[bot]'s avatar
      Merge #50765 · 0b65365f
      craig[bot] authored
      50765: geo: fix BoundingBox for Geography types r=sumeerbhola a=otan
      
      Our previous assumption that bounding boxes for geography types was
      wrong -- we assumed that we can just compare min / max coordinates, but
      for spherical geometries bounding boxes can traverse latitude/longitude
      bounds and these min / max checks do not hold.
      
      As such, we need to replace these bounding boxes with the s2.Rect
      representation, which has complex logic that accounts for this
      behaviour. This means bounding boxes are different between GEOMETRY and
      GEOGRAPHY types.
      
      The following changes have been made:
      * Rename BoundingBox vars to Lo/Hi instead of Min/Max, to be more true
        to the name.
      * Move BoundingBox comparison logic to the geo package from the geopb
        package, using CartesianBoundingBox as the name for GEOMETRY types.
      * Change logic that converts between GEOMETRY and GEOGRAPHY types to
        change cached bounding boxes.
      * Add SpatialObjectType to the SpatialObject protobuf. This allows us to
        differentiate geo vs geom types (PostGIS does this too). We in theory
        don't need this, but I think it stops programmer error so I'd rather
        keep it around.
      * Added regression tests for this observed behaviour in the geogfn
        package.
      
      Release note: None
      Co-authored-by: default avatarOliver Tan <[email protected]>
      0b65365f
    • Oliver Tan's avatar
      geo: fix BoundingBox for Geography types · 86ad9e76
      Oliver Tan authored
      Our previous assumption that bounding boxes for geography types was
      wrong -- we assumed that we can just compare min / max coordinates, but
      for spherical geometries bounding boxes can traverse latitude/longitude
      bounds and these min / max checks do not hold.
      
      As such, we need to replace these bounding boxes with the s2.Rect
      representation, which has complex logic that accounts for this
      behaviour. This means bounding boxes are different between GEOMETRY and
      GEOGRAPHY types.
      
      The following changes have been made:
      * Rename BoundingBox vars to Lo/Hi instead of Min/Max, to be more true
        to the name.
      * Move BoundingBox comparison logic to the geo package from the geopb
        package, using CartesianBoundingBox as the name for GEOMETRY types.
      * Change logic that converts between GEOMETRY and GEOGRAPHY types to
        change cached bounding boxes.
      * Add SpatialObjectType to the SpatialObject protobuf. This allows us to
        differentiate geo vs geom types (PostGIS does this too). We in theory
        don't need this, but I think it stops programmer error so I'd rather
        keep it around.
      * Added regression tests for this observed behaviour in the geogfn
        package.
      
      Release note: None
      86ad9e76
    • craig[bot]'s avatar
      Merge #50754 · 9cecf3b3
      craig[bot] authored
      50754: opt: add indexed var for inverted column when execbuilding inverted joins r=rytaft a=rytaft
      
      This commit fixes an issue identified in #50709 in which the `execbuilder`
      code for building inverted joins was failing due to the lack of an indexed
      var for the column indexed by the inverted index. The indexed var is
      necessary since the inverted expression contains a reference to the inverted
      column. This commit fixes the issue by including the column in the indexed var
      helper.
      
      This commit does not include a release note since this issue was introduced
      a few days ago and has not been released.
      
      Release note: None
      Co-authored-by: default avatarRebecca Taft <[email protected]>
      9cecf3b3
    • Andrei Matei's avatar
      kvclient: switch DistSender to use the unified cache · f2ec747c
      Andrei Matei authored
      This patch finishes the transition to the combined range descriptor /
      leaseholder cache. The DistSender now queries and updates the range
      cache for leases. This patch deletes the LeaseholderCache.
      
      The leaseholder cache never made sense as a separate cache from the
      range descriptor cache, since the two need to be coherent. Having a
      leaseholder that's not part of the range descriptor cache is an awkward
      situation for the code to handle. Remnants of this awkwardness remain
      even with this patch (when a replica returns a NotLeaseHolderError
      pointing to a replica not in the cached descriptor), but at least we
      have a common data structure to address the issue in following patches.
      Information on descriptor and the leaseholder are also always needed in
      tandem, so the separate datastructures were unnatural for all the
      consumers.
      
      Release note: None
      f2ec747c
    • craig[bot]'s avatar
      Merge #50633 · a55e6339
      craig[bot] authored
      50633: kv: unset CanForwardReadTimestamp flag on batches that spans ranges r=nvanbenschoten a=nvanbenschoten
      
      Fixes #50202.
      
      In #50202, we saw that we would accidentally allow batches to be split on range boundaries and continue to carry the `CanForwardReadTimestamp` flag. This can lead to serializability violations under very specific conditions:
      1. the first operation that the transaction performs is a batch of locking read requests (due to implicit or explicit SFU)
      2. this batch of locking reads spans multiple ranges
      3. this batch of locking reads is issued in parallel by the DistSender
      4. this locking read hits contention and is bumped on at least one of the ranges due to a WriteTooOld error
      5. an unreplicated lock from one of the non-refreshed sub-batches is lost during a lease transfer.
      
      It turns out that the `kv/contention` roachtest meets these requirements perfectly when implicit SFU support is added to UPSERT statements: #50180. It creates a tremendous amount of contention and issues a batch of locking ScanRequests during a LookupJoin as its first operation. This materializes as ConditionFailedErrors (which should be impossible) in the CPuts that the UPSERT issues to maintain the table's secondary index.
      
      This PR fixes this bug by ensuring that if a batch is going to be split across ranges and any of its requests would need to refresh on read timestamp bumps, it does not have its CanForwardReadTimestamp flag set. It would be incorrect to allow part of a batch to perform a server-side refresh if another part of the batch might have returned a different result at the higher timestamp, which is a fancy way of saying that it needs to refresh because it is using optimistic locking. Such behavior could cause a transaction to observe an inconsistent snapshot and violate serializability.
      
      It then adds support for locking scans to kvnemesis, which would have caught to bug fairly easily.
      
      Finally, it fixes a KV API UX issue around locking scans and retry errors. Before this change, it was possible for a non-transactional locking scan (which itself doesn't make much sense) to hit a WriteTooOld  retry error. This was caused by eager propagation of WriteTooOld errors from MVCC when FailOnMoreRecent was enabled for an MVCCScan. I'd appreciate if @itsbilal could give that last commit a review.
      
      Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR UPDATE statement containing an IN clause could fail to observe a consistent snapshot and violate serializability.
      Co-authored-by: default avatarNathan VanBenschoten <[email protected]>
      a55e6339
    • Rohan Yadav's avatar
      sql: disallow creation of schemas with the "pg_" prefix · 7231d573
      Rohan Yadav authored
      This is disallowed by Postgres, and will be useful when integrating user
      defined schema resolution with the temporary schema (which isn't backed
      by a descriptor).
      
      Release note: None
      7231d573
    • craig[bot]'s avatar
      Merge #50755 #50782 · 61f2787d
      craig[bot] authored
      50755: storage,ccl: Plumbs session user information to the ExternalStorage r=miretskiy a=adityamaru
      
      The new UserFileTableSytem (#50493) needs knowledge of the user
      interacting with it to check access privileges on the underlying user
      scoped tables.  These interactions are most notably via IMPORT, BACKUP
      and CHANGEFEED jobs and processors.
      
      This change plumbs the session user string from these various entities
      via the factory methods used to create the ExternalStorage objects.  It
      also passes an internal executor and kvdb from server.Start(), which
      are required by the FileTableSystem to execute its user scoped SQL
      queries.
      
      This PR will be followed up by the implementation of the FileTable
      ExternalStorage, which will supply the user string, internal executor
      and kvDB to the UserFileTableSystem.
      
      Informs: #47211
      
      Release note: None
      
      50782: geo: minor fix ups to ST_Summary r=sumeerbhola a=otan
      
      Minor fixes to ST_Summary to make it more PostGIS like:
      * Correct ordering of the symbols.
      * Fix bounding boxes not being reported in sub structures.
      
      Release note: None
      Co-authored-by: default avatarAditya Maru <[email protected]>
      Co-authored-by: default avatarOliver Tan <[email protected]>
      61f2787d
  2. 29 Jun, 2020 18 commits
    • craig[bot]'s avatar
      Merge #50657 #50672 · f1d6bc33
      craig[bot] authored
      50657: kvserver: Replace multiTestContext with TestCluster in client_replica_gc_test r=lunevalex a=lunevalex
      
      Makes progress on #8299
      
      multiTestContext is legacy construct that is deprecated in favor of running
      tests via TestCluster. This is one PR out of many to remove the usage of
      multiTestContext in the client_replica_gc test cases.
      
      Release note: none
      
      50672: opt: add partial index implication logic for inexact atoms r=rytaft a=mgartner
      
      ### opt: rename partialidx/testdata/predicate/exact-atom to atom
      
      This commit renames the `exact-atom` test file. Future commits will be
      adding tests to this file that are not exact matches to atoms, which
      would make the current file name confusing.
      
      Release note: None
      
      #### opt: add partial index implication logic for inexact atoms
      
      This commit adds support for proving that a query filter implies a
      partial index predicate when it contains expressions that do not exactly
      match expressions in the predicate.
      
      For example, the filter `a > 10` implies the predicate `a > 0`, because
      all values greater than 10 are also greater than 0.
      
      An atom A (an expression that is not a conjunction or disjunction)
      implies another atom B when B contains A. To determine if an atom
      contains another, we rely on the `constraints` package, which robustly
      handles many types of operators (inequalities, `IN` expressions, etc.)
      and operands (integers, strings, tuples, etc.).
      
      Fixes #50218
      
      Release note: None
      Co-authored-by: default avatarAlex Lunev <[email protected]>
      Co-authored-by: default avatarMarcus Gartner <[email protected]>
      f1d6bc33
    • Radu Berinde's avatar
      sql: remove Impure flag · a8434379
      Radu Berinde authored
      Release note: None
      a8434379
    • Radu Berinde's avatar
      sqlsmith: use volatility in sqlsmith check · 4ee72a16
      Radu Berinde authored
      Update the impure check in sqlsmith to check the Volatility. We want only
      immutable functions so the results of any given query are consistent across
      servers.
      
      Release note: None
      4ee72a16
    • Radu Berinde's avatar
      sql: use volatility in normalization / IsConst · 5725ea04
      Radu Berinde authored
      Release note: None
      5725ea04
    • Radu Berinde's avatar
      sql: use volatility when sanitizing expressions · 66094721
      Radu Berinde authored
      As part of work toward removing the coarse (and unreliable) `Impure` flag, this
      change modifies the expression sanitizers to take a `maxVolatility` argument
      instead of an `allowImpure` boolean. Note that currently we only check
      functions, we will need to check operators and casts for the cases where we
      require Immutable expressions (this is left as a TODO).
      
      As part of this commit, we also correct the volatility of `concat`, `concat_ws`,
      `convert_to`, `convert_from`. This is easy to verify, as the implementations
      don't use the eval context. Note that the `concat` in Postgres can convert to
      string any argument (which is the non-immutable part), but our function only
      takes strings (which explains the difference).
      
      Release note: None
      66094721
    • craig[bot]'s avatar
      Merge #50708 · ba4fb1b2
      craig[bot] authored
      50708: geogfn: implement ST_Azimuth and correct angle normalization r=sumeerbhola a=otan
      
      * Implement ST_Azimuth
      * Correct angle normalization of lat/lng - these were previously
        incorrect as they clipped the wrong ends.
      
      Release note (sql change): Implement the ST_Azimuth functionality for
      the geography operator.
      Co-authored-by: default avatarOliver Tan <[email protected]>
      ba4fb1b2
    • Nathan VanBenschoten's avatar
      storage: return largest timestamp from Scan when FailOnMoreRecent · 3c656678
      Nathan VanBenschoten authored
      Don't just return the first timestamp we see. This ensures that the
      single retry in `Replica.executeReadOnlyBatchWithServersideRefreshes`
      is enough to ensure that non-transactional ScanForUpdate operations
      to not hit WriteTooOld retry errors.
      3c656678
    • Nathan VanBenschoten's avatar
      kv/kvnemesis: add support for locking scans · c78e0982
      Nathan VanBenschoten authored
      Without the call to unsetCanForwardReadTimestampFlag added to
      DistSender.divideAndSendBatchToRanges in the previous commit,
      `TestKVNemesisSingleNode` fails under stress in about 35 iterations.
      With the call, I haven't seen this fail in over 10,000 iterations.
      c78e0982
    • Nathan VanBenschoten's avatar
      kv/kvclient: unset CanForwardReadTimestamp flag on batch that spans ranges · 77dfe9e1
      Nathan VanBenschoten authored
      Fixes #50202.
      
      In #50202, we saw that we would accidentally allow batches to be split
      on range boundaries and continue to carry the `CanForwardReadTimestamp`
      flag. This can lead to serializability violations under very specific
      conditions:
      1. the first operation that the transaction performs is a batch of
         locking read requests (due to implicit or explicit SFU)
      2. this batch of locking reads spans multiple ranges
      3. this batch of locking reads is issued in parallel by the DistSender
      4. this locking read hits contention and is bumped on at least one of
         the ranges due to a WriteTooOld error
      5. an unreplicated lock from one of the non-refreshed sub-batches is
         lost during a lease transfer.
      
      It turns out that the `kv/contention` roachtest meets these requirements
      perfectly when implicit SFU support is added to UPSERT statements: #50180.
      It creates a tremendous amount of contention and issues a batch
      of locking ScanRequests during a LookupJoin as its first operation. This
      materializes as ConditionFailedErrors (which should be impossible) in
      the CPuts that the UPSERT issues to maintain the table's secondary
      index.
      
      This commit fixes this bug by ensuring that if a batch is going to be
      split across ranges and any of its requests would need to refresh on
      read timestamp bumps, it does not have its CanForwardReadTimestamp flag
      set. It would be incorrect to allow part of a batch to perform a
      server-side refresh if another part of the batch might have returned a
      different result at the higher timestamp, which is a fancy way of saying
      that it needs to refresh because it is using optimistic locking. Such
      behavior could cause a transaction to observe an inconsistent snapshot
      and violate serializability.
      
      Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR
      UPDATE statement containing an IN clause could fail to observe a
      consistent snapshot and violate serializability.
      77dfe9e1
    • craig[bot]'s avatar
      Merge #50710 · 08f1c789
      craig[bot] authored
      50710: opt: use normalized expression for lazy props r=RaduBerinde a=RaduBerinde
      
      Lazy properties like interesting orderings are calculated once per
      group. We currently use whatever expression in that group we looked at
      first, which makes things hard to reason about. Switch to always using
      the normalized expression.
      
      Also make the opttester use the normalized expression when filling in
      lazy properties (in non-test code, all of them are only calculated
      on the normalized expression).
      
      Release note: None
      Co-authored-by: default avatarRadu Berinde <[email protected]>
      08f1c789
    • Andrei Matei's avatar
      kvclient: modernize a test · fd5f9b84
      Andrei Matei authored
      This patch introduces subtests to a test and also moves most of the
      subtests away from using leases with sequence number 0. Such leases
      don't generally exist, and 0 is about to get a special meaning in the
      RangeCache, which will mess up the test.
      
      Release note: None
      fd5f9b84
    • Andrei Matei's avatar
      sql: switch DistSQL to use the combined cache · 66b3ae30
      Andrei Matei authored
      Release note: None
      66b3ae30
    • Andrei Matei's avatar
      roachpb: return lease info and more ranges on RangeMismatchError · 213de8d0
      Andrei Matei authored
      Before this patch, on range mismatch (request key span outside of the
      intended range id) the server would return descriptor info about the
      mismatched range. This patch adds leaseholder info, so that the
      information can be used more directly to update the kvclient's caches.
      And also because the lease info can come in very handy, particularly for
      the "suggested range" part of the response.
      The old fields are kept around for now for backwards compatibility.
      
      This patch also makes the error carry information about possibly more
      ranges, not just one or two as before. The error used to have info just
      on the range with the requested range id and optionally on the range
      covering the start of the requested key span; now the error carries info
      on other ranges that might exist in between.
      
      Release note: None
      213de8d0
    • Andrei Matei's avatar
      sql/physicalplan: make the spanResolver use the unified cache · 0a464d98
      Andrei Matei authored
      This patch decouples the binPackingOracle from the leaseholder cache,
      and also removes its dependency on gossip. Lease information is now
      gotten from the range descriptor cache by the span resolver and passed
      into the oracle.
      
      Release note: None
      0a464d98
    • Andrei Matei's avatar
      kvclient: store lease info in the RangeDescriptorCache · 06cd1cba
      Andrei Matei authored
      This commit represents preliminary work for combining the
      RangeDescriptorCache and the LeaseHolderCache. The patch expands the
      former to also maintain and expose lease info. This info is not
      populated yet. The cache becomes more sophisticated in how it deals with
      updates - for example one can update an entry's lease even through an
      EvictionToken referencing a slightly older descriptor (and this
      referencing a cache entry that doesn't exist any more).
      
      The DistSender and RangeIter interfaces are also expanded to expose
      lease info.
      
      Release note: None
      06cd1cba
    • Andrei Matei's avatar
      kvclient: EvictionToken values · 2f9dd738
      Andrei Matei authored
      This patch gets rid of EvictionToken.DoOnce. That Once protected against
      concurrent Evict()ions of the token, which made little sense: different
      lookups were returning a shared token only when they were coalesced onto
      a single db lookup flight. Generally, further queries on the cache
      returned new tokens, which could be evicted separately from each other.
      It's true that only the lookups coalesced onto a flight shared a
      "nextToken" (and they still do), but that's no reason to synchronize
      their evictions any more than any other evictions.
      
      Getting rid of that Once makes the token copiable. The patch switches
      everybody to using tokens by value, which opens the door in the future
      to updating a token (the previous sharing meant it had to be immutable).
      
      Release note: None
      2f9dd738
    • Andrei Matei's avatar
      kvclient: improve testing util · 18746855
      Andrei Matei authored
      We had a test transport that was returning empty replica descriptors,
      instead of replicas from the descriptor that it was provided. That
      worked only by accident, and will stop working.
      
      Release note: None
      18746855
    • craig[bot]'s avatar
      Merge #50715 #50772 #50773 · e6b273fe
      craig[bot] authored
      50715: distsender: improve DistSender's request sub-division behavior r=andreimatei a=andreimatei
      
      The DistSender sometimes needs to split a request in response to new
      information about the state of the range that the request is being sent
      to (i.e. if it finds out that a range has split, it will split the
      request as a consequence). The code dealing with this wasn't very good;
      it had distinct handling of the cases where a new descriptor(s) needs to
      be read versus the case where the batch needs to be split. It was
      therefore possible (and, in fact, happening) that a new descriptor is
      read from the database; this descriptor is smaller than the descriptor
      it's replacing. The DistSender doesn't check that this descriptor is
      smaller, and simply routes the request to it. The request gets rejected
      with a RangeKeyMismatchError, as could have been predicted. In response
      to the RangeKeyMismatchError the DistSender would finally split the
      batch.
      
      This patch improves the code to split the request immediately when a
      smaller descriptor is read and avoid the needless roundtrip for a
      RangeKeyMismatchError.
      
      Release note: None
      
      50772: kv/kvserver: don't assert Replica state after lease transfer r=nvanbenschoten a=nvanbenschoten
      
      During a large IMPORT, I saw that **3.89%** of CPU on the hottest node in the cluster was being spent in `assertStateLocked`. `assertStateLocked` loads various pieces of information about the Replica from on-disk and compares it to in-memory state. The IMPORT creates a large amount of read-amplification (40-60x on this node), which makes these read operations extra expensive.
      
      <img width="1668" alt="Screen Shot 2020-06-27 at 3 53 01 PM" src="https://user-images.githubusercontent.com/5438456/86042458-2d946180-ba15-11ea-842b-305049ef41d8.png">
      
      The assertion is important for catching bugs, but I think we're calling it a little too often. Specifically, in this case, I think it's being called so often because we perform the assertion on lease transfers.
      
      <img width="990" alt="Screen Shot 2020-06-27 at 3 51 42 PM" src="https://user-images.githubusercontent.com/5438456/86042478-384ef680-ba15-11ea-9aeb-5171cd820ddc.png">
      
      Lease transfers have much less of a chance of causing bugs than operations like Splits and Merges, so this commit stops asserting after lease transfers. While here, it also gives the same treatment to bumps to the GCThreshold.
      
      50773: kv/kvserver: avoid unnecessary lock contention during application r=nvanbenschoten a=nvanbenschoten
      
      This PR contains a mix of cleanup and a bit of perf optimization:
      
      #### kv/kvserver: remove unused uint64Slice in raft_log_queue.go
      
      This was just unused but being kept alive by the `sort.Interface` type assertion assignment.
      
      #### kv/kvserver: remove stale TODO in replicaAppBatch.ApplyToStateMachine
      
      The TODOs for tbg were correct – the bootstrap store no longer has nil split or merge queues, so the nil checks are not needed.
      
      The other TODO was added in 63b1120f and addressed four days later in 2d25bf0f by throttling queue additions.
      
      #### kv/kvserver: remove handleNoRaftLogDeltaResult
      
      The `if cmd.replicatedResult().RaftLogDelta == 0 {` condition looked like a bug, but it was just a premature optimization to avoid potentially queueing on log truncation entries. We already throttle using `raftLogLastCheckSize`, so this doesn't seem worth the complexity.
      
      Meanwhile, this call was incurring a Replica mutex lock on every proposal (not even on every application batch, because this was in ApplySideEffects), which is expensive. This commit merges this into the existing critical section in `ApplyToStateMachine`, which removes complexity and should remove lock contention.
      
      #### kv/kvserver: accept *ReplicatedEvalResult in handleNonTrivialReplicatedEvalResult
      
      The commit switches `replicaStateMachine.handleNonTrivialReplicatedEvalResult` to accept a `*ReplicatedEvalResult` instead of a `ReplicatedEvalResult`. Passing by value is already not stopping us from modifying it because the struct contains interior pointers. In fact, the method intends to modify the argument, so the pointer is just obscuring that fact. This now mirrors `clearTrivialReplicatedEvalResultFields`.
      
      ReplicatedEvalResult is also 232 bytes large, so it generally shouldn't be passed around by value.
      Co-authored-by: default avatarAndrei Matei <[email protected]>
      Co-authored-by: default avatarNathan VanBenschoten <[email protected]>
      e6b273fe