Skip to content

Conversation

@prestodb-ci
Copy link
Collaborator

Test PR for branch staging-953c2cc8e-pr with head 953c2cc

zhouyuan and others added 30 commits October 17, 2025 07:44
Removed a sed command that replaces 'oap-project' with 'IBM' in the get-velox.sh script.
Update the get-velox.sh script to replace 'ibm' with 'ibm-xxx'.
…tor#15615)

Summary:
Pull Request resolved: facebookincubator#15615

Adding an API to allow developers to fluently create (untyped)
expression trees without having to rely on a SQL parser. Details and
examples provided in code comments and unit tests.

Reviewed By: mbasmanova

Differential Revision: D87828592

fbshipit-source-id: 4bdc10ffb6814852698141df3ddbc53a49f75ca6
Summary:
Refactors the Iceberg connector implementation to separate it from Hive connector. Sets the foundation for independent evolution of both connectors.

Pull Request resolved: facebookincubator#15581

Reviewed By: kgpai

Differential Revision: D87666371

Pulled By: kagamiori

fbshipit-source-id: 4b29860470d795ce5a266df6f0a387a2cb45eb80
…iler" (facebookincubator#15632)

Summary:
Pull Request resolved: facebookincubator#15632

Queries fails with

```
VeloxUserError:  Scalar function presto.default.array_sort not registered with arguments: (ARRAY<ROW<numerator_metric_values:MAP<INTEGER,DOUBLE>,denominator_metric_values:MAP<INTEGER,DOUBLE>>>, FUNCTION<ROW<numerator_metric_values:MAP<INTEGER,DOUBLE>,denominator_metric_values:MAP<INTEGER,DOUBLE>>, ROW<numerator_metric_values:MAP<INTEGER,DOUBLE>,denominator_metric_values:MAP<INTEGER,DOUBLE>>, INTEGER>). Found function registered with the following signatures:
((array(T)) -> array(T))
((array(T),constant function(T,U)) -> array(T))
((array(T),constant function(T,T,bigint)) -> array(T))
```

Query shape
```
  SELECT
    "transform_values"(
      "transform_values"(
        "multimap_agg"(
          metric_dimensions,
          ROW (
            metric_dimensions_bitmap, pre_metric_stats
          )
        ),
        (k, v) -> "array_sort"(
          "remove_nulls"(v),
          (u, w) -> IF(
            (
              "json_format"(
                CAST(u AS json)
              ) > "json_format"(
                CAST(w AS json)
              )
            ),
            -1,
            IF(
              (
                "json_format"(
                  CAST(u AS json)
                ) = "json_format"(
                  CAST(w AS json)
                )
              ),
              0,
              1
            )
          )
        )
      ),
      (k, v) -> "map_from_entries"(v)
    ) pre_metric_stats_map
  FROM
    pre_metric_stats
  WHERE
    (metric_dimensions IS NOT NULL)
```

Original commit changeset: 63f1556253f7

Original Phabricator Diff: D87495859

Reviewed By: kevintang2022

Differential Revision: D87887320

fbshipit-source-id: dc7d567fc293e04aee16b90d2d3292d9f657d0d9
…expressions (facebookincubator#15635)

Summary: Pull Request resolved: facebookincubator#15635

Reviewed By: pedroerp

Differential Revision: D87904120

fbshipit-source-id: 1de705ec43a060fee18cd0d286eca88a074f9007
…ator#15611)

Summary:
Pull Request resolved: facebookincubator#15611

Add plan builder API to enable users to use parsed expressions, in
addition to SQL snippets.

Reviewed By: mbasmanova

Differential Revision: D87718424

fbshipit-source-id: e0468b3d02acfb278a92db88a5c3b04802ed04f5
…ubator#15634)

Summary:
X-link: facebookincubator/axiom#662

Pull Request resolved: facebookincubator#15634

Adding support for alias() and .alias() in expression builder. It
requires a new API in IExpr because IExpr are always immutable.

Reviewed By: mbasmanova

Differential Revision: D87737347

fbshipit-source-id: c37160763b5260fe581ed656461b3cc0924ddb9e
Summary:
Pull Request resolved: facebookincubator#15628

- Request context in makeVeloxContinuePromiseContract.
- Do not require context for empty promises.

Helps debug "Broken promise" exceptions.

Reviewed By: xiaoxmeng

Differential Revision: D87861045

fbshipit-source-id: 7dd038b836ac9e9d1ef0fa86956a1a523a8c912b
Summary:
Pull Request resolved: facebookincubator#15638

Avoiding unnecessary string copy from StringView to std::string.

Reviewed By: amitkdutta

Differential Revision: D87908833

fbshipit-source-id: adc8b0eec8ae1e92c1883d34d9a7ac5a565cd185
…y.concurrentArbitration/parallel (facebookincubator#15637)

Summary:
Pull Request resolved: facebookincubator#15637

Test can flake when we're running with `arbitratorCapacity=16MB` but memory pool `queryCapacity =128MB` - basically no participant can ever be spilled since they're all too small to hit that threshold.

The issue is the head of line blocking in resumeGlobalArbitrationWaitersLocked(). Say we have waiters [id=8 requestBytes=8MB, id=10 requestBytes=1MB, ..., id=20...] , and arbitrator has freeNonReserved=7.98MB. The  resume funcs tries waiter 8, gets `allocatedBytes=0` from `allocateCapacityLocked()`, then just breaks out without even trying waiter 10. So even though we could satisfy the 1MB request, we don't - everything gets blocked by the first waiter.
These two issues combined create a livelock: can't satisfy first waiter, can't reclaim memory (no victims hit minReclaimBytes=128MB), can't satisfy other waiters due to HOL blocker. After `kMemoryPoolAbortCapacityLimit=2.5 mins` timeout it started aborting.
The first time it gets unstuck by aborting some queries, but when it gets stuck again the second time, `checkIfTimeout()` fires when `op.hasTimeout()` returns true and we failed as timeout

1. Accept Timeout-Based Abort as Temporary Unblock Mechanism
The head-of-line blocking situation can be complicated and various, and the current abort-by-timeout is actually working as an unblock procedure. We might revisit the fairness vs. best-effort tradeoff later, but for now, let's keep it as it is. We can treat the timeout and cascading abort behavior as expected in these scenarios.

2. Reduce Timeout from 5 Minutes to 1 Minute
The 5-minute timeout is too generous. In most cases, when arbitration is not able to move forward, continuing to wait is just a waste of time due to the stuck situratoin. 1 minute is large enough.

Reviewed By: xiaoxmeng, tanjialiang

Differential Revision: D87905815

fbshipit-source-id: 1c02470efb425453ebcabae74b98658b21ef44e3
…#15598)

Summary:
Pull Request resolved: facebookincubator#15598

misc: Add TopNRowNumber in MemoryArbitrationFuzzer

Reviewed By: xiaoxmeng

Differential Revision: D87612585

fbshipit-source-id: 9a54bca3e3a3dcb067ffee42b730f4f50aac0a07
…incubator#15612)

Summary:
Pull Request resolved: facebookincubator#15612

freeUnusedCapacityWhenReclaimMemoryPool was flaky

the main test thread reads stats immediately after joining the allocation thread, but the global arbitration controller updates the reclaimedUsedBytes_ stat asynchronously  later.

1. Main thread: allocThread.join() - waits for allocation to complete
2. Main thread: arbitrator_->stats() - reads stats (returns 0B), and assertion failed
3. Global arbitration controller thread: Updates reclaimedUsedBytes_ = 512MB

From system  causality perspective: stats should reflect the state of the system before other threads can observe it.

Reviewed By: xiaoxmeng, tanjialiang

Differential Revision: D87809685

fbshipit-source-id: be593bc11db69ebf9f11e5c3f35c12d36bce1678
Summary:
Pull Request resolved: facebookincubator#15617

A recent change facebookincubator#14582
made the exception initialization codepath create and maintain a
static list of exceptions shared by all Cast to JSON expressions.
Since Velox error messages add additional context when generated,
the static exception ends up retaining the message which contains
the expression and the top level expression details of the very
first error produced. This then ends up being re-used across
expression objects and across different query plan resulting in
wrong error message being thrown.
This change reverts the original PR to ensure exceptions with the
expected context are thrown.

Reviewed By: Yuhta

Differential Revision: D87838044

fbshipit-source-id: 9b70ef28f670fe556cbea6aa0de984a667a8c643
Summary:
Pull Request resolved: facebookincubator#15639

StringView's are getting implicitly copied into std::string whenever
StringView is used in folly/Conv.h functions (folly::to() and similar).
Providing a specialization to avoid the implicit copy.

Reviewed By: peterenescu, mbasmanova

Differential Revision: D87929077

fbshipit-source-id: dfb9364243af624f0ce672527fbe1d5e6ccb4d4b
…ubator#15545)

Summary:
Pull Request resolved: facebookincubator#15545

This is a fallback mode to read flatmap as struct.  It is not used in Presto but needed in other ML compute engines.

Reviewed By: xiaoxmeng

Differential Revision: D87095482

fbshipit-source-id: f46dacb148e51b3a99447fc552164da715513033
Summary:
Pull Request resolved: facebookincubator#15644

Add intermediate type transform for KHyperLogLog to enable fuzzer testing.

**Change**
Adds KHyperLogLog <-> VARBINARY transform for Presto Java compatibility. KHLL is an intermediate type only (used for computation). Storage in hive column tables is as VARBINARY.

Also excludes KHLL from isConstantExprSupported() since it cannot be used as a constant literal in SQL (e.g., WHERE column = KHYPERLOGLOG X'...').

Follows the same pattern as SetDigest (D87554118)

Reviewed By: HeidiHan0000

Differential Revision: D87946570

fbshipit-source-id: 4249164e91e3fee6e93364333c460b668429c118
Summary:
Pull Request resolved: facebookincubator#15649

This diff reverts D87598240
This change introduced incorrect behavior as described in facebookincubator#15648.

Depends on D87598240

Reviewed By: amitkdutta

Differential Revision: D87955860

fbshipit-source-id: bea5622d14db4cedd0060d57b285d6e82b4a051e
…ebookincubator#15625)

Summary:
Pull Request resolved: facebookincubator#15625

**1. DEADLOCK**
We have observed hanging when parallelUnitLoader is turned on ( [stacktrace](https://www.internalfb.com/intern/everpaste/?handle=GGR-SCIae13bMmFZAKsB6AQpWsMJbsIXAAAB&phabricator_paste_number=2015904911)). It hanged because there is a circular dependency when memory arbitration is triggered (P2054964327)
- HashBuild triggers memory arbitration
- It acquires the **arbitration lock** and reclaim memory from TableScan
- That triggers TableScan's destruction which triggers ParallelUnitLoader destruction
- ParallelUnitLoader destructor waits for async load to finish
- the async load triggers memory arbitration again and tries to acquire the same **arbiration lock**

To break the dependency, we no longer wait for asyc load operation at ParallelUnitLoader destruction.

**2. LIFE CYCLE**
To make sure asyc load operation can still function after ParallelUnitLoader destruction and DwrfRowReader destruction:
- Capture DwrfUnit as shared_pointer to keep it alive after ParallelUnitLoader is destroyed
- Make DwrfUnit hold onto the used components:
   - Pass ReaderBase as shared pointer
    - Pass ColumnReaderStatistics as shared pointer
    - Pass columnReaderOptions as copy instead of reference

Reviewed By: xiaoxmeng

Differential Revision: D87828770

fbshipit-source-id: e4a89b3eca5a119cbe716774a4d276e3e4d225d3
)

Summary:
Pull Request resolved: facebookincubator#15626

There are more use cases integrated with velox that needs SerializedPage to be extensible. Extract interface as BaseSerializedPage and make the current one PrestoSerializedPage. There's no logic change in the codebase.

Reviewed By: xiaoxmeng

Differential Revision: D87852061

fbshipit-source-id: 63fa36ed82b9e05411442f04fc2dd154af071b99
…15616)

Summary:
The error message can be misleading when op.participant()->maxCapacity()) > Arbitrator's capacity_

e.g,
[2025-11-22T14:46:29.718-08:00] E1122 14:41:27.973918 1762211 Exceptions.h:53] Line: fbcode/velox/common/memory/SharedArbitrator.cpp:1070, Function: growCapacity, Expression: Exceeded memory pool capacity. Can't grow default_root_7 capacity **with 27.00MB. This will exceed its max capacity 128.00MB, current capacity 11.00MB.**

Pull Request resolved: facebookincubator#15616

Reviewed By: xiaoxmeng, tanjialiang

Differential Revision: D87829034

Pulled By: duxiao1212

fbshipit-source-id: f55f1a6ab38905c2ad4b1f8953bf72c44bf61c66
…tor#15653)

Summary:
Pull Request resolved: facebookincubator#15653

Improve thread efficiency by switching to use AsycSource
- when load() is queued and not running on IOThreadPool, allow caller (driver thread) to directly pick up the work
- when ParallelUnitLoader is destroyed, cancel any load() that hasn't started running

Reviewed By: xiaoxmeng

Differential Revision: D87993259

fbshipit-source-id: 07d2dba712f5a69a9e6d87a7427acc693ccf9ad8
rui-mo and others added 26 commits December 2, 2025 00:06
Signed-off-by: Yuan <[email protected]>

Alchemy-item: (ID = 878) [OAP] Support struct schema evolution matching by name commit 1/1 - 58e3d01
Alchemy-item: (ID = 883) [OAP] [13620] Allow reading integers into smaller-range types  commit 1/1 - 4cae2f5
… outer join

Signed-off-by: Yuan <[email protected]>

Alchemy-item: (ID = 882) [OAP] [11771] Fix smj result mismatch issue commit 1/1 - ada7dd2
… and anti join

Address comments

disable by default

Alchemy-item: (ID = 884) [OAP] Build hash table while adding input rows for left semi and anti join commit 1/1 - b7c9034
Alchemy-item: (ID = 901) [OAP] [14722] Fix memory leak caused by asynchronous prefetch commit 1/1 - 075e790
This reverts commit fd0682b.

Alchemy-item: (ID = 910) Iceberg staging hub commit 1/16 - a387657
…#15461)"

This reverts commit 7576f4e.

Alchemy-item: (ID = 910) Iceberg staging hub commit 2/16 - 88d70b3
…facebookincubator#15477)"

This reverts commit 1895711.

Alchemy-item: (ID = 910) Iceberg staging hub commit 3/16 - 90a3c0c
…incubator#15443)"

This reverts commit 51d4a94.

Alchemy-item: (ID = 910) Iceberg staging hub commit 4/16 - f4f9f6c
…15423)"

This reverts commit 600524b.

Alchemy-item: (ID = 910) Iceberg staging hub commit 5/16 - b74e3af
The function toValues removes duplicated values from the vector and
return them in a std::vector. It was used to build an InPredicate. It
will be needed for building NOT IN filters for Iceberg equality delete
read as well, therefore moving it from velox/functions/prestosql/InPred
icate.cpp to velox/type/Filter.h. This commit also renames it to
deDuplicateValues to make it easier to understand.

Alchemy-item: (ID = 910) Iceberg staging hub commit 6/16 - 02e6679
This commit introduces EqualityDeleteFileReader, which is used to read
Iceberg splits with equality delete files. The equality delete files
are read to construct domain filters or filter functions, which then
would be evaluated in the base file readers.

When there is only one equality delete field, and when that field is
an Iceberg identifier field, i.e. non-floating point primitive types,
the values would be converted to a list as a NOT IN domain filter,
with the NULL treated separately. This domain filter would then be
pushed to the ColumnReaders to filter our unwanted rows before they
are read into Velox vectors. When the equality delete column is a
nested column, e.g. a sub-column in a struct, or the key in a map,
such column may not be in the base file ScanSpec. We need to add/remove
these subfields to/from the SchemaWithId and ScanSpec recursively if
they were not in the ScanSpec already. A test is also added for such
case.

If there are more than one equality delete field, or the field is not
an Iceberg identifier field, the values would be converted to a typed
expression in the conjunct of disconjunts form. This expression would
be evaluated as the remaining filter function after the rows are read
into the Velox vectors. Note that this only works for Presto now as
the "neq" function is not registered by Spark. See https://github.com/
facebookincubator/issues/12667

Note that this commit only supports integral types. VARCHAR and
VARBINARY need to be supported in future commits (see
facebookincubator#12664).

Co-authored-by: Naveen Kumar Mahadevuni <[email protected]>

# Conflicts:
#	velox/connectors/hive/iceberg/tests/IcebergReadTest.cpp

# Conflicts:
#	velox/dwio/common/ScanSpec.h

# Conflicts:
#	velox/type/Filter.h

Alchemy-item: (ID = 910) Iceberg staging hub commit 7/16 - b24a9d8
# Conflicts:
#	velox/connectors/hive/HiveConnectorUtil.cpp

Alchemy-item: (ID = 910) Iceberg staging hub commit 8/16 - f83cddc
Co-authored-by: Chengcheng Jin <[email protected]>

Alchemy-item: (ID = 910) Iceberg staging hub commit 9/16 - 4eafc56
Alchemy-item: (ID = 910) Iceberg staging hub commit 10/16 - b6b1dcd
Alchemy-item: (ID = 910) Iceberg staging hub commit 11/16 - d6c88cc
…finity.

Alchemy-item: (ID = 910) Iceberg staging hub commit 12/16 - ee705c2
Alchemy-item: (ID = 910) Iceberg staging hub commit 13/16 - 782b78a
Alchemy-item: (ID = 910) Iceberg staging hub commit 14/16 - afb0694
Alchemy-item: (ID = 910) Iceberg staging hub commit 15/16 - 5437985
Alchemy-item: (ID = 910) Iceberg staging hub commit 16/16 - 136f9b1
Signed-off-by: Yuan <[email protected]>

Alchemy-item: (ID = 906) fix: Adding daily tests commit 1/2 - e2eb2c6
we can cache ccache on every build even on failure, since ibm/velox is
always incremental build

Alchemy-item: (ID = 906) fix: Adding daily tests commit 2/2 - 0899ddc
Signed-off-by: Yuan <[email protected]>

Alchemy-item: (ID = 756) fix: Remove website folder to bypass the security issues commit 1/1 - f7b127b
Signed-off-by: Yuan <[email protected]>

Alchemy-item: (ID = 902) fix: Disable flaky test temporary commit 1/1 - 0bc5c50
@prestodb-ci
Copy link
Collaborator Author

@prestodb-ci
Copy link
Collaborator Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.