@carloea2
Contributor

@carloea2 carloea2 commented Dec 7, 2025

What changes were proposed in this PR?

  • DB / schema

    • Add dataset_upload_session and dataset_upload_session_part tables to track multipart upload sessions, per-part status, and S3/LakeFS metadata.
  • Backend (DatasetResource)

    • Partially new multipart upload API:

      • POST /dataset/multipart-upload?type=init → creates a LakeFS multipart session, stores it in DB, and returns an uploadToken.
      • POST /dataset/multipart-upload/part?token=...&partNumber=... → streams a single part to the presigned URL, with row-level locking and PENDING/UPLOADING/COMPLETED state transitions.
      • POST /dataset/multipart-upload?type=finish|abort → completes or aborts the LakeFS multipart upload and cleans up DB records.
    • Keep existing access control and dataset permissions enforced on all new endpoints.

  • Frontend service (dataset.service.ts)

    • Main changes in multipartUpload(...):

      • Calls init to get uploadToken.
      • Uploads file parts concurrently by streaming each one to /multipart-upload/part.
  • Frontend component (dataset-detail.component.ts)

    • Use uploadToken to cancel/abort.
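
The init → part → finish/abort flow described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR. The endpoint paths come from the description, but the injected `post` transport, the `{ uploadToken }` response shape, passing the token as a query parameter on finish/abort, and 1-based part numbers are all assumptions made for the example.

```typescript
// Sketch only. `Post` is a hypothetical transport so the flow can run without a server.
type Post = (url: string, body?: Uint8Array) => Promise<{ uploadToken?: string }>;

// Split a payload into fixed-size parts (the last part may be smaller).
function splitIntoParts(data: Uint8Array, partSize: number): Uint8Array[] {
  const parts: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += partSize) {
    parts.push(data.subarray(offset, offset + partSize));
  }
  return parts;
}

async function multipartUploadSketch(
  post: Post,
  data: Uint8Array,
  partSize: number,
  concurrency: number
): Promise<void> {
  const base = "/dataset/multipart-upload";
  const init = await post(`${base}?type=init`);
  if (!init.uploadToken) throw new Error("init did not return an uploadToken");
  const token: string = encodeURIComponent(init.uploadToken);

  const parts = splitIntoParts(data, partSize);
  let next = 0; // index of the next part that has not been claimed yet

  // Each worker repeatedly claims the next unclaimed part until none remain.
  const worker = async (): Promise<void> => {
    while (next < parts.length) {
      const index = next++;
      const partNumber = index + 1; // assumed 1-based, as in S3 multipart uploads
      await post(`${base}/part?token=${token}&partNumber=${partNumber}`, parts[index]);
    }
  };

  try {
    const workers: Promise<void>[] = [];
    for (let i = 0; i < Math.min(concurrency, parts.length); i++) {
      workers.push(worker());
    }
    await Promise.all(workers);
    // Assumption: finish/abort identify the session by the same token.
    await post(`${base}?type=finish&token=${token}`);
  } catch (err) {
    await post(`${base}?type=abort&token=${token}`);
    throw err;
  }
}
```

The worker-pool shape keeps at most `concurrency` part requests in flight. In this sketch a failed part triggers abort while other in-flight parts may still be running, so a real client would likely also cancel those requests.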

Any related issues, documentation, discussions?

Closes #4110


How was this PR tested?

  • Manually uploaded large files (single and multiple) via the dataset detail page and checked:

    • Progress, speed, and ETA updates.
    • Abort behavior (UI state + DB session cleanup).
    • Successful completion path (all parts COMPLETED, LakeFS object present, dataset version creation works).
  • Unit tests are not yet included.
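
As a starting point for the missing unit tests, the per-part states named in the description (PENDING, UPLOADING, COMPLETED) could be modeled as a small state machine. The sketch below is an assumption about how those states relate, not the backend logic in this PR. In particular, the retry transition (UPLOADING back to PENDING) and the helper names `canTransition` and `readyToFinish` are hypothetical.

```typescript
type PartStatus = "PENDING" | "UPLOADING" | "COMPLETED";

// Assumed transitions: a part is claimed (PENDING -> UPLOADING), then either
// succeeds (UPLOADING -> COMPLETED) or is released for retry (UPLOADING -> PENDING).
const allowedTransitions: Record<PartStatus, PartStatus[]> = {
  PENDING: ["UPLOADING"],
  UPLOADING: ["COMPLETED", "PENDING"],
  COMPLETED: [],
};

function canTransition(from: PartStatus, to: PartStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// A session is ready to finish only when it has parts and every part is COMPLETED.
function readyToFinish(parts: PartStatus[]): boolean {
  return parts.length > 0 && parts.every(status => status === "COMPLETED");
}
```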

Was this PR authored or co-authored using generative AI tooling?

Yes, GPT was used for parts of it.

@github-actions github-actions bot added the feature, ddl-change (Changes to the TexeraDB DDL), frontend (Changes related to the frontend GUI), and service labels Dec 7, 2025
@carloea2 carloea2 changed the title from "task(dataset): Redirect multipart upload through File Service #4110" to "refactor(dataset): Redirect multipart upload through File Service #4110" Dec 7, 2025
@Yicong-Huang Yicong-Huang changed the title from "refactor(dataset): Redirect multipart upload through File Service #4110" to "refactor(dataset): Redirect multipart upload through File Service" Dec 8, 2025
@carloea2
Contributor Author

@chenlica @aicam

As discussed, I will look for a solution that does not create new tables or mappings and relies more on LakeFS/MinIO.

@carloea2 carloea2 closed this Dec 10, 2025
@chenlica
Contributor

Thanks for checking the details. If you create a new PR, please mention it in this PR.
