@shreyashankar shreyashankar commented Nov 24, 2025

PR Description

  • Closes #461
  • Adds Python SDK support for code-based operations and exposes ExtractOp in the Python API: code_map, code_reduce, code_filter, and extract now work with docetl.api.Pipeline and typed schemas.
  • Enables passing real Python functions to code ops instead of stringified code.

Summary of changes

  • SDK schemas
    • Added CodeMapOp, CodeReduceOp, CodeFilterOp, ExtractOp to docetl.schemas.OpType.
  • API exports
    • Exported the above ops in docetl.api.__all__.
  • Pipeline loader
    • Updated Pipeline._update_from_dict to recognize code_map, code_reduce, code_filter, extract.
  • Callable support for code ops
    • docetl.operations.code_operations.*.schema.code now accepts a callable as well as a string; a callable is converted to source with inspect.getsource and bound to the name transform.
  • Tests
    • Added tests in tests/test_api.py for:
      • CodeMapOp with callable transform
      • CodeFilterOp with callable predicate
      • CodeReduceOp with callable group reducer
      • Export presence of ExtractOp
    • Tests are deterministic and avoid network/LLM calls.
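
The "Callable support for code ops" item above can be sketched roughly as follows. This is an illustration of the general technique (capture source with inspect.getsource, then alias the function as transform), not the code in this PR: callable_to_code, the user_ops module, and the explicit lambda/closure checks are hypothetical names and assumptions made for the example.

```python
import importlib.util
import inspect
import tempfile
import textwrap
from pathlib import Path


def callable_to_code(fn) -> str:
    """Hypothetical helper: turn a plain function into a code string defining `transform`."""
    if fn.__name__ == "<lambda>":
        raise ValueError("lambdas are not supported; use a regular def")
    if fn.__closure__:
        raise ValueError("closures are not supported; captured variables are not in the source")
    source = textwrap.dedent(inspect.getsource(fn))
    if fn.__name__ != "transform":
        # Alias the user's function under the name the code op is assumed to look up.
        source += f"\ntransform = {fn.__name__}\n"
    return source


# inspect.getsource reads the file a function was defined in, so the demo
# function is written to a temporary module to keep the sketch runnable anywhere.
module_path = Path(tempfile.mkdtemp()) / "user_ops.py"
module_path.write_text('def double(doc):\n    return {"double": doc["x"] * 2}\n')
spec = importlib.util.spec_from_file_location("user_ops", module_path)
user_ops = importlib.util.module_from_spec(spec)
spec.loader.exec_module(user_ops)

code = callable_to_code(user_ops.double)

# Executing the generated string yields a callable named `transform`.
namespace = {}
exec(code, namespace)
result = namespace["transform"]({"x": 21})
```

Because only the function's own source text is carried over, anything it needs (imports, constants) has to live inside the function body in a scheme like this one.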

Usage example

from docetl.api import Pipeline, Dataset, PipelineStep, PipelineOutput, CodeMapOp

def double(doc: dict) -> dict:
    return {"double": doc["x"] * 2}

pipeline = Pipeline(
    name="example",
    datasets={"input": Dataset(type="file", path="input.json")},
    operations=[CodeMapOp(name="double_x", type="code_map", code=double)],
    steps=[PipelineStep(name="s1", input="input", operations=["double_x"])],
    output=PipelineOutput(type="file", path="out.json"),
    default_model="gpt-4o-mini",
)

Files touched

  • docetl/schemas.py: add and export new op schemas; extend OpType
  • docetl/api.py: export new ops; handle new op types in Pipeline._update_from_dict
  • docetl/operations/code_operations.py: accept callables for code via schema validators
  • tests/test_api.py: add tests for code ops and ExtractOp

Backward compatibility

  • No breaking changes to existing APIs.
  • Code ops now accept either strings or callables; string behavior unchanged.

Docs

Notes

  • The callable must be a regular def (no lambdas or closures); its source is captured via inspect.getsource, so variables captured from an enclosing scope would not travel with it.
  • CodeMap: transform(doc: dict) -> dict
  • CodeFilter: transform(doc: dict) -> bool
  • CodeReduce: transform(group: list[dict]) -> dict
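
The three signatures listed above can be exercised in plain Python without DocETL or any LLM call. The sketch below is only an illustration: the sample documents, the function names, and the grouping by a "sku" key are assumptions made for the example, and it does not reproduce how the pipeline itself forms groups or combines outputs.

```python
from collections import defaultdict


def add_total(doc: dict) -> dict:
    """CodeMap shape: one document in, one dict of new fields out."""
    return {"total": doc["price"] * doc["qty"]}


def is_bulk(doc: dict) -> bool:
    """CodeFilter shape: one document in, keep/drop decision out."""
    return doc["qty"] >= 2


def summarize(group: list[dict]) -> dict:
    """CodeReduce shape: a list of documents in, one dict out."""
    return {"count": len(group), "qty": sum(d["qty"] for d in group)}


docs = [
    {"sku": "a", "price": 2.0, "qty": 3},
    {"sku": "a", "price": 2.0, "qty": 1},
    {"sku": "b", "price": 5.0, "qty": 2},
]

# Map: call the function once per document.
mapped = [add_total(d) for d in docs]

# Filter: keep the documents for which the predicate is true.
kept = [d for d in docs if is_bulk(d)]

# Reduce: group documents by a key (here "sku", an assumption), then reduce each group.
groups = defaultdict(list)
for d in docs:
    groups[d["sku"]].append(d)
reduced = {key: summarize(group) for key, group in groups.items()}
```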

@shreyashankar shreyashankar merged commit a184a3c into main Nov 24, 2025
4 checks passed
Development

Successfully merging this pull request may close these issues.

Code Map Operations for sdk not found
