Releases: tensorflow/tfx

TFX 0.23.1

24 Sep 21:12
d352f12

Version 0.23.1

  • This is a bug fix version (to resolve impossible dependency conflicts).

Major Features and Improvements

  • N/A

Bug fixes and other changes

  • Stopped depending on google-resumable-media.
  • Depends on apache-beam[gcp]>=2.24,<3.
  • Depends on tensorflow-data-validation>=0.23.1,<0.24.

Breaking changes

  • N/A

For pipeline authors

  • N/A

For component authors

  • N/A

Documentation updates

  • N/A

Deprecations

  • Deprecated Python 3.5 support.

TFX 0.24.0-rc1

22 Sep 23:53
7790b96
Pre-release

Major Features and Improvements

  • Use TFXIO and batched extractors by default in Evaluator.
  • Supported split configuration for Transform.
  • Added Python 3.8 support.

Bug fixes and other changes

  • Supported CAIP Runtime 2.2 for online prediction pusher.
  • Used 'python -m ' style for container entrypoints.
  • Stopped depending on Werkzeug.
  • Depends on absl-py>=0.9,<0.11.
  • Depends on apache-beam[gcp]>=2.24,<3.
  • Depends on ml-metadata>=0.24,<0.25.
  • Depends on tensorflow-data-validation>=0.24,<0.25.
  • Depends on tensorflow-model-analysis>=0.24.2,<0.25.
  • Depends on tensorflow-transform>=0.24,<0.25.
  • Depends on tfx-bsl>=0.24,<0.25.

Breaking changes

  • N/A

For pipeline authors

  • N/A

For component authors

  • N/A

Documentation updates

  • N/A

Deprecations

  • Deprecated Python 3.5 support.

TFX 0.24.0-rc0

16 Sep 21:38
9bbf305
Pre-release

Major Features and Improvements

  • Use TFXIO and batched extractors by default in Evaluator.
  • Supported split configuration for Transform.
  • Added Python 3.8 support.

Bug fixes and other changes

  • Supported CAIP Runtime 2.2 for online prediction pusher.
  • Stopped depending on Werkzeug.
  • Depends on absl-py>=0.9,<0.11.
  • Depends on ml-metadata>=0.24,<0.25.
  • Depends on tensorflow-data-validation>=0.24,<0.25.
  • Depends on tensorflow-model-analysis>=0.24,<0.25.
  • Depends on tensorflow-transform>=0.24,<0.25.
  • Depends on tfx-bsl>=0.24,<0.25.

Breaking changes

  • N/A

For pipeline authors

  • N/A

For component authors

  • N/A

Documentation updates

  • N/A

Deprecations

  • Deprecated Python 3.5 support.

TFX 0.23.0 Release

02 Sep 03:56
c46f843

Major Features and Improvements

  • Added TFX DSL IR compiler that encodes a TFX pipeline into a DSL proto.
  • Supported feature based split partition in ExampleGen.
  • Added the ConcatPlaceholder to tfx.dsl.component.experimental.placeholders.
  • Changed Span information to be a property of ExampleGen's output artifact.
    Deprecated ExampleGen input (external) artifact.
  • Added ModelRun artifact for Trainer for storing training related files,
    e.g., Tensorboard logs. Trainer's Model artifact now only contains pure
    models (check tfx/utils/path_utils.py for details).
  • Added support for tf.train.SequenceExample in ExampleGen:
    • ImportExampleGen now supports tf.train.SequenceExample importing.
    • base_example_gen_executor now supports tf.train.SequenceExample as
      output payload format, which can be utilized by custom ExampleGen.
  • Added Tuner component and its integration with Google Cloud Platform as
    the execution and hyperparameter optimization backend.
  • Switched Transform component to use the new TFXIO code path. Users may
    notice a large performance improvement.
  • Added support for primitive artifacts to InputValuePlaceholder.
  • Supported multiple artifacts for Trainer and Tuner's input example Channel.
  • Supported split configuration for Trainer and Tuner.
  • Supported split configuration for Evaluator.
  • Supported split configuration for StatisticsGen, SchemaGen and
    ExampleValidator. SchemaGen will now use all splits to generate schema
    instead of just using train split. ExampleValidator will now validate all
    splits against given schema instead of just validating eval split.
  • Component authors now can create a TFXIO instance to get access to the
    data through tfx.components.util.tfxio_utils. As TFX is going to
    support more data payload formats and data container formats, using
    tfxio_utils is encouraged to avoid dealing directly with each combination.
    TFXIO is the interface of Standardized TFX Inputs.
  • Added experimental BaseStubExecutor and StubComponentLauncher to test TFX
    pipelines.
  • Added experimental TFX Pipeline Recorder to record output artifacts of the
    pipeline.
  • Supported multiple artifacts in an output Channel to match a certain input
    Channel's artifact count. This enables Transform component to process
    multiple artifacts.
  • Transform component's transformed examples output is now optional (enabled
    by default). This can be disabled by specifying parameter
    materialize=False when constructing the component.
  • Supported Version spec in input config for file based ExampleGen.
  • Added custom config to Transform component and made it available to
    pre-processing fn.
  • Supported custom extractors in Evaluator.
  • Deprecated tensorflow dependency from MLMD python client.
  • Supported Date spec in input config for file based ExampleGen.

Bug fixes and other changes

  • Added Tuner component to Iris e2e example.
  • Relaxed the rule that output artifact uris must be newly created. This is a
    temporary workaround to make retry work. We will introduce a more
    comprehensive solution for idempotent execution.
  • Made evaluator output optional (while still recommended) for pusher.
  • Moved BigQueryExampleGen to tfx.extensions.google_cloud_big_query.
  • Moved BigQuery ML Pusher to tfx.extensions.google_cloud_big_query.pusher.
  • Removed Tuner from custom_components/ as it's supported under components/
    now.
  • Added support of non tf.train.Example protos as internal data payload
    format by ImportExampleGen.
  • Used thread local storage for label_utils.scoped_labels() to make it
    thread safe.
  • Requires Bazel to build TFX source code.
  • Upgraded Python version in TFX docker images to 3.7. Older versions of
    Python (2.7/3.5/3.6) are no longer available in tensorflow/tfx images
    on Docker Hub. Virtualenv is no longer used.
  • Stopped requiring avro-python3.
  • Depends on absl-py>=0.7,<0.9.
  • Depends on apache-beam[gcp]>=2.23,<3.
  • Depends on pyarrow>=0.17,<0.18.
  • Depends on attrs>=19.3.0,<20.
  • Depends on ml-metadata>=0.23,<0.24.
  • Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3.
    • Note: Dependencies such as tensorflow-transform might impose a narrower
      range of tensorflow.
  • Depends on tensorflow-data-validation>=0.23,<0.24.
  • Depends on tensorflow-model-analysis>=0.23,<0.24.
  • Depends on tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,<3.
  • Depends on tensorflow-transform>=0.23,<0.24.
  • Depends on tfx-bsl>=0.23,<0.24.
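
The exclusion syntax in the tensorflow requirement above is easy to misread. As a rough illustration only, the sketch below hand-rolls a check for that single range using the standard library. The function names are hypothetical, only plain numeric versions are handled, and real installs should rely on pip's resolver rather than this simplification.

```python
def parse(version):
    """Turn '2.3' into (2, 3, 0); only plain numeric versions are handled."""
    parts = tuple(int(part) for part in version.split("."))
    # Pad to three components so '2' is treated like '2.0.0'.
    return parts + (0,) * (3 - len(parts))


def satisfies_tf_range(version):
    """Check a version against >=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3."""
    v = parse(version)
    if v < (1, 15, 2):       # >=1.15.2
        return False
    if v >= (3, 0, 0):       # <3
        return False
    # !=2.0.*, !=2.1.*, !=2.2.* exclude every patch release of those minors.
    if v[:2] in {(2, 0), (2, 1), (2, 2)}:
        return False
    return True


for candidate in ["1.15.0", "1.15.2", "2.2.0", "2.3.0", "3.0.0"]:
    print(candidate, satisfies_tf_range(candidate))
```

In this sketch, 1.15.2 and 2.3.0 are accepted, while 1.15.0, 2.2.0 and 3.0.0 are rejected.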

Breaking changes

  • Changed the URIs of the value artifacts to point to files.

For pipeline authors

  • Moved BigQueryExampleGen to tfx.extensions.google_cloud_big_query. The
    previous module path from tfx.components is not available anymore. This is
    a breaking change.
  • Moved BigQuery ML Pusher to tfx.extensions.google_cloud_big_query.pusher.
    The previous module path from tfx.extensions.google_cloud_big_query_ml
    is not available anymore.
  • Updated Beam pipeline args: users now need to set both direct_running_mode
    and direct_num_workers explicitly for multi-processing.
  • Added required 'output_data_format' execution property to
    FileBaseExampleGen.
  • Changed ExampleGen to take a string as input source directly instead of a
    Channel of external artifact:
    • The previously deprecated input_base Channel is changed to string type
      instead of Channel. This is a breaking change; users should pass a string
      directly to input_base.
  • Fully removed csv_input and tfrecord_input in dsl_utils. This is a breaking
    change; users should pass a string directly to input_base.
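
For the Beam pipeline args change noted in this list, a minimal sketch of the flags involved might look like the following. `--direct_running_mode` and `--direct_num_workers` are Apache Beam DirectRunner options. The helper name `direct_runner_args` and the worker count are illustrative assumptions; the resulting list is the kind of value typically passed as `beam_pipeline_args` when constructing a pipeline.

```python
def direct_runner_args(num_workers):
    """Build Beam flags for multi-processing on the DirectRunner.

    Hypothetical helper for illustration; it only assembles flag strings.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    return [
        # Both flags are set explicitly, as the note above requires.
        "--direct_running_mode=multi_processing",
        f"--direct_num_workers={num_workers}",
    ]


beam_pipeline_args = direct_runner_args(num_workers=4)
print(beam_pipeline_args)
```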

For component authors

  • Changed GetInputSourceToExamplePTransform interface by removing input_dict.
    This is a breaking change; custom ExampleGens need to follow the interface
    change.
  • Changed ExampleGen to take a string as input source directly instead of a
    Channel of external artifact:
    • The input Channel is deprecated. Using input is still valid, but code
      should switch to the string-typed input_base as soon as possible.

Documentation updates

  • N/A

Deprecations

  • ExternalArtifact and the external_input function are deprecated. Using
    external_input with ExampleGen input is still valid, but code should
    switch to input_base as soon as possible.
  • Note: We plan to remove Python 3.5 support after this release.

TFX 0.23.0-rc0 Release

19 Aug 16:16
ed7640c
Pre-release

Version 0.23.0

Major Features and Improvements

  • Added TFX DSL IR compiler that encodes a TFX pipeline into a DSL proto.
  • Supported feature based split partition in ExampleGen.
  • Added the ConcatPlaceholder to tfx.dsl.component.experimental.placeholders.
  • Changed Span information to be a property of ExampleGen's output artifact.
    Deprecated ExampleGen input (external) artifact.
  • Added ModelRun artifact for Trainer for storing training related files,
    e.g., Tensorboard logs. Trainer's Model artifact now only contains pure
    models (check tfx/utils/path_utils.py for details).
  • Added support for tf.train.SequenceExample in ExampleGen:
    • ImportExampleGen now supports tf.train.SequenceExample importing.
    • base_example_gen_executor now supports tf.train.SequenceExample as
      output payload format, which can be utilized by custom ExampleGen.
  • Added Tuner component and its integration with Google Cloud Platform as
    the execution and hyperparameter optimization backend.
  • Switched Transform component to use the new TFXIO code path. Users may
    notice a large performance improvement.
  • Added support for primitive artifacts to InputValuePlaceholder.
  • Supported multiple artifacts for Trainer and Tuner's input example Channel.
  • Supported split configuration for Trainer and Tuner.
  • Supported split configuration for Evaluator.
  • Supported split configuration for StatisticsGen, SchemaGen and
    ExampleValidator. SchemaGen will now use all splits to generate schema
    instead of just using train split. ExampleValidator will now validate all
    splits against given schema instead of just validating eval split.
  • Component authors now can create a TFXIO instance to get access to the
    data through tfx.components.util.tfxio_utils. As TFX is going to
    support more data payload formats and data container formats, using
    tfxio_utils is encouraged to avoid dealing directly with each combination.
    TFXIO is the interface of Standardized TFX Inputs.
  • Added experimental BaseStubExecutor and StubComponentLauncher to test TFX
    pipelines.
  • Added experimental TFX Pipeline Recorder to record output artifacts of the
    pipeline.
  • Supported multiple artifacts in an output Channel to match a certain input
    Channel's artifact count. This enables Transform component to process
    multiple artifacts.
  • Transform component's transformed examples output is now optional (enabled
    by default). This can be disabled by specifying parameter
    materialize=False when constructing the component.
  • Supported Version spec in input config for file based ExampleGen.
  • Added custom config to Transform component and made it available to
    pre-processing fn.
  • Supported custom extractors in Evaluator.
  • Deprecated tensorflow dependency from MLMD python client.
  • Supported Date spec in input config for file based ExampleGen.

Bug fixes and other changes

  • Added Tuner component to Iris e2e example.
  • Relaxed the rule that output artifact uris must be newly created. This is a
    temporary workaround to make retry work. We will introduce a more
    comprehensive solution for idempotent execution.
  • Made evaluator output optional (while still recommended) for pusher.
  • Moved BigQueryExampleGen to tfx.extensions.google_cloud_big_query.
  • Moved BigQuery ML Pusher to tfx.extensions.google_cloud_big_query.pusher.
  • Removed Tuner from custom_components/ as it's supported under components/
    now.
  • Added support of non tf.train.Example protos as internal data payload
    format by ImportExampleGen.
  • Used thread local storage for label_utils.scoped_labels() to make it
    thread safe.
  • Requires Bazel to build TFX source code.
  • Upgraded Python version in TFX docker images to 3.7. Older versions of
    Python (2.7/3.5/3.6) are no longer available in tensorflow/tfx images
    on Docker Hub. Virtualenv is no longer used.
  • Stopped requiring avro-python3.
  • Depends on absl-py>=0.7,<0.9.
  • Depends on apache-beam[gcp]>=2.23,<3.
  • Depends on pyarrow>=0.17,<0.18.
  • Depends on attrs>=19.3.0,<20.
  • Depends on ml-metadata>=0.23,<0.24.
  • Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3.
    • Note: Dependencies such as tensorflow-transform might impose a narrower
      range of tensorflow.
  • Depends on tensorflow-data-validation>=0.23,<0.24.
  • Depends on tensorflow-model-analysis>=0.23,<0.24.
  • Depends on tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,<3.
  • Depends on tensorflow-transform>=0.23,<0.24.
  • Depends on tfx-bsl>=0.23,<0.24.
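
The thread-safety fix for label_utils.scoped_labels() listed above follows a common pattern. The sketch below is not TFX's implementation; it is a hypothetical, standard-library-only illustration of keeping a per-thread label stack so that concurrent threads do not see each other's labels. All names in it are assumptions made for the example.

```python
import contextlib
import threading

_state = threading.local()  # each thread gets its own attribute namespace


def _stack():
    # Lazily create the per-thread stack on first use in each thread.
    if not hasattr(_state, "labels"):
        _state.labels = []
    return _state.labels


@contextlib.contextmanager
def scoped_labels(labels):
    """Push labels for the duration of the with-block, for this thread only."""
    _stack().append(dict(labels))
    try:
        yield
    finally:
        _stack().pop()


def current_labels():
    """Merge all labels currently in scope on the calling thread."""
    merged = {}
    for layer in _stack():
        merged.update(layer)
    return merged


def worker(name, results):
    with scoped_labels({"component": name}):
        results[name] = current_labels()


results = {}
threads = [threading.Thread(target=worker, args=(n, results)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Each worker records only its own label, and the main thread's stack stays empty throughout.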

Breaking changes

  • Changed the URIs of the value artifacts to point to files.

For pipeline authors

  • Moved BigQueryExampleGen to tfx.extensions.google_cloud_big_query. The
    previous module path from tfx.components is not available anymore. This is
    a breaking change.
  • Moved BigQuery ML Pusher to tfx.extensions.google_cloud_big_query.pusher.
    The previous module path from tfx.extensions.google_cloud_big_query_ml
    is not available anymore.
  • Updated Beam pipeline args: users now need to set both direct_running_mode
    and direct_num_workers explicitly for multi-processing.
  • Added required 'output_data_format' execution property to
    FileBaseExampleGen.
  • Changed ExampleGen to take a string as input source directly instead of a
    Channel of external artifact:
    • The previously deprecated input_base Channel is changed to string type
      instead of Channel. This is a breaking change; users should pass a string
      directly to input_base.
  • Fully removed csv_input and tfrecord_input in dsl_utils. This is a breaking
    change; users should pass a string directly to input_base.

For component authors

  • Changed GetInputSourceToExamplePTransform interface by removing input_dict.
    This is a breaking change; custom ExampleGens need to follow the interface
    change.
  • Changed ExampleGen to take a string as input source directly instead of a
    Channel of external artifact:
    • The input Channel is deprecated. Using input is still valid, but code
      should switch to the string-typed input_base as soon as possible.

Documentation updates

  • N/A

Deprecations

  • ExternalArtifact and the external_input function are deprecated. Using
    external_input with ExampleGen input is still valid, but code should
    switch to input_base as soon as possible.
  • Note: We plan to remove Python 3.5 support after this release.

TFX 0.22.1 Release

17 Jul 20:32
745e5ac

Version 0.22.1

Major Features and Improvements

Bug fixes and other changes

  • Depends on 'tensorflowjs>=2.0.1.post1,<3' for [all] dependency.
  • Fixed the name of the usage telemetry when tfx templates are used.
  • Depends on tensorflow-data-validation>=0.22.2,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.2,<0.23.0.
  • Depends on tfx-bsl>=0.22.1,<0.23.0.
  • Depends on ml-metadata>=0.22.1,<0.23.0.

Breaking changes

N/A

For pipeline authors

N/A

For component authors

N/A

Documentation updates

N/A

Deprecations

N/A

TFX 0.22.1-rc1 Release

14 Jul 16:18
9241d9d
Pre-release

Version 0.22.1

Major Features and Improvements

Bug fixes and other changes

  • Depends on 'tensorflowjs>=2.0.1.post1,<3' for [all] dependency.
  • Fixed the name of the usage telemetry when tfx templates are used.
  • Depends on tensorflow-data-validation>=0.22.2,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.2,<0.23.0.
  • Depends on tfx-bsl>=0.22.1,<0.23.0.
  • Depends on ml-metadata>=0.22.1,<0.23.0.

Breaking changes

For pipeline authors

For component authors

Documentation updates

Deprecations

TFX 0.22.1-rc0 Release

14 Jul 02:56
573c077
Pre-release

Version 0.22.1

Major Features and Improvements

Bug fixes and other changes

  • Depends on 'tensorflowjs>=2.0.1.post1,<3' for [all] dependency.
  • Fixed the name of the usage telemetry when tfx templates are used.
  • Depends on tensorflow-data-validation>=0.22.2,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.2,<0.23.0.
  • Depends on tfx-bsl>=0.22.1,<0.23.0.
  • Depends on ml-metadata>=0.22.1,<0.23.0.

Breaking changes

For pipeline authors

For component authors

Documentation updates

Deprecations

TFX 0.22.0 Release

11 Jun 23:13
83c4806

Major Features and Improvements

  • Introduced experimental Python function component decorator (@component
    decorator under tfx.dsl.component.experimental.decorators) allowing
    Python function-based component definition.
  • Added the experimental TemplatedExecutorContainerSpec executor class that
    supports structural placeholders (not Jinja placeholders).
  • Added the experimental function "create_container_component" that
    simplifies creating container-based components.
  • Implemented a TFJS rewriter.
  • Added the scripts/run_component.py script which makes it easy to run the
    component code and executor code. (Similar to scripts/run_executor.py)
  • Added support for container component execution to BeamDagRunner.
  • Introduced experimental generic Artifact types for ML workflows.
  • Added support for float execution properties.

Bug fixes and other changes

  • Migrated BigQueryExampleGen to the new (experimental) ReadFromBigQuery
    PTransform when not using Dataflow runner.
  • Enhanced add_downstream_node / add_upstream_node to apply symmetric changes
    when being called. This method enables task-based dependencies by enforcing
    execution order for synchronous pipelines on supported platforms. Currently,
    the supported platforms are Airflow, Beam, and Kubeflow Pipelines. Note that
    this API call should be considered experimental, and may not work with
    asynchronous pipelines, sub-pipelines and pipelines with conditional nodes.
  • Added the container-based sample pipeline (download, filter, print)
  • Removed the incomplete cifar10 example.
  • Removed python-snappy from [all] extra dependency list.
  • Tests depend on apache-airflow>=1.10.10,<2.
  • Removed test dependency to tzlocal.
  • Fixed unintentional overriding of user-specified setup.py file for Dataflow
    jobs when running on KFP container.
  • Made ComponentSpec().inputs and .outputs behave more like real dictionaries.
  • Depends on kerastuner>=1,<2.
  • Depends on pyyaml>=3.12,<6.
  • Depends on apache-beam[gcp]>=2.21,<3.
  • Depends on grpcio>=2.18.1,<3.
  • Depends on kubernetes>=10.0.1,<12.
  • Depends on tensorflow>=1.15,!=2.0.*,<3.
  • Depends on tensorflow-data-validation>=0.22.0,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.1,<0.23.0.
  • Depends on tensorflow-transform>=0.22.0,<0.23.0.
  • Depends on tfx-bsl>=0.22.0,<0.23.0.
  • Depends on ml-metadata>=0.22.0,<0.23.0.
  • Fixed a bug in io_utils.copy_dir which prevented it from working correctly
    for nested sub-directories.
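
To make the nested sub-directory case from the copy_dir fix concrete, here is a small standard-library sketch of a recursive directory copy. It is not the TFX code; `copy_tree` is a hypothetical name, and the temporary paths exist only for the demonstration.

```python
import os
import shutil
import tempfile


def copy_tree(src, dst):
    """Recursively copy src into dst, preserving nested sub-directories."""
    for root, _dirs, files in os.walk(src):
        # Recreate this directory's path, relative to src, under dst.
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            shutil.copyfile(os.path.join(root, name),
                            os.path.join(target_dir, name))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(os.path.join(src, "a", "b"))
    with open(os.path.join(src, "a", "b", "leaf.txt"), "w") as f:
        f.write("nested")
    dst = os.path.join(tmp, "dst")
    copy_tree(src, dst)
    # The file two levels down should exist under the destination.
    copied_ok = os.path.exists(os.path.join(dst, "a", "b", "leaf.txt"))
    print(copied_ok)
```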

Breaking changes

For pipeline authors

  • Changed custom config for the Do function of Trainer and Pusher to accept
    a JSON-serialized dict instead of a dict object. This also impacts all the
    Do functions under tfx.extensions.google_cloud_ai_platform and
    tfx.extensions.google_cloud_big_query_ml. Note that this breaking
    change occurs at the signature of the executor's Do function. Therefore, if
    the user did not customize the Do function, and the compile time SDK version
    is aligned with the run time SDK version, previous pipelines should still
    work as intended. If the user is using a custom component with a customized
    Do function, custom_config should be assumed to be a JSON-serialized
    string from the next release.
  • For users of BigQueryExampleGen, --temp_location is now a required Beam
    argument, even for DirectRunner. Previously this argument was only required
    for DataflowRunner. Note that the specified value of --temp_location
    should point to a Google Cloud Storage bucket.
  • Reverted the current per-component cache API (with enable_cache, which was
    only available in tfx>=0.21.3,<0.22), in preparation for a future redesign.
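
For custom executors affected by the custom_config change described in this list, the adjustment is roughly a serialize/deserialize round trip. The sketch below uses only the standard library. It assumes custom_config arrives through the `exec_properties` dictionary of the Do function; the helper name `read_custom_config` and the config keys are hypothetical.

```python
import json


def read_custom_config(exec_properties):
    """Return custom_config as a dict, decoding it if it arrives as JSON text."""
    raw = exec_properties.get("custom_config")
    if raw is None:
        return {}
    if isinstance(raw, str):
        return json.loads(raw)  # new behaviour: JSON-serialized string
    return dict(raw)            # older behaviour: plain dict object


# What a customized Do function might now receive (keys are made up).
exec_properties = {
    "custom_config": json.dumps({"training_args": {"region": "us-central1"}})
}
config = read_custom_config(exec_properties)
print(config["training_args"]["region"])
```

Accepting both forms during a migration is a design choice for the example, not something the release notes prescribe.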

For component authors

  • Converted the BaseNode class attributes to the constructor parameters. This
    won't affect any components derived from BaseComponent.
  • Changed the encoding of the Integer and Float artifacts to be more portable.

Documentation updates

  • Added concept guides for understanding TFX pipelines and components.
  • Added guides to building Python function-based components and
    container-based components.
  • Added BulkInferrer component and TFX CLI documentation to the table of
    contents.

Deprecations

  • Deprecating Py2 support

TFX 0.22.0-rc0

03 Jun 20:07
592d245
Pre-release

Version 0.22.0

Major Features and Improvements

  • Implemented a TFJS rewriter.
  • Introduced experimental Python function component decorator (@component
    decorator under tfx.dsl.component.experimental.decorators) allowing
    Python function-based component definition.
  • Added the experimental TemplatedExecutorContainerSpec executor class that
    supports structural placeholders (not Jinja placeholders).
  • Migrated BigQueryExampleGen to the new (experimental) ReadFromBigQuery
    PTransform when not using Dataflow runner.
  • Added the experimental function "create_container_component" that
    simplifies creating container-based components.
  • Removed the incomplete cifar10 example.
  • Enhanced add_downstream_node / add_upstream_node to apply symmetric changes
    when being called. This method enables task-based dependencies by enforcing
    execution order for synchronous pipelines on supported platforms. Currently,
    the supported platforms are Airflow, Beam, and Kubeflow Pipelines. Note that
    this API call should be considered experimental, and may not work with
    asynchronous pipelines, sub-pipelines and pipelines with conditional nodes.
  • Added Tuner component.
  • Added the container-based sample pipeline (download, filter, print)
  • Added the scripts/run_component.py script which makes it easy to run the
    component code and executor code. (Similar to scripts/run_executor.py)
  • Added support for container component execution to BeamDagRunner.
  • Introduced experimental generic Artifact types for ML workflows.
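
The "symmetric changes" behaviour described for add_downstream_node / add_upstream_node can be pictured with a toy model. The class below is a hypothetical stand-in, not the TFX BaseNode; it only shows the idea that adding an edge from one side also records it on the other.

```python
class Node:
    """Toy stand-in for a pipeline node; not the TFX BaseNode class."""

    def __init__(self, name):
        self.name = name
        self.upstream_nodes = set()
        self.downstream_nodes = set()

    def add_upstream_node(self, other):
        # Record the edge on both ends so the two views never disagree.
        self.upstream_nodes.add(other)
        other.downstream_nodes.add(self)

    def add_downstream_node(self, other):
        self.downstream_nodes.add(other)
        other.upstream_nodes.add(self)


trainer = Node("trainer")
pusher = Node("pusher")
pusher.add_upstream_node(trainer)  # one call updates both nodes
print(trainer in pusher.upstream_nodes, pusher in trainer.downstream_nodes)
```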

Bug fixes and other changes

  • Removed python-snappy from [all] extra dependency list.
  • Tests depend on apache-airflow>=1.10.10,<2.
  • Removed test dependency to tzlocal.
  • Fixed unintentional overriding of user-specified setup.py file for Dataflow
    jobs when running on KFP container.
  • Made ComponentSpec().inputs and .outputs behave more like real dictionaries.
  • Depends on kerastuner>=1,<2.
  • Depends on pyyaml>=3.12,<6.
  • Depends on apache-beam[gcp]>=2.21,<3.
  • Depends on grpcio>=2.18.1,<3.
  • Depends on kubernetes>=10.0.1,<12.
  • Depends on tensorflow>=1.15,!=2.0.*,<3.
  • Depends on tensorflow-data-validation>=0.22.0,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.1,<0.23.0.
  • Depends on tensorflow-transform>=0.22.0,<0.23.0.
  • Depends on tfx-bsl>=0.22.0,<0.23.0.
  • Depends on ml-metadata>=0.22.0,<0.23.0.

Breaking changes

For pipeline authors

  • Changed custom config for the Do function of Trainer and Pusher to accept
    a JSON-serialized dict instead of a dict object. This also impacts all the
    Do functions under tfx.extensions.google_cloud_ai_platform and
    tfx.extensions.google_cloud_big_query_ml. Note that this breaking
    change occurs at the signature of the executor's Do function. Therefore, if
    the user did not customize the Do function, and the compile time SDK version
    is aligned with the run time SDK version, previous pipelines should still
    work as intended. If the user is using a custom component with a customized
    Do function, custom_config should be assumed to be a JSON-serialized
    string from the next release.
  • For users of BigQueryExampleGen, --temp_location is now a required Beam
    argument, even for DirectRunner. Previously this argument was only required
    for DataflowRunner. Note that the specified value of --temp_location
    should point to a Google Cloud Storage bucket.
  • Reverted the current per-component cache API (with enable_cache, which was
    only available in tfx>=0.21.3,<0.22), in preparation for a future redesign.
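
For the --temp_location requirement on BigQueryExampleGen noted in this list, the Beam arguments might be assembled along these lines. `--project` and `--temp_location` are Apache Beam Google Cloud options. The helper name, project ID, and bucket path are placeholders invented for the example.

```python
def bigquery_beam_args(project, temp_location):
    """Assemble Beam flags for BigQuery reads (hypothetical helper)."""
    # The note above says the value should point to a Cloud Storage bucket.
    if not temp_location.startswith("gs://"):
        raise ValueError("--temp_location should be a gs:// path")
    return [
        f"--project={project}",
        f"--temp_location={temp_location}",
    ]


# Placeholder values; substitute a real project and bucket.
bq_args = bigquery_beam_args("my-gcp-project", "gs://my-bucket/tmp")
print(bq_args)
```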

For component authors

  • Converted the BaseNode class attributes to the constructor parameters. This
    won't affect any components derived from BaseComponent.

Documentation updates

  • N/A

Deprecations

  • Deprecating Py2 support