Release 0.21.0rc0
Pre-release
Version 0.21.0rc0
Major Features and Improvements
- TFX version 0.21.0 will be the last version of TFX supporting Python 2.
- Added support for `RuntimeParameter`s to allow users to specify templated
  values at runtime. This is currently only supported in Kubeflow Pipelines.
  Currently, only attributes in `ComponentSpec.PARAMETERS` and the URIs of
  external artifacts can be parameterized (component inputs / outputs cannot
  yet be parameterized). See
  `tfx/examples/chicago_taxi_pipeline/taxi_pipeline_runtime_parameter.py`
  for example usage.
- Users can access the parameterized pipeline root when defining the
  pipeline by using the `pipeline.ROOT_PARAMETER` placeholder in
  `KubeflowDagRunner`.
- Users can pass appropriately encoded Python `dict` objects to specify
  protobuf parameters in `ComponentSpec.PARAMETERS`; these will be decoded
  into the proper protobuf type. This lets users avoid manually constructing
  complex nested protobuf messages in the component interface.
- Added support in Trainer for using other model artifacts. This enables
  scenarios such as warm-starting.
- Updated the Trainer executor to pass custom config through to the user
  module.
- Artifact type-specific properties can be defined by overriding the
  `PROPERTIES` dictionary of a `types.artifact.Artifact` subclass.
- Added a new chicago_taxi_pipeline example on Google Cloud BigQuery ML.
- Added support for multi-core processing in the Flink and Spark Chicago Taxi
  PortableRunner examples.
- Added a metadata adapter in Kubeflow to support logging the Argo pod ID as
  an execution property.
- Added a prototype Tuner component and an end-to-end Iris example.
- Created a new generic Trainer executor for non-Estimator-based models, e.g.
  native Keras.
- Updated Evaluator to support passing `tfma.EvalConfig` when calling TFMA.
- Users can create a pipeline using the new experimental CLI command
  `template`.
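
The `RuntimeParameter` mechanism above can be pictured with a self-contained sketch. The `RuntimeParameter` class and `resolve` helper below are simplified stand-ins for the real TFX implementation, not its actual API:

```python
# Simplified stand-in for TFX's RuntimeParameter (illustrative only):
# a placeholder whose concrete value is supplied when the pipeline runs.
class RuntimeParameter:
    def __init__(self, name, ptype, default=None):
        self.name = name       # key looked up in the runtime values
        self.ptype = ptype     # expected Python type, e.g. str or int
        self.default = default


def resolve(value, runtime_values):
    """Replace a RuntimeParameter placeholder with its runtime value."""
    if isinstance(value, RuntimeParameter):
        raw = runtime_values.get(value.name, value.default)
        return value.ptype(raw)
    return value  # ordinary values pass through unchanged


# A templated value that is fixed only at run time, analogous to an
# attribute in ComponentSpec.PARAMETERS.
data_root = RuntimeParameter('data-root', str, default='gs://bucket/default')
print(resolve(data_root, {'data-root': 'gs://bucket/run-42'}))  # gs://bucket/run-42
print(resolve(data_root, {}))                                   # gs://bucket/default
```

In the real system the substitution is performed by the orchestrator (here, Kubeflow Pipelines) rather than by user code.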
Bug fixes and other changes
- Added support for an hparams artifact as an input to Trainer, in
  preparation for Tuner support.
- Refactored common dependencies in the TFX Dockerfile into a base image to
  improve the reliability of the image building process.
- Fixed a missing TensorBoard link in `KubeflowDagRunner`.
- Depends on `apache-beam[gcp]>=2.17,<3`.
- Depends on `ml-metadata>=0.21,<0.22`.
- Depends on `tensorflow-data-validation>=0.21,<0.22`.
- Depends on `tensorflow-model-analysis>=0.21,<0.22`.
- Depends on `tensorflow-transform>=0.21,<0.22`.
- Depends on `tfx-bsl>=0.21,<0.22`.
- Depends on `pyarrow>=0.14,<0.15`.
- Removed `tf.compat.v1` usage from the Iris and CIFAR-10 examples.
- CSVExampleGen: started using the CSV decoding utilities in `tfx-bsl`
  (`tfx-bsl>=0.15.2`).
- Fixed problems with the Airflow tutorial notebooks.
- Added performance improvements to the Transform component (for statistics
  generation).
- Raised exceptions when container building fails.
- Enhanced the custom Slack component by adding a Kubeflow example.
- Allowed Windows-style paths in the Transform component cache.
- Fixed a bug in the CLI (`--engine=kubeflow`) which used a hard-coded
  obsolete image (TFX 0.14.0) as the base image.
- Fixed a bug in the CLI (`--engine=kubeflow`) which could not handle the
  Skaffold response when an already-built image is reused.
- Allowed users to specify the region to use when serving with AI Platform.
- Allowed users to give a deterministic job ID to the AI Platform Training
  job.
- System-managed artifact properties ("name", "state", "pipeline_name" and
  "producer_component") are now stored as ML Metadata artifact custom
  properties.
- Fixed loading trainer and transformation functions from Python module files
  without the .py extension.
- Fixed some ill-formed visualizations when running on KFP.
- Removed system info from artifact properties; channels are now used to hold
  the info for generating MLMD queries.
- Rely on the MLMD context for inter-component artifact resolution and
  execution publishing.
- Added a pipeline-level context and a component-run-level context.
- Included test data for examples/chicago_taxi_pipeline in the package.
- Changed `BaseComponentLauncher` to require the user to pass in an ML
  Metadata connection object instead of an ML Metadata connection config.
- Capped the version of the TensorFlow runtime used in Google Cloud
  integration to 1.15.
- Updated Chicago Taxi example dependencies to Beam 2.17.0, Flink 1.9.1 and
  Spark 2.4.4.
- Fixed an issue where `build_ephemeral_package()` used an incorrect path to
  locate the `tfx` directory.
- The ImporterNode now allows specification of general artifact properties.
- Added 'tfx_executor', 'tfx_version' and 'tfx_py_version' labels for CAIP,
  BQML and Dataflow jobs submitted from TFX components.
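
The change to system-managed artifact properties can be pictured with a small sketch. The dict below stands in for an ML Metadata artifact record; it does not use the real `ml-metadata` API:

```python
# Sketch: the four system-managed properties are now written as custom
# properties on the artifact record (plain dict stand-in for an MLMD record).
def publish_artifact(uri, name, state, pipeline_name, producer_component):
    return {
        'uri': uri,
        'custom_properties': {
            'name': name,
            'state': state,
            'pipeline_name': pipeline_name,
            'producer_component': producer_component,
        },
    }


record = publish_artifact('/tmp/examples/1', 'examples', 'published',
                          'chicago_taxi', 'CSVExampleGen')
print(sorted(record['custom_properties']))
# ['name', 'pipeline_name', 'producer_component', 'state']
```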
Deprecations
Breaking changes
For pipeline authors
- Standard artifact TYPE_NAME strings were reconciled to match their class
  names in `types.standard_artifacts`.
- The "split" property on multiple artifacts has been replaced with the
  JSON-encoded "split_names" property on a single grouped artifact.
- The execution caching mechanism was changed to rely on the ML Metadata
  pipeline context. Existing cached executions will not be reused when running
  on this version of TFX for the first time.
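
The "split" to "split_names" migration can be sketched as follows; the dicts are illustrative stand-ins for the artifact records, not TFX internals:

```python
import json

# Before: one artifact per split, each carrying its own "split" property.
old_artifacts = [
    {'uri': '/data/train', 'split': 'train'},
    {'uri': '/data/eval', 'split': 'eval'},
]

# After: a single grouped artifact whose "split_names" property is a
# JSON-encoded list of the split names.
new_artifact = {
    'uri': '/data',
    'split_names': json.dumps([a['split'] for a in old_artifacts]),
}

print(new_artifact['split_names'])              # ["train", "eval"]
print(json.loads(new_artifact['split_names']))  # ['train', 'eval']
```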
For component authors
- Artifact type name strings passed to the `types.artifact.Artifact` and
  `types.channel.Channel` classes are no longer supported; usage here should
  be replaced with references to the artifact subclasses defined in
  `types.standard_artifacts.*` or to custom subclasses of
  `types.artifact.Artifact`.
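
The new convention for component authors can be illustrated with a self-contained sketch. `Artifact`, `Examples`, and `Channel` below are simplified stand-ins for the classes in `tfx.types`, not the real implementation:

```python
# Simplified stand-ins for the tfx.types classes (illustrative only).
class Artifact:
    TYPE_NAME = None  # each subclass defines its canonical type name


class Examples(Artifact):  # stand-in for types.standard_artifacts.Examples
    TYPE_NAME = 'Examples'


class Channel:
    def __init__(self, type):
        # Raw type-name strings are no longer accepted; an Artifact
        # subclass must be passed instead.
        if isinstance(type, str):
            raise TypeError(
                'String type names are not supported; pass an Artifact subclass.')
        self.type = type
        self.type_name = type.TYPE_NAME


channel = Channel(type=Examples)       # new style: artifact subclass
print(channel.type_name)               # Examples
# Channel(type='Examples') would now raise TypeError (old string style).
```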