TFX 1.0.0-rc2
Pre-release
Major Features and Improvements
- Added tfx.v1 Public APIs. Please refer to the API doc for details.
- Transform component now computes pre-transform and post-transform statistics
and stores them in new, individual outputs ('pre_transform_schema',
'pre_transform_stats', 'post_transform_schema', 'post_transform_stats',
'post_transform_anomalies'). This can be disabled by setting
disable_statistics=True in the Transform component.
- BERT cola and mrpc examples now demonstrate how to calculate statistics for
NLP features.
- TFX CLI now supports Vertex Pipelines. Use it with the --engine=vertex flag.
- Telemetry: Only first-party TFX components' executor telemetry will be
collected. All other executors will be recorded as third_party_executor.
For labels longer than 63 characters, the first 63 characters are kept
(instead of the last 63 characters, as before).
- Supports text type (use proto json string format) RuntimeParam for protos.
- Combined/moved taxi's runtime_parameter, kubeflow_local and kubeflow_gcp
example pipelines into one penguin_pipeline_kubeflow example.
- Transform component now supports passing stats_options_updater_fn directly
as well as through the module file.
- Placeholders support accessing artifact property and custom property.
- Removed the extra node information in IR for KubeflowDagRunner, to reduce
size of generated IR.
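The telemetry label change above can be illustrated with a minimal sketch. The helper names truncate_label and truncate_label_old are hypothetical and not part of TFX; they only contrast keeping the first 63 characters with the earlier behavior of keeping the last 63.

```python
MAX_LABEL_LENGTH = 63  # limit stated in the release note


def truncate_label(label: str) -> str:
    """Keeps the first 63 characters (behavior described for this release)."""
    return label[:MAX_LABEL_LENGTH]


def truncate_label_old(label: str) -> str:
    """Keeps the last 63 characters (behavior described as the previous one)."""
    return label[-MAX_LABEL_LENGTH:]


# Hypothetical label that is longer than the limit.
long_label = "tfx-components-" + "x" * 60 + "-executor"

print(truncate_label(long_label))      # keeps the leading prefix
print(truncate_label_old(long_label))  # keeps the trailing suffix
```

Labels at or under the limit pass through both helpers unchanged.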
Breaking Changes
- Removed unnecessary default values for required component input Channels.
- The _PropertyDictWrapper internal wrapper for component.inputs and
component.outputs was removed: component.inputs and component.outputs
are now unwrapped dictionaries, and the attribute accessor syntax (e.g.
components.outputs.output_name) is no longer supported. Please use the
dictionary indexing syntax (e.g. components.outputs['output_name'])
instead.
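The move from attribute access to dictionary indexing can be sketched with a plain Python dict standing in for component.outputs. The key 'examples' and its placeholder value are assumptions for illustration; in a real pipeline the dict would come from an actual component.

```python
# Stand-in for component.outputs, which is a plain dict after this release.
# The string value is a placeholder for a real Channel object.
outputs = {"examples": "examples-channel-placeholder"}

# Supported: dictionary indexing.
examples_channel = outputs["examples"]

# No longer supported: attribute access. A plain dict raises AttributeError.
try:
    outputs.examples
    attribute_access_works = True
except AttributeError:
    attribute_access_works = False

print(examples_channel, attribute_access_works)
```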
For Pipeline Authors
- N/A
For Component Authors
- Apache Beam support is migrated from TFX Base Components and Executors to
dedicated Beam Components and Executors. BaseExecutor will no longer embed
beam_pipeline_args. Custom executors for Beam-powered components should
now extend BaseBeamExecutor instead of BaseExecutor.
Deprecations
- Deprecated nested RuntimeParam for Proto. Please use a text type (proto json
string) RuntimeParam instead of a Proto dict with nested RuntimeParam in it.
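A minimal sketch of producing such a proto json string with the standard library follows. The dict contents and the field name num_steps are assumptions for illustration; the resulting string is the kind of value a text-type RuntimeParam would carry in place of a proto dict with nested RuntimeParam values.

```python
import json

# Hypothetical proto contents written as a plain dict; the field name
# num_steps is an assumption for illustration.
train_args = {"num_steps": 100}

# Serialize the whole message to one JSON string, rather than nesting
# runtime parameters inside a proto dict.
train_args_json = json.dumps(train_args)

print(train_args_json)  # {"num_steps": 100}
```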
Bug Fixes and Other Changes
- Forces keyword arguments for AirflowComponent to make it compatible with
Apache Airflow 2.1.0 and later.
- Fixed issue where passing analyzer_cache to tfx.components.Transform
before there are any Transform cache artifacts published would fail.
- Included type information according to PEP-561. However, protobuf generated
files don't have type information, and you might need to ignore errors from
them. For example, if you are using mypy, see the related doc.
- Removed six dependency.
- Depends on apache-beam[gcp]>=2.29,<3.
- Depends on google-cloud-bigquery>=1.28.0,<2.21.
- Depends on ml-metadata>=1.0.0,<1.1.0.
- Depends on protobuf>=3.13,<4.
- Depends on struct2tensor>=0.31.0,<0.32.0.
- Depends on
tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3.
- Depends on tensorflow-data-validation>=1.0.0,<1.1.0.
- Depends on tensorflow-hub>=0.9.0,<0.13.
- Depends on tensorflowjs>=3.6.0,<4.
- Depends on tensorflow-model-analysis>=0.31.0,<0.32.0.
- Depends on
tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3.
- Depends on tensorflow-transform>=1.0.0,<1.1.0.
- Depends on tfx-bsl>=1.0.0,<1.1.0.
Documentation Updates
- Updated the TFX Guide to adopt the 1.0 API.
- TFT and TFDV component documentation now describes how to
configure pre-transform and post-transform statistics, which can be used for
validating text features.
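As a rough sketch of the kind of configuration this documentation covers, a stats_options_updater_fn takes the statistics type and a StatsOptions object and returns the (possibly modified) options. The two-argument form and the enable_semantic_domain_stats option are taken from the TFX and TFDV documentation as best understood here and should be checked against the installed versions. The SimpleNamespace object and the "pre_transform" string are stand-ins so the snippet runs without TFX or TFDV installed.

```python
from types import SimpleNamespace


def stats_options_updater_fn(stats_type, stats_options):
    """Returns updated options for pre- or post-transform statistics.

    stats_type says which statistics are being computed; it is unused in
    this sketch.
    """
    # Assumed option name; enables semantic domain statistics for features.
    stats_options.enable_semantic_domain_stats = True
    return stats_options


# Stand-in for a tfdv.StatsOptions instance (assumption for illustration).
options = SimpleNamespace(enable_semantic_domain_stats=False)

# "pre_transform" is a placeholder for the real statistics type value.
updated = stats_options_updater_fn("pre_transform", options)

print(updated.enable_semantic_domain_stats)
```

In a real pipeline the function would live in the Transform module file or be passed to the Transform component directly, as described under Major Features and Improvements.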