Auto Feature Engineering aims to simplify the feature engineering process and improve performance through parallel data processing frameworks, an automated data processing pipeline, and built-in domain-specific feature engineering primitives. This repository provides an end-to-end workflow that automatically analyzes the data based on data type, profiles feature distributions, generates customizable feature engineering pipelines for data preparation, and executes those pipelines in parallel on different backend engines on Intel platforms.
Steps explained:
- Feature profile: Analyze the raw tabular dataset to infer the original features based on data type and generate a FeatureList.
- Feature engineering: Use the inferred FeatureList to generate a Data Pipeline in JSON/YAML file format.
- Feature transformation: Convert the Data Pipeline to executable operations and transform the original features into candidate features with the selected engine; Pandas and Spark are currently supported.
- Feature importance estimator: Perform feature importance analysis on the candidate features to remove unimportant ones, then generate the transformed dataset containing the finalized features that will be used for training.
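The four steps above can be illustrated with a small, self-contained sketch. This is not the pyrecdp implementation: it uses only the Python standard library, and every name in it (`profile`, `generate_pipeline`, `transform`, `select_features`, the `day_of_week` operation, and the toy rows) is hypothetical. Absolute Pearson correlation with the label stands in for a real feature importance estimator.

```python
import json
from datetime import date

# Toy stand-in for a raw tabular dataset (hypothetical values).
rows = [
    {"distance": 1.2, "passengers": 1, "pickup": "2023-01-01", "fare": 6.5},
    {"distance": 3.4, "passengers": 2, "pickup": "2023-01-02", "fare": 12.0},
    {"distance": 0.8, "passengers": 1, "pickup": "2023-01-07", "fare": 5.0},
    {"distance": 5.1, "passengers": 3, "pickup": "2023-01-08", "fare": 17.5},
]
label = "fare"


def is_iso_date(value):
    """Return True if value is a string in YYYY-MM-DD form."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def profile(rows, label):
    """Step 1: infer a coarse type per column from the first row."""
    feature_list = {}
    for name, value in rows[0].items():
        if name == label:
            continue
        if isinstance(value, (int, float)):
            feature_list[name] = "numeric"
        elif is_iso_date(value):
            feature_list[name] = "datetime"
        else:
            feature_list[name] = "categorical"
    return feature_list


def generate_pipeline(feature_list):
    """Step 2: describe the planned operations as a JSON document."""
    steps = [
        {"op": "day_of_week", "input": name, "output": f"{name}__dow"}
        for name, kind in feature_list.items()
        if kind == "datetime"
    ]
    return json.dumps({"steps": steps}, indent=2)


# Maps operation names in the pipeline file to executable functions.
OPS = {"day_of_week": lambda v: date.fromisoformat(v).weekday()}


def transform(rows, pipeline_json):
    """Step 3: execute the pipeline to add candidate features."""
    steps = json.loads(pipeline_json)["steps"]
    out = []
    for row in rows:
        new_row = dict(row)
        for step in steps:
            new_row[step["output"]] = OPS[step["op"]](row[step["input"]])
        out.append(new_row)
    return out


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select_features(rows, label, threshold=0.5):
    """Step 4: keep numeric features whose score reaches the threshold."""
    ys = [r[label] for r in rows]
    kept = []
    for name in rows[0]:
        if name == label:
            continue
        xs = [r[name] for r in rows]
        if not all(isinstance(x, (int, float)) for x in xs):
            continue  # raw non-numeric columns are not scored in this sketch
        if abs_correlation(xs, ys) >= threshold:
            kept.append(name)
    return kept


feature_list = profile(rows, label)
pipeline_json = generate_pipeline(feature_list)
candidates = transform(rows, pipeline_json)
kept = select_features(candidates, label)
print(pipeline_json)
print(kept)
```

Keeping the pipeline as a plain JSON document between steps 2 and 3 is what makes it customizable: the file can be inspected or edited before any engine executes it.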
DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-8-jre graphviz
pip install pyrecdp[autofe] --pre
Only 3 lines of code are needed to generate new features for your tabular data. Typically, 5x new features can be found, with up to a 1.2x accuracy boost.
from pyrecdp.autofe import AutoFE
pipeline = AutoFE(dataset=train_data, label=target_label, time_series='Day')
transformed_train_df = pipeline.fit_transform()
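To check how many features a run actually added, the column names before and after can be compared. The sketch below is a minimal illustration under one assumption: that both the input data and the transformed result expose their column names as a sequence (as `list(df.columns)` does for a pandas DataFrame). The helper `summarize_new_features` and the example column names are hypothetical and not part of pyrecdp.

```python
def summarize_new_features(original_columns, transformed_columns):
    """Return the columns that appear only after transformation, in order."""
    original = set(original_columns)
    return [c for c in transformed_columns if c not in original]


# Hypothetical column names, for illustration only.
before = ["Day", "distance", "fare"]
after = ["Day", "distance", "fare", "Day__dow", "Day__month"]

added = summarize_new_features(before, after)
print(f"{len(added)} new features: {added}")
```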
Workflow Name | Description |
---|---|
NYC taxi fare | Fare prediction based on NYC taxi dataset |
Amazon Product Review | Product recommendation based on reviews from Amazon |
IBM Card Transaction Fraud Detect | Recognize fraudulent credit card transactions |
Recsys 2023 | Real-world task of tweet engagement prediction |
Outbrain | Click prediction for recommendation system |
Covid19 TabUtils | Integration example with Tabular Utils |
PredictiveAssetsMaintenance | Integration example with the predictive assets maintenance use case |