Skip to content

Latest commit

 

History

History
101 lines (98 loc) · 11.1 KB

FAQ.md

File metadata and controls

101 lines (98 loc) · 11.1 KB

H2O Driverless AI Bring Your Own Recipes

FAQ

Why do I need to bring my own recipes? Isn't Driverless AI smart enough of the box?

The only way to find out is to try. Most likely you'll be able to improve performance with custom recipes. Domain knowledge and intuition are essential to getting the best possible performance.

What are some example recipes?

  • Look at the examples in this repository. Some illustrative samples:
  • Transformer:
    • Suppose you have a string column that has values like "A:B:10:5", "A:C:4:10", .... It might make sense to split these values by ":" and create four output columns, potentially all numeric, such as [0,1,10,5], [0,2,4,10], ... to encode the information more clearly for the algorithm to learn better from.
    • PyTorch deep learning model for text similarity analysis, computes a similary score for any given two text input columns.
    • ARIMA model for time-series forecasting.
    • Data augmentation, such as replacing a zip code with demographic information, or replacing a date column with a National holiday flag.
  • Model:
  • Scorer:
    • Maybe you want to optimize your predictions for the top decile for a regression problem..
    • Maybe you care about the false discovery rate for a binary classification problem.
  • Explainer:
    • Create custom recipes for model interpretability, fairness, robustness, explanations
    • Create custom plots, charts, markdown reports, etc.

Driverless is good enough for me, I don't want to do recipes.

Perfect. Relax and sit back. We'll keep making Driverless AI better and better with every version, so you don't have to. Several of the recipes in this repository will likely be included in future releases of Driverless AI out of the box, after more performance improvements and hardening.

What's in it for me if I write a recipe?

You will get better at doing data science and you will get better results. Writing code is essential to improving your data science skills. Especially when writing data science code. Recipes are perfect for that.

Who can make recipes?

Anyone who can or wants to. Mostly data scientists or developers. Some of the best recipes are trivial and make a big difference, like custom scorers.

What do I need to make a recipe?

A text editor. All you need is to create a .py text file containing source code.

How do I start?

  • Examine the references below for the API specification and architecture diagrams.
  • Look at the examples in this repository.
  • Clone this repository and make modifications to existing recipes.
  • Start an experiment and upload the recipe in the expert settings of an experiment.

What version of Python does Driverless AI use?

Driverless AI uses Python version 3.6, so all custom recipes will run with Python 3.6 as well.

How do I know whether my recipe works?

Driverless AI will tell you whether it makes the cut:

  • First, it is subjected to acceptance tests. If it passes, great. If not, Driverless AI provides you with feedback on how to improve it.
  • Then, you can choose to include it in your experiment(s). It will decide which recipes are best suited to solve the problem. At worst, you can cause the experiment to slow down.

How can I debug my recipe?

  • The easiest way (for now) is to keep uploading it to the expert settings in Driverless AI until the recipe is accepted.
  • Another way is to do minimal changes as shown in this debugging example and use PyCharm or a similar Python debugger.

What happens if my recipe is rejected during upload?

  • Read the entire error message, it most likely contains the stack trace and helpful information on how to fix the problem.
  • If you can't figure out how to fix the recipe, we suggest you post your questions in the Driverless AI community Slack channel
  • You can also send us your experiment logs zip file, which will contain the recipe source files.

What happens if my transformer recipe doesn't lead to the highest variable importance for the experiment?

That's nothing to worry about. It's unlikely that your features have the strongest signal of all features. Even 'magic' Kaggle grandmaster features don't usually make a massive difference, but they still beat most of the competition.

What happens if my recipe is not used at all by the experiment?

  • Don't give up. You learned something.
  • Check the logs for failures if unsure whether the recipe worked at all or not.
  • Driverless AI will ignore recipe failures unless this robustness feature is specifically disabled. Under Expert Settings, disable skip_transformer_failures and skip_model_failures if you want to fail the experiment on any unexpected errors due to custom recipes.
  • Inside the experiment logs zip file, there's a folder called details and if it contains .stack files with stacktraces referring to your custom code, then you know it bombed.

Can I write recipes in Go, C++, Java or R?

If you can hook it up to Python, then yes. We have many recipes that use Java and C++ backends. Most of Driverless AI uses C++ backends.

Is there a difference between a custom recipe and the recipes shipped with Driverless AI?

No. Same code base. No performance penalty. No calling overhead. Same inputs and outputs.

Why are some models implemented as transformers?

Separating of work. With the transformer API, we can replace only the particular input column(s) with out-of-fold estimates of the target column. All other columns (features) can be processed by other transformers. The combined union of all features is then passed to the model(s) which can yield higher accuracy than a model that only sees the particular input column(s). For more information about the flow of data, see the technical references section.

How can I control which custom recipes are active, and how can I disable all custom recipes?

Recipes are meant to be built by people you trust and each recipe should be code-reviewed before going to production. If you don't want custom code to be executed by Driverless AI, set enable_custom_recipes=false in the config.toml, or add the environment variable DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES=0 at startup of Driverless AI. This will disable all custom transformers, models and scorers. If you want to keep all previously uploaded recipes enabled and disable the upload of any new recipes, set enable_custom_recipes_upload=false or DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES_UPLOAD=0 at startup of Driverless AI.

What if I keep changing the same recipe over and over?

If you upload a new version of a recipe, it will become the new default version for that recipe. Previously run experiments using older versions of that recipe will continue to work, and use the older version. New experiments will use the new version.

Who can see my recipe?

Everyone with access to the Driverless AI instance can run all recipes, even if they were uploaded by someone else. Recipes remains on the instance that runs Driverless AI. Experiment logs may contain relevant information about your recipes (such as their source code), so double-check before you share them.

How do I delete all recipes on my instance?

If you really need to delete all recipes, you can delete the contrib folder inside the data_directory (usually called tmp) and restart Driverless AI. Caution: Previously created experiments using custom recipes will not be able to make predictions any longer, so this is not recommended unless you also delete all related experiments as well.

Are MOJOs supported for experiments that use custom recipes?

In most cases (especially for complex recipes), MOJOs won’t be available out of the box. But, it is possible to get the MOJO. Contact support@h2o.ai for more information about creating MOJOs for custom recipes. (Note: The Python Scoring Pipeline features full support for custom recipes.)

How do I share my recipe with the world?

We encourage you to share your recipe in this repository. If your recipe works, please make a pull request and improve the experience for everyone!

References

Custom Transformers

Custom Models

  • sklearn API
    • Implement fit(X, y) for batch training on the training data
    • Implement predict(X) for row-by-row predictions on validation and test data
  • Custom Model Recipe SDK/API Reference

Custom Scorers

Custom Explainers

Technical Architecture Diagram

Webinar

Webinar: Extending the H2O Driverless AI Platform with Your Recipes Website: H2O Driverless AI Recipes