* Debugging
* Remove the _Slice implementation. Instead of creating custom partition functions, use the already implemented __iter__ functions and a common (optimized) partitioning function. The problems with the old implementation were the larger memory usage and the larger number of iterations through the data.
* Sort after grouping, which might improve grouping speed
* Remove the now unneeded class
* Style formatting
* Changelog
* Dask integration (#736): Dask integration, fix tests, added dask tests, improve the feature extraction test, reworked the documentation for the new features, changelog, style fix, add a forgotten class, increase test coverage
* Update feature_extraction_settings.rst (#740): minimum/maximum are valid feature_calculators instead of min/max (https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html?highlight=extract_features#tsfresh.feature_extraction.feature_calculators.maximum)
* Use a better download library (#741)
* Closes #743 (#744): closes #743 and adds issue (#743) info to changelog
* Fix the failure with the latest statsmodels installed (#749): limits lag length to 50% of sample size in `partial_autocorrelation`; fix unit tests
* Fix #742, while taking into account the differences between Python's indexing of vectors and Matlab's indexing (cf. Bastia et al. (2004), Eq. 1)
* Update docs/text/data_formats.rst

Co-authored-by: HaveF <iamaplayer@gmail.com>
Co-authored-by: patrjon <46594327+patrjon@users.noreply.github.com>
Co-authored-by: He Kaisheng <heks93@163.com>
Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
Co-authored-by: akem134@elan <a.kempa-liehr@auckland.ac.nz>
1 parent babed38, commit 55a1e57 — 18 changed files with 760 additions and 316 deletions.
.. _large-data-label:

Large Input Data
================

If you are dealing with large time series data, you are facing multiple problems.
The two most important ones are

* long execution times for feature extraction
* large memory consumption, even beyond what a single machine can handle

To solve only the first problem, you can parallelize the computation as described in :ref:`tsfresh-on-a-cluster-label`.
Please note that parallelization on your local computer is already turned on by default.

However, for even larger data you need to handle both problems at once.
You have multiple possibilities here:

Dask - the simple way
---------------------

*tsfresh* accepts a `dask dataframe <https://docs.dask.org/en/latest/dataframe.html>`_ instead of a
pandas dataframe as input for the :func:`tsfresh.extract_features` function.
Dask dataframes allow you to scale your computation beyond your local memory (by partitioning the data internally)
and even to large clusters of machines.
Their dataframe API is very similar to the pandas one and might even be a drop-in replacement.

All arguments discussed in :ref:`data-formats-label` are also valid for the dask case.
The input data will be transformed into the correct format for *tsfresh* using dask methods,
and the feature extraction will be added as additional computations to the computation graph.
You can then add further computations to the result or trigger the computation as usual with ``.compute()``.

.. NOTE::

    The last step of the feature extraction is to bring all features into a tabular format.
    Especially for very large data samples, this computation can be a large
    performance bottleneck.
    We therefore recommend turning the pivoting off if you do not really need it,
    and working with the unpivoted data as much as possible.

For example, to read in data from parquet and do the feature extraction:

.. code::

    import dask.dataframe as dd
    from tsfresh import extract_features

    df = dd.read_parquet(...)

    X = extract_features(df,
                         column_id="id", column_sort="time",
                         pivot=False)

    result = X.compute()

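If you later need the familiar wide design matrix, the unpivoted (long) result can be pivoted with plain pandas once it fits into local memory. A minimal sketch, assuming a hypothetical long format with one row per extracted feature and columns ``id``, ``variable`` and ``value`` (check the column names on your actual result):

```python
import pandas as pd

# Hypothetical long-format extraction result, as returned when pivoting
# is turned off: one row per (id, feature) pair.
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "variable": ["x__maximum", "x__minimum", "x__maximum", "x__minimum"],
    "value": [3.0, 1.0, 5.0, 2.0],
})

# Pivot into the usual wide design matrix:
# one row per id, one column per feature.
wide = long_df.pivot(index="id", columns="variable", values="value")
```

This keeps the expensive distributed part of the pipeline pivot-free and only reshapes the (much smaller) final feature table.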
Dask - more control
-------------------

The feature extraction method needs to perform some data transformations before it
can call the actual feature calculators.
If you want to optimize your data flow, you might want to have more control over how
exactly the feature calculation is added to your dask computation graph.

Therefore, it is also possible to add the feature extraction directly:

.. code::

    from tsfresh.convenience.bindings import dask_feature_extraction_on_chunk

    features = dask_feature_extraction_on_chunk(df_grouped,
                                                column_id="id",
                                                column_kind="kind",
                                                column_sort="time",
                                                column_value="value")

In this case however, ``df_grouped`` must already be in the correct format.
Check out the documentation of :func:`tsfresh.convenience.bindings.dask_feature_extraction_on_chunk`
for more information.
No pivoting will be performed in this case.

PySpark
-------

Similar to dask, it is also possible to add the feature extraction into a Spark
computation graph.
You can find more information in the documentation of :func:`tsfresh.convenience.bindings.spark_feature_extraction_on_chunk`.