Merge pull request #449 from yzhao062/development
v1.0.6
yzhao062 authored Oct 24, 2022
2 parents f6029d5 + 4bcb564 commit 6248475
Showing 13 changed files with 835 additions and 54 deletions.
2 changes: 2 additions & 0 deletions CHANGES.txt
Original file line number Diff line number Diff line change
Expand Up @@ -170,3 +170,5 @@ v<1.0.4>, <07/29/2022> -- Add LUNAR (#415).
v<1.0.5>, <07/29/2022> -- Import optimization.
v<1.0.5>, <08/27/2022> -- Code optimization.
v<1.0.5>, <09/14/2022> -- Add ALAD.
v<1.0.6>, <09/23/2022> -- Update ADBench benchmark for NeurIPS 2022.
v<1.0.6>, <10/23/2022> -- Add KPCA.
41 changes: 28 additions & 13 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -58,8 +58,11 @@ Python Outlier Detection (PyOD)

-----

**News**: We just released a 36-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.
**News**: We just released the most comprehensive, 45-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

**For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
**For graph outlier detection**, please use `PyGOD <https://pygod.org/>`_.

PyOD is the most comprehensive and scalable **Python library** for **detecting outlying objects** in
multivariate data. This exciting yet challenging field is commonly referred to as
Expand All @@ -68,7 +71,7 @@ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic research projects and
commercial products with more than `8 million downloads <https://pepy.tech/project/pyod>`_.
commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and
Expand Down Expand Up @@ -114,20 +117,29 @@ If you use PyOD in a scientific publication, we would appreciate
citations to the following paper::

@article{zhao2019pyod,
author = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
title = {PyOD: A Python Toolbox for Scalable Outlier Detection},
journal = {Journal of Machine Learning Research},
year = {2019},
volume = {20},
number = {96},
pages = {1-7},
url = {http://jmlr.org/papers/v20/19-011.html}
author = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
title = {PyOD: A Python Toolbox for Scalable Outlier Detection},
journal = {Journal of Machine Learning Research},
year = {2019},
volume = {20},
number = {96},
pages = {1-7},
url = {http://jmlr.org/papers/v20/19-011.html}
}

or::

Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of machine learning research (JMLR), 20(96), pp.1-7.

If you want more general insight into anomaly detection and/or algorithm performance comparisons, please see our
NeurIPS 2022 paper `ADBench: Anomaly Detection Benchmark <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_::

@inproceedings{han2022adbench,
  title={ADBench: Anomaly Detection Benchmark},
  author={Songqiao Han and Xiyang Hu and Hailiang Huang and Mingqi Jiang and Yue Zhao},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2022}
}

**Key Links and Resources**\ :

Expand Down Expand Up @@ -238,8 +250,8 @@ Key Attributes of a fitted model:
ADBench Benchmark
^^^^^^^^^^^^^^^^^

We just released a 36-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_ [#Han2022ADBench]_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.
We just released the most comprehensive, 45-page benchmark paper, `ADBench: Anomaly Detection Benchmark <https://arxiv.org/abs/2206.09426>`_ [#Han2022ADBench]_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

The organization of **ADBench** is provided below:

Expand Down Expand Up @@ -342,6 +354,7 @@ Probabilistic KDE Outlier Detection with Kernel Density F
Probabilistic Sampling Rapid distance-based outlier detection via sampling 2013 [#Sugiyama2013Rapid]_
Probabilistic GMM Probabilistic Mixture Modeling for Outlier Analysis [#Aggarwal2015Outlier]_ [Ch.2]
Linear Model PCA Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) 2003 [#Shyu2003A]_
Linear Model KPCA Kernel Principal Component Analysis 2007 [#Hoffmann2007Kernel]_
Linear Model MCD Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores) 1999 [#Hardin2004Outlier]_ [#Rousseeuw1999A]_
Linear Model CD Use Cook's distance for outlier detection 1977 [#Cook1977Detection]_
Linear Model OCSVM One-Class Support Vector Machines 2001 [#Scholkopf2001Estimating]_
Expand Down Expand Up @@ -564,6 +577,8 @@ Reference
.. [#He2003Discovering] He, Z., Xu, X. and Deng, S., 2003. Discovering cluster-based local outliers. *Pattern Recognition Letters*\ , 24(9-10), pp.1641-1650.
.. [#Hoffmann2007Kernel] Hoffmann, H., 2007. Kernel PCA for novelty detection. Pattern recognition, 40(3), pp.863-874.
.. [#Iglewicz1993How] Iglewicz, B. and Hoaglin, D.C., 1993. How to detect and handle outliers (Vol. 16). Asq Press.
.. [#Janssens2012Stochastic] Janssens, J.H.M., Huszár, F., Postma, E.O. and van den Herik, H.J., 2012. Stochastic outlier selection. Technical report TiCC TR 2012-001, Tilburg University, Tilburg Center for Cognition and Communication, Tilburg, The Netherlands.
Expand Down
41 changes: 27 additions & 14 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -64,17 +64,20 @@ Welcome to PyOD documentation!

----

**News**: We just released a 36-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.
**News**: We just released the most comprehensive, 45-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

**For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
**For graph outlier detection**, please use `PyGOD <https://pygod.org/>`_.

PyOD is the most comprehensive and scalable **Python library** for **detecting outlying objects** in
multivariate data. This exciting yet challenging field is commonly referred to as
`Outlier Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_
or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2020). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
academic researches and commercial products with more than `8 million downloads <https://pepy.tech/project/pyod>`_.
the latest ECOD (TKDE 2022). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
academic research projects and commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and
Expand Down Expand Up @@ -121,20 +124,29 @@ If you use PyOD in a scientific publication, we would appreciate
citations to the following paper::

@article{zhao2019pyod,
author = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
title = {PyOD: A Python Toolbox for Scalable Outlier Detection},
journal = {Journal of Machine Learning Research},
year = {2019},
volume = {20},
number = {96},
pages = {1-7},
url = {http://jmlr.org/papers/v20/19-011.html}
author = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
title = {PyOD: A Python Toolbox for Scalable Outlier Detection},
journal = {Journal of Machine Learning Research},
year = {2019},
volume = {20},
number = {96},
pages = {1-7},
url = {http://jmlr.org/papers/v20/19-011.html}
}

or::

Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of machine learning research (JMLR), 20(96), pp.1-7.

If you want more general insight into anomaly detection and/or algorithm performance comparisons, please see our
NeurIPS 2022 paper `ADBench: Anomaly Detection Benchmark <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_::

@inproceedings{han2022adbench,
  title={ADBench: Anomaly Detection Benchmark},
  author={Songqiao Han and Xiyang Hu and Hailiang Huang and Mingqi Jiang and Yue Zhao},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2022}
}

**Key Links and Resources**\ :

Expand All @@ -148,8 +160,8 @@ or::
Benchmark
=========

We just released a 36-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.
We just released the most comprehensive, 45-page benchmark paper, `ADBench: Anomaly Detection Benchmark <https://arxiv.org/abs/2206.09426>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

The organization of **ADBench** is provided below:

Expand Down Expand Up @@ -178,6 +190,7 @@ Probabilistic KDE Outlier Detection with Kernel Density Fun
Probabilistic Sampling Rapid distance-based outlier detection via sampling 2013 :class:`pyod.models.sampling.Sampling` :cite:`a-sugiyama2013rapid`
Probabilistic GMM Probabilistic Mixture Modeling for Outlier Analysis :class:`pyod.models.gmm.GMM` :cite:`a-aggarwal2015outlier` [Ch.2]
Linear Model PCA Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) 2003 :class:`pyod.models.pca.PCA` :cite:`a-shyu2003novel`
Linear Model KPCA Kernel Principal Component Analysis 2007 :class:`pyod.models.kpca.KPCA` :cite:`a-hoffmann2007kernel`
Linear Model MCD Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores) 1999 :class:`pyod.models.mcd.MCD` :cite:`a-rousseeuw1999fast,a-hardin2004outlier`
Linear Model CD Use Cook's distance for outlier detection 1977 :class:`pyod.models.cd.CD` :cite:`a-cook1977detection`
Linear Model OCSVM One-Class Support Vector Machines 2001 :class:`pyod.models.ocsvm.OCSVM` :cite:`a-scholkopf2001estimating`
Expand Down
9 changes: 9 additions & 0 deletions docs/pyod.models.rst
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,15 @@ pyod.models.knn module
:show-inheritance:
:inherited-members:

pyod.models.kpca module
-----------------------

.. automodule:: pyod.models.kpca
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pyod.models.lmdd module
-----------------------

Expand Down
11 changes: 11 additions & 0 deletions docs/zreferences.bib
Original file line number Diff line number Diff line change
Expand Up @@ -467,4 +467,15 @@ @inproceedings{zenati2018adversarially
pages={727--736},
year={2018},
organization={IEEE}
}

@article{hoffmann2007kernel,
title={Kernel PCA for novelty detection},
author={Hoffmann, Heiko},
journal={Pattern recognition},
volume={40},
number={3},
pages={863--874},
year={2007},
publisher={Elsevier}
}
Binary file modified examples/ALL.png
11 changes: 8 additions & 3 deletions examples/compare_all_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,10 +101,15 @@
'Locally Selective Combination (LSCP)': LSCP(
detector_list, contamination=outliers_fraction,
random_state=random_state),
'INNE': INNE(contamination=outliers_fraction),
'GMM': GMM(contamination=outliers_fraction),
'INNE': INNE(
max_samples=2, contamination=outliers_fraction,
random_state=random_state,
),
'GMM': GMM(contamination=outliers_fraction,
random_state=random_state),
'KDE': KDE(contamination=outliers_fraction),
'LMDD': LMDD(contamination=outliers_fraction),
'LMDD': LMDD(contamination=outliers_fraction,
random_state=random_state),
}

# Show all detectors
Expand Down
66 changes: 66 additions & 0 deletions examples/kpca_example.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# -*- coding: utf-8 -*-
"""Example of outlier detection based on Kernel PCA.
"""
# Author: Akira Tamamori <tamamori5917@gmail.com>
# License: BSD 2 clause

from __future__ import division, print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))

from pyod.models.kpca import KPCA
from pyod.utils.data import evaluate_print, generate_data
from pyod.utils.example import visualize


if __name__ == "__main__":
contamination = 0.1 # percentage of outliers
n_train = 200 # number of training points
n_test = 100 # number of testing points
n_features = 2

# Generate sample data
X_train, X_test, y_train, y_test = generate_data(
n_train=n_train,
n_test=n_test,
n_features=2,
contamination=contamination,
random_state=42,
behaviour="new",
)

# train KPCA detector
clf_name = "KPCA"
clf = KPCA()
clf.fit(X_train)

# get the prediction labels and outlier scores of the training data
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores

# get the prediction on the test data
y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
y_test_scores = clf.decision_function(X_test) # outlier scores

# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, y_train, y_train_scores)
print("\nOn Test Data:")
evaluate_print(clf_name, y_test, y_test_scores)

# visualize the results
visualize(
clf_name,
X_train,
y_train,
X_test,
y_test,
y_train_pred,
y_test_pred,
show_figure=True,
)
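The binary ``labels_`` used in the example come from thresholding ``decision_scores_`` at the ``(1 - contamination)`` quantile of the training scores. A minimal numpy sketch of that rule (the helper name is illustrative, not part of PyOD's API):

```python
import numpy as np

def scores_to_labels(decision_scores, contamination=0.1):
    """Flag the top `contamination` fraction of scores as outliers (label 1)."""
    # threshold at the (1 - contamination) quantile of the scores
    threshold = np.percentile(decision_scores, 100 * (1 - contamination))
    return (decision_scores > threshold).astype(int)

scores = np.array([0.1, 0.2, 0.15, 0.9, 0.18, 0.22, 0.17, 0.95, 0.19, 0.21])
labels = scores_to_labels(scores, contamination=0.2)
print(labels)  # the two largest scores (0.9 and 0.95) are flagged
```

This is why ``contamination`` only affects the labels, not the raw scores: ``decision_function`` output is unchanged, and the cutoff alone moves with the parameter.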
