Add Data analysis steps: data-cleaning, data-outlier-detection #30

Open: wants to merge 1 commit into master

@memona008 commented Mar 19, 2024

This pull request adds two new library steps: one for data cleaning (preprocessing) and one for outlier detection.

Step 1: Data Cleaning

Implemented a data cleaning step that accepts the following parameters:

  • remove_null: when enabled, removes null values from the dataset.
  • null_lookup_columns: the columns to check for null values.
  • remove_duplicate_rows: when enabled, removes duplicate rows.
  • duplicate_lookup_columns: the columns used to decide whether two rows are duplicates.
  • clear_formatting: when enabled, clears formatting from the data so values are consistent.
  • output_file_name: the file name and path of the cleaned output.
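A minimal sketch of how these parameters could interact, using only the Python standard library. The parameter names come from this PR; the functions `clean_rows` and `write_cleaned` are hypothetical, and both what counts as "null" (here: `None`, empty strings, and placeholders such as `"NA"`) and what "clear formatting" does (here: trimming and collapsing whitespace) are assumptions, not the step's actual behavior.

```python
import csv
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def _is_null(value: Optional[str]) -> bool:
    # Assumption: None, empty strings and common placeholders count as null.
    return value is None or value.strip().lower() in {"", "null", "none", "nan", "na"}


def clean_rows(
    rows: Iterable[Row],
    remove_null: bool = True,
    null_lookup_columns: Optional[List[str]] = None,
    remove_duplicate_rows: bool = True,
    duplicate_lookup_columns: Optional[List[str]] = None,
    clear_formatting: bool = True,
) -> List[Row]:
    """Hypothetical cleaning pass mirroring the PR's parameter names."""
    cleaned: List[Row] = []
    seen = set()
    for row in rows:
        if clear_formatting:
            # Assumption: "clear formatting" = trim and collapse whitespace in strings.
            row = {k: " ".join(v.split()) if isinstance(v, str) else v for k, v in row.items()}
        if remove_null:
            # Check only the requested columns, or every column if none were given.
            cols = null_lookup_columns or list(row.keys())
            if any(_is_null(row.get(c)) for c in cols):
                continue
        if remove_duplicate_rows:
            # Rows are duplicates if they match on the lookup columns (default: all columns).
            cols = duplicate_lookup_columns or sorted(row.keys())
            key = tuple(row.get(c) for c in cols)
            if key in seen:
                continue
            seen.add(key)
        cleaned.append(row)
    return cleaned


def write_cleaned(rows: List[Row], output_file_name: str) -> None:
    """Write the cleaned rows as CSV to the user-chosen path."""
    fieldnames = list(rows[0].keys()) if rows else []
    with open(output_file_name, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"id": "1", "name": "  Alice  ", "city": "Paris"},
    {"id": "2", "name": "Bob", "city": ""},
    {"id": "1", "name": "Alice", "city": "Paris"},
]
print(clean_rows(rows))  # → [{'id': '1', 'name': 'Alice', 'city': 'Paris'}]
```

Running the formatting cleanup before the null and duplicate checks means `"  Alice  "` and `"Alice"` compare equal, and a whitespace-only cell counts as null.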

Step 2: Outlier Detection

Implemented an outlier detection step that supports four methods:

Z-score:

Flags values whose z-score, |x - mean| / standard deviation, exceeds a threshold, i.e. values that lie too many standard deviations from the mean.
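A minimal standard-library sketch of z-score detection, assuming a configurable `threshold` (3 is a common default; the PR does not state which value the step uses). The function name `zscore_outliers` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 100]
print(zscore_outliers(data))                 # → []
print(zscore_outliers(data, threshold=2.5))  # → [9]
```

The empty first result is expected: with the population standard deviation, no single point in a sample of n values can have |z| greater than √(n − 1), so with n = 10 a threshold of 3 can never fire. The outlier itself also inflates the standard deviation, which is one reason to pair z-scores with a robust method such as IQR.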

IQR (Interquartile Range):

Flags values below Q1 - k·IQR or above Q3 + k·IQR, where IQR = Q3 - Q1 is the range between the first and third quartiles (k is commonly 1.5).
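A minimal sketch of the IQR rule (Tukey's fences) using `statistics.quantiles` (Python 3.8+). The multiplier `k = 1.5` and the quartile interpolation method are assumptions; the PR does not say which the step uses.

```python
from statistics import quantiles
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" uses linear interpolation between data points,
    # the same convention as NumPy's default percentile.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < lower or x > upper]


data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 100]
print(iqr_outliers(data))  # → [9]
```

Here Q1 = 11 and Q3 = 12, so the fences are 9.5 and 13.5. Because quartiles are barely affected by extreme values, this rule flags the value 100 even though a z-score threshold of 3 misses it on the same data.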

Isolation Forest:

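Uses an ensemble of random trees to isolate individual points; anomalies are separated after fewer random splits than normal points, so a short average path length marks a likely outlier.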
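A from-scratch sketch of the Isolation Forest algorithm in plain Python, to show how the score arises. The step itself more likely wraps a library implementation such as scikit-learn's `IsolationForest` (an assumption; the PR does not say). All function names here are hypothetical, and `n_trees=100`, `sample_size=256` follow the defaults commonly used for this algorithm.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful search in a BST of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    # External node: depth limit reached or nothing left to separate.
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))  # pick a random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # the chosen feature is constant in this node
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)  # random split value between min and max
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, tree, depth: int = 0) -> float:
    if tree[0] == "leaf":
        # Add c(size) to estimate the depth of the subtree that was not grown.
        return depth + avg_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest_scores(points: List[Point], n_trees: int = 100,
                            sample_size: int = 256, seed: int = 0) -> List[float]:
    """Score in (0, 1] per point; values near 1 indicate likely anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(psi)
    scores = []
    for x in points:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm) if norm > 0 else 0.5)
    return scores


# A 5x5 grid of nearby points plus one distant point (index 25).
cluster = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(25)]
points = cluster + [(5.0, 5.0)]
scores = isolation_forest_scores(points)
print(max(range(len(points)), key=scores.__getitem__))  # index of the most anomalous point
```

The distant point is usually cut off by the very first random split, so its average path is close to 1 and it receives the highest score.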

Autoencoder:

Trains a neural network to compress and reconstruct the input data, then flags points with a high reconstruction error as outliers.
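A deliberately tiny sketch of reconstruction-error detection: a linear autoencoder with a one-unit bottleneck, trained by full-batch gradient descent in plain Python. The PR's deep version would presumably stack non-linear layers in a framework such as Keras or PyTorch (an assumption; the PR does not say), but the flagging logic is the same. The function names, learning rate and epoch count are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def train_linear_autoencoder(
    points: List[Point], epochs: int = 3000, lr: float = 0.1, seed: int = 0
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a dim -> 1 -> dim linear autoencoder with full-batch gradient descent."""
    n, dim = len(points), len(points[0])
    # Centre the data so the encoder and decoder need no bias terms.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    xs = [[p[j] - mean[j] for j in range(dim)] for p in points]
    rng = random.Random(seed)
    enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # encoder weights
    dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # decoder weights
    for _ in range(epochs):
        g_enc = [0.0] * dim
        g_dec = [0.0] * dim
        for x in xs:
            z = sum(e * xi for e, xi in zip(enc, x))         # 1-D code
            r = [d * z - xi for d, xi in zip(dec, x)]        # reconstruction - input
            dz = 2.0 * sum(ri * d for ri, d in zip(r, dec))  # dLoss/dz
            for j in range(dim):
                g_dec[j] += 2.0 * r[j] * z
                g_enc[j] += dz * x[j]
        for j in range(dim):
            dec[j] -= lr * g_dec[j] / n
            enc[j] -= lr * g_enc[j] / n
    return mean, enc, dec


def reconstruction_errors(points: List[Point], mean: List[float],
                          enc: List[float], dec: List[float]) -> List[float]:
    """Squared reconstruction error per point; large values suggest outliers."""
    errors = []
    for p in points:
        x = [pj - mj for pj, mj in zip(p, mean)]
        z = sum(e * xi for e, xi in zip(enc, x))
        errors.append(sum((d * z - xi) ** 2 for d, xi in zip(dec, x)))
    return errors


# 21 points on the line y = x plus one point far off it (index 21).
points = [(t / 10, t / 10) for t in range(-10, 11)] + [(1.0, -1.0)]
mean, enc, dec = train_linear_autoencoder(points)
errors = reconstruction_errors(points, mean, enc, dec)
print(max(range(len(points)), key=errors.__getitem__))  # index with the largest error
```

A linear autoencoder with a one-unit bottleneck learns the same subspace as the first principal component, so here it learns the y = x direction and reconstructs the point (1, -1) poorly. A threshold on the errors (for example a high percentile) then turns them into outlier flags.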


Additionally, the step generates the following visualizations to help analyze and interpret the detected outliers:

  • Scatter plot
  • Box plot
  • Histogram
