Vgg16 #25

Open · wants to merge 3 commits into main
44 changes: 44 additions & 0 deletions models/ResNet50/README.md
@@ -0,0 +1,44 @@
# Infant Cry Classification ResNet50 Model

# Overview
This repository contains a ResNet50 model for classifying infant cry sounds. The model achieved an accuracy of 84.273% on the test dataset, showcasing its ability to capture intricate features of infant cry patterns.

# Model Architecture
ResNet50 is a 50-layer convolutional network built from residual blocks. The skip connections in these blocks allow very deep networks to be trained without the vanishing-gradient problem.


Model: "resnet50"
__________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==========================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 230, 230, 3) 0 input_1[0][0]
__________________________________________________________________________________________
conv1_conv (Conv2D) (None, 112, 112, 64 9472 conv1_pad[0][0]
...
__________________________________________________________________________________________
dense_3 (Dense) (None, 1) 2049 global_average_pooling2d_1[0][0]
==========================================================================================
Total params: 23,587,713
Trainable params: 23,534,593
Non-trainable params: 53,120
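
The classification head implied by the summary (a global-average-pooling layer feeding a single sigmoid unit) can be reproduced in a few lines of Keras. The sketch below is inferred from the layer names above, not copied from the notebook, so the actual construction may differ slightly:

```python
# Approximate reconstruction of the model in the summary above.
# Assumption: the head is GlobalAveragePooling2D followed by Dense(1, sigmoid);
# this is inferred from the layer names, not taken from the notebook.
import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                       input_shape=(224, 224, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # cry vs. non-cry
model = tf.keras.Model(inputs=base.input, outputs=output)
model.summary()  # prints a table like the one above
```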

# Dataset
The model was trained on a diverse dataset containing recordings of infant cry sounds. The dataset includes various cry patterns and non-cry sounds to ensure robust classification.

# Training
The ResNet50 model was trained using TensorFlow and Keras with an Adam optimizer. The training process involved data augmentation techniques to enhance model generalization. The training accuracy reached 90%, while the validation accuracy reached 88%.
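
As a rough illustration of that setup (the augmentation settings, batch size, learning rate, and directory layout below are assumptions made for the sketch, not values taken from the notebook):

```python
# Hedged training sketch: Adam optimizer, binary cross-entropy, and simple
# image augmentation. Paths and hyperparameters are illustrative assumptions,
# not the notebook's actual configuration.
import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                       input_shape=(224, 224, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
model = tf.keras.Model(base.input, tf.keras.layers.Dense(1, activation="sigmoid")(x))

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=10, width_shift_range=0.1,
    height_shift_range=0.1, zoom_range=0.1, validation_split=0.2)
train_gen = datagen.flow_from_directory(
    "data/spectrograms", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="training")       # hypothetical directory layout
val_gen = datagen.flow_from_directory(
    "data/spectrograms", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="validation")

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=20)
```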

# Evaluation
The model achieved an accuracy of 84.273% on the test dataset even though the dataset is somewhat imbalanced, highlighting its ability to classify infant cry sounds accurately. Its precision, recall, and F1-score are also strong.
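
Because accuracy alone can be misleading on an imbalanced test set, per-class metrics are worth computing explicitly. A small self-contained sketch (the dummy arrays stand in for real predictions and labels):

```python
# Hedged sketch: precision, recall, and F1 for a binary cry classifier.
# The dummy arrays stand in for real data; in practice y_prob would come
# from model.predict(...) on the held-out test set.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                  # 1 = cry, 0 = non-cry
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])  # sigmoid outputs
y_pred = (y_prob >= 0.5).astype(int)                          # threshold at 0.5

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["non-cry", "cry"]))
```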

# Usage
To use the trained ResNet50 model for inference, you can load the model weights using the provided script:


```bash
python load_resnet50_model.py --weights path/to/resnet50_weights.h5 --audio path/to/test_audio.wav
```

Replace `path/to/resnet50_weights.h5` with the path to the saved model weights and `path/to/test_audio.wav` with the path to the audio file you want to classify.
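
If you prefer to call the model directly from Python, the sketch below shows one possible route. It assumes the network consumes 224x224x3 mel-spectrogram images and that the saved weights match the head shown earlier; both are assumptions about this repository's preprocessing, not facts taken from the notebook.

```python
# Hedged inference sketch. ASSUMPTIONS: the model expects 224x224x3
# mel-spectrogram "images", and the weights file matches the head below.
import numpy as np
import librosa
import tensorflow as tf

def build_model():
    base = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                          input_shape=(224, 224, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    return tf.keras.Model(base.input, tf.keras.layers.Dense(1, activation="sigmoid")(x))

def audio_to_image(path):
    y, sr = librosa.load(path, sr=22050)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr), ref=np.max)
    mel = (mel - mel.min()) / (mel.max() - mel.min() + 1e-8)   # scale to [0, 1]
    img = tf.image.resize(mel[..., np.newaxis], (224, 224))    # (224, 224, 1)
    return tf.repeat(img, 3, axis=-1)[tf.newaxis, ...]         # (1, 224, 224, 3)

model = build_model()
model.load_weights("path/to/resnet50_weights.h5")               # placeholder path
prob = float(model.predict(audio_to_image("path/to/test_audio.wav"))[0, 0])
print("cry" if prob >= 0.5 else "non-cry", round(prob, 3))
```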

# Acknowledgments
I would like to express my gratitude to the maintainers and data providers who made this project possible.
1 change: 1 addition & 0 deletions models/ResNet50/cry-analyzer-using-resnet50.ipynb

Large diffs are not rendered by default.

40 changes: 40 additions & 0 deletions models/VGG16/README.md
@@ -0,0 +1,40 @@
# Infant Cry Classification VGG16 Model

# Overview
This repository contains a VGG16 model for classifying infant cry sounds. The model achieved an accuracy of 84.593% on the test dataset, demonstrating its effectiveness in capturing intricate features of infant cry patterns.

# Model Architecture
VGG16 is a convolutional neural network architecture that gained popularity for its simplicity and effectiveness. It comprises multiple convolutional and max-pooling layers, followed by fully connected layers.

Model: "vgg16"
__________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==========================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 input_1[0][0]
...
__________________________________________________________________________________________
dense_3 (Dense) (None, 1) 513 dense_2[0][0]
==========================================================================================
Total params: 138,357,513
Trainable params: 138,357,513
Non-trainable params: 0
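
Read off the summary above, the network ends in a fully connected head with a single sigmoid output. The sketch below reproduces that shape in Keras; the sizes of the intermediate dense layers are assumptions, and only the final Dense(1), with its 513 parameters, follows directly from the summary.

```python
# Hedged sketch of a VGG16 binary classifier with a fully connected head.
# Only the final Dense(1) is read off the summary above; the intermediate
# dense layer sizes (4096 and 512) are assumptions.
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                   input_shape=(224, 224, 3))
x = tf.keras.layers.Flatten()(base.output)
x = tf.keras.layers.Dense(4096, activation="relu")(x)
x = tf.keras.layers.Dense(512, activation="relu")(x)        # dense_2 in the summary
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # dense_3: 513 params
model = tf.keras.Model(inputs=base.input, outputs=output)
model.summary()
```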

# Dataset
The model was trained on a diverse dataset containing recordings of infant cry sounds. The dataset includes various cry patterns and non-cry sounds to ensure robust classification.

# Training
The VGG16 model was trained using TensorFlow and Keras with an Adam optimizer. The training process involved data augmentation techniques to enhance model generalization. The training accuracy reached 91%, while the validation accuracy reached 89%.

# Evaluation
The model achieved an accuracy of 84.593% on the test dataset, showcasing its ability to accurately classify infant cry sounds. The model's precision, recall, and F1-score metrics are impressive.

# Usage
To use the trained VGG16 model for inference, you can load the model weights using the provided script:

```bash
python load_vgg16_model.py --weights path/to/vgg16_weights.h5 --audio path/to/test_audio.wav
```

Replace `path/to/vgg16_weights.h5` with the path to the saved model weights and `path/to/test_audio.wav` with the path to the audio file you want to classify.

# Acknowledgments
I would like to express my gratitude to the contributors and data providers who made this project possible.