How about input and output activation functions changing? #22

Open
picolino opened this issue May 17, 2020 · 4 comments

Comments

picolino commented May 17, 2020

Hello. I was really delighted by this new type of structural optimization of neural networks.
Thank you for your work, it is really awesome. 👏

Currently, I am doing detailed research on the architectures that can be generated by the WANN algorithm for classification tasks. At some point I tried changing the activation functions in the input and output layers, and I got interesting results:

Experiments

Applied hyperparameters for all experiments:

Weights set: -2, -1, -0.5, 0.5, 1, 2
Max generations: 1024
Population size: 512
Rank by performance only \ by network complexity: 20% \ 80%
Add connection probability: 20%
Add neuron probability: 25%
Change activation function probability: 50%
Enable disabled connection probability: 5%
Keep best species in next population (elitism): 20
Destroy bad species in next population (cull): 20
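
For reference, here is how these settings might look collected into a single config (a rough sketch; the key names are illustrative and not the actual WANN config keys):

```python
# Illustrative config dict for the hyperparameters listed above.
# Key names are my own, not the actual WANN configuration keys.
wann_hyperparams = {
    "weight_values": [-2, -1, -0.5, 0.5, 1, 2],  # shared weight set
    "max_generations": 1024,
    "population_size": 512,
    "rank_by_performance_only": 0.20,            # 20% ranked by performance only
    "rank_by_complexity": 0.80,                  # 80% ranked including network complexity
    "prob_add_connection": 0.20,
    "prob_add_neuron": 0.25,
    "prob_change_activation": 0.50,
    "prob_enable_connection": 0.05,
    "elitism": 20,                               # best species kept in next population
    "cull": 20,                                  # worst species destroyed
}
```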

XOR

Experiment 1:

Generated architecture without activation function changes in the input and output layers:

[image]

Mean error (for all weights): 0

Experiment 2:

Generated architecture with activation function changes allowed in the input and output layers:

[image]

Mean error (for all weights): 0

Straight lines detection

[image]
Inputs: 3x3 square images
Outputs: 2 squares on the right side of each set.

If only a horizontal line exists, the output must be (1, 0).
If only a vertical line exists, the output must be (0, 1).
If both exist, the output must be (1, 1).
If no straight line exists, the output must be (0, 0).

Target: teach the neural network to detect straight (black) lines in a 3x3 image.
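
The labelling rule above can be written as a small sketch (assuming the 3x3 image is encoded as rows of 0/1 values with 1 = black; the function name and encoding are my own, not from the experiments):

```python
# Target labelling rule for the straight-line detection task:
# (has_horizontal_line, has_vertical_line) for a 3x3 binary image.
def line_labels(image):
    has_horizontal = any(all(pixel == 1 for pixel in row) for row in image)
    has_vertical = any(all(row[col] == 1 for row in image) for col in range(3))
    return (int(has_horizontal), int(has_vertical))

# Example: a single horizontal line in the middle row -> (1, 0)
print(line_labels([[0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]]))
```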

Experiment 1:

Generated architecture without activation function changes in the input and output layers:

[image]

Mean error (for all weights): 0.0455

Experiment 2:

Generated architecture with activation function changes allowed in the input and output layers:

[image]

Mean error (for all weights): 0.0469

Conclusions

Changing activation functions in the input and output layers can reduce network complexity without loss of accuracy.

This may reduce the required computation.

I guess this is because of the connections that go from input to hidden and from hidden to output nodes. In some tasks they can really interfere with optimization, so the synthesis algorithm must "destroy" them by adding additional nodes and connections.

P.S. I really hope my investigation can help improve this awesome structural synthesis algorithm for neural networks.

agaier (Collaborator) commented May 26, 2020

Hi!

This is an interesting idea! In the CPPN case (with different activations but trained weights), if the output or input would benefit from having a different activation, it can be found by placing a node with that activation directly in front of or behind it --- but in the WANN case this complicates things a bit. Using them directly on the inputs is especially interesting. In the swing-up case we saw in the best solutions that the 'd_theta' input was only ever connected to the network through a Gaussian activation:

[image]

This is a symmetric activation, so it only produced a signal based on whether it was moving or not, disregarding any directional information. It is possible that changing the activation directly on the input could also do some useful preprocessing of the inputs.
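
To illustrate the symmetry point, here is a minimal sketch (assuming a Gaussian activation of the form exp(-x²/2); the exact scaling used in the code may differ):

```python
import numpy as np

# Assumed Gaussian activation; the exact form in the WANN code may differ.
def gaussian(x):
    return np.exp(-x ** 2 / 2.0)

# Symmetric in its input: the sign (direction) of d_theta is discarded,
# only its magnitude passes through.
d_theta = 1.3
print(gaussian(d_theta), gaussian(-d_theta))  # identical values
```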

I have to say I was surprised by how much simpler the 'line detector' is! These kinds of experiments, developing 'kernels' like this for convolution, were something we thought about trying, and I think it is very promising --- nice work!

Also, the plots are beautiful, what did you use to create them?

picolino (Author) commented:
Thanks for your reply! 👏

I'm glad you appreciate my research.
I hope it will be useful for someone. 🤗

I used Figma to create these plots.
They were created manually, like yours, based on a snapshot of the neural network at the end of training.

maraoz (Contributor) commented Jul 14, 2020

I'm just here to say this is the highest quality GitHub issue I ever read. 😮

Deathgar commented:
Hello @picolino. Could you please explain to me how the network works in the first XOR test? I do not understand how you get the right answer.

If we take the first neural network, feed it (1, 1), and take the weight -2, we get:
(Taking the inverse function to mean f(x) = x to the -1 power.)

First hidden layer (neuron with Inverse func): (1*(-2) + 1*(-2))^-1 = -0.25
Second hidden layer (neuron with Squared func): (-0.25*(-2))^2 = 0.25
Output layer: 0.25*-2 = -0.5
-0.5 != 0;
error = 0.5;

But it is not 0. Please explain if I have misunderstood something.
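
For reference, the hand calculation above can be reproduced as a short sketch (it follows the same assumed topology, shared weight of -2, and activation definitions; whether that topology matches the actual diagram is exactly the open question):

```python
# Reproducing the hand computation above with the same assumed topology:
# "inverse" taken as f(x) = x**-1 and "squared" as f(x) = x**2.
w = -2.0
x1, x2 = 1.0, 1.0                  # XOR input (1, 1)

hidden1 = (x1 * w + x2 * w) ** -1  # inverse activation  -> -0.25
hidden2 = (hidden1 * w) ** 2       # squared activation  ->  0.25
output = hidden2 * w               # linear output       -> -0.5

print(hidden1, hidden2, output)    # error for this case would be 0.5, not 0
```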
