How about changing input and output activation functions? #22
Comments
Hi! This is an interesting idea! In the CPPN case (with different activations but trained weights), if the output or input would benefit from having a different activation, it can be found by placing a node with that activation directly in front or behind --- but in the WANN case this complicates things a bit. Using them directly on the inputs is especially interesting. In the swing-up case we saw in the best solutions that the 'd_theta' input was only ever connected to the network through a Gaussian activation: this is a symmetric activation, so it only produced signal based on whether it was moving or not, disregarding any directional information. It is possible that connecting an activation directly to the input could also do some useful preprocessing of the inputs. I have to say I was surprised by how much simpler the 'line detector' is! These experiments to develop 'kernels' like this for convolution were something we thought about trying, and I think they are very promising --- nice work! Also, the plots are beautiful, what did you use to create them?
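As a side note, a minimal sketch of why a symmetric activation on the 'd_theta' input discards directional information; the Gaussian form exp(-x²/2) is an assumption here and the exact variant used in the codebase may differ:

```python
import numpy as np

def gaussian(x):
    # Symmetric bump centered at 0 (assumed form exp(-x^2 / 2))
    return np.exp(-x * x / 2.0)

d_theta = 1.3
# gaussian(d_theta) == gaussian(-d_theta): the output depends only on the
# magnitude of the angular velocity, not its sign, so direction is lost.
print(gaussian(d_theta), gaussian(-d_theta))  # both ~0.43
```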
Thanks for your reply! 👏 I'm glad you appreciate my research. I used Figma to create these plots.
I'm just here to say this is the highest-quality GitHub issue I've ever read. 😮
Hello @picolino . Could you please explain to me how the network works in the first XOR test? I don't understand how you get the right answer. If we take the first neural network, pass it (1, 1), and take the weight -2, we get in the first hidden layer (the neuron with the Inverse function): (1*(-2) + 1*(-2))^-1 = -0.25, but not 0. Please explain if I misunderstood something.
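For reference, the arithmetic in the comment above written out as a small Python sketch; `inverse` follows the commenter's assumption that the Inverse activation means 1/x, which may not match the repository's actual definition:

```python
def inverse(x):
    # Commenter's assumption: 'Inverse' activation = 1/x
    # (the activation actually implemented in the repo may differ).
    return 1.0 / x

w = -2           # shared weight value
x1, x2 = 1, 1    # XOR inputs (1, 1)

hidden = inverse(x1 * w + x2 * w)  # (1*(-2) + 1*(-2))^-1
print(hidden)                      # -0.25, not 0
```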
Hello. I was really delighted by this new type of structural optimization of neural networks.
Thank you for your job, it is really awesome. 👏
Currently, I am doing detailed research on the architectures that can be generated with the WANN algorithm on classification tasks. At some point I tried changing the activation functions in the input and output layers, and I got interesting results:
Experiments
Applied hyperparameters for all experiments (see the sketch after this list):
Weight set: -2, -1, -0.5, 0.5, 1, 2
Max generations: 1024
Population size: 512
Rank by performance only / by network complexity: 20% / 80%
Add connection probability: 20%
Add neuron probability: 25%
Change activation function probability: 50%
Enable disabled connection probability: 5%
Keep best species in next population (elitism): 20
Destroy bad species in next population (cull): 20
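For reference, the settings above collected into a Python dictionary; the key names are illustrative placeholders, not the exact hyperparameter keys used in the WANN code:

```python
# Hypothetical key names; map them onto the actual WANN hyperparameter file as needed.
hyperparams = {
    "weight_values":    [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0],  # shared-weight set
    "max_generations":  1024,
    "population_size":  512,
    "rank_by_performance_only": 0.20,  # otherwise rank by performance + complexity (0.80)
    "prob_add_connection":      0.20,
    "prob_add_neuron":          0.25,
    "prob_change_activation":   0.50,
    "prob_enable_connection":   0.05,
    "elitism_count":            20,    # best species kept in the next population
    "cull_count":               20,    # worst species destroyed each generation
}
```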
XOR
Experiment 1:
Generated architecture without changing the activation functions in the input and output layers:
Mean error (for all weights): 0
Experiment 2:
Generated architecture with changed activation functions in the input and output layers:
Mean error (for all weights): 0
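For clarity, a minimal sketch of how the mean error over all weights can be computed for XOR: the same evolved topology is evaluated once per value of the shared-weight set and the error is averaged over all input cases. `forward` is a hypothetical stand-in for the network's forward pass, not code from these experiments:

```python
import itertools
import numpy as np

WEIGHT_SET = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def mean_error(forward):
    """Average absolute error over all shared weights and all XOR inputs.

    `forward(inputs, weight)` is assumed to run the evolved topology with
    every connection set to the single shared weight value.
    """
    errors = [abs(forward(x, w) - y)
              for w, (x, y) in itertools.product(WEIGHT_SET, XOR_CASES)]
    return float(np.mean(errors))
```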
Straight line detection
Inputs: 3x3 square images
Outputs: the 2 squares on the right side of each set.
If only a horizontal line exists, the output must be (1, 0).
If only a vertical line exists, the output must be (0, 1).
If both exist, the output must be (1, 1).
If no straight line exists, the output must be (0, 0).
Target: teach the neural network to detect straight (black) lines in a 3x3 image (a dataset sketch follows below).
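A sketch of how the 3x3 dataset and its (horizontal, vertical) labels can be generated under the rules above; this is my reading of the task description, and the actual images used in the experiments may have been a hand-selected subset:

```python
import itertools
import numpy as np

def line_labels(img):
    """Return (has_horizontal, has_vertical) for a 3x3 binary image.

    A straight line is a full row (horizontal) or full column (vertical)
    of black pixels (value 1).
    """
    has_h = int(any(img[r, :].all() for r in range(3)))
    has_v = int(any(img[:, c].all() for c in range(3)))
    return has_h, has_v

# Enumerate all 2^9 binary 3x3 images with their target outputs.
dataset = []
for bits in itertools.product([0, 1], repeat=9):
    img = np.array(bits).reshape(3, 3)
    dataset.append((img.flatten(), line_labels(img)))  # 9 inputs, 2 outputs
```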
Experiment 1:
Generated architecture without changing the activation functions in the input and output layers:
Mean error (for all weights): 0.0455
Experiment 2:
Generated architecture with changed activation functions in the input and output layers:
Mean error (for all weights): 0.0469
Conclusions
Changing the activation functions in the input and output layers can reduce network complexity without loss of accuracy.
It may reduce the amount of required computation.
I guess this is because of the connections that go from input to hidden nodes and from hidden to output nodes. In some tasks they can really interfere with optimization, so the synthesis algorithm has to "destroy" them by adding additional nodes and connections.
P.S. I really hope my investigation can help improve this awesome algorithm for structural synthesis of neural networks.
❤