🔥 Latest news
- added Java implementation (@dlidstrom)
- fixed C compilation on Linux (thanks! @LordMhri)
🙌 Seeking assistance! I'm looking for help to add support for missing languages. If you can contribute, I'll gladly accept a PR and give proper credit 💫. It's simpler than you might expect. Just take a look at one of the existing implementations—it's mostly a few for loops. No need to worry about adding tests; I can help with that part.
- 1. Introduction
- 2. Usage
- 3. Training
- 4. Learning
- 5. Implementation Goals
- 6. Reference Implementation
- 7. Using this in your own solution
- 8. References
- 9. Stargazers over time
This repository aims to implement a vanilla neural network in all major programming languages. It is the "hello world" of AI programming. We will implement a fully connected network with a single hidden layer, using the sigmoid activation function for both the hidden and the output layer. This kind of network can be used for handwriting recognition, or other kinds of pattern recognition, categorization, or prediction. This is intended as your entry point into AI programming, i.e. for the enthusiast or hobby programmer (you and me). For more advanced use cases, look elsewhere, as there are infinitely more powerful methods available to the professional.
Disclaimer! Do not expect blazing fast performance. If you have such requirements or expectations then you should definitely look elsewhere. Stay here if you want to learn more about implementing a neural network!
We do not aim to justify the math involved (see [1] if you're interested). We prefer to focus on the code itself and will happily copy a solution from one programming language to another without worrying about the theoretical background.
These usage examples are taken directly from our test implementations. The general flow is to prepare a dataset, create a trainer which contains an empty neural network, and then train the network until a desired prediction accuracy is achieved. All of these examples output the final predictions to the console. For any larger dataset you will need to compute the prediction accuracy. One way to do this is to compute the percentage of correct predictions and the average "confidence" of the predictions.
Computing prediction score and confidences:

- C#: `NeuralNetworkInAllLangs/CSharp/Program.cs`, lines 92–104 (commit `4c9c817`)

Usage examples per language:

- Rust: `NeuralNetworkInAllLangs/Rust/src/main.rs`, lines 32–73 (`4c9c817`)
- F#: `NeuralNetworkInAllLangs/FSharp/Program.fs`, lines 38–66 (`4c9c817`)
- C#: `NeuralNetworkInAllLangs/CSharp/Program.cs`, lines 28–58 (`4c9c817`)
- C++: `NeuralNetworkInAllLangs/Cpp/main.cpp`, lines 49–101 (`4c9c817`)
- C: `NeuralNetworkInAllLangs/C/main.c`, lines 46–87 (`4c9c817`)
- Kotlin: `NeuralNetworkInAllLangs/Kotlin/src/Main.kt`, lines 21–60 (`4c9c817`)
- Go: `NeuralNetworkInAllLangs/Go/main.go`, lines 67–110 (`4c9c817`)
- Java: `NeuralNetworkInAllLangs/Java/src/Main.java`, lines 29–54 (`572f879`)
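As a rough illustration of the accuracy/confidence computation described above, here is a Python sketch (the `predict` method and the names below are illustrative, not the repository's actual API):

```python
def evaluate(network, dataset):
    """Compute accuracy and average confidence over (input, one-hot target) pairs."""
    correct = 0
    confidence_sum = 0.0
    for inputs, target in dataset:
        output = network.predict(inputs)  # assumed: one sigmoid value per class
        best = max(range(len(output)), key=lambda k: output[k])
        expected = max(range(len(target)), key=lambda k: target[k])
        if best == expected:
            correct += 1
        confidence_sum += output[best]  # "confidence": score of the winning output
    return correct / len(dataset), confidence_sum / len(dataset)
```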
For training and verifying our implementations we will use two datasets. The first is simple: the logical functions xor, xnor, or, nor, and, and nand. This truth table represents the values that the network will learn, given two inputs:
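| Input | xor | xnor | or | nor | and | nand |
|---|---|---|---|---|---|---|
| 0 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 1 | 0 | 1 | 1 | 0 | 1 | 0 |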
This test is interesting as it shows how flexible a simple neural network can be. There are two inputs and six outputs, yet two hidden neurons are sufficient. Such a network consists of a total of 24 weights, counted below (and double-checked in the sketch that follows):
- 4 hidden weights (2 inputs * 2 hidden neurons)
- 2 hidden biases (one for each hidden neuron)
- 12 output weights (2 hidden neurons * 6 output neurons)
- 6 output biases (one for each output neuron)
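A quick sanity check of that count, in Python (the names are illustrative):

```python
inputs, hidden, outputs = 2, 2, 6
total = (inputs * hidden     # hidden weights
         + hidden            # hidden biases
         + hidden * outputs  # output weights
         + outputs)          # output biases
print(total)  # 24
```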
💯 We expect each implementation to learn exactly the same network weights!
The logical functions example can be used as a "litmus test" of neural network implementations. A proper implementation will be able to learn the 6 functions using the 24 weights detailed above. An improper implementation (one that doesn't implement biases correctly, for example) will likely need more hidden nodes to learn successfully, if it learns at all. A larger network means more mathematical operations, so keep this in mind when you evaluate other implementations. You don't want to waste CPU cycles unnecessarily.
The second dataset consists of thousands of handwritten digits. This is also a "toy" dataset, but training a network to recognize all digits correctly is still a bit of a challenge. It was originally downloaded from https://archive.ics.uci.edu/dataset/178/semeion+handwritten+digit.
Each line consists of 256 inputs (16x16 pixels) corresponding to one handwritten digit. At the end of the line are 10 values that one-hot encode which digit it is:
0: 1 0 0 0 0 0 0 0 0 0
1: 0 1 0 0 0 0 0 0 0 0
2: 0 0 1 0 0 0 0 0 0 0
3: 0 0 0 1 0 0 0 0 0 0
4: 0 0 0 0 1 0 0 0 0 0
...
9: 0 0 0 0 0 0 0 0 0 1
Parsing this dataset needs to be implemented for each language.
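As a rough illustration of such a parser (not the repository's actual code), one line of `semeion.data` could be split like this in Python:

```python
def parse_line(line):
    """Split one Semeion line: 256 pixel values, then a 10-value one-hot label."""
    values = line.split()
    pixels = [float(v) for v in values[:256]]    # 16x16 pixels, 0 or 1
    label = [float(v) for v in values[256:266]]  # one-hot encoded digit 0..9
    return pixels, label

with open("semeion.data") as f:
    dataset = [parse_line(line) for line in f if line.strip()]
```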
Our code will perform backpropagation to learn the weights. We update the weights after each input. This is called stochastic learning, as opposed to batch learning where multiple inputs are presented before updating weights. Stochastic learning is generally preferred [2]. Note that inputs need to be shuffled for effective learning.
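A minimal sketch of that stochastic loop in Python (`trainer.train_one` is an assumed interface for illustration, not the repository's actual API):

```python
import random

def train(trainer, dataset, epochs, learning_rate):
    indices = list(range(len(dataset)))
    for _ in range(epochs):
        random.shuffle(indices)  # new presentation order each epoch
        for i in indices:
            inputs, targets = dataset[i]
            # stochastic learning: weights are updated after every sample
            trainer.train_one(inputs, targets, learning_rate)
```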
One of our goals is to have as few dependencies as possible, ideally none. These implementations should be easy to integrate, and that requires dependency-free code. Another goal is to implement fast code. Nifty one-liners which look good but perform badly should be avoided. It is fine to use plain for loops for matrix multiplication, for example (i.e. no fancy linear algebra libraries are needed, unless one is available in the standard library of the programming language).
We strive for:
- code that is easy to copy/paste for reuse
- dependency-free code
- straightforward code, no excessive object orientation that makes the code look like an OOAD exercise from the 90s
- adequate performance rather than nifty (but slow) one-liners
- making it easy to serialize weights for storing and loading, while leaving the details to the user's own preference
- implementations in all major languages
- simple tests that verify our implementations and secure them for the future
- having fun exploring neural networks!
Now, a note about random number generation. Training a neural network requires that the initial weights are randomly assigned. We will specify a simple random number generator algorithm that should be used in all implementations. We actually want each implementation to learn the same weights. This makes it easier to verify the implementation. Of course, whoever wants to integrate into their own solution is free to pick another random number generator.
```csharp
uint p = 2147483647; // modulus: the Mersenne prime 2^31 - 1
uint a = 16807;      // multiplier of the "minimal standard" Lehmer generator
uint current = 1;    // seed

uint Rand()
{
    // use a 64-bit intermediate: a * current does not fit in 32 bits
    current = (uint)((ulong)a * current % p);
    return current;
}

double Random()
{
    // uniform value in (0, 1)
    return (double)Rand() / p;
}
```
The first few random numbers are:
```text
7.82636925942561E-06
0.131537788143166
0.755604293083588
0.44134794289309
0.734872750814479
0.00631718803491313
0.172979253424788
0.262310957192588
```
This was chosen to avoid any complexity! There are widely used algorithms for better random number generation but it isn't important in this case. We simply need some starting values and they don't have to be very random as long as they are all different.
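To cross-check your own generator against the values above, here is a quick Python sketch of the same recurrence:

```python
P = 2147483647  # 2^31 - 1
A = 16807

def lehmer():
    current = 1
    while True:
        current = A * current % P  # Python integers never overflow
        yield current / P

gen = lehmer()
for _ in range(3):
    print(next(gen))
# prints approximately 7.8263693e-06, 0.13153779, 0.75560429,
# matching the beginning of the list above
```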
The code samples all contain an extension point where you can plug in your own implementation, should you wish to do so (or just hardcode your choice!).
All code in this repository is licensed under the MIT license. This is a permissive license: you can use this code in your personal projects, and in commercial ones as well, without needing to share anything back. The MIT license is the most common license on GitHub.
If you would like to contribute to this repository, for example by adding an implementation in another programming language, then you must also license your implementation under the MIT license. Please add a license header to every source file. No GPL allowed!
This is the current status of the implementations available. We follow a maturity model based on these criteria:
- Level 0: implement the logical functions network
- Level 1: use modules/files to make the implementation easy to reuse by copy/paste
- Level 2: implement a unit test to verify Level 0 and make the code future-safe
- Level 3: implement digit recognition with the Semeion dataset
- Level 4: implement a unit test to verify Level 3 and make the code future-safe
| Language | Level 0 | Level 1 | Level 2 | Level 3 | Level 4 | Contributor |
|---|---|---|---|---|---|---|
| C# | ⭐️ | ⭐️ | ⭐️ | ⭐️ | ⭐️ | @dlidstrom |
| Rust | ⭐️ | ⭐️ | ⭐️ | | | @dlidstrom |
| F# | ⭐️ | ⭐️ | ⭐️ | | | @dlidstrom |
| C++ | ⭐️ | ⭐️ | ⭐️ | | | @dlidstrom |
| C | ⭐️ | ⭐️ | ⭐️ | | | @dlidstrom |
| Go | ⭐️ | ⭐️ | ⭐️ | | | @dlidstrom |
| Java | ⭐️ | ⭐️ | ⭐️ | | | @dlidstrom |
| Kotlin | ⭐️ | ⭐️ | | | | @dlidstrom |
| Python | ⭐️ | | | | | @dlidstrom |
Note! The Python implementation is only here as a reference. If you are using Python you already have access to all the AI tools and libraries you need.
Digit recognition is done using only 14 hidden neurons, 10 learning epochs (an epoch is one run through the entire dataset), and a learning rate of 0.5. Using these hyperparameters we are able to recognize 99.1% of the Semeion digits correctly. You may be able to improve on this by adding more hidden neurons, running more epochs, or annealing the learning rate (decreasing it slowly). However, we then also risk overfitting, which reduces the network's ability to generalize (it learns the noise in the dataset rather than the underlying patterns).
This output shows the accuracy in predicting the correct digit, and the average confidence, i.e. the score of the largest output value:
```text
~/CSharp $ dotnet run --semeion ../semeion.data 14 10 0.5
accuracy: 85.876 % (1368/1593), avg confidence: 68.060 %
accuracy: 91.965 % (1465/1593), avg confidence: 78.090 %
accuracy: 95.041 % (1514/1593), avg confidence: 84.804 %
accuracy: 96.673 % (1540/1593), avg confidence: 86.184 %
accuracy: 97.552 % (1554/1593), avg confidence: 88.259 %
accuracy: 98.242 % (1565/1593), avg confidence: 90.609 %
accuracy: 98.745 % (1573/1593), avg confidence: 92.303 %
accuracy: 98.870 % (1575/1593), avg confidence: 93.385 %
accuracy: 98.870 % (1575/1593), avg confidence: 93.261 %
accuracy: 99.121 % (1579/1593), avg confidence: 94.304 %
```
One of the inputs is also rendered to the console as 16x16 ASCII art (a handwritten 9 in this run).
Prediction (output from network for the above input):
```text
0: 0.252 %
1: 0.253 %
2: 0.010 %
3: 0.028 %
4: 0.005 %
5: 4.867 %
6: 0.000 %
7: 2.864 %
8: 7.070 %
9: 94.103 % <-- best prediction
```
Looks good, doesn't it?
For reference we have a Python implementation which uses NumPy and should be fairly easy to understand. Why Python? Because Python has become the lingua franca of AI programming. It is also easy to modify and fast to re-run, which makes it ideal for experiments.
We will now go through the reference implementation and include some math for those who want to know what's going on. You'll see the how, but not the why (see the references section for that).
Here, one forward and one backward propagation is shown. You can use these values to verify your own calculations. The example is the logical functions shown earlier, with both inputs being `1`, i.e. `1 1`. We will use 3 hidden neurons and 6 outputs (xor, xnor, and, nand, or, nor).
These are the initial values for the input layer and the hidden layer.
First we show forward propagation for the hidden layer.
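In symbols (the notation here is ours, using one common convention): with inputs $x_i$, hidden weights $w_{ij}$, hidden biases $b_j$, and the sigmoid $\sigma(z) = 1 / (1 + e^{-z})$, each hidden neuron computes

$$h_j = \sigma\Big(\sum_i w_{ij}\, x_i + b_j\Big)$$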
Now to forward propagation for the output layer. This is the actual prediction of the network.
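With hidden activations $h_j$, output weights $v_{jk}$, and output biases $c_k$ (again our notation):

$$y_k = \sigma\Big(\sum_j v_{jk}\, h_j + c_k\Big)$$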
Now we have calculated the output. These values are off compared to the expected output, and the purpose of the next step, backpropagation, is to correct the weights for a slightly improved prediction in the next iteration. The first step of backpropagation is to compute the error gradient of the output layer.
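Assuming a mean squared error loss with target values $t_k$, a common form of this gradient is

$$\delta_k^{\text{out}} = (t_k - y_k)\, y_k\, (1 - y_k)$$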
Now compute the error gradient of the hidden layer.
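Each hidden neuron's gradient collects the output gradients routed back through its outgoing weights:

$$\delta_j^{\text{hid}} = h_j\, (1 - h_j) \sum_k v_{jk}\, \delta_k^{\text{out}}$$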
Finally we can apply weight updates.
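With learning rate $\eta$, the output layer's weights and biases move a small step along their gradients:

$$v_{jk} \leftarrow v_{jk} + \eta\, \delta_k^{\text{out}}\, h_j \qquad c_k \leftarrow c_k + \eta\, \delta_k^{\text{out}}$$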
Now update the weights and biases of the hidden layer (the ones applied to the inputs).
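Under the same convention:

$$w_{ij} \leftarrow w_{ij} + \eta\, \delta_j^{\text{hid}}\, x_i \qquad b_j \leftarrow b_j + \eta\, \delta_j^{\text{hid}}$$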
If you do use any of these implementations in your own solution, then here are some things to keep in mind for good results:
- shuffle inputs
- try to have about the same number of samples for each output to avoid "drowning out" a sample
- try different learning rates (0.1 to 0.5 seems to work well for many problems)
- you may try "annealing" the learning rate, meaning start high (0.5) and slowly decrease it over the epochs (see the sketch below)
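For the annealing tip, a minimal sketch (the exponential decay factor is an illustrative choice, not a prescription):

```python
def annealed_rate(initial_rate, epoch, decay=0.9):
    """Decay the learning rate each epoch: start high, end low."""
    return initial_rate * decay ** epoch

# Starting at 0.5: 0.5, 0.45, 0.405, 0.3645, ...
```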
[1] http://neuralnetworksanddeeplearning.com/
[2] https://leon.bottou.org/publications/pdf/tricks-1998.pdf
[3] https://cs231n.github.io/neural-networks-2/