An example of semantic segmentation on iOS using CoreML and Keras. Trained Tiramisu 45 weights come from here. A device with a camera is required, preferably a recent one, so that the model can sustain an acceptable frame rate.
Predictions from Tiramisu 45 on iPhone XS Video Stream.
- Note that the 1280 × 720 input image is scaled (fill) to 480 × 352, which explains the difference in size between the camera stream and the segmentation output.
- iOS 12 or later
- The Metal Performance Shader for ArgMax feature channel reduction is only available from iOS 12 onward. An iterative CPU implementation of ArgMax is about 3× slower than the vectorized GPU implementation on Metal (measured on iPhone XS).
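To make the ArgMax step concrete: the network emits one score per class for every pixel, and the reduction keeps only the index of the highest-scoring class. The following is a minimal sketch in plain Python of an iterative CPU reduction. The name `argmax_channels` and the height × width × channels layout are assumptions made for illustration; this is not the app's Swift or Metal code.

```python
from typing import List, Sequence


def argmax_channels(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """Reduce an H x W x C score volume to an H x W map of class indices.

    Hypothetical helper: the (rows, columns, channels) layout is an
    assumption made for this sketch.
    """
    labels = []
    for row in scores:
        label_row = []
        for pixel in row:
            best_index = 0
            for index in range(1, len(pixel)):
                # Strict comparison keeps the first index on ties.
                if pixel[index] > pixel[best_index]:
                    best_index = index
            label_row.append(best_index)
        labels.append(label_row)
    return labels


# A 1 x 2 image with three class scores per pixel.
example = [[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]]
print(argmax_channels(example))  # [[1, 0]]
```

Every pixel needs a pass over all class channels. At 480 × 352 that is 168,960 pixels per frame, which gives a sense of why a sequential loop costs more than a single vectorized GPU reduction.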
The original Keras model file can be found in Tiramisu/Models as Tiramisu45.h5. An accompanying Python script, convert.py, uses coremltools to convert the Keras model into a CoreML model, Tiramisu45.mlmodel. The model is trained first on CamVid, then on CityScapes, using hyperparameters similar to those reported in the original paper. Additional augmentation (brightness adjustment and random rotations) is applied during training to make the model more robust to variations in lighting and camera angle.
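As a rough illustration of the conversion step, the sketch below uses the Keras converter from coremltools. It is not the contents of convert.py: the function name, the `image` and `segmentation` feature names, and the 1/255 input scaling are all assumptions. It also presumes a coremltools release that still ships `coremltools.converters.keras`, since that converter was deprecated in later versions.

```python
def convert_tiramisu(h5_path="Tiramisu45.h5", mlmodel_path="Tiramisu45.mlmodel"):
    """Convert a saved Keras model to a Core ML model file (sketch only)."""
    # Imported lazily so this module loads even without coremltools installed.
    import coremltools

    mlmodel = coremltools.converters.keras.convert(
        h5_path,
        input_names="image",          # assumed input feature name
        image_input_names="image",    # expose the input as an image
        output_names="segmentation",  # assumed output feature name
        image_scale=1 / 255.0,        # assumed preprocessing: pixels to [0, 1]
    )
    mlmodel.save(mlmodel_path)
    return mlmodel_path
```

Exposing the input as an image lets the app pass camera frames to the model directly. A model with custom layers would likely need additional handling that this sketch does not cover.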
Tiramisu 45 is computationally heavy despite having few (≈ 800,000) parameters, because of the skip connections within dense blocks and between the encoder and decoder. As a result, the frame rate suffers. The values reported here are averaged over 30 seconds of runtime after application initialization. Note that because the computation is intense, devices heat up quickly and begin thermal throttling. The iPhone XS frame rate drops to ≈ 6 frames per second when this throttling occurs.
Device | Frame Rate (FPS) |
---|---|
iPhone XS | ≈ 12 |
iPhone 7 | ≈ 2 |
iPad Air | < 1 |
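
The averaging behind these figures can be sketched as follows. This is an illustrative Python version rather than the app's measurement code: `average_fps` and the list of frame timestamps are hypothetical, and only the 30-second window is taken from the text.

```python
from typing import Sequence


def average_fps(timestamps: Sequence[float], window: float = 30.0) -> float:
    """Average frame rate over the first `window` seconds of a run.

    `timestamps` holds the completion time of each frame in seconds,
    in increasing order. Hypothetical helper for illustration.
    """
    if len(timestamps) < 2:
        return 0.0
    start = timestamps[0]
    in_window = [t for t in timestamps if t - start <= window]
    elapsed = in_window[-1] - in_window[0]
    if elapsed <= 0:
        return 0.0
    # N timestamps bound N - 1 frame intervals.
    return (len(in_window) - 1) / elapsed


# 361 frames spaced 1/12 s apart span exactly 30 s.
frames = [i / 12 for i in range(361)]
print(round(average_fps(frames), 2))  # 12.0
```

Because thermal throttling lowers the rate as the device heats up, a longer window would likely report a lower average than the one measured over the first 30 seconds.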