Siamese neural network

A Siamese neural network (sometimes called a twin neural network) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors.

A distance function [math]\displaystyle{ \delta }[/math] is a metric if it satisfies the following properties:

  • Non-negativity: [math]\displaystyle{ \delta ( x, y ) \ge 0 }[/math]
  • Identity of indiscernibles: [math]\displaystyle{ \delta ( x, y ) = 0 \iff x=y }[/math]
  • Symmetry: [math]\displaystyle{ \delta ( x, y ) = \delta ( y, x ) }[/math]
  • Triangle inequality: [math]\displaystyle{ \delta ( x, z ) \le \delta ( x, y ) + \delta ( y, z ) }[/math]

In particular, the triplet loss is often defined with the squared Euclidean distance at its core, which, unlike the Euclidean distance, does not satisfy the triangle inequality.
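The difference between the two distances can be checked on a small example. The following is a minimal sketch using three collinear points chosen for illustration; the helper names are hypothetical and not taken from any library.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Squared Euclidean distance (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def satisfies_triangle(dist, x, y, z):
    """Check dist(x, z) <= dist(x, y) + dist(y, z) for one triple of points."""
    return dist(x, z) <= dist(x, y) + dist(y, z)

# Three points on a line, chosen only for illustration.
x, y, z = (0.0,), (1.0,), (2.0,)

print(satisfies_triangle(euclidean, x, y, z))          # True:  2 <= 1 + 1
print(satisfies_triangle(squared_euclidean, x, y, z))  # False: 4 >  1 + 1
```

A single counterexample is enough to show that the squared distance is not a metric, although it remains non-negative and symmetric.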

Predefined metrics, Euclidean distance metric

The common learning goal is to minimize a distance metric for similar objects and maximize it for distinct ones. This gives a loss function like

[math]\displaystyle{ \begin{align} \delta(x^{(i)}, x^{(j)})= \begin {cases} \min \ \| \operatorname{f} \left ( x^{(i)} \right ) - \operatorname{f} \left ( x^{(j)} \right ) \| \, , i = j \\ \max \ \| \operatorname{f} \left ( x^{(i)} \right ) - \operatorname{f} \left ( x^{(j)} \right ) \| \, , i \neq j \end{cases} \end{align} }[/math]
[math]\displaystyle{ i,j }[/math] are indexes into a set of vectors
[math]\displaystyle{ \operatorname{f}(\cdot) }[/math] is the function implemented by the twin network
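One common way to turn this goal into a trainable objective is a contrastive loss with a margin, which is a swapped-in choice here and not a formula stated in this article. The sketch below assumes a hypothetical single linear layer as the twin function f; the point to notice is that both inputs pass through the same weights.

```python
import math

def embed(weights, x):
    """Hypothetical twin branch f: a single linear layer, f(x) = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def distance(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(weights, x_i, x_j, same, margin=1.0):
    """Margin-based contrastive loss (an assumed, commonly used form).

    Similar pairs are penalised by their squared distance; dissimilar
    pairs are penalised only while they are closer than `margin`.
    """
    # The same `weights` are used for both inputs: this is the twin property.
    d = distance(embed(weights, x_i), embed(weights, x_j))
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

W = [[1.0, 0.0], [0.0, 1.0]]     # illustrative weights (identity map)
a, b = [0.0, 0.0], [3.0, 4.0]    # embeddings are 5.0 apart

print(contrastive_loss(W, a, b, same=True))               # 25.0
print(contrastive_loss(W, a, b, same=False, margin=6.0))  # 1.0
print(contrastive_loss(W, a, b, same=False, margin=1.0))  # 0.0
```

The margin keeps the "maximize" part bounded: once a dissimilar pair is farther apart than the margin, it no longer contributes to the loss.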

The most common distance metric used is the Euclidean distance, in which case the (squared) distance can be written in matrix form as

[math]\displaystyle{ \operatorname{\delta} ( \mathbf{x}^{(i)}, \mathbf{x}^{(j)} ) \approx (\mathbf{x}^{(i)} - \mathbf{x}^{(j)})^{T}(\mathbf{x}^{(i)} - \mathbf{x}^{(j)}) }[/math]

Learned metrics, nonlinear distance metric

A more general case is where the output vector from the twin network is passed through additional network layers implementing non-linear distance metrics.

[math]\displaystyle{ \begin{align} \text{if} \, i = j \, \text{then} & \, \operatorname{\delta} \left [ \operatorname{f} \left ( x^{(i)} \right ), \, \operatorname{f} \left ( x^{(j)} \right ) \right ] \, \text{is small} \\ \text{otherwise} & \, \operatorname{\delta} \left [ \operatorname{f} \left ( x^{(i)} \right ), \, \operatorname{f} \left ( x^{(j)} \right ) \right ] \, \text{is large} \end{align} }[/math]
[math]\displaystyle{ i,j }[/math] are indexes into a set of vectors
[math]\displaystyle{ \operatorname{f}(\cdot) }[/math] is the function implemented by the twin network
[math]\displaystyle{ \operatorname{\delta}(\cdot) }[/math] is the function implemented by the network joining outputs from the twin network

In matrix form, this is often approximated for a linear space by a Mahalanobis distance:[1]

[math]\displaystyle{ \operatorname{\delta} ( \mathbf{x}^{(i)}, \mathbf{x}^{(j)} ) \approx (\mathbf{x}^{(i)} - \mathbf{x}^{(j)})^{T}\mathbf{M}(\mathbf{x}^{(i)} - \mathbf{x}^{(j)}) }[/math]

This can be further subdivided into at least unsupervised learning and supervised learning.
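The Mahalanobis form above is a quadratic form in the difference vector. The following sketch evaluates it with plain nested lists; the matrix values are hypothetical stand-ins for a learned M, which is usually assumed to be positive semi-definite so that the result is never negative. With the identity matrix, the expression reduces to the squared Euclidean form given for predefined metrics.

```python
def quadratic_form_distance(x_i, x_j, M):
    """Compute (x_i - x_j)^T M (x_i - x_j) for a square matrix M (nested lists)."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    # Matrix-vector product M @ diff.
    m_diff = [sum(m * d for m, d in zip(row, diff)) for row in M]
    # Inner product diff^T (M @ diff).
    return sum(d * md for d, md in zip(diff, m_diff))

identity = [[1.0, 0.0], [0.0, 1.0]]
M = [[2.0, 0.0], [0.0, 0.5]]   # hypothetical learned matrix (assumed values)

x_i, x_j = [1.0, 2.0], [3.0, 6.0]

print(quadratic_form_distance(x_i, x_j, identity))  # 20.0 (squared Euclidean)
print(quadratic_form_distance(x_i, x_j, M))         # 16.0
```

Here M stretches the first coordinate and shrinks the second, so differences along the first axis count more toward the distance.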

Learned metrics, half-twin networks

This form also allows the twin network to be more of a half-twin, with the two branches implementing slightly different functions

[math]\displaystyle{ \begin{align} \text{if} \, i = j \, \text{then} & \, \operatorname{\delta} \left [ \operatorname{f} \left ( x^{(i)} \right ), \, \operatorname{g} \left ( x^{(j)} \right ) \right ] \, \text{is small} \\ \text{otherwise} & \, \operatorname{\delta} \left [ \operatorname{f} \left ( x^{(i)} \right ), \, \operatorname{g} \left ( x^{(j)} \right ) \right ] \, \text{is large} \end{align} }[/math]
[math]\displaystyle{ i,j }[/math] are indexes into a set of vectors
[math]\displaystyle{ \operatorname{f}(\cdot), \operatorname{g}(\cdot) }[/math] are the functions implemented by the half-twin network
[math]\displaystyle{ \operatorname{\delta}(\cdot) }[/math] is the function implemented by the network joining outputs from the twin network
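A half-twin arrangement can be sketched with two branches that do not share weights but map into a common output space. In the example below, the linear maps `W_f` and `W_g` are hypothetical, and δ is fixed to the Euclidean distance for simplicity, whereas the text describes it as a learned joining network.

```python
import math

def linear(weights, x):
    """Apply a linear map given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def delta(u, v):
    """Joining function; fixed to Euclidean distance in this sketch."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical branch weights: f takes 3-d inputs, g takes 2-d inputs,
# and both produce 2-d outputs so that they can be compared.
W_f = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
W_g = [[0.0, 1.0],
       [1.0, 0.0]]

def half_twin_distance(x_i, x_j):
    """delta[f(x_i), g(x_j)] with separate weights per branch."""
    return delta(linear(W_f, x_i), linear(W_g, x_j))

print(half_twin_distance([3.0, 4.0, 9.0], [4.0, 3.0]))  # 0.0
print(half_twin_distance([3.0, 4.0, 9.0], [0.0, 0.0]))  # 5.0
```

Because the branches differ, the two inputs may even have different dimensions or modalities, as long as their outputs are comparable.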

Twin networks for object tracking

Twin networks have been used in object tracking because their two tandem inputs and similarity measurement suit the task. In object tracking, one input of the twin network is a user-selected exemplar image and the other is a larger search image; the network's job is to locate the exemplar inside the search image. By measuring the similarity between the exemplar and each part of the search image, the twin network produces a map of similarity scores. Furthermore, with a fully convolutional network, computing each sector's similarity score separately can be replaced by a single cross-correlation layer.[2]
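The similarity map described above can be sketched as a plain 2-D cross-correlation. This example makes a strong simplifying assumption: the exemplar and the search image are compared directly as raw values, whereas in a twin tracker both would first pass through the shared convolutional network. The function names and the toy images are hypothetical.

```python
def cross_correlation_map(search, exemplar):
    """Slide `exemplar` over `search` and return the map of inner-product scores."""
    H, W = len(search), len(search[0])
    h, w = len(exemplar), len(exemplar[0])
    scores = []
    for top in range(H - h + 1):
        row = []
        for left in range(W - w + 1):
            # Inner product between the exemplar and the current window.
            s = sum(
                search[top + r][left + c] * exemplar[r][c]
                for r in range(h)
                for c in range(w)
            )
            row.append(s)
        scores.append(row)
    return scores

def best_location(scores):
    """Return the (row, column) of the highest score in the map."""
    positions = (
        (r, c) for r in range(len(scores)) for c in range(len(scores[0]))
    )
    return max(positions, key=lambda rc: scores[rc[0]][rc[1]])

exemplar = [[1, -1],
            [-1, 1]]
search = [[0, 0, 0, 0],
          [0, 0, 1, -1],
          [0, 0, -1, 1],
          [0, 0, 0, 0]]

scores = cross_correlation_map(search, exemplar)
print(best_location(scores))  # (1, 2): top-left corner of the best match
```

Each entry of `scores` corresponds to one placement of the exemplar, so the whole map is obtained in one pass instead of one network evaluation per candidate window.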

After being first introduced in 2016, the twin fully convolutional network has been used in many high-performance real-time object tracking neural networks, such as CFnet,[3] StructSiam,[4] SiamFC-tri,[5] DSiam,[6] SA-Siam,[7] SiamRPN,[8] DaSiamRPN,[9] Cascaded SiamRPN,[10] SiamMask,[11] SiamRPN++,[12] and Deeper and Wider SiamRPN.[13]

References

  1. Mahalanobis, P. C. (1936). "On the generalized distance in statistics". Proceedings of the National Institute of Sciences of India. 2 (1): 49–55. http://library.isical.ac.in:8080/jspui/bitstream/123456789/6765/1/Vol02_1936_1_Art05-pcm.pdf.
  2. "Fully-Convolutional Siamese Networks for Object Tracking". arXiv:1606.09549.
  3. "End-to-end representation learning for Correlation Filter based tracking". https://www.robots.ox.ac.uk/~luca/cfnet.html. 
  4. "Structured Siamese Network for Real-Time Visual Tracking". http://openaccess.thecvf.com/content_ECCV_2018/papers/Yunhua_Zhang_Structured_Siamese_Network_ECCV_2018_paper.pdf. 
  5. "Triplet Loss in Siamese Network for Object Tracking". http://openaccess.thecvf.com/content_ECCV_2018/papers/Xingping_Dong_Triplet_Loss_with_ECCV_2018_paper.pdf. 
  6. "Learning Dynamic Siamese Network for Visual Object Tracking". http://openaccess.thecvf.com/content_ICCV_2017/papers/Guo_Learning_Dynamic_Siamese_ICCV_2017_paper.pdf. 
  7. "A Twofold Siamese Network for Real-Time Object Tracking". http://openaccess.thecvf.com/content_cvpr_2018/papers/He_A_Twofold_Siamese_CVPR_2018_paper.pdf. 
  8. "High Performance Visual Tracking with Siamese Region Proposal Network". http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf. 
  9. Zhu, Zheng; Wang, Qiang; Li, Bo; Wu, Wei; Yan, Junjie; Hu, Weiming (2018). "Distractor-aware Siamese Networks for Visual Object Tracking". arXiv:1808.06048 [cs.CV].
  10. Fan, Heng; Ling, Haibin (2018). "Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking". arXiv:1812.06148 [cs.CV].
  11. Wang, Qiang; Zhang, Li; Bertinetto, Luca; Hu, Weiming; Torr, Philip H. S. (2018). "Fast Online Object Tracking and Segmentation: A Unifying Approach". arXiv:1812.05050 [cs.CV].
  12. Li, Bo; Wu, Wei; Wang, Qiang; Zhang, Fangyi; Xing, Junliang; Yan, Junjie (2018). "SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks". arXiv:1812.11703 [cs.CV].
  13. Zhang, Zhipeng; Peng, Houwen (2019). "Deeper and Wider Siamese Networks for Real-Time Visual Tracking". arXiv:1901.01660 [cs.CV].