AATrackT: A deep learning network using attentions for tracking fast-moving and tiny objects: (A)ttention (A)ugmented - (Track)ing on (T)iny objects
Jönköping University, School of Engineering, JTH, Department of Computing.
2022 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Recent advances in deep learning have made it possible to visually track objects in a video sequence. Moreover, since transformers were introduced in computer vision, new state-of-the-art performance has been achieved in visual tracking. However, most of these studies have used attention to correlate the distinguishing factors between the target object and candidate objects in order to localise the object throughout the video sequence. This approach is not adequate for tracking tiny objects. In addition, conventional trackers are often not applicable to tracking extremely small objects or objects that move fast. Therefore, the purpose of this study is to improve current methods for tracking tiny, fast-moving objects with the help of attention. A deep neural network, named AATrackT, is built to address this gap by treating tracking as a visual image segmentation problem. The proposed method uses data extracted from broadcast videos of tennis. Moreover, to capture the global context of images, attention-augmented convolutions are used as a substitute for the conventional convolution operation. Contrary to what the authors assumed, the experiment indicated that using attention-augmented convolutions did not improve tracking performance. Our findings showed that the main reason is that the spatial resolution of the activation maps, 72x128, is too large for the attention weights to converge.
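
To give a sense of scale for the 72x128 activation maps mentioned above, the following sketch counts the pairwise attention weights a single head would hold if every spatial position attended to every other position. Full self-attention over the flattened map is an assumption made here for illustration, not a description of the thesis's implementation, and the function name attention_map_entries is hypothetical.

```python
# Illustrative arithmetic only: assumes full self-attention over a flattened
# feature map, which may differ from the network described in the thesis.

def attention_map_entries(height: int, width: int) -> int:
    """Return the number of pairwise attention weights for one head when
    every spatial position attends to every other position."""
    positions = height * width
    return positions * positions


height, width = 72, 128                  # activation-map resolution from the abstract
positions = height * width               # number of spatial positions
entries = attention_map_entries(height, width)
bytes_float32 = entries * 4              # storage for one head, one image, 32-bit floats
uniform_weight = 1.0 / positions         # weight per position if attention were uniform

print(f"positions per map:        {positions}")
print(f"attention weights/head:   {entries}")
print(f"float32 storage (MiB):    {bytes_float32 / 2**20:.0f}")
print(f"uniform attention weight: {uniform_weight:.6f}")
```

Under these assumptions, each query position spreads its softmax mass over 9216 candidates, so a single tiny object occupying a handful of positions has to stand out against thousands of background positions.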

Place, publisher, year, edition, pages
2022. p. 43
Keywords [en]
Machine learning, Computer vision, Visual tracking, Attentions, Tiny fast-moving object
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hj:diva-57260
ISRN: JU-JTH-PRU-2-20220299
OAI: oai:DiVA.org:hj-57260
DiVA, id: diva2:1671547
External cooperation
Padelplay AB
Subject / course
JTH, Computer Engineering
Available from: 2022-06-23. Created: 2022-06-17. Last updated: 2025-10-13. Bibliographically approved.

Open Access in DiVA

Full text (3752 kB), 726 downloads
File information
File name: FULLTEXT01.pdf
File size: 3752 kB
Checksum (SHA-512): 272a2ea762b0dbf5b538d1e72a7fa4d3ae88be79fef55373a90224b79154d30cb724ba08fa09f1527f1a03233428c81169713f5e921692e841f815cd56bedcc9
Type: fulltext
Mimetype: application/pdf
