

Code and data for the paper "Global Temporal Representation based CNNs for Infrared Action Recognition" (IEEE SPL)





Infrared human action recognition has many advantages; for example, it is insensitive to illumination changes, appearance variability, and shadows. Existing methods for infrared action recognition rely on either spatial or local temporal information; global temporal information, which better describes the movements of body parts across the whole video, is not considered. In this letter, we propose a novel global temporal representation named Optical-Flow Stacked Difference Image (OFSDI) and extract robust and discriminative features from infrared action data by jointly considering local temporal, spatial, and global temporal information. Because infrared action datasets are small, we apply CNNs to the local temporal, spatial, and global temporal streams, respectively, to obtain efficient convolutional feature maps from the raw data rather than training a classifier directly. These convolutional feature maps are then aggregated by trajectory-constrained pooling into effective descriptors named three-stream trajectory-pooled deep-convolutional descriptors (TSTDDs). We further improve the robustness of these features with Locality-constrained Linear Coding (LLC), and a linear SVM classifies the action data. We conduct experiments on the infrared action recognition dataset InfAR and on NTU RGB+D. Experimental results show that the proposed approach outperforms representative state-of-the-art methods based on handcrafted features and deep learning features for infrared action recognition.
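The abstract names OFSDI but does not spell out its formula. As a rough illustration only, the Python sketch below builds a stacked difference image by accumulating absolute differences between consecutive optical-flow frames over the whole video and rescaling the result to an 8-bit image. The function name `stacked_difference_image`, the absolute-difference accumulation, and the min-max rescaling are assumptions made for this sketch, not the paper's definition; the letter gives the exact OFSDI construction.

```python
import numpy as np


def stacked_difference_image(flow_frames):
    """Hypothetical stacked difference image over a flow sequence.

    flow_frames: array of shape (T, H, W), one flow component (or flow
    magnitude) per frame. ASSUMPTION: the global representation is the sum
    of absolute frame-to-frame differences, min-max rescaled to 0..255.
    """
    flow = np.asarray(flow_frames, dtype=np.float64)
    if flow.ndim != 3 or flow.shape[0] < 2:
        raise ValueError("expected at least two frames with shape (T, H, W)")
    # Absolute change between consecutive flow frames, shape (T - 1, H, W).
    diffs = np.abs(np.diff(flow, axis=0))
    # Accumulate the changes over the whole video into one global image.
    acc = diffs.sum(axis=0)
    lo, hi = acc.min(), acc.max()
    if hi == lo:
        # No temporal change anywhere: return an all-black image.
        return np.zeros(acc.shape, dtype=np.uint8)
    # Min-max rescale to an 8-bit image usable as CNN input.
    return np.round(255.0 * (acc - lo) / (hi - lo)).astype(np.uint8)


# Usage: a 3-frame, 2x2 sequence where only pixel (0, 0) changes.
frames = np.zeros((3, 2, 2))
frames[1, 0, 0] = 1.0
img = stacked_difference_image(frames)  # pixel (0, 0) maps to 255, others to 0
```

Summing differences (rather than raw flow) emphasizes where motion changes over the entire clip, which is the kind of whole-video cue the abstract attributes to a global temporal stream.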


Figure 1: Frameworks of our infrared action recognition method and the conventional method. Compared with the conventional method, our method differs in network input, CNN structure, feature extraction, and classification strategy.
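The classification strategy encodes descriptors with LLC before a linear SVM. A minimal NumPy sketch of the approximated LLC encoder (k nearest codewords, regularized local least squares under a sum-to-one constraint) followed by max pooling over a video is shown below. The codebook, `k`, `beta`, and the max-pooling plus L2-normalization step are assumed settings for illustration, not the values used in the paper, and the SVM stage is omitted.

```python
import numpy as np


def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximated LLC code of one descriptor.

    x: (D,) descriptor; codebook: (M, D) codewords. Returns an (M,) code with
    at most k non-zero weights that sum to one.
    """
    k = min(k, codebook.shape[0])
    # k nearest codewords by squared Euclidean distance.
    d2 = np.sum((codebook - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]
    # Shift the local bases so that x sits at the origin.
    z = codebook[idx] - x
    cov = z @ z.T
    # Regularize the local covariance (tiny constant guards a zero trace).
    cov = cov + np.eye(k) * (beta * np.trace(cov) + 1e-12)
    w = np.linalg.solve(cov, np.ones(k))
    w = w / w.sum()  # enforce the sum-to-one constraint
    code = np.zeros(codebook.shape[0])
    code[idx] = w
    return code


def llc_encode(descriptors, codebook, k=5, beta=1e-4):
    """Encode all descriptors of one video, max-pool, and L2-normalize."""
    codes = np.stack([llc_code(x, codebook, k, beta) for x in descriptors])
    pooled = codes.max(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Usage: a toy 3-word codebook; the code approximately reconstructs x.
cb = np.eye(3)
c = llc_code(np.array([0.9, 0.1, 0.0]), cb, k=2)  # roughly [0.9, 0.1, 0.0]
```

Restricting each code to its k nearest codewords keeps the encoding local and cheap, and the pooled, normalized vector is a fixed-length input suitable for a linear classifier.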

Network Structure

Table I: Details of the three-stream CNNs in terms of kernel size, stride, channel size, and map size ratio.
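The map size ratio in Table I links trajectory coordinates in the video to positions on each convolutional feature map. The sketch below illustrates trajectory-constrained pooling in the style of trajectory-pooled deep-convolutional descriptors: feature maps are max-normalized, trajectory points are scaled by the ratio, and the feature vectors at those points are summed. The `(T, H, W, N)` array layout, the rounding and clamping of coordinates, non-negative (post-ReLU) maps, and the helper names are assumptions; the released MATLAB code remains the reference implementation.

```python
import numpy as np


def normalize_maps(fmap, mode):
    """Max-normalize non-negative feature maps of shape (T, H, W, N)."""
    eps = 1e-12
    if mode == "spatiotemporal":
        # Each channel divided by its maximum over all frames and positions.
        return fmap / (fmap.max(axis=(0, 1, 2), keepdims=True) + eps)
    if mode == "channel":
        # Each position's feature vector divided by its maximum over channels.
        return fmap / (fmap.max(axis=3, keepdims=True) + eps)
    raise ValueError(f"unknown mode: {mode}")


def trajectory_pool(fmap, trajectory, ratio):
    """Sum-pool feature maps along one trajectory.

    fmap: (T, H, W, N) maps of one stream; trajectory: iterable of (x, y, t)
    points in video pixel coordinates; ratio: map size ratio of the layer.
    """
    _, H, W, N = fmap.shape
    desc = np.zeros(N)
    for x, y, t in trajectory:
        # Scale pixel coordinates onto the feature-map grid and clamp.
        col = min(max(int(round(ratio * x)), 0), W - 1)
        row = min(max(int(round(ratio * y)), 0), H - 1)
        desc += fmap[t, row, col]
    return desc


# Usage: a two-point trajectory on a half-resolution map (ratio 0.5).
fmap = np.zeros((2, 4, 4, 3))
fmap[0, 1, 1] = [1.0, 2.0, 3.0]
fmap[1, 2, 2] = [1.0, 0.0, 0.0]
d = trajectory_pool(fmap, [(2, 2, 0), (4, 4, 1)], ratio=0.5)  # [2., 2., 3.]
```

In a three-stream setting the same pooling would run once per stream (and per normalization), with the resulting vectors concatenated into one descriptor per trajectory.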


Figure 3: Example images from video sequences in the InfAR (first row) and NTU RGB+D (second row) datasets. (a) fight. (b) handclapping. (c) handshake. (d) hug. (e) jog. (f) jump. (g) drop. (h) pickup. (i) throw. (j) sitting down. (k) standing up. (l) clapping.

InfAR dataset can be downloaded here.

NTU RGB+D dataset can be downloaded here.


(MATLAB R2016b or later is required to open these files.)

  1. Caffe models. We also release the trained model “InfAR_TDDs_rgb_iter_10000.caffemodel”.
  2. TSTDDs code (“spatial_v2.caffemodel” and “temporal_v2.caffemodel” must be placed in the “TSSDDs\model” folder).
  3. TSTDDs features for all videos in the InfAR dataset.
  4. MATLAB code for generating the training and test sets.
  5. We release the indices of the splits for all five folds here.
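With the released split indices, results are typically reported as accuracy averaged over the five folds. The hypothetical helper below shows that loop; the `(train_idx, test_idx)` pair layout and the `fit_predict` callback are assumptions, since the format of the released index files is not described here.

```python
from statistics import mean


def cross_validate(labels, folds, fit_predict):
    """Average test accuracy over folds.

    labels: sequence of ground-truth labels indexed by video id.
    folds: list of (train_idx, test_idx) pairs (hypothetical layout).
    fit_predict: callable(train_idx, test_idx) -> predicted labels for test_idx.
    """
    accuracies = []
    for train_idx, test_idx in folds:
        # Guard against leakage between training and test videos.
        if set(train_idx) & set(test_idx):
            raise ValueError("train and test indices overlap")
        preds = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return mean(accuracies), accuracies


# Usage with an oracle predictor: every fold scores 1.0.
labels = ["fight", "hug", "jog", "jump"]
folds = [([0, 1], [2, 3]), ([2, 3], [0, 1])]
acc, per_fold = cross_validate(labels, folds, lambda tr, te: [labels[i] for i in te])
```

Keeping training and prediction behind one callback lets the same loop wrap any classifier, such as a linear SVM trained on encoded TSTDDs.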

TSTDDs demo code

Here, a MATLAB demo for TSTDDs extraction is provided.

Step 1: Improved Trajectory Extraction

Download our modified iDT feature code and compile it yourself: Improved Trajectories

Step 2: TVL1 Optical Flow Extraction

Download our dense flow code and compile it yourself: Dense Flow
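Dense-flow tools commonly store each TVL1 flow component as an 8-bit image by clipping it to a bound and mapping it linearly to 0..255. The sketch below shows that conversion and its approximate inverse, assuming a bound of 20 pixels; the bound and the function names are hypothetical, so check the settings of the dense flow code you compile.

```python
import numpy as np


def flow_to_uint8(flow, bound=20.0):
    """Clip a flow component to [-bound, bound] and map it linearly to 0..255."""
    clipped = np.clip(np.asarray(flow, dtype=np.float64), -bound, bound)
    return np.round(255.0 * (clipped + bound) / (2.0 * bound)).astype(np.uint8)


def uint8_to_flow(img, bound=20.0):
    """Approximate inverse: map 0..255 back to [-bound, bound] (quantized)."""
    return np.asarray(img, dtype=np.float64) * (2.0 * bound) / 255.0 - bound


# Usage: zero flow lands near mid-gray; values beyond the bound saturate.
out = flow_to_uint8(np.array([-25.0, -20.0, 0.0, 20.0]))  # [0, 0, 128, 255]
```

The bound trades range for precision: with a bound of 20, one gray level corresponds to about 0.16 pixels of displacement, and larger motions are clipped.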

Step 3: Matcaffe

Download the public Caffe toolbox. Our TDD code is compatible with the latest version of the parallel Caffe toolbox.

Note that you need to download the models in the new proto format:

“Spatial net model (v2)” and “Temporal net model (v2)”

Step 4: TSTDDs Extraction

Now you can run the MATLAB script “TSTDDs_main.m” to extract TSTDDs features.

If you find this work helpful, please consider citing our paper:

  title={Global Temporal Representation based CNNs for Infrared Action Recognition},
  author={Yang Liu and Zhaoyang Lu and Jing Li and Tao Yang and Chao Yao},
  journal={IEEE Signal Processing Letters},
  volume={25},
  number={6},
  pages={848--852},
  year={2018},
  doi={10.1109/LSP.2018.2823910}


Liu, Yang; Lu, Zhaoyang; Li, Jing; Yang, Tao; Yao, Chao. Global Temporal Representation based CNNs for Infrared Action Recognition. IEEE Signal Processing Letters, vol. 25, no. 6, pp. 848-852, June 2018. doi:10.1109/LSP.2018.2823910.

If you have any questions about this code, feel free to contact me (