Boyuan Jiang(姜博源)

2023

Dynamic Frame Interpolation in Wavelet Domain

Kong, Lingtong, Jiang, Boyuan, Luo, Donghao, Chu, Wenqing, Tai, Ying, Wang, Chengjie, and Yang, Jie

IEEE Transactions on Image Processing 2023

PDF Code

2022

ColorFormer: Image Colorization via Color Memory assisted Hybrid-attention Transformer

Ji, Xiaozhong*, Jiang, Boyuan*, Luo, Donghao, Tao, Guangpin, Chu, Wenqing, Xie, Zhifeng, Wang, Chengjie, and Tai, Ying

European Conference on Computer Vision (ECCV) 2022

Abs PDF

Automatic image colorization is a challenging task that attracts a lot of research interest. Previous methods employing deep neural networks have produced impressive results. However, the colorized images are still unsatisfactory and far from practical applications. The reason is that semantic consistency and color richness are two key elements ignored by existing methods. In this work, we propose an automatic image colorization method via a color memory assisted hybrid-attention transformer, namely ColorFormer. Our network consists of a transformer-based encoder and a color memory decoder. The core module of the encoder is our proposed global-local hybrid attention operation, which improves the ability to capture global receptive field dependencies. With its strong ability to model the contextual semantic information of grayscale images in different scenes, our network can produce semantically consistent colorization results. In the decoder, we design a color memory module that stores various semantic-color mappings for image-adaptive queries. The queried color priors are used as references to help the decoder produce more vivid and diverse results. Experimental results show that our method can generate more realistic and semantically matched color images compared with state-of-the-art methods. Moreover, owing to the proposed end-to-end architecture, the inference speed reaches 40 FPS on a V100 GPU, which meets the real-time requirement.

IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation

Kong, Lingtong*, Jiang, Boyuan*, Luo, Donghao, Chu, Wenqing, Huang, Feiyue, Tai, Ying, Wang, Chengjie, and Yang, Jie

Computer Vision and Pattern Recognition (CVPR) 2022

Abs PDF Code

Prevailing video frame interpolation algorithms, which generate intermediate frames from consecutive inputs, typically rely on complex model architectures with heavy parameters or large delay, which keeps them from diverse real-time applications. In this work, we devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesis. It first extracts pyramid features from the given inputs, and then refines the bilateral intermediate flow fields together with a powerful intermediate feature until the desired output is generated. The gradually refined intermediate feature not only facilitates intermediate flow estimation but also compensates for contextual details, so IFRNet does not need an additional synthesis or refinement module. To fully release its potential, we further propose a novel task-oriented optical flow distillation loss that focuses on learning the teacher knowledge useful for frame synthesis. Meanwhile, a new geometry consistency regularization term is imposed on the gradually refined intermediate features to better preserve the structure layout. Experiments on various benchmarks demonstrate the excellent performance and fast inference speed of the proposed approaches.

2021

Learning Comprehensive Motion Representation for Action Recognition

Wu, Mingyu*, Jiang, Boyuan*, Luo, Donghao, Yan, Junchi, Wang, Yabiao, Tai, Ying, Wang, Chengjie, Li, Jilin, Huang, Feiyue, and Yang, Xiaokang

AAAI Conference on Artificial Intelligence (AAAI) 2021

Abs HTML PDF Code

For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features because the same 2D convolution kernel is applied to each frame. Recent efforts attempt to capture motion information by establishing inter-frame connections but still suffer from a limited temporal receptive field or high latency. Moreover, feature enhancement in action recognition is often performed along only the channel or the spatial dimension. To address these issues, we first devise a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector. The channel gates generated by CME incorporate the information from all the other frames in the video. We further propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps. The intuition is that the background typically changes more slowly than the motion area. Both CME and SME have clear physical meaning in capturing action clues. By integrating the two modules into an off-the-shelf 2D network, we finally obtain a Comprehensive Motion Representation (CMR) learning method for action recognition, which achieves competitive performance on Something-Something V1 & V2 and Kinetics-400. On the temporal reasoning datasets Something-Something V1 and V2, our method outperforms the current state-of-the-art by 2.3% and 1.9%, respectively, when using 16 frames as input.
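As a rough illustration of the point-to-point similarity idea in the abstract above, the sketch below scores each spatial position by how much its feature vector changes between two adjacent frames. It is not the paper's SME module: the cosine similarity measure, the `1 - similarity` score, and the names `cosine` and `motion_saliency` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two channel vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def motion_saliency(frame_a, frame_b):
    """Per-position score that is high where adjacent feature maps differ.

    Each frame is a list of spatial positions, and each position is a list of
    channel values. The score 1 - cosine similarity is an assumed choice, not
    the formulation used in the paper.
    """
    return [1.0 - cosine(u, v) for u, v in zip(frame_a, frame_b)]


# Two positions with two channels each: the first is static, the second changes.
frame_t = [[1.0, 0.0], [1.0, 0.0]]
frame_t1 = [[1.0, 0.0], [0.0, 1.0]]
print(motion_saliency(frame_t, frame_t1))  # [0.0, 1.0]
```

Positions whose features barely change (background, under the abstract's intuition) receive scores near zero, while positions that change between frames receive higher scores.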

Multi-Level Adaptive Region of Interest and Graph Learning for Facial Action Unit Recognition

Yan, Jingwei*, Jiang, Boyuan*, Wang, Jingjing, Li, Qiang, Wang, Chunmao, and Pu, Shiliang

International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021

Abs HTML PDF

In facial action unit (AU) recognition tasks, regional feature learning and AU relation modeling are two effective aspects that are worth exploring. However, the limited representation capacity of regional features makes it difficult for relation models to embed AU relationship knowledge. In this paper, we propose a novel multi-level adaptive ROI and graph learning (MARGL) framework to tackle this problem. Specifically, an adaptive ROI learning module is designed to automatically adjust the location and size of the predefined AU regions. Meanwhile, besides the relationships between AUs, there is strong relevance between regional features across multiple levels of the backbone network, as level-wise features focus on different aspects of representation. In order to incorporate the intra-level AU relation and inter-level AU regional relevance simultaneously, a multi-level AU relation graph is constructed and graph convolution is performed to further enhance the AU regional features of each level. Experiments on BP4D and DISFA demonstrate that the proposed MARGL significantly outperforms the previous state-of-the-art methods.

2020

Selective transfer with reinforced transfer network for partial domain adaptation

Chen, Zhihong, Chen, Chao, Cheng, Zhaowei, Jiang, Boyuan, Fang, Ke, and Jin, Xinyu

Computer Vision and Pattern Recognition (CVPR) 2020

Abs HTML PDF

One crucial aspect of partial domain adaptation (PDA) is how to select the relevant source samples in the shared classes for knowledge transfer. Previous PDA methods tackle this problem by re-weighting the source samples based on their high-level information (deep features). However, because of the domain shift between the source and target domains, using only the deep features for sample selection is defective. We argue that it is more reasonable to additionally exploit the pixel-level information for the PDA problem, as the appearance difference between outlier source classes and target classes is significantly large. In this paper, we propose a reinforced transfer network (RTNet), which utilizes both high-level and pixel-level information for the PDA problem. Our RTNet is composed of a reinforced data selector (RDS) based on reinforcement learning (RL), which filters out the outlier source samples, and a domain adaptation model, which minimizes the domain discrepancy in the shared label space. Specifically, in the RDS, we design a novel reward based on the reconstruction errors of selected source samples on the target generator, which introduces the pixel-level information to guide the learning of the RDS. Besides, we develop a state containing high-level information, which is used by the RDS for sample selection. The proposed RDS is a general module that can be easily integrated into existing DA models to make them fit the PDA situation. Extensive experiments indicate that RTNet can achieve state-of-the-art performance for PDA tasks on several benchmark datasets.

2019

STM: Spatiotemporal and motion encoding for action recognition

Jiang, Boyuan, Wang, MengMeng, Gan, Weihao, Wu, Wei, and Yan, Junjie

International Conference on Computer Vision (ICCV) 2019

Abs HTML PDF

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion features. In this work, we aim to efficiently encode these two features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to present the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that introduces very limited extra computation cost. Extensive experiments demonstrate that the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51) with the help of encoding spatiotemporal and motion features together.

Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation

Chen, Chao*, Chen, Zhihong*, Jiang, Boyuan, and Jin, Xinyu

AAAI Conference on Artificial Intelligence (AAAI) 2019

Abs HTML PDF Code

Recently, considerable effort has been devoted to deep domain adaptation in the computer vision and machine learning communities. However, most existing work concentrates only on learning a shared feature representation by minimizing the distribution discrepancy across different domains. Because all domain alignment approaches can only reduce, but not remove, the domain shift, target domain samples distributed near the edge of the clusters, or far from their corresponding class centers, are easily misclassified by the hyperplane learned from the source domain. To alleviate this issue, we propose joint domain alignment and discriminative feature learning, which could benefit both domain alignment and final classification. Specifically, an instance-based discriminative feature learning method and a center-based discriminative feature learning method are proposed, both of which guarantee domain-invariant features with better intra-class compactness and inter-class separability. Extensive experiments show that learning the discriminative features in the shared feature space can significantly boost the performance of deep domain adaptation methods.

Optimizing extreme learning machine via generalized Hebbian learning and intrinsic plasticity learning

Chen, Chao, Jin, Xinyu, Jiang, Boyuan, and Li, Lanjuan

Neural Processing Letters 2019

Abs HTML PDF

The traditional extreme learning machine (ELM) has random weights between the input layer and the hidden layer. This kind of random feature mapping brings a non-discriminative feature space and unstable classification accuracy, which greatly limits the performance of ELM networks. Therefore, to obtain better input weights, two biologically inspired, unsupervised learning methods were introduced to optimize the traditional ELM networks, namely the generalized Hebbian algorithm (GHA) and intrinsic plasticity learning (IPL). The GHA is able to extract the principal components of input data of arbitrary size, while the IPL tunes the probability density of the neuron’s output towards a desired distribution such as exponential distribution or weber distribution, thereby maximizing the network's information transmission. With the incorporation of the GHA and IPL approaches, the optimized ELM networks generate a discriminative feature space and preserve many more characteristics of the input data, thereby achieving better task performance. Based on the above two unsupervised methods, a simple yet effective hierarchical feature mapping extreme learning machine (HFMELM) is further proposed. With almost no information loss in the layer-wise feature mapping process, the HFMELM is able to learn a high-level representation of the input data. To evaluate the effectiveness of the proposed methods, extensive experiments on several datasets are presented. The results show that the proposed methods significantly outperform the traditional ELM networks.

Joint domain matching and classification for cross-domain adaptation via ELM

Chen, Chao*, Jiang, Boyuan*, Cheng, Zhaowei, and Jin, Xinyu

Neurocomputing 2019

Abs HTML PDF Code

In recent years, domain adaptation has attracted much attention in the machine learning community. In this paper, we mainly focus on the tasks of Joint Domain Matching and Classification (JDMC) under the framework of extreme learning machine (ELM). Specifically, our JDMC method is formulated by optimizing both the output-adapted transformation and the cross-domain classifier, which allows us to simultaneously (1) align the source domain and target domain in the feature space with correlation alignment, (2) minimize the discrepancy between the source and target domain, measured in terms of both marginal and conditional probability distribution in the mapped feature space, and (3) select informative features that behave similarly in both domains for knowledge transfer by imposing the ℓ2,1-norm on the output weights of ELM. In this respect, the proposed JDMC integrates feature matching, feature selection and classifier design in a unified framework. Besides, an efficient alternating optimization strategy is exploited to solve the joint learning model. To evaluate the effectiveness of the proposed method, extensive experiments on several commonly used domain adaptation datasets are presented. The results show that the proposed method significantly outperforms the non-transfer ELM networks and consistently outperforms several state-of-the-art domain adaptation methods.
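For readers unfamiliar with correlation alignment, item (1) in the abstract above, the sketch below computes a commonly used form of the discrepancy: the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1/(4d²). This is a minimal illustration, not the JDMC objective; the function names and the toy data are assumptions made for the example.

```python
def covariance(samples):
    """Unbiased covariance matrix of a list of d-dimensional feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_distance(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cov_s, cov_t = covariance(source), covariance(target)
    squared = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared / (4 * d * d)


# Toy features: the source varies along the first axis, the target along the second.
source = [[0.0, 0.0], [2.0, 0.0]]
target = [[0.0, 0.0], [0.0, 2.0]]
print(coral_distance(source, target))  # 0.5
print(coral_distance(source, source))  # 0.0
```

The distance is zero when both domains share the same second-order statistics and grows as their covariances diverge.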

2018

Parameter transfer extreme learning machine based on projective model

Chen, Chao*, Jiang, Boyuan*, and Jin, Xinyu

International Joint Conference on Neural Networks (IJCNN) 2018

Abs HTML PDF Code

In recent years, transfer learning has attracted much attention in the machine learning community. In this paper, we mainly focus on the tasks of parameter transfer under the framework of extreme learning machine (ELM). Unlike the existing parameter transfer approaches, which incorporate the source model information into the target by regularizing the difference between the source and target domain parameters, an intuitively appealing projective model is proposed to bridge the source and target model parameters. Specifically, we formulate the parameter transfer in the ELM networks by means of parameter projection, and train the model by optimizing the projection matrix and classifier parameters jointly. Furthermore, the ℓ2,1-norm structured sparsity penalty is imposed on the source domain parameters, which encourages joint feature selection and parameter transfer. To evaluate the effectiveness of the proposed method, comprehensive experiments on several commonly used domain adaptation datasets are presented. The results show that the proposed method significantly outperforms the non-transfer ELM networks and other classical transfer learning methods.
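The structured sparsity penalty mentioned in the abstract above can be illustrated with a few lines of code. The sketch computes the standard ℓ2,1-norm of a matrix, that is, the sum of the Euclidean norms of its rows. Treating each row as the weights attached to one feature is an assumption made for the example, as are the function name `l21_norm` and the toy matrix.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the rows of a matrix."""
    return sum(math.sqrt(sum(value * value for value in row)) for row in matrix)


# Three rows (assumed here to be one row per feature), two columns each.
weights = [
    [3.0, 4.0],  # row norm 5.0
    [0.0, 0.0],  # row norm 0.0: this row contributes nothing to the penalty
    [1.0, 0.0],  # row norm 1.0
]
print(l21_norm(weights))  # 6.0
```

Because the penalty sums row norms rather than squared entries, minimizing it tends to push entire rows to zero, which is why it is commonly used for feature selection.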

Unsupervised domain adaptation with target reconstruction and label confusion in the common subspace

Jiang, Boyuan, Chen, Chao, and Jin, Xinyu

Neural Computing and Applications 2018

Abs HTML PDF Code

Deep neural networks can learn powerful and discriminative representations from a large number of labeled samples. However, it is typically costly to collect and annotate large-scale datasets, which limits the applications of deep learning in many real-world scenarios. Domain adaptation, as an option to compensate for the lack of labeled data, has attracted much attention in the machine learning community. Although a large number of methods for domain adaptation have been presented, many of them simply focus on matching the distribution of the source and target feature representations, which may fail to encode useful information about the target domain. In order to learn invariant and discriminative representations for both domains, we propose a Cross-Domain Minimization with Deep Autoencoder method for unsupervised domain adaptation, which simultaneously learns label prediction on the source domain and input reconstruction on the target domain using shared feature representations aligned with correlation alignment in a unified framework. Furthermore, inspired by adversarial training and the cluster assumption, a task-specific class label discriminator is incorporated to confuse the predicted target class labels with samples drawn from a categorical distribution, which can be regarded as entropy minimization regularization. Extensive empirical results demonstrate the superiority of our approach over the state-of-the-art unsupervised adaptation methods on both visual and non-visual cross-domain adaptation tasks.
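To make the entropy minimization reading in the abstract above concrete, the sketch below computes the average Shannon entropy of a batch of predicted class distributions. It illustrates only the quantity that such a regularizer would reduce, not the paper's discriminator-based training procedure; the function names and toy predictions are assumptions made for the example.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch):
    """Average entropy over a batch of predicted distributions."""
    return sum(prediction_entropy(probs) for probs in batch) / len(batch)


# A confident prediction has low entropy; an uncertain one has high entropy.
confident = [0.98, 0.01, 0.01]
uncertain = [1 / 3, 1 / 3, 1 / 3]
print(prediction_entropy(confident) < prediction_entropy(uncertain))  # True
print(mean_entropy([confident, uncertain]))
```

Lower average entropy on unlabeled target samples means the classifier commits to one class per sample, which is consistent with the cluster assumption that decision boundaries should avoid dense regions of the data.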