Label Incorporated Graph Neural Networks for Text Classification

Yuan Xin, Linli Xu, Junliang Guo, Jiquan Li, Xin Sheng, Yuanyuan Zhou

Auto-TLDR; Graph Neural Networks for Semi-supervised Text Classification

Graph Neural Networks (GNNs) have achieved great success on graph-structured data, and their applications to traditional tasks such as natural language processing and semi-supervised text classification have been extensively explored in recent years. However, previous works consider only the text information when building the graph, ignoring heterogeneous information such as labels. In this paper, we incorporate the label information when building the graph by adding text-label-text paths, through which supervision propagates through the graph more directly. Specifically, we treat labels as nodes in a graph that also contains text and word nodes, and connect each label with the texts belonging to it. Through graph convolutions, label embeddings are jointly learned with text embeddings in the same latent semantic space. The newly incorporated label nodes facilitate learning more accurate text embeddings by introducing the label information, and thus benefit downstream text classification tasks. Extensive results on several benchmark datasets show that the proposed framework outperforms baseline methods by a significant margin.
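
To make the text-label-text idea concrete, here is a minimal NumPy sketch of such a heterogeneous graph and one graph-convolution step. The node layout, edge lists and dimensions are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy heterogeneous graph: 4 text nodes, 3 word nodes, 2 label nodes.
# Indices 0-3: texts, 4-6: words, 7-8: labels (layout is illustrative).
n = 9
A = np.zeros((n, n))

# Text-word edges (e.g., TF-IDF links, as in TextGCN-style graphs).
for t, w in [(0, 4), (0, 5), (1, 5), (2, 6), (3, 4), (3, 6)]:
    A[t, w] = A[w, t] = 1.0

# Text-label edges: connect each *training* text to its label node,
# creating the text-label-text paths described in the abstract.
for t, l in [(0, 7), (1, 7), (2, 8)]:   # text 3 is unlabeled (test)
    A[t, l] = A[l, t] = 1.0

# Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(n)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

# One GCN layer, H' = ReLU(A_norm H W), starting from one-hot features.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n, 16))
H1 = np.maximum(A_norm @ np.eye(n) @ W, 0.0)

# Rows 7-8 are label embeddings, learned in the same latent space as
# the text embeddings; supervision flows along text-label-text paths.
print(H1.shape)  # (9, 16)
```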

Similar papers

Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network

Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin

Auto-TLDR; Semantically Extended Graph Convolutional Network for Zero-shot Text Classification

As a challenging task in Natural Language Processing (NLP), zero-shot text classification has attracted increasing attention recently. It aims to detect classes that the model has never seen in the training set. To this end, a feasible approach is to construct connections between the seen and unseen classes by semantic extension and classify the unseen classes through information propagation over these connections. Although many related zero-shot text classification methods have been explored, how to realize semantic extension properly and propagate information effectively remains far from solved. In this paper, we propose a novel zero-shot text classification method called Semantically Extended Graph Convolutional Network (SEGCN). In the proposed method, semantic category knowledge from ConceptNet is utilized for semantic extension, linking seen classes to unseen classes and constructing a graph of all classes. We then build upon the Graph Convolutional Network (GCN) to predict a textual classifier for each category, which transfers category knowledge via convolution operators on the constructed graph and is trained in a semi-supervised manner using samples of the seen classes. Experimental results on the DBpedia and 20newsgroups datasets show that our method outperforms state-of-the-art zero-shot text classification methods.

GCNs-Based Context-Aware Short Text Similarity Model

Xiaoqi Sun

Auto-TLDR; Context-Aware Graph Convolutional Network for Text Similarity

Semantic textual similarity is a fundamental task in text mining and natural language processing (NLP) with profound research value. The essential step for text similarity is text representation learning. Recently, researchers have explored graph convolutional network (GCN) techniques for text representation, since GCNs handle complex structures well and preserve syntactic information. However, current GCN models are usually limited to very shallow layers due to the vanishing gradient problem, and thus cannot capture non-local dependency information in sentences. In this paper, we propose a GCNs-based context-aware (GCSTS) model that applies iterated GCN blocks to train deeper GCNs. Recurrently employing the same GCN block prevents over-fitting and provides a broad effective input width. Combined with dense connections, GCSTS can be trained more deeply. Besides, we use dynamic graph structures within the block, which further extend the receptive field of each vertex in the graph, learning better sentence representations. Experiments show that our model outperforms existing models on several text similarity datasets, and also verify that GCN-based text representation models can be trained deeper than the usual two or three layers.

Reinforcement Learning with Dual Attention Guided Graph Convolution for Relation Extraction

Zhixin Li, Yaru Sun, Suqin Tang, Canlong Zhang, Huifang Ma

Auto-TLDR; Dual Attention Graph Convolutional Network for Relation Extraction

To better learn the dependency relationships between nodes, we address the relation extraction task by capturing rich contextual dependencies with an attention mechanism, and by using distributional reinforcement learning to generate an optimal relation information representation. The method, called Dual Attention Graph Convolutional Network (DAGCN), adaptively integrates local features with their global dependencies. Specifically, we append two types of attention modules on top of a GCN, which model semantic interdependencies in the spatial and relational dimensions respectively. The position attention module selectively aggregates the feature at each position by a weighted sum of the features at all positions of the nodes' internal features. Meanwhile, the relation attention module selectively emphasizes interdependent node relations by integrating associated features among all nodes. We sum the outputs of the two attention modules and use reinforcement learning to predict the classification of node relationships, further improving the feature representation and contributing to more precise extraction results. The results on the TACRED and SemEval datasets show that the model can obtain more useful information for relation extraction tasks and achieves better performance on various evaluation metrics.
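
As a rough illustration of the position-attention step described above (a weighted sum of the features at all positions), here is a minimal NumPy sketch; the paper's learned projections, relational branch and reinforcement-learning stage are not reproduced.

```python
import numpy as np

def position_attention(H):
    """Weighted sum of the features at all positions: a generic
    self-attention step standing in for the position attention module
    (the paper's exact projections and gating are omitted)."""
    scores = H @ H.T                          # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)   # row-wise softmax
    return attn @ H                           # aggregate over positions

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))                   # 5 nodes, 8-d features
print(position_attention(H).shape)            # (5, 8)
```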

PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks

Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, Rong Xiao

Auto-TLDR; PICK: A Graph Learning Framework for Key Information Extraction from Documents

Computer vision with state-of-the-art deep learning models has recently achieved huge success in the field of Optical Character Recognition (OCR), including text detection and recognition tasks. However, Key Information Extraction (KIE) from documents, the downstream task of OCR with a large number of real-world use scenarios, remains a challenge, because documents not only have textual features extracted by OCR systems but also semantic visual features that are not fully exploited yet play a critical role in KIE. Too little work has been devoted to efficiently making full use of both the textual and visual features of documents. In this paper, we introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE by combining graph learning with the graph convolution operation, yielding a richer semantic representation containing the textual and visual features and global layout without ambiguity. Extensive experiments on real-world datasets show that our method outperforms baseline methods by significant margins.

Edge-Aware Graph Attention Network for Ratio of Edge-User Estimation in Mobile Networks

Jiehui Deng, Sheng Wan, Xiang Wang, Enmei Tu, Xiaolin Huang, Jie Yang, Chen Gong

Auto-TLDR; EAGAT: Edge-Aware Graph Attention Network for Automatic REU Estimation in Mobile Networks

Estimating the Ratio of Edge-Users (REU) is an important issue in mobile networks, as it guides the subsequent adjustment of loads in different cells. However, existing approaches usually determine the REU manually, which is experience-dependent and labor-intensive, so the estimated REU might be imprecise. Considering the inherent graph structure of mobile networks, in this paper we utilize a graph-based deep learning method for automatic REU estimation, where the practical cells are deemed nodes and the load switchings among them constitute edges. Concretely, the Graph Attention Network (GAT) is employed as the backbone of our method due to its impressive generalizability in dealing with networked data. Nevertheless, a conventional GAT cannot make full use of the information in mobile networks, since it only incorporates node features to infer the pairwise importance and conduct graph convolutions, while the edge features that are actually critical in our problem are disregarded. To address this issue, we propose an Edge-Aware Graph Attention Network (EAGAT), which is able to fuse node features and edge features for REU estimation. Extensive experimental results on two real-world mobile network datasets demonstrate the superiority of our EAGAT approach over several state-of-the-art methods.

Efficient Sentence Embedding Via Semantic Subspace Analysis

Bin Wang, Fenxiao Chen, Yun Cheng Wang, C.-C. Jay Kuo

Auto-TLDR; S3E: Semantic Subspace Sentence Embedding

A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work. Given that word embeddings can capture semantic relationships and that semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing the semantic subspaces of a sentence's constituent words. Specifically, we construct a sentence model from two aspects. First, we represent words that lie in the same semantic group using an intra-group descriptor. Second, we characterize the interaction between multiple semantic groups with an inter-group descriptor. The proposed S3E method is evaluated on both textual similarity tasks and supervised tasks. Experimental results show that it offers performance comparable to or better than the state-of-the-art. The complexity of our S3E method is also much lower than that of other parameterized models.
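
The two descriptors can be pictured with a small sketch: assign words to their nearest semantic groups, compute VLAD-like intra-group residuals, and use a Gram matrix as a stand-in for the inter-group interaction. This is a loose reading of the abstract, assuming pre-computed group centroids; the paper's exact weighting scheme is not reproduced.

```python
import numpy as np

def s3e_like_embedding(word_vecs, centroids):
    """VLAD-like intra-group residuals plus a Gram-matrix stand-in for
    the inter-group interaction; the semantic group centroids are
    assumed pre-computed (e.g., by k-means on word embeddings)."""
    # Assign each word to its nearest semantic group.
    d2 = ((word_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)

    K, D = centroids.shape
    intra = np.zeros((K, D))
    for k in range(K):                 # intra-group descriptor
        members = word_vecs[assign == k]
        if len(members):
            intra[k] = (members - centroids[k]).mean(axis=0)

    inter = intra @ intra.T            # inter-group interaction
    return np.concatenate([intra.ravel(), inter.ravel()])

rng = np.random.default_rng(2)
words = rng.normal(size=(6, 4))        # 6 word vectors, 4-d
groups = rng.normal(size=(3, 4))       # 3 semantic group centroids
print(s3e_like_embedding(words, groups).shape)  # (3*4 + 3*3,) = (21,)
```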

A General Model for Learning Node and Graph Representations Jointly

Chaofan Chen

Auto-TLDR; Joint Community Detection/Dynamic Routing for Graph Classification

This paper focuses on two fundamental graph recognition tasks: node classification and graph classification. Existing methods usually learn the node and graph representations for these two tasks separately, and ignore modeling the relations between local and global structures. In this paper, we propose a general approach to learn the local and global features collaboratively: (1) to characterize the correlation between nodes and communities (sets of nodes), we employ joint community detection/dynamic routing modules to first generate clustering assignment matrices and then utilize these matrices to cluster nodes, capturing the global information of graphs (locally relevant graph representations). Inspired by the success of spectral clustering, we minimize the RatioCut loss to help optimize the learned assignment matrices. (2) We maximize the mutual information between local and global representations to help learn globally relevant node representations. Experimental results on a variety of node and graph classification benchmarks show that our model achieves superior performance over state-of-the-art approaches.

AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network

Yuhang Zhang, Hongshuai Ren, Jiexia Ye, Xitong Gao, Yang Wang, Kejiang Ye, Cheng-Zhong Xu

Auto-TLDR; Adjacency Matrix for Graph Convolutional Network in Non-Euclidean Space

The Graph Convolutional Network (GCN) is adopted to tackle the problem of convolution operations in non-Euclidean space. Although previous works on GCNs have made some progress, one of their limitations is that the input Adjacency Matrix (AM) is designed manually and requires domain knowledge, which is cumbersome, tedious and error-prone. In addition, entries of this fixed Adjacency Matrix are generally designed as binary values (i.e., ones and zeros), which cannot reflect more complex relationships between nodes. However, many applications require a weighted and dynamic Adjacency Matrix instead of an unweighted and fixed one, and few works have focused on designing a more flexible Adjacency Matrix. In this paper, we propose an end-to-end algorithm to improve GCN performance by focusing on the Adjacency Matrix. We first provide a calculation method, called node information entropy, to update the matrix. Then, we analyze the search strategy in a continuous space and introduce the Deep Deterministic Policy Gradient (DDPG) method to overcome the demerits of discrete space search. Finally, we integrate the GCN and reinforcement learning into an end-to-end framework. Our method can automatically define the adjacency matrix without artificial knowledge. At the same time, the proposed approach can deal with any size of matrix and provide a better value for the network. Four popular datasets are selected to evaluate the capability of our algorithm. The method in this paper achieves state-of-the-art performance on the Cora and Pubmed datasets, with accuracies of 84.6% and 81.6%, respectively.

Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

Yue Wang, Zhuo Xu, Yao Wan, Lu Bai, Lixin Cui, Qian Zhao, Edwin Hancock, Philip Yu

Auto-TLDR; Joint-Event-extraction from Unstructured corpora using Structural Information Network

Joint-event-extraction, which extracts structural information (i.e., entities or triggers of events) from unstructured real-world corpora, has attracted more and more research attention in natural language processing. Most existing works do not fully address the sparse co-occurrence relationships between entities and triggers, which exacerbates the error-propagation problem and may degrade extraction performance. To mitigate this issue, we first define joint-event-extraction as a sequence-to-sequence labeling task with a tag set composed of the tags of triggers and entities. Then, to incorporate the missing information in the aforementioned co-occurrence relationships, we propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of either triggers or entities based on the type distribution of each other. Moreover, since the connected entities and triggers naturally form a heterogeneous information network (HIN), we leverage the latent pattern along meta-paths for a given corpus to further improve the performance of our proposed method. To verify the effectiveness of our proposed method, we conduct extensive experiments on real-world datasets and compare our method with state-of-the-art methods. Empirical results and analysis show that our approach outperforms the state-of-the-art methods in both entity and trigger extraction.

Region and Relations Based Multi Attention Network for Graph Classification

Manasvi Aggarwal, M. Narasimha Murty

Auto-TLDR; R2POOL: A Graph Pooling Layer for Non-euclidean Structures

Graphs are non-Euclidean structures that can represent many kinds of relational data efficiently. Many studies have proposed convolution and pooling operators on the non-Euclidean domain. Graph convolution operators have shown astounding performance on various tasks such as node representation and classification. For graph classification, different pooling techniques have been introduced, but none of them considers both the neighborhood of a node and the node's long-range dependencies. In this paper, we propose a novel graph pooling layer, R2POOL, which balances the structural information around a node with its dependencies on far-away nodes. Further, we propose a new training strategy to learn coarse-to-fine representations. We add supervision only at intermediate levels to generate predictions using only intermediate-level features. For this, we propose the concept of an alignment score. Moreover, each layer's prediction is controlled by our proposed branch training strategy. This complete training helps in learning dominant class features at each layer for representing graphs. We call the combined model R2MAN. Experiments show that R2MAN has the potential to improve the performance of graph classification on various datasets.

PIN: A Novel Parallel Interactive Network for Spoken Language Understanding

Peilin Zhou, Zhiqi Huang, Fenglin Liu, Yuexian Zou

Auto-TLDR; Parallel Interactive Network for Spoken Language Understanding

Spoken Language Understanding (SLU) is an essential part of a spoken dialogue system and typically consists of intent detection (ID) and slot filling (SF) tasks. Recently, methods based on recurrent neural networks (RNNs) have achieved the state-of-the-art for SLU. In existing RNN-based approaches, ID and SF are often jointly modeled to utilize the correlation between them. However, supporting bidirectional and explicit information exchange between ID and SF to obtain better performance has so far not been well studied. In addition, few studies attempt to capture local context information to enhance the performance of SF. Motivated by these findings, in this paper the Parallel Interactive Network (PIN) is proposed to model the mutual guidance between ID and SF. Specifically, given an utterance, a Gaussian self-attentive encoder is introduced to generate a context-aware feature embedding of the utterance that is able to capture local context information. Taking this feature embedding, a Slot2Intent module and an Intent2Slot module are developed to capture the bidirectional information flow between the ID and SF tasks. Finally, a cooperation mechanism is constructed to fuse the information obtained from the Slot2Intent and Intent2Slot modules to further reduce prediction bias. Experiments on two benchmark datasets, SNIPS and ATIS, demonstrate the effectiveness of our approach, which achieves results competitive with state-of-the-art models. More encouragingly, when using utterance feature embeddings generated by the pre-trained language model BERT, our method achieves the state-of-the-art among all comparison approaches.

Sketch-SNet: Deeper Subdivision of Temporal Cues for Sketch Recognition

Yizhou Tan, Lan Yang, Honggang Zhang

Auto-TLDR; Sketch Recognition using Invariable Structural Feature and Drawing Habits Feature

Sketch recognition is a central task in sketch-related research. Different from natural images, the sparse pixel distribution of a sketch destroys visual texture, which encourages researchers to explore the temporal information of sketches. With the release of million-scale datasets, we explore the invariable structure of a sketch and the specific order of its strokes. Prior works based on Recurrent Neural Networks (RNNs) tend to output different features when stroke orders change. We instead adopt a novel approach, employing a Graph Convolutional Network (GCN) to extract a structural feature that is invariant to stroke order. Compared to the traditional comprehension of sketches, we further split the temporal information of a sketch into two types of features, an invariable structural feature (ISF) and a drawing habits feature (DHF), which aim to reduce confusion in the temporal information. We propose a two-branch GCN-RNN network, termed Sketch-SNet, to extract the two types of features respectively. The GCN branch is encouraged to extract the ISF by receiving variously shuffled strokes of an input sketch, while the RNN branch takes the original input to extract the DHF by learning the pattern of the strokes' order. Meanwhile, we introduce semantic information to generate soft labels owing to the high abstractness of sketches. Extensive experiments on the Quick-Draw dataset demonstrate that our further subdivision of temporal information improves sketch recognition performance, surpassing the state-of-the-art by a large margin.

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Yaning Li, Liu Yang

Auto-TLDR; Fully Associative Network for Fully Exploiting Correlation Information in Multi-Label Classification

Recent research demonstrates that correlation modeling plays a key role in high-performance multi-label classification methods. However, existing methods do not take full advantage of correlation information, especially correlations in the feature and label spaces of each image, which limits the performance of correlation-based multi-label classification methods. To consider more correlations, in this study a Fully Associative Network (FAN) is proposed for fully exploiting correlation information, involving both visual feature and label correlations. Specifically, FAN introduces robust covariance pooling to summarize convolutional features as a global image representation for capturing feature correlation in the multi-label task. Moreover, it constructs an effective label correlation matrix based on a re-weighting scheme, which is fed into a graph convolutional network for capturing label correlation. Then, the correlation between the covariance representations (i.e., feature correlation) and the outputs of the GCN (i.e., label correlation) is modeled for final prediction. Experimental results on two datasets illustrate the effectiveness and efficiency of the proposed FAN compared with state-of-the-art methods.

Constructing Geographic and Long-term Temporal Graph for Traffic Forecasting

Yiwen Sun, Yulu Wang, Kun Fu, Zheng Wang, Changshui Zhang, Jieping Ye

Auto-TLDR; GLT-GCRNN: Geographic and Long-term Temporal Graph Convolutional Recurrent Neural Network for Traffic Forecasting

Traffic forecasting influences various intelligent transportation system (ITS) services and is of great significance for user experience as well as urban traffic control. It is challenging because the road network contains complex and time-varying spatial-temporal dependencies. Recently, deep learning based methods have achieved promising results by adopting graph convolutional networks (GCNs) to extract spatial correlations and recurrent neural networks (RNNs) to capture temporal dependencies. However, existing methods often construct the graph based only on road network connectivity, which limits the modeled interaction between roads. In this work, we propose the Geographic and Long-term Temporal Graph Convolutional Recurrent Neural Network (GLT-GCRNN), a novel framework for traffic forecasting that learns the rich interactions between roads sharing similar geographic or long-term temporal patterns. Extensive experiments on a real-world traffic state dataset validate the effectiveness of our method, showing that GLT-GCRNN outperforms state-of-the-art methods in terms of different metrics.

On the Global Self-attention Mechanism for Graph Convolutional Networks

Chen Wang, Deng Chengyuan

Auto-TLDR; Global Self-Attention Mechanism for Graph Convolutional Networks

Applying the Global Self-Attention (GSA) mechanism over features has achieved remarkable success in Convolutional Neural Networks (CNNs). However, it is not clear whether Graph Convolutional Networks (GCNs) can similarly benefit from such a technique. In this paper, inspired by the similarity between CNNs and GCNs, we study the impact of the Global Self-Attention mechanism on GCNs. We find that, consistent with intuition, the GSA mechanism allows GCNs to capture feature-based vertex relations regardless of edge connections; as a result, the GSA mechanism can introduce extra expressive power to GCNs. Furthermore, we analyze the impact of the GSA mechanism on the issues of overfitting and over-smoothing. We prove that the GSA mechanism can alleviate both overfitting and over-smoothing, based on some recent technical developments. Experiments on multiple benchmark datasets illustrate both superior expressive power and less significant overfitting and over-smoothing problems for GSA-augmented GCNs, corroborating the intuitions and the theoretical results.

Multi-Graph Convolutional Network for Relationship-Driven Stock Movement Prediction

Jiexia Ye, Juanjuan Zhao, Kejiang Ye, Cheng-Zhong Xu

Auto-TLDR; Multi-GCGRU: A Deep Learning Framework for Stock Price Prediction with Cross Effect

Stock price movement prediction is commonly accepted as a very challenging task due to the volatile nature of financial markets. Previous works typically predict a stock's price based mainly on its own information, neglecting the cross effect among the involved stocks. However, it is well known that an individual stock price is correlated with the prices of other stocks in complex ways. To take the cross effect into consideration, we propose a deep learning framework, called Multi-GCGRU, which combines a graph convolutional network (GCN) and gated recurrent units (GRU) to predict stock movement. Specifically, we first encode multiple relationships among stocks into graphs based on financial domain knowledge and utilize the GCN to extract the cross effect from these pre-defined graphs. The cross-correlation features produced by the GCN are concatenated with historical records and fed into the GRU to model the temporal patterns in stock prices. To further remove reliance on prior knowledge, we explore an adaptive stock graph learned automatically from data. Experiments on two stock indexes in the Chinese market show that our model outperforms other baselines. Note that our model can readily incorporate additional pre-defined stock relationships, and it can also learn a data-driven relationship without any domain knowledge.
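
The GCN-then-GRU pipeline can be sketched as follows in NumPy; a plain RNN cell stands in for the GRU, and the graph, weights and dimensions are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, F, T, H = 6, 4, 5, 8        # stocks, features, time steps, hidden size

A = (rng.random((N, N)) < 0.4).astype(float)   # illustrative stock graph
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)
A_norm = A / A.sum(axis=1, keepdims=True)      # row-normalized adjacency

Wg = rng.normal(scale=0.1, size=(F, F))        # GCN weight
Wx = rng.normal(scale=0.1, size=(2 * F, H))    # recurrent input weight
Wh = rng.normal(scale=0.1, size=(H, H))        # recurrent hidden weight

X = rng.normal(size=(T, N, F))                 # historical records
h = np.zeros((N, H))
for t in range(T):
    g = np.tanh(A_norm @ X[t] @ Wg)            # cross effect via GCN
    z = np.concatenate([g, X[t]], axis=1)      # concat with raw history
    h = np.tanh(z @ Wx + h @ Wh)               # plain RNN cell standing
                                               # in for the paper's GRU
print(h.shape)                                 # per-stock state, (6, 8)
```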

Revisiting Graph Neural Networks: Graph Filtering Perspective

Hoang Nguyen-Thai, Takanori Maehara, Tsuyoshi Murata

Auto-TLDR; Two-Layers Graph Convolutional Network with Graph Filters Neural Network

In this work, we develop quantitative results on the learnability of a two-layer Graph Convolutional Network (GCN). Instead of analyzing GCNs under some class of functions, our approach provides a quantitative gap between a two-layer GCN and a two-layer MLP model. From the graph signal processing perspective, we provide useful insights into some flaws of graph neural networks for vertex classification. We empirically demonstrate a few cases in which GCNs and other state-of-the-art models cannot learn even when the true vertex features are extremely low-dimensional. To demonstrate our theoretical findings and propose a solution to the aforementioned adversarial cases, we build a proof-of-concept graph neural network with different filters, named the Graph Filters Neural Network (gfNN).
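
From the graph signal processing perspective the abstract adopts, a GCN layer acts as a fixed spectral filter on vertex signals. Here is a minimal NumPy sketch of filtering a signal in the graph Fourier domain, with an illustrative low-pass response; the paper derives the exact filter, which this does not reproduce.

```python
import numpy as np

# Small undirected graph and its symmetric normalized Laplacian.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))

# Graph Fourier basis: eigenvectors of L; eigenvalues are frequencies.
lam, U = np.linalg.eigh(L)

x = np.random.default_rng(4).normal(size=4)   # a signal on the vertices
x_hat = U.T @ x                               # graph Fourier transform

# Filter in the spectral domain with an illustrative low-pass response
# h(lam) = 1 - lam/2 (an assumption, not the paper's derived response).
h = 1.0 - lam / 2.0
x_filtered = U @ (h * x_hat)
print(x_filtered)
```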

What Nodes Vote To? Graph Classification without Readout Phase

Yuxing Tian, Zheng Liu, Weiding Liu, Zeyu Zhang, Yanwen Qu

Auto-TLDR; node voting based graph classification with convolutional operator

In recent years, many researchers have started to construct Graph Neural Networks (GNNs) for the graph classification task. Those GNNs fit into a framework named Message Passing Neural Networks (MPNNs), which consists of two phases: a Message Passing phase used for updating node embeddings and a Readout phase. In the Readout phase, node embeddings are aggregated to extract the graph feature used for classification. However, this aggregation may obscure the effect of each individual node embedding on graph classification. Therefore, a node voting based graph classification model, called Node Voting net (NVnet), is proposed in this paper. Similar to MPNNs, NVnet contains a Message Passing phase. The main differences between NVnet and MPNNs are: 1) a decoder for graph reconstruction is added to NVnet to make node embeddings contain as much graph structure information as possible; 2) NVnet replaces the Readout phase with a new phase called the Node Voting phase. In the Node Voting phase, an attention layer based on the gate mechanism is constructed to help each node observe the node embeddings of other nodes in the graph, and each node predicts the graph class from its own perspective. This process is called node voting. After voting, the results of all nodes are aggregated to obtain the final graph classification result. In addition, considering that the aggregation operation may also obscure the differences between node voting results, our solution is to add a regularization term that drives the node voting results towards group consensus. We evaluate the performance of NVnet on 4 benchmark datasets. The experimental results show that, compared with 10 other baselines, NVnet achieves higher graph classification accuracy when using an appropriate convolutional operator.
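
The Node Voting phase can be pictured with a small sketch in which each node softmaxes its own class prediction and the per-node votes are averaged; the gate-based attention layer, reconstruction decoder and consensus regularizer are omitted, so this is only a minimal reading of the abstract.

```python
import numpy as np

def node_voting(H, W):
    """Each node predicts the graph class from its own embedding and
    the per-node votes are averaged (NVnet's attention layer and
    consensus regularizer are not reproduced here)."""
    logits = H @ W                              # per-node class scores
    logits -= logits.max(axis=1, keepdims=True)
    votes = np.exp(logits)
    votes /= votes.sum(axis=1, keepdims=True)   # per-node class probs
    return votes.mean(axis=0)                   # aggregate the votes

rng = np.random.default_rng(5)
H = rng.normal(size=(7, 16))    # 7 node embeddings
W = rng.normal(size=(16, 3))    # 3 graph classes
print(node_voting(H, W))        # graph-level class distribution
```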

Adversarial Training for Aspect-Based Sentiment Analysis with BERT

Akbar Karimi, Andrea Prati, Leonardo Rossi

Auto-TLDR; Adversarial Training of BERT for Aspect-Based Sentiment Analysis

Aspect-Based Sentiment Analysis (ABSA) studies the extraction of sentiments and their targets. Collecting labeled data for this task in order to help neural networks generalize better can be laborious and time-consuming. As an alternative, data similar to real-world examples can be produced artificially through an adversarial process carried out in the embedding space. Although these examples are not real sentences, they have been shown to act as a regularization method that can make neural networks more robust. In this work, we fine-tune the general-purpose BERT and the domain-specific post-trained BERT (BERT-PT) using adversarial training. After improving the results of post-trained BERT with different hyperparameters, we propose a novel architecture called BERT Adversarial Training (BAT) to utilize adversarial training for the two major tasks of Aspect Extraction and Aspect Sentiment Classification in sentiment analysis. The proposed model outperforms both the general BERT and the in-domain post-trained BERT on both tasks. To the best of our knowledge, this is the first study of the application of adversarial training in ABSA. The code is publicly available on a GitHub repository at https://github.com/IMPLabUniPr/Adversarial-Training-for-ABSA
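
Adversarial examples in the embedding space are commonly generated with a fast-gradient-style perturbation. A toy NumPy sketch on a logistic-regression probe, assuming this FGM-style rule approximates the perturbation applied to BERT's input embeddings (the paper's exact procedure is not reproduced):

```python
import numpy as np

def fgm_perturb(x, y, w, eps=0.1):
    """FGM-style adversarial example for a logistic-regression probe:
    x_adv = x + eps * g / ||g||, with g the loss gradient w.r.t. the
    embedding x. A toy stand-in for perturbing BERT input embeddings."""
    p = 1.0 / (1.0 + np.exp(-x @ w))     # P(y = 1 | x)
    g = (p - y) * w                      # d(cross-entropy)/dx
    return x + eps * g / (np.linalg.norm(g) + 1e-12)

rng = np.random.default_rng(6)
w = rng.normal(size=8)                   # probe weights
x = rng.normal(size=8)                   # an input embedding
x_adv = fgm_perturb(x, 1.0, w)           # train on both x and x_adv
print(np.linalg.norm(x_adv - x))         # ~0.1 (= eps)
```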

Learning Neural Textual Representations for Citation Recommendation

Thanh Binh Kieu, Inigo Jauregi Unanue, Son Bao Pham, Xuan-Hieu Phan, M. Piccardi

Auto-TLDR; Sentence-BERT cascaded with Siamese and triplet networks for citation recommendation

With the rapid growth of the scientific literature, manually selecting appropriate citations for a paper is becoming increasingly challenging and time-consuming. While several approaches for automated citation recommendation have been proposed in recent years, effective document representations for citation recommendation remain elusive to a large extent. For this reason, in this paper we propose a novel approach to citation recommendation that leverages a deep sequential representation of the documents (Sentence-BERT) cascaded with Siamese and triplet networks in a submodular scoring function. To the best of our knowledge, this is the first approach to combine deep representations and submodular selection for citation recommendation. Experiments have been carried out using a popular benchmark dataset, the ACL Anthology Network corpus, and evaluated against baselines and a state-of-the-art approach using metrics such as MRR and the F1@k score. The results show that the proposed approach outperforms all the compared approaches on every measured metric.

CKG: Dynamic Representation Based on Context and Knowledge Graph

Xunzhu Tang, Tiezhu Sun, Rujie Zhu

Auto-TLDR; CKG: Dynamic Representation Based on Knowledge Graph for Language Sentences

Recently, neural language representation models pre-trained on large corpora, which capture rich co-occurrence information and can be fine-tuned on downstream tasks, have achieved state-of-the-art results in a large range of language tasks. However, other valuable semantic information, such as similar, opposite, or other possible meanings, exists in external knowledge graphs (KGs). We argue that entities in KGs can be used to enhance the correct semantic meaning of language sentences. In this paper, we propose a new method, CKG: Dynamic Representation Based on Context and Knowledge Graph. On the one hand, CKG can extract rich semantic information from large corpora. On the other hand, it can make full use of inside information such as co-occurrence in large corpora and outside information such as similar entities in KGs. We conduct extensive experiments on a wide range of tasks, including QQP, MRPC, SST-5, SQuAD, CoNLL 2003, and SNLI. The experimental results show that CKG achieves a SOTA score of 89.2 on SQuAD, compared with SAN (84.4), ELMo (85.8), and BERT-Base (88.5).

Learning Connectivity with Graph Convolutional Networks

Hichem Sahbi

Auto-TLDR; Learning Graph Convolutional Networks Using Topological Properties of Graphs

Learning graph convolutional networks (GCNs) is an emerging field which aims at generalizing convolutional operations to arbitrary non-regular domains. In particular, GCNs operating on spatial domains show superior performance compared to spectral ones; however, their success is highly dependent on how the topology of input graphs is defined. In this paper, we introduce a novel framework for graph convolutional networks that learns the topological properties of graphs. The design principle of our method is based on the optimization of a constrained objective function which learns not only the usual convolutional parameters in GCNs but also a transformation basis that conveys the most relevant topological relationships in these graphs. Experiments conducted on the challenging task of skeleton-based action recognition show the superiority of the proposed method compared to handcrafted graph design as well as the related work.

Kernel-based Graph Convolutional Networks

Hichem Sahbi

Auto-TLDR; Spatial Graph Convolutional Networks in Reproducing Kernel Hilbert Space

Learning graph convolutional networks (GCNs) is an emerging field which aims at generalizing deep learning to arbitrary non-regular domains. Most of the existing GCNs follow a neighborhood aggregation scheme, where the representation of a node is recursively obtained by aggregating its neighboring node representations using averaging or sorting operations. However, these operations are either ill-posed or weakly discriminant, or they increase the number of training parameters and thereby the computational complexity and the risk of overfitting. In this paper, we introduce a novel GCN framework that achieves spatial graph convolution in a reproducing kernel Hilbert space. The latter makes it possible to design, via implicit kernel representations, convolutional graph filters in a high-dimensional and more discriminating space without increasing the number of training parameters. The particularity of our GCN model also resides in its ability to achieve convolutions without explicitly realigning nodes in the receptive fields of the learned graph filters with those of the input graphs, thereby making convolutions permutation agnostic and well defined. Experiments conducted on the challenging task of skeleton-based action recognition show the superiority of the proposed method against different baselines as well as the related work.

Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents

Manuel Carbonell, Pau Riba, Mauricio Villegas, Alicia Fornés, Josep Llados

Auto-TLDR; Graph Neural Network for Entity Recognition and Relation Extraction in Semi-Structured Documents

The use of administrative documents to communicate and leave a record of business information requires methods able to automatically extract and understand the content of such documents in a robust and efficient way. In addition, the semi-structured nature of these reports is especially suited to the use of graph-based representations, which are flexible enough to adapt to the deformations across different document templates. Moreover, Graph Neural Networks provide the proper methodology to learn relations among the data elements in these documents. In this work we study the use of Graph Neural Network architectures to tackle the problem of entity recognition and relation extraction in semi-structured documents. Our approach achieves state-of-the-art results on the three tasks involved in the process. Moreover, experimentation with two datasets of different nature demonstrates the good generalization ability of our approach.

Graph Convolutional Neural Networks for Power Line Outage Identification

Jia He, Maggie Cheng

Auto-TLDR; Graph Convolutional Networks for Power Line Outage Identification

In this paper, we consider the power line outage identification problem as a graph signal classification problem, where the signal at each vertex is given as a time series. We propose graph convolutional networks (GCNs) for the task of classifying signals supported on graphs. An important element of the GCN design is filter design. We consider filtering signals in either the vertex (spatial) domain or the frequency (spectral) domain. Two basic architectures are proposed. In the spatial GCN architecture, the GCN uses a graph shift operator as the basic building block to incorporate the underlying graph structure into the convolution layer. The spatial filter directly utilizes the graph connectivity information: it defines the filter to be a polynomial in the graph shift operator to obtain the convolved features that aggregate neighborhood information of each node. In the spectral GCN architecture, a frequency filter is used instead. A graph Fourier transform operator first transforms the raw graph signal from the vertex domain to the frequency domain, and then a filter is defined using the graph's spectral parameters. The spectral GCN then uses the output from the graph Fourier transform to compute the convolved features. There are additional challenges in classifying the time-evolving graph signal, as the signal value at each vertex changes over time. The GCNs are designed to recognize different spatiotemporal patterns from high-dimensional data defined on a graph. The application of the proposed methods to power line outage identification shows that these GCN architectures can successfully classify abnormal signal patterns and identify the outage location.
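
The spatial architecture's polynomial filter can be sketched directly: y = sum_k c_k S^k x for a graph shift operator S, applied per time frame. A minimal NumPy example, with an illustrative graph and coefficients rather than the paper's learned ones:

```python
import numpy as np

def spatial_graph_filter(S, x, coeffs):
    """Polynomial graph filter y = sum_k c_k S^k x, where S is a graph
    shift operator (e.g., adjacency or Laplacian); each power of S
    aggregates information from a one-hop-larger neighborhood."""
    y = np.zeros_like(x)
    Skx = x.copy()
    for c in coeffs:
        y += c * Skx
        Skx = S @ Skx          # shift the signal one more hop
    return y

# Time-evolving vertex signal: apply the filter to each time frame.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
S = A / A.sum(axis=1, keepdims=True)
X = np.random.default_rng(7).normal(size=(4, 10))   # 4 buses, 10 steps
Y = spatial_graph_filter(S, X, coeffs=[0.5, 0.3, 0.2])
print(Y.shape)  # (4, 10)
```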

Multi-Modal Contextual Graph Neural Network for Text Visual Question Answering

Yaoyuan Liang, Xin Wang, Xuguang Duan, Wenwu Zhu

Auto-TLDR; Multi-modal Contextual Graph Neural Network for Text Visual Question Answering

Text visual question answering (TextVQA) targets answering questions related to texts appearing in given images, posing more challenges than VQA by requiring a deeper recognition and understanding of the various shapes of human-readable scene texts as well as their meanings in different contexts. Existing works on TextVQA suffer from two weaknesses: i) scene texts and non-textual objects are processed separately and independently, without considering their mutual interactions during the question understanding and answering process; ii) scene texts are encoded only through word embeddings, without taking into account their corresponding visual appearance features or their potential relationships with other non-textual objects in the images. To overcome these weaknesses, we propose a novel multi-modal contextual graph neural network (MCG) model for TextVQA. The proposed MCG model can capture the relationships between the visual features of scene texts and non-textual objects in the given images, and utilizes richer sources of multi-modal features to improve performance. In particular, we encode the scene texts into richer features containing textual, visual and positional features, then model the visual relations between scene texts and non-textual objects through a contextual graph neural network. Our extensive experiments on a real-world dataset demonstrate the advantages of the proposed MCG model over baseline approaches.

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

Yanan Li, Yuetan Lin, Hongrui Zhao, Donghui Wang

Auto-TLDR; TextVQA: An End-to-End Visual Question Answering Model for Text-Based VQA

As a typical cross-modal problem, visual question answering (VQA) has received increasing attention from the computer vision and natural language processing communities. Reading and reasoning about texts and visual contents in images is a burgeoning and important research topic in VQA, especially for visually impaired assistance applications. Given an image, the task aims to predict an answer to a provided natural language question closely related to the image's textual contents. In this paper, we propose a novel end-to-end textual content based VQA model, which grounds question answering on both the visual and textual information. After encoding the image, question and recognized text words, it uses multi-modal factorized high-order modules and the attention mechanism to fuse question-image and question-text features respectively. The complex correlations among different features can be captured efficiently. To ensure the model's extendibility, it embeds candidate answers and recognized texts in a semantic embedding space and adopts a semantic-embedding-based classifier to perform answer prediction. Extensive experiments on the newly proposed benchmark TextVQA demonstrate that the proposed model achieves promising results.

Privacy Attributes-Aware Message Passing Neural Network for Visual Privacy Attributes Classification

Hanbin Hong, Wentao Bao, Yuan Hong, Yu Kong

Auto-TLDR; Privacy Attributes-Aware Message Passing Neural Network for Visual Privacy Attribute Classification

Visual Privacy Attribute Classification (VPAC) identifies privacy information leakage via social media images. Images containing privacy attributes such as skin color, face or gender are classified into multiple privacy attribute categories in VPAC. With limited work on this task, current methods often extract features from images and simply classify the extracted features into multiple privacy attribute classes. The dependencies between privacy attributes (e.g., skin color and face typically co-exist in the same image) are usually ignored during classification, which causes performance degradation in VPAC. In this paper, we propose a novel end-to-end Privacy Attributes-aware Message Passing Neural Network (PA-MPNN) to address VPAC. Privacy attributes are considered as nodes on a graph and an MPNN is introduced to model the privacy attribute dependencies. To generate representative features for privacy attribute nodes, a class-wise encoder-decoder is proposed to learn a latent space for each attribute. An attention mechanism with multiple correlation matrices is also introduced in the MPNN to learn the privacy attributes graph automatically. Experimental results on the Privacy Attribute Dataset demonstrate that our framework achieves better performance than state-of-the-art methods on visual privacy attributes classification.

Tackling Contradiction Detection in German Using Machine Translation and End-To-End Recurrent Neural Networks

Maren Pielka, Rafet Sifa, Lars Patrick Hillebrand, David Biesner, Rajkumar Ramamurthy, Anna Ladi, Christian Bauckhage

Auto-TLDR; Contradiction Detection in Natural Language Inference using Recurrent Neural Networks

Natural Language Inference, and specifically Contradiction Detection, is still a largely unexplored topic with respect to German text. In this paper, we apply Recurrent Neural Network (RNN) methods to learn contradiction-specific sentence embeddings. Our evaluation data set is a machine-translated version of the Stanford Natural Language Inference (SNLI) corpus. The results are compared to a baseline using unsupervised vectorization techniques, namely tf-idf and Flair, as well as state-of-the-art transformer-based (MBERT) methods. We find that the end-to-end models outperform the models trained on unsupervised embeddings, which makes them the better choice in an empirical use case. The RNN methods also outperform MBERT on the translated data set.

Open Set Domain Recognition Via Attention-Based GCN and Semantic Matching Optimization

Xinxing He, Yuan Yuan, Zhiyu Jiang

Auto-TLDR; Attention-based GCN and Semantic Matching Optimization for Open Set Domain Recognition

Open set domain recognition has attracted attention in recent years. The task aims to classify each sample in a practical unlabeled target domain, which consists of all the known classes in a manually labeled source domain plus target-specific unknown categories. The absence of annotated training data or auxiliary attribute information for the unknown categories makes this task especially difficult. Moreover, the existing domain discrepancy in label space and data distribution further hinders the transfer of knowledge from known classes to unknown classes. To address these issues, this work presents an end-to-end model based on an attention-based GCN and semantic matching optimization, which first employs the attention mechanism to enable the central node to learn more discriminating representations from its neighbors in the knowledge graph. Moreover, a coarse-to-fine semantic matching optimization approach is proposed to progressively bridge the domain gap. Experimental results validate that the proposed model not only has superior performance in recognizing images of known and unknown classes, but also adapts to various degrees of openness of the target domain.

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

Zhuo Chen, Fei Yin, Xu-Yao Zhang, Qing Yang, Cheng-Lin Liu

Auto-TLDR; Cross-Lingual Text Image Recognition with Multi-task Learning

This paper considers recognizing texts shown in a source language and translating them into a target language, without generating intermediate recognition results in the source language. We call this problem Cross-Lingual Text Image Recognition (CLTIR). To solve it, we propose a multi-task system containing a main CLTIR task and an auxiliary Mono-Lingual Text Image Recognition (MLTIR) task trained simultaneously. Two different sequence-to-sequence learning methods, a convolution-based attention model and a BLSTM model with CTC, are adopted for these tasks respectively. We evaluate the system on a newly collected Chinese-English bilingual movie subtitle image dataset. Experimental results demonstrate that the multi-task learning framework performs superiorly in both languages.

Boundary-Aware Graph Convolution for Semantic Segmentation

Hanzhe Hu, Jinshi Cui, Hongbin Zha

Auto-TLDR; Boundary-Aware Graph Convolution for Semantic Segmentation

Recent works have made great progress in semantic segmentation by exploiting contextual information in a local or global manner with dilated convolutions, pyramid pooling or self-attention mechanisms. However, few works have focused on harvesting boundary information to improve segmentation performance. In order to enhance feature similarity within an object while keeping discrimination from other objects, we propose a boundary-aware graph convolution (BGC) module to propagate features within the object. The graph reasoning is performed among pixels of the same object, excluding the boundary pixels. Based on the proposed BGC module, we further introduce the Boundary-aware Graph Convolution Network (BGCNet), which consists of two main components, a basic segmentation network and the BGC module, forming a coarse-to-fine paradigm. Specifically, the BGC module takes the coarse segmentation feature map as node features and uses the boundary prediction to guide graph construction. After graph convolution, the reasoned feature and the input feature are fused together to get the refined feature, producing the refined segmentation result. We conduct extensive experiments on three popular semantic segmentation benchmarks, including Cityscapes, PASCAL VOC 2012 and COCO Stuff, and achieve state-of-the-art performance on all three.

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks

Hyunjin Choi, Judong Kim, Seongho Joe, Youngjune Gwon

Auto-TLDR; Sentence Embedding Models for BERT and ALBERT: A Comparison and Evaluation

Contextualized representations from a pre-trained language model are central to achieving high performance on downstream NLP tasks. The pre-trained BERT and A Lite BERT (ALBERT) models can be fine-tuned to give state-of-the-art results in sentence-pair regression tasks such as semantic textual similarity (STS) and natural language inference (NLI). Although BERT-based models yield the [CLS] token vector as a reasonable sentence embedding, the search for an optimal sentence embedding scheme remains an active research area in computational linguistics. This paper explores sentence embedding models for BERT and ALBERT. In particular, we take a modified BERT network with siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT). We also experiment with an outer CNN sentence-embedding network for SBERT and SALBERT. We evaluate the performance of all sentence-embedding models considered using the STS and NLI datasets. The empirical results indicate that our CNN architecture improves ALBERT models substantially more than BERT models on the STS benchmark. Despite having significantly fewer model parameters, ALBERT sentence embeddings are highly competitive with BERT in downstream NLP evaluations.
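
A minimal sketch of the SBERT-style pipeline the paper builds on: pool token embeddings into one sentence vector and compare sentences by cosine similarity. Random vectors stand in for real BERT/ALBERT outputs, and the 768-d size assumes a base-sized encoder.

```python
import numpy as np

def mean_pool(token_embs):
    """Mean pooling of token embeddings into one sentence vector, the
    pooling SBERT-style models typically use instead of [CLS]."""
    return token_embs.mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

rng = np.random.default_rng(8)
sent_a = mean_pool(rng.normal(size=(12, 768)))  # 12 tokens, 768-d
sent_b = mean_pool(rng.normal(size=(9, 768)))   # 9 tokens
print(cosine(sent_a, sent_b))                   # STS-style similarity
```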

Classification of Intestinal Gland Cell-Graphs Using Graph Neural Networks

Linda Studer, Jannis Wallau, Heather Dawson, Inti Zlobec, Andreas Fischer

Auto-TLDR; Graph Neural Networks for Classification of Dysplastic Intestinal Glands

We propose to classify intestinal glands as normal or dysplastic using cell-graphs and graph-based deep learning methods. Dysplastic intestinal glands can lead to colorectal cancer, which is one of the three most common cancer types in the world. In order to assess the cancer stage and thus the treatment of a patient, pathologists analyse tissue samples of affected patients. Among other factors, they look at changes in the morphology of different tissues, such as the intestinal glands. Cell-graphs have high representational power and can describe topological and geometrical properties of intestinal glands. However, classical graph-based methods have a high computational complexity, and only a limited range of machine learning methods is available for them. In this paper, we propose Graph Neural Networks (GNNs) as an efficient learning-based approach to classify cell-graphs. We investigate different variants of so-called Message Passing Neural Networks and compare them with a classical graph-based approach based on the approximated Graph Edit Distance and a k-nearest-neighbours classifier. The proposed method achieves a promising classification accuracy of 94.1% on the pT1 Gland Graph dataset, an increase of 11.5% over the baseline result.

Assessing the Severity of Health States Based on Social Media Posts

Shweta Yadav, Joy Prakash Sain, Amit Sheth, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya

Auto-TLDR; A Multiview Learning Framework for Assessment of Health State in Online Health Communities

The unprecedented growth of Internet users has resulted in an abundance of unstructured information on social media, including health forums, where patients request health-related information or opinions from other users. Previous studies have shown that online peer support has limited effectiveness without expert intervention. Therefore, a system capable of assessing the severity of a health state from patients' social media posts can help health professionals (HPs) prioritize users' posts. In this study, we inspect the efficacy of different aspects of Natural Language Understanding (NLU) in identifying the severity of a user's health state with respect to two perspectives (tasks): (a) Medical Condition (i.e., Recover, Exist, Deteriorate, Other) and (b) Medication (i.e., Effective, Ineffective, Serious Adverse Effect, Other) in online health communities. We propose a multiview learning framework that models both the textual content and the contextual information to assess the severity of a user's health state. Specifically, our model utilizes NLU views such as sentiment, emotions, personality, and the use of figurative language to extract the contextual information. The diverse NLU views demonstrate their effectiveness on both tasks, as well as on individual diseases, in assessing a user's health.

Equation Attention Relationship Network (EARN) : A Geometric Deep Metric Framework for Learning Similar Math Expression Embedding

Saleem Ahmed, Kenny Davila, Srirangaraj Setlur, Venu Govindaraju

Auto-TLDR; Representational Learning for Similarity Based Retrieval of Mathematical Expressions

Representational learning in the form of high-dimensional embeddings has been used for multiple pattern recognition applications. There has been significant interest in building embedding-based systems for learning representations in the mathematical domain. At the same time, retrieval of structured information such as mathematical expressions is an important need for modern IR systems. In this work, our motivation is to introduce a robust framework for learning representations for similarity-based retrieval of mathematical expressions. Given a query by example, the embedding can find the closest matching expression as a function of the Euclidean distance between them. We leverage recent advancements in image-based and graph-based deep learning algorithms to learn our similarity embeddings, first by using uni-modal encoders in graph space and image space, and then by a multi-modal combination of the two. To overcome the lack of training data, we force the networks to learn a deep metric using triplets generated with a heuristic scoring function. We also adopt a custom strategy for mining hard samples to train our neural networks. Our system produces rankings similar to those generated by the original scoring function, but using only a fraction of the time. Our results establish the viability of using such a multi-modal embedding for this task.
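
The deep-metric objective mentioned above is the standard triplet loss over Euclidean distances. A minimal NumPy sketch, with an illustrative margin and embedding size:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss over Euclidean distances: pull the positive
    within `margin` of the anchor and push the negative away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(9)
a, p, n = rng.normal(size=(3, 32))   # embeddings of three expressions
print(triplet_loss(a, p, n))
```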

Object Detection Using Dual Graph Network

Shengjia Chen, Zhixin Li, Feicheng Huang, Canlong Zhang, Huifang Ma

Auto-TLDR; A Graph Convolutional Network for Object Detection with Key Relation Information

Most object detection methods focus only on the local information near a region proposal and ignore the object's global semantic relations and local spatial relation information, resulting in limited performance. To capture and explore these important relations, we propose a detection method based on a graph convolutional network (GCN). Two independent relation graph networks are used to obtain the global semantic information of objects from labels and the local spatial information from images. The semantic relation network implicitly acquires global knowledge: by constructing a directed graph on the dataset, each node is represented by the word embedding of a label and then sent to the GCN to obtain a high-level semantic representation. The spatial relation network encodes relations through a positional relation module and a visual connection module, and enriches the object features with local key information from objects. The feature representation is further improved by aggregating the outputs of the two networks. Instead of directly disseminating visual features in the network, the dual-graph network explores more advanced feature information, giving the detector the ability to obtain key relations in labels and region proposals. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that key relation information significantly improves detection performance, with a better ability to detect small objects and more reasonable bounding boxes. The results on the COCO dataset show that our method obtains around a 32.3% improvement in AP on small objects.

Recurrent Graph Convolutional Networks for Skeleton-Based Action Recognition

Guangming Zhu, Lu Yang, Liang Zhang, Peiyi Shen, Juan Song


Auto-TLDR; Recurrent Graph Convolutional Network for Human Action Recognition

Slides Poster Similar

Human action recognition is a challenging and active research field due to its wide applications. Recently, graph convolutions for skeleton-based action recognition have attracted much attention. Generally, the adjacency matrices of the graph are either fixed to the hand-crafted physical connectivity of the human joints or learned adaptively via deep learning. In both cases, the adjacency matrices remain fixed while processing each frame of an action sequence. However, the interactions of different subsets of joints may play a core role at different phases of an action, so it is reasonable to evolve the graph topology over time. In this paper, a recurrent graph convolution is proposed, in which the graph topology is evolved via a long short-term memory (LSTM) network. The proposed recurrent graph convolutional network (R-GCN) can recurrently learn data-dependent graph topologies for different layers, different time steps, and different kinds of actions. Experimental results on the NTU RGB+D and Kinetics-Skeleton datasets demonstrate the advantages of the proposed R-GCN.
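
One way such an LSTM-evolved topology could be instantiated is sketched below; flattening the per-frame joint features and row-normalizing with a softmax are illustrative choices, not necessarily the paper's exact design.

    import torch
    import torch.nn as nn

    class EvolvingGraphTopology(nn.Module):
        # An LSTM consumes per-frame joint features and emits a
        # data-dependent adjacency matrix for each time step.
        def __init__(self, feat_dim, n_joints, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim * n_joints, hidden, batch_first=True)
            self.to_adj = nn.Linear(hidden, n_joints * n_joints)

        def forward(self, x):
            # x: (batch, time, n_joints, feat_dim)
            b, t, j, d = x.shape
            h, _ = self.lstm(x.reshape(b, t, j * d))
            adj = self.to_adj(h).reshape(b, t, j, j)
            return torch.softmax(adj, dim=-1)  # one topology per frame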

Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification

Xunzhu Tang, Rujie Zhu, Tiezhu Sun


Auto-TLDR; Moto: Enhancing Embedding with Multiple Joint Factors

Slides Poster Similar

Recently, language representation techniques have achieved great performance in text classification. However, most existing representation models are specifically designed for English, and may fail on Chinese because of the huge difference between the two languages. Moreover, most existing methods for Chinese text classification process texts at only a single level, even though, as a special kind of hieroglyphics, the radicals of Chinese characters are good semantic carriers; in addition, Pinyin codes carry the semantics of tones, and Wubi reflects stroke structure information, etc. Unfortunately, previous research has neglected to find an effective way to distill the useful parts of these four factors and to fuse them. In this work, we propose a novel model called Moto: Enhancing Embedding with Multiple Joint Factors. Specifically, we design an attention mechanism to distill the useful parts by fusing the four levels of information above more effectively. We conduct extensive experiments on four popular tasks. The empirical results show that Moto achieves state-of-the-art results of 0.8316 (F1-score, a 2.11% improvement) on Chinese news titles, 96.38 (a 1.24% improvement) on the Fudan Corpus, and 0.9633 (a 3.26% improvement) on THUCNews.
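
A minimal sketch of attention-based fusion over the four factor embeddings (character, radical, Pinyin, Wubi) is shown below; the scoring function and dimensions are illustrative assumptions rather than Moto's exact architecture.

    import torch
    import torch.nn as nn

    class FactorAttention(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.score = nn.Linear(dim, 1)  # scores each factor's usefulness

        def forward(self, factors):
            # factors: (batch, 4, dim) -- one embedding per factor level
            weights = torch.softmax(self.score(factors), dim=1)  # (batch, 4, 1)
            return (weights * factors).sum(dim=1)                # fused (batch, dim)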

VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching

Hui Yuan, Yan Huang, Dongbo Zhang, Zerui Chen, Wenlong Cheng, Liang Wang


Auto-TLDR; Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching

Slides Poster Similar

Image-text matching has made great progress recently, but challenges remain in fine-grained matching. To deal with this problem, we propose an Improved Visual Semantic Reasoning model (VSR++), which jointly models (1) global alignment between images and texts and (2) local correspondence between regions and words in a unified framework. To exploit their complementary advantages, we also develop a suitable learning strategy to balance their relative importance. As a result, our model can distinguish image regions and text words at a fine-grained level, and thus achieves the current state-of-the-art performance on two benchmark datasets.

Automatic Student Network Search for Knowledge Distillation

Zhexi Zhang, Wei Zhu, Junchi Yan, Peng Gao, Guotong Xie


Auto-TLDR; NAS-KD: Knowledge Distillation for BERT

Slides Poster Similar

Pre-trained language models (PLMs), such as BERT, have achieved outstanding performance on multiple natural language processing (NLP) tasks. However, such pre-trained models usually contain a huge number of parameters and are computationally expensive; the high resource demand hinders their application on resource-restricted devices like mobile phones. Knowledge distillation (KD) is an effective compression approach that encourages a lightweight student network to imitate the teacher network, so that latent knowledge is transferred from the teacher to the student. However, the great majority of student networks in previous KD methods are manually designed, normally as a subnetwork of the teacher network; a Transformer is generally utilized as the student for compressing BERT, but it still contains masses of parameters. Motivated by this, we propose a novel approach named NAS-KD, which automatically generates an optimal student network using neural architecture search (NAS) to enhance the distillation for BERT. Experiments on 7 classification tasks in the NLP domain demonstrate that NAS-KD can substantially reduce the size of BERT without much performance sacrifice.
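
As background, the standard soft-target distillation loss (in the style of Hinton et al.) that such student networks are trained with can be written as below; the temperature and mixing weight are conventional defaults, and this is generic KD rather than NAS-KD's exact objective.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Temperature-scaled KL toward the teacher's soft targets, mixed
        # with the usual hard-label cross-entropy on the student.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard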

Context for Object Detection Via Lightweight Global and Mid-Level Representations

Mesut Erhan Unal, Adriana Kovashka


Auto-TLDR; Context-Based Object Detection with Semantic Similarity

Slides Poster Similar

We propose an approach for explicitly capturing context in object detection. We model visual and geometric relationships between object regions, but also model the global scene as a first-class participant. In contrast to prior approaches, both the context we rely on and our proposed mechanism for belief propagation over regions are lightweight. We also experiment with capturing similarities between regions at a semantic level by modeling class co-occurrence and linguistic similarity between class names. We show that our approach significantly outperforms Faster R-CNN, and performs competitively with a much more costly approach that also models context.
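
For the linguistic-similarity component, one simple instantiation is the cosine similarity between word embeddings of class names, sketched below; how the paper combines it with co-occurrence statistics is not reproduced here.

    import torch
    import torch.nn.functional as F

    def class_similarity_matrix(class_name_embs):
        # class_name_embs: (n_classes, dim) word embeddings of class names.
        # Returns pairwise cosine similarity, usable as a semantic prior.
        normed = F.normalize(class_name_embs, dim=-1)
        return normed @ normed.t()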

Adaptive Word Embedding Module for Semantic Reasoning in Large-Scale Detection

Yu Zhang, Xiaoyu Wu, Ruolin Zhu


Auto-TLDR; Adaptive Word Embedding Module for Object Detection

Slides Poster Similar

In recent years, convolutional neural networks have achieved rapid development in the field of object detection. However, due to data imbalance, high labor costs, and uneven data labeling, the overall performance of previous detection networks drops sharply when the dataset is extended to a large scale with hundreds or thousands of categories. We present the Adaptive Word Embedding Module, which extracts an adaptive semantic knowledge graph to achieve semantic consistency within an image. Our method endows detection networks with the ability to infer global semantics without additional attribute or relationship annotations. Compared with Faster R-CNN, our algorithm improves results on the MSCOCO dataset significantly by 4.1%, reaching an mAP of 32.8%. On the VG1000 dataset, it improves by 0.9% to 6.7% over Faster R-CNN. The Adaptive Word Embedding Module is lightweight, general-purpose, and can be plugged into diverse detection networks. Code will be made available.

TreeRNN: Topology-Preserving Deep Graph Embedding and Learning

Yecheng Lyu, Ming Li, Xinming Huang, Ulkuhan Guler, Patrick Schaumont, Ziming Zhang


Auto-TLDR; TreeRNN: Recurrent Neural Network for General Graph Classification

Slides Poster Similar

General graphs are difficult to learn from due to their irregular structures. Existing works employ message passing along graph edges to extract local patterns using customized graph kernels, but few of them are effective at integrating such local patterns into global features. In contrast, in this paper we study methods to transform graphs into trees, so that explicit orders are learned to direct feature integration from local to global. To this end, we apply breadth-first search (BFS) to construct trees from the graphs, which adds direction to the graph edges from the center node to the peripheral nodes. In addition, we propose a novel projection scheme that transfers the trees into image representations suitable for conventional convolutional neural networks (CNNs) and recurrent neural networks (RNNs). To best learn the patterns from these graph-tree-images, we propose TreeRNN, a 2D RNN architecture that recurrently integrates the image pixels by rows and columns to help classify the graph categories. We evaluate the proposed method on several graph classification datasets, and demonstrate accuracy comparable to the state of the art on the MUTAG, PTC-MR and NCI1 datasets.
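
The BFS graph-to-tree construction can be sketched directly; the snippet below roots the tree at a chosen center node and directs each edge from the first-visited endpoint outward, discarding edges that would close a cycle.

    from collections import deque

    def bfs_tree(adj, root):
        # adj: dict mapping node -> iterable of neighbors (undirected graph).
        parent = {root: None}
        order = []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in parent:   # skip edges that would form a cycle
                    parent[v] = u
                    queue.append(v)
        return parent, order  # tree edges (v -> parent[v]) in BFS order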

A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition

Qianhui Men, Edmond S. L. Ho, Shum Hubert P. H., Howard Leung


Auto-TLDR; Two-Stream Recurrent Neural Network for Human-Human Interaction Recognition

Slides Poster Similar

This paper addresses the problem of recognizing human-human interaction from skeletal sequences. Existing methods are mainly designed to classify single human actions; many of them simply stack the movement features of two characters to deal with human interaction, while neglecting the abundant relationships between characters. In this paper, we propose a novel two-stream recurrent neural network that adopts geometric features from both single actions and interactions to describe the spatial correlations with different discriminative abilities. The first stream is constructed using pairwise joint distance (PJD) in a fully-connected mesh to categorize interactions with explicit distance patterns. To better distinguish similar interactions, in the second stream we combine PJD with the spatial features from individual joint positions using graph convolutions to detect the implicit correlations among joints, where the joint connections in the graph are adaptive for flexible correlations. After spatial modeling, each stream is fed to a bi-directional LSTM to encode two-way temporal properties. To take advantage of the diverse discriminative power of the two streams, we propose a late fusion algorithm that combines their output predictions based on information entropy. Experimental results show that the proposed framework achieves state-of-the-art performance on 3D interaction datasets and comparable performance on 2D ones. Moreover, the late fusion results demonstrate its effectiveness in improving recognition accuracy over single streams.
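
One plausible instantiation of entropy-based late fusion is sketched below: each stream's softmax prediction is weighted inversely to its entropy, so the more confident stream contributes more. The exact weighting rule in the paper may differ.

    import torch

    def entropy_late_fusion(p1, p2, eps=1e-8):
        # p1, p2: (batch, n_classes) softmax outputs of the two streams.
        def entropy(p):
            return -(p * (p + eps).log()).sum(dim=-1, keepdim=True)
        w1 = 1.0 / (entropy(p1) + eps)
        w2 = 1.0 / (entropy(p2) + eps)
        return (w1 * p1 + w2 * p2) / (w1 + w2)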

Geographic-Semantic-Temporal Hypergraph Convolutional Network for Traffic Flow Prediction

Kesu Wang, Jing Chen, Shijie Liao, Jiaxin Hou, Qingyu Xiong


Auto-TLDR; Geographic-semantic-temporal convolutional network for traffic flow prediction

Similar

Traffic flow prediction is becoming an increasingly important part of intelligent transportation control and management. This task is challenging due to (1) complex geographic and non-geographic spatial correlations; (2) temporal correlations between time slices; and (3) the dynamics of semantic high-order correlations along the temporal dimension. To address these difficulties, commonly-used methods apply graph convolutional networks for spatial correlations and recurrent neural networks for temporal dependencies. In this work, we distinguish the two aspects of spatial correlation and propose two types of spatial graphs, named the geographic graph and the semantic hypergraph. We extend traditional convolution and propose a geographic-temporal graph convolution to jointly capture geographic-temporal correlations, and a semantic-temporal hypergraph convolution to jointly capture semantic-temporal correlations. We then propose a geographic-semantic-temporal hypergraph convolutional network (GST-HCN) that combines our graph convolutions and GRU units hierarchically in a unified end-to-end network. Experimental results on the Caltrans Performance Measurement System (PeMS) dataset show that our proposed model significantly outperforms other popular spatio-temporal deep learning models, and suggest the effectiveness of exploring geographic-semantic-temporal dependencies in deep learning models for traffic flow prediction.
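
The generic spatial-then-temporal pattern the model builds on, graph convolution per time slice followed by a GRU along the temporal axis, can be sketched as below; the hypergraph convolution and hierarchical combination are not reproduced, and all names are illustrative.

    import torch
    import torch.nn as nn

    class GraphGRUBlock(nn.Module):
        def __init__(self, in_dim, hid_dim):
            super().__init__()
            self.w = nn.Linear(in_dim, hid_dim)
            self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)

        def forward(self, a_hat, x):
            # x: (batch, time, nodes, in_dim); a_hat: (nodes, nodes) normalized adj
            h = torch.relu(a_hat @ self.w(x))        # graph conv per time slice
            b, t, n, d = h.shape
            h = h.permute(0, 2, 1, 3).reshape(b * n, t, d)
            out, _ = self.gru(h)                     # temporal modeling per node
            return out.reshape(b, n, t, d).permute(0, 2, 1, 3)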

Using Scene Graphs for Detecting Visual Relationships

Anurag Tripathi, Siddharth Srivastava, Brejesh Lall, Santanu Chaudhury


Auto-TLDR; Relationship Detection using Context Aligned Scene Graph Embeddings

Slides Poster Similar

In this paper we address the problem of detecting relationships between pairs of objects in an image. We develop spatially aware word embeddings using scene graphs, and use joint feature representations containing visual, spatial and semantic embeddings from the input images to train a deep network on the task of relationship detection. Further, we propose to utilize context-aligned scene graph embeddings from the training set, without requiring explicit availability of scene graphs at test time. We show that the proposed method outperforms state-of-the-art methods on predicate detection and provides competitive results on relationship detection. We also show the generalization ability of the proposed method by performing predictions under zero-shot settings. Finally, we provide an exhaustive empirical evaluation of each component of the proposed network.

KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding

Hyunjae Lee, Jaewoong Yun, Bongkyu Hwang, Seongho Joe, Seungjai Min, Youngjune Gwon


Auto-TLDR; KoreALBERT: A monolingual ALBERT model for Korean language understanding

Slides Poster Similar

A Lite BERT (ALBERT) has been introduced to scale up deep bidirectional representation learning for natural languages. Due to the lack of pretrained ALBERT models for the Korean language, the best available practice has been to use the multilingual model or to fall back on other BERT-based models. In this paper, we develop and pretrain KoreALBERT, a monolingual ALBERT model specifically for Korean language understanding. We introduce a new training objective, namely Word Order Prediction (WOP), and use it alongside the existing MLM and SOP criteria with the same architecture and model parameters. Despite having significantly fewer model parameters (and thus being quicker to train), our pretrained KoreALBERT outperforms its BERT counterpart on the KorQuAD 1.0 benchmark for machine reading comprehension. Consistent with the empirical results in English by Lan et al., KoreALBERT appears to improve downstream task performance involving multi-sentence encoding for the Korean language. The pretrained KoreALBERT is publicly available to encourage research and application development for Korean NLP.
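
Since the abstract does not spell out WOP, the snippet below is a hypothetical sketch of how a word-order-prediction example might be constructed: a short span of tokens is shuffled and the original order kept as the target. KoreALBERT's actual formulation may differ.

    import random

    def make_wop_example(tokens, span=3):
        # Hypothetical: corrupt a random span by shuffling it, and keep the
        # original tokens/positions as the prediction target.
        span = min(span, len(tokens))
        i = random.randrange(len(tokens) - span + 1)
        idx = list(range(i, i + span))
        shuffled = idx[:]
        random.shuffle(shuffled)
        corrupted = list(tokens)
        for dst, src in zip(idx, shuffled):
            corrupted[dst] = tokens[src]
        return corrupted, idx, [tokens[j] for j in idx]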