Bidirectional Long Short-Term Memory Neural Networks for Linear Sum Assignment Problems

Abstract: In wireless communications, many resource allocation problems can be modeled as a linear sum assignment problem (LSAP). Deep learning techniques such as the fully-connected neural network and convolutional neural network have been used to solve the LSAP. We herein propose a new deep learning model based on the bidirectional long short-term memory (BDLSTM) structure for the LSAP. In the proposed method, the LSAP is divided into sequential sub-assignment problems, and the BDLSTM extracts the features from the sequential data. Simulation results indicate that the proposed BDLSTM is more memory efficient and achieves a higher accuracy than conventional techniques.


Introduction
The linear sum assignment problem (LSAP) is a special case of the linear programming problem that aims to minimize the cost of assigning multiple tasks to a number of agents on a one-to-one basis. In wireless communications, many resource allocation tasks can be modeled as the LSAP, such as joint relay selection and resource allocation for device-to-device communications [1], joint resource allocation with multi-cell cooperation in virtual multiple-input multiple-output (MIMO) systems [2], and unlicensed channel allocation for LTE systems [3].
For LSAPs, the Hungarian algorithm has been proposed to obtain the optimal solution without an exhaustive search [4]. Furthermore, heuristic algorithms such as deep greedy switching [5], the interior point [6], or the dual forest algorithm [7] have been investigated to estimate near-optimal solutions in real-world problems with time constraints.
In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance in fields such as audio processing [8,9], visual object recognition [10], and other domains [11][12][13]. Neural networks (NNs) can extract features from the input implicitly and approximate arbitrary functions. Several approaches have applied DNNs to solve mathematical optimization problems [14,15]. In [2], a DNN was used to approximate an optimization algorithm in interference management. In [16], an LSAP was decomposed into several sub-assignment problems, and two types of DNNs, i.e., the feed-forward neural network (FNN) and the convolutional neural network (CNN) [17], were applied to address the sub-assignment problems. However, the performance of the DNNs discussed in [16] degrades as the size of the problem increases.
Hence, a bidirectional long short-term memory neural network (BDLSTM) is proposed to solve LSAPs. In the proposed method, an LSAP is decomposed into a sequence of smaller sub-assignment tasks that form the model input. Each BDLSTM layer consists of two separate LSTM layers whose intermediate outputs are combined to process the data in both the forward and backward directions. By combining the hidden-layer outputs of the sub-assignment problems, the BDLSTM structure captures the structure of the LSAP better than alternative architectures.
Finally, a collision-avoidance algorithm is applied to the output to obtain the final result. Simulation results indicate that the proposed BDLSTM outperforms the FNN and CNN models as the size of the problem increases, while requiring fewer parameters to model the patterns.

Problem Formulation
In LSAPs [4], given a set of n jobs I = {1, ..., n} and a set of n agents J = {1, ..., n}, we wish to minimize the total cost of assigning each job to exactly one agent. Assigning job i to agent j incurs a cost u_ij, and the problem can be formulated as follows:

minimize \sum_{i=1}^{n} \sum_{j=1}^{n} u_{ij} x_{ij} (1)

subject to:

\sum_{i=1}^{n} x_{ij} = 1, j = 1, ..., n, (2)

\sum_{j=1}^{n} x_{ij} = 1, i = 1, ..., n, (3)

x_{ij} \in \{0, 1\}, (4)

where x_ij is the decision indicator: x_ij = 1 when job i is assigned to agent j, and x_ij = 0 otherwise. We can arrange the cost values into an n-by-n matrix U = [u_1, ..., u_n]^T and the decision indicators into an n-by-n permutation matrix X = [x_1, ..., x_n]^T. In the communication field, many optimization problems are similar to or can be formulated as the LSAP [1][2][3].
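As a concrete baseline, the optimum of a small LSAP can be found by exhaustively scoring every job-to-agent permutation. The sketch below is for illustration only; the Hungarian algorithm [4] reaches the same optimum in O(n^3) and is what is actually used later in the paper.

```python
from itertools import permutations

def solve_lsap_bruteforce(U):
    """Exhaustively search all job-to-agent permutations for the
    minimum total cost sum_i u[i][perm[i]]. Only practical for
    small n; stands in for the Hungarian algorithm here."""
    n = len(U)
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(n)):   # perm[i] = agent of job i
        cost = sum(U[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_cost, best_assign

# Hypothetical 3 x 3 cost matrix for illustration.
U = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
cost, assign = solve_lsap_bruteforce(U)
print(cost, assign)  # minimum cost 5: jobs 0, 1, 2 -> agents 1, 0, 2
```

Each row of the returned assignment satisfies constraints (2) and (3) by construction, since a permutation assigns every job to exactly one agent and vice versa.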
In [16], the LSAP was first decomposed into multiple sub-assignment problems so that DNNs could be applied. However, the DNNs in [16] did not exploit the sequential information in the cost matrix, as multiple FNN or CNN models had to be applied to solve all sub-assignment problems simultaneously. Here, a single neural network model is considered to extract the features necessary to solve LSAPs.

Bidirectional LSTM
The structure of a conventional LSTM model is illustrated in Figure 1. Here, we provide a short introduction to the LSTM model; for further details, we refer the interested reader to [18]. The LSTM [18] was designed with special memory cells to store temporal information. This structure allows the LSTM to remember long-range features better than conventional recurrent neural networks. In a multilayer model, the components of a cell at time step i in layer l in the forward direction can be computed by the following functions:

\vec{i}_i^l = \sigma(W_i^l \vec{h}_i^{l-1} + V_i^l \vec{h}_{i-1}^l + b_i^l), (6)

\vec{f}_i^l = \sigma(W_f^l \vec{h}_i^{l-1} + V_f^l \vec{h}_{i-1}^l + b_f^l), (7)

\vec{o}_i^l = \sigma(W_o^l \vec{h}_i^{l-1} + V_o^l \vec{h}_{i-1}^l + b_o^l), (8)

\vec{g}_i^l = \tanh(W_g^l \vec{h}_i^{l-1} + V_g^l \vec{h}_{i-1}^l + b_g^l), (9)

\vec{c}_i^l = \vec{f}_i^l \odot \vec{c}_{i-1}^l + \vec{i}_i^l \odot \vec{g}_i^l, (10)

\vec{h}_i^l = \vec{o}_i^l \odot \tanh(\vec{c}_i^l), (11)

where \vec{i}_i^l, \vec{f}_i^l, \vec{o}_i^l, \vec{g}_i^l, \vec{c}_i^l, and \vec{h}_i^l are the input gate, forget gate, output gate, candidate gate, cell state, and hidden state, respectively, all of which are N_l-dimensional vectors. In (6)-(9), the W^l terms are the weight matrices between the cells of layers (l-1) and l, the V^l terms are the weight matrices between consecutive cells of layer l, and the b^l terms are the bias vectors at each layer. The weight matrices and bias values in a cell are shared along the length of the sequence, thus reducing the total number of weights and hidden neurons in the network. The sigmoid function \sigma and the hyperbolic tangent function \tanh are used as activation functions, and \odot denotes element-wise multiplication.

A bidirectional LSTM processes the data in both the forward and backward directions using two separate LSTM layers. The forward hidden state \vec{h}_i^l, calculated by the formulas above, and the backward state \overleftarrow{h}_i^l, calculated similarly, are concatenated and then fed forward to the next layer, as demonstrated in Figure 2:

\overleftrightarrow{h}_i^l = [\vec{h}_i^l ; \overleftarrow{h}_i^l], with \overleftrightarrow{h}_i^0 = u_i,

where l = 0 is the input layer. The BDLSTM is better at capturing the relations among the elements of a whole sequence by utilizing information in both directions, instead of remembering the features in only one direction as the conventional LSTM does. Furthermore, by sharing the parameters within a layer, the BDLSTM model requires less memory to solve the problem than the conventional FNN and CNN.
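The forward cell update in (6)-(11) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the stacked parameter layout (all four gates concatenated into one W, V, and b) and the random initialization are assumptions made for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_cell_step(x, h_prev, c_prev, W, V, b):
    """One forward LSTM step following Eqs. (6)-(11). The four
    gates' parameters are stacked row-wise in W (4n x n_in),
    V (4n x n), and b (4n,) in the order
    [input, forget, output, candidate]."""
    n = h_prev.shape[0]
    z = W @ x + V @ h_prev + b        # pre-activations for all gates
    i = sigmoid(z[0 * n:1 * n])       # input gate, Eq. (6)
    f = sigmoid(z[1 * n:2 * n])       # forget gate, Eq. (7)
    o = sigmoid(z[2 * n:3 * n])       # output gate, Eq. (8)
    g = np.tanh(z[3 * n:4 * n])       # candidate values, Eq. (9)
    c = f * c_prev + i * g            # cell state, Eq. (10)
    h = o * np.tanh(c)                # hidden state, Eq. (11)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16                   # e.g. cost vectors of size 8, N_l = 16
W = rng.normal(0, 0.1, (4 * n_hid, n_in))
V = rng.normal(0, 0.1, (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_cell_step(rng.normal(size=n_in),
                      np.zeros(n_hid), np.zeros(n_hid), W, V, b)
print(h.shape, c.shape)  # (16,) (16,)
```

A backward pass over the reversed sequence with a second set of parameters, followed by concatenation of the two hidden states, yields the bidirectional state described above.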

Proposed Method
This section presents the proposed BDLSTM model for LSAPs. The architecture and details of the proposed model are elaborated in Section 3.1, and the training process is explained in Section 3.2. Figure 3 shows the proposed system architecture for solving an LSAP. The deep-learning-based BDLSTM model is first applied to the cost matrix of the LSAP. After that, a collision-avoidance algorithm resolves any conflicts in the outputs of the BDLSTM model. The proposed BDLSTM model does not take into consideration the constraint in (3), which guarantees that one job can be assigned to only one agent, because the model solves each sub-assignment task as a separate classification problem. The collision-avoidance algorithm then prevents one job from being assigned to multiple agents simultaneously.

Proposed BDLSTM
After L consecutive BDLSTM layers extract the features, every hidden state at the last layer, \overleftrightarrow{h}_i^L, is fed to an identical fully-connected layer with the softmax activation function to obtain the output matrix Y = [y_1, ..., y_n]^T:

y_i = softmax(W_y \overleftrightarrow{h}_i^L + b_y),

where W_y and b_y are the weight matrix and bias of the output layer, respectively. From the BDLSTM output, a decision matrix Y' = [y'_1, ..., y'_n]^T is obtained, where y'_i is the one-hot encoding of y_i. In collision avoidance, the collision cases in Y' are corrected using the Hungarian algorithm by the following steps:

• Step 1: Find the jobs that are assigned to multiple or no agents.

• Step 2: Extract the cost values of the jobs from Step 1 as a small LSAP.

• Step 3: Solve the LSAP from Step 2 using the Hungarian algorithm.

• Step 4: Using the result of Step 3, modify the one-hot decision matrix into a decision matrix Z, where Z is an n-by-n permutation matrix.

For sequential data, FNNs, CNNs, and especially recurrent structures such as the LSTM and BDLSTM are widely used and have achieved many successes. FNNs can be applied to almost any problem as baseline architectures. CNNs are more complicated and perform well in computer vision and some natural language processing tasks. LSTMs and BDLSTMs, on the other hand, are recurrent structures built specifically for problems involving sequential data. In the proposed method, the BDLSTM is used to extract the features from the cost matrix U, so each classification is based on the information of the whole cost matrix. Compared to the FNN and CNN in [16], the BDLSTM has memory cells that can remember the relationship between the cost vectors of consecutive sub-assignment tasks.

Training Stage
The training dataset consists of M data entries, each of which is a pair of cost and decision matrices {U, X}. First, we generate the cost matrix U, where each cost value u ij is a real value and is generated from a uniform distribution on (0, 1]. Then, the optimal decision matrix X according to U is obtained for the training dataset using the Hungarian algorithm [4]. After repeating this step M times, we can generate the whole dataset. For this experiment, we used M = 500,000 samples. Among these, 10% of this dataset was sampled randomly for validation at the end of each training epoch to select the hyperparameters while preventing the overfitting problem. After completing the training step, 50,000 unseen samples were used as the test dataset to verify the performance of the trained model.
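The dataset-generation step described above can be sketched as follows. The brute-force labeler stands in for the Hungarian algorithm [4] so the snippet stays self-contained, and the small M and n are illustrative only (the paper uses M = 500,000).

```python
import random
from itertools import permutations

def make_sample(n, rng):
    """Generate one {U, X} training pair: an n x n cost matrix with
    entries uniform on (0, 1] and the optimal assignment as label.
    Brute force stands in for the Hungarian algorithm, so keep n
    small in this sketch."""
    U = [[1.0 - rng.random() for _ in range(n)] for _ in range(n)]  # (0, 1]
    best = min(permutations(range(n)),
               key=lambda p: sum(U[i][p[i]] for i in range(n)))
    X = [[1 if best[i] == j else 0 for j in range(n)] for i in range(n)]
    return U, X

rng = random.Random(42)
dataset = [make_sample(4, rng) for _ in range(1000)]   # M = 1000 here
split = int(0.9 * len(dataset))                        # 10% held out
train, val = dataset[:split], dataset[split:]
```

Each label X is a permutation matrix by construction, so its rows and columns all sum to one, matching constraints (2) and (3).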
In the training process, the mini-batch gradient descent procedure was applied to optimize the parameters of the models to minimize the following total loss:

L_total = (1 / |B|) \sum_{m \in B} L(m),

where B is a mini-batch sampled from the dataset with size |B| and L(m) is the loss computed from sample m \in B. For this classification task, we chose the cross-entropy as the loss function, which can be formulated as follows:

L(m) = - \sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij} \log y_{ij},

where x_ij and y_ij are the entries of the label matrix X and the model output Y for sample m, respectively. The weights were updated based on the gradient information of the loss function. We chose the Adam algorithm [19] as the gradient descent method, as it requires only first-order gradients to be computed, thus reducing the calculation complexity.
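The mini-batch objective above reduces to a few lines of code. This sketch computes the per-sample cross-entropy and its batch average directly; in practice the same quantities come from the deep learning framework's built-in loss.

```python
import math

def cross_entropy(X, Y):
    """Per-sample loss L(m) = -sum_ij x_ij log y_ij: since X is
    one-hot per row, only the predicted probability of each
    sub-problem's optimal choice contributes."""
    n = len(X)
    return -sum(X[i][j] * math.log(Y[i][j])
                for i in range(n) for j in range(n))

def batch_loss(batch):
    """Mini-batch objective: the average of L(m) over m in B."""
    return sum(cross_entropy(X, Y) for X, Y in batch) / len(batch)

# A uniform prediction over n choices gives n * log(n) per sample.
X = [[1, 0], [0, 1]]
Y_uniform = [[0.5, 0.5], [0.5, 0.5]]
print(batch_loss([(X, Y_uniform)]))  # 2 * log 2, about 1.386
```

A perfectly confident and correct prediction drives this loss toward zero, which is what the Adam updates push the network parameters toward.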

Performance Evaluation
In this section, we evaluate the simulation results of the proposed scheme using a BDLSTM model with different configurations for the LSAP. To obtain the optimized hyperparameters, such as the batch size, number of epochs, and learning rate, the proposed model was trained with several combinations of parameters via grid search, and the best combination was selected. A BDLSTM model with L = 4 layers was chosen, and each layer contained N_l = 16 hidden units at each gate. Figure 5 shows the convergence of the BDLSTM over the training epochs on the training and validation sets with n = 16 and a batch size |B| = 1024. The loss of the model on both datasets decreased steadily after each epoch, and the gap between them was minimal, suggesting that no overfitting occurred during training. Table 1 shows the accuracy comparisons between the proposed BDLSTM model and other neural network structures, where the FNN and CNN were based on [16], and a uni-directional LSTM model was used with the same number of layers and hidden units as the proposed model. The FNN consisted of n smaller models, each having four layers with 32, 64, 256, and n hidden neurons.
The CNN consisted of five convolutional layers, containing 32, 32, 32, 32, and n kernels, respectively, and n output layers, each with n neurons. The accuracy was defined as the number of jobs correctly assigned to their optimal agents divided by the total number of jobs according to the decision matrices X in the dataset. The proposed BDLSTM performed better than the LSTM, FNN, and CNN for all values of n. Owing to its bidirectional structure, the proposed BDLSTM exhibited performance improvements between 18% (n = 16) and 27% (n = 8) over the conventional LSTM. The performance gap became more significant as n, the size of the LSAP, increased.

Figure 6 shows the complexity comparison of the deep neural networks for an LSAP of size n. Here, we used the number of trainable parameters, i.e., the weight and bias values, to determine the complexity of the network. The complexity of the proposed BDLSTM was robust to changes in n compared with the FNN and CNN. This is because the FNN model requires n different fully-connected models for the n sub-assignment problems, and the sizes of the final layer and output layers of the CNN scale with n. Because the weights in a layer of the BDLSTM are shared among all sub-assignment problems, the number of trainable parameters remained stable as n changed, resulting in low memory requirements. When n = 16, the BDLSTM achieved the best result while requiring 18 and 7 times fewer parameters than the FNN and CNN, respectively. Following the proposed BDLSTM model, the collision-avoidance algorithm was applied to obtain the final output. All the tests were conducted on the same system with a CPU E5-2620 v4 (eight cores at 2.10 GHz) and 32 GB of RAM.
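Why the BDLSTM's parameter count barely moves with n can be made concrete with a back-of-the-envelope tally. The bookkeeping below is our own sketch, not the paper's exact count: it assumes the first layer reads n-dimensional cost vectors, deeper layers read the concatenated 2*N_l forward/backward states, and a single shared fully-connected softmax layer produces the n outputs.

```python
def lstm_layer_params(n_in, n_hid):
    """Trainable parameters of one directional LSTM layer: four
    gates, each with an input matrix W (n_hid x n_in), a recurrent
    matrix V (n_hid x n_hid), and a bias vector (n_hid,)."""
    return 4 * (n_hid * n_in + n_hid * n_hid + n_hid)

def bdlstm_params(n, n_hid, n_layers):
    """Rough parameter count of the full BDLSTM model for an
    n x n LSAP. Only the first layer and the shared output layer
    depend on n, which is why the total stays nearly flat as the
    problem size grows."""
    total = 2 * lstm_layer_params(n, n_hid)                  # layer 1, both directions
    total += 2 * (n_layers - 1) * lstm_layer_params(2 * n_hid, n_hid)
    total += (2 * n_hid) * n + n                             # shared softmax layer
    return total

print(bdlstm_params(16, 16, 4))  # 23568 under these assumptions
```

Doubling n only grows the first-layer input matrices and the output layer, whereas the FNN and CNN of [16] replicate whole sub-networks or output heads n times, so their counts grow much faster.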
Table 2 shows the accuracy and operating time after collision avoidance with n = 8, where the accuracy after collision avoidance was obtained by comparing X and Z. As shown in Table 2, by achieving the highest classification accuracy, the proposed method reduced the percentage of collisions occurring in the prediction result, hence limiting the usage of the Hungarian algorithm to the collision cases.

Conclusions
We herein proposed a BDLSTM technique to solve LSAPs. In the proposed method, the LSAP was decomposed into sub-assignment problems for classification. The BDLSTM addressed the series of sub-assignment problems by connecting the information among them, and a collision-avoidance step then produced the final assignment. Simulation results indicated that the proposed method achieved a higher accuracy than the FNN and CNN while requiring less memory for trainable parameters. The proposed method thus exhibits great potential for solving LSAPs, as the model scales better with the size of the problem than conventional DNN structures.