Online Facility Location in Evolving Metrics

The Dynamic Facility Location problem is a generalization of the classic Facility Location problem, in which the distance metric between clients and facilities changes over time. Such metrics that develop as a function of time are usually called “evolving metrics”, thus Dynamic Facility Location can be alternatively interpreted as a Facility Location problem in evolving metrics. The objective in this time-dependent variant is to balance the trade-off between optimizing the classic objective function and the stability of the solution, which is modeled by charging a switching cost when a client’s assignment changes from one facility to another. In this paper, we study the online variant of Dynamic Facility Location. We present a randomized O(log m + log n)-competitive algorithm, where m is the number of facilities and n is the number of clients. In the first step, our algorithm produces a fractional solution, in each timestep, to the objective of Dynamic Facility Location involving a regularization function. This step is an adaptation of the generic algorithm proposed by Buchbinder et al. in their work “Competitive Analysis via Regularization.” Then, our algorithm rounds the fractional solution of this timestep to an integral one with the use of exponential clocks. We complement our result by proving a lower bound of Ω(m) for deterministic algorithms and a lower bound of Ω(log m) for randomized algorithms. To the best of our knowledge, these are the first results for the online variant of the Dynamic Facility Location problem.


Introduction
The Facility Location problem is an extensively studied combinatorial optimization problem, which has many practical applications. In this problem, we are given a single metric on a set of clients and facilities, where each facility is associated with an opening cost. The goal is to find a set of facility locations that minimize the opening cost for all facilities plus the connection cost for all clients, where a client's connection cost is its distance to the nearest facility.
In many natural location and network design settings, client locations are not known in advance. Motivated by this fact, Ref. [1] introduced online facility location problems, where clients arrive one-by-one and must be irrevocably assigned to a facility upon arrival. In practical settings related to online data clustering, new data points arrive and the decision of clustering some data points together should not be regarded as irrevocable [2].
Understanding the dynamics of temporally evolving social or infrastructure networks has recently been a central question in many applied areas. Eisenstat et al. [3] introduced the Dynamic Facility Location problem, which models the temporal aspects of such networks. In this time-dependent variant of the Facility Location problem, clients or facilities may change their location over time, and the goal is to achieve the best trade-off between the optimal connections of clients to facilities and the stability of solutions between consecutive timesteps. The temporal aspect of the Dynamic Facility Location problem is modeled by T metrics given on the same set of clients and facilities, one for each timestep t ∈ {1, . . . , T}. This is the key difference from the classic Facility Location problem, which is solved in a fixed metric space M that does not change over time. Therefore, Dynamic Facility Location is a Facility Location problem in an evolving metric $M_1, \ldots, M_T$, where $M_t$ is the metric in force at timestep t ∈ {1, . . . , T}.
In this paper, we study the online variant of the Dynamic Facility Location problem, denoted as ODFL, where the metrics on clients and facilities are revealed one by one at each round. The online algorithm must make its decision before the metric of the next round is revealed and without knowing the total number of rounds. This is the key difference between ODFL and the offline variant of Dynamic Facility Location in [3], where the T metrics are known beforehand. Therefore, ODFL attempts to capture realistic settings, where the input data are revealed piece-by-piece and the algorithm must make its decision before upcoming input pieces are revealed. The online aspect of the problem poses new challenges and precludes many algorithmic techniques used to deal with offline problems. Next, we give a formal definition of ODFL.
Model. In Online Dynamic Facility Location, we are given a set of facilities F, |F| = m, a set of clients C, |C| = n, a switching cost g, and a facility opening cost f. At each round t ∈ {1, . . . , T}, a new metric between clients and facilities is revealed in the form of an m × n dimensional vector $d_t$, whose entries are the distances over F × C. We denote by $d_t(i, j)$ the distance between client j and facility i at time t. At each round t, the goal is to find a subset $A_t \subseteq F$ of open facilities and an assignment $\varphi_t : C \to A_t$ of clients to open facilities so as to minimize the objective:
\[ \sum_{t=1}^{T} \Big( f\,|A_t| + \sum_{j \in C} d_t(\varphi_t(j), j) + g \sum_{j \in C} \mathbb{1}\{\varphi_t(j) \neq \varphi_{t-1}(j)\} \Big), \]
where 1{p} is the indicator function of proposition p. The assignment $\varphi_t$ of round t is chosen without knowing the distance vectors $d_{t+1}, \ldots, d_T$ of upcoming rounds. The objective function is the sum of the hourly opening costs for each open facility, plus the connection costs of each client, plus the switching costs (g per change of facility per client). We note that any solution pays switching cost gn at round t = 1, since it switches to an initial assignment of the clients to facilities.
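The objective described above can be evaluated mechanically for any given sequence of assignments. The following is a minimal sketch, not code from the paper: the function name `odfl_cost` and the list-based input layout are illustrative assumptions, and the set of open facilities at each round is assumed to be exactly the set of facilities that serve at least one client.

```python
def odfl_cost(dist, assign, f, g):
    """Total ODFL cost of a sequence of assignments (illustrative helper).

    dist[t][i][j] : distance between facility i and client j at round t
    assign[t][j]  : index of the facility serving client j at round t
    f, g          : per-round facility opening cost and per-change switching cost
    """
    total = 0
    prev = None
    for d_t, phi_t in zip(dist, assign):
        # Assumption: only facilities that serve a client are kept open.
        open_facilities = set(phi_t)
        total += f * len(open_facilities)
        # Connection cost of every client to its assigned facility.
        total += sum(d_t[i][j] for j, i in enumerate(phi_t))
        if prev is None:
            # Round 1: every client switches into its initial assignment.
            total += g * len(phi_t)
        else:
            total += g * sum(1 for a, b in zip(prev, phi_t) if a != b)
        prev = phi_t
    return total


# Two rounds, two facilities, one client; the cheap facility flips between rounds.
dist = [[[1], [5]], [[5], [1]]]
print(odfl_cost(dist, [[0], [1]], f=2, g=3))  # follow the cheap facility
print(odfl_cost(dist, [[0], [0]], f=2, g=3))  # stay put and pay the distance
```

In this toy instance, following the cheap facility pays the switching cost twice, while staying put pays it once and absorbs the larger distance in the second round, which is the trade-off the objective is meant to capture.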
We measure the performance of our online algorithm using the notion of the competitive ratio. Given a request sequence σ, let ALG(σ) denote the cost paid by an online algorithm on σ and let OPT(σ) denote the cost paid on σ by an optimal offline algorithm, which knows σ in advance. The online algorithm is c-competitive if there exists a constant a such that
\[ ALG(\sigma) \le c \cdot OPT(\sigma) + a \]
for all request sequences σ (for a randomized algorithm, ALG(σ) is replaced by its expectation). The factor c is called the competitive ratio.
Related work. The offline and online variants of Facility Location have been studied extensively in the literature. For the offline Facility Location problem, the approximability is Θ(log n) [4] for the non-metric case while, for the metric case, the best lower bound is 1.463 [5], and the best algorithm has an approximation ratio 1.488 [6]. Online Facility Location is known to have a competitive ratio of Θ(log n/ log log n) in the adversarial case for both deterministic and randomized algorithms [7] and a constant competitive ratio if the clients are drawn from a known distribution [8].
The study of Dynamic Facility Location so far concerns the offline case where the changes between distances of clients and facilities are known in advance. Eisenstat et al. [3] showed an upper bound of O(log nT) for the most interesting variant of Dynamic Facility Location with hourly facility costs, where facilities can be closed and are paid for all rounds in which they remain open. This result was later improved by [9], which gave an O(1)-approximation algorithm.
The authors of [10] provided a framework for designing competitive online algorithms using regularization, which is a widely used technique in online learning. They designed an O(log m)-competitive deterministic algorithm for generating a fractional solution that satisfies a time-varying set of constraints, where m is the number of variables. Then, they provided an O(log m log n)-competitive randomized algorithm for the online set cover problem with a service cost, where m is the number of sets and n is the number of elements. The first step of our online algorithm, which provides an O(log m)-competitive fractional solution, where m is the number of facilities, is inspired by the approach of [10], and a large part of the analysis follows their proof. In the second step of our online algorithm, we show a rounding scheme that works favorably with the fractional solution to obtain a non-trivial additive competitive ratio of O(log m + log n), where n is the number of clients. To the best of our knowledge, this is the first upper bound for ODFL.

Results
In this work, we study the competitive ratio of ODFL. We start by proving lower bounds for deterministic and randomized algorithms for ODFL. Our first result is the following.
Theorem 1. The competitive ratio of any deterministic online algorithm for the Online Dynamic Facility Location problem is Ω(m), and the competitive ratio of any randomized online algorithm against the oblivious adversary is Ω(log m), where m is the number of facilities.
Our second result is a randomized algorithm, which is O(log m + log n)-competitive. In order to achieve this, we express the offline Dynamic Facility Location problem as a linear program P (Figure 1a). Then, we apply the following two algorithms at each round t.

1. Algorithm 1 (Regularization algorithm): It solves a linear program minimizing the objective function of P modified to include a smooth convex regularization term and obtains the fractional solution Sol(t).

2. Algorithm 2 (Rounding algorithm): It rounds the fractional solution Sol(t) of Algorithm 1 to an integral solution using competing exponential clocks.

Algorithm 1 The regularization algorithm
Parameters: ε > 0, η = ln(1 + n/ε).
At each round t: Let $d_t \in \mathbb{R}^{m \times n}_+$ be the distance cost vector and let S be the set of feasible solutions. Solve the regularized linear program (P*) to obtain the fractional solution $(y^t, x^t)$.

Algorithm 1 solves online a linear program to produce a fractional solution at each round t involving the current distance vector $d_t$. This algorithm is essentially the general algorithm presented in [10], which we adapt to ODFL. The performance of the general regularization algorithm is proved by Theorem 1.1 in [10] for the case of time-varying covering constraints. Although we follow the same steps to prove the existence of an O(log m)-competitive fractional solution for ODFL, where m is the number of facilities, we must also address the presence of both covering and precedence constraints in ODFL.
Algorithm 2 is the randomized procedure that rounds the fractional solution provided from Algorithm 1 to an integral solution. Our contribution here is that we use an appropriate rounding which works favorably with Algorithm 1 so as to produce a solution, which is O(log m + log n)-competitive for ODFL. The rounding algorithm makes use of competing exponential clocks, which have been applied in many similar problems like the Dynamic Facility Location problem [9] and the Online Set Cover with a Service Cost problem [10].
Theorem 2. There is a randomized algorithm which is O(log m + log n)-competitive for the Online Dynamic Facility Location problem, where m denotes the number of facility locations and n denotes the number of clients.
Organization. In Section 3, we present lower bounds on the competitive ratio of deterministic and randomized algorithms for ODFL. Then, we present Algorithm 1 in Section 4 and Algorithm 2 in Section 5 and prove their guarantees in the respective sections.
Version February 19, 2021, submitted to Algorithms.
Figure 1. (a) The linear program P for offline Dynamic Facility Location. (b) The dual D of the linear program P.

Lower Bounds
In this section, we prove lower bounds on the competitive ratio of deterministic and randomized algorithms for ODFL. In both cases, the metric space is a star graph with a client lying at the center of the star for all rounds.
The core idea of the proofs is to force the online algorithm to pay the switching cost at each round. By carefully selecting the parameters of ODFL, we can prove that the competitive ratio of any deterministic online algorithm is Ω(m). For the randomized lower bound, we use Yao's principle (see examples in Chapter 8 in [11]). Specifically, we choose a randomized instance such that the expected performance of any deterministic algorithm against the optimal offline algorithm is Ω(log m). By Yao's principle, any randomized algorithm has the same lower bound.
Proof of Theorem 1. Let OPT denote the optimal cost and ALG denote the cost of an online algorithm. The instance consists of a star graph with m edges, and the number of rounds is T = m. Facilities can only be opened at the leaves (a total of m leaves), and there is one client (n = 1) sitting at the center of the star for all rounds. The distance of every leaf j to the center is initially $d_j = d$. The adversary follows a simple strategy at each round 1 ≤ t ≤ T − 1: for every leaf j such that the online algorithm connects the client to the facility in j, $d_j$ becomes arbitrarily large. At round T, the distances remain the same as in the previous round.
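The adversarial instance above can be simulated directly. The sketch below is illustrative only: the function name, the `huge` stand-in for "arbitrarily large", and the particular online rule (stay while the current leaf is still at distance d, otherwise move to a cheapest leaf) are assumptions made for the example, not part of the paper's proof.

```python
def simulate_star_adversary(m, d, f, g, huge=10**12):
    """Play a lazy online rule against the adaptive adversary on the star.

    Returns (alg_cost, opt_cost) for T = m rounds and a single client.
    """
    T = m
    dist = [d] * m          # current distance of every leaf to the center
    alg_cost = 0
    prev = None             # leaf used by the online rule in the previous round
    for t in range(1, T + 1):
        # Online rule (assumed): keep the current leaf if it is still cheap,
        # otherwise move to a leaf with minimum current distance.
        if prev is not None and dist[prev] == d:
            leaf = prev
        else:
            leaf = dist.index(min(dist))
        alg_cost += f + dist[leaf] + (g if leaf != prev else 0)
        # Adversary: in rounds 1..T-1 the leaf in use becomes very far away.
        if t <= T - 1:
            dist[leaf] = huge
        prev = leaf
    # Exactly one leaf is never used before round T; the offline optimum
    # stays on it for all rounds and pays the switching cost once.
    opt_cost = g + T * f + T * d
    return alg_cost, opt_cost


alg, opt = simulate_star_adversary(m=5, d=1, f=1, g=10)
print(alg, opt, alg / opt)
```

With g at least T(d + f), as in this usage example, the ratio is at least T/2, which matches the linear dependence on m that the deterministic bound needs.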
Observe that exactly one leaf remains at distance d from the center of the star for all rounds. The optimal offline solution simply opens a facility at this leaf and connects the client to it for all rounds, thus paying g + T f + T d. On the other hand, any competitive online algorithm will prefer to open a new facility at distance d and connect the client to this facility at the start of each round instead of paying the large distance. Therefore, the cost incurred by any online algorithm is at least T g + T f + T d. By setting g ≫ d = f (for instance, g ≥ T(d + f)), we have that
\[ \frac{ALG}{OPT} \ge \frac{Tg + Tf + Td}{g + Tf + Td} \ge \frac{Tg}{2g} = \frac{T}{2} = \Omega(m). \]
Turning to the randomized case, the instance consists of the same metric as in the deterministic case; the only difference is that we use randomized adversarial requests. By showing that any deterministic algorithm has an expected competitive ratio of Ω(log m) on this random instance and applying Yao's principle, we obtain the lower bound for randomized algorithms. At each round 1 ≤ t ≤ T − 1, the adversary chooses uniformly at random an edge e that still has length d (i.e., has not yet become arbitrarily large) and makes its length arbitrarily large. At round T, where only one leaf has distance d, the distances remain the same as in the previous round. Again, the optimal solution uses the leaf at distance d in all rounds and pays g + T f + T d. However, the expected switching cost of any competitive algorithm is
\[ \sum_{t=1}^{T} \Pr[\text{switches at round } t] \cdot g = g + \sum_{t=1}^{T-1} \frac{g}{m - t + 1} = g\,H_m, \]
where $H_m = \sum_{k=1}^{m} 1/k = \Theta(\log m)$ is the m-th harmonic number, since at each round t the edge that the algorithm uses becomes arbitrarily large with probability 1/(m − t + 1). By setting g > T d = T f, we have OPT < 3g, and therefore
\[ \frac{\mathbb{E}[ALG]}{OPT} \ge \frac{g\,H_m}{3g} = \Omega(\log m). \]
This concludes the section on lower bounds. In the following sections, we present a randomized algorithm for the ODFL problem with a nearly matching bound of O(log m + log n).
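The expected number of switches in the randomized instance can be computed exactly. The sketch below assumes an algorithm that always sits on a leaf still at distance d and switches only when its leaf is hit, which happens with probability 1/(m − t + 1) at round t as stated above; the function names are illustrative.

```python
from fractions import Fraction


def harmonic(m):
    """m-th harmonic number H_m as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, m + 1))


def expected_switch_count(m):
    """Expected number of switches over T = m rounds on the random star instance."""
    T = m
    first_round = Fraction(1)  # the initial assignment always counts as a switch
    # The leaf in use is hit at round t (1 <= t <= T-1) with probability
    # 1 / (m - t + 1), which forces a switch at the next round.
    forced = sum(Fraction(1, m - t + 1) for t in range(1, T))
    return first_round + forced


for m in (2, 4, 16, 64):
    print(m, float(expected_switch_count(m)), float(harmonic(m)))
```

The two printed columns coincide, so multiplying by g gives an expected switching cost of g times the m-th harmonic number, which grows logarithmically in m.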

The Regularization Algorithm
In this section, we show that the regularization algorithm of [10] can be applied to ODFL and that it produces a fractional solution at each round, which is O(log m)-competitive, where m is the number of facility locations. We will prove this guarantee as Theorem 3.
Before proceeding to the details of Algorithm 1, we first express the offline Dynamic Facility Location as a linear program, denoted as P (Figure 1a). Algorithm 1 will solve a linear program P* at each round, which will be constructed from P combined with a regularization function. Finally, we will show that the fractional solution of P* is O(log m)-competitive with respect to the solution of the dual program D (Figure 1b) of P, which serves as a lower bound on the optimal offline solution.
Now, we express offline Dynamic Facility Location as an integer program, which will be relaxed to obtain the linear program P. Recall that T, n, m are the number of rounds, clients, and facility locations, respectively:
\[ \min \; \sum_{t=1}^{T} \sum_{i=1}^{m} f\, y^t_i \;+\; \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{j=1}^{n} d_t(i, j)\, x^t_{ij} \;+\; g \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{j=1}^{n} z^t_{ij} \]
\[ \text{subject to} \quad x^t_{ij} \le y^t_i, \qquad \sum_{i=1}^{m} x^t_{ij} \ge 1, \qquad z^t_{ij} \ge x^t_{ij} - x^{t-1}_{ij}, \qquad y^t_i, x^t_{ij}, z^t_{ij} \in \{0, 1\}. \]
The first term of the objective function is the total facility opening cost, where f is the cost to open a facility. The second term is the total connection cost, where $d_t(i, j)$ is the distance between facility i and client j in round t, and the third term is the total switching cost, where each change of a client's connection to a facility costs g.
We use the decision variables $y^t_i$, $x^t_{ij}$, and $z^t_{ij}$, where i ∈ [m], j ∈ [n], t ∈ [T]: $y^t_i = 1$ if facility i is open at round t and $y^t_i = 0$ otherwise; $x^t_{ij} = 1$ if client j is connected to facility i at round t and $x^t_{ij} = 0$ otherwise; $z^t_{ij} = 1$ if client j is connected to facility i at round t but was not connected to the same facility i at round t − 1, and $z^t_{ij} = 0$ otherwise. The value of the variable $z^t_{ij}$ is imposed by the third constraint, which expresses the switching cost. The first constraint ($x^t_{ij} \le y^t_i$) ensures that, whenever a client j is connected to a facility i, the facility i is open. The second constraint ($\sum_{i=1}^{m} x^t_{ij} \ge 1$) guarantees that every client is connected to a facility. Finally, relaxing the decision variables to take non-negative real values, we obtain the LP of Figure 1a, denoted as P.
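To make the roles of the three groups of variables concrete, the sketch below converts an integral assignment into indicator variables and checks the three constraints described above. The helper names and the convention that all connection variables are zero before round 1 (so that the initial assignment pays g per client) are assumptions made for the example.

```python
def indicator_variables(assign, m):
    """Build (y, x, z) from assign[t][j] = facility serving client j at round t."""
    T, n = len(assign), len(assign[0])
    y = [[0] * m for _ in range(T)]
    x = [[[0] * n for _ in range(m)] for _ in range(T)]
    z = [[[0] * n for _ in range(m)] for _ in range(T)]
    for t in range(T):
        for j, i in enumerate(assign[t]):
            x[t][i][j] = 1
            y[t][i] = 1
        for i in range(m):
            for j in range(n):
                prev = x[t - 1][i][j] if t > 0 else 0   # assumed x^0 = 0
                z[t][i][j] = max(0, x[t][i][j] - prev)  # smallest feasible z
    return y, x, z


def satisfies_constraints(y, x, z):
    """Check x <= y, sum_i x >= 1 and z >= x^t - x^(t-1) for all indices."""
    T, m, n = len(x), len(x[0]), len(x[0][0])
    for t in range(T):
        for j in range(n):
            if sum(x[t][i][j] for i in range(m)) < 1:
                return False
            for i in range(m):
                prev = x[t - 1][i][j] if t > 0 else 0
                if x[t][i][j] > y[t][i] or z[t][i][j] < x[t][i][j] - prev:
                    return False
    return True


def program_objective(y, x, z, dist, f, g):
    """Facility + connection + switching cost; dist[t][i][j] as in the text."""
    T, m, n = len(x), len(x[0]), len(x[0][0])
    facility = f * sum(y[t][i] for t in range(T) for i in range(m))
    cells = [(t, i, j) for t in range(T) for i in range(m) for j in range(n)]
    connection = sum(dist[t][i][j] * x[t][i][j] for t, i, j in cells)
    switching = g * sum(z[t][i][j] for t, i, j in cells)
    return facility + connection + switching


assign = [[0, 1], [1, 1]]                       # two rounds, two clients
dist = [[[1, 4], [3, 2]], [[5, 5], [1, 1]]]     # dist[t][i][j]
y, x, z = indicator_variables(assign, m=2)
print(satisfies_constraints(y, x, z), program_objective(y, x, z, dist, f=2, g=3))
```

Setting each z variable to the positive part of the change in x is the cheapest choice that satisfies the third constraint, which is why that constraint effectively determines z.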
Next, we are ready to present Algorithm 1. At each round t, the algorithm is given a distance vector $d_t \in \mathbb{R}^{m \times n}_+$ containing the distances between clients and facilities. Then, Algorithm 1 finds the minimizer $(y^t, x^t)$ of the linear program P* at each round t, which has two differences from P. The first is that the last term of the objective function in P (the switching cost) is replaced by the regularization function in P*. The second is that the constraint related to the switching cost ($z^t_{ij} \ge x^t_{ij} - x^{t-1}_{ij}$) in P is omitted in P*. We note that the regularized objective function includes both the previous solution and the current cost vector. Thus, the solution in each round is determined greedily and independently of rounds prior to t − 1.
To analyze the performance of Algorithm 1, we need a lower bound on the optimal offline solution. Therefore, we derive the dual D of P (Figure 1b), which has the following variables, each corresponding to a group of primal constraints: $e^t_{ij}$ for the precedence constraints $x^t_{ij} \le y^t_i$, $a^t_j$ for the covering constraints $\sum_{i} x^t_{ij} \ge 1$, and $b^t_{ij}$ for the switching constraints $z^t_{ij} \ge x^t_{ij} - x^{t-1}_{ij}$.
We will prove Theorem 3 by showing that the set of dual variables of the solutions that P* returns is a feasible solution for D within a factor of $(1 + (1 + \epsilon) \ln(1 + m/\epsilon))$ of the optimal offline solution, where ε is a small constant. Specifically, we will use the KKT optimality conditions of P* (the regularized LP) in each round. The constraints define dual variables, which will be plugged into the formulation of the dual D in Figure 1b. This way, we will construct a dual solution to the original online problem, which will serve as a lower bound on the optimal offline solution. Ref. [10] mentions that their technique can be generalized to facility location problems, without providing any further technical details. In the next lemmas, we verify their claim by adjusting their approach and proof techniques to ODFL. Recall that the constraint $z^t_{ij} \ge x^t_{ij} - x^{t-1}_{ij}$ is omitted in P*. In order to define a feasible solution for D, we introduce the variable $b^t_{ij}$ corresponding to this constraint, and we let $e^*_{ij}$, $a^*_j$ be the optimal dual variables of D* corresponding to the precedence and covering constraints, respectively.
Lemma 1. For each round t, let $(a^{*,t}, e^{*,t})$ be an optimal solution of the dual LP D* of P* that satisfies the KKT conditions. Together with an appropriate choice of $b^t_{ij}$, these solutions constitute a feasible solution for D.
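Since Figure 1 is not reproduced here, the following is a hedged reconstruction of the primal-dual pair, derived only from the constraints and dual variables described in the text. The grouping and ordering of the dual constraints, and the boundary convention $b^{T+1}_{ij} = 0$, are assumptions and may differ from the original figure.

```latex
% Primal P (relaxation of the integer program), with the dual variable of
% each constraint group shown on the right.
\begin{align*}
\min \quad & \sum_{t}\sum_{i} f\,y^t_i + \sum_{t}\sum_{i,j} d_t(i,j)\,x^t_{ij}
             + g \sum_{t}\sum_{i,j} z^t_{ij} \\
\text{s.t.} \quad
  & y^t_i - x^t_{ij} \ge 0                    && (e^t_{ij}) \\
  & \textstyle\sum_{i} x^t_{ij} \ge 1         && (a^t_j) \\
  & z^t_{ij} - x^t_{ij} + x^{t-1}_{ij} \ge 0  && (b^t_{ij}) \\
  & y^t_i,\; x^t_{ij},\; z^t_{ij} \ge 0 .
\end{align*}

% Dual D: one constraint per primal variable (y, x, z, in that order).
\begin{align*}
\max \quad & \sum_{t}\sum_{j} a^t_j \\
\text{s.t.} \quad
  & \textstyle\sum_{j} e^t_{ij} \le f                               && \forall\, t, i \\
  & a^t_j - e^t_{ij} - b^t_{ij} + b^{t+1}_{ij} \le d_t(i,j)         && \forall\, t, i, j \\
  & b^t_{ij} \le g                                                  && \forall\, t, i, j \\
  & a^t_j,\; e^t_{ij},\; b^t_{ij} \ge 0 .
\end{align*}
```

The variable $x^t_{ij}$ appears with coefficient −1 in the switching constraint of round t and with coefficient +1 in that of round t + 1, which is why both $b^t_{ij}$ and $b^{t+1}_{ij}$ show up in the second dual constraint. Only the covering constraints have a non-zero right-hand side, so the dual objective involves the variables $a^t_j$ alone.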
Proof. Let $x^{*,t}$ be the optimal solution of P* at round t. Set the variables of D at time t to be $a^t_j = a^{*,t}_j$, $e^t_{ij} = e^{*,t}_{ij}$, and
\[ b^{t+1}_{ij} = \frac{g}{\eta} \ln\left( \frac{1 + \epsilon/n}{x^{*,t}_{ij} + \epsilon/n} \right). \]
To prove that the solution above is feasible for D, we show that it satisfies the constraints of D one by one, using the KKT conditions (1)-(4) that hold for P* and its dual. The first group of constraints of the dual D (Figure 1b) ($\sum_{j=1}^{n} e^t_{ij} \le f$) follows easily from KKT condition (3). The same holds for the last two groups of constraints ($e^t_{ij} \ge 0$ and $a^t_j \ge 0$) due to KKT conditions (1) and (2). Furthermore, (4) and the construction of $b^t_{ij}$ yield the inequalities required by the second, third, and fourth groups of constraints of D, thus completing the proof of the lemma.
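A quick numerical sanity check of the dual switching variable is sketched below. It assumes that the expression in the proof reads b = (g/η) · ln((1 + ε/n)/(x + ε/n)) with η = ln(1 + n/ε); this reading is a reconstruction, and the function name is illustrative. Under that assumption, b stays between 0 and g for every x in [0, 1], which is what feasibility of a switching dual variable would require.

```python
import math


def dual_switch_variable(x, g, n, eps):
    """Assumed form of b^{t+1}_{ij} for a fractional connection value x in [0, 1]."""
    eta = math.log(1 + n / eps)
    return (g / eta) * math.log((1 + eps / n) / (x + eps / n))


g, n, eps = 3.0, 10, 0.5
for x in (0.0, 0.1, 0.5, 1.0):
    print(x, dual_switch_variable(x, g, n, eps))
```

The value equals g at x = 0, because the logarithm then equals η, decreases as x grows, and vanishes at x = 1.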
We are now ready to prove Theorem 3 by showing that the dual solution we constructed can pay for the facility, connection, and switching cost of Algorithm 1. Since we bound the facility cost and the connection cost together, we will simply refer to them as the service cost. Throughout the proofs, we will use relations (5)-(8), which are the KKT conditions of P* and its dual, together with two standard logarithmic inequalities. Theorem 3 will follow from the next two lemmas: Lemma 2 bounds the switching cost of Algorithm 1, and Lemma 3 bounds its service cost. The analysis is similar to that of Theorem 1.1 in [10], adapted to the objective of ODFL and to the presence of precedence constraints.
Proof of Lemma 2. Let $M_t$ be the switching cost of Algorithm 1 at round t. The summation that bounds $M_t$ is taken over increasing values of connection variables, i.e., $x^{*,t}_{ij} > x^{*,t-1}_{ij}$, since decreasing values only decrease the fractional switching cost. Bounding this summation with the help of (5) and summing over all rounds concludes the proof of the lemma.

Lemma 3. The total service cost S of Algorithm 1 is at most the cost of the dual feasible solution of Lemma 1.
Proof. The bound follows from the KKT conditions (6) and (7). Notice that the two bracketed terms on the right-hand side of the resulting inequality cancel each other out, which yields the claimed bound.
We can now easily prove the performance of Algorithm 1 stated in Theorem 3.

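Proof of Theorem 3.
Let OPT(D) and OPT(P) denote the values of the optimal solutions of D and P, respectively. By Lemmas 2 and 3, the total cost of Algorithm 1 is at most a logarithmic factor times $\sum_{t} \sum_{j} a^{*,t}_j$, the value of the dual solution constructed in Lemma 1. By Lemma 1, this solution is feasible for D, so its value is at most OPT(D), which in turn is at most OPT(P) by weak duality. The proof of Theorem 3 concludes this section.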

The Rounding Algorithm
In this section, we present Algorithm 2, which makes use of the exponential distribution to round the fractional solution to an integral solution at each round. The analysis shows that the cost of the rounded solution exceeds the cost of the fractional solution by at most a factor logarithmic in n for the facility cost and by at most constant factors for the switching and connection costs. Before proceeding to the details of Algorithm 2, we give the definition of an exponential random variable and some of its properties.
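The definition itself did not survive in this version of the text. For reference, the standard facts about exponential random variables that the rest of the section relies on are collected below; presenting them in this form is an assumption about what Definition 1 contains.

```latex
% Exponential random variable with rate \lambda > 0.
\[
  Z \sim \mathrm{Exp}(\lambda)
  \quad\Longleftrightarrow\quad
  \Pr[Z > z] = e^{-\lambda z} \ \text{ for all } z \ge 0 .
\]
% (i) Scaling: dividing by c > 0 multiplies the rate by c.
\[
  Z \sim \mathrm{Exp}(\lambda) \ \Longrightarrow\ \frac{Z}{c} \sim \mathrm{Exp}(c\,\lambda).
\]
% (ii) Minimum of independent exponentials Z_1, ..., Z_k with rates \lambda_1, ..., \lambda_k.
\[
  \min_{1 \le \ell \le k} Z_\ell \sim \mathrm{Exp}\Big(\sum_{\ell=1}^{k} \lambda_\ell\Big).
\]
% (iii) Probability that a given variable attains the minimum.
\[
  \Pr\Big[\, Z_\ell = \min_{1 \le \ell' \le k} Z_{\ell'} \Big]
  = \frac{\lambda_\ell}{\sum_{\ell'=1}^{k} \lambda_{\ell'}} .
\]
```

In particular, by (i) with rate 1, the ratio $Z_{ij}/x^t_{ij}$ is exponential with rate $x^t_{ij}$, and by (iii) a connection wins the competition for client j with probability proportional to its fractional value.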
At the beginning of its execution, the rounding algorithm independently samples a total of n · m random variables $Z_{ij}$ (one for each client-facility connection) from the exponential distribution with rate λ = 1; these variables are used throughout all rounds. Then, at each round t, it chooses for each client j the connection {i, j} minimizing the ratio $Z_{ij}/x^t_{ij}$, where $x^t_{ij}$ is the fractional variable of this connection obtained by Algorithm 1. Notice that, by the properties of Definition 1, the ratio $Z_{ij}/x^t_{ij}$ is also an exponential random variable. This technique is referred to as competing exponential clocks, since a random variable wins the competition if it has the smallest value among all others (in our case, if it minimizes the ratio $Z_{ij}/x^t_{ij}$).
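The rounding step described above can be sketched in a few lines. The function names and the list-based layout `x_t[i][j]` are illustrative assumptions; connections with fractional value zero are skipped, since their ratio would be infinite.

```python
import random


def sample_clocks(m, n, rng):
    """Draw one Exp(1) clock Z[i][j] per facility-client pair, once for all rounds."""
    return [[rng.expovariate(1.0) for _ in range(n)] for _ in range(m)]


def round_with_clocks(x_t, clocks):
    """For each client j, pick the facility i minimizing Z[i][j] / x_t[i][j]."""
    m, n = len(x_t), len(x_t[0])
    assignment = []
    for j in range(n):
        # Feasibility (sum_i x_t[i][j] >= 1) guarantees a non-empty candidate set.
        candidates = [i for i in range(m) if x_t[i][j] > 0]
        best = min(candidates, key=lambda i: clocks[i][j] / x_t[i][j])
        assignment.append(best)
    return assignment


rng = random.Random(0)
clocks = sample_clocks(m=3, n=2, rng=rng)
x_round_1 = [[0.5, 0.2], [0.5, 0.8], [0.0, 0.0]]
x_round_2 = [[0.4, 0.2], [0.6, 0.8], [0.0, 0.0]]
print(round_with_clocks(x_round_1, clocks))
print(round_with_clocks(x_round_2, clocks))
```

Reusing the same clocks in every round is what couples consecutive rounds: if the fractional solution does not change, the rounded assignment does not change either, so integral switches can only occur when some fractional variable moves.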
The high-level idea of the analysis is that the connection and switching costs of the rounded solution are within constant factors of the connection and switching costs of the fractional solution at each round t. The reason is that the clocks favor connections to facilities in a way that depends on the increase or decrease of the fractional variables $x^t_{ij}$. This fact, combined with the properties of the exponential distribution, leads to a rounding that selects the connections indicated by the fractional solution. On the other hand, this leads to more open facilities: we prove that the rounding adds a factor logarithmic in n to the facility cost of the fractional solution.
Next, we will analyze the performance of Algorithm 2 by bounding separately the facility, connection, and switching cost. We will simply calculate the probabilities of opening any facility, connecting a client to a facility and changing a connection.
Facility cost. We start with the facility cost of the rounding algorithm, which is O(log n)-competitive with respect to the facility cost of the fractional solution.
Proof. Let $E_{ij}$ denote the event that $i = \arg\min_{i'} Z_{i'j}/x_{i'j}$ for some client j, and let a > 0 be a parameter to be chosen later. Bounding the probability of $E_{ij}$ in terms of a and choosing a = log n gives the result.
Connection cost. Next, we show that the connection cost of the rounding algorithm is O(1)-competitive with respect to the connection cost of the fractional solution. Again, let a > 0 be chosen later.
Proof. Arguments similar to those of the previous proof, together with the inequality $1 - e^{-x} \le x$ for all x and Definition 1, show that the probability of choosing connection ij is at most $a\,x_{ij} + e^{-a} x_{ij}$.
By choosing a to be a constant (for example, a = 1), we have the result.
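The first lines of this calculation are missing from the text. One way the bound could be derived is sketched below; it is a reconstruction under the assumptions that the clocks are independent with rate 1 and that the fractional solution satisfies the covering constraint, and it is not necessarily the authors' exact argument.

```latex
% X_j denotes the total fractional connection value of client j, X_j >= 1.
% E_{ij} is the event that connection ij has the minimum ratio for client j.
\begin{align*}
\Pr[E_{ij}]
  &= \Pr\Big[E_{ij},\ \tfrac{Z_{ij}}{x_{ij}} \le a\Big]
   + \Pr\Big[E_{ij},\ \tfrac{Z_{ij}}{x_{ij}} > a\Big] \\
  &\le \Pr[Z_{ij} \le a\,x_{ij}]
   + \Pr\Big[E_{ij},\ \min_{i'} \tfrac{Z_{i'j}}{x_{i'j}} > a\Big] \\
  &= \big(1 - e^{-a x_{ij}}\big) + \frac{x_{ij}}{X_j}\, e^{-a X_j}
     && \Big(X_j = \sum_{i'} x_{i'j} \ge 1\Big) \\
  &\le a\,x_{ij} + e^{-a}\,x_{ij} .
\end{align*}
% With a = 1 the expected connection cost of client j is at most
% (1 + 1/e) times its fractional connection cost:
\[
  \sum_{i} d_t(i,j)\,\Pr[E_{ij}] \;\le\; \Big(1 + \frac{1}{e}\Big) \sum_{i} d_t(i,j)\,x_{ij} .
\]
```

The third line uses the facts that the minimum of independent exponentials with rates $x_{i'j}$ is exponential with rate $X_j$ and that, for independent exponentials, the index attaining the minimum is independent of the minimum value.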
Switching cost. Finally, we show that every step that incurs a fractional switching cost of d in a connection variable $x_{ij}$ incurs an expected increase of at most d/(d + 1) in the randomized solution. Thus, the expected number of new connections is O(1).
Proof. We break down the total movement from time t − 1 to t in the fractional solution into m × n intermediate steps, on each of which only the value of exactly one x ij is changed. We take first all the x ij 's whose value increases and then all the x ij 's whose value decreases, thus managing to preserve a feasible solution in all the intermediate steps. This way, the total switching cost from time t − 1 to time t of the fractional solution does not change while the integral switching cost could only increase due to possible changes in the intermediate steps.
First, we prove the bound in the case where the connection variable decreases, i.e., $x^t_{ij} = x^{t-1}_{ij} - d$. When $x_{ij}$ decreases, the ratio $Z_{ij}/x_{ij}$ increases, so another connection that had not been chosen in the previous time step could become minimal. This is the only case in which a switching cost is incurred. The bound on the probability of this event is maximized when $x^{t-1}_{ij} - d = 0$ and λ = 1, and the probability is therefore at most d/(d + 1).
Now, we turn to the case where $x^t_{ij} = x^{t-1}_{ij} + d$. When $x_{ij}$ increases, the ratio $Z_{ij}/x_{ij}$ decreases. Therefore, if facility i was chosen in the previous step, it will be chosen again in this step, thus incurring no switching cost. If facility i was not chosen in the previous step, it will be chosen in this step with probability no more than d/(d + 1), following exactly the same analysis as in the case of decreasing connection variables.
Finally, it is easy to provide the proof of Theorem 2, which concludes this section.