Article

Stochastic Optimal Control for Online Seller under Reputational Mechanisms

1
Mathematics of Networks and Systems, Bell Labs, 600 Mountain Avenue, Murray Hill, NJ 07974, USA
2
Department of Mathematics, Kettering University, Flint, MI 48504, USA
3
Department of Statistics and Probability and Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
*
Author to whom correspondence should be addressed.
Risks 2015, 3(4), 553-572; https://doi.org/10.3390/risks3040553
Submission received: 28 August 2015 / Accepted: 25 November 2015 / Published: 4 December 2015
(This article belongs to the Special Issue Recent Advances in Mathematical Modeling of the Financial Markets)

Abstract

In this work we propose and analyze a model which addresses the pulsing behavior of sellers in an online auction (store). This pulsing behavior is observed when sellers switch between advertising and processing states. We assert that a seller switches her state in order to maximize her profit, and further that this switch can be identified through the seller’s reputation. We show that for each seller there is an optimal reputation, i.e., the reputation at which the seller should switch her state in order to maximize her total profit. We design a stochastic behavioral model for an online seller, which incorporates the dynamics of resource allocation and reputation. The reputation dynamics follow the stochastic advertising model from [1], which has been used effectively in the stochastic optimal control of advertising [2]. This model of reputation is combined with the effect of online reputation on sales price, which was empirically verified in [3]. We derive the Hamilton-Jacobi-Bellman (HJB) differential equation, whose solution relates optimal wealth level to a seller’s reputation. We formulate both a full model and a reduced model with fewer parameters, both of which have the same qualitative description of the optimal seller behavior. Notably, the reduced model has a closed form analytical solution, which we construct.

1. Introduction

In this work, we consider sellers in an online market like Amazon without a bidding mechanism. The setting may include concurrent online auctions on a site like eBay, for a product or service with only a “buy-it-now” option, under the assumption that buyers can compare reputation among online platforms. In this online environment, the buyer and seller have a certain amount of anonymity, which has implications for the fairness of the auction [4]. One of the few countermeasures available to ensure fairness is the use of feedback forums at online auction sites, such as those of Amazon, eBay, and Yahoo!. The feedback provided by customers yields a ranking system for the sellers, which in turn rewards fair sellers and penalizes unfair or unreliable sellers [5,6,7]. Since it is in the seller's best interest to maximize her profit, it follows that a seller will seek to use reputation as a means to optimize her decision making [8,9]; e.g., for the optimal bidding strategies in sequential auctions, see [10].
Online auction sellers can be broadly categorized by two types of behavior: those focused on advertising and customer service, which includes following up with past customers; and those who instead focus only on processing existing orders. While the latter behavior will likely lower the seller reputation on average, this immediate increase in wealth may offset the long-term damage from a lower reputation. With this in mind, it is compelling to ask whether there exists an optimal long term strategy, in which a seller attempts to maximize her profit by achieving a fixed reputation.
We propose a simple behavioral model to study the relationship between a seller’s wealth and reputation in the auction environment with only the “buy-it-now” option. Following the standard Nerlove-Arrow construction [11], which was extended to a stochastic setting by Sethi [1], the wealth and growth of the seller are described by continuous stochastic processes. Note that our model is a continuous approximation of a discrete auctioning process. The discrete model of the process may be less tractable but is still very interesting and useful. We propose an optimal infinite horizon strategy, and incorporate the wealth-growth model of Mink and Seifert [3], based on empirical studies of eBay sellers. Also, we use the model for reputation proposed by Sethi [1] and Rao [12], which has been adopted by many others in optimal advertising models, such as Raman’s recent work in Boundary Value Problems in the Stochastic Optimal Control of Advertising [2]. This work incorporates the idea that humans do not multitask well, but rather switch between tasks; see [13]. In other words, switching costs do not factor in for an individual agent, but only for large online retailers who have marketing department reorganization costs. Additionally, the model in this work allows all sellers to have a little extra capacity/money for either shipping or advertising (e.g., writing a letter or a similar individual campaign), which is rather small and not unlimited.
The rest of the paper is laid out as follows. In Section 2 we define the stochastic Nerlove-Arrow model, along with the growth rate due to Mink and Seifert. Using the principle of optimality, we derive the corresponding Hamilton-Jacobi-Bellman (HJB) boundary value problem satisfied by the value function. In Section 3, we propose several important modifications to the existing methodology, and in doing so derive a new model which is normalized with respect to the seller reputation, and has a linear growth rate. In making these modifications, we also find a reduced version of our new model, which is governed by a geometric Brownian motion. We prove that both models admit unique piecewise continuous solutions, and are qualitatively similar. However, the reduced model, surprisingly, has a closed form piecewise defined solution. This allows us to prove that pulsing behavior is an optimal strategy for sellers, and also serves as a benchmark for our numerical solution of the full model in Section 4, which does not have a closed form analytic solution. Our results are compelling, and warrant further investigation for the finite horizon case, which we discuss with concluding remarks in Section 5.

2. Explicit Resource Allocation Mechanism

Consider a seller in an online marketplace as a triple (W, R, μ) representing her wealth, reputation, and excess rate, respectively. Here, the excess rate is extra effort that can be allocated to either expedite payment or increase her reputation. In this setting, the reputation R is a positive number reflecting customer satisfaction. As an initial approach, we consider only the effects of reputation on the optimal way to grow her wealth, and leave the more general model with shipping costs for future work. Also, in our mathematical formulation, we consider the reputation mechanism first suggested by the work of Nerlove and Arrow [11] and later generalized to stochastic settings in [1,2,12].
The seller is one of many competing in the online marketplace, and has her transactions verified and processed automatically via the marketplace. All sellers competing in this marketplace have an expected speed for mailing out completed orders to be able to sell in the marketplace, with some time allowed for delayed shipping before the marketplace intervenes. In return, the marketplace guarantees the transaction. Payment is modeled to be released to the seller once the purchased item is mailed to the buyer.
However, the seller can choose to go beyond these expected standards. For example, she may prepare packages and drive to the post office earlier than expected to expedite the payment from the marketplace. Or she may spend time and money communicating with buyers in a bid to increase their favorable ratings. This may include sending a small unexpected gift in the package to be mailed out or spending extra time to individually craft and send emails out to purchasers. Our model assumes that at most only one of these extra actions can happen with noticeable return at any given time. The seller may also decide to do nothing extra, and in this case μ = 0.
For example, if she is focused on getting her payment released, then she is not able to focus on creative ways to engage her buyers post sale. We also assume that her extra capacity to do either extra action is limited by absolute minimum mailing rates and her pool of resources she is willing to contribute to going beyond expected standards. By choosing to shift up her resources from promotion to expedited mailing and back, the seller can influence her wealth and reputation levels via the excess rate μ. Positive μ corresponds to a promotional state, while negative μ corresponds to an expedited mailing state.
We note that if our seller is in fact a large company with many such agents that can work on either processing or sales, then the company can choose to shift the ratio of workers who work in processing or sales, and this model is the subject of future work. The work we present here is limited to a one agent seller. We note here that our model also assumes no cost for the individual to switch behavior from promotion to expedited mailing. In a large company, one would expect such costs to arise in restricting the size of advertising and shipping departments. One related paper that addresses such switching costs, albeit in a real options investment setting, is the paper by Duckworth et al. [14].
We utilize a continuous time formulation for wealth and reputation, as well as control of these processes. The seller will be determined to behave optimally if she maximizes her expected present value of total earnings, until her reputational value reaches R = 0. This produces a fully consistent formulation of both the value function V, and the strategy to achieve this optimal value.

2.1. Mathematical Formulation

Following the framework in [15,16], reputation $R$ is observed in $O := (0, \infty)$ and the control $\mu$ is constrained to $U := [-\epsilon, \epsilon]$. Since this process will be stochastic, we introduce a probability space $(\Omega, \mathcal{F}, P)$ and corresponding reference probability systems $\nu = (\Omega, \{\mathcal{F}_s\}, P, B)$, where $\mathcal{F}_s \subset \mathcal{F}$ and $B$ is an $\mathcal{F}_s$-adapted Brownian motion. The evolution of wealth and reputation is hence modeled by a controlled two-state Markov process $(W_\cdot^{\mu_\cdot}, R_\cdot^{\mu_\cdot})$, where wealth $W$ grows at a rate proportional to a function $h(R)$ that links reputation to revenue per sale, subject to a controlled extra processing rate $\mu$ per unit time chosen from the admissible class $\mathcal{A}_\nu$,
$$
\mathcal{A}_\nu := \left\{ \mu \in \mathcal{F}_s \;\middle|\; \mu \text{ is } \mathcal{F}_s\text{-p.m.},\ \mu_\cdot \in U \text{ on } [0,\infty),\ E\left[ \int_0^{\tau_O} e^{-\rho s} (1-\mu_s)\, h(R_s^{(\mu)})\, ds \right] < \infty \right\}
$$
Here, p.m. denotes progressively measurable, $\rho$ is the constant discount factor to account for the time value of money, and $\tau_O$ is the first exit time of reputation $R$ from $O$, or $\infty$ if $R_s \in O$ for all $s \ge 0$. We set $\infty$ as an absorbing state for reputation. Symbolically,
$$
\tau_O := \inf\left\{ t \ge 0 \;\middle|\; R_t^{(\mu)} \notin O \right\}
$$
Formally, our two state controlled Markov process is
$$
\begin{aligned}
dW_t^{(\mu)} &= (1-\mu_t)\, h(R_t^{(\mu)})\, dt \\
dR_t^{(\mu)} &= (\mu_t - \kappa R_t^{(\mu)})\, dt + \sigma\, dB_t
\end{aligned}
$$
where $\kappa$ is a proportionality constant which accounts for mean reversion [11] and $\sigma$ is the constant volatility accounting for random effects on reputation.
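To make the dynamics concrete, the following is a minimal Euler-Maruyama sketch of the wealth and reputation equations above. It is an illustration rather than the authors' code: the threshold policy (advertise below a hypothetical `switch_level`, process above it), the function name `simulate_seller`, the step size, and all default parameter values are assumptions made for this example, and the growth rate is the logarithmic form of $h(R)$ from Mink and Seifert [3].

```python
import math
import numpy as np

def h(R, A=1.0, C=0.15):
    """Growth rate h(R) = A + C * (1 - 1 / ln(e + R)), valid for R > 0."""
    return A + C * (1.0 - 1.0 / math.log(math.e + R))

def simulate_seller(R0, switch_level, eps=0.02, kappa=1.0, sigma=0.5,
                    rho=0.5, dt=1e-3, n_steps=20000, seed=0):
    """Simulate one path under a hypothetical threshold policy.

    mu = +eps (promotion) while R < switch_level, otherwise mu = -eps
    (expedited mailing). The path stops once reputation reaches R <= 0.
    Returns (wealth, discounted earnings, final reputation, elapsed time).
    """
    rng = np.random.default_rng(seed)
    R, W, J, t = float(R0), 0.0, 0.0, 0.0
    for _ in range(n_steps):
        mu = eps if R < switch_level else -eps
        rate = (1.0 - mu) * h(R)              # dW/dt = (1 - mu) h(R)
        W += rate * dt
        J += math.exp(-rho * t) * rate * dt   # discounted running earnings
        # dR = (mu - kappa R) dt + sigma dB
        R += (mu - kappa * R) * dt + sigma * math.sqrt(dt) * rng.standard_normal()
        t += dt
        if R <= 0.0:                          # exit from O = (0, inf) at zero
            break
    return W, J, R, t

if __name__ == "__main__":
    W, J, R, t = simulate_seller(R0=0.5, switch_level=0.5)
    print(f"wealth={W:.4f}, discounted={J:.4f}, final R={R:.4f}, time={t:.3f}")
```

Averaging the discounted earnings over many seeds gives a Monte Carlo estimate of the expected payoff for a given threshold, which is one rough way to compare candidate switching levels.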
The major difficulty is in defining the growth rate $h(R)$. However, an explicit mechanism has been proposed by Mink and Seifert [3]. There the authors not only propose, but empirically justify a growth rate of
$$
h(R) = A + C\left(1 - \frac{1}{\ln(e+R)}\right) = A\left[1 + \frac{C}{A}\left(1 - \frac{1}{\ln(e+R)}\right)\right] \tag{3}
$$
where $A$ relates to the inherent value of the object for sale and $C$ is a parameter to be fitted. This is accomplished by obtaining data using an auction robot and then computing a single regression, which gives $C = 2.50$ in Equation (3). To the best of our knowledge, the model in [3] is among the first to give an explicit relationship between reputation and price. We also note that the parameter $C/A$ represents the maximal reputational effect of a seller as a fraction of inherent value $A$, and is what we choose in the numerical examples below. The model also suggests a multiple regression formula where other factors, such as shipping costs and whether a “buy-it-now” price is offered, are considered as well, in which case $C = 1.93$. The authors in [3] also comment that highly experienced sellers have higher feedback scores and design the auction more favorably, which is reflected in their higher revenue. In fact, they show that the coefficient attributed to shipping costs is larger than one, implying that customers put a high value on shipping when deciding on their bids, and that savvy agents take this into consideration. Notice that since $h$ is bounded on $O$, our admissible class reduces to
$$
\mathcal{A}_\nu := \left\{ \mu \in \mathcal{F}_s \;\middle|\; \mu \text{ is } \mathcal{F}_s\text{-p.m.},\ \mu_\cdot \in U \text{ on } [0,\infty) \right\}. \tag{4}
$$
Finally, the work in [3] posits that the horizon does not affect the revenue stream as much as the shipping cost and reputation factors, and so we consider an infinite horizon model here.

2.2. Hamilton-Jacobi-Bellman Formulation

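With the stochastic dynamics for reputation proposed in [1], a growth rate model for reputational effect on sales [3], and an infinite horizon, we expect that switching would depend only on the current reputational state. We now seek a twice-continuously differentiable, polynomially growing function $V \in C^2[0,\infty) \cap C_p[0,\infty)$ which is a candidate solution of the optimal control problem
$$
\begin{aligned}
\bar{V}(R) &= \sup_\nu \sup_{\mu \in \mathcal{A}_\nu} E\left[ \int_0^{\tau_0} e^{-\rho s} (1-\mu_s)\, h(R_s^{(\mu)})\, ds \;\middle|\; R_0 = R \right] \\
&= \sup_\nu \sup_{\mu \in \mathcal{A}_\nu} E\left[ \int_0^{\tau_O} e^{-\rho s} (1-\mu_s)\, h(R_s^{(\mu)})\, ds + \mathbf{1}_{\{\tau_O < \infty\}}\, e^{-\rho \tau_O}\, \frac{1+\epsilon}{\rho}\, h(\infty)\, \mathbf{1}_{\{R_{\tau_O}^{(\mu)} = \infty\}} \;\middle|\; R_0 = R \right] \\
h(R) &= A + C\left(1 - \frac{1}{\ln(e+R)}\right) \\
\tau_0 &:= \inf\left\{ t \ge 0 \;\middle|\; R_t = 0 \right\}
\end{aligned}
\tag{5}
$$
One approach to finding $\bar{V}$ would be to solve Equation (5) directly. For example, by the definition of $\bar{V}$, it follows that
$$
\begin{aligned}
\bar{V}(0) &= 0 \\
\bar{V}(\infty) &= \sup_\nu \sup_{\mu \in \mathcal{A}_\nu} E\left[ \int_0^{\infty} e^{-\rho s} (1-\mu_s)\, h(R_s^{(\mu)})\, ds \;\middle|\; R_0 = \infty \right] = \frac{1+\epsilon}{\rho}\, h(\infty)
\end{aligned}
\tag{6}
$$
However, we shall instead apply the principle of optimality, and in doing so arrive at the following nonlinear Hamilton-Jacobi-Bellman (HJB) boundary value problem (suppressing the explicit dependence of $R$ on $\mu$)
$$
\begin{aligned}
0 &= \max_{-\epsilon \le \mu \le \epsilon} \left[ (\mu - \kappa R)\, \frac{\partial V}{\partial R} + \frac{\sigma^2}{2} \frac{\partial^2 V}{\partial R^2} + (1-\mu)\, h(R) - \rho V \right] \\
V(0) &= 0 \\
V(\infty) &= \frac{1+\epsilon}{\rho}\, h(\infty) \\
h(R) &= A + C\left(1 - \frac{1}{\ln(e+R)}\right) = A + C\, \frac{\ln\left(1 + \frac{R}{e}\right)}{1 + \ln\left(1 + \frac{R}{e}\right)},
\end{aligned}
\tag{7}
$$
which simplifies to
$$
\begin{aligned}
\rho V &= h(R) - \kappa R\, \frac{\partial V}{\partial R} + \frac{\sigma^2}{2} \frac{\partial^2 V}{\partial R^2} + \epsilon \left| \frac{\partial V}{\partial R} - h(R) \right| \\
V(0) &= 0 \\
V(\infty) &= \frac{1+\epsilon}{\rho}\, (A + C) \\
h(R) &= A + C\, \frac{\ln\left(1 + \frac{R}{e}\right)}{1 + \ln\left(1 + \frac{R}{e}\right)}
\end{aligned}
\tag{8}
$$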
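The simplification rests on the bracket in the HJB equation being affine in $\mu$, so that the maximum over $[-\epsilon, \epsilon]$ is attained at an endpoint. A short worked step, using $V_R$ as shorthand for $\partial V / \partial R$ (shorthand introduced here, not in the original), may make this explicit:

```latex
\max_{-\epsilon \le \mu \le \epsilon} \Big[ \mu V_R + (1-\mu)\, h(R) \Big]
  = h(R) + \max_{-\epsilon \le \mu \le \epsilon} \mu \big( V_R - h(R) \big)
  = h(R) + \epsilon \, \big| V_R - h(R) \big| ,
\qquad
\mu^* = \epsilon \, \operatorname{sgn}\!\big( V_R - h(R) \big).
```

Read this way, the seller promotes ($\mu^* = \epsilon$) wherever $V_R > h(R)$ and expedites mailing ($\mu^* = -\epsilon$) wherever $V_R < h(R)$; the point where $V_R = h(R)$ is where the control switches.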
We therefore instead solve the HJB problem, which is justified by the following standard theorem (Corollary IV.5.1 in [15]):
Theorem 1. 
Let $V \in C^2(O) \cap C_b(\bar{O})$ be a twice-continuously differentiable and bounded solution to an associated HJB equation of a control problem
$$
V_{PM}(x) := \inf_\nu \inf_{\mathcal{A}_\nu} E\left[ \int_0^{\tau_O} e^{-\rho s}\, G(R_s^{\mu}, \mu_s)\, ds \right] \tag{9}
$$
for a process $R$ whose SDE has drift and volatility that are Lipschitz in $\mu$ and $R$, and a $G(R, \mu)$ that is continuous and polynomially growing for all $R \in O$ and continuous for all $\mu \in U$. Moreover, assume either that $\beta > 0$ or that $\tau_O < \infty$ with probability 1 for every admissible progressively measurable control process $\mu$. Then $V(x) = V_{PM}(x)$ for an optimal control $u^*(s) \in \operatorname{argmin}\, [L^R[V] + G(R, u)]$, where $L$ is the generator of the process $R$.
We now have our verification theorem:
Theorem 2. 
Let $V \in C^2(O) \cap C_b(\bar{O})$ be a twice-continuously differentiable and bounded solution to the boundary value problem Equation (8). Then $V$ is an optimal solution to the optimal control problem Equation (5).
Proof. 
By Corollary IV.5.1 in [15], since
  • (i) the drift and volatility are (uniformly) Lipschitz in $R$ and $\mu$,
  • (ii) $\rho > 0$, and
  • (iii) we impose an additional boundary condition on the solution as $R \to \infty$, as our indefinite horizon is now $[0, \tau_O) = [0, \tau_0) = [0, \infty)$,
it follows directly that our bounded classical solution $V = \bar{V}$ and the control
$$
\mu^*(R) \in \operatorname*{argmax}_{\mu \in [-\epsilon, \epsilon]} \left[ (\mu - \kappa R)\, \frac{\partial V}{\partial R} + \frac{\sigma^2}{2} \frac{\partial^2 V}{\partial R^2} + (1-\mu)\, h(R) - \rho V \right] \tag{10}
$$
leads to an optimal, stationary Markov control $\mu^*(R_s^{(\mu^*)})$.
Remark 1. 
We note here that it is sufficient to require polynomial growth on $V$, not bounded growth. However, as $h$ is bounded above by $A + C$ on $O$, we can restrict our attention to bounded $V$. Furthermore, an upper bound (and in fact the limit as $R \to \infty$) for $V$ is the value obtained by a seller earning the maximum value $A + C$ on her items on the time interval $[0, \infty)$ while always processing orders.

3. Market Share Based Pricing Mechanism

We observe that under its current definition, the reputation $R \in O$. In this section, we employ the mapping
$$
Y := f(R) = \frac{R}{1+R} \tag{11}
$$
which results in a normalized market share reputation $Y \in Q := (0, 1)$. This is consistent with prior assumptions [3] that the value function $h(R)$ be concave and bounded. We therefore also simplify (3) accordingly, replacing logarithms with a rational function
$$
\bar{h}(R) = A + C\, \frac{R}{1+R} = A\left[1 + \frac{C}{A}\, \frac{R}{1+R}\right] \tag{12}
$$
This modification, when combined with the mapping from a reputational score to a market share score, $R \mapsto Y$, produces a more intuitive growth rate $\tilde{h}(Y)$, which grows linearly with respect to market share reputation. Recall that $\infty$ is an absorbing state for $R$; consequently, we have $dY_t \equiv 0$ if $Y_t \equiv 1$. For $Y_t < 1$, it follows from Itô calculus that $dY_t = f'(R)\, dR_t + \frac{1}{2} f''(R)\, dR_t\, dR_t$, and so
$$
\begin{aligned}
dY_t &= (1-Y)^2 \left[ \left( \mu - \kappa\, \frac{Y}{1-Y} \right) dt + \sigma\, dB_t \right] + \frac{1}{2} \left( -2 (1-Y)^3 \right) \sigma^2\, dt \\
&= \left[ \mu (1-Y)^2 - \kappa Y (1-Y) - \sigma^2 (1-Y)^3 \right] dt + \sigma (1-Y)^2\, dB_t \\
\tilde{h}(Y) &= A + C Y \\
\tilde{V}(y) &= \sup_\nu \sup_{\mu \in \mathcal{A}_\nu} E\left[ \int_0^{\tau_0} e^{-\rho s} (1-\mu_s)\, \tilde{h}(Y_s)\, ds \;\middle|\; Y_0 = y \right]
\end{aligned}
\tag{13}
$$
Note that a function that is bounded for market share $Y \in (0, 1)$ is also correspondingly bounded for all reputation $R \in (0, \infty)$. We therefore study the market share model below, confident that our results will hold for the full reputation model by virtue of the inverse map of Equation (11).
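For readers who want to check the Itô step, the ingredients can be written out as follows. This is a routine calculation from $f(R) = R/(1+R)$ and the reputation dynamics $dR_t = (\mu_t - \kappa R_t)\, dt + \sigma\, dB_t$, included only as a worked aid:

```latex
f'(R) = \frac{1}{(1+R)^2} = (1-Y)^2, \qquad
f''(R) = -\frac{2}{(1+R)^3} = -2\,(1-Y)^3, \qquad
R = \frac{Y}{1-Y}, \qquad
dR_t \, dR_t = \sigma^2 \, dt .
```

Substituting these into $dY_t = f'(R)\, dR_t + \tfrac{1}{2} f''(R)\, dR_t\, dR_t$ gives the drift $\mu (1-Y)^2 - \kappa Y (1-Y) - \sigma^2 (1-Y)^3$ and the diffusion coefficient $\sigma (1-Y)^2$.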

3.1. Rescaled HJB Model

After market share rescaling, the HJB now takes a form which is in fact degenerate at both endpoints $y = 0, 1$:
$$
\begin{aligned}
-\frac{\sigma^2}{2} (1-y)^4\, V''(y) + \rho V &= \tilde{h}(y) - \left[ \kappa y (1-y) + \sigma^2 (1-y)^3 \right] V'(y) + \epsilon \left| (1-y)^2\, V'(y) - \tilde{h}(y) \right| \\
V(0) &= 0, \qquad V(1) = \frac{1+\epsilon}{\rho}\, \tilde{h}(1)
\end{aligned}
\tag{14}
$$
In theory, a closed form solution for this boundary value problem can be formally constructed piecewise, to the left and right of the special point $y^*$, for which $\tilde{h}(y^*) = (1-y^*)^2\, V'(y^*)$. We shall refer to $y^*$ as the switching point below, since it is the reputational value at which the seller switches from advertising to processing. In practice, we shall resort to numerical computation to approximate this solution, and in doing so estimate the value $y^*$.
Theorem 3. 
If $V^{**} \in C^2(Q)$ is a monotonically increasing solution to the HJB problem Equation (14), then $V^{**} = \tilde{V}$.
Proof. 
Enforcing the boundary condition on $V^{**}(1)$ enforces bounded growth on our monotonically increasing $V^{**}$. By the inverse map of Equation (11), the result follows as a corollary to Theorem 2. Note that $Q = (0, 1)$ is open, but $y = 1$ is not reached in finite time from anywhere in $Q$, and the diffusion $Y$ is in fact absorbed for all time in state $Y = 1$. Hence, the exact value of $V^{**}(1)$ is imposed.

3.2. Reduced Model

With the previous mapping, note that the market (reputation) share $Y$ is absorbed at $Y = 1$, which corresponds to $R \to \infty$. Moreover, as in the stochastic Nerlove-Arrow evolution, we have that the probability that $Y_t < 0$ for some positive $t$ is greater than 0, and the drift term is a third-order polynomial in $Y$. Based on these observations, we propose a reduced model for Stochastic Reputation Share, expressed by the following stochastic differential equation and associated stochastic optimal control problem
$$
\begin{aligned}
\check{V}(y) &= \sup_\nu \sup_{\mu \in \mathcal{A}_\nu} E\left[ \int_0^{\tau_0} e^{-\rho s} (1-\mu_s)\, (A + C Y_s)\, ds \;\middle|\; Y_0 = y \right] \\
\frac{dY_t}{1 - Y_t} &= \mu\, dt + \sigma\, dB_t
\end{aligned}
\tag{15}
$$
This leads to a corresponding nonlinear HJB boundary value problem,
$$
\begin{aligned}
0 &= \max_{-\epsilon \le \mu \le \epsilon} \left[ \mu (1-y)\, V'(y) + \frac{\sigma^2}{2} (1-y)^2\, V''(y) + (1-\mu)(A + C y) - \rho V \right] \\
V(0) &= 0 \\
V(1) &= \sup_\nu \sup_{\mu \in \mathcal{A}_\nu} E\left[ \int_0^{\infty} e^{-\rho s} (1-\mu_s)\, \tilde{h}(1)\, ds \right] = \rho^{-1} (1+\epsilon)\, \tilde{h}(1)
\end{aligned}
\tag{16}
$$
which can be recast as
$$
\begin{aligned}
-\frac{\sigma^2}{2} (1-y)^2\, V''(y) + \rho V &= \tilde{h}(y) + \epsilon \left| (1-y)\, V'(y) - \tilde{h}(y) \right| \\
V(0) &= 0 \\
V(1) &= \frac{1+\epsilon}{\rho}\, \tilde{h}(1)
\end{aligned}
\tag{17}
$$
One of the most attractive features of the reduced model (17) is that it has a closed form, piecewise defined analytic solution, which we derive in the Appendix. Here we state the following theorem.
Theorem 4. 
There exists a solution $V^* \in C^2(Q) \cap C_b(\bar{Q})$ to the reduced model (17). Furthermore, $V^* = \check{V}$, and is given by the piecewise solution
$$
V^*(y) = \begin{cases} V_\ell(y), & 0 \le y \le y^* \\ V_r(y), & y^* \le y \le 1 \end{cases} \tag{18}
$$
where
$$
V_\ell = c_1 (1-y)^{\gamma_-} + c_2 (1-y)^{\gamma_+} + \alpha_\ell + \beta_\ell\, y \tag{19}
$$
$$
V_r = c_3 (1-y)^{\gamma_+^r} + \alpha_r + \beta_r\, y \tag{20}
$$
with constants defined in the Appendix.
Proof. 
The construction of this piecewise solution is found in the Appendix. The solution constructed is monotonically increasing in $y$. Enforcing the boundary condition on $V^*(1)$ uniformly bounds $V^*$ on $Q$. The proof of equality $V^* = \check{V}$ then follows from Corollary IV.5.1 in [15], as in our Theorem 2 above.
Remark 2. 
Since the reduced model has a known closed form solution, we can measure the error made in constructing numerical solutions in Section 4, which in turn acts as a benchmark for the code we use to study the full model. It is uncommon to find a closed form solution for most optimal control problems.

4. Numerical Results

In this section, we construct numerical solutions of the market share scaled model Equation (14), as well as our proposed reduced model Equation (17). In Appendix A and A.1, the piecewise analytic solution and a nonlinear equation for the reduced switching point $y^*$ are found. These will be used to validate the numerical results for the reduced model, which act as a benchmark for the full problem. Please note that in what follows, the symbol $y^*$ is used to denote the switching point in reputational share for both the reduced and full model and their corresponding ODEs.

4.1. Numerical Results for the Reduced Model

The boundary value problem Equation (17) is discretized using a standard finite difference scheme. Let $y_k = k / N_y$ and $V_k = V(y_k)$, for $k = 0, 1, \ldots, N_y$. Then we solve a linear system of $N_y + 1$ equations of the form $M v = f$, where
$$
M v \approx \rho V - \frac{\sigma^2}{2} (1-y)^2\, V'', \quad \text{and} \quad f \approx h + \epsilon \left| (1-y)\, V' - h \right| .
$$
Since $f$ depends on $V$, the solution is implicit, and therefore must be obtained by using a fixed point iteration $M v^{(k+1)} = f^{(k)}$. In our numerical experiments, we let $N_y = 1000$, and set the maximum iteration count at $K = 20$. Convergence is observed in all tested cases. In Figure 1, several numerical solutions $V(y)$ are shown, for fixed $A = 1$, $C = 0.15$, $\epsilon = 0.02$, and various values of $\rho = 0.1, 0.2, 0.5, 2.0$ and $\sigma = 0.2, 0.5, 1.0, 5.0$. The values of $A$ and $C$ are chosen via $C/A = 0.15$ to reflect a maximal 15% reputational premium for sellers above the inherent value $A$.
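The fixed point iteration described above can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the authors' code: central differences are assumed for both $V'$ and $V''$, the initial guess is a straight line between the two boundary values, a dense `numpy.linalg.solve` is used for simplicity, and the function name `solve_reduced_hjb` and the default grid size are choices made for this example (the text uses $N_y = 1000$).

```python
import numpy as np

def solve_reduced_hjb(A=1.0, C=0.15, eps=0.02, rho=0.5, sigma=0.5,
                      N=200, K=20):
    """Fixed point iteration M v^(k+1) = f^(k) for the reduced problem.

    Discretizes rho*V - (sigma^2/2)(1-y)^2 V'' = h + eps*|(1-y)V' - h|
    with V(0) = 0 and V(1) = (1+eps)(A+C)/rho on a uniform grid.
    Returns the grid, the final iterate, and the size of the last update.
    """
    y = np.linspace(0.0, 1.0, N + 1)
    dy = 1.0 / N
    h = A + C * y
    V_right = (1.0 + eps) * (A + C) / rho

    # Assemble M once: identity rows at the boundaries, and the
    # second-difference operator in the interior rows.
    M = np.zeros((N + 1, N + 1))
    M[0, 0] = 1.0
    M[N, N] = 1.0
    k = np.arange(1, N)
    d = 0.5 * sigma ** 2 * (1.0 - y[k]) ** 2 / dy ** 2
    M[k, k - 1] = -d
    M[k, k] = rho + 2.0 * d
    M[k, k + 1] = -d

    V = V_right * y          # assumed initial guess: linear interpolation
    change = np.inf
    for _ in range(K):
        dV = (V[2:] - V[:-2]) / (2.0 * dy)   # central difference for V'
        f = np.empty(N + 1)
        f[0] = 0.0                           # boundary condition V(0) = 0
        f[N] = V_right                       # boundary condition at y = 1
        f[1:-1] = h[1:-1] + eps * np.abs((1.0 - y[1:-1]) * dV - h[1:-1])
        V_new = np.linalg.solve(M, f)
        change = np.max(np.abs(V_new - V))
        V = V_new
    return y, V, change

if __name__ == "__main__":
    y, V, change = solve_reduced_hjb()
    print("last update size:", change)
```

An estimate of the switching point can then be read off as the grid location where $(1-y) V'(y) - h(y)$ changes sign, which mirrors how $y^*$ is identified graphically in Figure 2.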
Figure 1. Several numerical solutions $V(y)$ of the reduced boundary value problem Equation (17), shown with $A = 1$, $C = 0.15$, and $\epsilon = 0.02$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.
It follows that both $\rho$ and $\sigma$ have a dramatic effect not only on the shape of the solution, but also, as is demonstrated in Figure 2, on the switching point $y^*$. These numerical results suggest that both the discount rate $\rho$ and the volatility $\sigma$ of a seller’s reputation can dramatically affect her behavior, particularly the critical reputation at which she switches between the advertising and processing modes.
Figure 2. A plot of the quantity $(1-y)\, V'(y)$, where $V(y)$ is a numerical solution of the reduced boundary value problem Equation (17). The value $y^*$ is defined as the intersection of these curves with $h(y)$ (the solid line). (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

4.2. Numerical Results for the Full Model

The same numerical discretization is employed to solve the full Nerlove-Arrow model Equation (14), where we solve a linear system of the form $M v = f$, with
$$
M v \approx \rho V + \left[ \kappa y (1-y) + \sigma^2 (1-y)^3 \right] V'(y) - \frac{\sigma^2}{2} (1-y)^4\, V''(y), \qquad f \approx h + \epsilon \left| (1-y)^2\, V' - h \right|
$$
We first set $\kappa = 0$, hold all remaining parameters fixed, and plot the results in Figure 3 and Figure 4. The full solutions have more curvature than those of the reduced model, but nonetheless remain monotone, and have a single unique switching point $y^*$. The additional curvature is due to the appearance of $V'$ terms in the differential operator, which now depend on $\sigma$, as well as $\kappa$, which we recall incorporates mean reversion. That is, independent of the seller's strategy, reputation will tend to decrease to a smaller amount of the market share, with constant rate $\kappa$.
In Figure 5 and Figure 6, the same solutions are shown with κ = 1 . Here it becomes apparent that both the rate of mean reversion, as well as the volatility will affect the seller’s optimal strategy.
Remark 3. 
Our numerical results illustrate that a wide range of seller behaviors can be described by varying the parameters, and therefore that reputational value is a strong indicator of optimal seller behavior.
Figure 3. Several numerical solutions $V(y)$ of the full boundary value problem Equation (14), shown with $A = 1$, $C = 0.15$, $\epsilon = 0.02$, and $\kappa = 0$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.
Figure 4. A plot of the quantity $(1-y)^2\, V'$, where $V$ is a numerical solution of the full boundary value problem Equation (14). The value $y^*$ is defined as the intersection of these curves with $h(y)$ (black solid line). (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.
Figure 5. The same as Figure 3, but with $\kappa = 1$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.
Figure 6. The same as Figure 4, but with $\kappa = 1$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

5. Conclusions and Future Work

In this work, we have proposed a model which describes the behavior of sellers participating in online auctions (markets). Our model is based on the premise that buyer feedback greatly impacts the sales rate, which motivates our assumption that the revenue per sale (or the price per unit) is solely dependent upon seller reputation. We assumed that a seller has a fixed amount of resources that must be allocated either to (i) advertising to new buyers or (ii) processing orders for current customers. In doing so, we were able to design an optimal selling strategy, wherein the seller switches their behavior when an optimal market share reputation is reached, which depends on statistical modeling parameters.
These modeling parameters were introduced through the wealth-reputation mechanisms in [11], which have been generalized to stochastic settings in [1,2,12]. We have additionally modified the empirical model $h(R)$ found in [3], relating the price per unit to the reputation. Rather than viewing reputation $R$ as an unbounded quantity, we instead introduced a market scaled reputation $Y = R / (1 + R)$, where $Y \in [0, 1)$, which reduces the price per unit to a simple linear model $h(Y) = A + C Y$.
We then optimized the stochastic model over the control of the excess rate $\mu$ using the Hamilton-Jacobi-Bellman equation, yielding a deterministic equation relating the value per unit to the seller reputation. The resulting boundary value problem is then solved numerically, and we see qualitatively that the value of a seller's goods increases monotonically with reputation, but that a unique optimal reputation $y = y^*$ determines when the seller should switch from advertising to processing. The numerical scheme was validated with a reduced model, which has a closed form piecewise analytic solution and permits direct determination of $y^*$. The numerical results validate our modeling assumptions, and provide a framework for studying seller behavior based on seller reputation. Although the techniques used are standard, we believe that the optimal strategy presented both analytically and numerically has implications for the way reputational information can be used to predict the behavior of an individual seller in an online market.
This work can naturally be generalized in a variety of ways. For instance, the model could be developed in the case of finite horizon time T, since a more realistic assumption is that a seller strategy depends on both time and the current reputation state. Unlike our current infinite horizon model, the resulting HJB equation will lead to a time-dependent partial differential equation, which will be the subject of future work.

Acknowledgments

The authors thank Andrew Christlieb, Nir Gavish, Song Yao, John Chadam, and Steven Shreve for useful discussions, as well as anonymous reviewers whose suggestions greatly improved this paper. Part of this work was done while Milan Bradonjić was at UCLA and LANL.

Author Contributions

The authors contributed equally.

Conflicts of Interest

The authors declare no conflict of interest.

A. Piecewise Analytic Solution of Reduced Model

We now construct the exact solution to the reduced HJB boundary value problem Equation (17), which incidentally can be found analytically. The principal difficulty that must be overcome is the appearance of an absolute value, which contains the unknown variable $V'$. We therefore consider a piecewise defined solution $V(y)$, which remains $C^2$ across the switching point $y^*$, defined by the vanishing of the expression in the absolute value
$$
(1 - y^*)\, V'(y^*) = \tilde{h}(y^*) \tag{21}
$$
We now separately consider the regions $y < y^*$ and $y > y^*$ in the reduced model Equation (17), and define the corresponding linear differential operators $L_\pm$ as
$$
L_\pm[V] = \left[ -\frac{\sigma^2}{2} (1-y)^2\, \frac{d^2}{dy^2} \pm \epsilon (1-y)\, \frac{d}{dy} + \rho \right] V \tag{22}
$$
The boundary value problem can then be decomposed into two smaller problems for $V_\ell$ and $V_r$. The boundary conditions provide one condition for each of $V_\ell$ and $V_r$. The remaining two conditions are provided by enforcing $C^2$ smoothness of the solution at the switching point Equation (21). Hence we now formulate two (well-posed) boundary value problems:
$$
L_-[V_\ell] = (1 - \epsilon)\, \tilde{h}(y), \qquad 0 < y < y^* \tag{23}
$$
$$
V_\ell(0) = 0, \qquad (1 - y^*)\, V_\ell'(y^*) = \tilde{h}(y^*) \tag{24}
$$
and
$$
L_+[V_r] = (1 + \epsilon)\, \tilde{h}(y), \qquad y^* < y < 1 \tag{25}
$$
$$
V_r(1) = \frac{1+\epsilon}{\rho}\, \tilde{h}(1), \qquad (1 - y^*)\, V_r'(y^*) = \tilde{h}(y^*) \tag{26}
$$
Once solved, enforcing continuity uniquely defines the value of the switching point $y^*$:
$$
V_\ell(y^*) = V_r(y^*) \tag{27}
$$

A.1. Constructing the Solution

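We first construct the solutions $V_\ell$ and $V_r$ separately, although the approach is the same for both. Following standard methods, we decompose the full solution into a homogeneous and a particular solution, where the homogeneous part satisfies $L_\pm[u] = 0$. Since the operators $L_\pm$ are equidimensional, we seek solutions of the form
$$u = c\,(1 - y)^{\gamma} \tag{28}$$
and applying the differential operator yields
$$L_\pm[u] = c\,(1 - y)^{\gamma}\left[-\frac{\sigma^2}{2}\left(\gamma^2 - \gamma\right) \mp \epsilon\gamma + \rho\right] = 0 \tag{29}$$
We set the bracketed term of this expression to zero in order to solve for the admissible exponents $\gamma$:
$$\gamma^{\ell}_{\pm} = \left(\frac{1}{2} + \frac{\epsilon}{\sigma^2}\right) \pm \sqrt{\left(\frac{1}{2} + \frac{\epsilon}{\sigma^2}\right)^2 + \frac{2\rho}{\sigma^2}} \quad (\text{from } L_-) \tag{30}$$
$$\gamma^{r}_{\pm} = \left(\frac{1}{2} - \frac{\epsilon}{\sigma^2}\right) \pm \sqrt{\left(\frac{1}{2} - \frac{\epsilon}{\sigma^2}\right)^2 + \frac{2\rho}{\sigma^2}} \quad (\text{from } L_+) \tag{31}$$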
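As a quick numerical sanity check on Equations (30) and (31), the following minimal sketch evaluates the four exponents and substitutes them back into the bracketed term obtained from applying $L_\pm$ to $(1-y)^\gamma$. The function names and the parameter values ($\sigma = 0.5$, $\epsilon = 0.1$, $\rho = 0.5$) are illustrative assumptions, not values taken from the paper.

```python
import math

def exponents(sigma, eps, rho):
    """Return (gamma_l_minus, gamma_l_plus, gamma_r_minus, gamma_r_plus)."""
    s2 = sigma ** 2

    def roots(shift):
        disc = math.sqrt(shift ** 2 + 2.0 * rho / s2)
        return shift - disc, shift + disc

    gl_minus, gl_plus = roots(0.5 + eps / s2)  # Equation (30), from L_-
    gr_minus, gr_plus = roots(0.5 - eps / s2)  # Equation (31), from L_+
    return gl_minus, gl_plus, gr_minus, gr_plus

def char_bracket(gamma, sigma, eps, rho, sign):
    """Bracketed term of L_sign[(1-y)^gamma]; sign=+1 for L_+, -1 for L_-."""
    return -0.5 * sigma ** 2 * (gamma ** 2 - gamma) - sign * eps * gamma + rho

# Illustrative (assumed) parameter values.
sigma, eps, rho = 0.5, 0.1, 0.5
gl_m, gl_p, gr_m, gr_p = exponents(sigma, eps, rho)
print("left exponents :", gl_m, gl_p)
print("right exponents:", gr_m, gr_p)
print("residuals      :",
      char_bracket(gl_m, sigma, eps, rho, -1),
      char_bracket(gr_p, sigma, eps, rho, +1))
```

For any $\rho > 0$ the square root exceeds the magnitude of the shift, so each pair contains exactly one negative and one positive exponent.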
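Because $\gamma^{r}_{-} < 0$ for $\rho > 0$, the homogeneous solution $(1 - y)^{\gamma^{r}_{-}}$ is unbounded as $y \to 1$, and so we exclude it from further consideration below. Furthermore, since $\tilde{h}(y) = A + Cy$ is a linear function, the particular solution is also linear. We therefore have a general solution of the form
$$V_\ell = c_1 (1 - y)^{\gamma^{\ell}_{-}} + c_2 (1 - y)^{\gamma^{\ell}_{+}} + \alpha^{\ell} + \beta^{\ell} y \tag{32}$$
$$V_r = c_3 (1 - y)^{\gamma^{r}_{+}} + \alpha^{r} + \beta^{r} y \tag{33}$$
The particular solution is fixed by requiring that the differential Equations (23) and (25) be satisfied, which yields
$$\alpha^{\ell} = \frac{1 - \epsilon}{\rho}(A + C) - \frac{1 - \epsilon}{\rho + \epsilon}\,C, \qquad \beta^{\ell} = \frac{1 - \epsilon}{\rho + \epsilon}\,C \tag{34}$$
$$\alpha^{r} = \frac{1 + \epsilon}{\rho}(A + C) - \frac{1 + \epsilon}{\rho - \epsilon}\,C, \qquad \beta^{r} = \frac{1 + \epsilon}{\rho - \epsilon}\,C \tag{35}$$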
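The coefficients in Equations (34) and (35) can be checked by substituting the linear ansatz $\alpha + \beta y$ back into the two differential equations. The sketch below does this numerically; because the second derivative of a linear function vanishes, only the first-order and zeroth-order terms of $L_\pm$ contribute. The helper names and the values of $A$, $C$, $\epsilon$, $\rho$ are assumptions chosen for illustration, and the check requires $\rho \neq \epsilon$.

```python
def particular_coeffs(A, C, eps, rho):
    """Coefficients (alpha_l, beta_l, alpha_r, beta_r) of Equations (34)-(35)."""
    beta_l = (1.0 - eps) * C / (rho + eps)
    alpha_l = (1.0 - eps) * (A + C) / rho - beta_l
    beta_r = (1.0 + eps) * C / (rho - eps)
    alpha_r = (1.0 + eps) * (A + C) / rho - beta_r
    return alpha_l, beta_l, alpha_r, beta_r

def ode_residual(y, alpha, beta, A, C, eps, rho, sign):
    """L_sign[alpha + beta*y] - (1 + sign*eps) * h(y), with h(y) = A + C*y.

    sign=-1 corresponds to the left problem, sign=+1 to the right problem.
    """
    lhs = sign * eps * (1.0 - y) * beta + rho * (alpha + beta * y)
    rhs = (1.0 + sign * eps) * (A + C * y)
    return lhs - rhs

# Illustrative (assumed) parameter values.
A, C, eps, rho = 1.0, 2.0, 0.1, 0.5
al, bl, ar, br = particular_coeffs(A, C, eps, rho)
for y in (0.0, 0.3, 1.0):
    print(y,
          ode_residual(y, al, bl, A, C, eps, rho, -1),
          ode_residual(y, ar, br, A, C, eps, rho, +1))
```

The same computation shows that $\alpha^{r} + \beta^{r} = (1+\epsilon)(A + C)/\rho$, which is the boundary value required of $V_r$ at $y = 1$.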
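The homogeneous coefficients $c_1$ and $c_2$ are then found by applying the boundary conditions (24), and the resulting linear system
$$\begin{pmatrix} 1 & 1 \\ \gamma^{\ell}_{-}(1 - y^*)^{\gamma^{\ell}_{-}} & \gamma^{\ell}_{+}(1 - y^*)^{\gamma^{\ell}_{+}} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} -\alpha^{\ell} \\ \beta^{\ell}(1 - y^*) - \tilde{h}(y^*) \end{pmatrix}$$
is solved by
$$c_1 = -\frac{\alpha^{\ell}\gamma^{\ell}_{+}(1 - y^*)^{\gamma^{\ell}_{+}} + \beta^{\ell}(1 - y^*) - \tilde{h}(y^*)}{\gamma^{\ell}_{+}(1 - y^*)^{\gamma^{\ell}_{+}} - \gamma^{\ell}_{-}(1 - y^*)^{\gamma^{\ell}_{-}}}$$
$$c_2 = \frac{\beta^{\ell}(1 - y^*) - \tilde{h}(y^*) + \alpha^{\ell}\gamma^{\ell}_{-}(1 - y^*)^{\gamma^{\ell}_{-}}}{\gamma^{\ell}_{+}(1 - y^*)^{\gamma^{\ell}_{+}} - \gamma^{\ell}_{-}(1 - y^*)^{\gamma^{\ell}_{-}}}$$
The boundary condition in Equation (26) at $y = 1$ is automatically satisfied by the particular solution, and so the final coefficient $c_3$ is determined by the condition at $y = y^*$,
$$c_3 = \frac{\beta^{r}(1 - y^*) - \tilde{h}(y^*)}{\gamma^{r}_{+}(1 - y^*)^{\gamma^{r}_{+}}}$$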
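To illustrate how the three coefficients fit together, the sketch below assembles the left and right solutions for a trial switching point and checks the boundary conditions they were built to satisfy. The trial value $y^* = 0.4$ and all parameter values are assumptions for illustration only; the trial point is not the optimal switching point, which is fixed separately by the continuity condition.

```python
import math

def solve_pieces(y_star, A, C, sigma, eps, rho):
    """Return (V_l, V_r, dV_l, dV_r) for a trial switching point y_star."""
    s2 = sigma ** 2
    # Exponents of the homogeneous solutions.
    shift_l, shift_r = 0.5 + eps / s2, 0.5 - eps / s2
    root_l = math.sqrt(shift_l ** 2 + 2.0 * rho / s2)
    g_minus, g_plus = shift_l - root_l, shift_l + root_l
    g_r_plus = shift_r + math.sqrt(shift_r ** 2 + 2.0 * rho / s2)
    # Linear particular solutions.
    beta_l = (1.0 - eps) * C / (rho + eps)
    alpha_l = (1.0 - eps) * (A + C) / rho - beta_l
    beta_r = (1.0 + eps) * C / (rho - eps)
    alpha_r = (1.0 + eps) * (A + C) / rho - beta_r
    # Homogeneous coefficients from the conditions at y = 0 and y = y_star.
    z = 1.0 - y_star
    h_star = A + C * y_star
    rhs = beta_l * z - h_star
    det = g_plus * z ** g_plus - g_minus * z ** g_minus
    c1 = -(alpha_l * g_plus * z ** g_plus + rhs) / det
    c2 = (rhs + alpha_l * g_minus * z ** g_minus) / det
    c3 = (beta_r * z - h_star) / (g_r_plus * z ** g_r_plus)

    def V_l(y):
        return c1 * (1 - y) ** g_minus + c2 * (1 - y) ** g_plus + alpha_l + beta_l * y

    def V_r(y):
        return c3 * (1 - y) ** g_r_plus + alpha_r + beta_r * y

    def dV_l(y):
        return (-c1 * g_minus * (1 - y) ** (g_minus - 1)
                - c2 * g_plus * (1 - y) ** (g_plus - 1) + beta_l)

    def dV_r(y):
        return -c3 * g_r_plus * (1 - y) ** (g_r_plus - 1) + beta_r

    return V_l, V_r, dV_l, dV_r

# Illustrative (assumed) values; y_star = 0.4 is only a trial point.
A, C, sigma, eps, rho, y_star = 1.0, 2.0, 0.5, 0.1, 0.5, 0.4
V_l, V_r, dV_l, dV_r = solve_pieces(y_star, A, C, sigma, eps, rho)
print("V_l(0)              :", V_l(0.0))
print("(1-y*) V_l'(y*) - h :", (1 - y_star) * dV_l(y_star) - (A + C * y_star))
print("(1-y*) V_r'(y*) - h :", (1 - y_star) * dV_r(y_star) - (A + C * y_star))
print("V_r(1)              :", V_r(1.0))
```

The determinant of the linear system is positive for every $y^* \in (0, 1)$ because one left exponent is positive and the other negative, so $c_1$ and $c_2$ are always well defined.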
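Finally, having determined the general solutions $V_\ell$ and $V_r$, the solution $V$ is obtained by enforcing continuity. This also fixes the value $y^*$, which must satisfy the nonlinear transcendental equation
$$c_1 (1 - y^*)^{\gamma^{\ell}_{-}} + c_2 (1 - y^*)^{\gamma^{\ell}_{+}} + \alpha^{\ell} + \beta^{\ell} y^* = c_3 (1 - y^*)^{\gamma^{r}_{+}} + \alpha^{r} + \beta^{r} y^*$$
By expanding the expression, it follows that
$$0 = \frac{\left[(1 - y^*)^{(\gamma^{\ell}_{+} - \gamma^{\ell}_{-})} - 1\right]\left[\beta^{\ell}(1 - y^*) - \tilde{h}(y^*)\right] - \alpha^{\ell}\left(\gamma^{\ell}_{+} - \gamma^{\ell}_{-}\right)(1 - y^*)^{\gamma^{\ell}_{+}}}{\gamma^{\ell}_{+}(1 - y^*)^{(\gamma^{\ell}_{+} - \gamma^{\ell}_{-})} - \gamma^{\ell}_{-}} + \alpha^{\ell} - \alpha^{r} + \left(\beta^{\ell} - \beta^{r}\right) y^* - \frac{\beta^{r}(1 - y^*) - \tilde{h}(y^*)}{\gamma^{r}_{+}}$$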
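Since the equation for $y^*$ is transcendental, it has to be solved numerically. The following minimal sketch evaluates the expanded continuity condition on a grid and refines each sign change by bisection. The function names, the grid size, and the parameter values are assumptions for illustration; bisection is a generic root-finding choice here, not a method stated in the paper, and the sketch does not check whether a root it finds is consistent with the sign of the expression inside the absolute value on each side.

```python
import math

def continuity_gap(y_star, A, C, sigma, eps, rho):
    """Right-hand side of the expanded continuity condition at a trial y_star."""
    s2 = sigma ** 2
    shift_l, shift_r = 0.5 + eps / s2, 0.5 - eps / s2
    root_l = math.sqrt(shift_l ** 2 + 2.0 * rho / s2)
    g_minus, g_plus = shift_l - root_l, shift_l + root_l
    g_r_plus = shift_r + math.sqrt(shift_r ** 2 + 2.0 * rho / s2)
    beta_l = (1.0 - eps) * C / (rho + eps)
    alpha_l = (1.0 - eps) * (A + C) / rho - beta_l
    beta_r = (1.0 + eps) * C / (rho - eps)
    alpha_r = (1.0 + eps) * (A + C) / rho - beta_r
    z = 1.0 - y_star
    h_star = A + C * y_star
    ratio = z ** (g_plus - g_minus)
    frac = ((ratio - 1.0) * (beta_l * z - h_star)
            - alpha_l * (g_plus - g_minus) * z ** g_plus) / (g_plus * ratio - g_minus)
    return (frac + alpha_l - alpha_r + (beta_l - beta_r) * y_star
            - (beta_r * z - h_star) / g_r_plus)

def find_switching_points(A, C, sigma, eps, rho, n=2000, tol=1e-12):
    """Scan (0, 1) for strict sign changes of the gap and refine by bisection."""
    grid = [(k + 0.5) / n for k in range(n)]
    vals = [continuity_gap(y, A, C, sigma, eps, rho) for y in grid]
    roots = []
    for k in range(n - 1):
        lo, hi, f_lo = grid[k], grid[k + 1], vals[k]
        if f_lo * vals[k + 1] >= 0.0:
            continue  # no strict sign change on this cell
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            f_mid = continuity_gap(mid, A, C, sigma, eps, rho)
            if f_lo * f_mid <= 0.0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        roots.append(0.5 * (lo + hi))
    return roots

# Illustrative (assumed) parameter values.
for y in find_switching_points(1.0, 2.0, 0.5, 0.1, 0.5):
    print("candidate y* =", y, " gap =", continuity_gap(y, 1.0, 2.0, 0.5, 0.1, 0.5))
```

Both denominators in the expanded condition are strictly positive on $(0, 1)$, so the gap is continuous there and every sign change found by the scan corresponds to a genuine root.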

A.2. Comparison With μ ≡ 0.

If the seller takes no action to enhance her reputation and long-term wealth growth, i.e., $\mu \equiv 0$, then her value function $V_0$ solves
$$-\frac{\sigma^2}{2}(1 - y)^2\, V_0''(y) + \rho V_0(y) = \tilde{h}(y), \quad 0 < y < 1, \qquad V_0(0) = 0, \qquad V_0(1) = \frac{1}{\rho}\,\tilde{h}(1)$$
which leads to the solution
$$V_0(y) = \frac{\tilde{h}(y)}{\rho} - \frac{C}{\rho}\,(1 - y)^{\frac{1}{2} + \sqrt{\frac{1}{4} + \frac{2\rho}{\sigma^2}}}$$
It follows that, for $V$ solving (3.15),
$$\frac{A + C}{\rho}\,\epsilon = V(1) - V_0(1) \le \sup_{0 \le y \le 1} \left| V(y) - V_0(y) \right|$$
Hence, by following a pulsing strategy, she enhances her overall expected wealth by an amount of order $\epsilon$.
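The equality in the last display follows directly from the two boundary values at $y = 1$. Written out as a short worked step, using only the boundary condition (26) for $V_r$, the condition $V_0(1) = \tilde{h}(1)/\rho$, and $\tilde{h}(y) = A + Cy$:

```latex
\begin{aligned}
V(1) - V_0(1) &= V_r(1) - V_0(1)
   = \frac{1+\epsilon}{\rho}\,\tilde{h}(1) - \frac{1}{\rho}\,\tilde{h}(1) \\
  &= \frac{\epsilon}{\rho}\,\tilde{h}(1)
   = \frac{A + C}{\rho}\,\epsilon ,
   \qquad \text{since } \tilde{h}(1) = A + C .
\end{aligned}
```

The gap at $y = 1$ is therefore exactly linear in $\epsilon$, which gives the lower bound on the supremum of $|V - V_0|$.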

References

  1. S.P. Sethi. “Deterministic and stochastic optimization of a dynamic advertising model.” Optim. Control Appl. Methods 4 (1983): 179–184.
  2. K. Raman. “Boundary value problems in stochastic optimal control of advertising.” Automatica 42 (2006): 1357–1362.
  3. M. Mink, and S. Seifert. “Reputation on eBay and its Impact on Sales Prices.” In Proceedings of The Group Decision and Negotiation International Conference, Karlsruhe, Germany, 25–28 June 2006; pp. 253–255.
  4. C. Beam, and A. Segev. Auctions on the Internet: A Field Study. Oxford, UK: Blackwell Publishers Ltd, 1998.
  5. P. Bajari, and A. Hortacsu. “Economic insights from internet auctions.” J. Econ. Lit. 42 (2004): 457–486.
  6. P. Resnick, R. Zeckhauser, J. Swanson, and K. Lockwood. “The Value of Reputation on eBay: A Controlled Experiment.” Exp. Econ. 9 (2006): 79–101.
  7. P. Resnick, and R. Zeckhauser. “Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System.” In The Economics of the Internet and E-Commerce. Edited by M.R. Baye. Volume 11 of Advances in Applied Microeconomics; Amsterdam, The Netherlands: Elsevier Science, 2002, pp. 127–157.
  8. D. Houser, and J. Wooders. “Reputation in Auctions: Theory, and Evidence from eBay.” J. Econ. Manag. Strateg. 15 (2006): 353–369.
  9. C. Dellarocas. “Reputation Mechanism Design in Online Trading Environments with Pure Moral Hazard.” Inf. Syst. Res. 16 (2005): 209–230.
  10. K.R. Apt, and E. Markakis. “Optimal Strategies in Sequential Bidding.” In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, Budapest, Hungary, 10–15 May 2009; pp. 1189–1190.
  11. M. Nerlove, and K.J. Arrow. “Optimal Advertising Policy under Dynamic Conditions.” Economica 29 (1962): 129–142.
  12. R.C. Rao. “Estimating Continuous Time Advertising-Sales Models.” Market. Sci. 5 (1986): 125–142.
  13. D.J. Levitin. Extracted from The Organized Mind: Thinking Straight in the Age of Information Overload. New York, NY, USA: Dutton Penguin Random House, 2014.
  14. K. Duckworth, and M. Zervos. “A Model for Investment Decisions with Switching Costs.” Ann. Appl. Probab. 11 (2001): 239–260.
  15. W. Fleming, and H.M. Soner. Controlled Markov Processes and Viscosity Solutions. Stochastic Modelling and Applied Probability; New York, NY, USA: Springer, 2006.
  16. W.H. Fleming, and R.W. Rishel. Deterministic and Stochastic Optimal Control. Berlin, Germany; New York, NY, USA: Springer-Verlag, 1975.
