An Electronic Marketplace Based on Reputation and Learning

In this paper, we propose a market model based on reputation and reinforcement learning algorithms for buying and selling agents. The model considers three important factors: quality, price and delivery-time. We take into account the fact that buying agents can have different priorities on the quality, price and delivery-time of their goods, and that selling agents adjust their bids according to buying agents' preferences. We also assume that multiple selling agents may offer the same goods with different qualities, prices and delivery-times. In our model, selling agents learn to maximize their expected profits by using reinforcement learning to adjust product quality, price and delivery-time. Each selling agent also models the reputation of buying agents based on the profit they bring to that seller, and uses this reputation to offer discounts to reputable buying agents. Buying agents learn to model the reputation of selling agents separately on quality, price and delivery-time, in order to avoid interacting with disreputable selling agents. The model has been implemented with Aglets and tested in a large marketplace. The results show that selling/buying agents that model the reputation of buying/selling agents obtain more satisfaction than selling/buying agents that use reinforcement learning only.


Introduction
With the advent of mobile and intelligent agent technology, e-commerce has entered a new era [28]. Agent architecture also provides a flexible environment for modeling other fields of research [8], [12], [20]. The agent-based e-marketplace is one of the most important results of applying agent technology to e-commerce. An electronic marketplace provides a single location where many buyers and sellers congregate electronically and complete their transactions. In recent years, extensive research has focused on designing agent-based e-marketplaces [2], [6], [14], [15], [19]. Moreover, there is some research on personal intelligent agents for e-commerce applications [5], [7], [8], [10], [29]. The most important problem with these works, however, is the poor intelligence of the trading agents.
In addition, reinforcement learning [17] has been studied for various multi-agent problems [4], [16], [21], [22]. However, these efforts do not directly model economic agents and market environments. There is also some research on reputation and trust modeling that does not use reinforcement learning [3], [9], [11], [18], [30]. A number of agent models for electronic market environments have been proposed. Jango [10] is a shopping agent that assists customers in getting product information. Given a specific product by a customer, Jango simultaneously queries multiple online merchants (from a list maintained by NetBot, Inc.) for the product availability, price, and important product features. Jango then displays the query results to the customer. Although Jango provides customers with useful information for merchant comparison, at least three shortcomings may be identified: (i) the task of analyzing the resulting information and selecting appropriate merchants is left completely to customers; (ii) the algorithm underlying its operation does not consider product quality, which is of great importance for the merchant selection task; (iii) Jango is not equipped with any learning capability to help customers choose increasingly appropriate merchants. Another interesting agent model is Kasbah [5], designed by the MIT Media Lab. Kasbah is a multi-agent electronic marketplace where selling and buying agents can negotiate with one another to find the "best possible deal" for their users. The main advantage of Kasbah is that its agents are autonomous in making decisions, thus freeing users from having to find and negotiate with buyers and sellers. However, as admitted in [5], Kasbah's agents are not very smart, as they do not make use of any AI learning techniques.
Vidal and Durfee [27] address the problem of how buying and selling agents should behave in an information economy such as the University of Michigan Digital Library. They divide agents into classes corresponding to the agents' capabilities of modeling other agents: zero-level agents are agents that learn from the observations they make about their environment and from any environmental rewards they receive. One-level agents are agents that model other agents as zero-level agents. Two-level agents are those that model other agents as one-level agents. Higher-level agents are recursively defined in the same manner. It should be intuitive that agents with more complete models of others will always do better. However, because of the computational costs associated with maintaining deeper (i.e., more complex) models, there should be a level at which the gains and the costs of having deeper models balance out for each agent. The main problem addressed in this model is to answer the question of when an agent benefits from having deeper models of others. Although reinforcement learning has been applied there to buying and selling agents in a market environment, reputation has not been used as a means to protect buyers from purchasing low quality goods. Moreover, selling agents do not consider altering the quality of their products while learning to maximize their profits.
Tran and Cohen [23]-[26] exploit reinforcement learning for buying agents to model the reputation of selling agents, protecting buyers from interacting with non-reputable sellers. Nevertheless, buyers in this model must have fixed priorities on the quality and price of their desired goods, so they cannot change their preferences over a sequence of purchases. That is, a buying agent cannot purchase a good in one auction with priority on quality and then buy the same good in another auction with priority on price. In addition, selling agents do not model the reputation of buyers in order to offer discounts, and the model considers only two factors, quality and price.
In our proposed learning algorithms, each selling agent models the reputation of buyers and offers them discounts based on their reputation. The model focuses on three important market factors: quality, price and delivery. Because buying agents have different preferences and priorities for their desired goods, they model the reputation of selling agents on quality, price and delivery separately. For example, a buyer may need a good with high quality now, but with a low price later. The proposed model has been implemented with Aglets [1], [13].
The paper is organized as follows: Section 2 introduces our proposed market and learning algorithms. Section 3 discusses current experimental results and outlines proposed future experiments with the model. Finally, Section 4 provides conclusions and some future research directions.

The Proposed Algorithm
In this section we propose our marketplace model and learning algorithm for buying and selling agents based on reinforcement learning and reputation modeling.
1. MAA (Market Assistant Agent): The MAA is responsible for registering mobile buying and selling agents in the buyer and seller databases of the marketplace. The buyer database contains the owner of the mobile buying agent, the buying agent server, a unique identifier, the proxy address of the agent provided by the aglet context, and the time of registration. The seller database contains the owner of the mobile selling agent, the selling agent server, a unique identifier, the proxy address of the selling agent provided by the aglet context, the goods that the mobile selling agent has available to sell, and the time of registration. Agent A can communicate with agent B through the proxy address of agent B, and vice versa. The MAA also answers a mobile buying agent's request by retrieving from the seller database the proxy addresses of the sellers that have good g to sell, and sending this list to the mobile buying agent.
2. MBA (Mobile Buying Agent): The MBA stands for the buyer. It moves to the marketplace, trades with mobile selling agents and learns, based on reinforcement learning, which sellers can satisfy its preferences. The MBA also measures the reputation of each mobile selling agent on quality, price and delivery, focuses its business on reputable sellers and avoids interacting with non-reputable ones.
3. MSA (Mobile Selling Agent): The MSA stands for the seller. It moves to the marketplace, trades with mobile buying agents and learns how to adjust its bids according to the preferences of the buying agents while trying to maximize its expected profit. It also models the reputation of mobile buying agents in order to offer them discounts based on their reputation.
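The registration and lookup role of the MAA described above can be illustrated with a small in-memory sketch. It is not the Aglets implementation: the class and field names (SellerRecord, MarketAssistantAgent, sellers_offering) are hypothetical, the proxy address is a plain string standing in for the proxy supplied by the aglet context, and only the seller database is shown, since the buyer database would be analogous.

```python
from dataclasses import dataclass, field
import time


@dataclass
class SellerRecord:
    owner: str
    server: str
    agent_id: str                 # unique identifier
    proxy_address: str            # stand-in for the proxy given by the aglet context
    goods: set[str]               # goods this mobile selling agent can sell
    registered_at: float = field(default_factory=time.time)


class MarketAssistantAgent:
    """Hypothetical in-memory registry; an illustration only."""

    def __init__(self) -> None:
        self.sellers: dict[str, SellerRecord] = {}

    def register_seller(self, record: SellerRecord) -> None:
        # Store the record under the agent's unique identifier.
        self.sellers[record.agent_id] = record

    def sellers_offering(self, good: str) -> list[str]:
        """Return the proxy addresses of all registered sellers that have `good`."""
        return [r.proxy_address for r in self.sellers.values() if good in r.goods]
```

A buying agent would call sellers_offering("g") once and then contact each returned proxy address directly.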

Buying Agent Server
The Buying Agent Server provides the interface of the Buying Agent (BA), which lets users initialize and control their buying agent to carry out e-commerce activities. The Buying Agent Server stores the buyer's information in its database and produces a Mobile Buying Agent (MBA) according to the user's requirements. The MBA goes to the marketplace on behalf of the user to make bargains.

Selling Agent Server
Each seller that wants to join this e-marketplace should build a Seller Server. There are two main agents in a Seller Server: (1) the Selling Agent (SA), which is provided by the Selling Agent Server and lets the seller initialize its selling agent and specify the goods that are available to sell, and (2) the Mobile Selling Agent (MSA), which is created by the Selling Agent Server, migrates to the marketplace and tries to sell goods with maximum profit for its owner.
$q_{min} \in Q$ and $q_{max} \in Q$ represent the minimum and maximum quality of goods that can be available in the market; all sellers and buyers know these values. Assume that seller $s \in S$ has received a request from buyer b. Seller s has to decide on the quality, price and delivery of good g to be delivered to buyer b. Let R be the set of real numbers. Let a function estimate the expected profit for seller s if it sells good g with quality q at price p and delivery d to buyer b, and let a cost function give the cost that seller s incurs to produce good g with quality q for buyer b. Seller s produces different versions of good g based on buyers' requirements. The price that seller s chooses for selling good g to buyer b is greater than or equal to this production cost and, by equation (1), at most $(1 + \kappa)$ times the cost, where $\kappa$ is the maximum percent of profit. We do not use negative reputation for buyers, because we assume that all buyers are honest and no seller wants to lose its customers. The sellers want to satisfy the buyers' requirements, so they compete with each other to increase their number of customers. When seller s sends its bid to buyer b, there are two possibilities:
1) Seller s succeeds in selling good g with quality q at price p and delivery d to buyer b. This means that seller s has presented a better bid to buyer b than the other sellers. Therefore, seller s may be re-selected by buyer b if it repeats this bid for good g. Seller s delivers the product to buyer b and updates the reputation of buyer b using reinforcement learning, where $\mu$ is a positive factor called the cooperative factor. Seller s then calculates the new bid for buyer b based on the buyer's new reputation.
2) Seller s does not succeed in selling good g with quality q at price p and delivery d to buyer b. This means that the bid of seller s has not satisfied buyer b. If seller s repeats the previous bid to buyer b, the chance of selling good g to buyer b is low. Therefore, seller s needs to alter the price, the delivery and possibly the quality of the good offered to buyer b. In a real market, buyers expect that a seller who wants to deliver a good later than the offered time should reduce the price according to a formula based on price and delivery. Let rp be a variable that specifies the percentage by which a seller who wants to deliver its product late reduces the price. In addition, when preparing a new bid for buyer b, the reputation of the buyer is also used to determine the new price. The quality remains as before, the new price is updated with reinforcement learning (equation (5)), and the delivery is increased (equation (6)) by $inc_{dt}$, the rate of increase in delivery that seller s assumes to correspond to the price reduction.
Since a seller does not sell its goods at a price lower than their production cost, if the new price falls below the production cost, seller s does not offer the same good with the previous quality. Instead, it may optionally raise the quality, which increases the production cost (equation (7)), where inc is a specific constant called seller s's quality increasing factor.
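The seller's reaction to a rejected bid can be sketched as follows. The price bounds follow the reading of equation (1) used in the paper's examples (cost 47 and $\kappa = 0.2$ give a maximum price of 56.4). Everything else is an assumption made only for illustration, because equations (5) to (7) are not reproduced in this text: the multiplicative price cut, the additive delivery increase, the multiplicative quality increase, taking cost equal to quality, and the omission of the reputation-based discount. The names Bid, price_bounds and revise_failed_bid are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    quality: float
    price: float
    delivery: float


def price_bounds(cost: float, kappa: float) -> tuple[float, float]:
    """Price range for a maximum profit fraction kappa: cost <= p <= (1 + kappa) * cost."""
    return cost, (1.0 + kappa) * cost


def revise_failed_bid(bid: Bid, rp: float, inc_dt: float, inc: float,
                      kappa: float, q_max: float) -> Bid:
    """ASSUMED revision rule after a rejected bid; not the paper's equations (5)-(7)."""
    cost = bid.quality                          # assumption: cost equals quality
    new_price = bid.price * (1.0 - rp)          # assumed multiplicative price cut
    if new_price >= cost:
        # Cheaper but later: same quality, lower price, longer delivery.
        return Bid(bid.quality, new_price, bid.delivery + inc_dt)
    # The price would fall below cost: raise quality and restart at the maximum price.
    new_quality = min(q_max, bid.quality * (1.0 + inc))
    _, max_price = price_bounds(new_quality, kappa)
    return Bid(new_quality, max_price, bid.delivery)
```

Keeping the two branches separate mirrors the text: the price is lowered while it stays above cost, and quality is raised only once further price cuts are no longer possible.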

Buyer Algorithm
Assume that buyer b wants to buy good g. Buyer b broadcasts its request to all sellers that have good g to sell (as discussed earlier with Figure 1, the list of these sellers has already been retrieved from the MAA). Sellers answer the request by sending bids to buyer b. Buyer b receives all bids, models the reputation of all sellers and selects a suitable bid from a reputable seller. Buyer b models the reputation of each seller on three factors, quality, price and delivery, separately, using a separate reputation function for each factor. Buyer b's reputable and disreputable thresholds on price and delivery are defined in the same way as those on quality, by replacing q with p and d, respectively.
Likewise, the sets of sellers with good reputation on price and on delivery are defined for buyer b.
Let $w_q$, $w_p$ and $w_d$ be the weights of quality, price and delivery for buyer b. We define buyer b's general reputable threshold (equation (14)) and, in the same way, calculate the general reputation of seller s (equation (16)). Let $S_r^b$ and its disreputable counterpart be the sets of reputable and disreputable sellers to buyer b, respectively. Buyer b estimates the value of each bid (equation (19)), where $q_{max}$ is the maximum quality of good g in the market, $p_{max}$ is the maximum price for a good with quality $q_{max}$, and $d_{max}$ is the maximum time by which seller s may deliver good g late. Buyer b then selects the seller ŝ that belongs to the set of reputable sellers for buyer b and whose bid value for buyer b is higher than that of the other sellers. After paying seller ŝ and receiving good g, buyer b examines the quality $q \in Q$ of good g. Assume that buyer b finds quality q, delivered at time $\hat{d}$. Let the quality, price and delivery expected by the buyer be $q_b$, $p_b$ and $d_b$, respectively. The updates of seller ŝ's reputation on quality, price and delivery are described in the following parts.
In addition, with a probability ρ buyer b chooses to explore (rather than exploit) the marketplace by randomly selecting a seller ŝ from the set of all sellers. Initially, the value of ρ should be set to 1, and then decreased over time to some fixed minimum value determined by the buyer.
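The selection step, including exploration with probability ρ, can be sketched as below. Several details are assumptions, because the corresponding equations are not reproduced in this text: the general reputation is taken to be a weighted sum of the three per-factor reputations, the buyer is assumed not to purchase when no seller is reputable, and ρ is assumed to decay geometrically towards its minimum. All function and parameter names are hypothetical.

```python
import random


def general_reputation(rep: dict[str, float], weights: dict[str, float]) -> float:
    """ASSUMED combination: weighted sum of the per-factor reputations."""
    return sum(weights[k] * rep[k] for k in ("quality", "price", "delivery"))


def choose_seller(bid_values, reputations, weights, threshold, rho, rng=random):
    """Pick a seller id: explore with probability rho, otherwise exploit.

    bid_values:  seller id -> value the buyer assigns to that seller's bid
    reputations: seller id -> {"quality": r, "price": r, "delivery": r}
    """
    sellers = list(bid_values)
    if rng.random() < rho:
        return rng.choice(sellers)               # exploration: any seller at random
    reputable = [s for s in sellers
                 if general_reputation(reputations[s], weights) >= threshold]
    if not reputable:
        return None                              # assumed: no purchase if nobody qualifies
    # Exploitation: the reputable seller whose bid has the highest value.
    return max(reputable, key=lambda s: bid_values[s])


def decay_rho(rho: float, factor: float = 0.99, rho_min: float = 0.05) -> float:
    """ASSUMED geometric decay of the exploration probability towards a floor."""
    return max(rho_min, rho * factor)
```

Starting with rho = 1 makes every early purchase exploratory, which matches the initial setting described above; calling decay_rho after each auction then shifts the buyer towards reputable sellers.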
Updating Reputation on Quality:
1) If $q \geq q_b$, then the reputation of seller ŝ on quality is updated using reinforcement learning (equation (22)), where $\mu_q$ is a positive factor called the cooperation factor. That is, seller ŝ offers good g with a quality greater than or equal to the value that buyer b demanded for the quality of good g, and therefore the reputation of seller ŝ on quality is increased by equation (22). $\mu_{min\_q}$ is a positive factor called the minimum cooperation factor for quality.
2) If $q < q_b$, then the reputation of seller ŝ on quality is updated with $\nu_q$, a negative factor called the non-cooperation factor. In its calculation, $\lambda_q$ ($\lambda_q > 1$) is called the penalty factor, chosen so that $|\nu_q| > |\mu_q|$, which implements the traditional assumption that reputation should be difficult to build up but easy to tear down.
Updating Reputation on Price:
1) If $p_b \geq p_s$, where $p_s$ is the price of seller ŝ, then the reputation of seller ŝ on price is updated using reinforcement learning (equation (26)), where $\mu_p$ is a positive factor called the cooperation factor. That is, seller ŝ offers good g at a price lower than or equal to the value that buyer b demanded for the price of good g, and therefore the reputation of seller ŝ on price is increased by equation (26). This reflects the fact that buyer b expects to buy goods at a low price: sellers that offer goods at a lower price than the others earn more reputation on price with buyer b, and the sellers with positive reputation on price are those whose price is lower than the price expected by buyer b. $\mu_{min\_p}$ is a positive factor called the minimum cooperation factor for price.
2) If $p_b < p_s$, then the reputation of seller ŝ on price is updated with $\nu_p$, a negative factor called the non-cooperation factor.
Updating Reputation on Delivery:
1) If the delivery $\hat{d}$ of seller ŝ is lower than or equal to $d_b$, then the reputation of seller ŝ on delivery is updated using reinforcement learning (equation (30)), where $\mu_d$ is a positive factor called the cooperation factor. That is, seller ŝ offers good g with a delivery lower than or equal to the value that buyer b demanded for the delivery of good g, and therefore the reputation of seller ŝ on delivery is increased by equation (30). This means that sellers that deliver their products more quickly earn more reputation on delivery, and the sellers with positive reputation on delivery are those whose delivery is lower than the delivery expected by buyer b.
2) Otherwise, the reputation of seller ŝ on delivery is updated with $\nu_d$, a negative factor called the non-cooperation factor.
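The common pattern behind the three updates (a reward when the seller meets the buyer's expectation, a larger penalty when it does not) can be illustrated with one function applied per factor. The update rule below is an assumption: the paper's own equations are not reproduced in this text, so a simple bounded rule that keeps the reputation in [-1, 1] is used instead, with the non-cooperation factor derived from the cooperation factor and the penalty factor. The function name and signature are hypothetical.

```python
def update_reputation(r: float, met_expectation: bool, mu: float, lam: float) -> float:
    """ASSUMED bounded update keeping r in [-1, 1]; not the paper's equations.

    mu  : cooperation factor, 0 < mu < 1
    lam : penalty factor, lam > 1, so that the penalty outweighs the reward
    """
    if met_expectation:
        return r + mu * (1.0 - r)          # move part of the way towards +1
    nu = -min(1.0, lam * mu)               # non-cooperation factor, |nu| > |mu|
    return r + nu * (1.0 + r)              # move part of the way towards -1
```

With mu = 0.2 and lam = 1.5, one satisfied purchase raises a neutral reputation from 0 to 0.2, while a single disappointing purchase afterwards pulls it down to -0.16, which shows reputation being harder to build than to lose.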

An Example
This subsection provides a numerical example illustrating the proposed algorithm for buyers and sellers, respectively.

Buyer Situation
Consider a simple buyer situation where buyer b announces its need for some good g to all sellers that have good g to sell (as discussed earlier with Figure 1, the list of these sellers has already been retrieved from the MAA). Suppose that there are 5 sellers in the marketplace. The reputation ratings on price and delivery-time of the different sellers to buyer b are shown in Table 2 and Table 3, respectively.
Table 3: Reputation ratings on delivery of different sellers to buyer b
The general reputation threshold and the general reputation of sellers are computed based on equations (14) and (16), respectively. The general reputation of each seller is shown in Table 4.

After b announces its request for good g to all sellers that have good g to sell, the sellers submit bids to deliver g to buyer b with the specifications shown in Table 5. Let the triplet bid(quality, price, delivery) be the structure of a bid's specification. Buyer b then estimates the value of each bid offered by the sellers based on equation (19). The results are shown in Table 6.
Table 6: Bids offered by different sellers for good g to buyer b
Then buyer b selects the seller ŝ who belongs to the set of reputable sellers for buyer b [...] ratio to what b expected, increases the value of good g offered by $s_4$.
Thus, by providing good g with high value, seller $s_4$ has improved its general reputation to buyer b, increasing its chance of being selected again by buyer b in the next auctions and of remaining in the set $S_r^b$ of sellers reputable to b.

Seller Situation
Consider how a seller in the above marketplace behaves according to the proposed seller algorithms. In this example we investigate the behavior of sellers $s_4$ and $s_1$ in the marketplace, under the following assumptions. We define the maximum percent of profit as $\kappa = 0.2$. Therefore, according to equation (1), if a good costs 47, the maximum price that seller s can set is 56.4. We assume that the reduction percent of price (rp) and the discount variable (β) in equation (5) are equal to 0.015 and 0.05, respectively.
Sellers increase the cost and quality of goods according to equation (7) [...]. Seller $s_1$ should alter its bid to increase its chance of being selected by buyer b in the next auction. Seller $s_1$ decreases the price of good g by equation (5) [...]; therefore seller $s_1$ does not propose the new price and tries to alter the quality of good g instead. Seller $s_1$ produces good g with a higher quality than before by equation (7) [...]. The new bid of seller $s_1$ to buyer b for good g is bid(49.92, 59.00544, 1). Seller $s_1$ starts bidding with the maximum expected profit and delivery 1 and then, if it does not succeed in selling good g, reduces the price and increases the delivery based on equations (5) and (6), respectively.
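The numbers quoted in this example can be checked with a few lines of arithmetic. The first check reproduces the stated bound of 56.4 for a cost of 47. The second is only an observation, not a statement of the authors' derivation: assuming that cost equals quality (as in the experiments) and that one price reduction of rp is applied to the maximum price for quality 49.92, the result coincides with the quoted price 59.00544.

```python
import math

kappa, rp = 0.2, 0.015

max_price_47 = (1 + kappa) * 47        # bound used in the example for a cost of 47
max_price_new = (1 + kappa) * 49.92    # same bound, ASSUMING cost equals quality 49.92
reduced = max_price_new * (1 - rp)     # ASSUMED: one reduction of rp applied to that bound

print(round(max_price_47, 5), round(max_price_new, 5), round(reduced, 5))
assert math.isclose(reduced, 59.00544)
```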

Experimental Results
We have implemented the proposed model with Aglets, which are Java-based stationary and mobile agents built in the Aglet environment. Our results show that a seller agent that models the reputation of buyer agents and offers discounts to reputable ones obtains greater satisfaction than one that only alters the quality, price and delivery of its goods. Buyer agents that follow the proposed algorithms are also more flexible in selecting goods under different conditions. We have tested our proposed model, for both buyer and seller agents, in extensive experiments. Sections 3.1 and 3.2 present the satisfaction of seller agents and buyer agents, respectively.

Seller Satisfaction
In the test for evaluating the seller algorithm, there are 25 seller agents and 20 buyer agents in our simulated marketplace. Assume that the buyers arrange 2000 auctions in total. Let the triplet g(quality, price, delivery) be the structure of a good's specification. All buyer agents use the buyer algorithm proposed in this paper, and the seller agents are divided into five groups:
1. Group A consists of five sellers $s_0, s_1, \ldots, s_4$. These are sellers dishonest on quality, who try to attract buyers with high quality goods and then cheat them with much lower quality ones. They offer g(48, 50, 2) and then deliver the good as g(38, 50, 2).
2. Group B consists of five sellers $s_5, s_6, \ldots, s_9$. These are sellers dishonest on delivery, who try to attract buyers by offering the best delivery along with suitable quality but then cheat them by delivering goods very late. They offer g(48, 50, 2) but deliver the good as g(48, 50, 13).
3. Group C consists of five sellers $s_{10}, s_{11}, \ldots, s_{14}$ that do not cheat buyers and use a fixed bid for every buyer. They offer and deliver goods as g(40, 44, 7).
4. Group D consists of five sellers $s_{15}, s_{16}, \ldots, s_{19}$ that alter the quality, price and delivery of their goods but do not model the reputation of buyers. Moreover, they do not offer discounts to buyers. They start their bids at g(38, 45.6, 1) and then alter their offers based on buyers' requirements.
5. Group E consists of five sellers $s_{20}, s_{21}, \ldots, s_{24}$ that, in addition to altering the quality, price and delivery of their goods, model the reputation of buyers and offer them discounts based on their reputation. They start their bids at g(38, 45.6, 1) and then use the proposed algorithms to alter their bids.
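For readers who want to reproduce the setup, the five seller groups can be written down as a small configuration table. The offered and delivered triplets and the seller index ranges are taken from the description above; the record type, its field names and the helper is_dishonest are hypothetical, and marking groups A, B and C as non-adaptive is an assumption (the text only states this explicitly for group C).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Good = Tuple[float, float, float]    # (quality, price, delivery)


@dataclass(frozen=True)
class SellerGroup:
    name: str
    seller_ids: range
    offered: Good                    # initial (or fixed) bid
    delivered: Optional[Good]        # None when the seller delivers what it offers
    adapts_bids: bool                # ASSUMED False for groups A and B
    models_buyer_reputation: bool


GROUPS = [
    SellerGroup("A", range(0, 5),   (48, 50, 2),   (38, 50, 2),  False, False),
    SellerGroup("B", range(5, 10),  (48, 50, 2),   (48, 50, 13), False, False),
    SellerGroup("C", range(10, 15), (40, 44, 7),   None,         False, False),
    SellerGroup("D", range(15, 20), (38, 45.6, 1), None,         True,  False),
    SellerGroup("E", range(20, 25), (38, 45.6, 1), None,         True,  True),
]


def is_dishonest(group: SellerGroup) -> bool:
    """A group is dishonest when what it delivers differs from what it offers."""
    return group.delivered is not None and group.delivered != group.offered
```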
In addition, the following parameters are considered for sellers:
1. Quality is chosen equal to cost, to support the common assumption that it costs more to produce high quality goods. That is, a good of quality 38 costs exactly 38.
2. We define the maximum percent of profit as $\kappa = 0.2$. Therefore, according to equation (1), if a good costs 38, the maximum price that seller s can set is 45.6.
5. A seller can produce goods at a maximum quality of 50.
All buyers use the buyer agent algorithm proposed in this paper, with the following parameters:
1. For all buyers, the reputable thresholds for quality, price and delivery are equal to 0.4, while the corresponding disreputable thresholds are -0.8, -0.5 and -0.5, respectively. [...] (25). We also define $\lambda_p$ and $\lambda_d$ equal to 1.5.
The results of this experiment confirm that sellers that exploit the proposed algorithms (i.e., group E) achieve better satisfaction than the other sellers. In addition, buyers learn to focus their business on sellers that have reached enough reputation and to avoid interacting with disreputable ones. The average and total number of sales made by each of the five groups of sellers are shown in Table 7. Sellers of groups A and B are dishonest sellers that lie about quality and delivery, respectively. In real markets, it is expected that when buyers purchase from a seller who tries to cheat them, they will not deal with that seller in future purchases. Table 7 confirms this: each buyer purchases from any given dishonest seller no more than once. There are 20 buyers in the market and each of them was cheated by a given dishonest seller once. Therefore each dishonest seller can cheat each buyer one time and wins 20 auctions in total. Buyers model the reputation of a dishonest seller and, after being cheated, rate that seller below the disreputable threshold $\theta_b$ described in equation (15). In effect, buyers learn to stay away from disreputable sellers.
Sellers of group C offer goods with fixed quality, price and delivery. Although they may sell some of their goods in their first deals, the sellers of the other groups alter their bids to offer goods of high quality, so buyers no longer purchase from sellers of this group, which cannot meet the buyers' requirements. Sellers of group D alter their bids based on buyer requirements and achieve more sales than sellers of groups A, B and C.
The other parameters for buyers and sellers are similar to those considered in the previous section.
The results of this experiment show that buyers who apply the proposed algorithm (i.e., group V) achieve more satisfaction than the other buyers. Table 8 shows which group of sellers each group of buyers has focused its trade on. We know that sellers of group E make the best offers for buyers and are more honest than the other groups of sellers. So we expect buyers to focus their trades on sellers of group E and then D in order to obtain more satisfaction. Table 8 shows that buyer $b_0$ makes 51% of its purchases from sellers of group A, which are dishonest on quality, and 40% from sellers of group B, which are dishonest on delivery-time. $b_0$ makes just 2%, 2% and 5% of its purchases from groups C, D and E, respectively. Purchases from groups C, D and E are made at random: recall that buyer b, with probability ρ, chooses to explore (rather than exploit) the marketplace by randomly selecting a seller ŝ from the set of all sellers, as described in the buyer algorithm. Sellers of groups D and E alter their bids, but their bids obtain less value than group A's bids because sellers of group A offer very high quality and then cheat buyers. So if a buyer does not model the reputation of sellers, it assigns a very high value to the dishonest sellers' bids and selects them as winners in auctions much more than once. Buyer $b_5$ models only the reputation of sellers on quality and avoids interacting with sellers disreputable on quality. Table 8 shows that buyer $b_5$ makes 5% of its purchases from sellers of group A, who are dishonest on quality; that is, by modeling the reputation of sellers on quality, $b_5$ learns to avoid interacting with sellers who lie about the quality of their goods. But $b_5$ has made 78% of its purchases from group B. Sellers of group B attract buyers with high quality and a very short delivery-time, but they deliver their goods very late.
Buyers are expected not to interact with disreputable sellers, but because $b_5$ does not model the reputation of sellers on delivery-time, it was cheated much more than once by sellers of group B. The behavior of $b_{10}$ is like that of $b_5$, except that $b_{10}$ models only the reputation of sellers on delivery-time. Table 8 shows that $b_{10}$ makes 81% of its purchases from sellers of group A, because it does not model the reputation of sellers on quality and was cheated by sellers of group A, who lie about the quality of their goods. As we said before, it is reasonable for buyers to focus their trades on sellers of group E and then D to achieve more satisfaction. $b_{15}$ models the reputation of sellers on quality and delivery-time and avoids interacting with disreputable sellers. $b_{15}$ has made 5% of its purchases from sellers of group A and 5% from group B. This means that $b_{15}$ evaluates the reputation of sellers and avoids interacting with disreputable ones. $b_{15}$, which applies the buyer algorithm proposed in this paper, has obtained more satisfaction than buyers $b_0$, $b_5$ and $b_{10}$. $b_{15}$ learns to focus its trades on sellers that alter their bids and increase the quality of their goods (groups D and E), and in the long run learns to focus on sellers which in addition to altering bids and increasing the quality of goods consider