Mashups: A Literature Review and Classification Framework

Abstract: The evolution of the Web over the past few years has fostered the growth of a handful of new technologies (e.g., blogs, wikis, Web services). Recently, web mashups have emerged as the newest Web technology and have gained considerable momentum and attention from both academic and industry communities. Current mashup literature focuses on a wide array of issues, which can be partially explained by how new the topic is. However, to date, mashup literature lacks an articulation of the different subtopics of web mashup research. This study presents a broad review of mashup literature to help frame the subtopics in mashup research.


Introduction
In the past five years the web has experienced a surge in growth, a phenomenon described by O'Reilly [1] as the emergence of Web 2.0, a new trend for web applications that emphasizes services, participation, scalability, remixability, and collective intelligence. While some argue that the new applications emerging on the Web represent a gradual evolution of the Internet and not a new version of the Web [2], the term Web 2.0 is commonly used to refer to the current generation of social web applications. For example, the first metric used to evaluate eCommerce sites simply emphasized page views, but now eCommerce sites are evaluated by their cost per click [3]. In addition to eCommerce applications, other Web 2.0 applications include blogs and wikis, both of which foster communication, collaboration, work processes, and knowledge sharing [1,4-6]. Since Web 2.0 applications facilitate user involvement in contributing to information sources, there has been a vast increase in the amount of information and knowledge sources on the web. To foster the aggregation of differing information sources, a content-syndication protocol (Really Simple Syndication: RSS feeds) was developed that enables web sites to share their content with other applications [7]. While RSS feeds and web services provide the medium for aggregating differing information sources, the latest trend in Web 2.0 research focuses on mashup applications that are designed to synthesize knowledge by semantically connecting disjointed information and knowledge sources [8].
Web mashup research is gaining considerable momentum in both the academic and industry communities. However, to date, web mashup literature lacks an articulation of the different categories of research. To address this gap, a methodology was developed and applied to review web mashup literature. A review of 60 publications revealed the following six categories of mashup research: access control and cross communication, mashup integration, mashup agents, mashup frameworks, end user programming, and enterprise mashups. The remainder of this paper is structured as follows. First, the methodology used to review the literature is discussed. The literature composing each of the six research categories is then discussed, including related research and the foundational concepts that web mashups are built upon. Following the literature review, the characterizing attributes from each section are aggregated into an overall web mashup classification framework that can be used by researchers to frame future research, and by web developers to help anticipate the appropriate design attributes of mashup environments.

Review Methodology
Over the past few years web mashups have become a popular topic among both research and industry communities. However, the current body of mashup literature is rather disjointed, which can partially be explained by the newness of the topic. Another possible explanation is that web mashup literature lacks a thorough review that identifies its boundaries, common issues, and subtopics. Therefore, to address this need and help frame future research, a four-step methodology was developed and applied to review mashup literature and identify its boundaries and subtopics.
The first step in the review process was to gather as many mashup-related publications as possible. To do this, Google Scholar was used to search for publications with the following phrases in their title: mashup, mash-up, web mashups, web mash-ups, Web 2.0 mashups. The IEEE Xplore and ACM Portal libraries were also searched with the same keywords. Additionally, the references of each retrieved web mashup publication were reviewed to identify additional mashup publications that fell outside of our search criteria. As a result, 60 web mashup publications were gathered, which are summarized in Appendix A.

The second step in the review process was to identify the number of subtopics within web mashup literature. To accomplish this, all of the publications were reviewed and grouped based on the research questions addressed, the key terms describing the research, and the research methods used. This resulted in six groupings of mashup papers.

The third step of the review process consisted of developing names and definitions for each grouping of literature. The publications within each group were reviewed a second time to help develop names (access control, integration, agents, frameworks, end user programming, enterprise) and definitions for each group.

The final step of the review process was to ensure that each publication appropriately fit into the group it was originally placed in, after the group names and definitions were given. In this final review, 3 and 4 articles were moved from the Frameworks group to the Access Control and End User Programming groups, respectively. Both authors discussed this change in classification and decided that while the 7 articles moved were framework oriented, the focus of the frameworks and subsequent research was primarily on the groups they were moved to. One interesting observation was that 19 of the 60 web mashup publications came from the International World Wide Web Conference.

Literature Review
Web mashups currently attract considerable attention in industry and academic communities. The term mashup was derived from the music industry, where disc jockeys remix original content from various artists to create new material [9]. Accordingly, the idea behind a web mashup is to synthesize new information by reusing and combining existing content from disparate information sources. Mashups allow end users to combine information and knowledge from a plethora of sources and integrate them into customized, goal-oriented applications [10].
Web 2.0 mashups have emerged as a trend in website design that focuses on synthesizing new content by combining data and information from multiple sources in a unique way. Mashups have proliferated because of how quickly they can be created, given that they are composed of preexisting data [11-13]. For example, Huynh et al. [48] found that all users were able to learn their mashup interface and complete a mashup within 45 minutes. The expansion of mashups can be viewed from three different perspectives. The first area of expansion is simply the number of new mashups that are continually being created. Since their building blocks are existing data sources, building new mashups is a relatively quick process. Secondly, the number of sources being used to build mashups has also greatly expanded. Initially, many of the resources used in mashups were publicly available data (e.g., web services and RSS feeds), but now mashups are also being composed of databases, data warehouses, and even legacy systems [14,15]. The number of mashup research domains is a third area of expansion found in mashup literature; for example, Hoyer and Fischer [16] distinguish between consumer and enterprise mashups.
The remainder of this section presents the results of the literature review and categorization process described in the previous section. Table 1 contains category names and descriptions for the six groupings of mashup literature. The first five categories (access control and cross communication, integration, agents, frameworks, and end user programming) represent the interrelated technological challenges common to all web mashup applications, with the sixth category covering organizational topics that are present in the domain of enterprise mashups.

Access Control and Cross Communication
The first area of mashup research found in prior literature examines issues surrounding access control to mashup data and cross communication between back-end resources. Mashups connect disparate applications and data resources to provide their services. This requires mashup applications to gain access to disparate data sources that have a variety of sensitivity levels, from RSS feeds (open access) to legacy systems (restricted access). This creates security risks both for mashup users, who have to give their credentials to mashup sites, and for legacy system managers, who must open their systems to external access.

Table 1. Category names and descriptions for the six groupings of mashup literature.

Access Control and Cross Communication [17-19]: Mashups connect disjointed applications to provide unified services; however, access control to legacy systems is difficult and hinders a mashup's potential while increasing security risks to users who have to give their credentials to mashup sites.

Mashup Integration [11,14,20-23]: Mashups aggregate various types of data sources (e.g., databases, legacy systems, XML, dynamic web pages, and RSS feeds). This area of research addresses the data extraction obstacles presented by these different data sources.

Mashup Agents [8,24-29]: A promising potential of mashups is the ability to semantically determine information sources that are relevant to the user and autonomously include them in the mashup.

Mashup Frameworks [4,15,30-43]: While mashups are rapidly gaining popularity, their individual applications tend to be ad hoc, partially because of the vast differences in data sources and purpose. Researchers have identified the need for mashup frameworks to provide developers with a set of best practices.

End User Programming [10,28,44-56]: Enabling end users to create their own custom mashups is a major reason why mashups have gained such popularity. However, most end users do not have the technical expertise to develop mashups. To address this, researchers are developing end user programming languages and tools that enable nontechnical users to easily create mashups.

Enterprise Mashups [27,57-67]: Organizations have identified the potential strategic advantage that mashups could provide in terms of business intelligence. As such, business researchers are focusing on enterprise mashup issues such as accountability, design principles, and intranet deployment.

From a logistical and security perspective, the technical challenges involved in mashing legacy systems are very different from those involved in mashing multiple RSS feeds. In enterprise mashups, where the back-end resources being utilized by the mashup are legacy systems or databases, access control becomes a legitimate concern for several reasons [68]. First, it can be difficult to include back-end systems in the mashup because of the access control systems that they have in place. Often the workaround is for the user to give the mashup their credentials and for the mashup to impersonate the user. In situations where the legacy system or database is not designed to permit provisional mashup access, the mashup receives full access. If a malicious mashup were developed and gained access to a back-end system, there would be great risk to the owners and other users of that system. Similarly, if a legitimate mashup developed by a user included the user's credentials to one or more back-end systems and was compromised by a malicious attack, the user's credentials could be leaked to many different sources. This access control problem has been addressed by several publications and continues to receive attention [17-19,68].
There are three primary mechanisms being used to provide access control for mashups. The first approach is referred to as a "strawman" approach, in which a series of access checks are made to authenticate the user and then the mashup application is provided the user's credentials to allow it to access the system (i.e., it acts as the user when interacting with the application). Two examples of this approach are AuthSub [69] and OAuth [70]. AuthSub is Google's protocol for web services. AuthSub requires users of web applications to complete an "access consent" page, which provides the user with an authentication token. This token allows the web application to interface with Google, acting as an agent for the user. OAuth [70] is an open authentication specification that provides consumer applications requesting restricted data with an "unauthorized token" that is converted to an "authorized access token" when the user successfully logs in. Once the service provider receives the authorized token, it redirects the user to the consumer application, where the authorized token is exchanged for an access token and an accompanying token secret. One complexity of the OAuth [70] approach is that it requires the service to maintain state for all previously issued tokens. There are a number of drawbacks to "strawman" schemes for mashup delegation [68]. First, the mashup receives all of the user's privileges to the back-end system. Since the user cannot restrict the mashup's access within the back-end system, the user must completely trust the mashup. Additionally, a compromised mashup can leak user credentials, making the user, who is completely trusting the mashup, extremely vulnerable.
One alternative to the "strawman" approach is an approach that mimics real-life permit-based authorization schemes. Hasan et al. [68] propose a delegation permit model that allows a user to grant a mashup "delegation permits," which specify limited access rights to specific services. This approach to mashup delegation addresses two of the shortcomings of other methodologies. First, the user can restrict the scope of access available to the mashup in the back-end system. Second, it limits the length of time that the mashup has access to the system.
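The two properties attributed to delegation permits above (restricted scope and limited lifetime) can be illustrated with a minimal sketch. This is not the data model of Hasan et al. [68]; the class name `DelegationPermit`, its fields, and the service names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DelegationPermit:
    """Hypothetical permit: which mashup may call which services, and until when."""
    mashup_id: str
    allowed_services: frozenset
    expires_at: datetime

    def allows(self, mashup_id: str, service: str, now: datetime) -> bool:
        # Access requires the right holder, an in-scope service, and an unexpired permit.
        return (
            mashup_id == self.mashup_id
            and service in self.allowed_services
            and now < self.expires_at
        )


# Illustrative values only: a one-hour, read-only permit for a single service.
issued = datetime(2024, 1, 1, 12, 0)
permit = DelegationPermit(
    mashup_id="travel-mashup",
    allowed_services=frozenset({"calendar.read"}),
    expires_at=issued + timedelta(hours=1),
)
```

Unlike full delegation, the back-end system in this sketch can refuse a request that falls outside the permit's scope or lifetime without the user ever handing over a password.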
A second alternative for providing access control to mashup applications is OpenID 2.0 [19]. There are four entities in the OpenID 2.0 model: the user, the mashup, the claimed identification, and the identity provider. First, the user visits the mashup and provides a claimed identification, which specifies the identity provider they wish to use to access the mashup (e.g., users can log into many online mashups using Twitter, Facebook, Yahoo, or Google credentials). The mashup redirects the user to the identity provider, where the user can log in. The identity provider then redirects the user back to the mashup with cryptographic proof that the user has been authenticated, and also provides the mashup with any profile information the user chooses to release. Similar to the delegation permit model [68], OpenID 2.0 allows users to control the amount of time the mashup is authorized to access the identity provider [19].
As Table 2 illustrates, access control methods fall into one of three categories: anonymous, full delegation, and limited delegation. Anonymous access control describes situations where publicly published data sources are being accessed (e.g., web services, RSS feeds). Situations where the mashup is given the user's credentials, and the back-end system cannot differentiate between the user and the mashup, are categorized as full delegation (e.g., [69,70]). Finally, approaches that give the user some control over what access the mashup has to the data sources (e.g., [19,68]) are considered limited delegation, because both the amount of access and the length of time for which it is granted to the mashup are restricted.

Table 2. Access control and cross communication categories (entries recoverable from the source).
Access control: Full Delegation [69,70]; Limited Delegation [19,68]
Cross communication: Proxy [71]; Abstracted [17,18]

A fundamental feature of mashup applications is that they allow users to combine data from a variety of sources. However, when the underlying data resources are available from different providers, security issues can arise. Cross communication between multiple back-end systems poses a problem in mashup development because the current generation of web browsers cannot adequately support it. Currently, browsers are designed around a "same origin" policy, where data (or code) from one source can only interact with content from the same origin. Consequently, mashup developers must choose between security and functionality, which often results in allowing uncontrolled cross-domain execution through the use of <script> tags or in extending the browser with plug-ins for cross-domain interactions [17]. This sort of cross-site scripting introduces a security vulnerability that might enable malicious attackers to inject client-side script into web pages viewed by other users. It can also give one back-end system complete control over another [18]. Figure 1 illustrates a warning message from a Google personal page, where a gadget requires "inlining", which gives the gadget more control over the full Google page because it is not wrapped in an inline frame. This allows the gadget to change aspects of the Google page and may also allow the gadget to access the user's Google account [17].
Keukelaere et al. [18] propose a model designed to address the cross communication issue by abstracting content from differing sources through component-based encapsulation. Instead of being proxied or loaded through a <script> element, disparate components are loaded from their own servers and are then isolated from the mashup code. "This has security advantages in (1) not requiring the component to completely trust the mashup application, and (2) making the component abstraction compatible with password anti-phishing mechanisms that use the component (DNS) domain or the certificate of the SSL connection" [18, pg. 536]. Jackson and Wang [17] propose a similar approach that uses nested (mediating) frames to hinder direct communication between disparate web services, but because certain web development platforms are shying away from the use of frames (e.g., MS Visual Studio), this approach may face more obstacles in the future. Table 2 also illustrates the three ways that cross communication between disparate data sources can be handled in the browser: unrestricted cross communication, proxy cross communication, and abstracted cross communication. Unrestricted (full) cross communication occurs when developers bypass the browser's "same origin" security policy, using <script> tags or extending the browser with plug-ins for uncontrolled cross-domain interactions [17]. Proxied cross communication is more secure than unrestricted cross communication and occurs when a web application proxy server is used to fetch the disparate content from different servers and then serve it to the mashup [71]. From a security perspective, the most desirable approach to cross communication is abstraction, where content from differing sources is abstracted through component-based encapsulation [17,18].
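To make the "same origin" policy discussed above concrete, the following minimal sketch compares two URLs by the (scheme, host, port) triple that browsers conventionally treat as an origin. It is an illustration only, not any browser's actual implementation; the helper names `origin` and `same_origin` and the example URLs are assumptions.

```python
from urllib.parse import urlsplit

# Default ports assumed when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple used to compare origins."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    # Two resources may interact freely only if all three components match.
    return origin(url_a) == origin(url_b)
```

Under this check a mashup page served from one host cannot directly read content from a feed on another host, which is why the proxy and abstraction approaches described above exist.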

Mashup Integration
Web mashups offer considerable potential to both consumers and enterprises [16]. One of the most noted advantages of mashups is that people with little technical expertise can easily build new web applications and create new forms of visualization [38]. However, a handful of technical challenges need to be overcome before these benefits can be realized. This section looks at cross communication from a compatibility and syntactical perspective. When mashups aggregate heterogeneous data sources, it becomes difficult to convert, condense, and intelligibly communicate the summarization on a common web interface [21,23]. An example would be combining XML with output from a legacy COBOL system. In this situation, semantically predicting relevant mashups would be a challenge, given that output from the latter system contains little or no metadata. While the following section (Mashup Agents) covers the semantic understanding of heterogeneous data types, this section covers the conversion aspect of the problem. It has been labeled 'Mashup Integration' and includes topics such as integration, transformation, and cleansing.
Mashup integration is very similar to data integration, which has been well addressed in IS literature (e.g., [22,72-75]). While much of the data integration work was published well before the term 'web mashup' was coined, this literature provides a foundation for the problem currently being experienced in mashup integration. Two competing data integration models are Local as View (LAV) and Global as View (GAV) [72,75]. The LAV approach is based on the enterprise model and states that content from each source should be characterized in terms of a view over the global schema [75]. An example of the LAV approach in practice is a data integration system built on an enterprise model [76]. "This approach is effective whenever the data integration system is based on a global schema that is stable and well-established in the organization" [74, pg. 235]. Therefore, the LAV approach would be suitable for enterprise mashup domains. In contrast to LAV, GAV is based on the concept that the content from each source is treated as a view which is aggregated into a global schema [74]. Each source (or view) has its own query interface; therefore, GAV would be a good integration approach for consumer mashups that aggregate sources from multiple organizations.
The aforementioned integration approaches are good solutions for mashups that aggregate databases or data warehouses, but may not be warranted for aggregating XML sources. Thor et al. [23] propose an integration framework for XML that takes a transformation approach. Since mashups rely heavily on user interaction, the user should be included in the process when multiple sources are being transformed to a summary form [23]. They provide a "fuser" that takes multiple XML documents, or portions of XML documents, and merges them into one. From there, the user can make incremental refinements to the transformation process until a sufficient solution is developed. Murthy et al. [21] present a similar transformation integration process, but add an emphasis on first cleaning the data before it is transformed.
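The idea of merging several XML documents into one, as described above, can be sketched as follows. This is a deliberately naive illustration and not the fuser of Thor et al. [23]: the function name `fuse`, the `source` attribute, and the sample feeds are all hypothetical, and no schema alignment or user-driven refinement is attempted.

```python
import xml.etree.ElementTree as ET


def fuse(documents, root_tag="fused"):
    """Merge the top-level children of several XML documents under one root."""
    root = ET.Element(root_tag)
    for index, text in enumerate(documents):
        source_root = ET.fromstring(text)
        for child in source_root:
            # Record provenance so a user could later refine the merge per source.
            child.set("source", str(index))
            root.append(child)
    return root


# Illustrative inputs with different vocabularies.
feed_a = "<items><item><title>Concert</title></item></items>"
feed_b = "<events><event><name>Festival</name></event></events>"
merged = fuse([feed_a, feed_b])
```

The merged tree still contains two different vocabularies (`item` and `event`), which is exactly the point at which an interactive refinement or cleaning step, as described in [21,23], would be needed.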
While data integration literature provides a good foundation for web mashups, much of this literature is limited to homogeneous integrations. For example, the LAV and GAV integration approaches discussed previously focus on coupling databases together. Even if the databases being integrated are different (e.g., Oracle and SQL Server), this is still considered a homogeneous integration because the systems being coupled are of the same type. The same is true of the XML integration methods presented by Murthy et al. [21]. One area in which integration literature needs to be extended for mashup domains is the integration of heterogeneous data sources. This need is especially prevalent in enterprise mashup domains, where mashups integrate organizational legacy systems and databases with publicly available information sources like web services and RSS feeds. Sneed [14] approaches heterogeneous data integration by developing a framework that uses a Service Oriented Architecture (SOA) to convert legacy systems into XML and make them available as web services. Thor et al. [23] take a similar approach that utilizes XML wrappers to bring heterogeneous data sources to a common platform. In fact, both Sneed [14] and Thor et al. [23] share common ground with, and could benefit from, the Enterprise Application Integration (EAI) body of knowledge. EAI incorporates programming across multiple enterprise applications and provides workflow languages, messaging, and gateways to common enterprise applications (e.g., databases, application servers, and web servers), which is similar to the SOA approaches discussed in the literature [22].
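The wrapper idea described above, bringing legacy output to a common XML platform, can be illustrated with a small sketch. It assumes a hypothetical fixed-width record layout of the kind a legacy batch system might emit; the field names, column positions, and the function `wrap_record` are invented for illustration and do not come from Sneed [14] or Thor et al. [23].

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout: (field name, start column, end column).
LAYOUT = [("customer_id", 0, 6), ("name", 6, 26), ("balance", 26, 35)]


def wrap_record(line, layout=LAYOUT, tag="record"):
    """Convert one fixed-width legacy record into an XML element."""
    element = ET.Element(tag)
    for name, start, end in layout:
        # The layout supplies the field names (metadata) that the raw record lacks.
        ET.SubElement(element, name).text = line[start:end].strip()
    return element


# Illustrative 35-character record built to match the layout above.
legacy_line = "000042" + "Ada Lovelace".ljust(20) + "001250.75"
xml_text = ET.tostring(wrap_record(legacy_line), encoding="unicode")
```

Once the record is expressed as XML, it can be combined with feeds and web service responses using the same tooling, which is the practical benefit of the wrapper approach.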
As illustrated by Table 3, integration literature can be viewed from three different perspectives: homogeneous integration, heterogeneous integration, and application (process) integration. Homogeneous integration research focuses on coupling similar data from different sources; two examples are an enterprise mashup's coupling of legacy systems and a consumer mashup's coupling of web services. Heterogeneous integration research addresses domains that couple dissimilar data from different sources; an example is an enterprise mashup that couples legacy systems and public web services. Application integration focuses on coupling disparate business processes in enterprise mashups through methodologies such as SOA.

Mashup Agents
A large number of mashup environments are being designed for end users that lack significant technical expertise (e.g., [47-49,51]), which increases the need for tools that support the mashup creation process. These tools generally include algorithms, web crawlers, and other technologies that are designed to find potentially relevant sources of information. All of these tools can be classified as 'Mashup Agents', even though this term was not used in any of the publications that were reviewed.
Web services are a common ingredient in web mashups. Much work has been done on web service composition, including industry efforts on standard composition languages such as BPEL [77]. However, web service composition techniques are not suitable for mashups because they focus on describing messaging and synchronization processes, whereas mashups involve data integration and require content descriptions rather than process descriptions [28]. To address this shortcoming, Tatemura et al. [28] applied a machine learning approach and designed an agent-based framework that continually observes web service feeds over time to learn their patterns, in order to provide a semantic mapping of the web service's content. In this approach, Artificial Intelligence (AI) is used to determine the semantic meaning of the potential mashup ingredient itself. Another approach is to apply AI from the user's perspective, to predict and recommend potential mashup ingredients that the user may find useful. Wang et al. [29] do this by developing a Bayesian network that builds user profiles as representations of usage patterns. A Bayesian network is a probabilistic graphical model, a directed acyclic graph whose nodes are variables with conditional probabilities attached; in this case, each potential mashup ingredient is assigned a probability of relevance based on the user's previous browsing patterns. Natural Language Processing (NLP) is another AI technique that has been found to be an effective means for web service semantic discovery. Blake and Nowlan [8] implemented such an approach and developed an agent that performed pairwise comparisons of web services based on the commonalities between the services' inputs and outputs. An empirical evaluation of their agent showed that among the top 6 messages for mashup predictions were State, City, and Zip, which combined had a 29% prediction rate for valid mashups. In fact, location-based mashups could be considered a mashup sub-category of their own (e.g., [26,29,34,45]).
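The pairwise comparison of service inputs and outputs described above can be sketched in a few lines. This is not the NLP-based agent of Blake and Nowlan [8]; it substitutes a plain Jaccard overlap on exact message names, and the service catalogue, the function names, and the threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical service catalogue: message names each service consumes and produces.
SERVICES = {
    "weather": {"inputs": {"City", "State"}, "outputs": {"Forecast"}},
    "geocoder": {"inputs": {"Address"}, "outputs": {"City", "State", "Zip"}},
    "stores": {"inputs": {"Zip"}, "outputs": {"StoreList"}},
}


def overlap(producer, consumer):
    """Jaccard overlap between one service's outputs and another's inputs."""
    out, inp = producer["outputs"], consumer["inputs"]
    union = out | inp
    return len(out & inp) / len(union) if union else 0.0


def candidate_pairs(services, threshold=0.0):
    """Rank (producer, consumer, score) pairs whose messages overlap."""
    pairs = []
    for a, b in combinations(sorted(services), 2):
        # Check both directions, since either service may feed the other.
        for producer, consumer in ((a, b), (b, a)):
            score = overlap(services[producer], services[consumer])
            if score > threshold:
                pairs.append((producer, consumer, round(score, 2)))
    return sorted(pairs, key=lambda pair: -pair[2])
```

In this toy catalogue the location messages (City, State, Zip) are what link the services together, which mirrors the observation above that location data features prominently in predicted mashups.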
As illustrated by Table 4, mashup agents fall into one of two basic categories: server-based agents and client-based agents. Server-based agents are tools that reside on a web server, like Microsoft Popfly or Yahoo Pipes, whereas client-based agents are plug-ins that are installed in the user's browser. Server-based agents are the more popular of the two in industry and are widespread in academia too. For instance, Ikeda et al. [25] developed a mashup development framework that provides a server-based data management engine that enables the developer to identify semantic relationships and then browse based on semantic relevance. Similarly, Lu et al. [78] present a semantic agent that identifies similar APIs and then enables users to build server-based mashups.
That said, client-based agents have several advantages over server-based agents, including data access, privacy, performance, and user experience [24]. An ongoing research project titled MashMaker demonstrates the benefit of client-based mashup agents [24,46,47]. "MashMaker can see everything the browser can see, including local files, information on the intranet, information requiring a login, and active content generated by Javascript" [24, pg. 29], and therefore provides performance and privacy advantages that could not be realized by a server-based agent, since it does not need to transmit data to a central server in order to create the user interface [47].

Table 4. Classification of mashup agent literature by induction method and orientation.
Induction: Machine Learning [28]; Bayesian Networks [29]; Natural Language Processing [8]
Orientation: Server based [25,78]; Client based [24,46,47]

Two classifiers can be used consistently to categorize mashup agent literature: induction method and orientation. These are illustrated in Table 4. Despite the newness of web mashup research, three separate AI technologies have already been applied to mashup agents for semantic induction: machine learning, Bayesian networks, and natural language processing [8,28,29]. This raises the question, "What other AI technologies might be used to enhance the capacity and effectiveness of web mashup agents?", which is the first area of mashup agent research. The second and third areas of mashup agent research are server-based and client-based agents, respectively. As noted earlier, both have their advantages. Server-based agents are more widespread in industry, and thus may be easier to adopt initially. However, client-based agents offer many advantages over server-based agents because they do not require interactive communication with a hosting server and have access to local resources that are not available to server-based agents.

Mashup Frameworks
This section takes an aggregate view of mashups, reviewing frameworks that address the different attributes of mashups collectively. Frameworks are extremely important in the initial phases of a technology's lifecycle, as they provide a foundation that can be applied by early adopters in industry and evaluated (or extended) by researchers in academia. Since mashups are designed by end users for specific contexts with specific purposes, they tend to be very ad hoc in nature and span a wide variety of domains. As such, many frameworks have been developed to help designers develop mashups for different domains.
Mostarda and Palmisano [40] present a mashup framework that features a hybrid scripting language that supports unification, as it is based on the type morphing paradigm. Much like inheritance in object-oriented environments, type morphing is the ability of the language to cast any primitive type to another one where needed. The transformation of data across heterogeneous types is done by following a set of predefined rules. Through an interactive interface, users can iteratively overload (or mash) model elements until a sufficient solution is developed. Vancea et al. [15] also use an object-oriented data model for the exchange of data between disparate data models. Furthermore, they argue that current web mashup frameworks lack sufficient data models to handle data interchange and propose "a database-driven approach to web mashups that supports integration at the database level and enables mashup developers to work with a uniform abstract model and have direct access to powerful features of database systems" [15, pg. 162]. Abiteboul et al. [30] present a mashup model that enumerates the different roles of a mashup: querying data sources, importing other mashups, using external Web services, and specifying complex interaction patterns between its components. Their model is derived from the semantic web and from various object-oriented concepts such as inheritance.
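The notion of casting one primitive type to another by following predefined rules, as described above, can be illustrated with a small rule table. This sketch is not the scripting language of Mostarda and Palmisano [40]; the rule set `MORPH_RULES` and the function `morph` are hypothetical and cover only a few Python primitives.

```python
# Hypothetical rule table: (source type, target type) -> conversion function.
MORPH_RULES = {
    (str, int): int,
    (str, float): float,
    (int, float): float,
    (int, str): str,
    (float, str): str,
}


def morph(value, target):
    """Cast a primitive value to the target type using the predefined rules."""
    source = type(value)
    if source is target:
        return value
    try:
        rule = MORPH_RULES[(source, target)]
    except KeyError:
        # No rule is defined for this pair, so the cast is refused.
        raise TypeError(f"no morphing rule from {source.__name__} to {target.__name__}")
    return rule(value)
```

Keeping the rules in a table means a new conversion can be added without touching the casting logic, which fits the iterative way users are said to refine their mashups.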
One concept that is commonly discussed in mashup frameworks is the interactive (or iterative) process. This is especially true in the 'End User Programming' literature, discussed in the following section, and can be explained by the fact that multiple mashup variations are frequently considered before an acceptable solution is discovered. Cetin et al. [32] present a framework for migrating legacy systems toward service-oriented mashups. In their framework, they place emphasis on modeling the business requirements up front and then analyzing the existing legacy systems to see whether such functionality already exists and can be incorporated into the mashup. "The iterative mapping process is as follows: (a) if a business requirement can be satisfied by one of the existing legacy components, then simply wrap it by considering the QoS attributes; (b) if there is a gap with the existing legacy component and requirement, and the gap can be filled during service wrapping, then accustom the legacy component into a new service; (c) if the gap cannot be fulfilled, then develop a new service for the requirement" [32, pg. 171].
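The three-way decision quoted above can be sketched as a small Python function. This is only an illustrative reading of the quoted process, not the implementation of Cetin et al. [32]: the classes `LegacyComponent` and `Requirement`, the `adaptable` field, and the modeling of capabilities as plain sets of strings are all assumptions, and QoS attributes are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class LegacyComponent:
    name: str
    capabilities: Set[str]
    # Assumed field: capabilities that could be added during service wrapping.
    adaptable: Set[str] = field(default_factory=set)


@dataclass
class Requirement:
    name: str
    needed: Set[str]


def map_requirement(
    req: Requirement, components: List[LegacyComponent]
) -> Tuple[str, Optional[str]]:
    """Return (action, component name) for one business requirement."""
    # (a) An existing component already satisfies the requirement: wrap it.
    for comp in components:
        if req.needed <= comp.capabilities:
            return ("wrap", comp.name)
    # (b) A gap exists but can be filled during wrapping: adapt the component.
    for comp in components:
        gap = req.needed - comp.capabilities
        if gap <= comp.adaptable:
            return ("adapt", comp.name)
    # (c) The gap cannot be filled: develop a new service.
    return ("develop", None)
```

Running `map_requirement` over every modeled requirement yields a plan that states, per requirement, whether a legacy component is wrapped, adapted, or replaced by a newly developed service.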

End User Programming
Another challenge being addressed by mashup researchers is how to package mashup technologies in such a way that non-technical users can easily and effectively create mashup applications in a myriad of different domains. As reflected by the attention given to this topic in the literature, developing mashup environments that are conducive to non-technical end user development is one of the more difficult challenges. Of the 60 publications that were reviewed in Appendix A, 25% (15) can be classified as 'End User Programming'.
Two different approaches are commonly taken to enable end user mashup development. The first approach is passive by nature and focuses on designing plugins that work with the user's current browser and observe what the user is viewing in order to suggest related sources for potential mashing (e.g. [46,47,51,53]). There are several benefits to extending the user's current browser for the mashup process. First, the mashup process stays very close to the user's web browsing experience: if the user encounters data that they would like to manipulate, they do not have to launch a separate program to begin mashing [56]. Second, as mentioned previously in the client-based mashup agent discussion, it makes it easier to access local data on the user's machine that the browser normally has access to. Lastly, installing a browser plug-in is easier than installing a separate application, and thus may foster use [56]. A second approach to end user mashup development is proactive by nature and is necessary when the mashup process becomes more complicated (e.g. process modeling or advanced interface integration) [52]. Tuchinda et al. [54] present a tool that enables users to develop mashups in complicated integration domains by first providing examples of what the end mashup should look like; the tool then aims to mimic the format of the end result, allowing mashups to be developed by end users who do not have programming experience. Tatemura et al. [13] take a similar 'by example' approach in developing a tool that allows users to mash up disparate data sources by creating abstracted target schemas that are populated based on examples provided by the user. Mashroom is another end user mashing application that is based on the nested relational model and allows users to iteratively construct mashups through continuous refinement [55].
As mentioned previously, end user programming literature generally falls into one of two categories, passive or proactive. Passive approaches piggyback onto the user's current web browsing experience and function as browser plugins, whereas proactive approaches are executed from a separate application and are designed for more complicated mashup domains. Figure 2 illustrates a third approach that is beginning to appear in the literature as a middle ground between the passive and proactive approaches (e.g. [47]). In this scenario, the user's browsing experience begins as normal, with the mashup process being passive. The browser has a plugin installed that observes the user's browsing patterns. Next, the plugin activates a server-based inference engine (e.g. Google Suggest) that suggests potential mashup sources in a side bar. The user can continue browsing or, at any point, begin creating a mashup.

Enterprise Mashups
Business researchers have identified an emerging opportunity for organizations to use technology to exploit information from desktops, the web, and other non-traditional enterprise sources in order to react to situational business needs [65]. This stream of research has been labeled 'Enterprise Mashups' and is distinguished from consumer mashups because a plethora of legal and accountability-related issues (e.g. security, availability, quality) are specific to the organizational domain [16]. Similar to end user programming, enterprise mashups have received a significant amount of attention in mashup literature: of the 60 publications reviewed for this study, 20% (12) were classified as enterprise mashups.
Two areas of existing enterprise-related research that support enterprise mashup domains are Service-oriented Architecture and Enterprise Information Integration [27,61]. Service-oriented Architectures (SOAs) are based on semantic web services and have widely been considered a key technology for achieving business-to-business integration within corporate intranets [79]. However, as Web 2.0 concepts are applied to enterprise domains (e.g. enterprise mashups), the need to offer a user-centered focus to improve business productivity and innovation has been revealed [80]. This user-centered approach is beyond the functionality of traditional B2B SOAs [27,61]. Therefore, a major focus of enterprise mashup research is on developing tools and frameworks that extend SOAs into mashup-able applications.
Lizcano et al. [27] highlight the shortcomings of SOAs in mashup domains and present two proof-of-concept research projects (FAST and EzWeb) for enterprise 2.0 mashups. DAMIA is another research project that focuses on enabling the creation of enterprise mashups that combine data from desktop, web, and organizational IT sources into feeds that can be utilized by user-created web applications [57,65]. Siebeck et al. [64] highlight the advantages of using cloud computing infrastructures as a platform for B2B integration mashups. As indicated by the investments in research, organizations are beginning to see the potential impact that enterprise mashups could have on competitive advantage.

Mashup Classification Framework
Mashups tend to be ad hoc in nature, as they are created by end users for specific problems. It could be suggested that no two mashups are alike, each being designed to fulfill a very specific need. This presents a unique challenge for researchers seeking to understand mashups and for developers seeking to provide tools to support this diverse domain. Every mashup could potentially have its own framework tailored to the attributes of the domain in which it is developed. However, this would be of little use to researchers or developers. Instead, this study seeks to find similarities amongst the differences and to provide system designers with a framework that outlines the major design characteristics of mashup development projects. The Mashup Classification Framework presented in Table 5 synthesizes the primary attributes of mashup applications identified during the literature review. It provides an overview of mashup design options that can be used by practitioners seeking to define their mashup architecture. In addition to being a tool that supports mashup design, the Mashup Classification Framework can also benefit the academic community. First, it provides researchers with a comprehensive view of the technologies involved in mashup development. Second, by breaking down the technologies involved in web mashups, the framework provides insight into other bodies of literature that mashups can be built upon and derived from (e.g. Mashup Integration → Data Integration; Mashup Agents → AI; Mashup Interfaces → Human-Computer Interaction). Finally, in the same way that practitioners can use this framework to identify system attributes, researchers can use it to identify and focus their research on various aspects of mashups and the interactions between them. To provide an example of this, eight mashup applications have been selected from the literature review and are classified in terms of the Mashup Classification Framework in Table 6.

Future Research
Over the past few years web mashups have generated buzz in research communities. However, as with any new technology, there remain many unanswered questions. In terms of access control, researchers are beginning to tackle the problem of providing proper access control for mashups and differentiating mashup agents from actual users. Much work remains in this area, especially with regard to enabling legacy systems to differentiate between actual users and mashup agents. Similar security-related issues remain in managing cross communication between disparate data sources in web browsers. As Web 2.0 technologies have advanced, the old "same origin" security policy still used by web browsers today is no longer sufficient. With promising success, researchers have begun developing delegation frameworks that work with existing browsers (e.g. [18,68]). However, another approach that could be investigated would be to develop a next generation of web browsers that are better suited to the security and integration issues prevalent in Web 2.0 technologies.
To date, the majority of web mashup literature focuses on extending the capabilities and functionality of this new technology. Unlike other, more mature IS research topics, web mashup literature lacks the application of theoretical constructs (e.g. Trust, Perceived Usefulness, Computer Self-Efficacy). Computer Self-Efficacy likely plays a large role in the successful application of web mashup infrastructures because, generally speaking, they are aimed at end users who develop applications but lack advanced technical experience. Additionally, Trust and Personal Innovativeness would likely be applicable constructs for web mashups applied to the domain of decision support (e.g. [81]). That being said, one observation that should be noted is the common inference that the user's mashup process is iterative (e.g. [28,47,54,56]). For example, Huynh et al. [48, pg. 13] state that they believe "the users actually work iteratively on data, switching from aligning and clean up the data to using the data, and back, as they get to know the data better over time". This is important to mention because none of the literature reviewed investigated this aspect of the mashup process directly; it was only discussed peripherally, which suggests a gap in the literature. Another opportunity for future research would be to apply a clustering algorithm, such as k-means, to the reviewed papers to see how each relates to the proposed categorization of Table 1.
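As a rough indication of what such a clustering study could involve, the following Python sketch implements plain k-means (Lloyd's algorithm) using only the standard library. The feature vectors are invented for illustration: each hypothetical paper is represented by counts of three made-up topic terms, and the first k points are used as initial centroids to keep the run deterministic. A real study would need a principled text representation and initialization.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label for every point.

    Simplifying assumption: the first k points serve as initial centroids.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


# Invented term counts per paper: [security, integration, end user].
papers = [[5, 0, 1], [0, 6, 1], [0, 1, 7], [4, 1, 0], [1, 5, 0], [1, 0, 6]]
labels = kmeans(papers, k=3)
```

The resulting labels could then be compared against the manually assigned categories to see where the automatic grouping agrees or disagrees with the proposed classification.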

Conclusions
Web mashups are an interesting and exciting Web 2.0 application that is receiving considerable attention from both practitioners and researchers alike. The concept of web mashups was derived from the music industry, where DJs integrate and mix multiple tracks into a new track. While conceptually similar, web mashups are considerably more complicated than mixing music, as they are a way to create new web applications by combining existing resources, data and APIs [82,83]. This research focused on synthesizing the information available on mashups so that researchers and practitioners can better understand the scope of, and challenges associated with, this emerging research domain. This study revealed five common themes found across multiple research studies. First, because mashups aggregate content from disparate information systems, access control is a legitimate concern, especially in domains involving legacy systems that are not designed for delegated access control (e.g. [19,68]). Additionally, cross communication is another security challenge in today's web browsers, which operate under a "same origin" security policy (e.g. [17,18]). Second, integration was identified as a significant technical challenge encountered when developing mashups. While integration has been addressed in IS literature, web mashups present new challenges such as heterogeneous (e.g. mashing a web service with a legacy system) and application (e.g. business process) integration [11,14,20-22]. A third issue is related to information overload, because there are simply too many information sources for individuals to process when selecting the best possible mashup resources. To address this problem, researchers are applying artificial intelligence methods to mashup agents that go out to the web and retrieve appropriate mashup ingredients. Prior research has identified machine learning, Bayesian networks, and natural language processing as induction methods that can be used by mashup agents (e.g. [8,28,29]), but other methods such as neural networks could also prove useful. A fourth issue identified during the literature review was the ability of end users to create custom mashup applications. This research focused on the alternatives available for user interfaces, which can utilize a passive or proactive mashup approach (e.g. [46,47,51,53,56]). Finally, this research identified a set of issues, including security, availability, and quality issues, that are unique to enterprise mashups (e.g. [27,57,60,64,65,84]). The primary contribution of this study is the development of a mashup classification framework that is rooted in prior research but extends that research to provide a tool for designing and classifying mashups. This framework can be used by researchers and practitioners alike as mashups evolve to solve even more complex problems.

Mashup Framework [40] MU: A Hybrid Language for Web Mashups.
A scripting language is presented that provides support for unification, is based on the type morphing paradigm, provides user interface induction, and defines both Java runtime environment and JavaScript profiles.
Mashup Framework [15]
A database-driven approach to web mashups is presented that allows data integration and mashup logic to be managed within a database, enabling developers to work with a uniform abstract model and to have direct access to powerful features of database systems.
Mashup Framework [30] Modeling the Mashup Space.
A mashup model is presented that quantifies the different roles of mashups which are: query data sources, import other mashups, use external Web services, and specify complex interaction patterns between its components.
Mashup Framework [31] Two Cultures: Mashup Web 2.0 and the Semantic Web.
The differences between Web 2.0 and the Semantic Web are disputed by reinforcing their commonalities. The authors advocate a paradigm shift from an overly machine-centered AI view of the Semantic Web towards a more user- and community-centered approach that draws from the insights of Web 2.0.
Mashup Frameworks [4] Semantic Blogging and Decentralized Knowledge Management.
Presents a framework to satisfy the organizational need for a decentralized, informal knowledge management system. The framework is a middle ground between blogging and mashups, where multiple users can centrally contribute to decentralized information.
Mashup Frameworks [41] A Methodology for Quality-based Mashup of Data Source.
A review of mashup literature is conducted, and the need for a mashup framework is identified. The authors present a framework that promotes mashup quality by focusing on inputs.

Figure 2. Classifications of Web Mashup User Interfaces.

Table 2. Classifications for Cross Communication and Access Control.
A mashup development framework is presented. It is based on a data management engine to enable the developer to identify semantic relationships, and then to browse based on semantic relevance.