CWM Global Search: The Internet Search Engine for Chemists and Biologists

CWM Global Search is a meta-search engine that allows chemists and biologists to search the major chemical and biological databases on the Internet by structure, synonym, CAS Registry Number and free text. A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source [1]. CWM Global Search is a web application that has many of the characteristics of a desktop application (a so-called Rich Internet Application, RIA), and it runs on both Windows and Macintosh platforms. The application is one of the first RIAs for scientists. It can be started using the URL http://cwmglobalsearch.com/gsweb.


Introduction
The Internet is changing how we work [2]. The Internet will also become the major information tool for chemists and biologists. Google has changed how we search and what we expect for answers. Our brain adjusts to the way we work [3]. We have to realize that the Internet is not a simple tool, but something that can control us, and this can be a frightening thought. We get used to always finding an answer, but we must take care to search for the best answer rather than settle for the first acceptable one; otherwise we lose in efficiency what we gain in speed.

Chemists have a powerful language, the chemical structure, and this is not in the realm of Google or of general-purpose sites such as Wikipedia. Databases cannot be indexed like simple HTML pages, so Google and other search engines cover only a fraction of the knowledge. Databases on the Internet can, however, be searched through web services and CGI (Common Gateway Interface) scripts.

Product Description
The web application CWM Global Search uses the chemical structure and web services to search Internet websites and Internet databases. CWM Global Search is not a database and does not rely on an index; it relies on freely accessible web services and freely accessible scripting interfaces to websites [4]. There is no database to update. CWM Global Search always produces results that reflect what is currently available on the Internet.
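The meta-search principle described above (send one query to several sources, then present the hits grouped by source) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source functions and their example.org-style URLs are hypothetical stand-ins for real web-service calls, and nothing here reflects how CWM Global Search is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
from urllib.parse import quote


# Hypothetical stand-ins for real web-service calls. Each takes a query
# string and returns a list of hit URLs.
def search_source_a(query: str) -> List[str]:
    return [f"https://source-a.example/search?q={quote(query)}"]


def search_source_b(query: str) -> List[str]:
    return []  # this source has no hit for the query


SOURCES: Dict[str, Callable[[str], List[str]]] = {
    "SourceA": search_source_a,
    "SourceB": search_source_b,
}


def meta_search(query: str, sources=SOURCES) -> Dict[str, List[str]]:
    """Send the query to every source in parallel and group hits by source."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        return {name: future.result() for name, future in futures.items()}
```

Because nothing is stored between calls, every invocation reflects whatever the sources return at that moment, which is the property the paragraph above emphasizes.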
CWM Global Search allows all major, freely accessible chemical databases hosted on the Internet to be searched using a chemical structure, CAS Registry Number or synonym as a query.
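Since a query may be a CAS Registry Number, a structure identifier or a synonym, a front end has to decide which kind of text it received. The following sketch shows one possible way to do this; the function names are hypothetical and the program's actual logic is not described in this paper. It relies on two published conventions: the CAS check digit (digits weighted 1, 2, 3, ... from the right, summed modulo 10) and the fixed 14-10-1 letter layout of an InChIKey. Drawn structures (molfiles) are not covered.

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_cas(text: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(text.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_query(text: str) -> str:
    """Return 'cas', 'inchi', 'inchikey' or 'text' for a query string."""
    text = text.strip()
    if is_valid_cas(text):
        return "cas"
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_PATTERN.match(text):
        return "inchikey"
    return "text"  # treated as a synonym or other free text
```

For example, `classify_query("7732-18-5")` yields `"cas"`, while a string with a wrong check digit such as `"7732-18-4"` falls through to `"text"`.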
The program generates InChI identifiers (both InChI names and InChI keys), molfiles and SMILES strings from the query structure and subsequently locates those identifiers on the Internet.
The user interface has a Quick Search and a Global Search. In the Quick Search, the user enters a structure, name or CAS Registry Number. In addition to the structure, the program locates the IUPAC name, synonyms, brand names, other known identifiers and CAS Registry Numbers using the NCI/CADD Chemical Identifier Resolver [5]. Some hyperlinks are presented that may already be sufficient for the user. This first search is very fast. An example of the Quick Search results is shown in Figure 1. The results of the Quick Search (structure, names and CAS Registry Numbers) can be selected for a more comprehensive search using the Global Search. The results allow "cross-linking". For example, Quick Search can be used to get a list of synonyms for 'maslinic acid', and this list can then be used in a Global Search. The search then finds the natural occurrence of this compound in a Wikipedia article that mentions only 'crategolic acid' (one of the known synonyms for maslinic acid returned by the Quick Search) and nowhere mentions 'maslinic acid'.
In Global Search, one can currently search over 46 different databases and Google. These include chemistry-centric databases such as ChemSpider, databases specialized in finding commercial suppliers such as eMolecules, databases with a focus on biological data such as PubChem, KEGG, ChEBI and DrugBank, databases containing patent information such as SureChem, and literature databases such as MedLine. Some sources are gateways, like Open J-Gate, which is an electronic gateway to global journal literature in the open access domain. A complete list of the sources can be found in Appendix 1.
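The Quick Search lookup described above can be illustrated with the public URL interface of the NCI/CADD Chemical Identifier Resolver [5]. The sketch below only builds request URLs; it does not contact the service. The URL pattern and the representation names are given as they are commonly documented for the resolver and should be checked against its current documentation. How CWM Global Search itself calls the resolver is not stated in this paper, so the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed base URL of the resolver's REST-style interface.
RESOLVER_BASE = "https://cactus.nci.nih.gov/chemical/structure"

# Representations assumed to be offered by the resolver.
QUICK_SEARCH_REPRESENTATIONS = (
    "iupac_name", "names", "cas", "stdinchi", "stdinchikey", "smiles",
)


def resolver_url(identifier: str, representation: str) -> str:
    """Build the URL that asks the resolver to convert one identifier."""
    return f"{RESOLVER_BASE}/{quote(identifier, safe='')}/{representation}"


def quick_search_urls(identifier: str) -> dict:
    """One request URL per representation a Quick Search might display."""
    return {rep: resolver_url(identifier, rep)
            for rep in QUICK_SEARCH_REPRESENTATIONS}


# Fetching a result would look like this (not executed here):
# from urllib.request import urlopen
# with urlopen(resolver_url("maslinic acid", "names"), timeout=10) as resp:
#     synonyms = resp.read().decode("utf-8").splitlines()
```

The returned synonym list is exactly the kind of input that can then be passed on to a Global Search, as in the maslinic acid example.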
The user can select the data sources using predefined profiles (chemistry, biology, availability, safety, etc.) and can also create custom profiles; see Figure 2.
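A profile is essentially a named set of data sources. The following minimal sketch shows how predefined and user-defined profiles could be represented and combined. The profile contents are assumptions chosen from the sources mentioned in this paper; the actual profile definitions of CWM Global Search are not given here.

```python
# Assumed profile contents, for illustration only.
PREDEFINED_PROFILES = {
    "chemistry": {"ChemSpider", "PubChem", "ChEBI"},
    "biology": {"PubChem", "KEGG", "ChEBI", "DrugBank"},
    "availability": {"eMolecules"},
}


def add_profile(profiles: dict, name: str, sources) -> dict:
    """Return a copy of the profiles with a user-defined profile added."""
    updated = dict(profiles)
    updated[name] = set(sources)
    return updated


def select_sources(profiles: dict, profile_names) -> list:
    """Union of all sources in the chosen profiles, sorted for display."""
    selected = set()
    for name in profile_names:
        selected |= profiles[name]
    return sorted(selected)
```

Using sets means that a source contained in several selected profiles is queried only once.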
The search returns a collection of hyperlinks that allow direct drill-down to the corresponding database to look at the data associated with the query. These drill-down pages are parsed by CWM Global Search to generate facts. Facts are true/false indicators that tell the user what kind of data can potentially be found in the corresponding database for the query. The result of the search is a grid with links and highlighted facts; see Figure 3. Examples of such facts are the presence of commercial suppliers, the availability of safety information or spectra, and the presence of known biological activities associated with the query. CWM Global Search supports exact structure searches, isomer searches (tautomers and stereoisomers), as well as substructure and similarity searches. At the moment, only PubChem and ChEBI support a substructure and similarity search using the drawn structure without query features. Structures can be copied and pasted from any of the major chemical drawing programs, or can be drawn directly in the JDraw [6] applet.
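The idea of deriving true/false facts from a drill-down page can be sketched as follows. This is a deliberately simple keyword approach and an assumption on our part: the keyword lists are hypothetical, and the parsing rules that CWM Global Search actually applies to each database are not described in this paper.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical keywords that signal each kind of data on a page.
FACT_KEYWORDS = {
    "suppliers": ("supplier", "vendor"),
    "safety": ("safety", "hazard", "msds"),
    "spectra": ("spectrum", "spectra"),
    "bioactivity": ("bioactivity", "biological activity", "bioassay"),
}


def extract_facts(html: str) -> dict:
    """Map each fact name to True if one of its keywords occurs in the page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return {fact: any(keyword in text for keyword in keywords)
            for fact, keywords in FACT_KEYWORDS.items()}
```

Each row of the result grid would then carry one such dictionary next to its hyperlink, so that the user can see at a glance which databases are worth opening.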
CWM Global Search supports single-structure searches and multiple-structure queries via SDFiles. In addition, the program supports reaction-based queries via RXN structures and/or RDFiles. If a reaction is used as a query, CWM Global Search searches the Internet for all reactants and products contained in the reaction. This is not a reaction search, but it is useful for finding suppliers of starting materials while at the same time confirming that the product itself is not commercially available.
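How a reaction query decomposes into individual structure searches can be shown with a small sketch. For brevity it uses reaction SMILES (`reactants>agents>products`, components separated by `.`) as a stand-in for the RXN and RDFile formats that the program actually accepts; the function names are hypothetical. Note that splitting on `.` also separates the ions of a salt, which a real implementation would have to handle.

```python
def reaction_components(reaction_smiles: str) -> dict:
    """Split a reaction SMILES into reactant, agent and product lists."""
    parts = reaction_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError("expected the form 'reactants>agents>products'")
    roles = ("reactants", "agents", "products")
    return {role: [smiles for smiles in part.split(".") if smiles]
            for role, part in zip(roles, parts)}


def structures_to_search(reaction_smiles: str) -> list:
    """Unique reactants and products, in order, each searched on its own."""
    components = reaction_components(reaction_smiles)
    unique = []
    for role in ("reactants", "products"):
        for smiles in components[role]:
            if smiles not in unique:
                unique.append(smiles)
    return unique


# Esterification of acetic acid with ethanol, no agents given.
ESTERIFICATION = "CC(=O)O.CCO>>CC(=O)OCC.O"
```

Each entry returned by `structures_to_search(ESTERIFICATION)` would be submitted as an ordinary structure query, which is why the result is a set of per-compound hits rather than a reaction search.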

Additional Features
In the Quick Search and Global Search results page, a button "chemicalize" links to a page that displays calculated values for the compound; see

What are the Differences between CWM Global Search and Other Systems such as PubChem or ChemSpider?
Unlike other systems such as PubChem or ChemSpider, CWM Global Search is not a database. We always search the most current state of the supported data sources, which ensures that recently added records in the various data sources can be located.
This eliminates the problem that links to newly added PubChem records in ChemSpider, and vice versa, may not be found because of pending updates in the underlying database.
A major strength of CWM Global Search is the chemical structure search via the integrated structure editor (JDraw from Accelrys, Inc.). This structure editor supports copy/paste operations with major chemical drawing packages, allowing users to keep using their favorite structure drawing tool. Another unique feature of CWM Global Search allows you to draw a reaction; with one click you can find information such as suppliers or safety information for all reactants, reagents and products.
Comprehensive search: with a single click you can search for a structure, one or many associated CAS Registry Numbers, plus an arbitrary list of associated free text such as synonyms, brand names and identifiers.

What are the Differences to SciFinder [8]?
SciFinder is the user interface for the world's largest chemistry database, produced by the Chemical Abstracts Service (CAS), a division of the American Chemical Society. However, it can only be accessed for a fee. While most academic institutions have access to SciFinder at an academic rate, many small and medium-sized companies simply cannot afford these fees; for them, Global Search is a very important first alternative. In all patent-related cases, the most important question is whether a given structure is known or novel.
The most important aspect of the comparison with SciFinder is that one cannot expect to find all answers to a given query in SciFinder. Many cases are known in which the use of Global Search produced important references that were not found in SciFinder. Thus, Global Search will provide the occasional user with quick answers, and it will give the professional information specialist the very important extra certainty that the search was as comprehensive as possible. The strength of CWM Global Search is its access to additional sources that are not considered publications, such as entries in Wikipedia and databases.

How to Start CWM Global Search?
You can start the application using the URL http://cwmglobalsearch.com/gsweb. If you start the application for the first time, you might be asked to install the Microsoft Silverlight plug-in. We support Internet Explorer and Firefox on Windows and Safari on Macintosh computers. Detailed information about the program can be found on the CWM Global Search homepage: www.akosgmbh.de/globalsearch. The free version is limited to Google, PubChem, ChemSpider, AKos Samples and the SureChem patent database. For the 40+ databases supported by the commercial version, the free version only shows how many search results were found and does not provide hyperlinks to the actual data.

Considerations
CWM Global Search relies on web services to generate the InChI names and keys, and on the availability of websites to search the underlying sources. We have no control over these web services and websites; they can be down for maintenance, moved to another location, or turned off. We periodically run searches to check for such issues, but users should be aware that a query sometimes has to be re-executed once the web service or website is available again. In our experience, the failure rate of the web services and websites is very low. Since the whole application, including the search engine, is hosted on our server, we can upload a new version at any time without involving the user, and we will do so because interesting new sources can be added monthly.
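The consequence of depending on external services, namely that a query may have to be re-executed for sources that were temporarily unavailable, can be sketched as follows. The function names are hypothetical and the sketch makes no claim about how CWM Global Search handles failures internally. Sources are modeled as plain callables that raise an `OSError` subclass (for example `ConnectionError` or `urllib.error.URLError`) when the service cannot be reached.

```python
def search_all(query, sources):
    """Query each source once; return the hits and the names that failed."""
    results, failed = {}, []
    for name, search in sources.items():
        try:
            results[name] = search(query)
        except OSError:  # network-level errors; URLError is a subclass
            failed.append(name)
    return results, failed


def retry_failed(query, sources, results, failed):
    """Re-run the query only for the sources that failed earlier."""
    retry_sources = {name: sources[name] for name in failed}
    new_results, still_failed = search_all(query, retry_sources)
    return {**results, **new_results}, still_failed
```

Keeping the list of failed sources lets the user repeat the query later for just those sources instead of running the whole search again.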
We should also mention our business model. We try to keep the license fee as low as possible and supplement it by giving sponsors room for their advertisements. A user of CWM Global Search will not interrupt work to click on an advertisement, at least not very often. Therefore, the information that the sponsor wants to convey must be in the advertisement itself. We run a slide show that gives each advertisement enough space to display an essential message. Each picture stays on the screen for a certain amount of time before the slide show moves on to the next one.

Figure 1. The CWM Global Search user interface, Quick Search: synonym search for maslinic acid.

Figure 2. The CWM Global Search user interface, Global Search: combined CAS Registry Number, synonym and structure search for maslinic acid.

Figure 3. Global Search result page for the combined CAS Registry Number, synonym and structure search for maslinic acid.

Figure 4. Chemicalize (www.chemicalize.org) is developed by ChemAxon. Chemicalize uses ChemAxon's name-to-structure parsing to identify chemical structures in web pages and other text sources. It provides a large variety of predicted data related to each structure [7].

Table 1. Cont.
The Chemical Database will allow the user to retrieve information for any of 25,496 hazardous chemicals or 'generic' entries based on a keyword search. http://ull.chemistry.uakron.edu/erd
Chemicalland21.com aims to be a resource of individual chemical information including technical data, safety data, and related compounds. http://chemicalland21.com
This database allows users to search the NLM ChemIDplus database of over 370,000 chemicals. http://chem.sis.nlm.nih.gov/chemidplus/
ChemSpider hosts the largest and most diverse online database of chemical structures sourced from over 150 different data sources. http://www.chemspider.com/
ChemSynthesis is a database of compounds with their synthesis references and physical properties. http://www.chemsynthesis.com/
ClinicalTrials.gov is a registry of federally and privately supported clinical trials conducted in the United States and around the world. http://clinicaltrials.gov/
ChEBI CiteXplore combines literature search with text mining tools for biology. http://www.ebi.ac.uk/citexplore
Chemicals. CTD integrates a chemical subset of the Medical Subject Headings (MeSH®), the hierarchical vocabulary from the U.S. National Library of Medicine. http://ctd.mdibl.org/
The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure, and pathway) information. http://www.drugbank.ca/
The Directory of Open Access Journals (DOAJ) lists open access journals, that is, scientific and scholarly journals that meet high quality standards by exercising peer review or editorial quality control. http://www.doaj.org
Envirofacts contains chemical data from several different program system databases: the Aerometric Information Retrieval System, the Permit Compliance System, and the Toxics Release Inventory System. http://www.epa.gov/envirofw/
Find Suppliers and Information for over 8 million unique chemicals! http://www.emolecules.com/