FROM GENOTYPE TO PHENOTYPE FUTURE PERSPECTIVES ON DATA AND SERVICE INTEGRATION
ADVANCED TOPICS IN COMPUTER ENGINEERING: BIOINFORMATICS
Doctoral Program in Computer Engineering 2008/2009
Pedro Lopes | pedrolopes@ua.pt
TABLE OF CONTENTS
Introduction – The GEN2PHEN Project
Integration Scenarios and Related Work
    Semantic Web
    Social Environments
    Integration
    Summary
Our Ongoing Developments
    DynamicFlow
    DiseaseCard
    Summary
Future Perspectives
    Cloud-computing
    Information Integration
    Data Visualization
    Summary
Conclusion
References
INTRODUCTION – THE GEN2PHEN PROJECT
Bioinformatics is emerging as one of the fastest-growing areas of computer science. Recent hardware and software developments show an evolution faster than Moore’s Law predictions. This development began with the Human Genome Project¹, which succeeded in decoding the complete human genetic code. It generated a tremendous amount of readily available information, and the scientific community rapidly started designing applications, increasing the amount of resources needed in this area. Following the Human Genome Project came the Human Variome Project², which aims to collect information about genome variations and their influence on human health. Alongside the latter, the European Community is sponsoring a bioinformatics project in its Seventh Framework Programme: Genotype to Phenotype Databases: a Holistic Solution (GEN2PHEN)³.
The GEN2PHEN Project is a collaborative project with 19 partners, most of them European institutions with relevant work in bioinformatics. GEN2PHEN is an ambitious project that aims to unify human and model organism genetic variation databases, enabling the creation of a central genome browser able to blend GEN2PHEN data with medical data. The overall goal is to create a complete biomedical knowledge environment. The strategy and objectives of this project may be divided into several research areas:
• Analyze the genotype-to-phenotype field and investigate current needs and practices in order to obtain complete knowledge of other ongoing projects with similar objectives. The active biology community must be consulted in order to develop an accurate state-of-the-art document that describes the general process in the field and identifies what this particular area is lacking and which models and technologies are being used effectively.
¹ Human Genome Project: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
² Human Variome Project: http://www.humanvariomeproject.org
³ GEN2PHEN: http://www.gen2phen.org
• Develop standards for the genotype-to-phenotype field of research in order to speed up the standardization process with new data models, nomenclature and technology standards.
• Create generic database components, services and integration infrastructures for the genotype-to-phenotype domain. These solutions will mostly be web applications that apply new interface usability standards and are customized to their end users. Solutions for genetic and genomic databases will be developed. This objective aims to create a central GEN2PHEN database spanning all the research areas, as well as a simpler application that any research group can deploy.
• Create data search and presentation solutions for genotype-to-phenotype knowledge. The applications designed to fulfil the previous objective won’t be complete without proper search mechanisms that encompass information distributed across different applications and architecture layers. The applications must also have an effective interface layer designed around the community’s requests.
• Facilitate the population of research and diagnostic genotype-to-phenotype databases by developing new tools and promoting them in the scientific community. The newly developed applications will also support more efficient methods for data insertion, allowing anyone to collaborate in this project.
• Build a major genotype-to-phenotype Internet portal, a GEN2PHEN knowledge centre. This portal will contain all GEN2PHEN-related information, ranging from calendars to databases, from publications to discussion forums.
• Deploy the developed solutions to the community in order to increase researchers’ interest and participation. Several resources will be devoted to advertising and explaining the developed solutions and to training researchers in their use.
The project’s main focus is on developing and promoting a new generation of applications that will aid different types of researchers in their scientific work and, at the same time, gather and integrate information from different sources, which will be shared with the community. GEN2PHEN applications have to be state-of-the-art web applications. It is important to study the most popular Web2.0 (and, next, Web3.0) applications in order to improve developers’ knowledge of what captivates users, thereby increasing the interest of the general biomedical community. This research should focus mainly on user interaction issues such as usability, interfaces, “quality of service” and overall user satisfaction. This new wave of applications has to address issues such as semantic data integration, user collaboration, information sharing and improvements to search engine algorithms.
Fig. 1 GEN2PHEN strategy
Developing a simple Rich Internet Application is, by now, a fairly trivial process that does not require great software engineering or programming knowledge. However, bioinformatics and biomedicine don’t depend only on good-looking interfaces. What matters, and this is the difficult part, is what’s under the hood. Going deeper into the application’s composition, issues such as data integration, service integration, service orchestration, workflow composition, distributed processing, query expansion and object ontologies arise. This report gives an overview of the GEN2PHEN project, with special emphasis on the problems of these next-generation web applications. It discusses some solutions under development elsewhere, as well as systems being developed in our workgroup, and how both can help shape GEN2PHEN application design.
INTEGRATION SCENARIOS AND RELATED WORK
First of all, it is necessary to understand for whom these new application paradigms will be important and why these generic GEN2PHEN goals are so significant. The biological and biomedical scientific community is witnessing an exponential increase in the information available. This growth leads, in turn, to a growing number of applications (web or desktop) that solve the same specific problems. Along with these new applications come new data sources and new services, and the heterogeneity among them is huge. The main issue one may face when doing scientific research is where to find information. A few years ago this was a problem because of the lack of applications and databases. Now it is a problem because of the excessive amount of information available in every corner of the web.
Fig. 2 Web2.0 integration
From the users’ perspective, we believe they are looking for a central, unifying portal, customized to their personal profile, where they can easily find all the information they need. This is the added value GEN2PHEN solutions may have. Currently, there are numerous ongoing works focusing on this problem. However, there isn’t a universal solution to all the heterogeneity problems raised by data and service integration. And the problems don’t end there: there are also the novel functionalities made possible by the semantic web [1] and the major developments made in information mining. Following Goble and Stevens’ [2] work, one can conclude that not all is well in the kingdom of data integration in bioinformatics, and that data integration has a long way to go before it completely satisfies its initial goals.
The group of applications that should be studied may be divided into three main areas that are closely connected and foster integration. There are developments in the semantic web and its application to biology, including how the bridge between generic ontologies and biological ones can be built. Other groups are working on collaboration tools for the community, which offer better information sharing and productivity tools. The largest group is the integration one, which encompasses data integration, service integration, service orchestration, workflow composition and mashup applications.
SEMANTIC WEB
The main purpose of semantic web developments is to describe, with a predefined ontology, all the information that exists on the web. The key components of the semantic web are RDF⁴, OWL⁵ and SPARQL⁶. RDF stands for Resource Description Framework and is a generic metadata model for describing online information and content. OWL is the Web Ontology Language, an ontology-authoring language usually associated with RDF Schema. SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language; it is a query language, with a syntax reminiscent of SQL, for retrieving information stored in the RDF format. Implementing semantic web architectures is not a trivial task [3] for any kind of data. However, it is important to introduce these metadata structures and algorithms in bioinformatics, as they are expected to become part of Web3.0.
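To make the RDF data model and the idea behind a SPARQL query more concrete, the sketch below stores statements as subject-predicate-object triples and matches them against a pattern with wildcards. It is a toy illustration in plain Python, not an RDF or SPARQL implementation; the `ex:` identifiers and the `match` helper are hypothetical names chosen for this example.

```python
# Toy triple store: each RDF statement is a (subject, predicate, object) tuple.
# All identifiers below (ex:gene1, ex:associatedWith, ...) are made up.
TRIPLES = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:associatedWith", "ex:diseaseA"),
    ("ex:gene2", "rdf:type", "ex:Gene"),
    ("ex:gene2", "ex:associatedWith", "ex:diseaseB"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None field of the pattern."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Roughly what a SPARQL query of the form
#   SELECT ?gene WHERE { ?gene ex:associatedWith ex:diseaseA }
# asks for: every subject linked to ex:diseaseA by ex:associatedWith.
genes = [s for s, _, _ in match(TRIPLES, predicate="ex:associatedWith", obj="ex:diseaseA")]
print(genes)
```

A real deployment would rely on an RDF library and a SPARQL endpoint; the point here is only that every statement has the same three-part shape, which is what lets heterogeneous sources be queried uniformly.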
By applying semantic web concepts and technologies in bioinformatics, one can access, in a unified manner, several biological documents described with RDF. The application of these concepts also enables process automation and improved machine-to-machine data exchange. Belleau et al. propose Bio2RDF [4], a preliminary approach to creating an engine that provides RDF access to biological data distributed across several databases such as KEGG or NCBI. Bio2RDF⁷ makes all the data available on its website, using only the URL to locate the resources. Splendiani [5] also has a proposal to bring the semantic web to biology, but the implementation isn’t as advanced as Bio2RDF. These are the most recent implementations, but biology and medicine are very difficult scientific areas because of the complexity of defining a proper ontology that covers all the concepts and terms of the life sciences.
⁴ Resource Description Framework: http://www.w3.org/RDF
⁵ Web Ontology Language: http://www.w3.org/2004/OWL
⁶ SPARQL Query Language for RDF: http://www.w3.org/TR/rdf-sparql-query
SOCIAL ENVIRONMENTS
Social networks and collaboration environments are some of the most popular Web2.0 applications. These applications connect users and allow them to share personal information, music, videos or any other type of data. Additionally, several small applications are developed to integrate information about different users or entertainment areas. For instance, a movie application would allow every user to describe his or her personal movie tastes; when used in a large-scale environment, it would give the developers important information about cinema, which could be used to improve the advertisements shown to the user: a user who likes horror movies would be more likely to see horror movie ads than one who likes comedies. Facebook⁸ is one of the most widely used social web applications worldwide, with over 120 million users. Through personal connections, personal preferences and other specific applications, Facebook’s owners hold valuable market information. Like Facebook, MySpace⁹ and Google’s Orkut¹⁰ provide almost the same functionalities to users. Experiencing sustained growth is myExperiment¹¹, by Carole Anne Goble et al. [6], which is the first bioinformatics social network application, where one can connect with others, share files (with a focus on Taverna workflows, detailed later in this report) and create scientific communities.
Despite the focus on Taverna, myExperiment provides a rich scientific ecosystem, offering the community a wide range of tools essential to any social collaborative environment. myExperiment also offers access to its services through RESTful programming interfaces; thus, it is possible to build new applications on the framework or to use myExperiment data and tools to improve existing ones.
⁷ Bio2RDF: http://www.bio2rdf.org
⁸ Facebook: http://www.facebook.com
⁹ MySpace: http://www.myspace.com
¹⁰ Orkut: http://www.orkut.com
¹¹ myExperiment: http://www.myexperiment.org
INTEGRATION
Integration is one of the areas of bioinformatics that attracts the most groups and the most ongoing work. It is a research area that includes the semantic web and social networking tools mentioned above, in addition to other fields such as mashups and workflows. A workflow is a simple sequence of logical steps or activities that are executed independently of each other [7]. Applying this generic concept to bioinformatics, one may regard a workflow as an organized information flow that connects distinct services and/or data sources in order to solve a problem in a modular manner. The most widely used solution for workflow building and execution is Taverna [8, 9]. Taverna is a Java-based desktop application offering a simple interface for workflow composition and execution. It can access several types of services, such as BioMoby [10] or generic WSDL web services. The major drawback is that, to integrate services, one must define an XML integration component to pipe information from the output of service A to the input of service B. Taverna can also be used from within other applications, allowing access to the results of previously saved workflows or the execution of workflows in real time. One of myExperiment’s functionalities is workflow sharing: one may access a large workflow repository and find solutions developed by others, or share one’s own workflow and important development information. Currently, Taverna’s greatest flaw is being desktop-based, as we are witnessing a shift in the computational paradigm, with web applications coming to dominate desktop ones.
Alongside workflows there are mashups. Mashups began in the music industry: they were simple mixes of several songs into a single song. With Web2.0, this idea crossed over to web applications.
Mashups are web applications that combine information from a predefined collection of data sources or services in a single interface. We can consider a mashup to be a meta application: it essentially creates a new application by using functionalities provided by other applications. There are several workflow/mashup building frameworks online. It is important to mention Yahoo! Pipes¹² and Microsoft Popfly¹³ because they have remarkable interfaces and pre-built components to access the World Wide Web’s most popular websites. Bioinformaticians can use these tools with data from different data sources to develop new applications. Cheung et al. [11] pursued this approach to create a biomedical mashup application. However, the tools mentioned weren’t specifically designed for the life sciences. Therefore, several researchers are working on service integration frameworks: de Knikker et al. [12] present a basic web service choreography scenario; Bio-jETI, from Margaria et al. [13], is a similar solution that uses the same principles as de Knikker et al. These tools share a common integration problem: the heterogeneity of the information sources doesn’t allow a fully automated integration solution. Each service stores and offers data in its own model, which makes concept mapping and information exchange more difficult. There isn’t yet an automated tool that offers a simple integration interface and allows the use of components from any arbitrary service.
BioMoby [10] is an initiative to create an ontology and a central repository of bioinformatics resources. With this semantic framework, one can share or use online services created by others in an almost automated fashion [14]. The BioMoby¹⁴ central repository faces typical resource discovery problems such as validation and duplication. Anyone can add services, and the description provided or the service functionality may not be scientifically valid and may mislead users. Duplication of services is also a problem: there can be any number of services doing the same task, so it is difficult to choose which one best fits the desired requirements.
¹² Yahoo! Pipes: http://pipes.yahoo.com/pipes
¹³ Microsoft Popfly: http://www.popfly.com
Fig. 3 – Existing developments categories
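The workflow notion used in this section, where the output of one service is piped into the input of the next, can be sketched in a few lines. The three "services" below are hypothetical local functions standing in for remote web services, and the sequence data is invented; this is an illustration of the piping idea only, not of how Taverna or any other tool is implemented.

```python
# Hypothetical stand-ins for remote services; real ones would be web service calls.
def fetch_sequence(gene_id):
    """Service A: return a (made-up) DNA sequence for a gene identifier."""
    return {"geneX": "ATGGCC"}.get(gene_id, "")


def transcribe(dna):
    """Service B: turn a DNA sequence into its RNA form (T becomes U)."""
    return dna.replace("T", "U")


def gc_content(rna):
    """Service C: fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in rna) / len(rna) if rna else 0.0


def run_workflow(steps, initial_input):
    """Execute the steps in order, piping each output into the next input."""
    data = initial_input
    trace = []
    for step in steps:
        data = step(data)
        trace.append((step.__name__, data))  # keep intermediate results
    return data, trace


result, trace = run_workflow([fetch_sequence, transcribe, gc_content], "geneX")
print(result)
for name, value in trace:
    print(name, "->", value)
```

The sketch works only because each step happens to accept exactly what the previous one returns. With real, heterogeneous services that agreement has to be built by hand, which is the integration problem described above.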
SUMMARY
Fully automated and dynamic integration is the panacea that developers haven’t yet reached. Workflow and mashup solutions are the most popular ways to integrate services and data sources. However, both imply hard-coding several functionalities, which increases the dependency on developers to add new ones. Applying a semantic web approach to bioinformatics will empower developers to create more independent applications. Describing services and information semantically will allow automated communication between heterogeneous applications. This will enhance existing workflow and mashup applications: it will be easier for users to add new services to existing applications, so that they become developers of new meta applications adjusted to their needs.
¹⁴ BioMoby: http://www.biomoby.org
OUR ONGOING DEVELOPMENTS
Our bioinformatics group is, like others, developing software solutions to problems in this specific area. Until now, our work did not focus on integration or the semantic web; it was mostly aimed at supporting microarray laboratory research. ANACONDA [15] is a tool for studying gene primary structure. The Microarray Information Database, MIND [16], is a web application that helps researchers analyze the results of microarray experiments. More abstract than MIND is GeneBrowser [17], a tool for gene expression studies based on the gene lists that result from microarray experiments. However, web trends and the association with projects such as GEN2PHEN and ALERT¹⁵ made it necessary to expand our group’s application range. DynamicFlow [18, 19] is a web-based workflow management application providing Web2.0 semi-autonomous service integration. DiseaseCard [20] is an older application; however, it already implements basic collaboration and integration functionalities that later became famous with Web2.0. Further developments are being studied to implement semantic web engines, mashup applications and novel information visualization techniques.
DYNAMICFLOW
DynamicFlow is a framework for the dynamic integration of heterogeneous information sources. The main goal in developing this framework was to create a novel and agile interface for service integration. The application should have a usable, easy and intuitive interface for solving problems with a “divide and conquer” strategy: the main problem is divided into smaller tasks, each of which can be solved by a particular web service; the tasks are then combined, using the workflow metaphor, to create an information flow from task to task until the final solution is reached. This modular approach could be useful for researchers because it resembles the plan they follow when solving problems in the wet lab: structuring the problem and then solving it iteratively, using simple tasks in a web application running in their browser.
¹⁵ ALERT Project: http://www.alert-project.org
Fig. 4 DynamicFlow framework model
One of DynamicFlow’s key elements is its innovative model. The three-layered model (Fig. 4) divides the application into: access, the bottom layer, containing the databases and the external services; design, the top layer, where user interactions such as workflow building occur, using AJAX technology and drag-and-drop metaphors; and core, the processing layer, which encompasses server-side processing on the application’s web server and client-side processing in the client’s browser. This division of the processing layer into two separate components is one of the framework’s main features. The web server processes client requests and connects to the authentication server and the framework’s DBMS, but service-application communication and data piping between tasks are processed on the client side, reducing server load and speeding up application execution, with gains in efficiency and response time. This semi-autonomous process of maintaining a valid information flow from one service to the next is possible thanks to a previously defined service definition standard. The standard follows a simple ontology and provides an easy way of editing the available services. Using it, the application can validate workflow consistency, execute the workflow and display intermediate results, all using the browser’s resources. It is a primitive form of semantics in an information integration application. The work resulted in a web application prototype that is available for testing and open to new developments. These new developments will cover five main topics: perfecting the service definition standard, including semantic web technologies (RDF), improving the interface, adding new user interactions and widening the service range.
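One way to picture how a service definition makes workflow validation possible is to describe each service by the type of data it consumes and produces, and then check that consecutive tasks agree. The sketch below assumes such a description; the `ServiceDefinition` fields, the type labels and the example services are hypothetical and are not taken from DynamicFlow’s actual standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDefinition:
    """Hypothetical service description: a name plus input and output data types."""
    name: str
    input_type: str
    output_type: str


def validate_workflow(services):
    """Return a list of error messages; an empty list means the flow is consistent."""
    errors = []
    for current, following in zip(services, services[1:]):
        if current.output_type != following.input_type:
            errors.append(
                f"{current.name} outputs {current.output_type!r} but "
                f"{following.name} expects {following.input_type!r}"
            )
    return errors


# Made-up services used only to exercise the check.
gene_lookup = ServiceDefinition("GeneLookup", "gene_id", "dna_sequence")
translator = ServiceDefinition("Translator", "dna_sequence", "protein_sequence")
pathway_search = ServiceDefinition("PathwaySearch", "gene_id", "pathway_list")

print(validate_workflow([gene_lookup, translator]))      # consistent flow
print(validate_workflow([gene_lookup, pathway_search]))  # type mismatch
```

Because the check needs only the declared types, it could in principle run before any service is called, which is compatible with validating workflows on the client side.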
DISEASECARD
The DiseaseCard¹⁶ project began in 2003 with the objective of creating a rare disease link aggregator, integrating information from distributed and heterogeneous medical and genomic databases. The links were gathered by a web crawling engine and grouped into nodes representing concepts (Fig. 5). For instance, for the Peters anomaly¹⁷ disease, the node References contains all the reference sections of the NCBI OMIM¹⁸ database that refer to this disease, and the node Pathology contains Orphanet¹⁹ information about it. Along with the external information, each disease also has a forum entry where any registered user can share personal experience. A tree, similar to the one in Windows Explorer, shows all the nodes and their collections of links, displaying information from the genotype to the phenotype in a unified interface. As we want to gather as much information as possible, rare diseases are the main target because of their strong association between genotype and phenotype. It is important to mention that no database information is replicated: DiseaseCard only stores the links to the shared data. Modern concepts such as integration (heterogeneous link gathering) and collaboration (public disease forums) were already considered when the system was developed.
¹⁶ DiseaseCard: http://www.diseasecard.org
¹⁷ Peters Anomaly disease card: http://diseasecard.org/evaluateCard.do?diseaseid=604229
¹⁸ OMIM Home: http://www.ncbi.nlm.nih.gov/omim
¹⁹ Orphanet: http://www.orpha.net/consor/cgi-bin/index.php
Fig. 5 DiseaseCard concept map
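The grouping of crawled links into concept nodes can be sketched as a simple mapping from node names to lists of URLs. The example below is an assumption about the general shape of such a structure, not DiseaseCard’s actual data model; the `example.org` URLs, the `Drugs` node and the helper names are placeholders.

```python
from collections import defaultdict

# Hypothetical crawler output: (disease, concept node, link). The example.org
# URLs are placeholders, not real database entries.
CRAWLED_LINKS = [
    ("Peters anomaly", "References", "http://example.org/omim/refs/1"),
    ("Peters anomaly", "References", "http://example.org/omim/refs/2"),
    ("Peters anomaly", "Pathology", "http://example.org/orphanet/entry"),
]


def build_card(disease, crawled_links):
    """Group the links found for one disease under their concept nodes."""
    card = defaultdict(list)
    for name, node, url in crawled_links:
        if name == disease:
            card[node].append(url)  # only the link is stored, never the data
    return dict(card)


def empty_nodes(card, expected_nodes):
    """List the expected concept nodes for which the crawler found no links."""
    return [node for node in expected_nodes if not card.get(node)]


card = build_card("Peters anomaly", CRAWLED_LINKS)
print(card)
print(empty_nodes(card, ["References", "Pathology", "Drugs"]))
```

Storing only URLs keeps the card small and avoids replicating database content, at the cost that a node silently becomes empty when the linked pages move; a check like `empty_nodes` is one way such gaps could be detected.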
As the application got older, it lost quality: the web crawling engine didn’t adapt automatically to link changes, so, for several concepts, the resulting nodes were empty. In a preliminary analysis of the GEN2PHEN goals and how they can be achieved, we concluded that DiseaseCard was the most adequate solution and should be brought back under development. After a careful analysis and the definition of an action plan, its operability was restored, the crawler was corrected, the interface got a new look and DiseaseCard is back on track. As far as GEN2PHEN is concerned, DiseaseCard will be a simple way to achieve some of the initially proposed goals. In the future, adding GEN2PHEN-related databases and web portals is a priority in order to complete the application. Including semantics in DiseaseCard and in the portals it crawls will ease the crawling process and improve the precision of the results obtained. Information mining features are also being researched: even though it only stores links, DiseaseCard contains valuable information in those links, which can be useful in new types of queries.
SUMMARY
Both DynamicFlow and DiseaseCard are ongoing projects that will be developed from the GEN2PHEN perspective. The next section details new functionalities, interfaces and user interactions that can be implemented in either of these applications in order to improve their quality.
FUTURE PERSPECTIVES
Web2.0 changed the Internet forever. Developers no longer care only about what the application does, but also about what the users want it to do. Users are now the most important part of the Internet. They produce content, they have their own web footprint, and they are part of a new online community. If Web2.0 is the social web, Web3.0 may be the intelligent web. Although it sounds like science fiction, Web3.0 is nearer than one may think. Different platforms can communicate with each other automatically; “cloud-computing” is taking over the web; the web is getting intelligent with new semantics; distributed applications are being integrated. These facts, which were mere dreams a few years ago, are empowering the Internet with new solutions and establishing it as the platform for everything: productivity, entertainment, research, leisure…
CLOUD-COMPUTING
New computing paradigms are changing the Internet at the architecture level. GRID [21] architectures are the new solution for distributed computing. Virtualization improvements [22] make virtual machines almost as powerful as real ones. “Cloud-computing” [23] uses the best of both to offer an online development environment. Microsoft, with the Azure Services Platform²⁰, Amazon, with the Elastic Compute Cloud²¹, and Google, with its App Engine²², offer access to virtual machines where anyone can deploy applications that use distributed resources to guarantee real-time scalability, flexibility and availability. Following the same trend, new web applications and web application suites are replacing traditional desktop apps. For instance, Microsoft’s Live²³ suite offers almost all the Office suite tools online, and Google²⁴ also has the essential productivity tools online, in the “cloud”.
²⁰ Azure Services Platform: http://www.microsoft.com/azure/default.mspx
²¹ Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2
²² Google App Engine: http://code.google.com/appengine
²³ Microsoft Live: http://www.live.com
²⁴ Google Apps: http://www.google.com/apps
INFORMATION INTEGRATION
Among information integration tools, one can explore mashups and web desktops. Popular mashup applications are personal, customizable web portals made of gadgets that access almost any web application. Netvibes²⁵ is arguably the most complete personal portal on the Web. However, the most famous is Google’s iGoogle²⁶. Both offer, in a simple interface, the ability to customize a page with any gadgets we want. Available gadgets include e-mail access, calendars, to-do lists, newsreaders and almost any other tool worth including in a single portal.
Fig. 6 iGoogle gadget interface stub
Web desktops are web applications that simulate the traditional desktop environment: there is a wallpaper, icons to access applications, a trash bin, a task bar and application menus. eyeOS²⁷ is a cloud computing operating system that allows any user to work online with a vast set of applications. It is also an open source development platform: users can create their own applications and install them on their web desktop.
²⁵ Netvibes: http://www.netvibes.com
²⁶ iGoogle: http://www.google.com/ig
²⁷ eyeOS: http://eyeos.org
DATA VISUALIZATION
Another interesting area is data visualization. Traditionally, search results are listed with a simple description. However, new search engines such as Viewzi²⁸ and Searchme²⁹ present results in different, much more visually appealing interfaces. Screenshots are taken of the pages and shown in grids or lists. Results are ordered by date to form a chronological sequence. Information is gathered from distinct search engines in order to rank the results better. Context relations are established among results to create a visual relational tree. Distinct visualizations of the same results are important because they can offer distinctive insights into the same data. Aiming at improved user interaction and greater user satisfaction, these tools rely on AJAX, Flash or Silverlight to create captivating and usable interfaces.
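Two of the presentations mentioned above, the chronological sequence and the grid, are different arrangements of the same result list. The sketch below shows both on invented data; the result records and helper names are hypothetical and do not reflect how Viewzi or Searchme work internally.

```python
from datetime import date

# Hypothetical search results; titles and dates are invented for illustration.
RESULTS = [
    {"title": "Result C", "published": date(2008, 11, 3)},
    {"title": "Result A", "published": date(2007, 5, 20)},
    {"title": "Result B", "published": date(2008, 1, 15)},
]


def chronological(results):
    """Order results by publication date, oldest first."""
    return sorted(results, key=lambda r: r["published"])


def to_grid(items, columns):
    """Split a flat list into rows of at most `columns` items."""
    return [items[i:i + columns] for i in range(0, len(items), columns)]


titles = [r["title"] for r in chronological(RESULTS)]
print(titles)
print(to_grid(titles, columns=2))
```

Keeping the ordering and the layout as separate steps means the same results can be shown in several views without querying the data source again.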
Fig. 7 Viewzi result grid for gen2phen search
²⁸ Viewzi: http://www.viewzi.com
²⁹ Searchme: http://www.searchme.com
SUMMARY
All the applications and interfaces presented are new solutions under consideration in several thematic fields. They represent the first step towards the next generation of web applications and open the door to a new level of user interaction. This new wave of web applications will have repercussions on bioinformatics. New applications such as iBioinformatics and BioDesktop, or new result visualization tools, could leave their mark on the bioinformatics world.
Following the iGoogle and Netvibes example, one could develop a similar portal, integrating gadgets and applications in a single interface. iBioinformatics or BioVibes would represent a leap forward in integration and personalization. If a large range of services were available in the gadget repository, any researcher could customize the application according to his or her needs, thus creating a personal meta application.
BioDesktop or BiOS could be an eyeOS-based bioinformatics and biomedical web desktop. Following the desktop metaphor, one could create a web desktop implementation containing applications and tools useful to researchers. Any user could then have a personal desktop online, customized according to his or her own needs and taste.
Integration plays a large role in the future of bioinformatics, but data visualization is also important. Web screenshots are useful for previewing the page we are searching for. This idea could be applied to bioinformatics search results, showing pathway previews or protein structure previews. By arranging the results in grids or lists and using technologies such as AJAX, Flash or Silverlight to create new interfaces, one could develop interesting and useful applications.
CONCLUSION
Bioinformatics applications are evolving. Evolution is not a simple process, and choosing the right path is not a trivial task. This evolution is usually sustained by large projects, such as the Human Genome Project a few years ago or the European GEN2PHEN project now.
As bioinformatics evolves, so do other software applications. The trend is to move software to the web and to make it freely available to the entire world. This process may be complex, but in the end the benefits outweigh the tradeoffs that have to be made.
For bioinformatics, keeping pace with state-of-the-art web technologies is a tremendous task. The life sciences are among the areas with the largest amounts of data and with the most noticeable differences between applications and services. This leads to enormous complexity when integrating heterogeneous information sources.
Despite these difficulties, several groups are working on integration problems, following several approaches. Semantic web concepts for better machine-to-machine exchanges and "proprietary" integration frameworks based on hard-coded concept mapping are solutions currently under development. However, there is no perfect solution to these problems: fully automatic and dynamic information integration has not yet been achieved and is still science fiction. Hopefully, the perspectives presented here, together with concepts borrowed from success cases in other areas such as entertainment or CRM, will enhance current bioinformatics web applications and give developers the tools to design new ones.