Cambridge Healthtech Media Group
www.bio-itworld.com
Indispensable Technologies Driving Discovery, Development, and Clinical Trials
NOVEMBER | DECEMBER 2010 • VOL. 9, NO. 6
Moving Data on the FASP Track (Page 34)
TOXICOGENOMICS TURNAROUND 24 • DRUG DISCOVERY COLLABORATIVE 30 • CLINICAL IMAGING EDC AT NOVARTIS 14
Michelle Munson, co-founder and CEO of Aspera
STEM CELLS.
WE DISCOVERED IT’S TIME TO DISCOVER US.
Ontario is home to one scientific breakthrough after another. From 1963, when James Edgar Till and Ernest Armstrong McCulloch discovered stem cells, to just last year when Dr. Andras Nagy and his team developed a safer way to generate them. With Ontario's 16% cost advantage over the United States, plus tax credits that can reduce $100 spent on R&D to less than $37, isn't it time you made a discovery of your own? Ontario. THE WORLD WORKS HERE.
investinontario.com /research
Paid for by the Government of Ontario.
You Have Freedom of Choice with Cmed Technology’s
Unified eClinical Platform
Work in any Location
With any Appliance
Efficiently manage any type of data, for any protocol, anywhere.
• Electronic Trial Design
• Electronic Data Capture
• Monitoring
• Medical Coding
• Data Management
• Real-Time Reporting
All Clinical Data • In One Distributed System • Accessible to All Worldwide • At the Same Point in Time
www.cmedtechnology.com/t5-preview | info@cmedtechnology.com
Contents
[11–12•10]
Computational Biology
18 A Community Experiment for the Genome Commons
Steve Brenner is addressing the challenges of shared genomic data.
20 A Personal View of Personal Genomics
Commentary Mike Cariaso discusses personal genomics for the rest of us.
Computational Development
34 Changing the Game of Collaborative Drug Discovery
Barry Bunin's Silicon Valley company fosters information release between pharmas to spur research on neglected diseases.
Feature
28 A Turnaround for Toxicogenomics?
New toxicogenomics tools accelerate early-stage safety assessment in drug development.
IT / Workflow
38 Aspera's fasp Track for High-Speed Data Delivery
Data transfer protocol facilitating global data access and collaboration.
Up Front
7 The Data Bonanza at Bio-IT World Europe
At the second European event, attention turned to genomes, storage, and the cloud.
8 Sequence, Drugs and Rock n’ Roll
Ozzy Osbourne’s genes reveal alcohol tolerance, caffeine intolerance, equal parts worrier and warrior.
10 GSK Announces Singapore Collaborations...
Four academic groups get GSK drug discovery partnerships.
11 ...As Lilly Shutters Singapore Discovery Center
On the edge of a patent cliff, Lilly trims entire research center in Asia.
11 Millisecond Modeling
David E. Shaw’s supercomputer, Anton, models protein folding.
12 Data Hide and Seek vs. Safety Assessment
The Bush Doctrine Now is the time to finally decide how to handle tox data.
13 Epigenetics and the Role of Phenotypic Changes to DNA
Insights | Outlook DNA methylation can help guide drug discovery.
10 Briefs
Clinical Trials
14 Most Trials Now Eligible for Design Simulation
New simulation tool expands the range of eligible trials.
15 Taming the Beast
Going paperless speeds study review and gives clients real time access.
16 Novartis Gains Control of Clinical Imaging Data
ImagEDC is a game changer for clinical trial management.
41 Don't Neglect Your Processes
Commentary Process excellence can lead to gains in drug discovery.
42 Cloud Computing Provides Booster Shot
Commentary Not all health-IT data is fit for the cloud, but a lot is.
In Every Issue
5 The Thousand Genomes Project
First Base New genetic insights, and the 2011 Best Practices Awards call for entries. BY KEVIN DAVIES
46 Ready for the GPGPU Revolution?
The Russell Transcript The life sciences were well represented at the GPU Technology Conference. BY JOHN RUSSELL
6 Company Index
43 Advertiser Index
43 New Products
44 Educational Opportunities
SPECIAL ADVERTISING SECTION: High Performance Computing begins on page 21
Cover photo by Leah Fasten
First Base
A Thousand Genomes KEVIN DAVIES
There was big news in the world of genome research in October, and no, we're not talking about the Ozzy Osbourne genome, first details of which were reported in a British Sunday newspaper. It turns out that researchers are also pretty intoxicated about the first published sets of data from the international 1000 Genomes Project, which has far-reaching implications for cataloguing and understanding the range of genomic variations associated with human disease, behavior, and evolution.

The results of the pilot phase of the 1000 Genomes Project were reported in Nature and Science. Researchers now have the ability to undertake a survey of genome variation that could barely have been imagined when President Clinton declared victory in the Human Genome Project a decade ago. The 1000 Genomes pilot consisted of low-coverage sequencing of 179 individuals, deep sequencing of two families (trios), and exome sequencing in nearly 700 individuals.

One of the major conclusions from the pilot is that half of the single nucleotide polymorphisms (SNPs) catalogued—the total inventory currently exceeds 15 million—have never been seen before. Moreover, 1 million short insertions and deletions (indels) and 20,000 structural or copy number variants (CNVs) were also described. Remarkably, for any individual person, more than 95% of the 3 million SNPs in their genome will already sit in this catalogue. Contrast that to just 5% ten years ago, or about 40% five years ago. Indeed, the Broad Institute's David Altshuler suggests that when the project is complete, fully 98-99% of SNPs uncovered in any newly sequenced personal genome will have been observed previously.

Another eye-catching result is that each individual carries deleterious or loss-of-function mutations in 250-300 genes. Thankfully, we usually have a spare functional copy as backup. The family studies suggest a de novo germline mutation rate of about 10⁻⁸ per base pair per generation, essentially confirming the conclusions of the Complete Genomics Miller syndrome study earlier this year, and calculations made by J.B.S. Haldane in the 1930s.

Writing in Science, Evan Eichler and colleagues have catalogued larger swathes of genomic variation, the CNVs. Eichler's team remapped data using homegrown computational methods, looking at NGS read depth and unique sequence tags to distinguish copies. "We think the veil has been lifted on a whole new level of genetic diversity," says Eichler. Among the findings, the most variable genes map to duplicated regions that behave like accordions of the genome, expanding and contracting in copy number. There is new insight into the extent of CNV between different human populations. And comparison with the great apes "identif[ies] gene families that have expanded in the human lineage since we separated from chimpanzees," says Eichler.

The success of the 1000 Genomes Project increasingly puts the spotlight on the "rare" variants—the 1-2% of SNPs that are unique to an individual, which might have arisen spontaneously or very recently in their family history. "Those will never be contained in a catalogue," says Altshuler, but together with environment, behavior, and pure chance, they hold the key to understanding human diseases. The 1000 Genomes Project provides a tremendous "foundational tool" for future study. We can all drink to that.

Calling All Entries for Best Practices 2011
As announced elsewhere in this issue, we have issued our Call for Entries for the Bio•IT World 2011 Best Practices Awards. On the evening of April 13, 2011, during the Bio-IT World Conference & Expo, we will announce and hand out some handsome hardware to select organizations for what a blue-ribbon jury considers to be the most outstanding and innovative examples of technology deployment and sharing across research, development, and clinical trials. This year's awards attracted some 75 entries, and we hope to maintain the trend of exceeding that tally next year. With more and more signs of pre-competitive approaches permeating across the industry (see, for example, pages 34 and 16), there is ever more reason to believe that the solutions we look forward to showcasing next year will not be private solutions but ideas and technologies that others can learn from, and possibly even use. Best Practices is about recognizing partnership and real-world uses of technology, not just the virtues of a new piece of hardware or software. We encourage vendors to nominate groups and organizations that they have successfully collaborated with. Full details on the Awards criteria, categories, and very straightforward entry process are here: www.bio-itworld.com/bestpractices. Good luck!
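A quick numerical aside on the editorial's figures: the sketch below is not from the 1000 Genomes papers; it simply multiplies the quoted mutation rate by an assumed haploid genome size of roughly 3 × 10⁹ base pairs to show the order of magnitude, and does the same for the catalogued-SNP percentages.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (mine, not from the papers): haploid genome ~3e9 bp;
# a child inherits one newly mutated haploid genome from each parent.

mutation_rate = 1e-8        # de novo germline mutations per base pair per generation
haploid_genome_bp = 3e9     # assumed haploid genome size

per_haploid_transmission = mutation_rate * haploid_genome_bp   # ~30
per_child = 2 * per_haploid_transmission                       # ~60 new variants

print(f"~{per_haploid_transmission:.0f} new mutations per transmitted haploid genome")
print(f"~{per_child:.0f} new mutations per child")

# If >95% of a person's ~3 million SNPs are already catalogued, fewer than
# ~150,000 remain novel today; at the projected 98-99%, roughly 30,000-60,000.
snps_per_person = 3e6
for catalogued in (0.95, 0.98, 0.99):
    print(f"{catalogued:.0%} catalogued -> ~{(1 - catalogued) * snps_per_person:,.0f} novel SNPs")
```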
Company Index
23andMe. . . . . . . . . . . . . . . 20 AccelerEyes. . . . . . . . . . . . . . 43 Accelrys. . . . . . . . . . . . . . . . 43 Affymetrix. . . . . . . . . . . . . . . 29 Allen Institute for Brain Science. . . . . . . . . . . 10 Amazon Web Services. . . . . . 40 AMD. . . . . . . . . . . . . . . . . . . 46 Aspera. . . . . . . . . . . . . . . . . 38 A*STAR-Singapore Institute for Clinical Sciences. . . . . . 10 AstraZeneca. . . . . . . . . . . . . 28 ATI . . . . . . . . . . . . . . . . . . . . 46 Berry Consultants. . . . . . . . . 14 BGI. . . . . . . . . . . . . . . . . . . . 39 BioTeam. . . . . . . . . . . . . . . . . 7 BlueArc. . . . . . . . . . . . . . . . . 40 Broad Institute . . . . . . . . 18, 39 Cambridge Healthtech Associates. . . . . . . . . . . . . 12 Celgene . . . . . . . . . . . . . . . . 13 Cofactor Genomics. . . . . . . . . 8 Collaborative Drug Discovery. . . . . . . . . . . . . . 34 Columbia . . . . . . . . . . . . . . . 34 Copernicus Group. . . . . . . . . 15 Cornell. . . . . . . . . . . . . . . . . 34 Critical Path Institute. . . . 33, 41 DE Shaw Research. . . . . . . . 11 DiscoTox. . . . . . . . . . . . . . . . 28 Duke Clinical Research Unit. . . . . . . . . . . . . . . . . . 15 Duke-NUS Graduate Medical School . . . . . . . . . 10 Eli Lilly . . . . . . . . . . . . . . . . . 11 EMBL. . . . . . . . . . . . . . . . . . . 7 EMC. . . . . . . . . . . . . . . . . . . 40 Erasmus Medical Center. . . . . 7 European Bioinformatics Institute. . . . . . . . . . . . 18, 39 GE Healthcare. . . . . . . . . . . . 29
Genedata. . . . . . . . . . . . . . . 29 GeneGo . . . . . . . . . . . . . . . . 29 Gene Logic. . . . . . . . . . . . . . 29 Genentech . . . . . . . . . . . . . . 18 Genome Commons. . . . . . . . 18 Genome Institute of Singapore . . . . . . . . . . . . . 18 GlaxoSmithKline. . . . 10, 28, 34 Harvard. . . . . . . . . . . . . . . . . 34 HP . . . . . . . . . . . . . . . . . . . . 40 Iconix Biosciences. . . . . . . . . 29 IDC. . . . . . . . . . . . . . . . . . . . 42 IDS Scheer. . . . . . . . . . . . . . 41 Illumina . . . . . . . . . . . 8, 20, 32 Ingenuity. . . . . . . . . . . . . 29, 32 Innovative Medicines Initiative. . . . . . . . . . . . 32, 41 Insight Pharma Reports. . . . . 13 Isilon. . . . . . . . . . . . . . . . . . 40 Johns Hopkins. . . . . . . . . . . . 34 Johnson & Johnson. . . . . . . . 28 Knome. . . . . . . . . . . . . . . . . . 8 Life Technologies. . . . . . . . . . . 8 Memorial-Sloan Kettering. . . 39 Merck. . . . . . . . . . . . . . . 13, 33 Merck Serono. . . . . . . . . . . . 28 MolSoft. . . . . . . . . . . . . . . . . . 7 National Center for Biotechnology Information . . . . . . . . . 18, 39 National Center for Computational Toxicology . . . . . . . . . . . . . 29 National Institute of Environmental Health Sciences. . . . . . . . . . . . . . 29 National University of Singapore . . . . . . . . . . . . . 10 Navigenics . . . . . . . . . . . . . . 20 NetApp. . . . . . . . . . . . . . . . . 40 Novartis . . . . . . . . . . 16, 33, 34
Novartis Institutes for Biomedical Research. . . . . . . . . . . 16, 32 NovoCraft. . . . . . . . . . . . . . . . 8 Nvidia. . . . . . . . . . . . . . . . . . 46 Oracle. . . . . . . . . . . . . . . . . . 33 Oxford. . . . . . . . . . . . . . . . . . . 7 Panasas. . . . . . . . . . . . . . . . 40 Penguin Computing. . . . . . . . . 9 Pfizer. . . . . . . . . . . . . . . 33, 34 Predictive Safety Consortium . . . . . . . . . . . . 32 Salk Institute for Biological Studies . . . . . . . . . . . . . . . 33 SciFlies.org. . . . . . . . . . . . . . 10 Singapore Eye Research Institute. . . . . . . . . . . . . . . 10 Singapore Institute for Clinical Sciences . . . . . . . . . . . . . . 10 Sitrof Technologies . . . . . . . . 15 SNPedia.com . . . . . . . . . . . . 20 Strand Scientific Intelligence . . . . . . . . . . . . 43 Symplified. . . . . . . . . . . . . . . 42 Tessella. . . . . . . . . . . . . . . . . 14 The Genome Center at Washington University . . . . . 8 Trianz Solutions. . . . . . . . . . . 43 UCLA . . . . . . . . . . . . . . . . . . 34 UCSF . . . . . . . . . . . . . . . . . . 34 University of Berkeley . . . . . . 18 University of Illinois, Urbana-Champaign. . . . . . 46 University of Leiden. . . . . . . . . 7 University of Maryland. . . . . . 39 University of North Carolina. . . . . . . . . . . . . . . 36 Uppmax. . . . . . . . . . . . . . . . . 7 VeryCloud. . . . . . . . . . . . . . . 40 Wellcome Trust Sanger Institute. . . . . . . . . . . . . . . . 7
EDITOR-IN-CHIEF: Kevin Davies (781) 972-1341 kevin_davies@bio-itworld.com
MANAGING EDITOR: Allison Proffitt (617) 233-8280 aproffitt@healthtech.com
ART DIRECTOR: Mark Gabrenya (781) 972-1349 mark_gabrenya@bio-itworld.com
VP BUSINESS DEVELOPMENT: Angela Parsons (781) 972-5467 aparsons@healthtech.com
VP SALES — LEAD GENERATION PROGRAMS: Alan El Faye (213) 300-3886 alan_elfaye@bio-itworld.com
ACCOUNT MANAGER — ACCOUNTS A–M: John J. Kistner (781) 972-1354 jkistner@healthtech.com
ACCOUNT MANAGER — ACCOUNTS N–Z: Tim Reimer (781) 972-1342 treimer@healthtech.com
CORPORATE MARKETING COMMUNICATIONS DIRECTOR: Lisa Scimemi (781) 972-5446 lscimemi@healthtech.com
PROJECT/MARKETING MANAGER: Lynn Cloonan (781) 972-1352 lcloonan@healthtech.com
ADVERTISING OPERATIONS COORDINATOR: Stephanie Cline (781) 972-5465 scline@healthtech.com
DESIGN DIRECTOR: Tom Norton (781) 972-5440 tnorton@healthtech.com
Contributing Editors Michael Goldman, Karen Hopkin, Deborah Janssen, John Russell, Salvatore Salamone, Deborah Borfitz, Ann Neuer, Tracy Smith Schmidt
Advisory Board Jeffrey Augen, Mark Boguski, Steve Dickman, Kenneth Getz, Jim Golden, Andrew Hopkins, Caroline Kovac, Mark Murcko, John Reynders, Bernard P. Wess Jr.
Cambridge Healthtech Institute PRESIDENT
Phillips Kuhl
Contact Information
VOLUME 9, NO. 6
Editorial, Advertising, and Business Offices: 250 First Avenue, Suite 300, Needham, MA 02494; (781) 972-5400 Bio•IT World (ISSN 1538-5728) is published bi-monthly by Cambridge Bio Collaborative, 250 First Avenue, Suite 300, Needham, MA 02494. Bio•IT World is free to qualified life science professionals. Periodicals postage paid at Boston, MA, and at additional post offices. The one-year subscription rate is $199 in the U.S., $240 in Canada, and $320 in all other countries (payable in U.S. funds on a U.S. bank only). POSTMASTER: Send change of address to Bio-IT World, P.O. Box 3414, Northbrook, IL 60065. Canadian Publications Agreement Number 41318023. CANADIAN POSTMASTER: Please return undeliverables to PBIMS, Station A, PO Box 54, Windsor, ON N9A 6J5 or email custserviceil@IMEX.PB.com. Subscriptions: Address inquiries to Bio-IT World, P.O. Box 3414, Northbrook, IL 60065 (888) 835-7302 or e-mail biw@omeda.com. Reprints: Copyright © 2010 by Bio-IT World. All rights reserved. Reproduction of material printed in Bio•IT World is forbidden without written permission. For reprints and/or copyright permission, please contact the YGS group, 3650 West Market St., York, PA 17404; 800-501-9571 or via email to ashley.zander@theYGSgroup.com.
editor@healthtech.com 250 First Avenue, Suite 300 Needham, MA 02494
Follow Bio-IT World on Twitter and LinkedIn http://twitter.com/bioitworld
www.linkedin.com/groupRegistration?gid=3141702
Up Front News
Dealing with the Data Bonanza at Bio-IT World Europe
At the second European event, attention turned to genomes, storage, and the cloud.
BY KEVIN DAVIES
HANNOVER—The second annual Bio-IT World Europe conference, held at BioTechnica 2010, drew a large and enthusiastic crowd who were treated to three days of top-class presentations on multiple aspects of IT infrastructure, data storage and knowledge management.

One of the undoubted highlights of the conference was a charming presentation from clinical geneticist Marjolein Kriek (University of Leiden), who described her personal experiences as the first woman to have her genome completely sequenced. Kriek (initially chosen because her name sounds like "Crick" in Dutch) said her genome contains more than 4,500 known substitutions (including 11 nonsense) and 600 unknown substitutions (137 nonsynonymous variants) in coding genes. "Are there other advantages? Yes, I have my own statue!" she joked.

Chris Dagdigian (BioTeam) said that data management is less scary in 2010 than a few years ago, but while the amount of next-generation sequencing (NGS) data that is routinely being saved off the instruments is dropping, research consumption is going up. The physical movement of data is becoming more important, with Dagdigian saying he was a fan of eSATA toasters as the fastest way to carry Terabytes of data around campus.

Guy Coates (Wellcome Trust Sanger Institute) said the Sanger was likely to have close to 15 Petabytes storage by the end of 2010 after installing about 30 new Illumina HiSeq instruments. Coates' colleagues have experienced these data challenges before as new platforms arrive. Takeaways included the virtue of periods of "masterly inactivity," during which no storage is bought until users have cleaned up their archives, and applying a "storage surcharge" to PIs requesting sequencing
First lady (sequenced): Marjolein Kriek (Photo: Jerry Lampen/Reuters)
capacity to alert them to IT costs.

Jurgen Eils (LSDF, Heidelberg) is providing data to the bioinformaticians working for the International Cancer Genome Consortium. "The Large Hadron Collider produces 15 Petabytes/year. By contrast, ICGC expects to produce up to 50 Petabytes of data per annum," said Eils. He predicted his group would be dealing in Exabytes within a few years, and would have to think about new data management strategies such as those from Elixir.

Other IT infrastructure highlights were presentations from Rupert Lueck (EMBL), who is managing a lot of microscopy/movie data with an IBM blade center and NetApp NFS storage in a new 0.8 MegaWatt data center with water-cooled server rack systems. Sweden's Ingela Nystrom (Uppmax) said UPPNEX is playing a national role connecting seven institutions and filling up to 10 TB/week, using 800,000 core hours/month, 3,000 cores, 10 TB RAM, and 800 TB of storage by Panasas. The system launched in March 2010.

At the Erasmus Medical Center, Bert Eussen and Peter Walgemoed were using HP's local Storage Cloud X9000 to address their data management challenges in translational research and clinical care. "Proteomics is the biggest challenge of data," said Eussen.
New Resources
The Structural Genomics Consortium (SGC), discussed by Brian Marsden (Oxford), has deposited more than 1,100 protein structures to date and 28 percent of all novel structures since 2009, but communicating that information to the research community is a huge challenge. That is changing thanks to a new approach called iSee (http://whatisisee.org), developed with Ruben Abagyan (MolSoft), allowing annotated 3D visualizations of protein structures, dynamic animations, and growing acceptance of the content by journals such as PLoS ONE. Peer reviewers love it, said Marsden. One reportedly said, "I'm no structural biologist, but this is freakin' sweet!"

"Consider Elastic-R as a huge jukebox," said Karim Chine, who provided a superb live demonstration of running R-based applications in the Amazon cloud and sharing the results in real time with another user on an iPad. The early applications may focus on teachers, but there are many potential life science applications.

Reinhard Schneider (EMBL) is also helping the cause of enlivening static journals with a tool called Reflect, which won the Elsevier Grand Challenge in 2009. Reflect is a plug-in that can tag proteins, genes, or small molecule names in web pages, providing a convenient summary of that molecule's properties. •

Editor's Note: In addition to Bio-IT World Europe, CHI also held two other simultaneous conferences, PEGS Europe and Molecular Diagnostics Europe. Bio-IT World Europe 2011 will hold its third annual event on October 11-13, 2011.
Up Front News
Sequence, Drugs and Rock n' Roll: How Ozzy Osbourne Took a Bite Out Of His Genome
Genes reveal alcohol tolerance, caffeine intolerance, equal parts worrier and warrior.
BY KEVIN DAVIES
"The only 'genome' I'd ever heard of was the kind you find at the bottom of the garden with a little white beard and a pointy red hat."
Ozzy Osbourne, writing in The Sunday Times Magazine
Last year, shortly after completing work on rock music legend Ozzy Osbourne’s memoir, I Am Ozzy, The Sunday Times (London) reporter Chris Ayres was sitting next to Knome CEO Jorge Conde at the TedMed conference in San Diego. “When Ozzy and I began to do the weekly ‘Dr Ozzy’ column for The Sunday Times— now also in Rolling Stone—I got the idea to ask [Knome] about possibly sequencing Ozzy’s genome as a one-off article. It snowballed from there,” says Ayres. John Michael “Ozzy” Osbourne, the former lead singer of Black Sabbath, has become the latest member of the celebrity genome club, joining Glenn Close, Archbishop Desmond Tutu, Jim Watson, Craig Venter, Henry ‘Skip’ Gates and others. On October 24, Osbourne penned an absorbing 3,500-word account in the Sunday Times Magazine of his reaction to being presented with his personal genome on a souped-up thumb drive. Osbourne is a curious character for a genome sequencing project, but his lively musical career, characterized by controversial episodes involving the occasional decapitation of birds and bats onstage and the consumption of copious quantities of drugs and alcohol, makes Ayres and others curious as to whether the secret of his extraordinary longevity lies in the sequence of his DNA. Osbourne, who described himself as “a rock star, not Brain of Britain,” initially needed some convincing to go along. “The only ‘genome’ I’d ever heard of was the kind you find at the bottom of the garden with a little white beard and a pointy red hat,” he wrote. “The only Gene I know anything about is the one in Kiss.” But he came around. After all, he wrote, “Given the swimming pools of booze I’ve guzzled over the years—not to mention all the [drugs]… there’s really no plausible medical reason why I should still be alive. Maybe my DNA could say why.”
Backing Band
Supporting Osbourne was a tight three-piece outfit. The sequencing was conducted at St Louis-based Cofactor Genomics, performed on the Life Technologies SOLiD 4 platform, which also assisted in the early bioinformatic analysis. The final genomic interpretation and presentation was handled by Knome.

Cofactor Genomics was established in 2008 by former members of The Genome Center (TGC) at Washington University, St Louis, who worked in the technology development group under Elaine Mardis. "We'd see people come to The Genome Center with cool projects, but there were reasons why they couldn't take it on—capacity, funding issues, etc.," says Jon Armstrong, Cofactor's chief marketing officer. Armstrong and colleagues had plenty of experience installing new sequencing platforms, including the very first Solexa machine (serial number #001). "Because of Elaine Mardis and Rick Wilson, we couldn't have done what we've done without being under their tutelage. We had a unique and privileged outlook on
the emergence of next-gen sequencing and how that hardware interfaces with genomics. We’ve been involved in the development of those protocols and producing very high quality data under time constraints and doing it right the first time.” Cofactor offers Life Technologies’ SOLiD 4 and Illumina’s GA IIx instruments, and is contemplating whether to acquire Illumina’s HiSeq 2000, but needs “to understand what that machine means on our operational and fiscal side,” says Armstrong. Other platforms, such as Ion Torrent, might prove a better fit depending on how the sales pipeline evolves. Sometimes, “using a paring knife instead of a cleaver is a better choice,” he says. Armstrong says Cofactor has “a very strong analysis component,” and a good relationship with NGS software company NovoCraft (Malaysia), using NovoAlign for Illumina work and NovoAlign CS for some of the SOLiD data. Early work focuses mostly on RNA sequencing, with steady business coming from the biofuels and agricultural sector. Cofactor is also preparing a major
project with Sigma to sequence the genomes of several strains of rat and build a publicly available database of rat genomic variation.

As for Ozzy, Armstrong was approached by Life Technologies asking if he would be interested in "an amazing opportunity" that had come up. He was told that Life Technologies "needed this done right the first time." Ozzy was not Cofactor's first whole genome sequence, but it was Cofactor's first serious use of the SOLiD 4 machine. Cofactor generated 39 Gigabases of Osbourne's sequence (13X coverage) over about three weeks using both long-insert mate pairs and fragment reads. (Human genomes are usually sequenced to 30X coverage, but Osbourne's sequence was time sensitive.) About 70 percent of the DNA reads were mappable, which he calls "very good."

Life Line
Houston-based Matt Dyer has a field-based position with Life Technologies, supporting Midwest users of the SOLiD platform on issues including experimental design and data analysis. "Jarret [Glasscock, Cofactor CEO] contacted me, and said they needed some help for an urgent, once-in-a-lifetime project," says Dyer, who didn't need to be asked twice. "I'm a big fan of biology and bioinformatics—though not Black Sabbath necessarily!"

Because of the urgency of the project, Dyer used cloud resources offered by Penguin Computing (Penguin On Demand). After Cofactor uploaded the sequence data to the cloud, Dyer logged on and took over, with immediate access to thousands of compute nodes. No read filtering was performed. "We don't do any filtering—we let the mapping do the filtering," says Dyer. "If the read is bad, it won't map to the reference."

BioScope is Life Technologies' integrated framework that allows users to perform secondary analysis (mapping sequence back to the reference) and tertiary analysis (e.g. SNP calling, indels, CNVs), all within a single software package. BioScope can run on many different types of hardware, but Dyer says, "for folks who can't maintain an expensive IT infrastructure, they can use this fee-for-service cloud, buying the CPU hours they need."

Dyer used BioScope to map Ozzy's sequence reads to the human reference genome (HG18)—the process took about 8-10 hours. From there, he created BAM files, which were then used for the tertiary analysis. "We gave those pipelines the BAM file and asked to return the SNPs and small indels," he says. Knome was able to download the SNP and indel data via FTP, while Dyer shipped the full sequence data on a hard drive.

Not gnome, Knome!
"Ozzy is a pretty unique guy with an extraordinary life but he's still around to talk about it. We thought we could highlight someone, an everyman if you will," recalls Jorge Conde. He received some external funding (he did not identify the source) while Life Technologies funded the reagent sequencing. Knome took the assembled sequence and ran it through its internal analysis pipeline, producing a richly annotated genome and a set of interesting DNA variants to select for further analysis. Knome's director of research, Nathan Pearson, then flew to the UK to deliver Osbourne's results in person, but not before he had conferred with Osbourne's wife, Sharon, a judge on America's Got Talent.

"We thought about what it is that makes Ozzy unique," says Conde. "Given he's a musician and he's been diagnosed with a Parkinson's-like tremor, we looked for things associated with nerve function. We found a couple of interesting things—but as Nathan says, 'We found smoke but no fire.' There was no 'Ozzy variant'—that goes without saying." The interesting variants lie in the genes TTN (associated with deafness and Parkinsonism) and CLTCL1 (brain chemistry). They might merit further study, Conde suggests. Knome also found a novel variant in the ADH4 alcohol metabolizing gene, which might explain Osbourne's legendary high tolerance for alcohol. Ironically, Osbourne is highly sensitive to caffeine, and has the genetic variant to prove it. Another interesting variant was in the COMT gene. Osbourne carries two versions of the gene that have been associated with "worrier" and "warrior" behavioral tendencies.

Another interesting tidbit was the ability to screen Osbourne's genome for traces
of Neanderthal DNA, following new evidence of ancient inter-breeding between humans and Neanderthals emerging from the Neanderthal genome study. "We were looking for matches for long segments unique to Neanderthals, and found a couple of pretty long segments [in Osbourne's genome]," says Conde. "But George Church [Knome co-founder] has 3x as much matching segment."

Overall, Conde says Knome enjoyed working on Ozzy's genome. "I don't think doing celebrity genomes is a business model," he says, "but we're thrilled to work on interesting projects as they come along. This was a lot of fun!"

Life Technologies' Matt Dyer agrees. He says the project was "a great example of how genomics and bioinformatics helps us understand what makes you you and me me." As for those front-row Ozzy Osbourne World Tour tickets—the Prince of Darkness descends on St Louis on December 10 and Houston on January 18, 2011—Dyer says: "Still waiting for those!" •
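To put the sequencing yield and coverage figures quoted above in context, the short sketch below recomputes them; the genome size used is an assumed round number, not a value given in the article.

```python
# Rough check of the yield and coverage figures quoted in the article.
total_bases = 39e9          # 39 gigabases of SOLiD 4 sequence reported by Cofactor
genome_bp = 3e9             # assumed haploid human genome size
mappable_fraction = 0.70    # ~70 percent of reads mapped, per the article

raw_coverage = total_bases / genome_bp               # ~13X, matching the quoted figure
mapped_coverage = raw_coverage * mappable_fraction   # ~9X of usable aligned depth

print(f"Raw coverage:    ~{raw_coverage:.0f}X")
print(f"Mapped coverage: ~{mapped_coverage:.0f}X")
print("Typical human whole-genome target: ~30X (time constraints meant less here)")
```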
Briefs BEST PRACTICES CALL FOR ENTRIES The 2011 Bio•IT World Best Practices competition has released its call for entries. Since 2003, Bio•IT World's Best Practices competition has been recognizing outstanding examples of technology and strategic innovation initiatives across the drug discovery enterprise. Winners will be announced in April at the 2011 Bio-IT World Conference and Expo (see p. 30) at a gala Best Practices dinner. The deadline for entry is January 14, 2011, and the early bird deadline is December 19, 2010.
SAME GENES, DIFFERENT ACTIVITY Researchers at the Allen Institute for Brain Science have found that the same genes have different activity patterns in the brain in individuals with different genetic backgrounds. These findings may help to explain individual differences in the effectiveness and side-effect profiles of therapeutic drugs and thus have implications for personalized medicine. The study was published in the Proceedings of the National Academy of Sciences.
PUBLIC FUNDING SciFlies.org aims to connect scientists seeking research project funding with thousands of donors who wish to support their work directly. Formally launched in November, SciFlies allows scientists affiliated with universities or research institutions to present their project needs and goals in terms that the general public will understand. Donors can then make direct, tax-deductible contributions to the projects of their choice through the site. The funds are deposited directly into the foundation accounts of the university or research institution with whom the scientist is affiliated for direct disbursement once the fundraising goal is achieved.
Up Front News
GSK Announces Singapore Collaborations...
Four academic groups get GSK drug discovery partnerships.
BY ALLISON PROFFITT
SINGAPORE—GlaxoSmithKline has announced the first four academic partnerships under the GlaxoSmithKline-Singapore Academic Centre of Excellence (ACE) announced in January. Awards went to researchers at the A*STAR-Singapore Institute for Clinical Sciences; Duke-NUS Graduate Medical School; National University of Singapore, School of Medicine; the Singapore Eye Research Institute, and the National University Hospital.

Academics have strengths in biology, understanding pathways, and clinical research, said Patrick Vallance, senior vice president of drug discovery at GSK, pointing to Singapore's Biopolis research park as evidence. GSK brings expertise in chemistry to the table, and Vallance believes that coupling those together will lead to more innovative approaches to discovering medicines. "We really tap into the investment that's been made in basic science and clinical science here in Singapore, which has put it very much on the world stage, and why we want to be here," Vallance said.

The amount given to each collaboration was not disclosed. "This is catalytic," said Vallance. "We're not funding a research program the way people normally do. We're giving enough to get to a stage to understand whether there is a joint drug discovery program that we can take forward together."

Vallance defined success for the ACE partnerships as identifying promising projects in academia, forging joint research teams, and developing successful drug discovery programs. The goal of ACE is not just knowledge discovery and good experiments, he said. The goal is good medicine. "It's very clearly objective driven. If we don't get to a stage where there's something
concrete we can latch on to and we both know we've got a medicine, then we'll move on to something else."

The first four projects chosen for collaboration are in the areas of ophthalmology, regenerative medicine, and neurodegeneration. These projects align closely with research that GSK is already active and invested in at the neurodegeneration research center in Singapore. However, Vallance hopes that future ACE collaborations will expand into areas in which GSK is not already active. "Going forward I really want to move to the system of identifying something where somebody has a great target that they're working on, and we end up applying the chemistry and the know-how in drug discovery," he said.

First Four
The collaborations announced yesterday will be evaluated on undisclosed individual objectives and timelines. ACE will consider new collaborations on a continuous basis.

Feng Xu of the Singapore Institute for Clinical Sciences will be working with the GSK Sirtris Discovery Performance Unit (DPU) on investigating how cells store adipose tissue and protein dysfunctions that can lead to diabetes and obesity. Eyleen Goh of Duke-NUS Graduate Medical School will work with the stem cell and neurodegenerative DPUs to study the brain's ability to regenerate connections. Gavin Stewart Dawe of the National University of Singapore is also working with GSK's stem cell and neurodegenerative DPUs to develop an animal model for evaluating and refining experimental medicines for neurodegenerative disease. Finally, Tien Wong and Carol Cheung of the Singapore Eye Research Institute and Christopher Chen of the National University Hospital are partnering with GSK's Singapore DPU and the neurodegenerative DPU to identify retinal vascular biomarkers for progression of Alzheimer's Disease.

GlaxoSmithKline has had a research center in Singapore since 2005. •
...As Lilly Shutters Singapore Discovery Center
On the edge of a patent cliff, Lilly trims entire research center in Asia.
BY ALLISON PROFFITT
SINGAPORE—Eli Lilly will be closing the Lilly Singapore Centre for Drug Discovery by the end of the year, according to an email sent on October 15 by Jonathon Sedgwick, managing director & CSO, Lilly Singapore Centre for Drug Discovery (LSCDD). The email, addressed to members of the Singapore scientific community, stated that LSCDD will cease operations in Singapore and the site will close by the end of 2010.

Most of the ongoing drug discovery, biomarker, and computational sciences work will be transitioned to the U.S., said Sedgwick, as well as relocating "some of the key talent we recruited [in Singapore] to Corporate HQ [in Indianapolis, Indiana] to maintain continuity around the work initiated and developed at LSCDD." Sedgwick said that Lilly hopes to retain as many scientific collaborations with the Singapore community into 2011 as possible. Sedgwick expressed disappointment, but noted, "We are also very aware of the challenges Lilly faces in the next few years and that on occasion, difficult decisions will be made to ensure our R&D organization is structured best to develop novel medicines for patients."

The challenges Sedgwick referred to include regulatory disappointments, legal issues, and a harrowing patent cliff. Over the next seven years the drugmaker will lose patent protection on drugs accounting for 74% of its 2009 sales. In a September 30 interview with the New York Times, Lilly's CEO, John Lechleiter, promised that the company will survive. He said the company plans to eliminate 5,500 positions and $1 billion in costs by the end of 2011. Yet in late September, J. P. Morgan forecast "a prolonged period of depressed earnings" from 2012 to 2016. Analysts noted that Lilly has several interesting pipeline assets, but they remain several years from market.

Lilly was housed in the Immunos building on Singapore's Biopolis research campus.

In Singapore, a spokesperson for
LSCDD told Bio•IT World that the closing will not affect Lilly’s research areas of focus. “Projects and capabilities deemed a priority for Lilly’s global research organization will be transitioned to the company’s global headquarters,” the spokesperson said, adding that, “Lilly employees in Singapore affected by the closure will have the opportunity to apply for limited jobs at our global headquarters.” Lilly Sales and Marketing and Lilly’s phase 1 clinic, the Lilly-NUS Centre for Clinical Pharmacology, will remain in
Singapore. The clinic is the only Lilly clinical pharmacological unit outside the U.S. with its own clinical research unit for conducting clinical trials with new pharmaceutical agents. Lilly Singapore Centre for Drug Discovery currently houses a small component of oncology and diabetes drug discovery research as well as systems biology and an informatics/computational science group. The site has 130 research employees and the company has had a presence in Singapore since 2002. •
Millisecond Modeling
Bringing to fruition a project he discussed at the Bio-IT World Expo in 2006, David E. Shaw and his colleagues at DE Shaw Research have successfully modeled protein folding and conformational change over the course of a millisecond on their purpose-built supercomputer, Anton. Researchers focused on two proteins: a WW domain, a small, independently folding protein domain, and bovine pancreatic trypsin inhibitor (BPTI). Extremely long all-atom simulations were conducted, revealing multiple folding and unfolding events that consistently follow a well-defined folding pathway. The work was published in the October 15 issue of Science.
Up Front The Bush Doctrine
Data Hide and Seek vs. Safety Assessment ERNIE BUSH
As the fall colors begin to emerge in New England, it seems fitting that Cambridge Healthtech Associates is preparing for yet another conference/collaborative project looking to sort out the issues around the enormous volumes of data that are collected when conducting non-clinical safety assessment studies for new drug candidates. This is at least the fourth time in our short history that we have become ensnared in these discussions, and one wonders why this is such a difficult problem and why the industry can't seem to get a handle on it.

In my 28 years of involvement with pharmaceutical R&D, I've seen no single issue that has generated more meetings, instigated more fervent debate, consumed more dollar investment, and generated solutions with less overall impact than what to do with our study data once a report has been finalized (especially the regulated GLP data). The average tox study report contains 500-1000 pages of data/results, often collected from 5-10 different LIMS systems that cross several departmental boundaries. In addition, increasingly these studies are conducted at CROs from around the globe or are licensed-in (or inherited via mergers and acquisitions) from other companies that had different processes and informatics tools. Finally, most large pharmas have study reports dating back at least to the implementation of GLP (1978), and many large pharmas were still producing paper tox reports well into the late 1990s.

The veritable ocean of safety data at any one company (especially those with multiple research sites around the world) is itself staggering, and when you think this is reproduced 10-20 times across the industry, the possible knowledge resource is mouth-watering to anyone wishing to build in silico models of toxicity. Yet at most companies, to my eternal amazement, this information is so difficult to get at that it makes any kind of data mining or knowledge generation activity all but impossible. What is worse, even simple questions like "Have we seen this before?" are also nearly impossible to answer. Sadly and remarkably, most companies have very little access to their preclinical safety history. Here at CHA, this is the number one area that current or past heads of toxicology/preclinical safety tell us
they would change if they had a magic wand. And with the advent of new high data volume safety assessment tools like high content analysis and toxicogenomics, the problem is getting worse at an alarming rate.

The Season for Success
In the mid 1990s, many in the preclinical safety field began to envisage a system that could be queried across all their safety studies to ask questions about their safety history and possibly generate new knowledge. A few brave souls began designing/building such systems based on the emerging field of data warehousing, and then started the long (and expensive) process of cleaning and moving all their LIMS data into the new warehouses. Some of these have now been in production for several years, yet the overall assessment of their value has remained fairly negative:

• They are overly complex and specific, often based on database schemas that are themselves mammoth and difficult to navigate and mine effectively (especially for users just looking to ask simple questions about their previous work).
• They are s-l-o-w, with some users reporting query times of hours to days.
• They were built by IT groups as part of some larger vision for a company-wide R&D informatics repository, and therefore the needs and objectives of the preclinical scientists are not considered or prioritized appropriately.
• While cleaning and copying data from an in-house LIMS is expensive, there are tools that can make this semi-automated. But for studies coming from CROs, predecessor companies, or from paper reports, there are no good options for doing this in an automated fashion. As such, the majority of data warehouse systems currently in use are missing large sections of the safety history that comes from these sources.
• Names and terminology are often different across LIMS and departments, and have themselves evolved over the years. This makes apples-to-apples comparisons difficult when looking to compare data across systems, sites, and companies.

So, what is different now that leads me to believe this time it will be worth the effort?
1. There is a growing sense that a solution generated by any one company will be expensive and not as useful as one generated on an industry-wide basis (including the LIMS vendors), i.e. that this is probably best done on a collaborative basis.
2. There are several new tools on the market whose primary function is to gather and synthesize data from multiple sources and thus obviate the need to clean and copy data from many LIMS systems into one giant data warehouse.
3. There are multiple efforts to set standard terminology to make queries across systems more effective and consistent (see the sketch following this column).
4. Hopefully, we will have learned from our past experiences.

Ernie Bush is VP and scientific director of Cambridge Healthtech Associates. He can be reached at ebush@chacorporate.com.
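To make points 2 and 3 above concrete, here is a minimal sketch of the idea: two hypothetical LIMS exports that use different local terms for the same liver enzyme are mapped onto a shared controlled vocabulary so a simple "Have we seen this before?" query can run across both sources. All records, field names, and term mappings are invented for illustration and are not drawn from any vendor's product.

```python
# Minimal illustration of cross-LIMS harmonization: map source-specific
# terminology onto a shared controlled vocabulary, then query the combined
# records. All data, field names, and mappings here are hypothetical.

lims_a = [
    {"study": "TX-001", "species": "rat", "endpoint": "ALT", "finding": "elevated"},
    {"study": "TX-014", "species": "dog", "endpoint": "BUN", "finding": "normal"},
]
lims_b = [
    {"study_id": "07-221", "species": "Rat", "test": "SGPT", "result": "increased"},
]

# Hypothetical mapping of local terms to a preferred (controlled) term.
ENDPOINT_SYNONYMS = {
    "ALT": "alanine aminotransferase",
    "SGPT": "alanine aminotransferase",
    "BUN": "blood urea nitrogen",
}

def harmonize_a(rec):
    return {"study": rec["study"], "species": rec["species"].lower(),
            "endpoint": ENDPOINT_SYNONYMS.get(rec["endpoint"], rec["endpoint"]),
            "finding": rec["finding"]}

def harmonize_b(rec):
    return {"study": rec["study_id"], "species": rec["species"].lower(),
            "endpoint": ENDPOINT_SYNONYMS.get(rec["test"], rec["test"]),
            "finding": rec["result"]}

combined = [harmonize_a(r) for r in lims_a] + [harmonize_b(r) for r in lims_b]

# "Have we seen this before?" -- all rat studies touching the same enzyme.
hits = [r for r in combined
        if r["species"] == "rat" and r["endpoint"] == "alanine aminotransferase"]
for h in hits:
    print(h["study"], h["endpoint"], h["finding"])
```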
Insights | Outlook
Epigenetics and the Role of Phenotypic Changes to DNA
AL DOIG

When I first studied molecular biology, life was relatively simple; a gene composed of DNA was transcribed to yield RNA, which in turn was translated to synthesize a protein. In the intervening years this progression from gene to protein has turned out to be deceptively simplistic. In 2010, this revelation should come as no surprise since the Human Genome Project and its aftermath has placed the number of human genes very close to the number of genes of the fruit fly, about 21,000. I can't fly, but a fruit fly can't ride a bicycle, so what's going on?

Researchers have known for many years that inheritable phenotype changes can occur that do not involve changes in DNA sequences. This type of inheritable difference is referred to as an epigenetic change, and differs from genetic changes that result from nucleic acid base alterations in the DNA sequences of genes. The term "epigenetics" was first used by Conrad Waddington in the 1940s, even before the structure of DNA was understood, but it has taken a few decades of research aided by advances in instrumentation and software to appreciate the epigenetic control mechanism. Today, we know that epigenetics has an important role in embryonic development and is also involved in aging and disease processes such as cancer.

Genetic information resides on chromosomes that consist of chromatin, the combination of DNA and histone protein, and other proteins. The functions of chromatin are to package DNA so it fits inside the cell nucleus, to strengthen the DNA for the rigors of cell division, and to control DNA replication and gene expression. I'm focusing here on the epigenetic control over gene expression by changes in chromatin structure, affected by chemical modifications of histone proteins and direct methylation of DNA.

The first mechanism of epigenetics to be widely studied was DNA methylation. DNA methylation is a chemical modification in which a methyl group is added to the cytosine group in the DNA. DNA methylation patterns in normal and cancer cells differ, making the DNA methylation mechanism an attractive drug target, since increased DNA demethylation appears to be useful in the treatment of cancer. Two of the four FDA-approved epigenetic drugs are demethylating agents, and most of the diagnostic activity to date in epigenetics has focused on methylation markers.

Histone Targets
A second epigenetic control mechanism involving the modification of histones has been discovered. Histones are small, alkaline proteins that are associated with DNA. Segments of DNA wrap around the histones to form nucleosomes, the basic structural unit of chromatin, which resembles a "string of beads," the string being the DNA and the histones, the beads. The extent of condensation of chromatin varies during the stages of the cell lifecycle. For example, in non-dividing cells, most of the chromatin is relaxed and not tightly condensed. However, in situations where the chromatin is highly condensed, genes cannot be transcribed, thus resulting in a gene transcription control mechanism. The N-terminal domain of histones is modified by various enzymes to modulate the degree of condensation. These modifications include enzyme-catalyzed chemistries such as acetylation, methylation, or phosphorylation. These modifications may either activate or repress transcription of the genes. Among the histone modifying enzymes, histone deacetylases (HDACs) and histone methyltransferases represent targets for drug development.

The first epigenetic drugs to reach the market target hematological cancers. Within the hematological market, the currently approved indications for these epigenetic drugs represent fairly small markets. The two DNA hypomethylation agents [Celgene's Vidaza (azacitidine) and Eisai's Dacogen (decitabine)] are both FDA-approved for treatment of myelodysplastic syndromes (MDS), while the two FDA-approved HDAC inhibitors [Merck's Zolinza (vorinostat) and Celgene's Istodax (romidepsin)] are indicated for treatment of cutaneous T-cell lymphoma (CTCL). It is expected that epigenetic drugs may prove to be useful for treatment of a wide range of diseases including hematological cancers, solid tumors, and other non-cancer indications.
Further reading The field is reviewed in Insight Pharma Reports’ Epigenetic Drug & Diagnostic Pipelines: DNA Methylation, HDAC Inhibitors, and Emerging New Targets, June 2010. www.insightpharmareports.com
Al Doig is general manager, Insight Pharma Reports. He can be reached at adoig@healthtech.com.
Clinical Trials
Most Trials Now Eligible for Design Simulation
New simulation tool expands the range of eligible trials.
BY DEB BORFITZ

The first upgrade to a one-of-a-kind simulation tool co-developed by Tessella and Berry Consultants significantly enlarges the proportion of clinical studies that can benefit from simulation of the design. With FACTS 2, 85% of all clinical trials can be productively simulated, says Scott Berry, the consultancy's president and senior statistical scientist. That's up from about one-third of trials with the original FACTS (Fixed Adaptive Clinical Trial Simulator), introduced last year, which focused exclusively on single-endpoint Bayesian dose finding and dose escalation trials.

Berry and Tom Parke, head of clinical trial design at Tessella, launched FACTS 2 at last month's ADAPT 2010 conference in Arlington, Va. Berry Consultants (College Station, Texas) is a statistical consulting group specializing in adaptive clinical trial design and Tessella (Oxford, UK) is a global information technology and consulting services company with clients in the life sciences industry. After collaborating on trial optimization projects for more than a decade, the firms have created a simulation package that can be used to study a trial design's operating characteristics at the planning stage. The software allows the impact of different dose response profiles, trial accrual rates, and subject drop rates on the likely trial outcome to be explored.

The user interface for FACTS 2 is much like a conventional PC application, explains Parke, with the user-specified parameters handed to a "statistical powerhouse" that does all the mathematical processing. The program can also be distributed on the cloud to run simulations in parallel, he adds. Summary results are presented in the form of easy-to-read graphs and tables.
Scott Berry
Tom Parke
Trial design simulation is intended to predict how a study is apt to play out if executed under a variety of potential scenarios and assess the impact of various design features on the study objectives, allowing for the best use of resources and better odds of success. Trial sponsors up to now have been wary of trial design simulation, says Berry, despite positive mentions in Food and Drug Administration draft guidance on adaptive clinical trials issued earlier this year. The vast majority of companies have been designing clinical trials using a classical statistical framework developed over 50 years ago. Expanded Simulation Four major pharmaceutical companies are currently using FACTS and two of them are at an “advanced stage,” says Berry. This includes the firm partnering in the tool’s development, which may soon be simulating up to 90% of its drug development portfolio. New features introduced by FACTS 2 could expand the practice of trial simulation considerably, particularly for oncology and central nervous system studies. FACTS can now simulate dose-finding trials with multiple endpoints, such as studies simultaneously driven by efficacy and toxicity goals. The efficacy value of a drug can be scaled by a safety factor that reduces the utility of a dose if side effects are above a chosen threshold, explains Berry. FACTS 2 will make clinical teams www.bio-itworld.com
and sponsors “think hard about why they’re doing a trial.” Like the existing FACTS 1 dose-finding simulator, longitudinal modeling of the relationship between subjects’ early responses and their final outcome can be used to maximize the information from subjects that have dropped out or have not yet completed. Time-to-event trials based on a time duration measurement—e.g. time-todeath and time-to-progression (oncology), time-to-failure (medical devices) and time-to-recovery (post-surgery)—can also be accommodated with FACTS 2. Relative to a quantitative response variable, time to event endpoints are now “fairly common,” Berry says. The new feature makes the potential benefits of early stopping, dropping ineffective arms, and adaptive randomization easy to explore. Rather than a collection of pre-canned trial designs, FACTS offers sponsors a “list of [pre-programmed] ingredients” they can combine to suit virtually any study circumstance,” says Parke. With the dual endpoint “utility function,” biostatisticians can create entirely novel trial designs, including those combining early and long-term endpoints or a pair of efficacy endpoints. Phase I “combination design” trials that vary the dose of two drugs given in combination remains a candidate for inclusion in a future release of FACTS. The world of clinical trial design is expanding with the upsurge in personalized medicine, so FACTS will ultimately assist with the design of studies focused on molecular-level efficacy, says Berry. Forthcoming upgrades will also expand the “multiple endpoint engine” to four or five endpoints and simulate designs for trials of a single drug in multiple diseases or tumor types. The latter feature could be especially useful in speeding the development of therapies for orphan diseases. •
Taming the Beast
Going paperless speeds study review and gives clients real time access to process.
BY ALLISON PROFFITT
Copernicus Group made a fairly weighty donation to a local North Carolina school last month: four boxes of paperclips and binder clips all formerly used to wrangle the “paper-moving beast” that was Copernicus’ clinical trial review process. The independent institutional review board (IRB) doesn’t need them anymore. They’ve gone completely paperless. To assure regulatory compliance, IRBs review research protocols and study-related information, as well as investigator qualifications and resources. An IRB has the authority to approve, require modifications to secure approval, or disapprove research. Of the 200 accredited IRBs in the country, independent bodies like Copernicus account for only about 10%. Copernicus Group is nearly 15 years old, and by 2006, they’d generated an astonishing amount of paperwork. 96% of forms came in to Copernicus electronically, via fax or email. But those forms were then printed, paper clipped, passed around the office, and often scanned at the end of the process, says Jennifer Sodrel, director of IT at Copernicus. In addition to that, the IRB had 22 linear feet of its own paperwork to deal with each week. In 2007, Copernicus chose to go paperless, but not just moving forward. After getting started, the group made the unprecedented decision to scan over five million pages of active and open cases so that their entire database could be accessed by their clients. Making the switch to paperless was essential for efficiency, says Sodrel. “We were able to take all of those images and electronic documents, route those through our process, make them searchable for our Board, more efficient in transferring, and then apply electronic signatures to those and make our output of documents 21 CFR Part 11 compliant.” Copernicus started their own protocol tracking system , a proprietary software
that was developed by Copernicus at its inception. Working with Sitrof Technologies, a company specializing in unstructured document management with extensive experience in FDA 21 CFR Part 11 compliance, Copernicus was able to link its system to a compliant electronic signatures system. Xerox DocuShare serves as the backbone of the system. Sitrof installed DocuShare and then wrapped it with a compliance module created to automate the workflow and decision process for electronic records while maintaining Part 11 compliance. An outside service came in to scan, tag, and index the active forms.

(Photo: When Copernicus completed the transition to paperless clinical trial review, they were left with boxes of unused paperclips. Photo courtesy of Copernicus Group IRB.)

Portal Advantages
In mid-September, Copernicus launched Connexus, which is fully validated, FDA 21 CFR Part 11 compliant, capable of providing a full audit trail for users, and manages every phase of IRB documentation. With appropriate security permissions, users can call up forms they've already submitted and watch the tracking system, giving users real time updates on the status of the review, which documents are missing, and which need changes.

"We're always looking to improve in everything that we do, whether it be quality or speed or efficiency. The use of technology in the form of Connexus has enabled us to achieve all of those goals," says Bruce Tomason, CEO of Copernicus.

For the review board, Connexus provides searchable archives and all of the necessary forms in one place. "It enables [the board], we think, to make better decisions than they've made in the past," says Tomason.

For clients, Connexus manages the submission process and enables audits. "In a regulatory world, it's all about being accurate and on time," says Barry Mangum, director of clinical pharmacology at the Duke Clinical Research Unit. "We need to make sure that we have the right version control. We need to make sure that we have the right regulatory documents in the right place and engaged at the right time so that we can get IRB approval."

Mangum's group is doing hundreds of studies a year, mostly first in man or proof of concept studies. "Especially in the early clinical pharmacology world," he says, "time is of the essence."

Having worked with Copernicus for more than two years, and using the Connexus portal for several months, Mangum has found the working relationship with Copernicus to be "tremendously open and transparent," something he believes can be uncommon in the world of clinical research.

The transition to Connexus was very easy, he says. And after the first few trials with Connexus, Mangum and his team took a step back to assess the process. The tracking options in Connexus revealed some challenges to efficiency. "We were sitting on some documents longer than we needed to sit on them at our own shop and it became self evident after we did the look-back who was holding up what and when," he says. Easy communication with Copernicus allowed for streamlining the process. "Once we figured out who was holding up what, when and where, we could go back and correct that." •
Clinical Trials
Novel IT Platform Helps Novartis Gain Control of Clinical Imaging Data ImagEDC, an open-source platform, is a game changer for clinical trial management. BY KEVIN DAVIES
Researchers at Novartis in Basel have developed a powerful new electronic data capture (EDC) hub for clinical data that has allowed its investigators to control data across multiple trials as never before, according to David Tuch, head of clinical imaging at the Novartis Institute for Biomedical Research, Switzerland. "This is a game changer for managing clinical trials at Novartis."

Tuch credits his colleague Stefan Baumann, head of imaging infrastructure, for driving the project forward. A physicist by training, Baumann joined Novartis' clinical imaging team in 2006, charged with managing the imaging IT.

Traditionally, Novartis scientists assessing clinical imaging data would only receive a numeric analysis from a central reading location, rather than the primary image data of the clinical trial, whether MRI, CT, PET, or ultrasound. (Participating hospitals would send their images to a third party for evaluation, which would then send the results to Novartis.) But Novartis investigators wanted the primary image data for a couple of good reasons. First, they wanted to be able to save the image data for later retrieval in exploratory studies with an academic partner. Moreover, access to the primary image data would allow investigators to compare images from earlier trials with improved analysis algorithms. "Getting ownership around the data—that was my goal," says Baumann.

The Novartis group built an internal infrastructure as a hub for image data to do quality control. "It's a perfect infrastructure to analyze and share data with collaborators," says Baumann. For a start, it is in compliance with Good Clinical Practice (GCP) and Privacy Regulations such as HIPAA and the more stringent European Privacy Regulations.
(Figure: site EDC devices and imaging modalities send images and metadata through imaging middleware transport to the CRO workflow, and on through sponsor integration to sponsor services. ImagEDC gives Novartis researchers unprecedented control over clinical image data.)
The custom system is based on an Oracle 11g database, using the latest digital imaging and communications in medicine (DICOM) features. Baumann notes that this was a full partnership and his colleagues contributed to the feature set in the database. Sitting on top of this is an application allowing analysis core labs to log in, submit, and retrieve data. "We can also run quality checking steps and trigger further analysis," says Baumann.

External partners can submit data to the hub, but with a growing number of applicable clinical trials, Novartis needs to take it to the next level to ensure machine-machine integration. This is where Baumann's first open-source effort—called ImagEDC—comes in (see: http://code.google.com/p/imagedc/). It allows hospitals and academic core labs to load data, while the tool ensures that the data fit Novartis' desired format. "It helps any academic partner," says Baumann. "They don't need to care about licensing restrictions."

Not only is it open source and easy to submit data, but it has useful advantages regarding regulatory requirements. "If you're dealing with clinical trial software, it must be installed and tested in compliance with GCP regulations," explains Baumann. "Open source enables CROs to integrate code with their own infrastructure. It's much easier this way."

Why would Novartis share this software and know-how with its competitors? "The main thinking is that, on the level of transporting images from point A to B, for every trial and hundreds of partners, this is not what we consider to be a competitive advantage," says Baumann. It's only later, when one starts to engage in quality checking and analyzing the data, that the business knowledge and competitive advantage come into play.

Every couple of months, Baumann hosts an informal meeting of a handful of informatics experts at other big pharma companies, where they discuss ways of enabling image data exchange. Work is currently progressing to shape the interface to be broadly usable so other sponsors and academic labs can reuse it. "It's not just about a tool but specifying a common interface re-usable by everyone," says Baumann.

Baumann says the success of their infrastructure comes down to the capability to innovate. "We have a handle on data now," he says. "We can react to what has happened in clinical trials. That's the big success of the infrastructure."

We're Not an IT Company
"In 2006, our key objective was to own the data in house," says Baumann. Of course, handling such large and complex datasets poses some familiar problems. There was concern that clinical trial image data, left in the hands of Novartis' partners, could end up in a "data grave." "If storage is not transparent, it could be very hard to make any sense out of the data," says Baumann. He admits that the scale of the data doesn't really match that of next-gen sequencing in terms of size, but they are just as complex in terms of the data structure and the distributed nature of clinical trials.

Now, at the push of a button, Baumann's team can retrieve the images from their central data archive. And with ImagEDC, they're preparing for a world of federated image data sources. With more and more organizations testing Cloud-based storage options, Baumann says his team is keeping its options open. "We're not an IT company," he notes. "Our core knowledge is outside storing Petabytes of data!" His preference would be to find some private clouds to host the data.

Tuch declines to mention the vendor Novartis has used to supply the application layer for the image system, but notes that another pharma is buying a license from the vendor. "We bring in the business knowledge; they offer the finished product to us at a lower price and sell to other parties."

Baumann offers an anecdote to illustrate the value of their infrastructure. "We had an entity analyzing a multi-center trial, but while it was still ongoing, the entity went bankrupt and couldn't complete the analysis. That would have been a disaster... But because we had the data loaded into our system, we could a) start the analysis again and b) plugging into this infrastructure, we could plug in a new algorithm to fully automate the analysis and complete that analysis from scratch."

With ImagEDC, Baumann says, Novartis is using a public platform to transfer data between partners, which involves middleware to transport the data, ensuring high performance and security. "The limit is the bandwidth between the partner and us—there is a bottleneck," says Baumann. "One option is to use optimized protocols. Then we have a backup option, so if your data density goes beyond a certain point, we use physical media." •

(Photos: NIBR's Baumann has image control at the push of a button. Figure: ImagEDC's capabilities are spreading to other trial sponsors.)
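To make the format-checking step concrete, the fragment below is a rough sketch, not ImagEDC source code, of how an intake tool can inspect DICOM headers before accepting a submission. It assumes the open-source pydicom package, and the list of required tags and accepted modalities is illustrative rather than Novartis' actual specification.

# A minimal sketch of pre-submission metadata checks on DICOM files.
# Illustrative only; requires the third-party pydicom package.
import sys
from pathlib import Path
import pydicom

REQUIRED_TAGS = ["PatientID", "StudyInstanceUID", "SeriesInstanceUID", "Modality", "StudyDate"]

def check_dicom_file(path):
    """Return a list of problems found in one DICOM file (empty list means it passes)."""
    problems = []
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # header only, skip pixel data
    for tag in REQUIRED_TAGS:
        if getattr(ds, tag, None) in (None, ""):
            problems.append(f"missing {tag}")
    if getattr(ds, "Modality", "") not in ("MR", "CT", "PT", "US"):
        problems.append(f"unexpected modality {getattr(ds, 'Modality', '?')!r}")
    return problems

if __name__ == "__main__":
    for f in Path(sys.argv[1]).rglob("*.dcm"):
        issues = check_dicom_file(f)
        print(f, "OK" if not issues else "; ".join(issues))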
Computational Biology
A Community Experiment for the Genome Commons Steve Brenner is addressing the challenges, both sociological and technical, of shared genomic data.
BY KEVIN DAVIES

In 2007, University of California, Berkeley computational biologist Steve Brenner published a provocative commentary in Nature proposing the Genome Commons, an initiative to expedite the creation of tools and resources for personal genome interpretation. Brenner wanted to encourage the genome community to see the wisdom of taking this on. "To my surprise and disappointment, it didn't really have much uptake," Brenner told Bio•IT World recently. "It was only after that that I went about seeing how I could muster the resources to take this on myself."

Brenner has had mixed success in that endeavor until now. Last year, he recruited Reece Hart from Genentech to be the Genome Commons chief scientist at Berkeley, but Hart left after less than 12 months to join an in silico drug design company called Numerate. "Reece's transition [from industry to academia] didn't go quite as anyone expected," said Brenner. "Reece is really terrific and it's a huge blow to have him leave." Capturing external funding sources has also not gone as smoothly as Brenner would have liked, though he said he is confident that an industrial source will be confirmed shortly.

Early Goals
Brenner initially identified five main goals for Genome Commons (see http://genomecommons.org), chief of which, says Brenner, is: "Let's collect our information in one place." He bemoans the fact that genotype and phenotype information is either siloed, of inconsistent quality, or both. "There are more databases for these [disease-related] genes than there are genes," he says, referring to resources
such as OMIM, GeneTests, PharmGKB, dbSNP, DECIPHER, the NHGRI's GWAS database, and many more. HGMD has the largest array of annotated mutations, but as Brenner notes on the Genome Commons Website, database creator David Cooper, frustrated by his inability to secure grant funding, partnered with a commercial firm that seeks user license fees. "As we enter the era of personal genomes, there is a profound new impetus for suitable open resources," says Brenner.

"We want to create a Genome Commons Database, open access and open source," says Brenner, but "it's a severe sociological problem." He acknowledges that creating and amalgamating life sciences databases is tough. "You can imagine, they wouldn't be very happy if someone came along and swooped up all their data and put it somewhere else, and they got no credit for the effort they put in there." That is why his efforts for now are more sociological than scientific or technological, working with groups such as NCBI, EBI, and clinical genetics groups among others. "We could build it ourselves, or help someone like NCBI or PharmGKB to build it," said Brenner.

CAGI Community
This December, Genome Commons will convene a promising community experiment called CAGI (The Critical Assessment of Genome Interpretation), to evaluate computational methods for predicting phenotypes based on genome variation data (http://genomeinterpretation.org). The program will be modeled on the successful CASP (Critical Assessment of Structure Prediction) meetings, which had a profound impact on methods for 3D protein structure determinations (see "On the CASP of a DREAM," Bio•IT
World, Nov 2006). Brenner says the first CAGI—he dubs it “Pre-Pro CAGI” to emphasize this is just the first iteration—is not designed to pick winners per se “but find challenges and lay groundwork to improve methods in the future.” The idea is for participants to take genetic variants and make predictions of molecular, cellular, or the organism’s phenotype. These predictions will be evaluated and reviewed against experimental characterizations. “CASP had a profound impact,” says Brenner, making the best use of protein structure evolutionary information. “The whole field turned based on using alignment information. We want to do the same thing for genome interpretation. Here’s a [gene] variant: predict!” Brenner is encouraging people to submit predictions, which will then be assessed and evaluated. He hopes that CAGI will help identify bottlenecks in genome interpretation, inform critical areas of future research, and connect researchers from diverse disciplines whose expertise is essential to develop powerful methods for genome interpretation. Datasets are being contributed by the likes of George Church and Jasper Rine, while the assessors will be Pauline Ng (Genome Institute of Singapore) and Gad Getz (Broad Institute). •
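As a rough illustration of what assessment might involve, and not CAGI's actual scoring procedure, the snippet below compares a set of submitted predictions against experimental measurements with a simple rank correlation. The variant IDs and values are invented.

# A sketch of scoring phenotype predictions against experimental values.
# Illustrative only; not the CAGI assessment protocol.
import numpy as np

experimental = {"V1": 0.10, "V2": 0.85, "V3": 0.40, "V4": 0.95}   # measured effect
submission   = {"V1": 0.20, "V2": 0.70, "V3": 0.55, "V4": 0.90}   # predicted effect

variants = sorted(experimental)
x = np.array([experimental[v] for v in variants])
y = np.array([submission[v] for v in variants])

# Spearman-style rank correlation using numpy only.
rx = x.argsort().argsort()
ry = y.argsort().argsort()
rho = np.corrcoef(rx, ry)[0, 1]
print(f"rank correlation across {len(variants)} variants: {rho:.2f}")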
Computational Biology [GUEST COMMENTARY]
A Personal View of Personal Genomics Mike Cariaso discusses personal genomics for the rest of us. BY MIKE CARIASO
In case you missed its quiet entry, Personal Genomics is here. The list of early adopters begins with famous names like J. Craig Venter, James Watson and George Church's Personal Genome Project (PGP). Approximately 100,000 names followed, buying SNP (single nucleotide polymorphism) kits from 23andMe, Navigenics and others, acquiring some level of information about their own DNA. There is a wide range in how much of each genome was read, but no technology today is able to tell you everything you might want to know at any price. Buying early was no bargain and in the past two years, direct-to-millionaire sequencing has fallen from $1 million to $20,000. Even in today's $1,000 to $100 direct-to-consumer (DTC) market, paying more has little correlation to learning more. That's good news, since many of the early adopters have paid even less, having instead been subsidized by investor speculation and the need for reference data.

Knowing your raw data is not the same as understanding its implications. Every day new studies are published, which expand our ability to interpret a personal genome. Last week's high confidence result often fails to replicate in this week's new population. Today's accumulated genetic information still fails to explain much of the observed family inheritance of common diseases. What we do know with near certainty rarely causes more than a 20% increased risk in some disease, often while decreasing the risk of something else.

Since 2006, SNPedia.com has been tracking this progress. Of 20 million known genomic variations, we've cataloged 13,000 that have been published in the scientific literature as having some observed consequence. It is quite remarkable how rare and few are the ones which are directly causative of anything noteworthy. Instead, most have subtle effects on how the body responds to its environment.
In an environment as diverse and dynamic as our planet, it's hard to say that any one genotype is good or bad. The genome has many stories to tell, but so far it stubbornly refuses to share its biggest secrets.

You are probably not a member of this genomic secret society, but as a reader of Bio•IT World, someday you probably will be. Is it yet worth the price of admission? For the healthy but science minded, the most useful thing you can expect to learn today is not about disease. Environmental factors make most common diseases still too hard for us to predict, but pharmacogenomics is yielding early fruits. Users of Plavix, Warfarin, statins and many other medicines may find value in genetic testing today. James Watson exemplified this when he noted that he is homozygous for the '10' variant of the cytochrome P450 drug metabolizing gene, CYP2D6. He has subsequently reduced his beta-blocker dosage from daily to weekly.

Genomics for Genealogy
Following in the footsteps of scientists, technophiles and their families, the next wave of personal genomics will come from family genealogists. Their numbers and their eagerness to share and compare genomes and family pedigrees will fuel companies and Facebook applications. Empowered by DNA's precise audit trail, they will entirely 'solve' genealogy by mapping the full flow of human ancestry. Any anonymous DNA sample will immediately fit in exactly one place in this tree of humanity, and from it we will know all of your ancestors. Being adopted will never be the same again.

That same public network of genomes will be used for non-genealogical purposes. The hair and skin cells we continuously shed will incriminate the guilty, and free the innocent. Even if the perpetrator is not actually in such a database, relatives will be, and that will help focus the investigation. Prospective parents will consider genetic compatibility early in the dating process, and dating services will make that sort of screening a first step in suggesting matches for lonely hearts. Parents will not rely on dangerous trial and error to identify food allergies in their newborns. Indeed, the answers to many of today's larger medical mysteries will be obvious in such a detailed network.

The majority of the first 100,000 DTC customers have been genotyped on an Illumina microarray. It was labeled 'For Research Only' but as the marketing of DTC companies has become bolder, so has the FDA's opposition. Some sort of regulation is coming to the field, but what kind? Today the only thing certain is that the would-be regulators are as uncertain as the rest of us.

While there is considerable trepidation at the marketing of genomes, I've been pleasantly surprised by the nearly universal consensus that you have a fundamental right to your genome. The biggest reluctance comes from those who want you to have the data, but only after they tell you what it means for a reasonable fee. But the head of the NIH, Francis Collins, has said, "free and open access to genome data has had a profoundly positive effect on progress." FDA regulations may curtail the marketing, but it seems increasingly unlikely to limit the fundamental availability of personal genomes for the masses. The learning curve is still steep and much uncertainty remains, but the path seems safe with no fundamental barriers to continued progress in all directions. •
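For the technically inclined reader, the sketch below shows how little code it takes to cross-reference a DTC raw-data download against a small annotation table. It assumes the common tab-separated rsid/chromosome/position/genotype export format used by 23andMe; the two example annotations are shorthand reminders rather than SNPedia entries, and the file name is hypothetical.

# A sketch of reading a DTC raw genotype export and looking up a few SNPs.
# Illustrative only; annotations and file name are placeholders.
NOTES = {
    "rs4680": "COMT Val158Met, a much-studied behavioral variant",
    "rs1801133": "MTHFR C677T, folate metabolism",
}

def load_genotypes(path):
    calls = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header comments and blank lines
            rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
            calls[rsid] = genotype
    return calls

if __name__ == "__main__":
    calls = load_genotypes("genome_raw_data.txt")  # hypothetical file name
    for rsid, note in NOTES.items():
        print(rsid, calls.get(rsid, "not on this chip"), "-", note)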
SPECIAL ADVERTISING SECTION
Signature Series
High Performance Computing
Life Sciences Pins Hope on HPC
IT WAS IN THE EARLY 1960s THAT Gordon Moore formulated Moore's Law, which stated that the number of microprocessors that could be squeezed onto a computer chip would roughly double every 18 months. Remarkably, Moore's prediction has held true, but it is nothing compared to the glut of data being produced not only by next-generation sequencing (NGS) fleets but also (lest we forget) many other forms of high-throughput data-generating instrumentation, from proteomics and high-content assays to clinical imaging. The shift in the pattern and volume of data being handled by academic and pharma groups gives new meaning to the term "big biology."

Viewers of a recent Bio•IT World Web Symposium on improving IT infrastructures for life science organizations heard two wonderful appraisals of the challenges of managing vast volumes of data.

Jack Collins, a computational biologist at the National Cancer Institute (NCI) in Frederick, MD, marveled at the data bonanza before listing the many bottlenecks organizations such as his face in coping. "I can only use so much power, archive so much data, employ so many informatics, IT and HPC personnel," he said, adding that user expectations seldom match user education.

Collins illustrated the size of the problem by discussing the scope of The Cancer Genome Atlas (TCGA). The figures are staggering: TCGA results in 600 Gigabytes (GB) of data per patient per cancer. With 500 patients for each cancer, that equates to 300 Terabytes (TB) of data per disease, and with the project studying 20 different cancer types, that results in a mere 6 Petabytes (PB) of primary data.

Collins and Co. will have to manage more than 100 TB locally in the next 2-3 years of just primary data, not the data they have to store. Speculating on the impact of the next wave of NGS machines from Pacific Biosciences and Ion Torrent Systems, which could lead to a personal sequencer in each investigator's lab, Collins could be looking at many petabytes of additional data to house, and hundreds upon hundreds of blades simply for mapping and basic annotation.

The Cloud—or computing-as-a-service—could be the answer, said Collins, but he wasn't banking on it. Indeed, Collins hypothesized a change in paradigm, a new era of Science-as-a-Service, where scientists become private managers of sorts to design experiments, send off samples and collect data while focusing on the technology interactions.

At Merck Research Laboratories, Martin Leach, who heads discovery and preclinical IT, is still grappling with the aftershock of the big merger of Merck with Schering-Plough, producing an organization of some 100,000 people. "What do we keep and not keep? I'm in the 'Department of Sky Surfing.' It's like cleaning a garage."

Wherever possible, Leach grabs software and other IT needs "off the shelf." "Knowledge is lost when people do homegrown stuff," he said. When Merck closed Rosetta Inpharmatics, for example, Leach inherited 1.6 million files of genetic data. While much of that data was subsequently transferred to Sage Bionetworks, the worry is that no one knows what the data are 6-12 months later.

Big pharma's raw storage needs aren't quite at the petabyte level yet, but it's only a matter of time. Merck just signed a letter of intent to partner with China's BGI, for example. "I just got a request for another 30 TB because of a letter of intent," said Leach. The increasing emphasis on modeling and simulation can lead to the sudden demand for 2,000 cores of compute power. "I could buy some iron or explore the Cloud," said Leach.

Advances in HPC continue to impress. David E. Shaw and colleagues described the application of a supercomputer called Anton to accelerate computational simulations of protein folding out to durations of 1 millisecond—an eternity in the Brownian world of molecular dynamics.

In the following pages, you'll read about a variety of HPC products and technologies that are having an impact in organizations around the world as we speak.
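Collins' arithmetic is easy to verify; the few lines below simply restate the figures quoted above.

# Back-of-the-envelope check of the TCGA storage numbers quoted above.
gb_per_patient = 600
patients_per_cancer = 500
cancer_types = 20

tb_per_disease = gb_per_patient * patients_per_cancer / 1000   # 300 TB per disease
pb_total = tb_per_disease * cancer_types / 1000                # 6 PB across the project
print(f"{tb_per_disease:.0f} TB per disease, {pb_total:.0f} PB across {cancer_types} cancers")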
QUANTUM
Accelerating Computational Workflows
It's no secret that life sciences research is now a data-driven proposition. The combination of new generation lab equipment and the availability of reasonably priced high performance servers has shifted many drug discovery efforts from looking into beakers to examining bits and bytes. In fact, the research and development process
increasingly centers on computational analysis. Such analysis provides the critical information needed to make intelligent decisions about which new drug candidates hold promise and should be advanced and which should be put aside.
The Need for Speed
With the growing reliance on computational analysis, life sciences organizations need an IT infrastructure that ensures computational workflows are optimized and not impeded. Pressure to run the workflows as fast as possible so research decisions can be made sooner comes from several business drivers. First, many life sciences organizations today have sparse new drug pipelines. Delays caused by slowdowns in research data analysis simply keep the pipelines empty. Second, compounding the need to fill the pipelines is the fact that many blockbuster drugs have gone or are going off-patent and must be replaced. In fact, more than a dozen top selling drugs came off or will come off-patent between 2009 and 2011. Third, having fewer new drug candidates coming to market is bad enough, but the situation is actually worse because competition is increasing. The cost to run new lab equipment, particularly next-generation sequencers, has dropped significantly in the last few years. Similarly, high performance computing systems with supercomputing processing power are now within the price range of most organizations. This has opened up new levels of research capabilities to even the smallest life sciences organizations. And fourth, the high cost of drug development means it is greatly beneficial to fail early.

Executive Summary
· New sequencers and lab equipment are generating orders of magnitude more data
· As data volumes grow, lifecycle management can consume great amounts of staff time
· Optimized use of storage resources is essential to contain costs, improve operations, and keep data analysis workflows humming
· Quantum solutions offer simplified management of data through tiers to ensure high performance workflows are not interrupted and storage resources are used most efficiently
Potential Workflow Impeders
All of these factors are forcing life sciences organizations to look for ways to keep their computational workflows running smoothly with the highest levels of computational throughput. However, there are several developments that can introduce bottlenecks and slow down this critical workflow analysis. To start, new lab equipment, particularly next-generation sequencers and new imaging gear, are producing orders of magnitude more data than their predecessors. In many cases, experiment output files are extremely large and must be accessed numerous times. Clusters of servers accessing such datasets can quickly overwhelm the infrastructure, resulting in computational slowdowns.

Adding to the problem is the fact that the data produced by the next-generation lab equipment is much richer than that produced by the previous generation of equipment. The issue here is that the richness of the data makes it valuable to many disciplines. As a result, these large datasets must be accessed by the many researchers in the different disciplines all using different analysis tools. All of this makes workflows highly unpredictable and results in complex data management issues.

Unfortunately, the situation is only going to become more challenging. New so-called third generation sequencers will produce even richer datasets. And quite interestingly, while the size of the output files will be smaller, due to the increased sophistication of the measurement techniques there will be many more files and file types. This will put new throughput and I/O demands on IT and storage infrastructures.
What's Needed?
To ensure computational workflows run at their optimal speed, life sciences organizations need an intelligent way to share data across servers and workstations from a single storage source without requiring migration to various pools of storage. Also needed is the capability to easily scale
storage and selectively move data between different tiers of storage, each with different price/performance characteristics. Lab data must be on the highest performance systems, accessible to researchers across disciplines for analysis and discovery. When that data is part of a workflow it needs to be stored on more cost-effective systems to retain it for additional review and retrieval. And lastly, that data should be backed up and, if necessary, archived to tape. If done manually, management of these processes can be time consuming, adding to the total cost of ownership of the storage and IT systems.

To meet these storage and data management needs in the life sciences sector, Quantum offers its StorNext data management software. StorNext software is comprised of two core components. The first is the StorNext File System, which is a shared file system that is operating system independent, enabling concurrent shared access to a single pool of data across heterogeneous operating systems. This allows organizations to not have to keep separate copies of large files or move one file to different systems for workers using computers with different operating systems. Essentially, one copy of the file can be saved and all users can access it simultaneously, accelerating the workflow. The file system supports data access by LAN and SAN clients. Of particular note is the fact that SAN clients can directly access files over high-speed Fibre Channel or iSCSI connections to enable high-performance computing workflows. Additionally, LAN-connected clients can achieve a performance boost using StorNext Distributed LAN Client (DLC), which uses a proprietary protocol that significantly reduces the overhead typically found with NFS and CIFS clients. The high line rates and efficiencies achieved when using this protocol means the solution can scale to thousands of compute or analysis nodes, outperforming many standard NAS solutions.

These features can simplify data management in a life sciences organization. For example, when used in genomic research, a bench chemist running a Windows PC, an IT manager running a Linux desktop, and a database specialist running a UNIX workstation can all access a single copy of a data file concurrently, while maintaining file integrity.

The second part of StorNext data management software is the Storage Manager. A primary capability of Storage Manager is that it supports tiered storage and transparent data movement. As data progresses through its lifecycle, it can be moved off of the highest performance systems after analysis, to more economical systems for easy access, and to tape for long-term and even more economical storage. In addition, this capability can be used to move data off of old and onto new storage systems, as well as migrating data between storage tiers. So as new storage systems offering higher performance, more capacity, and lower energy consumption are added, older systems can be gracefully retired. During these processes, StorNext Storage Manager helps ensure that high levels of access are maintained. From the user's perspective, the data movement is completely transparent: the files always remain in the same namespace, regardless of physical location on disk or tape tiers. This allows life sciences organizations to keep projects and workflows intact since, for example, third party applications do not need to be modified or workflows altered to access the moved data.

Combined, these features help life sciences organizations better manage the data explosion in their labs and improve the performance of their computational workflows. Specifically, Quantum solutions meet the need for simplified management of data through tiers to ensure high performance workflows are not interrupted and storage resources are used most efficiently.

StorNext Data Management Software
High Performance File Sharing Features
• SAN File System: Delivers high-performance
• Distributed LAN Client: Provides NAS-like scalability to thousands of server nodes
• Shared File System: Offers simultaneous file access across platforms
• Platform Independence: Supports Windows, Linux, Mac and UNIX
• Storage Vendor Agnostic: Supports all major disk and tape systems
Enterprise Data Management and Protection Features
• Replication: Enables flexible data protection and data distribution
• Nearline Deduplication: Reduces storage requirements, optimizes capacity and cost of Tier 1 storage
• Management Console: Simplifies data management complexities and reporting
• Storage Manager: Drives transparent tiered storage and archiving
• Distributed Data Mover (DDM): Improves access performance and scalability of storage tiers
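To give a flavor of the tiered-storage idea described above, the toy script below moves files untouched for 90 days from a fast tier to a cheaper one. It is not StorNext, and unlike a real hierarchical storage manager it does not preserve a single namespace or leave stubs behind; the paths and age threshold are hypothetical.

# A toy illustration of an age-based tiering policy. Not StorNext.
import shutil, time
from pathlib import Path

PRIMARY = Path("/data/primary")      # hypothetical fast tier
ARCHIVE = Path("/data/archive")      # hypothetical economical tier
AGE_LIMIT_DAYS = 90

def migrate_cold_files():
    cutoff = time.time() - AGE_LIMIT_DAYS * 86400
    for f in PRIMARY.rglob("*"):
        if f.is_file() and f.stat().st_atime < cutoff:
            target = ARCHIVE / f.relative_to(PRIMARY)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), target)  # real HSM software would keep the file visible in place
            print("migrated", f)

if __name__ == "__main__":
    migrate_cold_files()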
For more information, visit: www.quantum.com/stornext
R SYSTEMS
R Systems provides record-breaking performance in next-generation sequencing at the University of Illinois Institute for Genomic Biology (IGB)
When University of Illinois faculty member Jian Ma needed high performance computing resources with 256GB RAM/node to assemble several mammalian genomes, it was suggested he contact R Systems. "I did not know these resources were available," stated Ma. "R Systems has world class
HPC resources that can assist any researcher with next-generation sequencing data analysis projects”. R Systems, a leading provider of high performance computing resources located in the University of Illinois Research Park, provided IGB access to its newly installed Dell AMD Magny Cours cluster. Each node
contains (48) 2.2GHz cores with up to 256GB RAM/node and Quad Data rate Infiniband interconnect. We received record breaking performance and had the genome sequence completed in less than 8 hours.
“Not only did the hardware blow away our expectations, the service provided by R Systems was outstanding. Without their systems and support, this project would have taken months to finish. We went from bare metal to the same configuration we use on our internal resources in a very short period of time. They were there to walk us through the system and answer any questions we had before we started our project”. About R Systems: R Systems NA, Inc. is a privately held corporation providing technical expertise and cluster resources for high performance computing. R Systems provides services aimed at benefiting the commercial and academic research communities and improving the quality of life throughout the planet.
Contact us today at 217.954.1056 or visit us at www.rsystemsinc.com to learn more about what we do!
SGI
High Performance x86 Systems for Breakthrough Bioscience Meeting the Challenge of Sequencing Data Analysis
Many sequencing machines are now generating over one terabyte (TB) of data daily. Some large centers have multiple machines, which results in enormous requirements for fast, scalable data access in application processes. Couple that with the demands of pre- and post-processing analysis and the comput-
ing challenge is even more daunting. SGI’s answer to this challenge is the Altix®UV shared-memory server platform. The Altix UV system is the largest capacity server available today as measured in number of processor cores and memory addressable in a Single System Image (a single instance of the operating system). The Altix UV system tops out at 2048 processing cores and 16 Terabytes of memory — the maximum supported by the Intel®Xeon®7500 series processors. This large memory address space accommodates very large datasets where a variety of applications can benefit from direct access without having to wait for relatively slow disk. Applications can access data thousands of times faster using memory vs. standard hard drives. SGI Altix UV is an innovative system designed to give users fast processing, combining multiple Intel Xeon 7500 series processors. These computational powerhouses each contain up to 8 cores. The Altix UV 100 model starts with a 4 CPU/32 core building block with up to 256GB of memory. These units can be linked together to build out larger systems. For even larger requirements, the SGI Altix UV 1000 version contains up to 512 cores and 4TB of memory in a single rack. A full-scale Altix UV 1000 is comprised of 4 racks and totals 2048 cores and 16TB memory in one instance of the Operating System (OS). In addition
to its industry-leading system scalability, the Altix UV supercomputer also features accelerated message-passing (MPI) and runs standard, off-the-shelf Linux, so that x86 software can run unmodified. Results of standard industry benchmarks show that the Altix UV 1000 is unmatched at computational throughput. Aside from performance, there are a number of other important benefits. The first comes from the ease of code development and optimization that is inherent in a system that essentially operates just like a super-powerful workstation. Another is the efficiency and ease of management that comes from being able to handle entire computational workflows — including preand post-processing analysis — on one platform without moving data. The SGI Altix UV solution can drive huge gains in time-to-solution for compute and data-intensive applications like computational chemistry, genome reconstruction and systems biology. This performance and versatility means excellent TCO and ROI for ongoing work as well as exciting new possibilities for breakthrough discovery. In the dynamic world of genomics, there is one constant; the explosion of data generated in sequencing work. These fast-growing datasets demand new and more efficient approaches to quickly access, process and analyze results. SGI®, with a long history
in high performance computing, has kept pace with these trends in offering the most scalable compute and storage platforms in the industry.
Visit www.sgi.com/altixuv for more information.
CONVEY COMPUTER
Hybrid-Core Computing Convey Computer Tames Data Deluge
All across life-sciences research, large datasets and rigorous computational demands are the rule, not the exception. Richardson, TX-based Convey Computer understands these challenges. Founded in 2008 by a group of distinguished high-performance computing (HPC) executives—including IEEE Seymour Cray Award winner Steven Wallach—Convey Computer's mission is to develop innovative, practical approaches for improving data-intensive computing.

Convey has pioneered a new innovative architecture that pairs classic Intel® x86 microprocessors with a coprocessor comprised of FPGAs¹ to create the world's first hybrid-core computer, the Convey HC-1. Particular algorithms — DNA sequence alignment, for example — are optimized and translated into code that's loadable onto the FPGAs at runtime to accelerate the applications that use them. Convey calls these accelerated algorithms "personalities."

"Basically, this creates hardware that is unique to the algorithm you are running," explains Dr. George Vacek, manager of Convey's Life Sciences business unit. "There is no better way to make things faster than to wire them down into gates on a chip."

When an application is running, the extended instructions created specifically for that application are dispatched to the coprocessor. The personality needed for each program is loaded at run-time to reconfigure the coprocessor with optimized instructions for that specific application.

Though still young — the HC-1 has been shipping since June 2009 — Convey's hybrid-core computers have achieved impressive performance on a variety of bioinformatics applications. For example:
• Sequencing. The Convey implementation of the Smith-Waterman algorithm (used for aligning DNA and protein sequences) is 172x faster than the best software implementation on conventional servers and represents the fastest Smith-Waterman implementation on a single system to date.²
• Proteomics. University of California, San Diego (UCSD) researchers achieved a
roughly 100-fold faster performance of their sophisticated MS/MS database search tool program — InsPecT — that is able to accurately identify post-translational modifications (PTM).
• Computational Phylogeny Inference. University of South Carolina developed and accelerated MrBayes, a phylogenetics application able to accurately infer "evolutionary trees," a problem that was previously considered impractical on most computer systems. Performance is significantly faster even than other FPGA implementations.
• Genomics. The Virginia Bioinformatics Institute (VBI) is using Convey hybrid-core systems for its microsatellite analysis work for the 1000 Genomes Project, an international effort to sequence the genomes of approximately 2,500 people from about 20 populations around the world.

These achievements reflect a fundamental change in the scientific computing landscape. "A very important thing has happened in the last couple of years," says Dr. Harold "Skip" Garner, executive director of VBI, Professor in the Departments of Biological Sciences and Computer Science at Virginia Tech, and Professor in the Department of Basic Science at the Virginia Tech Carilion School of Medicine. "We've crossed over from data acquisition being the most expensive [part of research] followed by analysis to just the opposite. Data acquisition is getting cheaper all the time and analysis is becoming more complex and more expensive."

Scaling up standard microprocessor systems to cope with the new challenges won't work, he says. "We just don't have enough electricity, cooling, floor space, money, etc. for using standard clusters or parallel processing to handle the load."

"The real key here," says VBI's Dr. Garner, "is that Convey has created a general purpose, paradigm-shifting machine and software environment. It can be applied anywhere a standard microprocessor can. An HC-1 is easily within the budget of anyone who's buying a small cluster."

¹ Xilinx Field Programmable Gate Array.
² Metrics for a single system. According to Convey's internal benchmarking, the Smith-Waterman implementation is 172x faster than SSEARCH in FASTA on an Intel Nehalem core using the SIMD SSE2 instruction set. The company's hybrid-core computer can process 688 billion cell updates per second (GCUPS) as compared to four GCUPS for SSEARCH in FASTA.
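For readers unfamiliar with the algorithm Convey accelerates, the reference code below spells out the Smith-Waterman local-alignment recurrence in plain Python. It is a textbook version with a simple match/mismatch score and linear gap penalty, not Convey's implementation, and it runs many orders of magnitude more slowly than hardware.

# Textbook Smith-Waterman local alignment (score only), for illustration.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # dynamic-programming score matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best  # best local alignment score

print(smith_waterman("ACACACTA", "AGCACACA"))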
For more information about Convey Computer, visit www.conveycomputer.com
Feature
A Turnaround for Toxicogenomics?
New toxicogenomics tools accelerate early-stage safety assessment in drug development. By Dora Farkas
With a 90% drug attrition rate in clinical trials due primarily to the high incidence of adverse effects, there is a huge financial incentive for drug companies to develop methods that will quickly eliminate toxic compounds from the drug pipelines. "The toxicology community is very conservative, with much reliance on well established histopathological diagnoses and clinical chemistry endpoints," says Peter Lord, a consultant at DiscoTox. The safety of drug candidates is often assessed late in the development phase—after considerable resources have been invested—because of the lack of early-stage toxicity assessment tools. In spite of stringent safety testing, liver toxicity alone accounts for 40% of drug failures in clinical trials and 27% of market withdrawals. Conventional toxicology methods, such as histopathology and clinical chemistry, are too expensive to conduct on a large scale during the discovery phase.

Toxicogenomics (TGx)—the branch of genomics that analyzes the interactions among the genome, exogenous chemicals, and disease—has progressed significantly in the past decade. Open-source and commercially available TGx databases now catalogue the genomic signatures of tens of thousands of chemical entities that serve as reference compounds for investigational drugs and chemicals. Advances in computational tools and data mining software have also facilitated early-stage safety assessment and elucidation of new pathways of toxicity.
The development of TGx tools began over a decade ago but its role in safety assessment is still debated. Some toxicologists view TGx as mostly hype with few results, while others think it is only a matter of time before TGx is added to the battery of early-stage toxicity tests. Philip Hewitt, head of molecular toxicology at
Merck Serono in Germany, thinks the high cost of TGx technologies is discouraging, but companies are beginning to incorporate them into their pipelines as a result of positive results. “The major hindrance [to adopting TGx technologies] is management acceptance and costs of performing such expensive gene expression profiling studies,” he says. “The only way this can change is for more success stories where a drug was pushed forward (or stopped) and saved the company money. The costs of performing these experiments must fall and, of course, new low-gene number assays will be pushed.” According to Lord, who also has previous experience at Johnson & Johnson, GlaxoSmithKline, and AstraZeneca, the skepticism is rooted in the traditional toxicology community, which is resistant to incorporating new technologies into their protocols. “Many of the more experienced toxicologists have little molecular biology background and understandably it is a challenge for them to get a realistic sense of how to assess and integrate new molecular technologies. With the advances in computational technology in the last ten years, TGx analysis has become much faster and more easily incorporated with other data for biological context.” Examples of markers that toxicologists look for include changes
(Figure, adapted from Lord et al., Basic & Clinical Pharmacology and Toxicology, 2006: the TGx workflow runs from a learning paradigm (single-dose compounds, RNA collected at 24 hours from liver or other organs, cDNA microarrays, discriminant analysis against a toxicogenomics database, and real-time PCR confirmation of selected genes) through prediction for internal compounds to validation with repeated doses, histopathology, and clinical chemistry. Caption: Microarray data can include hundreds of millions of data points. Processing those data requires organizing and normalizing the data, statistically processing the data for significance and robustness, and visualizing it in a biological context for interpretation.)

in the expression of genes for cytochromes P450, secondary drug metabolism enzymes, and proteins involved in apoptosis and cell proliferation.

"In a former company I saw TGx used to resolve conflicting data from early rodent safety studies. Several compounds that produced no liver damage according to histopathology nevertheless showed an increase in liver enzymes indicative of liver damage. After the TGx analysis suggested no liver toxicity, we were more confident in moving the drug into the next phase of development and we set up investigations into the reasons for the liver enzyme increases," Lord recalled.

The experimental methods for TGx analysis begin with the collection of RNA 24 hours after dosing with a test compound (see Figure). Toxicology-specific microarrays from Affymetrix and GE Healthcare with only a few thousand oligonucleotides significantly simplify the analysis of data. The interpretation of the data, however, still presents a bioinformatics challenge. Microarray experiments generate hundreds of thousands of data points, and typical TGx databases integrate thousands of microarray experiments, aggregating hundreds of millions of data points overall. Data processing and biostatistical analysis software gradually reduce the data to thousands of data points for interpretation by systems biology tools such as MetaCore from GeneGo and Genedata's Expressionist System.

To build their internal TGx databases, academic and industry research groups frequently use open-source and commercial TGx databases as reference. Open-source TGx databases, which catalog tens of thousands of microarray experiments, are now available from several government-sponsored sites. The Comparative Toxicogenomics Database (CTD) from the National Institute of Environmental Health Sciences (NIEHS) integrates information from public sites such as ChemIDplus, Drug Bank, and PubMed and contains over 22,000 references as of August 2010. The National Center for Computational Toxicology (NCCT), a division of the EPA, provides the Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network as a public forum for searching and publishing toxicity data, and focuses primarily on the effect of environmental chemicals on gene expression and disease.

Commercially available toxicogenomics databases such as DrugMatrix from Iconix Biosciences, ToxExpress System from Gene Logic, and the Ingenuity Knowledge Database from Ingenuity Systems have been developed specifically to facilitate early-stage toxicity assessment in drug discovery. With genomics signatures, as well as histopathological and clinical chemistry endpoints from thousands of known compounds, these databases serve as references for the analysis of novel candidates. The databases also enable users to build their internal TGx databases, including the genomics fingerprints of their investigational compounds.

Commercial databases are complemented by predictive modeling software such as IXIS and ToxShield Suite from Iconix and Gene Logic, respectively, which provide detailed toxicity reports based on established biomarkers and a rank ordering of lead compounds. Biomarkers identified by these packages can serve as leads in later stage preclinical studies, including histopathology, clinical chemistry, and molecular pharmacology, (continued on page 32)
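As a concrete, if greatly simplified, example of the data reduction just described, the sketch below collapses a synthetic treated-versus-control expression matrix to per-gene fold changes and t-tests and keeps only the strongest responders. It stands in for the normalization, statistics, and filtering that real TGx pipelines perform and is not taken from any of the commercial packages named in this article.

# Synthetic example of the first statistical pass over microarray data.
# Illustrative only; real pipelines add normalization, QC, and multiple-testing control.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
genes = [f"gene_{i}" for i in range(10000)]
control = rng.normal(8.0, 1.0, size=(len(genes), 3))   # log2 intensities, 3 replicates
treated = rng.normal(8.0, 1.0, size=(len(genes), 3))
treated[:50] += 2.0                                     # spike in 50 "responding" genes

log2_fc = treated.mean(axis=1) - control.mean(axis=1)
t_stat, p_val = stats.ttest_ind(treated, control, axis=1)

hits = [(g, fc, p) for g, fc, p in zip(genes, log2_fc, p_val) if abs(fc) > 1 and p < 0.05]
print(f"{len(hits)} candidate toxicity-response genes out of {len(genes)}")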
Register by December 10 and Save up to $650!
Cambridge Healthtech Institute's Tenth Annual CONFERENCE & EXPO '11
April 12-14, 2011 • World Trade Center • Boston, MA
Enabling Technology. Leveraging Data. Transforming Medicine.
CONCURRENT TRACKS:
1 IT Infrastructure – Hardware
2 IT Infrastructure – Software
3 Cloud Computing NEW!
4 Bioinformatics
5 Next-Generation Sequencing Informatics NEW!
6 Systems & Predictive Medicine
7 eClinical Solutions for Clinical Trials and Clinical Operations
8 eHealth and HIT Solutions for Personalized Medicine
9 Drug Discovery Informatics NEW!

EVENT FEATURES:
Access All Nine Tracks for One Price
Network with 1,700+ Global Attendees
Hear 125+ Technology and Scientific Presentations
Attend Bio-IT World's Best Practices Awards
Connect with Attendees Using CHI's Intro-Net
Participate in the Poster Competition
Choose from 12 Pre-conference Workshops
See the Winners of the following 2011 Awards: Benjamin Franklin, Best of Show, Best Practices
View Novel Technologies and Solutions in the Expansive Exhibit Hall
And Much More!
PRESENTATION BY: Stephen Wolfram, Ph.D., CEO, Wolfram Research; Creator of Wolfram|Alpha
Making the World's Knowledge Computable
Featured Presentations by:
Platinum Sponsors:
Gold Sponsors:
Official Publication:
Bronze Sponsors:
Corporate Support Sponsor:
Organized & Managed by: Cambridge Healthtech Institute 250 First Avenue, Suite 300, Needham, MA 02494 Phone: 781-972-5400 • Fax: 781-972-5425 • Toll-free in the U.S. 888-999-6288
2011 Bio-IT World Conference and Expo Speakers (confirmed as of October 2010) Paul Aldridge, CIO, Genomic Health Jonas Almeida, Abell-Hanger Distinguished Professor, Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center V.A. Shiva Ayyadurai, Executive Director, International Center for Integrative Systems Research, Massachusetts Institute of Technology Andreas Bender, Ph.D., Lecturer for Molecular Informatics Dept of Chemistry, University of Cambridge Brian Bissett, M.B.A., M.S.E.E., Staff Analyst, Office of the Chief Information Officer, Social Security Administration Toby Bloom, Ph.D., Director of Informatics, Genome Sequencing, Broad Institute Atul Butte, MD, PhD, Assistant Professor, Stanford University School of Medicine; Director, Center for Pediatric Bioinformatics, Lucile Packard Children’s Hospital Zhaohui (John) Cai, Ph.D., Director Biomedical Informatics, Clinical Information Science, AstraZeneca Pharmaceuticals Inc Werner Ceusters, Ph.D., Director, Ontology Research Group, NYS Center of Excellence in Bioinformatics & Life Sciences Murtaza Cherawala, Senior Information Technology Architect, Enterprise Applications Group, Biogen Idec Baek Hwan (BK) Cho, Ph.D., Postdoctoral Fellow, Murphy Lab, Lane Center for Computational Biology, Carnegie Mellon University Mick Correll, Associate Director, Center for Cancer Computational Biology, Dana-Farber Cancer Institute Cindy Cullen, SSBB, CISM, CISSP, Chief Technology Officer, SAFE BioPharma Association Chris Dagdigian, Founding Partner and Director of Technology, BioTeam, Inc. Chinh Dang, Ph.D., Senior Director of Technology, Allen Institute for Brain Science Kaushal Desai, Global Informatics Lead for Real World Evidence Program, AstraZeneca Pharmaceuticals Inc Sue Dubman, Ph.D., Senior Director, Global Biomedical Informatics, Genzyme Ramesh Durvasula, Ph.D., Director, Chemistry Informatics, Bristol-Myers Squibb, and Board Member, Pistoia Alliance Sean Ekins, Ph.D., Collaborations Director, R&D Drug Discovery, Collaborative Drug Discovery Inc. Kevin Eliceiri, Ph.D., Director, Laboratory for Optical and Computational Instrumentation, College of Engineering, University of Wisconsin-Madison Norbert Fritz, Ph.D., Development Leader, Product Development Information Management, F. Hoffmann-La Roche Ltd Hugo Geerts, Ph.D., CSO, Computational Neuropharmacology, In Silico Biosciences and Adjunct Associate Professor, School of Medicine, University of Pennsylvania-Philadelphia
Bruce Gomes, Ph.D., Head of Mathematical Modeling, Systems Biology Group, Research Technology Center, Pfizer, Inc. Tim Harris, Ph.D., CTO and Director of the Advanced Technology Program, SAIC-Frederick Nurit Haspel, Ph.D., Assistant Professor, Computer Science, UMass Boston William Hogan, M.D., Associate Professor and Chief, Biomedical Informatics, University of Arkansas for Medical Sciences Hai Hu, Ph.D., Deputy Chief Scientific Officer and Senior Director, Biomedical Informatics, Windber Research Institute Curtis Huttenhower, Ph.D., Assistant Professor, Department of Biostatistics, Harvard School of Public Health C. Victor Jongeneel, Ph.D., Director, Bioinformatics & Biomedical Informatics, National Center for Supercomputing Applications and Institute for Genomic Biology, University of Illinois at Urbana-Champaign John Keilty, Vice President, Informatics, Infinity Pharmaceuticals Jochen Kumm, Ph.D., Director of Biomathematics and Head of IT at the Stanford Genome Technology Center, Stanford University Sharon Marsh, Ph.D., Assistant Professor, Pharmacy and Pharmaceutical Sciences, University of Alberta Michelle Munson, President and Co-Founder, Aspera, Inc. Ramzi Najm, Vice President, R&D Information and Technology Management, Allergan Lydia Ng, Ph.D., Director of Atlas Development, Allen Institute for Brain Science Florian Nigsch, Ph.D., Presidential Postdoctoral Fellow, Novartis Institutes for BioMedical Research
Marina Nillni, Head EDC, Dana Farber Cancer Institute
Peter Park, Ph.D., Assistant Professor, Pediatrics, Harvard Medical School
Eric Perakslis, Ph.D., Vice President, Research & Development IT, Johnson & Johnson Pharmaceuticals Research and Development
Angel Pizarro, Director of ITMAT Bioinformatics Facility, Institute for Translational Medicine and Therapeutics, University of Pennsylvania Keith Robison, Ph.D., Lead Senior Scientist, Informatics, Infinity Pharmaceuticals Inc. Tibor van Rooij, Ph.D. Candidate, Pharmacy and Pharmaceutical Sciences, University of Alberta; former Director of Bioinformatics, Génome Québec and Montreal Heart Institute Pharmacogenomics Centre Michael Rosenberg, M.D., President and CEO, Health Decisions, Inc.
Jonathan Rothberg, Ph.D., CEO, Ion Torrent Adam Ruskin, Ph.D., Director, Clinical Operations, Emergent Biosolutions
Vijay Samalam, Ph.D., Director, Information Technology & Scientific Computing, Janelia Farm Research Campus, Howard Hughes Medical Institute Richard Scheuermann, Ph.D., Professor, Director, Division of Biomedical Informatics, University of Texas Southwestern Medical Center Uma Shankavaram, Ph.D., Staff Scientist, National Cancer Institute/National Institutes of Health Phillip Sheu, Ph.D., Professor, Electrical Engineering and Computer Science and Biomedical Engineering, University of California, Irvine Ola Spjuth, Ph.D., Researcher, Department of Pharmaceutical Biosciences, Uppsala University; Project Leader, Bioclipse Susie Stephens, Ph.D., Director In Silico Immunology, Centocor Research & Development Steven Sweeney, Director, Head of Clinical Operations, Infinity Pharmaceuticals James Swetnam, Lead Scientific Programmer, Pharmacology, New York University Langone Medical Center Sandor Szalma, Ph.D., Head, Oncology Informatics, Oncology Biomarkers, Centocor R&D, Inc. Joseph Szustakowski, Ph.D., Senior Group Head, Bioinformatics, Biomarker Discovery, Novartis Bryan Takasaki, Ph.D., IS Informatics Science Director, AstraZeneca James Taylor, Ph.D., Assistant Professor, Departments of Biology and Mathematics & Computer Science, Emory University
Hanno Teeling, Ph.D., Scientist, Department of Molecular Ecology / Microbial Genomics & Bioinformatics Group, Max Planck Institute for Marine Microbiology
Gregg TeHennepe, Senior Manager, Research Liaison, Information Technology, The Jackson Laboratory
Jennifer Teta, Program Director, IT, Merck
Peter J. Tonellato, Ph.D., Visiting Professor and Senior Research Scientist, Department of Pathology, BIDMC and Center for Biomedical Informatics, Harvard Medical School James Weatherall, Ph.D., Global Lead, Biomedical Informatics, Clinical Information Management, AstraZeneca Richard Wellner, President, Object Environments Stephen Wolfram, Ph.D., CEO, Wolfram Research; Creator of Wolfram|Alpha Yate-Ching Yuan, Ph.D., Director of Bioinformatics Core Facility, Molecular Medicine, Beckman Research Institute, City of Hope
For more speaker additions, please visit: www.Bio-ITWorldExpo.com
Feature
Toxicogenomics (continued from page 29)
to confirm suspected pathological endpoints. Most computational platforms are now Web-based and allow researchers to share results with other investigators. Furthermore, with the help of pathway analysis tools such as IPA-Tox from Ingenuity, researchers can make predictions about organ-specific toxicities, particularly for the heart, kidneys, and liver.

To evaluate how well TGx methods predict long-term toxicity, in 2008 the Predictive Safety Testing Consortium evaluated the correlations between genomic fingerprints and the carcinogenicity of over 150 compounds. The study assessed the accuracies of two published hepatic gene expression signatures, by Mark Fielden and Alex Nie. The evaluations were conducted in two laboratories with different microarray platforms, and the accuracies of the genomic signatures for predicting carcinogenicity were estimated to be 55-64 percent and 63-69 percent, respectively. Interestingly, the internal validation estimates for the signatures were reported to be over 85%, and the decreases were attributed to differences in experimental methods. These results have prompted the consortium to establish standardized carcinogenicity signatures on quantitative PCR (qPCR) to aid in the validation of results across different laboratories. Overall, this study supported the application of TGx in early-stage safety assessment, but the numbers were not considered sufficient for regulatory decision making.

In another report, which evaluated TGx in acute toxicity, the correlation between the adverse effects of acetaminophen and its genomic fingerprint was compared across five different studies. While each study identified different sets of affected genes, the results were encouraging: despite variations in experimental methods, all of the laboratories reported changes in the stress response genes known to be involved in acetaminophen toxicity.
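To see why the internal validation estimates ran so much higher than the cross-laboratory accuracies, it helps to sketch the evaluation itself. The following is a minimal illustration with synthetic data (the classifier, gene counts, and batch shift are all assumptions, not the Fielden or Nie signatures): a model tuned on one laboratory's measurements loses accuracy when scored on the same compounds profiled elsewhere.

# Illustrative only: why a signature's internal cross-validation accuracy can
# exceed its accuracy on data from another laboratory or platform. All data
# here are synthetic; nothing below is the published signatures or study data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_compounds, n_genes = 150, 50

# "Lab A" training data: expression fingerprints plus carcinogenicity labels
y = rng.integers(0, 2, n_compounds)
X_lab_a = rng.normal(0, 1, (n_compounds, n_genes)) + y[:, None] * 0.6

# "Lab B" data for the same compounds on a different platform: add a
# systematic shift and extra noise to mimic inter-laboratory differences
X_lab_b = X_lab_a + rng.normal(0.5, 0.8, (n_compounds, n_genes))

clf = LogisticRegression(max_iter=1000)
internal = cross_val_score(clf, X_lab_a, y, cv=5).mean()
external = clf.fit(X_lab_a, y).score(X_lab_b, y)
print(f"internal CV accuracy: {internal:.2f}, cross-lab accuracy: {external:.2f}")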
Genedata’s Expressionist analysis platform stores, analyzes, and manages profiling data from all major commercial technology vendors.
Epigenetic Excitement

As TGx is slowly incorporated into early-stage safety assessment, epigenomics is also gaining attention from drug development companies. Epigenetic changes are modifications to the genome that do not affect the DNA sequence, such as DNA methylation, histone modification, and RNA silencing. DNA methylation in particular has been shown to be involved in the development of diseases such as cancer, multiple sclerosis, diabetes, and schizophrenia.

"The application of epigenomic profiling technologies within the field of drug safety sciences has great potential for providing novel insights into the molecular basis of a wide range of long-lasting cellular perturbations including increased susceptibility to disease and/or toxicity, memory of prior immune stimulation and/or drug exposure, and transgenerational effects," says Jonathan Moggs, head of molecular toxicology and translational sciences at the Novartis Institutes for Biomedical Research.

Of all the epigenetic changes, DNA methylation is the simplest to measure, and traditional detection methods include bisulfite DNA sequencing, methylation-specific PCR, and MALDI mass spectrometry. A recently developed high-throughput method for detecting epigenetic changes comes from Illumina, which combines the GoldenGate genotyping assay with universal bead arrays. This method has been shown to distinguish normal and cancerous lung tissue samples.
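For readers unfamiliar with how array-based methylation measurements are summarized, the conventional quantity is a "beta" value, the methylated fraction of signal at a CpG site. A minimal sketch with invented intensities (the probe names and numbers are assumptions, not Illumina data):

# Minimal sketch: converting methylated (M) and unmethylated (U) probe
# intensities from a bead-array methylation assay into beta values
# (fraction methylated). The +100 offset is the commonly used stabilizer
# for low-intensity probes; the intensities below are invented.
def beta_value(meth, unmeth, offset=100):
    return meth / (meth + unmeth + offset)

probes = {"cg_tumor_example": (9500, 600), "cg_normal_example": (700, 8800)}
for name, (m, u) in probes.items():
    print(f"{name}: beta = {beta_value(m, u):.2f}")
# A large beta difference at the same CpG site between tumor and normal
# tissue is the kind of signal used to distinguish sample classes.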
As more epigenomics methods are tested and optimized, the hope is that they can be applied to detect epigenetic changes in large populations for the diagnosis of cancers and other diseases. "Epigenomics has significant potential to impact translational sciences in the coming years. In particular, there is an opportunity to exploit and enhance emerging knowledge from epigenome mapping initiatives on the dynamic range of epigenetic marks in normal tissues versus disease states and also to investigate the extent of epigenome perturbation by xenobiotics," Moggs says. The Innovative Medicines Initiative (IMI), funded by the EU, is one of the organizations working on elucidating mechanisms of non-genotoxic carcinogenesis (www.imi-marcar.eu).

While there has been a revolution in high-throughput technologies in the last ten years, methods for interpreting large genomics datasets are lagging behind. One of the newest data management tools for whole genome analysis is Genedata's Expressionist System, which stores, analyzes, and manages profiling data from all major commercial technology vendors. The Expressionist System "supports mRNA profiling using microarrays, PCR and next-gen sequencing technologies, proteomic profiling using 2D gels, antibody arrays and mass spectrometry, metabolomic profiling based on mass spectrometry and NMR, and genomic profiling using next generation sequencing and SNP arrays," says Jens Hoefkens, head of Genedata Expressionist Business Unit. Genedata's other package, ExpressMap, has the capability to interpret data from different omics platforms. ExpressMap "enables scientists to easily combine data from different technology sources and use the integrated data for statistical analysis without going through tedious matching of biological entities across technologies," adds Hoefkens.

In September 2010, Genedata announced a collaboration with the Salk Institute to validate the new Expressionist Refiner module for whole genome analysis, including epigenetic modifications. "We can collect data much faster than we can analyze it, and a bioinformatics tool such as Genedata's Refiner Genome makes it possible for us to integrate data from multiple data sets including RNA sequences, DNA methylation and histone modifications, and visualize them fast," says Bob Schmitz, research associate at the Genomics Analysis Laboratory at the Salk Institute for Biological Studies.

Another all-inclusive omics data management package is GeneGo's MetaCore, which includes pathway analysis and data mining tools to facilitate integration of genomics, proteomics and metabonomics data. GeneGo's systems toxicology package, ToxHunter, includes a TGx database combined with pathway analysis tools for lead optimization and biomarker validation, and is suitable for the investigation of environmental contaminants and drug candidates. To improve their system toxicology packages, GeneGo launched a partnership with the FDA known as MetaTox, which allows industry and government representatives to discuss safety assessment issues, including TGx data analysis.

As drug development companies are incorporating high-throughput technologies for safety assessment, the FDA is also developing its own tools to review data from Voluntary Genomics Data Submission (VGDS) reports. Until recently, the FDA relied on ArrayTrack, a comprehensive microarray data management, analysis and interpretation system. The disadvantages of ArrayTrack are that 1) it is based on expensive database software (Oracle), 2) it was not designed to integrate data from different 'omics platforms, and 3) it is not a public repository and cannot easily incorporate data from other laboratories. ArrayTrack's successor, ebTrack, has a wider scope of analysis tools, covering genomics, proteomics, metabonomics, and in vivo/in vitro toxicological data. ebTrack is based on the open-source PostgreSQL database engine and programmed in Java. The design of ebTrack is based on the integration of three modules: 1) databases, 2) analysis tools, and 3) functional data modules that compile large amounts of data from the public domain. While this tool was developed primarily for toxicogenomics-driven environmental health research, it is also designed to handle data from the early-stage drug development process.

To validate TGx as a new method in safety assessment, regulatory agencies and large pharmaceutical companies have formed collaborations in the United States and Europe. The Predictive Safety Testing Consortium, initiated by the Critical Path Institute in Arizona, consists of 16 pharmaceutical companies including Pfizer, Novartis, and Merck. A similar organization in Europe, InnoMed PredTox, is a joint consortium between industry and the European Commission composed of 14 pharmaceutical companies, three academic institutions and two technology providers. These organizations are working toward combining data from 'omics technologies and conventional toxicology methods to facilitate decision making in preclinical safety evaluation.

"Both U.S. and EU have significant investment in TGx and other new technologies, stimulated by the need to improve drug development and get more medicines to meet medical need," says Lord from DiscoTox. "This has been recognized globally by governments, regulators and the pharmaceutical industry. In Europe there is also increasing sensitivity to (and legislation on) the use of animals in drug and, especially, chemical safety assessment, and this is driving efforts to use TGx and complementary technologies to reduce animal experimentation. The major pharmaceutical companies are multinational, providing good cross-talk between the initiatives, with Europeans working in U.S.-based collaborations and U.S. colleagues working in EU-based programs."

TGx is a rapidly evolving field, as government, industry and academic institutions are developing and validating methods for early-stage safety assessment. While TGx is not expected to replace traditional toxicology methods, the hope is that it will aid in the elimination of toxic compounds from the drug pipeline and the discovery of new pathways of toxicity. "TGx will be a standard part of the toxicology package in 10 years' time, both in terms of prioritizing compounds in early discovery, as well as validating leads in later stages of development," says Hewitt of Merck Serono. "But it will probably not be replacing existing toxicology studies, just be added as a 'weight of evidence' approach to add information on top of the gold standard histopathology." •

ToxShield, a commercially available toxicogenomics database from Gene Logic, comes with predictive modeling software.
Computational Development
Changing the Game of Collaborative Drug Discovery
Barry Bunin's Silicon Valley company fosters information release between pharmas to spur research on neglected diseases.
BY KEVIN DAVIES

While there are occasional glimmers of pre-competitive cooperation between big pharma companies, few projects can match the tangible benefits achieved lately by Barry Bunin, Sean Ekins and colleagues at Collaborative Drug Discovery (CDD). Chief executive Bunin says CDD has reached a tipping point following the recent release of chemical datasets on malaria by GlaxoSmithKline (GSK) and tuberculosis (TB) by Novartis. But without a doubt the project Bunin and Ekins are most excited about is a new open model with Pfizer, which suggests that CDD's approach can be extended to commercially relevant drug discovery, without disclosing proprietary chemical structures. CDD currently houses information on more than 3 million molecules.

"We think this is game changing," says Bunin, "an obvious experiment no one has tried before. Now others will see this and it can catalyze something beyond what we've done with Pfizer for the whole industry."

One can think of CDD as a scientific Facebook or LinkedIn, "a conversation around the data," says Bunin, highlighting the matchmaking aspect of his organization that puts biologists in touch with chemists and vice versa. "We have no wet labs, and we don't need to—there are enough wet labs out there! There's more work to do than we have hours in the day, just doing the informatics damn good!"

"A lot of academics had great ideas but couldn't do drug development," says Bunin. "We want to provide the infrastructure of a big pharma for the little guy—a full solution on the informatics and collaboration front. That was the idea of CDD."
CDD has signed up thousands of people, says Bunin, including UCSF, Columbia, Harvard, Cornell, Johns Hopkins, UCLA, and many academic screening centers. “A lot of innovation is happening at the outskirts,” says Bunin, “and we want to handle all their data. If they’re the world’s best biologist but don’t know the Lipinski Rule-of-Five, we have a computational chemist—Sean Ekins—who
understands the science and can form hypotheses and come up with new ideas. We complement people, either with technology they don't have or the people or community they don't have."

The Precedent

Earlier this year, CDD added a new dimension to its offerings when GSK decided to open up its malaria data, making some 13,500 compounds from its Tres Cantos facility in Spain (see "Genomics provides the kick inside," Bio•IT World, Nov 2003) freely available to scientists (through CDD and other sources) in the hope that other groups might approach certain candidates in ways that complement
or differ with its own approach. The Wall Street Journal likened the idea to the pharma equivalent of the Linux operating system. CDD's Sylvia Ernst blogged, "The world of drug discovery has officially changed today… WOW!"

The GSK release showed how "public data sharing invigorates the entire drug discovery ecosystem," says Bunin, prompting CDD to provide free access to researchers interested in archiving and publishing all their data for the greater good of the community. "It's a beautiful story, the intersection of commercial, economics, and humanitarian goals to combat malaria," says Bunin. "With the barriers to precompetitive data sharing coming down, it is our wish to continue to receive and publish new data sets useful both to our client base and to the scientific community at large."

Tackling TB Worldwide

Thanks to a $2 million grant from the Gates Foundation for TB research (on top of support from Lilly and the Founders Fund) Bunin has made CDD available for TB researchers worldwide. For TB, which claims some 1.8 million lives annually, "the main challenge is overcoming resistance and shortening the therapy," says Bunin. "If you have to walk 20 miles to get a drug, you're not going to do that."

The CDD community database is currently home to chemical screening data on nearly 1.5 million small molecules with associated cheminformatics properties from pharma, academia, literature and patents – ranging from malaria SAR data dating back to World War II to gene-family wide G-protein coupled receptor SAR, to the most recent results on Novartis' anti-bacterial compounds. (continued on page 36)
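The Lipinski Rule-of-Five that Bunin mentions is a simple screen that open-source toolkits handle directly. A minimal sketch using RDKit (the example molecule is arbitrary, not a compound from the CDD database):

# Minimal sketch of the Lipinski Rule-of-Five screen using the open-source
# RDKit toolkit. The SMILES string below is aspirin, chosen only as a
# familiar example.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def rule_of_five_violations(smiles):
    mol = Chem.MolFromSmiles(smiles)
    checks = [
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Lipinski.NumHDonors(mol) > 5,
        Lipinski.NumHAcceptors(mol) > 10,
    ]
    return sum(checks)

print(rule_of_five_violations("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: 0 violations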
Coming Soon... An online community in next generation sequencing and bioinformatics. www.ngsleaders.org NGS Leaders is organized by Cambridge Healthtech Associates.
For content partnership and sponsorship opportunities, contact Eric Glazer (eglazer@chacorporate.com).
CDD (continued from page 34)
The Pfizer collaboration that has Bunin and Ekins so excited dates back to late last year, when the pair asked Pfizer's Chris Waller and Eric Gifford if they had ever tried open tools or descriptors for building computational models for various molecular properties such as ADME/Tox. They said no. Ekins recalls, "This set in motion rigorous comparative studies by Pfizer's Rishi Gupta that leveraged their massive high-throughput screening datasets for things like absorption, metabolic stability, toxicity etc."

Gupta found that comparable quality computational models could be generated using very large datasets (50,000 to 200,000 compounds) whether using open-type descriptors or commercial molecular descriptors. "We expected the commercial descriptors to be so much better than anything free, but in these examples, that was not the case," says Ekins.

Why should this be important for a company like CDD primarily interested in fostering data sharing and collaboration for neglected diseases? "Open descriptors and algorithms could enable the sharing of computational models between groups such as pharmas and academics working on neglected diseases like TB, malaria etc," says Ekins. "These neglected disease researchers don't have the luxury of such ADME/Tox models that could provide insights that might help produce better clinical candidates faster. Pfizer provided a proof of concept."
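For readers curious what open-type descriptors look like in practice, the sketch below builds a toy property model from freely available descriptors. It is illustrative only, with an invented training set; it is not the Pfizer or CDD workflow, and the endpoint values are made up.

# Sketch of the general approach discussed above: an ADME-style property
# model built from open (non-commercial) molecular descriptors.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

def open_descriptors(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolWt(mol),
        Descriptors.MolLogP(mol),
        Descriptors.TPSA(mol),
        Descriptors.NumRotatableBonds(mol),
    ]

# Hypothetical training molecules and a made-up measured endpoint
train_smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC"]
train_y = [95.0, 88.0, 80.0, 90.0]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit([open_descriptors(s) for s in train_smiles], train_y)
print(model.predict([open_descriptors("CC(=O)Oc1ccccc1C(=O)O")]))

In real comparisons the training sets run to tens or hundreds of thousands of compounds, which is the scale at which Gupta found open and commercial descriptors to perform comparably.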
Bunin thinks the next big idea is to facilitate pharmas sharing their computational models (based on molecules they probably don't want to share) with outside scientists to score their compounds for ADME/Tox issues. "That may be a way off, but the proof-of-concept work now shows we do not need to use expensive commercial tools... This will make it easy to share and make such models available without any commercial software with anyone in the world."

In addition to benefiting neglected disease constituencies, CDD hopes to encourage pharmas to share metadata with other pharmas—without needing to reveal the compositional matter that allows the margins on successful drugs. This can help pharmas look at the molecular properties that can make the difference between clinical success and failure. After ensuring people cannot back-calculate the actual molecular structures in the models, Ekins says they hope to "enable coverage of the massive chemical space and ultimately enable better predictions." That could "open up priceless data" if the compound structures are protected, data that academics could never generate on this scale from the best data sets from the biggest pharmas.

"Free technologies on the web for this kind of thing are just as good as commercial software costing big companies millions of dollars in license fees. Therefore they can do the same modeling at zero cost. If this is the case here, there may be other places they can cut costs using free tools that the companies have not explored aggressively"—something shareholders should demand, Ekins says.

He adds that the study shows that pharmas can collaborate and share their models, so they don't have to spend large sums of money doing the same kinds of repetitive work. "Folks will be competing where and only where they are really innovating," he says. "This is how the open source meme plays out in a way that works in this complex IP space."

This begs the question: Could pharmas collaborate and share their models so they did not all have to spend money doing the same kinds of repetitive work separately in each company? "We think this is game changing," says Ekins. "We just did the obvious experiment no one tried before. Now others will see this and it could catalyze something beyond what we did in this collaboration with Pfizer."

Cloud Collaboration

Bunin says CDD was one of the first organizations using the Cloud for drug discovery data, more than six years ago, with a system called the CDD Vault. "We had GSK's private data for six months before they went public with it," he says.
The CDD Vault allows his team to selectively share data with anyone around the world. "You can log into multiple vaults, push experiments and collaborate from vault to vault." (Bunin says anyone with Excel can get data in and out just by hovering over the molecule—no plug-ins required.)

"We don't host it in our own backyard," he says. "We have a professional co-location facility, with armed security guards and concrete walls, similar to Facebook. The process of having a hosted system has become a commodity today. In our case, it's such a custom app, we didn't want it out in Amazon. We have our own physical server, it's backed up. If we had ten times as many molecules, we might do it differently, but it's worked for thousands of people so far."

CDD Public contains some 52 datasets (at last count), including 1,700 drugs from ex-Pfizer chemist Chris Lipinski—the first such submission. "We don't have any labs, we're just facilitating, but we decided on our own dime to buy the compounds," says Bunin. "And we found known drugs that could almost completely reverse resistance against these strains of malaria. It's been through Phase 2 trials, and that could save years off a new drug from scratch. If you're a 4-year-old with a resistant form of malaria, maybe you could take this drug that already exists!"

Meanwhile, Bryan Roth (University of North Carolina), whom Bunin calls "one of the best CNS researchers in the world," has supplied data on more than 45,000 compounds targeting G-protein-coupled receptors. "It's a dataset we're proud of, but it's one of over 52 public datasets," says Bunin, Novartis being the latest.

Bunin is anxious to get his story out in the hope of bringing other pharmas along, which he says "will allow the whole industry (and thus human health) to take a giant leap forward. Now folks can both collaborate and compete at the same time!" As for future goals, Bunin hopes to keep attracting more users and more content, and spurring further collaboration within the pharmaceutical community. •
Next-Generation Sequencing Technologies: Applications and Markets
Ken Rubenstein, PhD

Next-generation sequencing (NGS) has taken the worldwide biomedical research community by storm. Funding is relatively abundant for the moment, collaborative programs and consortia abound, and early results in many cases appear to justify all the activity. Many observers sense imminent new revelations and even paradigm shifts offering significant improvements in the understanding and treatment of disease. Discussed in this report:
• Evolution of NGS technologies and applications • Applications of NGS in basic and applied research • Issues related to the popularity, viability, and cost of NGS applications • Key market-related issues in the field • Survey results and views among current and prospective NGS users • Interview transcripts with industry experts
Reference keycode BTW when ordering this report.
InsightPharmaReports.com Insight Pharma Reports, a division of Cambridge Healthtech Institute 250 First Avenue • Suite 300 • Needham, MA 02494 • 781-972-5444 • InsightPharmaReports.com
IT / Workflow
Aspera’s fasp Track for High-Speed Data Delivery Data transfer protocol facilitating global data access and collaboration. oftware engineers Michelle Munson and Serban Simu , the co-founders of Aspera in Emeryville, California, both worked in application-level networking since leaving graduate school, and were exposed to the problem of transporting data over wide area networks (WANs) early in their careers. “We’d worked on related areas, particularly in transferring digital media content, and knew there was an unsolved problem,” says Munson. That problem boiled down to: Why doesn’t the Transmission Control Protocol (TCP) work well for moving bulk data over WANs? And what were the alternatives? “We didn’t originally set out to make a transport, as we’d assumed there’d be open-source technologies for reliable transfer,” says Munson. Indeed there are, but Aspera, a bootstrapped company with roots in Munson’s garage, argues that the performance of its commercial software outstrips the open-source alternatives (see, “We Can’t Fix the Internet”). Munson claims that the typical increases in speed experienced life sciences companies based on network capacities and bottlenecks range from 10-fold to 100-fold. When Munson and Simu investigated the alternatives for high-speed data transport, they found that none of the available transport approaches held up. TCP is a reliable transport protocol that powers FTP, HTTP, CIFS and NFS, SCP and RSYNC, among others. But given the fundamental problems of TCP over networks with high round-trip time and packet loss, which severely limits the speed of large data transfer over WANs, Munson
and Simu set out to engineer a new protocol that did not have any artificial bottlenecks under WAN conditions. The pair was able to forego any external investment or venture capital because of early customers that allowed the company to grow software around the technology. "The first two types of companies to test our technology and put us over the edge were affiliated with the Department of Defense [DOD] and media/entertainment." (The DoD connection was somewhat serendipitous, and came from Munson knowing a then small contractor in DOD intelligence that was having difficulty transporting data over networks.)

About two years ago, the genomics/life sciences community discovered Aspera, becoming the firm's third key vertical market—particularly in the field of next-generation sequencing (NGS). Each vertical has its own issues, but Munson says the problems confronting the intelligence community—the collection and dissemination of unstructured data such as video surveillance and high-resolution imagery—are not fundamentally different from life sciences: both groups need to share and exchange large amounts of data over global Internet networks in rapid time.

Munson and Simu wrote the first version of the fasp protocol and remain intimately involved in technical product development although, with a twinge of regret, Munson says her coding days are behind her. It's largely the impetus of key communities such as life sciences and digital media (not to mention others that have ever increasing quantities of data to share) that is pushing research and development around the transport, she says.

Michelle Munson, co-founder of Aspera (photo: Leah Fasten)

Following Protocol

Aspera's fasp is a communications protocol that aims to satisfy the burning need posed by two fundamental problems surrounding the movement of file-based data from storage A to B across a network—reliability and bandwidth. "There's a fundamental efficiency problem, and then there's a congestion or bandwidth control problem, because the user doesn't know the bandwidth or the other traffic, making it unsafe to blast traffic over the network," says Munson.

"If you use a TCP protocol, you'd experience a severe bottleneck due to congestion control as it affects speed due to round trip delay and packet loss," says Munson. "That's the baseline."

Munson says there have been many attempts to build various types of simple "data blasters"—reliable transport alternatives to TCP over IP or the User Datagram Protocol (UDP)—in which the data traveling over an unreliable IP channel has reliability implemented in a protocol above IP. But there's a big drawback: "From the controls perspective—i.e. how
such blasters re-send dropped packets over the IP network—it is extremely inefficient," says Munson. "They generate heavy duplicate transmission of the data and tend to overrun the network bandwidth." "This was shown in the literature and is what we confirmed when testing many open-source solutions, and this ultimately led us to create fasp."

Aspera elected to implement the fasp software protocol specifically as an application protocol rather than in the network stack as a driver. "We chose to do that to make it available so end users on their computers could use it without having admin rights to install and run," says Munson. "That was important: It allowed our technology to be used in a very simple way and users to start experimenting with this."

Aspera has been deployed by the European Bioinformatics Institute (EBI), the Broad Institute, and other companies and academic groups, including the University of Washington, University of Maryland, and Memorial Sloan-Kettering in New York. Among the firm's highest-profile successes is work for the National Center for Biotechnology Information (NCBI) at the NIH as part of the 1000 Genomes Project. "The one that exposed our software to the community (and caused us to come to Bio-IT World Expo in 2009) was NCBI," says Munson.

The 1000 Genomes Project requires transferring and exchanging data from institute to institute, across continents. Users accessing 1000 Genomes data visit one of the four public websites that disseminate more than 7 Terabases of genomic data. They can browse and download data with FTP and/or the Aspera protocol—using the Aspera Connect free web browser plug-in. "In those cases, the dimensions of improvement over FTP go up with more bandwidth and more difficult networks/distance," says Munson. Whereas the speed of FTP is theoretically fixed based on round-trip time and packet loss, fasp fills the available bandwidth. "The difference is the bottleneck speed of FTP and bandwidth capacity," says Munson. "fasp does not compress the data and achieves its speed-up in the transport efficiency." For example, from the US to Australia,
the FTP bottleneck speed is 1 Megabit/second or less. With Aspera on a 100 Mbps link with all bandwidth available, however, it's virtually 100 Megabits/sec.

Munson says BGI Shenzhen, the high-capacity Chinese genome research center, will soon become another hub on this data transfer pathway. BGI's location makes an optimal data transfer solution essential, because there is typically a 200-400 millisecond round-trip time and high packet loss into China. "As you go into China, especially mainland China, the wide area networking problem is unbelievable," says Munson. For BGI, "on those types of networks, large data transfer is not only inefficient, it's often impossible," says Munson, because of the distance and packet loss. "There's a massive opportunity and capability to process the data on these locations, but the problem in moving data to and from these locations becomes paramount. For an economy of scale, shipping disks won't work."

Another niche that Munson expects to fill is where two research or medical institutes are sharing data between each other. Formerly they might have used Unix SCP or R-sync (open source). But Aspera can be used like a Unix utility, while transferring using the fasp protocol, which allows easy automation of data transfer between institutes.

Aspera will allow users to transfer file data over the WAN and write directly to S3.

User Needs

To take advantage of fasp, users require no special networking, hardware, or fiber channels. "We run over standard IP networks," says Munson. "The user experiences the fasp software as a file transfer application or an embedded transport in someone else's application."

For the most part, the fasp protocol doesn't vary from vertical to vertical, life sciences to digital media. But there are some special software adaptations in life sciences, says Munson. "It is the same transport core, but what is emphasized and refined by LS users has to do with upper end speeds. We have an adaptive rate control that adjusts the rate of transfer to match the available network bandwidth and disk throughput."

That is especially important on 1-10 Gbps networks, where the network capacity often outstrips the file system or disk I/O speed. This is an important issue in life sciences. Network bandwidth is quite large, and there is access to Internet2, so the transfer bottleneck using fasp becomes shared access to the disk system as data goes in and out. Aspera's rate control has both disk-based and network-based adaptation components. "We released the disk-space component during the time we've been working with LS community," says Munson.

Recently, Aspera was used in its first
single transfer session of more than 10 Terabytes of data. "Theoretically fasp can transfer data of any size, but in practice, single transfer sessions between institutes have gone up from hundreds of gigabytes in our first year, to now as high as 12 Terabytes at a time in a single transfer," she says. "We made some architectural changes in the way our software was implemented to accommodate that. We have no limits today in our session sizes."

Aspera is also working closely with the data storage community to establish
benchmarks for the movement of ultra-large file sets over 10 Gbps networks and beyond—including firms such as EMC, HP, NetApp, Isilon, BlueArc, and Panasas.

Cloud Traffic

Aspera's early success in facilitating the transfer and movement of huge datasets raises the question of whether it can assist users in leveraging the Cloud. "Absolutely, and transfer of data to and from the Cloud is one of the most pressing challenges," says Munson. Aspera's On
Demand product enables data transfer to Amazon Web Services (AWS) at speeds (up to a current practical limit) of several hundred Mbps. But Munson says, “We are coming up against technology limitations the way the Cloud is currently deployed, in terms of directly reading and writing large file data to persistent storage.” That said, Amazon is moving very rapidly, she says, and improvements are on the way. “In the near future, users will be able to transfer file data over the WAN and write directly into S3 within the Aspera application, and at high speed.” •
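The adaptive rate control described earlier can be pictured as a simple feedback loop: cap the sending rate at the slower of the estimated network bandwidth and the receiver's disk throughput, probe upward when there is headroom, and back off when there is not. The sketch below is a conceptual illustration under those assumptions, not Aspera's fasp algorithm; the rates and step sizes are invented.

# Conceptual sketch only -- not Aspera's fasp implementation -- of adaptive
# rate control: the send rate is periodically capped at the slower of the
# estimated available network bandwidth and the receiver's disk throughput,
# so the transfer neither floods the WAN nor outruns storage I/O.
def next_send_rate(current_rate, est_network_bw, est_disk_bw,
                   increase_step=50.0, backoff=0.85):
    """All rates in Mbit/s; the estimates would come from receiver feedback."""
    ceiling = min(est_network_bw, est_disk_bw)
    if current_rate < ceiling:
        return min(current_rate + increase_step, ceiling)  # probe upward
    return current_rate * backoff                           # back off gently

rate = 100.0
for network_bw, disk_bw in [(950, 400), (950, 400), (600, 400), (300, 400)]:
    rate = next_send_rate(rate, network_bw, disk_bw)
    print(f"target rate: {rate:.0f} Mbit/s (net {network_bw}, disk {disk_bw})")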
'We Can't Fix the Internet'

"Technology is a balancing act between access and cost," Bhavik Vyas, Aspera's director of technology sales, told attendees at the second Bio-IT World Europe conference in Germany in October. From next-gen sequencing to medical imaging and media, managing data is about size, backups, and reliability. The key problems are: 1) Collaboration requires the Internet; 2) Data transfer is slow over the Internet via public or private WANs; and 3) Fast networks typically have not only slow transport but also very slow storage. "We can't fix the Internet—no-one can—but we can tackle (2) and (3)," said Vyas.

Vyas blames TCP for the slow data transfer over the Internet, and for the lost productivity and inefficiency that follow. TCP has well-known bottlenecks: high round-trip times (RTT) and packet loss rates, especially on high-bandwidth WANs. "The further you are from your data, the slower TCP will go," he said. And while there are open-source protocols to help avoid congestion, they typically have "high inefficiency and catastrophic effects on packet loss," said Vyas. For example, the round-trip time from London to New York is about 60 milliseconds. As TCP performance wanes with distance, the rate can be calculated. Combining distance with loss leads to terrible performance.
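That calculation is usually done with the Mathis et al. approximation for loss-limited TCP throughput: the rate is roughly MSS/RTT multiplied by C/sqrt(loss), with C about 1.22 for Reno-style TCP. A quick sketch, using the 60 ms London to New York path quoted above; the loss rates below are assumptions for illustration.

# Back-of-the-envelope TCP throughput using the widely cited Mathis
# approximation: rate ~ (MSS / RTT) * (C / sqrt(loss)), C ~ 1.22.
from math import sqrt

def tcp_throughput_mbps(mss_bytes=1460, rtt_s=0.060, loss=0.01, c=1.22):
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss)) / 1e6

print(f"~{tcp_throughput_mbps():.1f} Mbit/s on a 60 ms path with 1% loss")
print(f"~{tcp_throughput_mbps(rtt_s=0.300, loss=0.05):.1f} Mbit/s at 300 ms, 5% loss")

Even a 10 Gbps link cannot help a single TCP stream once RTT and loss pin the achievable rate to a few megabits per second.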
Performance Options

One option to improve performance is to explore the use of commercial or academic high-speed TCP variants, such as CUBIC, BIC, Reno, FAST TCP, and H-TCP. These can reduce congestion and increase throughput, but on heavily congested WANs, modifying TCP becomes difficult because it has to be deployed across all workstations; packet loss can still ensue, and the accelerated speed remains impaired on lossy networks.

Another problem is that while TCP ensures no data loss, everything is sent in sequence. This results in a stop or slowdown for every lost packet. Aspera argues that sequential delivery isn't necessary, and network capacity should be utilized.
A second option is the use of UDP-based transport applications. Many open-source and commercial technologies use UDP for moving data reliably and quickly (in a connection-less way), checking reliability and re-sending data if necessary. But as Vyas pointed out, “If the cost of the improvement is you send 10x more data than you receive (in duplicate retransmission and bandwidth overdrive), then the cost benefit isn’t really realized.” In other words, an architecture that facilitates data blasting creates its own problems, limiting the return on investment on multi-Gigabit Ethernet (GbE) and 10-GbE networks. Developed nine years ago by Yunhong Gu, UDT is a UDP Data Transfer application protocol that is faster than TCP, but Vyas argues there are performance issues in some typical WANs, such that the network can appear ‘full’ but with un-needed data. A company called VeryCloud provides commercial service for UDT. A new reliable transport option is Aspera’s fasp protocol, enabling access to and management of data. Vyas calls it purpose-built, reliable, with a theoretically infinite transfer speed and zero receiving cost.
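The cost-benefit point is easy to make concrete: goodput is the share of the link that carries data the receiver actually keeps. A quick sketch; the link speed is an assumption for illustration, and the 10x figure mirrors the example quoted above.

# Simple arithmetic behind the point above: if a naive UDP "blaster" must
# send far more data than the receiver keeps (duplicate retransmissions,
# bandwidth overdrive), effective goodput collapses even on a fast link.
def goodput_mbps(link_mbps, bytes_sent_per_byte_delivered):
    return link_mbps / bytes_sent_per_byte_delivered

for overhead in (1.05, 2.0, 10.0):
    print(f"1000 Mbit/s link, {overhead:g}x overhead -> "
          f"{goodput_mbps(1000, overhead):.0f} Mbit/s useful data")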
Comparison Notes

Vyas presented metrics comparing the speed and bandwidth cost of fasp to TCP (Reno TCP and FAST TCP) and UDT. Aspera's fasp achieves a throughput efficiency of about 90-93 percent across round-trip times anywhere from 20-1000 ms and packet loss rates of 5-10%. UDT, by contrast, has low throughput and efficiency (less than 50%) over networks with high round-trip time and packet loss.

Another issue is the "last foot" of the data transport pipeline—storage at the end user. Vyas cited benchmark studies with storage vendor EMC and its Celerra product in which they obtained 3 Gbit/second large data transfer rates over worst-case global WANs with round-trip times of 300 milliseconds and packet loss rates of 5%. "You can get these speeds if you want to," said Vyas. K.D.
[GUEST COMMENTARY ]
Don't Neglect Your Processes
Process excellence can lead to gains in drug discovery.
BY JOCHEN KOENIG
The business model of developing and marketing innovative drugs is under siege. Despite increasing investments in R&D, pharma companies seem unable to invent sufficient numbers of new products that would a) address currently unmet medical needs or provide significant advantages over the current standard of care, and b) generate the profits—lost due to generics competition and pricing pressure from payers—required to finance further innovation. Perhaps it is time to consider the business implications of how science is performed, and similarly apply scientific reasoning to improve business processes in R&D.

Approaches to reduce the investment needed for bringing a new drug to market can be characterized by the degree to which they address the success rate and the efficiency of drug development. Progress toward increasing overall success rates, e.g. by reducing late-stage attrition due to safety or efficacy, is a worthwhile occupation. Public-private partnerships like the Critical Path Institute (C-Path, www.c-path.org) and the Innovative Medicines Initiative (IMI, http://imi.europa.eu) underscore this notion.

However, among R&D staff there seems to be little individual benefit in working to increase efficiency by optimizing the drug development process. To some degree, this is because scientists in the biomedical and chemistry fields traditionally value (or are valued by) the hours they put in rather than efficiency. Process-oriented, continuous improvement techniques like business process management (BPM), lean or Six Sigma are often seen as a threat to innovation rather than an enabler. Indeed, according to a recent model (Paul et al. Nat Rev Drug Discov. 2010 Mar) delineating the influences on drug development cost, a mere 5% increase in efficiency (immediate cost and cycle times) would translate to savings of more than $150 million per
newly approved drug. This amount corresponds to the out-of-pocket cost of funding eight projects from target through preclinical testing.

Naturally, certain processes are more amenable to standardization and optimization than others; the gains include reduced administrative effort, greater transparency and quality of agreed services, and shared best practices. In contrast, bad processes can be a major drain on motivation and are likely to spur evasion strategies, cannibalizing the intended gains.

Process Fine Tuning

One example of improving research administration involves an e-procurement solution IDS Scheer realized at the German Cancer Research Center (DKFZ) in Heidelberg. The project analyzed the procurement process for consumables, standard services and capital equipment needed by research groups or individual scientists. Following analysis, the process was simplified and largely automated as a software system. Instead of manually transferring data from multiple catalogues into paper order forms, researchers now can browse and search the offers of regular suppliers within a single web-based application. Comparing offers—including potential special conditions for delivery or discounts—is now straightforward, and the error rates of orders and of booking costs against the correct budget have been reduced.

As a result, research groups now have the flexibility to purchase goods according to their need and budget, and at the same time benefit from the negotiating power and administrative backbone of the DKFZ. Finally, researchers spend significantly less time on the procurement process—time they can spend on research. Additional R&D processes with potential to benefit from standardization involve technology and logistics services or the clinical supply chain.
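The "eight projects" figure follows from simple division once a per-project cost is assumed. The sketch below uses an assumed out-of-pocket cost per early project of roughly the order reported by Paul et al.; the exact number is an assumption for illustration, not a value taken from the article.

# Rough arithmetic behind the figures cited above.
savings_per_approved_drug = 150e6          # figure quoted in the text, USD
assumed_cost_target_to_preclinical = 19e6  # assumed out-of-pocket cost per project
projects_funded = savings_per_approved_drug / assumed_cost_target_to_preclinical
print(f"~{projects_funded:.0f} early projects funded per drug's worth of savings")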
Continuous improvement (CI) methodologies fundamentally provide guidelines and tools for applying the scientific method—collecting and analyzing information on a system to facilitate predictions about its future state and response to changes—to the way recurring tasks are performed. These recurring tasks can be as diverse as running certain assays or entering data into a computer system. Because nearly all aspects of drug R&D interact with or depend on IT systems, the proximity of BPM to IT is a definite plus. The plethora of systems in R&D environments often leads to highly fragmented or inaccessible information, and thus creates a roadblock to efficient decision-making that could be avoided if process-centric strategies like service-oriented architecture (SOA) were followed.

BPM provides a systematic approach to collecting and describing processes, their timelines and associated responsibilities, IT systems, inputs and outputs, and interfaces. This ensures that the process description is comprehensive and consistent, for example with respect to nomenclature for roles and systems, and thus is meaningful in a context beyond the isolated project. SOPs, regulatory documentation, and training materials can be generated automatically from such process descriptions.

Detailed process understanding is a prerequisite for any improvement. Well-run CI projects will avoid secret-society jargon and instead directly involve process owners and process participants in describing and modelling the business process in question, as well as in evaluating the process and defining improvements. This transparent and collaborative approach values and leverages the team's experience and domain knowledge, and lays the foundation for sustainable increases in efficiency and quality.

New drugs will not be invented on process excellence alone. However, process excellence is a strategy to free up resources to support the best science and sharpen the profile of R&D organizations compared to outsourcing competition. •

Jochen Koenig is business development manager, Pharmaceuticals, at IDS Scheer. Email: jochen.koenig@ids-scheer.com
[GUEST COMMENTARY]
Cloud Computing Provides Booster Shot for Health IT
Not all health-IT data is fit for the cloud, but a lot is.
BY ERIC OLDEN
As health sciences companies look for ways to improve collaboration and reduce new product development costs, cloud computing and Software-as-a-Service (SaaS) could be just what the doctor ordered. According to a June 2010 report by IDC, the SaaS market is poised to jump from $13.1 billion in global revenue last year to $40.5 billion by 2014. "The SaaS model has become mainstream, and is quickly coming to dominate the planning—from R&D, to sales quotas, to partnering, channels and distribution—of all software and services vendors," said Robert Mahowald, vice president, SaaS and cloud services research at IDC.

SaaS and the cloud are transforming collaboration. Simply put, the cloud enables different teams, departments, and organizations to come together quickly and cheaply to solve a problem. Speed-to-market means money in the health sciences business. Therefore, the ability to deploy new technology in two months—rather than two years—provides a major advantage over competitors who are stuck building and maintaining in-house infrastructures.

Many organizations are planning to adopt cloud applications and are laying the infrastructures to manage them. The advantages fall into two major categories:

• Collaboration—Occurring across the health sciences industry as companies look to lower costs and speed time-to-market. The cloud is a perfect way to access "ready-to-go" applications with the ability to get business processes up and running fast.

• IT as Competitive Advantage—Most mid-sized enterprises cannot afford to leverage the old ways of on-premises software. Cloud applications are faster to deploy, increasing agility, so companies
can leverage them for a competitive advantage.

The Cloud and Health Sciences: A Match?

While adoption of cloud applications has been notable in health sciences sales, marketing, clinical trials, and supply chain, some applications are not ready to move out of the datacenter. A good example is applications that must be FDA certified, because few of the major cloud vendors actually understand how the certification process works and even fewer have been certified. In this instance, it's better to first survey the applications and data and then rank them in order of sensitivity and compliance needs. Companies should start with the applications that are lowest on the risk/non-regulated scale—this puts a foundation in place that allows management access to those applications and data. Once an organization gains comfort with the cloud model—and more importantly has the infrastructure in place to secure and audit access—it can move down the scale.

Since 70% of most IT budgets are spent on infrastructure maintenance, enterprises can realize very real benefits from outsourcing the "plumbing." Of course, there will be an inflection point where the risk/reward analysis just won't make sense, or regulations will clearly prohibit putting something into the cloud, but companies will be surprised at the vast amount of information that is cloud-ready.

One thing most companies find particularly appealing about the cloud is that the service provider becomes highly accountable for service levels. If a service provider fails to meet expectations, the service can be easily discontinued. Not only does this hold cloud providers to a much higher standard, it also gets companies thinking about suppliers more as partners than vendors.
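The triage Olden describes can be as simple as scoring each application on sensitivity and regulatory exposure and migrating from the bottom of the list upward. A minimal sketch with invented applications and scores:

# Illustrative sketch of the cloud-migration triage described above.
# The applications, scores, and weighting are invented examples.
apps = [
    {"name": "conference room booking", "sensitivity": 1, "regulated": 0},
    {"name": "CRM / sales reporting",    "sensitivity": 2, "regulated": 0},
    {"name": "clinical trial EDC",       "sensitivity": 4, "regulated": 1},
    {"name": "pharmacovigilance system", "sensitivity": 5, "regulated": 1},
]

def risk_score(app):
    # Regulated systems (e.g. those needing FDA-relevant validation) are
    # weighted heavily so they sort to the end of the migration queue.
    return app["sensitivity"] + 10 * app["regulated"]

for app in sorted(apps, key=risk_score):
    print(f"{risk_score(app):>2}  {app['name']}")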
While expectations will differ based on functions and industries, health sciences companies should evaluate whether a provider understands the regulatory and competitive aspects of the business and the extreme importance of compliance and security. In terms of functionality, a cloud vendor should understand how collaboration affects the core of your business, which should be reflected in the way the technology is designed and in the applications it can integrate with. There's always a distinct advantage when you can work with a vendor that brings both industry and domain experience to the table.

Security in the Cloud?

While it's fairly easy to outsource infrastructure, doing so does not mean the company is no longer responsible for managing security and risks for the confidential data contained in its IT systems. Good security practices are based on risk mitigation, which starts with understanding the scope of your exposure. This requires, above all, visibility into who is accessing confidential data and how. Unfortunately, this is an area where the cloud is lacking, since users can access data directly over the Internet outside the purview of IT controls. This is an unacceptable situation for sensitive data. The answer is either not to put that data into the cloud, or to implement controls that enforce policy and governance and provide visibility. Technology exists that provides controls for the cloud and the consistent visibility needed for audit and compliance.

In the end, the benefits of SaaS and the cloud have to be weighed against the risks. Not all data will be suited for the cloud, but a surprisingly large amount of data can be, and eventually will be, on the cloud because it is either more usable or more efficient there. •

Eric Olden is CEO of Symplified, which enables companies to extend and enforce identity and access management policies on Cloud applications. He can be reached at eolden@Symplified.com.
New Products
The 2011 Bio-IT World Conference & Expo will feature a New Product Pavilion on the Expo floor—a central place to reach over 1,800 conference attendees. See p 30-31.
CRO-Enhanced Clinical Suite

Version 6.1 of Trianz Solutions' Acceliant eClinical Suite is now available. This release benefits from Trianz's relationship with REGISTRAT-MAPI, a global late-phase CRO, as it incorporates a number of functional enhancements and a new UI architecture identified by REGISTRAT-MAPI. The Acceliant eClinical Suite is a comprehensive, web-based, end-to-end platform with the ability to integrate electronic data capture, and data and document management with traditional paper-based processes for clinical trials.
Product: Acceliant eClinical Suite v6.1
Company: Trianz Solutions
For more information: http://acceliant.trianz.com
Sequence Analysis Software

Strand Scientific Intelligence has released Avadis NGS, a software application for next-generation sequencing (NGS) analysis. Avadis NGS is an application focused on ChIP-SEQ, RNA-SEQ, and genetic variation analysis that enables its users to assimilate large amounts of NGS data and ascertain deep biological insights using powerful statistics and interactive data visualizations in a state-of-the-art genome browser, and downstream analyses such as GO, pathways, and GSEA.
Product: Avadis NGS
Company: Strand Scientific Intelligence
For more information: www.strandsi.com

Symyx System Update

Accelrys has announced the release of version 3.3 of the Isentris data access, analysis, and decision support system (previously Symyx Isentris). Isentris 3.3 drives faster, better-informed scientific decisions by enabling scientists to display, manipulate, and compare spectral, chromatographic, and XY graphical data.
Product: Isentris v 3.3
Company: Accelrys
For more information: www.symyx.com/isentris
GPU Applications

AccelerEyes has released version 1.5.1 of Jacket. This release of Jacket will allow a whole new collection of applications to benefit from GPU computing and NVIDIA's new FERMI architecture. AccelerEyes has advanced Jacket's compiler technology and made many additions to the platform's function library. New features include IMFILTER, 2-D filtering of multidimensional images; MEDFILT2, 2-D median filtering; INTERSECT, find the set intersection of two vectors; SORTKEYS, sort a column matrix based on input keys; and PLANEROT, Givens plane rotation.
Product: Jacket v. 1.5.1
Company: AccelerEyes
For more information: www.accelereyes.com
Advertiser Index

Bio-IT World Conference and Expo . . . 30-31 . . . Bio-ITWorldExpo.com
Bio-IT World Best Practices Awards . . . 47 . . . Bio-itworld.com/bestpractices
BioPartnering China . . . 19 . . . Techvision.com/bpc
Cmed . . . 3 . . . Cmedtechnology.com/t5-preview
Convey Computer . . . 26 . . . Conveycomputer.com
Educational Opportunities . . . 44-45 . . . Bio-ITWorld.com
Insight Pharma Reports . . . 37 . . . InsightPharmaReports.com
Invest in Ontario . . . Cover 2 . . . Investinontario.com/research
LabAutomation . . . 27 . . . SLAS.org/LA11
NGS Leaders . . . 35 . . . NGSLeaders.org
Quantum . . . 22-23 . . . Quantum.com/stornext
R Systems . . . 24 . . . Rsystemsinc.com
SGI . . . 25 . . . Sgi.com/altixuv
Symyx . . . Cover 4 . . . Blog.symyx.com
Educational Opportunities Keep abreast of the variety of educational events in the life science industry that will help you with your business and professional needs. To preview a more in-depth listing of educational offerings, visit the “Events” section of bio-itworld.com. To list an educational event, email marketing_chmg@chimediagroup.com.
Featured Events
Bio-IT World Conference & Expo '11: Enabling Technology. Leveraging Data. Transforming Medicine. April 12-14, 2011 | World Trade Center, Boston, MA
X-Gen Congress and Exposition, March 14-18, 2011 | San Diego, CA
Biopharma Licensing Congress, April 5-6, 2011 | Philadelphia, PA
Strategic Alliance Management Congress, April 6-7, 2011 | Philadelphia, PA
Competitive Intelligence Summit, April 6-7, 2011 | Philadelphia, PA
PEGS: The Essential Protein Engineering Summit, May 9-13, 2011 | Boston, MA

CHI Events
For more information on these conferences and other CHI events, visit healthtech.com.
Biobanking: Maximizing Your Investment, December 6-8, 2010 | Providence, RI
The Compound Management Forum, December 7-8, 2010 | Providence, RI
Clinical Project Management, December 8-9, 2010 | Philadelphia, PA
Cambridge Healthtech Institute's Tenth Annual, January 10-14, 2011 | Coronado, CA
Summit for Clinical Trials Operations Executives (SCOPE), February 7-9, 2011 | Coral Gables, FL
Electronic Data in Clinical Trials: Optimize Clinical Trials through Improved Data Collection and Utilization, February 7-8, 2011 | Coral Gables, FL
February 23-25, 2011 | San Francisco, CA
Barnett Educational Services Visit BarnettInternational.com for detailed information on Barnett’s live seminars, interactive web seminars, on-site training programs, customized eLearning development services and publications.
Barnett Live Seminars
Clinical Trials for Pharmaceuticals: Design and Development, December 2-3, 2010 | Boston, MA
Conducting Clinical Trials in Emerging Regions, December 2-3, 2010 | Boston, MA
Data Management in the Electronic Data Capture Arena, December 2-3, 2010 | San Diego, CA
Drug Safety and Pharmacovigilance, December 2-3, 2010 | Boston, MA
Negotiation Skills for Clinical Research Professionals, December 2-3, 2010 | San Diego, CA
Parexel's Bio/Pharma R&D Statistical Sourcebook 2010/2011
Just Released! Now Shipping. Order Today!
THE LEADING RESOURCE for statistics, trends, and proprietary market intelligence and analysis on the biopharmaceutical industry! To order, please visit BarnettInternational.com/Publications
Managing and Conducting Global Clinical Trials, December 6-7, 2010 | Philadelphia, PA
Mastering Cost Management for Global Clinical Trials, December 6-7, 2010 | San Diego, CA
Regulatory Intelligence, December 8, 2010 | Philadelphia, PA
Drug Development & FDA Regulations, December 9-10, 2010 | Philadelphia, PA
Patient Recruitment and Retention, December 13-14, 2010 | San Francisco, CA
Working with CROs, December 13-14, 2010 | Philadelphia, PA
Barnett Web Seminars
Subject Recruitment: Proactive Project Plans & Issues Management, December 14, 2010
How to Prepare and Submit a Bullet Proof 510(k) Submission, December 14, 2010
Use of Notes to File in Clinical Trial Essential Documentation, December 14, 2010
Critical Decision Points in Design & Conduct of Patient Registries, December 15, 2010
Monitoring Plan Development, December 16, 2010
Introduction to the FDA, December 16, 2010
Webcasts and White Papers
Visit bio-itworld.com to browse our extensive list of complimentary life science white papers, podcasts, and webcasts. To learn more about developing a multimedia lead-generating solution, contact marketing_chmg@chimediagroup.com.
Webcast
Cloud Computing Transformation in Life Sciences/Pharmaceutical
Sponsored by: HP
Cloud computing is Internet-based computing that delivers shared resources, software, and information to users on demand. It can help drive infrastructure centralization and efficiency, with opportunity for major productivity gains in solving life science and pharmaceutical challenges. In this 60-minute web-based seminar, you will learn about:
• Leveraging the "cloud" to gain competitive advantage in drug discovery, personalized medicine, translational medicine, personal genomics, and basic life science discovery, including clinical trials
• Innovative advancements coming from full-service "cloud providers" like Hewlett Packard and partners
• Peer survey results showing how cloud computing will create a major paradigm shift for bioinformatics
Visit www.bio-itworld.com to download.
Whitepapers
Improving Research Productivity with Scale-out Storage for Science
Sponsored by Isilon
The explosive growth in data from next-generation sequencers, combined with the increasingly multi-disciplinary nature of life sciences research, the wide-scale adoption of server virtualization, and the need for faster research decision making, is placing new demands on life sciences IT infrastructures. Download this paper to get:
• An overview of the IT challenges facing life sciences organizations today
• A detailed examination of TCO issues associated with using current IT infrastructures
• Insight into the research productivity and cost-saving benefits gained from a storage refresh
Visit www.bio-itworld.com to download.
Hybrid-Core Computing: Taming Data Deluge
Sponsored by Convey Computer
A new, innovative architecture pairs classic Intel® x86 microprocessors with a coprocessor built from FPGAs, creating the world's first hybrid-core computer, the Convey HC-1. Particular algorithms, like DNA sequence alignment, are optimized and translated into code that can be loaded onto the FPGAs. This advancement enables researchers to:
• Work extraordinarily faster while tackling new research problems
• Dramatically reduce costs (cooling, power, space, etc.) by using fewer computers
Also, read about a Smith-Waterman implementation that runs 172x faster than the best software implementation on conventional servers, the fastest performance on a single system to date.
Visit www.bio-itworld.com to download.
Web Symposia Series
The Web Symposia Series covers a broad array of topics within the life sciences and drug development enterprise.
• Register for upcoming web symposia
• Listen to recorded web events
• Purchase a DVD or electronic version
• Sponsor a symposium on a topic of your choice
For details on the Web Symposia Series, visit www.bio-itworldsymposia.com or email marketing_CHMG@chimediagroup.com.
Conference CDs & DVDs
Cambridge Healthtech Institute offers a variety of CDs and DVDs covering specific topics presented at CHI conference events. For a detailed listing and to order, visit healthtech.com/conferences/compactdiscs.aspx.
Bio-IT World: Ninth annual conference focusing on enabling technology, leveraging data, and transforming medicine.
Alliance Management Congress: Focused on how critical and essential alliances are to an organization's business.
Biopharmaceutical Change Control: A three-track conference examining manufacturing change, its potential consequences, and industry experience in handling and harnessing it to advantage.
The Russell Transcript
Ready for the GPGPU Revolution?
JOHN RUSSELL
Strolling through the poster section at September's GPU Technology Conference,* one was struck by two things: the sheer diversity of applications being sped up by graphics processors (GPUs) and the large number of life sciences applications represented.

Graphics processors used to be just that: specialized devices intended to speed and enhance computer graphics. The first GPUs were hard to program and mostly used to accelerate VGA (eons ago, I know) computer displays. A mess of companies, both chip and board makers, jostled for sway in the early graphics acceleration market. Nvidia and ATI emerged as the big guns, and ATI was acquired by chip-maker AMD in 2006; just this summer AMD reported it would retire the ATI brand. Nvidia is still a juggernaut in the space. This year's GPU Technology Conference drew roughly 2,000 attendees from 50 countries and was distinctly dominated by programmers and scientists, with relatively few marketers.

Several things have happened to propel GPUs forward. The devices themselves, particularly their programmability (e.g., Nvidia's CUDA architecture), have advanced dramatically. Traditionally, GPUs were designed to do fewer things, but to do them very fast and in parallel on very many cores. Conventional CPUs (Intel's x86 architecture dominates) do many things very well, but do them one at a time, or, in the case of multi-core CPUs, only a few at a time. It turns out GPU architecture is ideal for many scientific calculations, particularly those that can be sped up by parallelizing execution.

The other game-changer is the gush of data from modern instruments; it has overwhelmed traditional CPU capacity and forced researchers to attack data-intensive computation with massive clusters. This has two key drawbacks. Performance, though improved, doesn't scale especially well, and certainly not commensurately with the data growth. Second, and perhaps most important, big clusters greedily consume power, cooling, and space, which sends datacenter costs skyrocketing; the cost of the servers themselves is almost inconsequential by comparison.

*Sept. 20-23, 2010, San Jose Convention Center
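To make that contrast concrete, here is a minimal data-parallel sketch in CUDA C, not drawn from any of the conference posters: a single scale-and-add operation is written once and then executed by roughly a million GPU threads, one array element apiece. The array size, values, and launch parameters are arbitrary assumptions for illustration.

```cuda
// A minimal sketch (not from the article) of the data-parallel model described
// above: one small operation is written once and executed by thousands of GPU
// threads in parallel, one array element per thread.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        y[i] = a * x[i] + y[i];                     // each thread handles one element
}

int main()
{
    const int n = 1 << 20;                          // ~1 million elements
    size_t bytes = n * sizeof(float);

    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements in one call.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 5.0)\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

The kernel body is only a few lines; the parallelism comes entirely from the launch configuration, which is why porting suitable inner loops to GPUs can be comparatively straightforward.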
Enter GPGPU (not that we need another acronym), which stands for general-purpose computation on graphics processing units. "Once specially designed for computer graphics and difficult to program, today's GPUs are general-purpose parallel processors with support for accessible programming interfaces and industry-standard languages such as C. Developers who port their applications to GPUs often achieve speedups of orders of magnitude," according to the GPGPU.org website.

Modeling Potential
As if to underscore GPGPU potential in life science, one of the keynotes was delivered by Klaus Schulten, of the department of physics and the theoretical and computational biophysics group at the University of Illinois, Urbana-Champaign. "When we realized the potential of GPUs we decided early we wanted to make three uses of [that] potential," Schulten told the gathering. "The first was we wanted to increase the accuracy of our simulation. The second was we wanted to speed up simulation to make calculating biological systems more convenient, which was possible because of the speed gains... [L]ast, we wanted to open doors to new fields, to tackle problems that were not possible before because computing took too long." That was in 2007, when CUDA was introduced.

Schulten then walked through several illustrative case histories, including one to identify swine flu resistance to the antiviral drug Tamiflu. "At the high point of the epidemic it was realized that the virus had become resistant," said Schulten. Tamiflu had been designed to plug a key hole in the viral enzyme needed to perform a critical chemical step (cleaving a bond) for the virus to reproduce itself. The simulation revealed "Tamiflu is not binding in one step but in two steps, and that the additional step is actually the one where the virus realized it can fend off the drug. Knowing that, pharmacologists can now design drugs for that step."

Computation enabled by GPUs constitutes "a new computational microscope to view small systems in living cells. We need to see those because those are scales that cannot be seen with other technologies and that are relevant for pharmacological intervention," he said. He cited a polio virus surface receptor simulation conducted on the National Center for Supercomputing Applications (NCSA) GPU cluster, which saw a 25.5X speed-up and a 10X power-efficiency gain over conventional computing approaches, and a molecular dynamics simulation of RNA/ribosome translation conducted on the NCSA Lincoln cluster, which was reduced from two months to two weeks of compute time.

It is interesting to consider that the surreal images made possible by GPU computing may actually produce insight not readily available from our 'normal' perspectives. Consider how Riemannian geometry helped Einstein develop his ideas of curved space. It may be that computer-generated visualizations, in some cases, mirror reality better than our human senses.

More Online: John's full report on the meeting is available online at www.bio-itworld.com.
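For a sense of why molecular simulation maps so well onto GPUs, consider the toy sketch below. It is emphatically not NAMD or any of Schulten's production code; it simply assigns one atom to each CUDA thread and sums a made-up pairwise term over all partners, the O(N^2) pattern that real molecular dynamics engines refine with proper potentials, cutoffs, and neighbor lists. The atom count, coordinates, and launch parameters are arbitrary assumptions.

```cuda
// Not production MD code (and not NAMD): a toy pairwise-interaction kernel
// showing the pattern that makes molecular simulation a good GPU fit.
// Each thread owns one atom and sums a simple inverse-square term over all
// other atoms, so the O(N^2) work runs concurrently across the whole system.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct Atom { float x, y, z; };

__global__ void pairwise_energy(int n, const Atom *atoms, float *energy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    Atom a = atoms[i];
    float e = 0.0f;
    for (int j = 0; j < n; ++j) {                        // scan all partner atoms
        if (j == i) continue;
        float dx = a.x - atoms[j].x;
        float dy = a.y - atoms[j].y;
        float dz = a.z - atoms[j].z;
        float r2 = dx * dx + dy * dy + dz * dz + 1e-6f;  // softened to avoid /0
        e += 1.0f / r2;                                  // stand-in for a real potential
    }
    energy[i] = e;
}

int main()
{
    const int n = 4096;
    Atom *h_atoms = (Atom *)malloc(n * sizeof(Atom));
    for (int i = 0; i < n; ++i) {                        // atoms on a regular grid
        h_atoms[i].x = (float)(i % 16);
        h_atoms[i].y = (float)((i / 16) % 16);
        h_atoms[i].z = (float)(i / 256);
    }

    Atom *d_atoms;
    float *d_energy;
    cudaMalloc(&d_atoms, n * sizeof(Atom));
    cudaMalloc(&d_energy, n * sizeof(float));
    cudaMemcpy(d_atoms, h_atoms, n * sizeof(Atom), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    pairwise_energy<<<blocks, threads>>>(n, d_atoms, d_energy);

    float *h_energy = (float *)malloc(n * sizeof(float));
    cudaMemcpy(h_energy, d_energy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("energy of atom 0 = %f\n", h_energy[0]);

    cudaFree(d_atoms); cudaFree(d_energy);
    free(h_atoms); free(h_energy);
    return 0;
}
```

Because every atom's loop is independent, they can all run at once; production codes add neighbor lists and shared-memory tiling to cut the O(N^2) cost further.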
Bio-IT World is seeking submissions to its 2011 Best Practices Awards. This prestigious awards program is designed to recognize outstanding examples of technology and strategic innovation—initiatives and collaborations that manifestly improve some facet of the R&D/drug development/clinical trial process. The awards attract an elite group of life science professionals: executives, entrepreneurs, innovators, researchers and clinicians responsible for developing and implementing innovative solutions for streamlining the drug development and clinical trial process. All entries will be reviewed and assessed by a distinguished peer-review panel of judges. The winners will receive a unique crystal award to be presented at the Best Practices Awards dinner, on Wednesday, April 13, 2011, in conjunction with the Bio-IT World Conference & Expo in Boston. Winners and entrants will also be featured in Bio-IT World. For more information, visit: www.bio-itworld.com/bestpractices
DEADLINE FOR ENTRY: January 14, 2011
bio-itworld.com/bestpractices
Go beyond the everyday. EVERYDAY. Symyx Notebook by Accelrys The Freedom to Experiment.
Whether it’s on the trail or in the lab, you want the freedom to take new approaches, routes, and paths to your goals. That’s why there’s Symyx Notebook by Accelrys. It’s the only electronic laboratory notebook that can be deployed across the enterprise in multiple scientific disciplines. With Symyx Notebook by Accelrys, research teams share a single application to document, work, collaborate, and speed the experimentation workflow. Symyx Notebook by Accelrys streamlines the capture of all experimental information and intellectual ideas. Everyday tasks such as data capture and note taking are optimized and automated. All of which gives you the time and freedom you need to experiment — and get back to doing science. To learn more, visit www.accelrys.com/notebook6
© 2010 Accelrys is a registered trademark of Accelrys Software Inc. All rights reserved.