Wermuth's The Practice of Medicinal Chemistry (Edc. 2008)
Chapter 13
Web Alert: Using the Internet for Medicinal Chemistry
David Cavalla
I. INTRODUCTION
II. BLOGS
III. WIKIS
   A. RSS information feeds
IV. COMPOUND INFORMATION
   A. ChemSpider
   B. The NIH Roadmap and PubChem
   C. ChemBank
V. BIOLOGICAL PROPERTIES OF COMPOUNDS
   A. Prediction of biochemical properties
   B. Molecular datasets
   C. Information on metabolic properties
VI. DRUG INFORMATION
   A. DrugBank
VII. PHYSICAL CHEMICAL INFORMATION
VIII. PREDICTION AND CALCULATION OF MOLECULAR PROPERTIES
IX. CHEMICAL SUPPLIERS
X. CHEMICAL SYNTHESIS
XI. CHEMICAL SOFTWARE PROGRAMS
   A. Chemical drawing and viewing software
   B. Various chemoinformatics software
   C. Datasets for virtual screening
XII. ANALYSIS
XIII. CHEMICAL PUBLICATIONS
   A. Journals
   B. Open Access
   C. Theses
XIV. PATENT INFORMATION
   A. Japanese patents
XV. TOXICOLOGY
XVI. METASITES AND TECHNOLOGY SERVICE PROVIDER DATABASES
“Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s what we’re doing.”
Jimmy Wales, founder of Wikipedia
I. INTRODUCTION
The internet has undergone further substantial change since the 2nd edition of The Practice of Medicinal Chemistry, both in its continued growth and in the availability of additional resources. This article can only describe the situation as it currently stands and offer some predictions about the future. It is in the nature of such reviews that they can never be complete or fully up to date; moreover, they become outdated rapidly. Furthermore, the resources on the internet are now too vast for a comprehensive article in the space available, so only those considered most significant and most likely to last have been included. This review is intended to point to freely available resources for the tools that medicinal chemists use in their work, which necessarily involves a variety of tasks, from drug design to chemical synthesis. Sites for prediction of physical properties are just as relevant as those for prediction of biological activity, or for patents. While there has been a substantial increase in the number of commercial chemistry sites since the last edition, there has not been the predicted shift in balance away from freely available resources. If anything, there has been a surge in open resources, most notably in the open-access publishing movement (see later). There has also been the advent of certain forms of internet use that were not evident, or were much smaller in impact, and were not predicted to grow in the way they have over the last 4 years. These include the introduction of the blog and the rise and rise of the wiki.
Copyright © 2008, Elsevier Ltd All rights reserved.
II. BLOGS
The blog, or web log, has rapidly risen to a position of great importance in news and other media. However, it has been adopted more slowly in science in general. There are no quality controls in the blog medium, but some gems are to be found. Some of the better-known generalist sites are given in Table 13.1.
There is a “hyperblog” site at http://wiki.cubic.uni-koeln.de/pg. Among other topics, this site features a section on popular stories and areas of common interest, and a section on blogs covering various aspects of chemistry, from analytical chemistry to chemometrics/bioinformatics to pharmacology and genetic modification. In addition, Table 13.2 lists sites specializing in medicinal and pharmaceutical chemistry.
TABLE 13.1
Title | URL | Comments
A Zephyr in Time | http://echiral.blogspot.com/search/label/Chemistry | A set of general articles on chemistry.
Chemical Blogspace | http://wiki.cubic.uni-koeln.de/cb/index.php | Chemical Blogspace collects data from tens of scientific chemistry blogs and presents it in one place.
Chemical Forums Blog | http://blog.chemicalforums.com/ | Tied to the Chemical Forums; a place for open discussion related to chemistry.
Chemistry World Blog | http://prospect.rsc.org/blogs/cw/ | Blog from the UK Royal Society of Chemistry (http://www.rsc.org).
Culture of Chemistry | http://cultureofchemistry.blogspot.com/ | Authored by Michelle Francl, Professor at Bryn Mawr College.
Manufacturing Chemistry | http://pharmamanufacturing.wordpress.com/about | Administered by Pharmaceutical Manufacturing’s editor in chief, Agnes Shanley (see http://www.PharmaManufacturing.com).
Science Quick Picks | http://www.pontotriplo.org/quickpicks/ | Deals with popular chemistry but also other branches of science, education, and technology.
SciScoop | http://www.sciscoop.com | General science blog site and forum.
The Curious Wavefunction | http://ashutoshchemist.blogspot.com/ | Miscellaneous thoughts, facts, and tidbits, recent and past, about chemistry, science, and society.
The Sceptical Chymist | http://blogs.nature.com/thescepticalchymist/ | A blog by the editors of Nature and the Research journals.
TABLE 13.2
Title | URL | Comments
Medicinal Chemistry: In the Pipeline | http://www.corante.com/pipeline/ | The author, Derek Lowe, has worked for several major pharmaceutical companies on drug discovery projects against schizophrenia, Alzheimer’s disease, diabetes, osteoporosis, and other diseases.
One in Ten Thousand | http://walkerma.wordpress.com/ | General discussion on medicines and pharmaceutical chemistry.
The Half Decent Pharmaceutical Chemistry Blog | http://the-half-decent-pharmaceutical-chemistry-blog.chemblogs.org/ | A blog on pharmacology and pharmaceutical chemistry.
The Molecule of the Day | http://scienceblogs.com/moleculeoftheday/ | Focused on medicines in the context of real life.
Totally Medicinal | http://totallymedicinal.wordpress.com/ | A blog looking at the world of pharmaceutical chemistry.
Kinasepro | http://kinasepro.wordpress.com/ | Excellent site on kinase medicinal chemistry.
Further examples of blogs related specifically to synthetic chemistry are detailed in the section on Chemical Synthesis. A report has been written on the effect of blogs on the pharmaceutical industry (http://www.pharmaceutical-business-review.com/research.asp?guid=BFHC0707). As blogs become more widespread, they will influence the way in which healthcare is delivered, and peer-to-peer health advice will become another means of delivering information to patients.
III. WIKIS
From software to encyclopedias, collaborative projects are among the most evidently disruptive applications of the internet, posing multiple challenges to conventional business models. Wikipedia (http://www.wikipedia.org) is well established, even though its first page went online only in 2001. It is a generalist information source, but its scope and depth exceed those of many specialist alternatives. The name is a blend of “wiki,” the Hawaiian word for “quick,” and “encyclopedia”; Wikipedia is now the world’s biggest encyclopedia. An interesting article on the origins, implementation, and phenomenal growth of Wikipedia was published in The Atlantic in 2006 (http://www.theatlantic.com/doc/200609/wikipedia). It is the collaborative (and indeed co-operative) nature of wikis that has enabled the rapid growth of Wikipedia. By comparison, the first Oxford English Dictionary (http://www.oed.com/) took 78 years to reach publication in 1928, whereas Wikipedia is 6 years old and has more than 1.7 million articles in its English-language edition, growing by nearly 2,000 a day. In 2006, a report in Nature (http://www.nature.com) compared science articles in Encyclopedia Britannica and Wikipedia and suggested that the former are usually only marginally more accurate than the latter. As an example, the entry for “angiotensin II antagonists” (http://en.wikipedia.org/wiki/Angiotensin_II_receptor_antagonist) neatly leads to a list of seven members of the group. While the list is incomplete, each entry contains a graphical structure, an IUPAC name, CAS number, PubChem link, bioavailability, protein binding, metabolism, half-life, and so on. In the main section, the articles include information on regulatory status, dosing frequency, therapeutic indications, and side effects.
Other wiki-based information resources are more explicitly scientific in focus but nascent in coverage. Biocrawler.com (http://www.biocrawler.com) claims to be “a [scientific] encyclopedia written collaboratively.” It is divided into sections on biotechnology (http://www.biocrawler.com/biocorp/index.php?words=biotech) and drug discovery (http://www.biocrawler.com/biocorp/index.php?words=drug) companies, as well as videos, images, and a directory. The site currently offers a much smaller collection of entries than Wikipedia and, moreover, less well-developed information even in the entries that exist. “Mitochondrion,” for instance, garners a much more impressive set of information in the generalist Wikipedia than in Biocrawler.
The Chemical Information Sources Wiki (http://cheminfo.informatics.indiana.edu/cicc/cis/index.php/Main_Page) is a guide to the many sources of reference material available for those with chemistry-related questions. The site includes information on primary, secondary, and tertiary publication sources, chemical information databases, physical property information, chemical patent searching, and molecular visualization tools and sites. The material is based on an undergraduate course offered for many years in the Indiana University Department of Chemistry by Gary Wiggins.
A. RSS information feeds
Web information is now often delivered by means other than the simple (static) web browser. RSS (Really Simple Syndication) is a format for announcing new content on a web site and a new way of receiving news in general; RSS feeds have recently been introduced for the American Chemical Society (http://www.acs.org) journals (see http://pubs.acs.org/alerts/rss/index.html). The information is not fed directly into a browser but into a news aggregator such as FeedDemon (http://www.feeddemon.com).
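To illustrate what a news aggregator does with a feed, here is a minimal sketch that lists the items of an RSS 2.0 document using only Python's standard library. The feed text is a made-up placeholder, not an actual ACS feed; real feeds carry more elements and some sites use other formats (such as Atom), which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a journal feed.
# The titles and links are placeholders, not real articles.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal: Articles ASAP</title>
    <item>
      <title>Hypothetical article one</title>
      <link>http://example.org/article/1</link>
    </item>
    <item>
      <title>Hypothetical article two</title>
      <link>http://example.org/article/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):  # the channel's own <title> is not an item
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(f"{title} -> {link}")
```

An aggregator essentially repeats this on a schedule for each subscribed feed and shows only the items it has not seen before.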
IV. COMPOUND INFORMATION
A. ChemSpider
There is a growing number of compound databases, and the dominance of Chemical Abstracts (http://www.cas.org) is thus being challenged; however, the sheer diversity of these databases poses its own difficulty. ChemSpider (http://www.chemspider.com/) is a new chemistry search engine built to aggregate and index chemical structures and their associated information into a single searchable repository, available to everybody at no charge. Each chemical structure in the database has been annotated with structure identifiers such as SMILES, InChI, IUPAC and index names, as well as many physicochemical properties. In addition, ChemSpider provides access to a series of property prediction algorithms. ChemSpider currently searches over 14 million compounds in multiple chemical structure databases, including databases of curated literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data, and so on. ChemSpider intends to aggregate into a single database all chemical structures available within open-access and commercial databases, and to provide the necessary pointers from the ChemSpider search engine to the information of interest. This will allow users either to access the data immediately via open-access links or to obtain the information necessary to continue their searches in commercially available systems.
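The simplest of the computed properties attached to each structure, molecular weight from a molecular formula, can be sketched as follows. This is an illustration, not ChemSpider's own code: the atomic weights are rounded standard values for a handful of elements, and only flat formulas (no parentheses, hydrates, or charges) are parsed.

```python
import re

# Rounded standard atomic weights for a few common elements (extend as needed).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
                  "S": 32.06, "F": 18.998, "Cl": 35.45, "Br": 79.904}

# An element symbol (capital letter + optional lowercase letter), then an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula):
    """Average molecular weight for a flat formula such as 'C9H8O4'."""
    total = 0.0
    consumed = 0
    for match in TOKEN.finditer(formula):
        if match.start() != consumed:  # characters skipped: not a valid token
            raise ValueError(f"unparsable formula: {formula!r}")
        symbol, count = match.group(1), match.group(2)
        if symbol not in ATOMIC_WEIGHTS:
            raise KeyError(f"no weight tabulated for {symbol}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        consumed = match.end()
    if consumed != len(formula):  # trailing characters left over
        raise ValueError(f"unparsable formula: {formula!r}")
    return total


if __name__ == "__main__":
    print(round(molecular_weight("C9H8O4"), 2))  # aspirin
```

For aspirin, C9H8O4, the sum is 9 × 12.011 + 8 × 1.008 + 4 × 15.999, about 180.16 g/mol.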
Two blogs support the system: one on the science, politics, and vision behind ChemSpider (http://www.chemspider.com/blog), and another on incremental changes in functionality (http://www.chemspider.com/news).
B. The NIH Roadmap and PubChem
The NIH (National Institutes of Health) “Roadmap” was launched in October 2004 (http://nihroadmap.nih.gov/initiatives.asp) to encompass five themes: building blocks, biological pathways and networks; molecular libraries and molecular imaging; structural biology; bioinformatics and computational biology; and nanomedicine. PubChem (http://pubchem.ncbi.nlm.nih.gov/) is a product of the chemoinformatics initiative, developing a “new and comprehensive database” of chemical structures together with their biological activities. The information from the Roadmap’s screening centers (together with publicly available information) will be housed in PubChem, which will also feature new algorithms for computational chemistry and virtual screening. Already, both Nature Chemical Biology (http://www.nature.com/nchembio/) and NMRShiftDB (http://www.nmrshiftdb.org) are available through PubChem, which also provides links to Medical Subject Headings (MeSH) annotations and to PubMed biomedical literature citations. The database is growing rapidly and now holds 17,000,000 substances and 500 bioassays. PubChem provides a limited set of structure properties, selected to be relevant for typical drug design applications, and it is currently possible to run chemical similarity searches based on SMILES. The program behind PubChem is intended to develop new bioassays and perform massive high-throughput screening experiments on a large number of compounds, resulting in a very large public store of biological activity data associated with chemical structures. The structure database will ultimately contain the full catalogs of major suppliers of screening compounds, as well as the structures from other public databases (NCI, NIAID, NIST), and it will provide extensive links to original data. Examples of productive queries in the PubChem system can be found at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pcsubstance.
PubChem also claims to be already the largest freely accessible chemical structure store. If its proposed developments are delivered, it could become very useful indeed. Tensions have arisen between the “not-for-profit” American Chemical Society (ACS; http://www.acs.org) and the NIH’s PubChem, and also with Google’s Google Scholar. The ACS has been concerned about the scale and freely available nature of PubChem, and has also claimed that Google Scholar, a literature-search service, infringed its own SciFinder Scholar trademark (http://battellemedia.com/archives/001116.php). These disputes can be seen as part of a wider, revolutionary change in the publishing climate, driven by the rising importance of Open Access (see later).
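The SMILES-based similarity searches mentioned for PubChem typically compare structural fingerprints. A widely used measure is the Tanimoto coefficient, the number of bits two fingerprints share divided by the number set in either. The sketch below assumes fingerprints have already been computed from SMILES by some cheminformatics toolkit and are stored as sets of "on" bit positions; the bit sets are made up, and PubChem's exact fingerprint and scoring may differ.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints stored as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def rank_by_similarity(query: set[int], library: dict[str, set[int]],
                       threshold: float = 0.7) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above threshold, most similar first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints; real ones would come from a toolkit applied to SMILES.
QUERY = {1, 4, 7, 9, 12}
LIBRARY = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {1, 4, 7, 9, 12},
    "cmpd_C": {2, 3, 5},
}

if __name__ == "__main__":
    for name, score in rank_by_similarity(QUERY, LIBRARY, threshold=0.5):
        print(f"{name}: {score:.2f}")
```

Here cmpd_B is identical to the query (1.00), cmpd_A shares 4 of 6 distinct bits (0.67), and cmpd_C falls below the threshold.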
C. ChemBank
ChemBank (http://chembank.broad.harvard.edu/) is an initiative of the Broad Institute Chemical Biology Program (BCB), sponsored by the National Cancer Institute’s Initiative for Chemical Genetics (ICG; http://deainfo.nci.nih.gov/ICG.htm). ChemBank was developed by the informatics group at the Harvard Institute of Chemistry and Cell Biology and uses software toolkits supplied by Daylight Chemical Information Systems (http://www.daylight.com). ChemBank is a freely available collection of data about small molecules, together with resources for studying their properties, especially their effects on biology. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays. The database can be searched by chemical name or activity, by substructure (SMILES string input), or by structural similarity (SMILES string input). Searches can be limited to subsets of the available compounds, such as natural products, known drugs, FDA-approved drugs, commercially available compounds, orally available compounds, and primary metabolites, among other categories. ChemBank stores an increasingly varied set of cell measurements derived from, among other biological objects, cell lines treated with small molecules. It is possible to pick an assay and then view the details of the screen and/or the data from the assay (http://chembank.broad.harvard.edu/screens/screen_finder.html). There is an additional option to view the chemical structures employed in the assay, and even to export spreadsheet files in Microsoft Excel or comma-separated value format.
ChemIDplus (http://chem.sis.nlm.nih.gov/chemidplus) is a search engine that allows retrieval of about 380,000 chemical substance files. Records in this structure-searchable database may include structure (263,000 structures available), official name, systematic name, other names, classification code (therapeutic use), molecular formula, STN locator code, and CAS registry number. Compounds are also searchable by toxicity data and physical properties. ChemFinder (http://chemfinder.camsoft.com/) is a large, chemistry-specific search engine for chemical substances, which provides basic information such as physical property data and 2D chemical structures. Obvious spelling errors and invalid CAS registry numbers are corrected. Chemicals and pharmaceuticals can be searched by chemical name, CAS registry number, molecular formula, or molecular weight. About 75,000 compounds are registered to date.
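Part of what makes invalid CAS registry numbers detectable, as in ChemFinder, is the check digit: each digit before the final one is multiplied by its position counted from the right (1, 2, 3, ...), and the sum modulo 10 must equal the final digit. A minimal validator sketch follows; note that a correct check digit only shows the number is well formed, not that it has actually been assigned.

```python
import re

# Two to seven digits, two digits, one check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_is_valid(cas):
    """Check the format and check digit of a CAS Registry Number such as '7732-18-5'."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


if __name__ == "__main__":
    for cas in ("7732-18-5", "50-78-2", "7732-18-4"):
        print(cas, cas_is_valid(cas))
```

For water, 7732-18-5, the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, matching the check digit.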
V. BIOLOGICAL PROPERTIES OF COMPOUNDS
The NCI DIS 3D database (http://dtp.nci.nih.gov/docs/3d_database/dis3d.html) is a collection of 3D structures for over 400,000 compounds, built and maintained by the Developmental Therapeutics Program, Division of Cancer Treatment, National Cancer Institute (http://www.nci.nih.gov). While the information stored therein is only a connection table of atomic linkages, it can be interpreted by computer software to provide a 3D structure for each entry. This can then be cross-checked against known pharmacophores, which represent the preferred 3D arrangement of features for a given biological activity. Compounds that match a pharmacophore may share its biological activity despite having very different patterns of atomic connections. This approach has been used to search for novel protein kinase C (PKC) agonists (http://dtp.nci.nih.gov/docs/3d_database/pharms/pkcsearch.html), using a pharmacophore derived from phorbol. A similar approach has been used to find new ligands for HIV protease, HIV integrase, and HIV reverse transcriptase (http://dtp.nci.nih.gov/docs/3d_database/pharms/ncisearches.html).
The Reciprocal Net project, run by the Indiana University Molecular Structure Center (http://www.iumsc.indiana.edu/), is a distributed, open, extensible digital collection of molecular structures, together with software tools for visualizing, interacting with, and rendering printable images of the contents, and for converting them automatically into standard formats that can be shared globally. The contents of the collection come principally from structures contributed by participating crystallography laboratories, which include universities in the Midwest and on the East and West coasts of the US, the UK, and Australia. Reciprocal Net’s common molecules include a section on therapeutic compounds (http://www.reciprocalnet.org/edumodules/commonmolecules/biochemical/list.html#therapeutic).
One of the most complete and useful resources for CNS pharmacologists is the detailed ligand–receptor chart at http://www.neurotransmitter.net/neurosignaling.html.
This table includes over 100 signaling molecules, including a wide variety of neurotransmitters, neuromodulators, neuropeptides, neurosteroids, and neuroactive hormones. The list does not include growth factors, cytokines, or intracellular second messengers. In almost all cases, the substances listed are known, or very likely, to affect neurons in the human brain. There are links to gene sequences from the OMIM site (Online Mendelian Inheritance in Man; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM). This database is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the National Center for Biotechnology Information (http://www.ncbi.nih.gov). There is also a substantial range of reviews referenced for the various neurological agents.
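Returning to the 3D pharmacophore searches of the NCI database described earlier in this section, their core test can be sketched as follows: a pharmacophore is reduced to required distances between chemical features, and a conformer matches if every feature pair lies within a tolerance of its target distance. The feature labels, distances, and coordinates are hypothetical; real search software also enumerates conformers and alternative feature assignments and may add angle or volume constraints.

```python
import math

# Hypothetical three-point pharmacophore: required distances (angstroms) between
# feature types. Labels and values are illustrative, not taken from any NCI search.
PHARMACOPHORE = {
    ("donor", "acceptor"): 3.0,
    ("acceptor", "hydrophobe"): 4.0,
    ("donor", "hydrophobe"): 5.0,
}


def matches(features, pharmacophore=PHARMACOPHORE, tolerance=0.5):
    """Return True if one conformer satisfies every pairwise distance constraint.

    `features` maps each feature label to the (x, y, z) position of that feature in a
    single 3D conformer; each label is assumed to occur exactly once.
    """
    for (a, b), target in pharmacophore.items():
        if a not in features or b not in features:
            return False  # a required feature is absent
        if abs(math.dist(features[a], features[b]) - target) > tolerance:
            return False  # distance outside the allowed window
    return True


if __name__ == "__main__":
    conformer = {"donor": (0.0, 0.0, 0.0), "acceptor": (3.0, 0.0, 0.0),
                 "hydrophobe": (3.0, 4.0, 0.0)}
    print(matches(conformer))
```

Because only feature positions are compared, two molecules with entirely different bonding patterns can both satisfy the same constraints, which is the point made above about matching compounds with different atomic connections.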
A. Prediction of biochemical properties
FlexX (http://www.biosolveit.de/flexx) is a fast, robust, and highly configurable (FlexX-able) computer program for predicting protein–ligand interactions. Its main application is the prediction of binding: for a protein of known 3D structure and a small ligand molecule, FlexX predicts the geometry of the protein–ligand complex and estimates the binding affinity. The speed of the calculation permits operation in a virtual high-throughput screening (vHTS) mode: FlexX is capable of screening a database of 100,000 compounds in about 8 hours on a 30-node cluster. A recent addition is a module called PERMUTE (http://www.biosolveit.de/Permute), which protonates molecules and generates tautomers. An evaluation license for FlexX is valid free of charge for approximately 6 weeks and provides access to the full functionality of the software; after the evaluation period, the software must be purchased. AutoDock (http://autodock.scripps.edu/) is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. AutoDock has applications in X-ray crystallography, lead optimization, structure-based design, combinatorial chemistry, protein–protein docking, and chemical mechanism studies. GRID (http://www.moldiscovery.com/soft_grid.php) is a computational procedure for determining energetically favorable binding sites on molecules of known structure. It may be used to study individual molecules such as drugs, molecular arrays such as membranes or crystals, and macromolecules such as proteins, nucleic acids, glycoproteins, or polysaccharides; several different molecules can be processed one after the other. Lastly in this section, a new improvement to the JME molecular editor (see above, http://www.molinspiration.com/cgi-bin/properties) permits prediction of bioactivity.
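The FlexX benchmark quoted above (100,000 compounds in about 8 hours on a 30-node cluster) can be turned into a rough planning figure. The sketch assumes every compound takes the same time and that work spreads perfectly evenly across nodes; in practice docking time depends on ligand flexibility, hardware, and settings, so this gives an order-of-magnitude estimate only.

```python
SECONDS_PER_HOUR = 3600


def per_compound_seconds(n_compounds, hours, nodes):
    """Node-seconds spent per compound in a benchmark run (assumes even load balance)."""
    return hours * SECONDS_PER_HOUR * nodes / n_compounds


def estimated_hours(n_compounds, nodes, seconds_per_compound):
    """Wall-clock hours for a library, assuming perfectly linear scaling across nodes."""
    return n_compounds * seconds_per_compound / (nodes * SECONDS_PER_HOUR)


if __name__ == "__main__":
    rate = per_compound_seconds(100_000, hours=8, nodes=30)
    print(f"{rate:.2f} s per compound per node")
    print(f"{estimated_hours(1_000_000, 30, rate):.0f} h for one million compounds on 30 nodes")
```

The benchmark implies roughly 8.6 node-seconds per compound, so under the same assumptions a one-million-compound library would need about 80 hours on the same cluster.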
B. Molecular datasets
A list of free molecular datasets correlating chemical structure with biological properties, incorporating information on QSAR, QSPR, toxicity, metabolism, permeability, etc., is available at http://www.cheminformatics.org/. Downloadable structures are available for 29 of the 31 datasets. (The two datasets that are not freely available are restricted for licensing reasons, because they are taken from the MDL Drug Data Report (MDDR) database; interested parties can gain access through the MDL web site at http://www.mdl.com/products/knowledge/drug_data_report/.) The Cheminformatics site presents the information in tabular format, with links to the chemical datasets, in structure-data (SD) format, and to the peer-reviewed articles, accessible through a Digital Object Identifier (DOI) link. The articles are free only to subscribers; all others must pay a copyright fee. There is a lot of information here, in a ready-to-use format. For example, blood–brain barrier penetration data are available for a training set of 57 compounds and a further set of 13; long-term animal carcinogenicity results are available for over 1,400 compounds, drawn from the Carcinogenic Potency Database (CPDB), an initiative of the Lawrence Berkeley Laboratory (Berkeley, California); and pharmacological data are available on a wide range of receptors.
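Datasets in structure-data (SD) format consist of records that each hold a molfile connection table ending in an "M  END" line, followed by data items (a header line such as ">  <FIELD>", one or more value lines, then a blank line), with "$$$$" terminating the record. The minimal sketch below reads only each record's title line and its data items, skipping the atom and bond blocks. The sample records and the LOGBB field name are made up for illustration and are not taken from the blood–brain barrier dataset; for real work a cheminformatics toolkit would normally be used.

```python
from __future__ import annotations

# Two made-up SD records; the LOGBB values are placeholders, not measured data.
SAMPLE_SDF = """compound_1
  hypothetical example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <LOGBB>
0.5

>  <SOURCE>
made-up value

$$$$
compound_2
  hypothetical example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <LOGBB>
-1.2

$$$$
"""


def _parse_record(lines: list[str]) -> dict:
    """Read the title line and the data items that follow 'M  END'."""
    name = lines[0].strip() if lines else ""
    data: dict[str, str] = {}
    in_data, key = False, None
    for line in lines:
        if not in_data:
            in_data = line.startswith("M  END")  # data items follow the connection table
            continue
        if line.startswith(">"):  # data header such as '>  <LOGBB>'
            start, end = line.find("<"), line.rfind(">")
            key = line[start + 1:end] if 0 <= start < end else None
            if key is not None:
                data[key] = ""
        elif key is not None:
            if not line.strip():  # a blank line ends the current value
                key = None
            else:
                data[key] = f"{data[key]}\n{line}" if data[key] else line
    return {"name": name, "data": data}


def parse_sd(text: str) -> list[dict]:
    """Split SD-format text into records of {'name': ..., 'data': {field: value}}."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record delimiter
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append(_parse_record(current))
    return records


if __name__ == "__main__":
    for record in parse_sd(SAMPLE_SDF):
        print(record["name"], record["data"])
```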
C. Information on metabolic properties
For metabolism data, there is a superb database, the University of Minnesota Biocatalysis/Biodegradation Database, at http://umbbd.msi.umn.edu/. It can be searched by compound, enzyme, or micro-organism name; chemical formula; CAS registry number; or EC (Enzyme Commission) number. It also has lists of reaction pathways, enzymes, micro-organism entries, and organic functional groups, and it specifically includes a large number of reactions of naphthalene 1,2-dioxygenase and of toluene dioxygenase. A paper describing the database was published in Nucleic Acids Research in January 2001 and can be downloaded from the site in full text or in .pdf format.
PharmGKB (http://www.pharmgkb.org/) is an integrated resource on how variation in human genes leads to variation in our response to drugs. The database contains genetic and clinical information about people who have participated in research studies at various medical centers. Genomic data, molecular and cellular phenotype data, and clinical phenotype data are accepted from the scientific community at large. These data are then organized, and the relationships between genes and drugs are categorized into clinical outcome, pharmacodynamics and drug responses, pharmacokinetics, molecular and cellular functional assays, and genotype. The database itself was created at Stanford University in a nationwide effort funded by the US National Institutes of Health. The site refers to the interesting set of tools available at http://www.drug-interactions.com/, which is hosted by the Indiana University Department of Medicine. This site includes the Cytochrome P450 Drug Interaction Table, a text-based list of drugs known to interact with cytochrome P450. A quick way to find a specific drug on this page is to use your web browser’s search feature: press Ctrl-F and type all or part of the drug name.
The drugs themselves are linked to entries in RxList (http://www.rxlist.com) and to pre-composed search routines on PubMed (http://www.ncbi.nlm.nih.gov). The site additionally categorizes compounds as known substrates, inhibitors, and inducers of particular P450 isoforms. An abbreviated table for clinical use is available at http://medicine.iupui.edu/flockhart/clinlist.htm. Overall, the site is simple, clear, and excellently put together.
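Because the interaction table sorts drugs into substrates, inhibitors, and inducers of each P450 isoform, the basic logic for spotting a potential interaction in a drug regimen can be sketched as a lookup. The table below is a toy excerpt with a few well-known textbook examples, chosen only for illustration; it is not a substitute for the full table, and a real assessment also depends on potency, dose, and alternative metabolic pathways.

```python
# Toy excerpt in the spirit of the P450 interaction table: a few textbook examples,
# not a complete or clinically validated list.
CYP_TABLE = {
    "CYP3A4": {"substrates": {"midazolam"},
               "inhibitors": {"ketoconazole"},
               "inducers": {"rifampin"}},
    "CYP2D6": {"substrates": {"metoprolol"},
               "inhibitors": {"fluoxetine"},
               "inducers": set()},
}

# Inhibiting the metabolizing enzyme tends to raise substrate levels; induction lowers them.
EFFECTS = (("inhibitors", "substrate level may rise"),
           ("inducers", "substrate level may fall"))


def flag_interactions(drugs, table=CYP_TABLE):
    """Return (isoform, perpetrator, victim, effect) tuples for a list of drug names."""
    regimen = {d.lower() for d in drugs}
    flags = []
    for isoform, roles in table.items():
        victims = sorted(regimen & roles["substrates"])
        for role, effect in EFFECTS:
            for perpetrator in sorted(regimen & roles[role]):
                for victim in victims:
                    if perpetrator != victim:
                        flags.append((isoform, perpetrator, victim, effect))
    return flags


if __name__ == "__main__":
    for flag in flag_interactions(["Midazolam", "ketoconazole", "rifampin"]):
        print(flag)
```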
VI. DRUG INFORMATION
A. DrugBank
The University of Alberta, supported by Genome Alberta and Genome Canada, has introduced the freely available online resource DrugBank, which contains detailed chemical, pharmaceutical, medical, and molecular biological information on more than 3,000 drug targets and 4,100 approved or experimental drug products (http://redpoll.pharmacy.ualberta.ca/drugbank/index.html). DrugBank brings the latest data from the Human Genome Project together with detailed chemical information about drugs and drug products. It provides more than 80 data fields for each drug, including brand names, chemical structures, protein and DNA sequences, links to relevant internet sites, prescription information, and detailed patient information. The database contains nearly 4,300 drug entries, including 1,000 FDA-approved small-molecule drugs, 113 FDA-approved biotech (protein/peptide) drugs, 62 nutraceuticals, and 3,000 experimental drugs. Additionally, more than 6,000 protein (i.e., drug target) sequences are linked to these drug entries. Users may query DrugBank through a simple text search of the entire textual component of the database; they may browse a tabular synopsis of the database content, for instance compounds grouped by indication; and they may draw a chemical structure (using a ChemSketch applet) or enter a SMILES string to search for chemicals similar or identical to the query compound. Finally, there is also a facility to run BLASTP (protein) sequence searches of the 15,000 sequences contained in DrugBank; both single and multiple sequence (i.e., whole proteome) BLAST queries are supported. A relational query tool allows users to select or search over various combinations of subfields. While the FDA has a very good searchable web site of approved drugs at http://www.fda.gov/cder/ob/ (and FDA-approved biologics and other biopharmaceutical products are listed at http://www.biopharma.com), it is not structure-searchable and does not contain information on compounds in development.
More complete database products, such as PJB Publications’ Pharmaprojects® (http://www.pjbpubs.co.uk), Prous’ Ensemble (http://www.dailydrugnews.com), and Current Drugs’ IDdb (http://www.current-drugs.com), are available only for a substantial price. There are multiple other sources of information on marketed compounds, similar to that conventionally available in pharmacopoeias; indeed, the names of these sites often reflect that connection. The Internet Drug Index (http://www.rxlist.com) is a prescription drug database that provides good basic information about products on the market, searchable by keyword, brand, or interaction. RxList is a trove of pharmaceutical knowledge, with more than 4,500 medications on file, a pharmaceutical discussion board, and an online dictionary of medical jargon. It provides useful basic information about conventional drugs and a handful of herbal remedies, in the form of drug FAQs (frequently asked questions) and patient monographs.
Another source is the electronic Medicines Compendium (eMC; http://emc.vhn.net/), with electronic versions of data sheets and Summaries of Product Characteristics (SPCs, sometimes also called SmPCs to distinguish them from Supplementary Protection Certificates) for medicines. It provides the same information as the latest edition of the Compendium of Data Sheets and SPCs, which covers thousands of medicines licensed in the UK. As an ongoing process, the eMC is also incorporating the SPCs of several thousand other medicines approved by the licensing authorities. The eMC ultimately aims to provide information on every licensed prescription, pharmacy, and general sale medicine in the UK, including generics. As well as SPCs, the eMC will eventually include all Patient Information Leaflets (PILs), and will be enhanced with dynamic updating and online links to complementary sources of medicines information.
VII. PHYSICAL CHEMICAL INFORMATION
A comprehensive practical guide to finding physical property data has been produced by Ben Wagner at the State University of New York at Buffalo (http://www.che.utoledo.edu/findmatprop.pdf). ChemFinder (http://www.chemfinder.com/) is one of the most important sites for physical property information. For each compound it directly provides the structure, synonyms, CAS registry number, and up to nine physical properties (melting point (m.p.), boiling point (b.p.), refractive index, evaporation rate, flash point, density, vapor density, vapor pressure, and water solubility). Other information, such as the physical description and odor detection limits, is also given when available. ChemFinder also acts as a metasearch engine, searching over 350 web sites and displaying direct links to them, arranged into several categories including biochemistry, health, MSDS, physical properties, regulations, structures, and usage. The MatWeb site (http://www.matweb.com/index.asp?ckck=1) is different, since it deals mostly with materials rather than individual chemical substances. The free sites, while offering significant amounts of data, do not compare, either in number of compounds or in number of properties per “hit,” with the information available from Beilstein’s CrossFire product, which is of course commercially priced. Syracuse Research Corporation (SRC) (http://www.syrres.com/esc/physdemo.htm) offers commercial online searches of a number of physical property databases, including measured logP values (octanol–water partition coefficient) and environmental fate data for over 25,000 chemicals. There is a good discussion of the theory and application of various kinds of solubility parameters at http://palimpsest.stanford.edu/byauth/burke/solpar/. There is a bewildering array of solubility parameters, such as Kauri-Butanol number,
solubility grade, aromatic character, aniline cloud point, wax number, heptane number, and Hildebrand solubility parameter, among others. The Hildebrand solubility parameter (http://palimpsest.stanford.edu/byauth/burke/solpar/solpar2. html), the most widely applicable of all the systems, includes such variations as the Hildebrand number, hydrogen bonding value, Hansen parameter, and fractional parameter, to name a few. They are directly related to the heat of vaporization. Information specifically on solvents is to be found at SolvDB (http://solvdb.ncms.org/solvdb.htm). This site, sponsored by the National Center for Manufacturing Sciences (NCMS), gives information on eight different parameters including solvent name, CAS registry number, molecular formula, and chemical category for over 200 solvents. Nine different properties are range searchable including flash point, vapor pressure, density, and surface tension. Up to 33 more properties can be displayed for each solvent. Results can be sorted by solvent name or any of the nine range-searchable properties. Extensive information is provided for each solvent with display of health, safety, regulatory, and environmental fate data. The ChemExper Chemical Directory (http://www.chemexper.com/) is also listed below as a resource for searching available chemicals from various supplier catalogs. Links are provided to the supplier’s web site and to MSDS. Only the basic properties are directly provided: density, m.p., b.p., and flash point. However, links to the full text of the MSDS will usually provide some additional properties. The NIST Chemistry webBook (http://webbook.nist. gov/chemistry/) from the National Institute of Standards and Technology (formerly the National Bureau of Standards), lists up to 45 thermochemical, thermophysical, and ion energetics properties which are available for over 40,000 compounds. Finally, the Organic Compounds Database (http://www. 
colby.edu/chemistry/cmp/cmp.html), maintained at Colby College, features a database of 2,483 compounds compiled by Harry M. Bell of Virginia Tech. Though only a few common properties are provided, the search screen allows the selection of a wide variety of parameters including property values, element counts, and the presence or absence of certain broad structural entities such as amines or hydroxyl groups. Unfortunately, retrieval sets are limited to twenty compounds, though the search engine does report the total number of hit compounds.
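The link between the Hildebrand parameter and the heat of vaporization can be made explicit: δ is the square root of the cohesive energy density, δ = √((ΔH_vap − RT)/V_m), where V_m is the molar volume. The Python sketch below applies that textbook relation. The function name is hypothetical, the RT correction assumes ideal-gas behavior of the vapor, and the figures for water used in the example (ΔH_vap ≈ 43.99 kJ/mol at 25 °C, V_m ≈ 18.07 cm³/mol) are approximate literature values.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def hildebrand_delta(dh_vap_kj_per_mol: float,
                     molar_volume_cm3_per_mol: float,
                     temperature_k: float = 298.15) -> float:
    """Estimate the Hildebrand solubility parameter in MPa**0.5.

    delta = sqrt(cohesive energy density) = sqrt((dH_vap - R*T) / V_m)
    Assumes an ideal vapor, so the pV work of vaporization is R*T.
    """
    # Cohesive energy: enthalpy of vaporization minus the pV term (J/mol)
    cohesive_energy = dh_vap_kj_per_mol * 1000.0 - R * temperature_k
    molar_volume_m3 = molar_volume_cm3_per_mol * 1e-6   # cm^3/mol -> m^3/mol
    ced = cohesive_energy / molar_volume_m3             # J/m^3, i.e. Pa
    return math.sqrt(ced) / 1000.0                      # Pa**0.5 -> MPa**0.5


# Example: water at 25 degrees C (approximate input values)
print(f"{hildebrand_delta(43.99, 18.07):.1f} MPa^0.5")
```

For water this gives roughly 48 MPa^0.5, close to the value usually tabulated. Raising the temperature while keeping the same inputs lowers δ slightly, because a larger RT term is subtracted.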
VIII. PREDICTION AND CALCULATION OF MOLECULAR PROPERTIES
Molecular property prediction is becoming a useful tool for generating libraries of “beautiful” molecules, that is, molecules with the right parameters to be useful drug candidates. Used in a more focused way, it benefits drug design and lead optimization by predicting physicochemical properties such as lipophilicity and solubility, as well as molecular properties such as polar surface area (PSA). Methods for predicting PSA are outlined in an older publication by David Clark at http://www.documentarea.com/qsar/davclark.pdf. PSA, along with other properties such as charge, polarizability, molecular surface area and the numbers of H-bond donors and acceptors, can be predicted with the calculator plugins for ChemAxon’s Marvin (http://www.chemaxon.com/marvin/chemaxon/marvin/help/calculator-plugins.html). Alternatively, tpsa.c, a program that calculates PSA directly from SMILES input and is claimed to be 2–3 orders of magnitude faster than other methods, is available from http://www.daylight.com/meetings/emug00/Ertl/. A neat way of visualizing five drug-like properties in a single graph is the PK RADAR presentation (http://www.rscmodelling.org/CEAtoDD/RitchieRadar.ppt). Vlaaivis is another free visualization tool, in which each circle represents a single compound and each slice of the “pie” represents a normalized response to a particular assay or property; originally a structure–activity relationship (SAR) tool, it is available from http://www.vlaaivis.com/index.htm.
In addition to its experimental databases, SRC (referred to above) has developed software to predict physical properties, such as the Estimation Programs Interface (EPI) Suite (http://www.epa.gov/oppt/exposure/docs/episuite.htm), which was developed for the US Environmental Protection Agency (EPA). Entering a single SMILES string as the search key displays results from 10 separate programs; the properties covered are shown in Table 13.3. The program also contains a database of SMILES strings searchable by CAS registry number: entering a registry number automatically retrieves the corresponding SMILES string and enters it into the search box.
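Programs such as tpsa.c implement Ertl’s fragment-based topological PSA, in which PSA is a sum of tabulated surface contributions from polar nitrogen and oxygen fragments. The Python sketch below shows only that final summation step. The fragment labels are hypothetical shorthand, just seven of the published contribution values are included (they should be checked against Ertl’s original table before any real use), and the per-atom fragment assignment, which is where the real work lies, is assumed to be supplied by the caller.

```python
# Sketch of the summation step in Ertl-style topological polar surface area (TPSA).
# Assumption: each polar N/O atom has already been assigned a fragment type;
# the labels are hypothetical, and only a small subset of contributions is listed.
TPSA_CONTRIBUTIONS = {  # square angstroms per fragment occurrence
    "NH2":         26.02,  # N with two H and one heavy-atom neighbor
    "NH":          12.03,  # N with one H and two single-bonded neighbors
    "N_tertiary":   3.24,  # N with three single-bonded neighbors
    "n_aromatic":  12.89,  # pyridine-type aromatic N
    "OH":          20.23,  # hydroxyl O (including the OH of an acid)
    "O_double":    17.07,  # doubly bonded O, e.g. carbonyl
    "O_single":     9.23,  # O with two single-bonded heavy neighbors (ether, ester)
}


def tpsa(fragment_counts: dict[str, int]) -> float:
    """Return TPSA in square angstroms as a sum of fragment contributions."""
    unknown = set(fragment_counts) - set(TPSA_CONTRIBUTIONS)
    if unknown:
        raise KeyError(f"no contribution tabulated for: {sorted(unknown)}")
    return sum(TPSA_CONTRIBUTIONS[frag] * n for frag, n in fragment_counts.items())


# Paracetamol: one phenolic OH, one amide C=O, one amide NH
print(round(tpsa({"OH": 1, "O_double": 1, "NH": 1}), 2))
# Aspirin: acid C=O and OH, ester C=O and ester O
print(round(tpsa({"O_double": 2, "OH": 1, "O_single": 1}), 2))
```

With hand-assigned fragments this gives 49.33 Å² for paracetamol and 63.6 Å² for aspirin, matching the TPSA values commonly reported for these drugs. Because no 3D geometry is involved, the calculation is essentially a table lookup, which is why fragment methods are so much faster than surface-based PSA.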
TABLE 13.3 Properties estimated by the EPI Suite
Aquatic toxicity (LD50, LC50)
Henry’s law constant
Aqueous hydrolysis rates
m.p., b.p., and vapor pressure
Atmospheric oxidation rates
Octanol–water partition coefficient
Bioconcentration factor (BCF)
Soil sorption coefficient (Koc)
Biodegradation probability
Water solubility
Source: Properties available from the EPI Suite at http://www.epa.gov/oppt/exposure/docs/episuite.htm.
Another useful site in this regard is ChemExper, which, in addition to resources for searching available chemicals and their physical properties (see below), hosts the OSIRIS Property Explorer (http://www.chemexper.com/tools/propertyExplorer/main.html) for calculating and predicting a compound’s physical parameters. OSIRIS calculates various drug-relevant properties from a user-drawn structure. Prediction results are scored and color coded: properties with a high risk of undesired effects, such as mutagenicity or poor intestinal absorption, are shown in red, whereas green indicates likely drug-conforming behavior. The cLogP and solubility are recalculated as the user builds the molecule. The toxicological and safety issues predicted include mutagenicity, tumorigenicity, reproductive effects and irritancy. The algorithms used to calculate these properties are described in some detail; for instance, the toxicity risk assessment is explained at http://www.chemexper.com/tools/propertyExplorer/tox.html. A substructure search determines the occurrence frequency of each fragment (core and constructed fragments) within all compounds of a given toxicity class. Similar explanations are given for the fragment-based drug-likeness score (http://www.chemexper.com/tools/propertyExplorer/druglikeness.html) and the overall drug score (http://www.chemexper.com/tools/propertyExplorer/drugScore.html). The OSIRIS Property Explorer is an integral part of Actelion’s (http://www.actelion.com) in-house substance registration system. The prediction process relies on a pre-computed set of structural fragments that trigger toxicity alerts if they are found in the structure being drawn. These fragment lists and toxicities (e.g. mutagenicity) were derived from the RTECS database. RTECS, the Registry of Toxic Effects of Chemical Substances, aims to “list … all known toxic substances … and the concentrations at which … toxicity is known to occur”; currently over 133,000 such substances are listed (http://www.ntis.gov/products/types/databases/rtecs.asp?loc=4-4-3).
The Interactive Laboratory (I-Lab: http://www.acdlabs.com/ilab) is a commercial product (with a free demonstration version) from Advanced Chemistry Development (ACD). It provides online computation of logP, pKa, logD, and aqueous solubility.
I-Lab also includes database searching of ACD’s compilations of spectra and physical properties. The ACD/LogP calculator (http://www.acdlabs.com/download/logp.html), now offered as freeware, has been compared with competing products at http://www.acdlabs.com/products/phys_chem_lab/logp/competit.html. It is claimed to calculate accurate logP values derived from an internal ACD/LogP database containing over 5,000 experimental logP values. An interactive web service for calculating molecular properties relevant to drug design and QSAR has been established at the Molinspiration Cheminformatics web site (http://www.molinspiration.com/cgi-bin/properties). Properties calculated include logP, PSA, and the Lipinski Rule-of-five parameters, and a drug-likeness index is due to be added soon. Molinspiration offers this as a free service to the internet chemistry community for up to 100 determinations per month. In addition to single calculations, the site allows searching of its web molecular database by substructure, structural similarity or pharmacophore similarity (http://www.molinspiration.com/cgi-bin/search).
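The Rule-of-five parameters that Molinspiration reports are usually applied as simple thresholds: molecular weight ≤ 500, logP ≤ 5, no more than 5 H-bond donors (counted as OH plus NH groups) and no more than 10 acceptors (counted as N plus O atoms). The Python sketch below counts the violations; the container class and function names are hypothetical, the descriptor values would in practice come from a calculator such as those described above, and the second compound is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleOfFiveInput:
    """Descriptor values as a property calculator might report them (hypothetical container)."""
    mol_weight: float   # g/mol
    logp: float         # calculated octanol-water logP
    h_donors: int       # Lipinski counts OH + NH groups
    h_acceptors: int    # Lipinski counts N + O atoms


def rule_of_five_violations(d: RuleOfFiveInput) -> list[str]:
    """List which of Lipinski's four criteria a compound fails."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


# Aspirin, with approximate descriptor values
aspirin = RuleOfFiveInput(mol_weight=180.16, logp=1.2, h_donors=1, h_acceptors=4)
# A hypothetical large, lipophilic compound (values invented for illustration)
big = RuleOfFiveInput(mol_weight=650.0, logp=6.1, h_donors=3, h_acceptors=12)

for name, mol in [("aspirin", aspirin), ("hypothetical", big)]:
    v = rule_of_five_violations(mol)
    # One widely used reading flags a compound only when more than one criterion fails
    print(name, v, "flagged" if len(v) > 1 else "passes")
```

Returning the list of failed criteria, rather than a single yes/no, leaves the choice of policy (zero violations, or at most one) to the caller.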
Previously, logP could be calculated from SMILES strings at the Daylight site (http://www.daylight.com/daycgi/clogp); however, there was no facility for calculation from a drawn structure.
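Tools such as I-Lab, described above, report logP, pKa and logD side by side. For a simple monoprotic compound, if only the neutral species is assumed to partition into octanol, the three are linked by a standard relation: for an acid, logD = logP − log10(1 + 10^(pH − pKa)); for a base, the exponent becomes pKa − pH. The Python sketch below implements just that relation. It ignores ion-pair partitioning and multiple ionizable groups, so it is an illustration under those assumptions, not a reproduction of ACD’s algorithm, and the example compound is hypothetical.

```python
import math


def log_d(log_p: float, pka: float, ph: float, kind: str) -> float:
    """logD of a monoprotic compound, assuming only the neutral form partitions.

    acid: logD = logP - log10(1 + 10**(pH - pKa))
    base: logD = logP - log10(1 + 10**(pKa - pH))   (pKa of the conjugate acid)
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical carboxylic acid: logP 3.0, pKa 4.0, at plasma pH 7.4
print(round(log_d(3.0, 4.0, 7.4, "acid"), 2))
```

For this hypothetical acid the result is about −0.4: more than three log units of apparent lipophilicity are lost at pH 7.4, which is why logD rather than logP is often the more relevant descriptor for ionizable drugs.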
IX. CHEMICAL SUPPLIERS
There is currently a very large amount of information on available chemicals on the web. This information is relevant both to laboratory-scale synthesis and to larger-scale preparation, although it is more easily searched for laboratory synthesis. A useful starting point is a list of web sites offering online searching of chemicals and suppliers; for example, http://www.mdpi.org/forum.htm#chemicals offers a range of options, which are also accessible through its European mirror site at http://www.unibas.ch/mdpi/forum.htm#chemicals. Examples of sites for online searching of available chemicals are provided in Table 13.4. The site http://www.chemexper.com also gives access to Expereact™ WEB, a laboratory management program that helps with stock control, ordering products, recording reactions (as an electronic laboratory journal), exporting all the information to a word processor, etc. Another site providing software for inventory management is ChemSW, at http://www.chemsw.com. Its products include the CIS Inventory system Pro 2000 and an “MSDS digital filing cabinet” (very useful for managing data sheets as they go out of date), as well as more conventional chemical drawing and molecular modeling programs.
Many suppliers offer database searching themselves. Large companies such as Sigma-Aldrich offer a complete database of their products, searchable by name, structure and CAS number (http://www.sigmaaldrich.com), together with online ordering via a secure interface. Smaller suppliers have been slower to offer online databases with searching and secure ordering. In addition to the sources listed above, there are some commercial database products, such as ChemSources International (http://www.chemsources.com/csintl.htm), which covers the products of more than 8,000 chemical companies worldwide. Its Chemical Section lists approximately 275,000 chemical compounds and provides the contact data needed to make direct inquiries to each chemical firm. The product costs approximately $1,100 for a single-user CD-ROM license.
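Many of the sites in this chapter (Sigma-Aldrich, the EPI Suite, and several supplier directories) accept a CAS registry number as the query key. A CAS number carries its own check digit: the digits before it are multiplied by 1, 2, 3, … counting from the right, and the sum modulo 10 must equal the final digit. The Python sketch below applies that published rule so that a mistyped number can be caught before it is submitted; the function and constant names are hypothetical.

```python
import re

# CAS RN format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the digit nearest the check digit
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


# Water, ethanol, aspirin, and a deliberately mistyped aspirin number
for cas in ["7732-18-5", "64-17-5", "50-78-2", "50-78-3"]:
    print(cas, is_valid_cas(cas))
```

The check digit catches any single mistyped digit, and most transpositions of adjacent digits, but it cannot tell whether a well-formed number is actually assigned to a substance.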
TABLE 13.4 Sites for online searching of available chemicals
Web site address
Comments
http://www.buyersguidechem.de/
Excellent site with a wide variety of chemicals but no prices; useful for bulk quantities and for MSDS; a directory of over 100,000 chemicals.
http://www.chemexper.be/
Excellent search capability on a wide variety of research chemicals, with information that includes the exact chemical name as well as formula, melting point and other physical properties. Searches can be conducted by CAS number, molecular formula, substructure, name and a range of other terms. Hyperlinks take the user directly to the individual supplier.
http://www.molmall.org
MolMall features the Rare Chemical Samples Exchange Center. Compounds are made available from small samples provided by individual researchers. Full-structure and substructure searches are supported on the web site, as are searches by submitter name and several other very useful search functions. Entries link to MolBank (http://www.molbank.org) papers if the compounds are published there. There are plans to allow sample submitters to add further information to the data sheet, such as the literature where the compound was published.
http://www.icis.com/Search/default.aspx
Fairly wide variety of chemicals but no prices.
http://www.chemindustry.com/apps/chemicals
ChemIndustry.com site enables the user to enter a product name and then search a database of web sites related to various chemical suppliers.
http://www.chem-edata.com
Text and CAS number search capability. Fairly limited selection.
http://www.chemacx.com
A commercial product from CambridgeSoft; the Available Chemicals Xchange features the complete catalogs of over 200 vendors.
http://www.ubichem.com
Ubichem is an independent British company, founded in 1978 to supply a wide range of fine chemicals and intermediates. Its literature review section at http://www.ubichem.com/lit.php has reports on a variety of chemical topics, including isoquinolines, palladium coupling, azides, and radiolabeling.
CHEMCATS is an online database, accessible through Chemical Abstracts, that contains over 13 million commercially available compounds, including pricing information where suppliers provide it and, for many compounds, direct hyperlinks to suppliers’ sites. CHEMCATS is routinely updated with new information from suppliers already in the database and with new suppliers and/or catalogs. This is another commercial product, but there is no up-front fee: pricing is pay-as-you-go, and the database can be accessed through STN Easy (http://stneasy.cas.org/).
X. CHEMICAL SYNTHESIS
WebReactions (http://www.WebReactions.net) is a new and unique reaction search system offering direct retrieval of reaction precedents through the internet. The system is easy to learn and use: the user simply draws the reactant and product using a Java-based chemical drawing program. Matches are displayed almost instantaneously, not just for the input reaction itself, but for as broad a range of analogs as desired. The complete Organic Syntheses (OS) is now available free online at http://www.orgsyn.org/. Exact and substructure searches are supported once a free ChemDraw plugin has been downloaded, as are chemical name, formula, OS reference, and keyword index searches. The site is available free of charge to all chemists and contains all ten Collective Volumes as well as the Annual Volumes and indices. Organic Syntheses is a compilation of 84 annual volumes containing selected, independently checked procedures and new reactions in the field of organic synthesis; volumes of synthetic procedures have been published annually since the 1920s. The first six Collective Volumes were published every 10 years, and the later ones at roughly 5-year intervals. Two other sites related to chemical synthesis are http://orgchem.chem.uconn.edu/namereact/named.html, which gives details of about 100 named reactions, and the reaction index at http://www.pmf.ukim.edu.mk/PMF/Chemistry/reactions/rindex.htm, which contains a very extensive list of named reactions in organic chemistry. For biotechnological synthesis, there is a superb database containing information on microbial biocatalytic reactions and biodegradation pathways, primarily for xenobiotic chemical compounds: the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD), which can be found at http://umbbd.ahc.umn.edu/search/index.html. The goal of the UM-BBD is to provide information on microbial enzyme-catalyzed reactions that are important for biotechnology. There are also a number of blogs oriented toward chemical synthesis (Table 13.5).
TABLE 13.5 Blogs oriented toward chemical synthesis
Title
URL
Comments
Useful Chemistry
http://usefulchem.blogspot.com
A useful aggregative site that includes articles from a range of writers, specifically on various aspects of synthetic chemistry.
Carbon Tet
http://carbontet.blogspot.com/
Medicinal chemist with a focus on synthetic chemistry.
Chemical Musings
http://chemicalmusings.wordpress.com/
Thoughts on organic chemistry and my experiences as a new industrial chemist.
Heterocyclic chemistry
http://hetchem.blogspot.com/
Monthly articles on heterocyclic syntheses.
Org. prep. daily
http://orgprepdaily.wordpress.com/
Procedures for various simple reactions.
Organic Chemistry Highlights
http://www.organic-chemistry.org
Stereoselective synthesis of natural products, new methods in synthetic organic chemistry, and computational organometallic chemistry in organic synthesis; 5–8 highlighted reactions per month, and short reviews of organic, bioorganic, organometallic and microwave chemistry, total synthesis of natural products and multi-component reactions.
She Blinded Me with Science
http://blind-science.blogspot.com/
Synthetic chemistry from a chemical biologist’s point of view.
Totally Retrosynthetic
http://totallyretrosynthetic.blogspot.com/
Author is interested in the synthesis of natural products.
Totally Synthetic
http://www.totallysynthetic.com/blog/index.php
Author is a synthetic chemist in Prof. S. Ley’s group in Cambridge, England.
XI. CHEMICAL SOFTWARE PROGRAMS
A. Chemical drawing and viewing software
There are now a number of 2D and 3D molecule viewers available for free download that display chemical structures on web pages. A summary of some of the available software is held at http://www.indiana.edu/~cheminfo/mvts.html (see also Table 13.6).
TABLE 13.6 Chemical drawing and viewing software
Viewer
Description
URL
Chime
The Chime plug-in displays 2D and 3D molecules directly within a web page and works with both Mozilla and Microsoft browsers. The molecules in the web page are “live”: they are not just pictures, but chemical structures that can be rotated, reformatted, and saved in various file formats for use in modeling or database software.
http://www.mdl.com/products/framework/chime/key_features.jsp
Jmol
Jmol, initiated by Dan Gezelter at Columbia University, is a free, open-source molecule viewer and editor. It is a collaboratively developed visualization and measurement tool for chemical scientists.
http://jmol.sourceforge.net/
ChemDraw plug-in
This is claimed to be more than a mere structure viewer or a slow Java applet: it runs as fast as, and is as familiar as, the regular ChemDraw application. Available without charge, it enables searching of web databases by structure or substructure, and viewing of ChemDraw documents that others have placed on the web.
http://www.camsoft.com
JME molecular editor
JME is a free Java applet that allows generation and editing of molecules and reactions, and creation of molecule SMILES. The JME applet, written by Peter Ertl of Novartis, has become a standard for molecular structure input on the internet.
http://www.molinspiration.com/jme/index.html
MarvinSketch and MarvinView
A set of Java-based chemistry software including MarvinSketch, an applet for editing and visualizing molecules on a web page; MarvinView, an applet for viewing molecules in 2D or 3D on a web page; and MolConverter, a command-line program that converts between various file types.
http://www.chemaxon.com/product/live_examples.html
WebMolecules
Real-time 3D visualization of molecules on the web. Over 150,000 molecular models are available on the site, which may be searched by CAS number or exact formula. In addition, partial formula searching is permitted within the Top 2,000 molecules, which include molecules of commercial value, educational importance, and topical interest. Thousands of common molecules are organized into over 30 categories.
http://www.webmolecules.com
Waltz and CSD
ChemViz (short for chemistry visualization) is a set of web-based applications created by NCSA to promote a better understanding of chemical processes through visualization. Two tools are currently supported: Waltz, which generates images and animations of desired molecules and ions; and CSD, which presents 3D models of complex organic compounds.
http://chemviz.ncsa.uiuc.edu/
Depict
This service accepts a SMILES string as input and returns an HTML page with an embedded image. Unfortunately, there is no control over the output style or image size.
http://www.daylight.com/daycgi/depict
ACD/Structure Drawing Applet
A complete structure drawing, editing, and visualization tool written in pure Java that can be incorporated into HTML documents. The applet can be used for composing substructure queries to databases and visualizing the results.
http://www.acdlabs.com/products/java/sda/
OpenBabel
A cross-platform program and library designed to interconvert between the many file formats used in chemical drawing and molecular modeling.
http://sourceforge.net/mailarchive/forum.php?thread_id=8125020&forum_id=3042
B. Various chemoinformatics software
A few software programs capable of 3D structure generation, conformation generation, computer-aided drug design, and/or molecular modeling are available on free licenses, at least for academic purposes (Table 13.7). A more extensive list of software generally available for pharmaceutical and biotechnological R&D is available from NetSci, a public information exchange (http://www.netsci.org/Resources/Software/Cheminfo/index.html). This list includes chemical databases, reaction databases, QSAR, and other programs. The modeling section of NetSci is at http://www.netsci.org/Resources/Software/Modeling/CADD and includes both open-license and commercial software. Most of the software listed in this section of the NetSci site is not free, even to academic licensees. The omission of these programs from this review is not intended to imply any value judgment on their worth, and interested readers are encouraged to make their own enquiries if they wish to review the available offerings.
TABLE 13.7 Freely licensed chemoinformatics and modeling software
Program name
URL
Description
CORINA
http://www2.chemie.uni-erlangen.de/services/telespec/corina/
3D structure generator; up to 1,000 structures can be generated free of charge.
FANTOM (Fast Newton–Raphson Torsion Angle Minimizer)
http://bose.utmb.edu/fantom/fm_home.html
Calculation of low-energy conformations of polypeptides and proteins, compatible with distance and dihedral angle constraints typically obtained from nuclear magnetic resonance (NMR) experiments.
MMFF94
http://ccl.net/cca/data/MMFF94/
Set of validation molecules based on X-ray crystallographic data.
Moloc
http://www.moloc.ch
Molecular modeling package.
NEWLEAD
http://www.ccl.net/cca/software/SGI/newlead/README.shtml
Computer program for the automatic generation of candidate structures.
PADRE
ftp://ftp.CCL.net/pub/chemistry/software/UNIX/PADRE/
Analysis of the results of conformational searches, and measurement of similarity and differences between molecules.
Pgchem::tigress
http://pgfoundry.org/projects/pgchem/
Chemoinformatics extension to the PostgreSQL database management system that enables PostgreSQL to handle molecules through SQL statements.
PyMOL
http://pymol.sourceforge.net/
Open-source molecular visualization system.
RasMol
http://www.umass.edu/microbio/rasmol/index2.htm
Molecular visualization software.
C. Datasets for virtual screening
Information from published sources, particularly chemical catalogs, has been used to create virtual libraries for drug screening. (Incidentally, virtual screening has also been used in a coordinated fashion together with high-throughput screening, rather than in competition with it; see http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12471601&dopt=Abstract.) For example, ZINC is a free database of commercially available small molecules for docking (http://blaster.docking.org/zinc/). ZINC is a self-referential acronym
for “ZINC is not commercial,” and contains over 3.3 million compounds in ready-to-dock 3D formats. Downloads are available in SDF, mol2, and SMILES formats. Subsets of the libraries are available and can be browsed (http://blaster.docking.org/zinc/bysubset.shtml). There is a subset of “drug-like” molecules assembled by searching the database according to Lipinski’s rule of five (logP ≤ 5; mol wt ≤ 500; number of H-bond donors ≤ 5; number of H-bond acceptors ≤ 10). It is also possible to create a bespoke subset by searching on structure and physical properties, including net charge, calculated logP, rotatable bonds, number of H-bond donors, number of H-bond acceptors, polar desolvation, apolar desolvation, and molecular weight (http://blaster.docking.org/zinc/choose.shtml). Another resource is the similarity searching section of the Chemoinformatics site run by Andreas Bender at the University of Cambridge (http://cheminformatics.org/simsearch), which uses the MOLPRINT 2D approach. Users can compare a test library with a reference molecule, both supplied in hydrogen-depleted mol2 format. (These files can be generated using the CONVERT or OpenBabel programs referenced earlier.) Sample reference structures are available for the 5-HT3 antagonist and angiotensin-converting enzyme (ACE) inhibitor pharmacophores at http://cheminformatics.org/simsearch/testcompounds.shtml. Fingerprints are compared using the Tanimoto coefficient, Tc, defined as the number of features common to both structures (AND) divided by the number of features present in at least one of them (OR).
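The Tc definition translates directly into code. In the Python sketch below each molecule is represented as a set of feature labels; how those features are generated (by MOLPRINT 2D, a bit-string fingerprint, or otherwise) is outside its scope, and the labels and compound names in the example are hypothetical. The ranking function mimics, in miniature, comparing a test library against one reference molecule.

```python
def tanimoto(features_a: set[str], features_b: set[str]) -> float:
    """Tanimoto coefficient: shared features (AND) / features in either (OR)."""
    union = features_a | features_b
    if not union:
        return 0.0  # convention adopted here for two empty fingerprints
    return len(features_a & features_b) / len(union)


def rank_by_similarity(reference: set[str],
                       library: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank library members by Tanimoto similarity to the reference, highest first."""
    scores = [(name, tanimoto(reference, feats)) for name, feats in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical feature labels standing in for real fingerprint features
reference = {"f1", "f2", "f3", "f4"}
library = {
    "cmpd_A": {"f1", "f2", "f3", "f5"},   # 3 shared / 5 in either = 0.6
    "cmpd_B": {"f1", "f6"},               # 1 shared / 5 in either = 0.2
    "cmpd_C": {"f1", "f2", "f3", "f4"},   # identical features -> 1.0
}
for name, score in rank_by_similarity(reference, library):
    print(f"{name}: {score:.2f}")
```

Using Python sets keeps the AND/OR correspondence explicit; with fixed-length bit strings the same ratio is obtained from the bit counts of the bitwise AND and OR.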
Another place where one can submit SMILES or molfiles and have a similarity score computed is described at http://www.jchem.com/doc/user/Compr.html (library comparison based on similarity) and at http://www.jchem.com/doc/user/Jcsearch.html (all kinds of searching, including the Tanimoto-based similarity method). Both of these methods are included in the JChem software (http://www.chemaxon.com/products.html), which is free for academic use.
XII. ANALYSIS
Analytik (http://www.analytik.de/) is a very comprehensive German information site for analytical chemists. It hosts discussions of analytical problems, a small but excellent collection of links to chemical databases and literature (with an application database), and press releases from the German Chemical Society, among other material. Spectra Online (http://spectra.galactic.com/) provides thousands of IR, MS, NMR, UV/VIS, and NIR spectra that can be consulted free of charge. Information can be retrieved by compound name, CAS registry number, molecular formula, etc.; registration (free) is required. Macherey-Nagel (http://www.macherey-nagel.com) provides GC, HPLC, SFC, SPE, and TLC methods on its site, with details and literature references, classified by category of product (e.g. HPLC analysis of amides, amines, etc.). Version 1.0 of NMRShiftDB, an open-access, open-submission, open-source NMR database, has been released (http://nmrshiftdb.ice.mpg.de/); the software and database content can be downloaded via http://www.sourceforge.net/projects/nmrshiftdb. NMRShiftDB is a web database for organic structures and their NMR spectra. It supports spectrum prediction (currently only for carbon) as well as searching of spectra, structures, and other properties. The database currently contains over 20,000 structures and over 23,000 measured spectra (as well as about 500 calculated spectra). The Java applet that comes with NMRShiftDB offers an array of molecular display modes (such as ball-and-stick, wireframe, and space-fill), conversion to SMILES, and structure editing. Chemicals can be searched by chemical name, keyword, CAS number, literature title/author, and chemical formula, among other criteria.
XIII. CHEMICAL PUBLICATIONS
A. Journals
Nearly all journals have a web presence, and an increasing majority make electronic versions of their contents (including archives) available through their web sites. A convenient listing is available in the chemistry section of the WWW Virtual Library at http://www.liv.ac.uk/Chemistry/Links/journals.html. Salient journals related to medicinal chemistry include those shown in Table 13.8.
TABLE 13.8 Selected journals related to medicinal chemistry
Publisher
Journal title
URL
American Chemical Society
Bioconjugate Chemistry
http://pubs.acs.org/journals/bcches
American Chemical Society
Journal of Natural Products
http://pubs.acs.org/journals/jnprdf
American Chemical Society
Journal of Pharmaceutical Sciences
http://pubs.acs.org/journals/jpmsae
American Chemical Society
Journal of Medicinal Chemistry
http://pubs.acs.org/journals/jmcmar/
American Chemical Society
Modern Drug Discovery
http://pubs.acs.org/journals/mdd/index.html
American Chemical Society
Organic Process Research & Development
http://pubs.acs.org/journals/oprdfk
Bentham Science Publishers
Current Medicinal Chemistry
http://www.bentham.org/cmc/
Bentham Science Publishers
Current Pharmaceutical Design
http://www.bentham.org/cpd/
Bentham Science Publishers
Current Drug Discovery Technologies
http://www.bentham.org/cddt/index.htm
Current Drugs
Current Opinion in Drug Discovery and Development
http://scientific.thomson.com/products/coddd/
Elsevier
Bioorganic & Medicinal Chemistry
http://www.elsevier.com/inca/publications/store/1/2/9/
Elsevier
Bioorganic & Medicinal Chemistry Letters
http://www.elsevier.com/inca/publications/store/9/7/2/
Elsevier
Drug Discovery Today
http://www.drugdiscoverytoday.com
Elsevier
European Journal of Medicinal Chemistry
http://www.elsevier.com/wps/find/journaldescription.cws_home/505813/description
One piece of software for extracting data from the literature is the Experimental Data Checker and OSCAR toolkit, available from http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/index.asp. Experimental data on new molecules in organic and inorganic chemistry are presented in a standard form that varies little from journal to journal. Typically, the appearance of the compound is described, followed by the melting point (if applicable), Rf, infrared and NMR data, and mass spectral information. OSCAR will extract this information from either a paragraph of experimental data or a full paper, and then run checks to test the data for consistency. After the user pastes the experimental data into an HTML form, the program returns both the extracted analytical information and a critical assessment of it; it can also plot the 1H NMR spectrum from the extracted data.
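OSCAR’s own parsing is far more sophisticated, but the basic idea of pulling fields out of a standard-format experimental paragraph and sanity-checking them can be illustrated with regular expressions. The Python sketch below is a toy example only: the patterns, the function name and the 5 °C threshold for a “broad” melting range are assumptions made here, not OSCAR’s rules, and only the melting point and Rf are handled.

```python
import re

# Toy patterns (assumptions for this example, not OSCAR's grammar).
MP_PATTERN = re.compile(
    r"\bm\.?p\.?\s*:?\s*(\d+(?:\.\d+)?)\s*(?:[-\u2013]\s*(\d+(?:\.\d+)?))?\s*\u00b0?\s*C",
    re.IGNORECASE,
)
RF_PATTERN = re.compile(r"\bR\s*f\s*[=:]?\s*(\d*\.\d+)", re.IGNORECASE)


def extract_and_check(text: str, max_range_c: float = 5.0) -> dict:
    """Pull a melting point and an Rf value from an experimental paragraph
    and run simple consistency checks on them."""
    report = {"mp": None, "rf": None, "warnings": []}

    mp = MP_PATTERN.search(text)
    if mp:
        low = float(mp.group(1))
        high = float(mp.group(2)) if mp.group(2) else low
        report["mp"] = (low, high)
        if high < low:
            report["warnings"].append("melting range is reversed")
        elif high - low > max_range_c:  # threshold is an arbitrary choice
            report["warnings"].append("unusually broad melting range")

    rf = RF_PATTERN.search(text)
    if rf:
        value = float(rf.group(1))
        report["rf"] = value
        if not 0.0 <= value <= 1.0:  # an Rf is a ratio of distances
            report["warnings"].append("Rf outside 0-1")
    return report


sample = "Colourless needles; m.p. 123\u2013125 \u00b0C; Rf 0.35 (hexane/EtOAc, 3:1)."
print(extract_and_check(sample))
```

A real checker would also need to handle the solvent system for the Rf, decomposition notes, and the spectroscopic fields, which is where a dedicated tool such as OSCAR earns its keep.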
B. Open Access
Open access is a huge issue for scientific publishing. Advocates want to move from traditional subscription-based journals to a model that would make all research findings accessible to anyone with a computer. There are, however, a number of problems with the open-access approach.
1. Availability of information
A fundamental issue for open-access journals is the quality of the science. Resources for open-access publications include:
BioMed Central (http://www.biomedcentral.com), a for-profit open-access publisher with a diverse group of peer-reviewed journals.
The Directory of Open Access Journals, DOAJ (http://www.doaj.org/), a clearinghouse for free, full-text, quality-controlled scientific and scholarly journals.
Public Library of Science (PLoS) (http://www.plos.org), a non-profit organization of scientists and physicians committed to making the world’s scientific and medical literature a public resource. One of its major initiatives is PLoS Biology (http://biology.plosjournals.org/perlserv/?request=index-html), a peer-reviewed, open-access journal.
SPARC, the Scholarly Publishing and Academic Resources Coalition (http://www.arl.org/sparc/), an alliance of academic and research libraries and organizations working to correct market dysfunctions in the scholarly publishing system. SPARC is a partner of PLoS.
Various other sites have been set up to discuss the issues related to open access, such as SciDev.Net (http://www.scidev.net) and, in blog form, http://www.earlham.edu/~peters/fos/fosblog.html.
2. Attitude of funders of science
The Wellcome Trust has announced that it will require results from research it funds to be available in public repositories 6 months after publication, partly on the basis that an author-pays business model could save 30% on publishing costs alone compared with reader-pays (http://science.slashdot.org/article.pl?sid=05/03/20/2043237). Similarly, the NIH has published guidelines to ensure that publicly funded research is widely available (http://publicaccess.nih.gov/), and encourages investigators to make NIH-funded peer-reviewed manuscripts available to other researchers and the public through the NIH National Library of Medicine’s (NLM) PubMed Central (PMC) (http://www.pubmedcentral.gov/) immediately after the final date of journal publication. At present, all papers appearing in Royal Society (http://www.royalsoc.ac.uk) journals can be accessed free of charge 12 months after their publication, but the Royal Society has expressed concern that open-access advocates should not unilaterally pursue an environment inimical to for-profit scientific publishing. The Society’s publications include Biology Letters, Philosophical Transactions Parts A and B, and Proceedings of the Royal Society Parts A and B (http://www.royalsoc.ac.uk/page.asp?id=2462). The Directory of Open Access Journals (http://www.doaj.org) includes a full list; in the area of chemistry, there are currently 40 entries (http://www.doaj.org/ljbs?cpid=61). In addition to journals, a significant proportion of open-access information is available through self-archiving in institutional archives.
3. Impact factors?
Currently, despite significant growth, only about 20% of papers published annually are open access. Whether open access increases impact is still debatable (http://opcit.eprints.org/oacitation-biblio.html). Proponents argue that open access increases the number of citations; dissenters argue that this holds only for prestigious authors who publish in prestigious journals and whose articles are already highly cited.
C. Theses
Many universities are installing searchable and accessible thesis archives and, at least in theory, this is a welcome addition to the web-searchable body of scientific literature. The practical difficulty is the sheer diversity of information sources, which are not archived in a central location. Lists of these archives must be assembled by hand; not only are such lists vast, but they are also constantly changing. A partial solution to this problem is offered by the Networked Digital Library of Theses and Dissertations (NDLTD) (http://www.ndltd.org). Since its inception in 1996, over a hundred universities have joined the initiative, each of which has a process in place for archiving and distributing electronic theses and dissertations (ETDs). The Union Catalog Project (http://rocky.dlib.vt.edu/%7Eetdunion/cgi-bin/OCLCUnion/UI/index.pl) is an attempt to make these individual collections appear as one seamless digital library of ETDs to students and researchers seeking out theses and dissertations. ETDs are owned and maintained by the institutions at which they were produced or archived, while all the metadata (title, author, etc.) have been gathered into a central search engine. MIT (Massachusetts Institute of Technology), one of the foremost institutions in this effort, offers the full text of selected master’s and doctoral theses from all MIT departments through its Libraries’ Document Services department (http://dspace.mit.edu/handle/1721.1/7582). These include theses that have been previously requested and scanned by Document Services, as well as theses from the university’s pilot project in electronically submitted theses. Users can search the database by keyword, perform an advanced search with separate fields, or browse by author or year. All theses can be viewed as low-resolution (100 dpi) greyscale inline .gif images. The theses of some of MIT’s Nobel Prize-winning alumni are available at http://libraries.mit.edu/docs/nobeltheses.html.
XIV. PATENT INFORMATION
The major world patent databases are online and searchable, and there is a plethora of tools available to the desk-based scientific researcher. Esp@cenet, from the European Patent Office (EPO) (http://www.european-patent-office.org/index.htm), allows free online searching of over 30 million patent documents (from EPO member states and worldwide) by keyword, patent number, institution name, etc. The US Patent and Trademark Office (USPTO) web patent database (http://www.uspto.gov/patft/) provides access to both the US Patent Bibliographic Database, which includes bibliographic data from 1976 to the present, and the AIDS Patent Database, which includes the full text and images of AIDS-related patents issued by the US, European, and Japanese patent offices. There is a patent number search page as well as Boolean and Advanced search pages for text field searching. Both cited and citing patents are hyperlinked to each patent. Classification numbers are hyperlinked to their definitions, and there are good help pages for each search type. However, recent information suggests that, surprisingly, not all of these databases are complete. According to Univentio, a Dutch patent information company (http://www.univentio.com/), Espacenet (http://es.espacenet.com) is missing hundreds of thousands of patent documents from various European countries, including the UK, France, and Germany. The gaps may have a variety of causes, such as errors in digitization or the unavailability of the original paper copies. The loss of paper archives is a growing problem, as lack of shelf space compromises libraries’ ability to house complete collections. This in turn represents a significant problem for electronic patent office searches, which are the default means by which patent examiners identify prior art for a new application.
Some national offices, such as the UK Patent Office (http://www.patent.gov.uk/), plan to extend their electronic archives with scanned images of British patents; these images will need to be processed by optical character recognition (OCR) to make the text within them searchable, since without such processing their content cannot be found by text searches. Although the USPTO (http://www.uspto.gov) provides images of the actual hard copy, the user currently has to combine individually downloaded .tiff or .pdf files to generate a single-file document. This tedious process has been obviated by commercial patent engines such as MicroPatent (http://www.micropatent.com) and Delphion (http://www.delphion.com), and there are now free alternatives as well. FreePatentsOnline (http://www.freepatentsonline.com) is, as the name suggests, a freely accessible database that contains all patents published by the USPTO since number 4,000,000. It is automatically updated weekly, is searchable, and can retrieve images of the results from the patent text pages. The search methods are similar to those available at the USPTO site. Search terms can be entered in certain fields, such as Title, Abstract, and Assignee (Owner), to locate patents or published patent applications containing the entered terms in the specified sections. Search strings can also be connected with the Boolean operators AND, OR, and ANDNOT, and parentheses can be used to group the connected terms. The ends of search terms can be truncated with the “wildcard” symbol $ (note that a term may not be truncated to fewer than four characters). The most complete search method is based on searching the “specification” field. To identify US classes for particular fields of technology, users can consult the Manual of US Patent Classification at http://www.uspto.gov/web/classification/.
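The Boolean and truncation rules just described can be captured in a small helper that assembles a query string before it is pasted into a search form. This Python sketch assumes single-word terms, right truncation with $, the four-character minimum stated above, and that the site accepts grouped expressions of the form shown; the function names are hypothetical, and the site’s own help pages remain the authority on syntax.

```python
from typing import Iterable, Sequence

MIN_STEM = 4  # a truncated term may not be shorter than four characters


def check_term(term: str) -> str:
    """Validate a single-word search term, allowing only right truncation with $."""
    if not term or " " in term:
        raise ValueError(f"expected a single word, got {term!r}")
    stem = term[:-1] if term.endswith("$") else term
    if "$" in stem:
        raise ValueError(f"only right truncation is supported here: {term!r}")
    if term.endswith("$") and len(stem) < MIN_STEM:
        raise ValueError(f"{term!r}: stem shorter than {MIN_STEM} characters")
    return term


def group(terms: Iterable[str], operator: str) -> str:
    """Join validated terms with one Boolean operator and wrap them in parentheses."""
    return "(" + f" {operator} ".join(check_term(t) for t in terms) + ")"


def build_query(all_of: Sequence[str], any_of: Sequence[str] = (),
                none_of: Sequence[str] = ()) -> str:
    """Combine required, alternative and excluded terms with AND, OR and ANDNOT."""
    if not all_of:
        raise ValueError("at least one required term is needed")
    query = group(all_of, "AND")
    if any_of:
        query += " AND " + group(any_of, "OR")
    if none_of:
        query += " ANDNOT " + group(none_of, "OR")
    return query


if __name__ == "__main__":
    print(build_query(["salicyl$", "ester"], any_of=["tablet", "capsule"],
                      none_of=["veterinary"]))
```

With these inputs the sketch prints (salicyl$ AND ester) AND (tablet OR capsule) ANDNOT (veterinary), whereas a term such as asp$ is rejected because its stem has only three characters. Validating terms locally simply catches malformed queries before they reach the search form.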
Using the concept of the wiki, the WikiPatents Community contributes to the US patent system by reviewing issued patents and, soon, pending patent applications (http://www.wikipatents.com). The public can add prior art references for a given patent, vote on the relevance of both original and user-added references, and comment on how the prior art relates to a patent. There are also areas for discussing prior art searching, patent litigation, law changes, and reviews of particular patents. Prior art analysis posted on this site raises an issue for US patent prosecution. Under 37 C.F.R. 1.56, those substantively involved in the preparation and prosecution of a patent application have a duty of candor toward the US Patent and Trademark Office (USPTO; http://www.uspto.gov), which requires them to submit relevant references and other information to the patent examiner during prosecution. Information posted on WikiPatents.com is potentially part of that requirement. The review sections also include assessments of potential commercial applicability and even of the value of the application. This is extended into a section where patents can be offered for licensing or sale (http://www.wikipatents.com/marketplace.php). This section is presently in an inchoate state, with featured examples mostly of designs rather than chemical or biological patents, but the pace of change in community web sites (such as Wikipedia) can be alarmingly rapid. A repository of general-interest patent literature and intellectual property-related news and decisions is available through IP News Flash at http://www.ipnewsflash.com. It is updated hourly with information on patents and other intellectual property-related matters, and it also offers an e-mail news feed delivered daily or monthly. Various other sites provide general information on patents, such as the USPTO’s pages on patents and patenting procedures (http://www.uspto.gov/web/offices/pac/doc/general/). Further useful information about patent terms and procedures in other countries is available from Derwent (http://www.derwent.com/). A comparison table of web patent databases, presented by the Duke University library, helps users compare the various resources available and assess which is best for their needs (http://www.lib.duke.edu/chem/patcomp.htm and http://www.lib.duke.edu/reference/subjects/patents.htm).
A. Japanese patents
The Japan Patent Office (JPO) web site (http://www.jpo.go.jp/) now provides certain information free of charge in English. It provides more information in Japanese, including free legal status information from the JPO’s Industrial Property Digital Library (IPDL) pages. There are five methods of searching the IPDL patent database (http://www.ipdl.inpit.go.jp/homepg_e.ipdl). The form for retrieving patent images by patent number is somewhat difficult to navigate (http://www4.ipdl.inpit.go.jp/Tokujitu/tjsogodben.ipdl?N0000=115), but is backed up by a useful help area at http://www.ipdl.inpit.go.jp/HELP/tokujitu/db_en/help_index.html. Further resources are available for English translations of Japanese patent documents. Paterra, Inc. (http://www.paterra.com) offers the InstantMT™ service for Japanese patents on the internet. The InstantMT™ service retrieves the requested patent by number and rapidly provides a translated version, rendered for download as a two-column formatted Acrobat PDF file. The system covers all Japanese Kokai (A documents) published after January 1, 1993 and all granted Japanese patents (Toroku) published since May 27, 1996. New documents are entered into the system within 2 weeks of being published in Japan. In a related development, Protys (http://www.protys.info/) provides a full-text English database of the latest Japanese patents as a specialty current-awareness service. Paterra has prepared a guide (http://cxp.paterra.com/FTerms/Guide.htm) comparing the Japanese F-term system with the International Patent Classification (IPC) (http://www.wipo.int/classifications/fulltext/new_ipc/) and the File Index (FI) classification system. The guide is a browsable front-end to the concordance provided by the Japan Patent Office. Users can scroll through IPC/FI classes and view precisely how they are mapped onto the F-term themes. Each entry is linked to the corresponding F-term table on the JPO site, so the user can move directly from an IPC/FI classification to the corresponding F-term table.
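The coverage dates quoted for InstantMT™ translate into a simple check of whether a given document should already be in the system. The Python sketch below encodes only the rules stated above (Kokai published after January 1, 1993; Toroku published since May 27, 1996; loading within 2 weeks); the function name and the "kokai"/"toroku" labels are hypothetical, and the service’s actual coverage may since have changed.

```python
from datetime import date, timedelta

KOKAI_START = date(1993, 1, 1)    # Kokai (A documents) published after this date
TOROKU_START = date(1996, 5, 27)  # granted patents (Toroku) published on or after this date
LOAD_LAG = timedelta(weeks=2)     # new documents are entered within 2 weeks


def expected_in_instantmt(kind: str, published: date, today: date) -> bool:
    """Return True if the stated coverage rules say the document should be available.

    Documents younger than the 2-week loading window return False: they may
    already be present, but the stated rules do not guarantee it.
    """
    if kind == "kokai":
        in_range = published > KOKAI_START
    elif kind == "toroku":
        in_range = published >= TOROKU_START
    else:
        raise ValueError(f"unknown document kind: {kind!r}")
    return in_range and today - published >= LOAD_LAG


if __name__ == "__main__":
    today = date(2008, 1, 15)
    print(expected_in_instantmt("kokai", date(1992, 6, 30), today))   # before coverage starts
    print(expected_in_instantmt("toroku", date(1996, 5, 27), today))  # first covered day
    print(expected_in_instantmt("kokai", date(2008, 1, 10), today))   # still inside loading window
```

Because the 2-week figure is an upper bound on loading time, the function errs on the side of not promising coverage: a recent document for which it returns False may nonetheless already be retrievable.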
XV. TOXICOLOGY
A good introduction to the subject, of particular interest to students, is the “Toxicology Tutor” at http://sis.nlm.nih.gov/toxtutor.htm. A number of toxicology databases are available on the internet, and recently the best of them have been amalgamated in the form of TOXNET (http://toxnet.nlm.nih.gov/), a cluster of databases on toxicology, hazardous chemicals, and related areas. The web site provides access to an impressive array of files containing factual information related to the toxicity and other hazards of chemicals. Users can readily extract toxicology data and literature references, as well as toxic release information, on particular chemicals. Alternatively, one can search with subject terms to identify chemicals that cause certain effects. A variety of display and sorting options are available. A summary of further resources in this area (including subsets of the TOXNET database array) is provided in Table 13.9. PubMed now links to chemicals found in TOXNET’s HSDB (http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?HSDB) through a LinkOut feature, which appears when a user clicks the “Links” part of any PubMed reference, on the far right-hand side of the screen. The links now appear as specific chemical names. LinkOut provides PubMed users with connections to full-text articles, consumer health information, and supplementary data related to a PubMed citation. Another useful resource related to toxicology is a sample set of intravenous LD50 data in mice for a short list of compounds at http://toxicity.molecularsociety.org/LD50.htm.
The source of these data is the Molecular Society, an “international forum for the collection and exchange of information for [those] who are actively involved in the multi-disciplinary field of Molecular Sciences.” The National Center for Manufacturing Sciences (NCMS) offers a free and fairly extensive database of solvents (http://solvdb.ncms.org/solvdb.htm) that can be searched by physical property ranges and by various user-specified limits, such as “not a carcinogen” or “not listed on the Montreal protocol (ozone)”. Fee-based resources include the updated US EPA Toxic Substances Control Act (TSCA) Chemical Inventory of 62,000 chemicals, which is available on CD-ROM cross-referenced with Superfund Amendments and Reauthorization Act (SARA) Title III RCRA reporting requirements. It integrates SARA III fields with TSCA information and uses Adobe™ Acrobat™ (PDF) format for instant search/retrieval. For details see http://www.env-sol.com/solutions/TSCASARA.HTML. Finally, for prediction of toxicological parameters, the OSIRIS Property Explorer available at http://www.chemexper.com/tools/propertyExplorer/main.html (and listed above in the section “Prediction and calculation of molecular properties”) has some interesting capabilities.

TABLE 13.9
Type of database | Database name | URL | Description
Toxicology Data Files | HSDB (Hazardous Substances Data Bank) | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?HSDB | Factual data bank of over 4,500 potentially hazardous chemicals. In addition to toxicity data, the file covers emergency handling procedures, environmental fate, human exposure, detection methods, and regulatory requirements. The data are fully referenced and peer-reviewed by expert toxicologists and other scientists.
Toxicology Data Files | International Toxicity Estimates for Risk (ITER) Database | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?iter | A new database within the TOXNET site that contains human health risk values from major organizations worldwide for over 600 chemicals of environmental concern. It is a product of Toxicology Excellence for Risk Assessment (TERA), a non-profit group whose mission is to protect public health by developing and communicating risk assessment values, improving risk methods through research, and educating the public on risk assessment issues.
Toxicology Data Files | IRIS (Integrated Risk Information System) | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?IRIS.htm | Online database of the Environmental Protection Agency (EPA; http://www.epa.gov) containing carcinogenic and non-carcinogenic health risk information on over 500 chemicals. Data have been scientifically reviewed and represent EPA consensus.
Toxicology Data Files | CCRIS (Chemical Carcinogenesis Research Information System) | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CCRIS | Sponsored by the National Cancer Institute (NCI; http://www.nci.nih.gov/), CCRIS contains scientifically evaluated data derived from carcinogenicity, mutagenicity, tumor promotion, and tumor inhibition tests on about 8,000 chemicals.
Toxicology Data Files | GENE-TOX (Genetic Toxicology) | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?GENETOX | Another EPA database, containing genetic toxicology test results on over 3,000 chemicals. The data have been selectively reviewed for each of the test systems under evaluation, and the GENE-TOX data bank is the product of these review activities.
Toxicology Data Files | Columbia Environmental Research Center (CERC) | http://www.cerc.usgs.gov/data/acute/acute.html | Acute toxicity data for over 400 chemicals tested on 60 aquatic animal species. The results come from aquatic acute toxicity tests conducted by the USGS CERC. The acute toxicity test provides a relative starting point for hazard assessment of contaminants and is required for federal chemical registration programs for fungicides, rodenticides, and pesticides. Data are organized and searchable by combinations of compound and species data (e.g. LC50 data for various chemicals and exposure times).
Toxicology Literature Files | TOXLINE | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?TOXLINE | Bibliographic database covering the biochemical, pharmacological, physiological, and toxicological effects of drugs and other chemicals; over 2.5 million citations, almost all with abstracts and/or index terms and CAS registry numbers.
Toxicology Literature Files | DART/ETIC (Developmental and Reproductive Toxicology/Environmental Teratology Information Center) | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?DARTETIC.htm | Bibliographic database covering literature on teratology and other aspects of developmental toxicology; contains over 90,000 references to teratology literature published since 1965.
Toxicology Literature Files | EMIC (Environmental Mutagen Information Center) | http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?EMIC | Bibliographic database containing over 100,000 references on chemical, biological, and physical agents that have been tested for genotoxicity; covers literature published since 1965.
Toxic Release Files (TRI) | TRI (Toxic Chemical Release Inventory Files) | http://www.epa.gov/tri/ | Contains information on the annual estimated releases of toxic chemicals to the environment for 1995–1997. It is based on data submitted to the EPA from industrial facilities throughout the US and covers releases of over 650 chemicals and chemical categories. Pollution prevention data are also reported.
Carcinogenicity | Carcinogenic Potency Project | http://potency.berkeley.edu/listofpubs.topic.html | A useful resource related to carcinogenicity, including a wide array of publications from the Carcinogenic Potency Project. The references include papers on methodological analysis of the relevance of carcinogenicity prediction from bioassays, species comparisons, target organs, mechanisms of carcinogenesis, risk assessment techniques, possible cancer hazards of natural and synthetic chemicals, and causes and prevention of cancer.
XVI. METASITES AND TECHNOLOGY SERVICE PROVIDER DATABASES
Metasites providing access to a range of chemistry resources not already referred to are listed in Table 13.10.
TABLE 13.10
Title | URL | Comments
Chemistry Section of the WWW Virtual Library | http://www.liv.ac.uk/Chemistry/Links/links.html | Thorough, up-to-date, and accurate listings of a large number of chemistry sites. The chemical database section at http://www.liv.ac.uk/Chemistry/Links/refdatabases.html gives details of a collection of about 80 chemical databases (among them Analytical Abstracts, Beilstein, CCDC, CA Selects Plus, ChemFinder, DrugDB, FT-IR Library, STN, etc.).
Organic Chemistry Resources Worldwide | http://www.organicworldwide.net | Created by Koen Van Aken, a Belgian chemist; a well-organized and highly useful site for anyone engaged in synthetic organic chemistry research.
Caltech | http://library.caltech.edu/collections/chemistry.htm | A practically indispensable port of call for chemistry databases and search engines.
Liège, Belgium | http://www.ulg.ac.be/libnet/ud18.htm | An initiative from Simone Jérôme, chemistry librarian at the University of Liège, Belgium.
The Chemical Database Service (CDS) | http://cds.dl.ac.uk/ | The CDS provides online access to numerous chemical databases, which are available free of charge to academics at UK universities. The chemistry links cover a large variety of topics (among them general information sites, reference databases, chemical sources, chemical web sites, UK universities, and chemistry FTP sites). The solid-phase synthesis database is notable for describing over 27,000 reactions.
University of Cincinnati Online Database Collection | http://www.engrlib.uc.edu/selfhelp/alphlist.htm | This important site lists links to engineering, biology, chemistry, and other databases.
Chemiedatenbanken | http://www.chemie-datenbanken.de/ | An excellent collection of German and international chemical databases, including free resources, general collections, and commercial database providers.
CHEMINFO | http://www.indiana.edu/~cheminfo | Metasite for chemical information resources on the internet and elsewhere, originating from the Indiana University chemical information courses. Usage increased from nearly 100,000 successful requests in 2000 to over a quarter of a million in 2006. The main information page currently available at Indiana is Selected Internet Resources for Chemistry (SIRCh), which includes about 31 chemistry resource guides available on the internet (http://www.indiana.edu/~cheminfo/ca_gcisd.html). The site also includes a link to the Chemical Acronyms database (http://www.oscar.chem.indiana.edu/cfdocs/libchem/acronyms/acronymsearch.html), which currently contains over 11,000 acronyms linked to their full forms.