Open Access Databases and Datasets for Drug Discovery

Methods and Principles in Medicinal Chemistry, 1st Edition, M. T. Przewosny


Methods and Principles in Medicinal Chemistry

Edited by R. Mannhold, H. Buschmann, J. Holenz

Editorial Board

G. Folkers, H. Timmermann, H. van de Waterbeemd, J. Bondo Hansen

Previous Volumes of the Series

Bachhav, Y. (Ed.)
Targeted Drug Delivery
2022
ISBN: 978-3-527-34781-0
Vol. 82

Alza, E. (Ed.)
Flow and Microreactor Technology in Medicinal Chemistry
2022
ISBN: 978-3-527-34689-9
Vol. 81

Rübsamen-Schaeff, H., and Buschmann, H. (Eds.)
New Drug Development for Known and Emerging Viruses
2022
ISBN: 978-3-527-34337-9
Vol. 80

Gruss, M. (Ed.)
Solid State Development and Processing of Pharmaceutical Molecules
Salts, Cocrystals, and Polymorphism
2021
ISBN: 978-3-527-34635-6
Vol. 79

Plowright, A. T. (Ed.)
Target Discovery and Validation
Methods and Strategies for Drug Discovery
2020
ISBN: 978-3-527-34529-8
Vol. 78

Swinney, D., Pollastri, M. (Eds.)
Neglected Tropical Diseases
Drug Discovery and Development
2019
ISBN: 978-3-527-34304-1
Vol. 77

Bachhav, Y. (Ed.)
Innovative Dosage Forms
Design and Development at Early Stage
2019
ISBN: 978-3-527-34396-6
Vol. 76

Gervasio, F. L., Spiwok, V. (Eds.)
Biomolecular Simulations in Structure-based Drug Discovery
2018
ISBN: 978-3-527-34265-5
Vol. 75

Sippl, W., Jung, M. (Eds.)
Epigenetic Drug Discovery
2018
ISBN: 978-3-527-34314-0
Vol. 74

Giordanetto, F. (Ed.)
Early Drug Development
2018
ISBN: 978-3-527-34149-8
Vol. 73

Volume Editors

Antoine Daina
SIB Swiss Institute of Bioinformatics
1015 Lausanne
Switzerland

Michael Przewosny
Borngasse 43
52064 Aachen
Germany

Vincent Zoete
SIB Swiss Institute of Bioinformatics
UNIL University of Lausanne and Ludwig Institute for Cancer Research
1015 Lausanne
Switzerland

Series Editors

Prof. Dr. Raimund Mannhold †
Rosenweg 7
40489 Düsseldorf
Germany

Dr. Helmut Buschmann
Sperberweg 15
52076 Aachen
Germany

Dr. Jörg Holenz
BIAL - Portela & Ca., S.A.
Av. Siderurgia Nacional
4745-457 Coronado
Portugal

Cover Design and Images: SCHULZ Grafik-Design

All books published by WILEY-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d-nb.de>.

© 2024 WILEY-VCH GmbH, Boschstraße 12, 69469 Weinheim, Germany

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Print ISBN: 978-3-527-34839-8
ePDF ISBN: 978-3-527-83047-3
ePub ISBN: 978-3-527-83048-0
oBook ISBN: 978-3-527-83049-7

Typesetting: Straive, Chennai, India

Contents

Series Editors Preface
Raimund Mannhold – A Personal Obituary from the Series Editors
A Personal Foreword

1 Open Access Databases and Datasets for Computer-Aided Drug Design. A Short List Used in the Molecular Modelling Group of the SIB
Antoine Daina, María José Ojeda-Montes, Maiia E. Bragina, Alessandro Cuozzo, Ute F. Röhrig, Marta A. S. Perez, and Vincent Zoete
References

Part I Small Molecules

2 PubChem: A Large-Scale Public Chemical Database for Drug Discovery
Sunghwan Kim and Evan E. Bolton
2.1 Introduction
2.2 Data Content and Organization
2.3 Tools and Services
2.3.1 PubChem Search
2.3.2 Summary Pages
2.3.3 Literature Knowledge Panel
2.3.4 2D and 3D Neighbors
2.3.5 Classification Browser
2.3.6 Identifier Exchange Service
2.3.7 Programmatic Access
2.3.8 PubChem FTP Site and PubChemRDF
2.4 Drug- and Lead-Likeness of PubChem Compounds
2.5 Bioactivity Data in PubChem
2.6 Comparison with Other Databases
2.7 Use of PubChem Data for Drug Discovery
2.8 Summary
Acknowledgments
References

3 DrugBank Online: A How-to Guide
Christen M. Klinger, Jordan Cox, Denise So, Teira Stauth, Michael Wilson, Alex Wilson, and Craig Knox
3.1 Introduction
3.2 DrugBank
3.2.1 Overview of DrugBank
3.2.2 DrugBank Datasets
3.2.2.1 Drug Cards: An Overview and Navigation Guide
3.2.2.2 Identification
3.2.2.3 Pharmacology
3.2.2.4 Categories
3.2.2.5 Properties
3.2.2.6 Targets, Enzymes, Carriers, and Transporters
3.2.2.7 References
3.3 Protocols
3.3.1 General Workflows
3.3.1.1 Using DrugBank Online’s Search Functionality
3.3.1.2 Using DrugBank Online’s Advanced Search Functionality
3.3.1.3 Browsing Drugs Using DrugBank Online’s Drug Categories
3.3.2 Identifying Chemicals and Relevant Sequences
3.3.2.1 Searching Using Chemical Structure Search
3.3.2.2 Using Sequence Search to Find Similar Targets
3.3.3 Extracting DrugBank Datasets for ML
3.4 Research Using DrugBank
3.5 Discussion and Conclusions
References

4 Bioisosteric Replacement for Drug Discovery Supported by the SwissBioisostere Database
Antoine Daina, Alessandro Cuozzo, Marta A. S. Perez, and Vincent Zoete
4.1 Introduction
4.1.1 Concept of Isosterism and Bioisosterism
4.1.2 Classical vs. Non-classical Bioisostere and Further Molecular Replacements
4.1.3 Bioisosteric Replacement in Drug Discovery
4.2 Construction and Dissemination of SwissBioisostere
4.2.1 Intention and Requirements
4.2.2 Bioactivity Data
4.2.3 Nonsupervised Matched Molecular Pair Analysis
4.2.4 Database
4.2.5 Web Interface
4.3 Content of SwissBioisostere
4.3.1 Global Content
4.3.2 Biological and Chemical Contexts
4.3.3 Fragment Shape Diversity
4.4 Usage of SwissBioisostere
4.4.1 Website Usage
4.4.2 Most Frequent Requests
4.4.3 Examples Related to Drug Discovery
4.4.3.1 Use Cases
4.4.3.2 Replacing Unwanted Chemical Groups
4.4.3.3 Optimization of Passive Absorption and Blood–Brain Barrier Diffusion
4.4.3.4 Reduction of Flexibility
4.4.3.5 Reduction of Aromaticity/Escape from Flatland
4.5 Conclusive Remarks
Acknowledgment
References

Part II Macromolecular Targets and Diseases

5 The Protein Data Bank (PDB) and Macromolecular Structure Data Supporting Computer-Aided Drug Design
David Armstrong, John Berrisford, Preeti Choudhary, Lukas Pravda, James Tolchard, Mihaly Varadi, and Sameer Velankar
5.1 Introduction
5.2 Small Molecule Data in Protein Data Bank (PDB) Entries
5.2.1 What Data are in the PDB Archive?
5.2.2 Definition of Small Molecules in OneDep
5.3 Small Molecule Dictionaries
5.3.1 wwPDB Chemical Component Dictionary (CCD)
5.3.2 The Peptide Reference Dictionary
5.4 Additional Ligand Annotations in the PDB Archive
5.4.1 Linkage Information
5.4.2 Carbohydrates
5.5 Validation of Ligands in the Worldwide Protein Data Bank (wwPDB)
5.5.1 Various Criteria and Software Used for Validating Ligand in Validation Reports
5.5.2 Identification of Ligand of Interest (LOI)
5.5.3 Geometric and Conformational Validation
5.5.4 Ligand Fit to Experimental Electron Density Validation
5.5.5 Accessing wwPDB Validation Reports from PDBe Entry Pages
5.5.6 Other Planned Improvements to Enhance Ligand Validation
5.6 PDBe Tools for Ligand Analysis
5.6.1 Ligand Interactions
5.6.1.1 Classifying Ligand Interactions
5.6.1.2 Data Availability
5.6.2 Ligand Environment Component
5.6.3 Chemistry Process and FTP
5.6.4 PDBeChem Pages
5.7 Ligand-Related Annotations in the PDBe-KB
5.7.1 Introduction to PDBe-KB
5.7.2 Data Access Mechanisms for Ligand-Related Annotations
5.7.3 Ligand-Related Annotations on the Aggregated Views of Proteins
5.8 Case Study: Using PDB Data to Support Drug Discovery
5.9 Conclusions and Outlook
5.9.1 Upcoming Features and Improvements
References

6 The SWISS-MODEL Repository of 3D Protein Structures and Models
Xavier Robin, Andrew Mark Waterhouse, Stefan Bienert, Gabriel Studer, Leila T. Alexander, Gerardo Tauriello, Torsten Schwede, and Joana Pereira
6.1 Introduction
6.2 SMR Database Content and Model Providers
6.2.1 PDB
6.2.2 SWISS-MODEL
6.2.3 AlphaFold Database
6.2.4 ModelArchive
6.3 Protein Feature Annotation and Cross-References to Computational Resources
6.3.1 Structural Features, Ligands, and Oligomers
6.3.2 SWISS-MODEL associated tools
6.3.3 Web and API Access
6.4 Quality Estimates and Benchmarking
6.5 Binding Site Conformational States
6.6 SMR and Computer-Aided Structure-based Drug Design
6.7 Conclusion and Outlook
References

7 PDB-REDO in Computational-Aided Drug Design (CADD)
Ida de Vries, Anastassis Perrakis, and Robbie P. Joosten
7.1 History and Concepts
7.1.1 X-ray Structure Models
7.1.2 PDB-REDO Development
7.1.2.1 First Uniformity
7.1.2.2 Automatic Rebuilding of Protein Backbone and Side Chains
7.1.2.3 Automated Model Completion Approaches
7.1.2.4 Systematic Integration of Structural Knowledge
7.1.2.5 Overview of PDB-REDO Pipeline
7.2 Structure Improvements by PDB-REDO
7.2.1 Parametrization and Rebuilding Effects on Small Molecule Ligands
7.2.1.1 Re-refinement Improves Ligand Conformation
7.2.1.2 Side Chain Rebuilding Improves Ligand Binding Sites
7.2.1.3 Histidine Flip and Improved Ligand Parameterization
7.2.2 Building of Protein Loops and Ligands into Protein Structure Models
7.2.2.1 Loop Building Completes a Binding Site Region
7.2.2.2 Loop Building Results in Improved Binding Sites
7.2.2.3 Building new Compounds into Density
7.2.3 Nucleic Acid Improvements by PDB-REDO
7.2.4 Glycoprotein Structure Model Rebuilding
7.2.5 Metal Binding Sites
7.2.6 Limitations of the PDB-REDO Databank
7.3 Access the PDB-REDO Databank and Metadata
7.3.1 Downloading and Inspecting Individual PDB-REDO Entries
7.3.2 Data Available in PDB-REDO Entries
7.3.3 Usage of the Uniform and FAIR Validation Data
7.3.4 Creating Datasets from the PDB-REDO Databank
7.3.5 Submitting Structure Models to the PDB-REDO Pipeline
7.4 Conclusions
Acknowledgments and Funding
List of Abbreviations and Symbols
References

8 Pharos and TCRD: Informatics Tools for Illuminating Dark Targets
Keith J. Kelleher, Timothy K. Sheils, Stephen L. Mathias, Dac-Trung Nguyen, Vishal Siramshetty, Ajay Pillai, Jeremy J. Yang, Cristian G. Bologa, Jeremy S. Edwards, Tudor I. Oprea, and Ewy Mathé
8.1 Introduction
8.2 Methods
8.2.1 Data Organization
8.2.1.1 Target Alignment
8.2.1.2 Disease Alignment
8.2.1.3 Ligand Alignment
8.2.1.4 Data and UI Updates
8.2.2 Programmatic Access and Data Download
8.2.3 UI Organization
8.2.3.1 List Pages
8.2.3.2 Details Pages
8.2.3.3 Search
8.2.3.4 Tutorials
8.2.4 Analysis Methods Within Pharos
8.2.4.1 Searching for Ligands
8.2.4.2 Finding Targets by Amino Acid Sequence
8.2.4.3 Finding Targets with Similar Annotations
8.2.4.4 Finding Targets with Predicted Activity
8.2.4.5 Enrichment Scores for Filter Values
8.3 Use Cases
8.3.1 Hypothesizing the Role of a Dark Target
8.3.1.1 Primary Documentation
8.3.1.2 List Analysis
8.3.1.3 Downloading Data
8.3.1.4 Variations on this Use Case
8.3.2 Characterizing a Novel Chemical Compound
8.3.2.1 Finding Predicted Targets
8.3.2.2 Analyzing Similar Ligands
8.3.2.3 Ligand Details Pages
8.3.2.4 Variations on this Use Case
8.3.3 Investigating Diseases
8.4 Discussion
Funding
References

Part III Users’ Points of View

9 Mining for Bioactive Molecules in Open Databases
Guillem Macip, Júlia Mestres-Truyol, Pol Garcia-Segura, Bryan Saldivar-Espinoza, Santiago Garcia-Vallvé, and Gerard Pujadas
9.1 Introduction
9.2 Main Tools for Virtual Screening
9.2.1 ADMET and PAINS Filtering
9.2.2 Protein–Ligand Docking
9.2.3 Pharmacophore Search
9.2.4 Shape/Electrostatic Similarity
9.2.5 Protein-Structure Databases
9.2.6 The Protein Data Bank
9.2.7 The PDB-REDO Databank
9.2.8 The SWISS-MODEL Repository
9.2.9 The AlphaFold Protein Structure Database
9.3 Validating Binding Site and Ligand Coordinates in Three-Dimensional Protein Complexes
9.4 Databases for Searching New Drugs
9.4.1 COCONUT
9.4.2 GDBs
9.4.3 ZINC20
9.5 Databases of Bioactive Molecules
9.5.1 The BindingDB Database
9.5.2 PubChem
9.5.3 ChEMBL
9.6 Databases of Inactive/Decoy Molecules
9.6.1 Collecting Experimentally Inactive Compounds from PubChem
9.6.2 Collecting Presumed Inactive Compounds from Decoy Databases
9.6.3 Building Custom-Based Decoy Sets
9.7 Main Metrics for Evaluating the Success of a Virtual Screening
9.8 Concluding Remarks
References

10 Open Access Databases – An Industrial View
Michael Przewosny
10.1 Academic vs. Industrial Research
10.2 Scaffold-Hopping
10.3 Virtual-Screening
Abbreviations
References

Index

Series Editors Preface

The work of natural scientists in all scientific disciplines has changed a lot in the past decade. Access to information and data in scientific databases has become essential for effective and efficient work. In addition to the commercial databases from professional providers, open access databases from associations and institutes have also become increasingly popular among medicinal chemists in academia and the pharmaceutical industry.

The latest volume of our book series, entitled “Open Access Databases and Datasets for Drug Discovery,” provides an exemplary overview of some of the most important databases and applications. It should be of great help to the medicinal chemistry community, both as an information source and as motivation to explore the growing field of open access databases and useful datasets. The book will surely support all types of scientists working in drug discovery and medicinal chemistry who need information from databases to support their work.

It all started in the late 2010s, when Raimund Mannhold suggested this topic in our annual editor meetings as a long-cherished heart’s desire. In 2019 he succeeded in convincing Antoine Daina and Vincent Zoete, who are well-known scientists in this field, to edit such a book.

After industrial practice as a computational chemist in agrochemical research and academic experience as a lecturer and researcher in drug discovery, Antoine joined the SIB Swiss Institute of Bioinformatics in 2012. He is now a senior scientist in the Molecular Modeling Group, in charge of methodological developments in the SwissDrugDesign program, of supporting drug discovery projects, and of teaching computer-aided drug design.

Vincent joined the SIB Swiss Institute of Bioinformatics in 2004. He was the associate group leader of the SIB Molecular Modeling Group until 2017 and has been its group leader since then. In addition, Vincent has been Associate Professor in molecular modeling at the University of Lausanne since 2022 and is coordinator/developer of SwissDock.ch, SwissParam.ch, SwissBioisostere.ch, SwissTargetPrediction.ch, SwissSimilarity.ch, and SwissADME.ch.

The Swiss Institute of Bioinformatics hosts the Click2Drug web page, which provides the most comprehensive collection of worldwide available databases and application tools in the field of drug discovery.

At the same time, Helmut Buschmann remembered his old colleague Michael Przewosny from our time together at Grünenthal GmbH in Aachen. Michael has over 20 years of experience in pharmaceutical research and drug discovery. He held several positions as laboratory manager in medicinal chemistry and process development. Michael created a competitive intelligence department at Grünenthal in Aachen, where he was responsible for providing such databases and application tools as a service for the entire research organization. It took some time to convince Michael to take on such a book assignment, but finally he was motivated to join Antoine and Vincent.

Together they brought many years of experience in the development of such databases, reinforced by many years of experience in using such databases in the field of drug discovery.

Antoine, Vincent, and Michael started jointly in late 2019 with a collection of ideas and agreed, after long discussions, on a useful structure for such a broad and rapidly developing research area.

After a successful start, a long, rocky, and chaotic road lay ahead of them, accompanied by the Covid pandemic. There were many disappointments, but they never gave up. They worked very hard and always succeeded in finding a way forward. Then another major setback followed. Raimund died unexpectedly after a short illness on October 14, 2022, and was not able to see the successful completion. We, the series editors and the publisher, are all the more pleased that the editors have dedicated this volume to his memory. Raimund accompanied the book series from the first volume, published as early as 1993, until his death and was able to enjoy the publication of volume 81, “Flow and Microreactor Technology in Medicinal Chemistry,” edited by Esther Alza, in June 2022, shortly before he passed away.

The editors managed to edit a book with the support of the best authors in the field to provide the interested reader with a detailed overview of open-access databases and datasets for drugs from the early to the late phases of the lengthy drug discovery process. In such a rapidly growing research field, the picture of open databases and datasets always remains incomplete.

It is all the more important that the editors managed to edit a volume that depicts a wide variety of resources, from the most generalist to the most specialized ones. Such a volume can never be a complete and encyclopedic collection of all existing databases; it acts much more as guidance and motivation to deal with such databases and the resulting possibilities. In the different chapters, the most relevant tools, databases, and apps are described through case studies and examples that offer an easy and direct introduction to using these tools.

Antoine, Vincent, and Michael have managed with great passion to persuade and encourage the authors to provide as much practical advice as possible, with step-by-step guides and helpful use cases for the interested reader from all disciplines involved in drug hunting, bringing new, powerful, and safe medicines to patients. The selected and compiled collection of databases and apps provides a strong, comprehensive basis, a kind of guided tour through the very dense jungle of publicly available scientific information.

The editors have structured the guidance book into three thematic sections and 10 chapters. Antoine, Vincent, and Michael have not only edited the book but also contributed their long experience and great knowledge as authors and co-authors of some of the chapters.

As a general introduction to the volume, Antoine and Vincent themselves, with the support of their coworkers, provide a comprehensive overview of the topic and a rich annotated list of data sources entitled “Open Access Databases and Datasets for Computer-Aided Drug Design. A Short List Used in the Molecular Modelling Group of the SIB.” The core of the book, presented in Parts I and II, consists of seven diverse and high-quality resources presented by their developers, categorized into small molecules or macromolecular targets and diseases.

Part I is dedicated to small molecules and contains three chapters describing the most popular databases in this field:

● PubChem: A Large-Scale Public Chemical Database for Drug Discovery, edited by Sunghwan Kim and Evan E. Bolton

● DrugBank Online: A How-to Guide, edited by Christen M. Klinger, Jordan Cox, Denise So, Teira Stauth, Michael Wilson, Alex Wilson, and Craig Knox

● Bioisosteric Replacement for Drug Discovery Supported by the SwissBioisostere Database, edited by Antoine Daina, Alessandro Cuozzo, Marta A. S. Perez, and Vincent Zoete

Part II focuses on macromolecular targets and diseases and comprises the following chapters:

● The Protein Data Bank (PDB) and Macromolecular Structure Data Supporting Computer-Aided Drug Design, edited by David Armstrong, John Berrisford, Preeti Choudhary, Lukas Pravda, James Tolchard, Mihaly Varadi, and Sameer Velankar

● The SWISS-MODEL Repository of 3D Protein Structures and Models, edited by Xavier Robin, Andrew Waterhouse, Stefan Bienert, Gabriel Studer, Leila T. Alexander, Gerardo Tauriello, Torsten Schwede, and Joana Pereira

● PDB-REDO in Computational-Aided Drug Design (CADD), edited by Ida de Vries, Anastassis Perrakis, and Robbie P. Joosten

● Pharos and TCRD: Informatics Tools for Illuminating Dark Targets, edited by Keith J. Kelleher, Timothy K. Sheils, Stephen L. Mathias, Dac-Trung Nguyen, Vishal Siramshetty, Ajay Pillai, Jeremy J. Yang, Cristian G. Bologa, Jeremy S. Edwards, Tudor I. Oprea, and Ewy Mathé

Part III of the book is dedicated to the points of view of users working in academia and the pharmaceutical industry, with two chapters:

● Mining for Bioactive Molecules in Open Databases, edited by Guillem Macip, Júlia Mestres-Truyol, Pol Garcia-Segura, Bryan Saldivar-Espinoza, Santiago Garcia-Vallvé, and Gerard Pujadas

● Open Access Databases – An Industrial View, edited by Michael Przewosny

Overall, after a long and difficult journey, an outstanding collection of database and dataset information is provided that will give the interested reader an easy start in using such tools or in extending the scope of their previous applications.

With this, we, the series editors, sincerely believe that readers will benefit greatly from the contents of this book.

We would like to thank Antoine, Vincent, and Michael for putting the brilliant contributions of the authors together and for guiding them through an adventurous journey; all authors for their brilliant contributions and their patience; and Frank Weinreich, Stefanie Volk, and their coworkers, especially Aswini M. from the content analysis and refinement team, for their great support in making this book finally possible.

Aachen, Porto, and Bonn, July 2023

Helmut Buschmann
Jörg Holenz
Christa Müller

Raimund Mannhold – A Personal Obituary from the Series Editors

Source: http://www.raimund-mannhold.de/curriculum-vitae/

Raimund Mannhold died on October 14, 2022, after a short and serious illness at the age of 74. The news of his death nevertheless came as a great surprise to his immediate family and to us. Raimund accompanied the book series “Methods and Principles in Medicinal Chemistry” from the first volume, published as early as 1993, until his death and was able to enjoy the publication of volume 81, entitled “Flow and Microreactor Technology in Medicinal Chemistry” and edited by Esther Alza, in June 2022, shortly before he passed away.

Established in 1993, the series “Methods and Principles in Medicinal Chemistry” has become a crucial source of information within the medicinal chemistry community and beyond. Authors and editors of the series come from the pharmaceutical industry as well as from academic institutions, fostering a more active exchange between these domains.

Over time, Raimund found support from a number of internationally renowned experts and entrepreneurs in medicinal chemistry. Povl Krogsgaard-Larsen, Hendrik Timmerman, Hugo Kubinyi, and Gerd Folkers, as retired series editors, had a decisive influence on the book series and, like Raimund, contributed to it becoming a figurehead for medicinal chemistry worldwide.

The following picture shows Raimund (middle) with Gerd Folkers (left) and Hugo Kubinyi during the celebration of the 25th volume of the book series in 2005.


From the very beginning, the series focused on topical volumes covering hot concepts and technologies, and the reader will not miss any important topic in the field. The range of topics is as diverse as the challenges facing modern drug developers, spanning the fields of organic chemistry, pharmacology, toxicology, life science, and analytics, the latter also including bioinformatics, chemoinformatics, and proteomics.

Raimund’s heart beat for his book series, and he was the last remaining founding editor from the publication of the first volume 30 years ago (1993). Now the series must live on without Raimund; it will not be as before, but it will continue in order to preserve his legacy. That is the obligation of the current series editors Christa Müller, Jörg Holenz, and Helmut Buschmann.

Our common goal was to celebrate volume 100 together; we were only just able to publish volume 81 together. Without Raimund, without his commitment, and without his strong will to document the medicinal chemistry knowledge of our time, it will not be an easy task to continue as usual. There is a hard road ahead of us to fulfill his legacy, and it is with great sadness that we continue without him. But at the same time we see it as a great obligation to continue the book series in his spirit.

Raimund’s life was shaped by pharmaceutical science. He was born in 1948 in Haltern (North Rhine-Westphalia, Germany). From 1970 to 1973 he studied pharmacy at Frankfurt University, received his doctorate in 1977 from the University of Düsseldorf, and in 1982 received the Venia Legendi for the subject Physiology. In 1990 he was appointed Professor of Molecular Drug Research at Heinrich-Heine University in Düsseldorf, a position he held until his retirement on July 9, 2012. The most important stages of his scientific career can be summarized as follows¹:

1970–1973: Study of Pharmaceutical Sciences at the Johann-Wolfgang von Goethe Universität Frankfurt/Main

October 1973–September 1987: Scientific assistant at the Department of Clinical Physiology, Heinrich-Heine-Universität Düsseldorf

July 1977: PhD at the Department of Clinical Physiology (Heinrich-Heine-Universität Düsseldorf, Prof. Dr. R. Kaufmann). Thesis: Investigations on the Ca-antagonistic mode of action and the structure-activity relationships of verapamil

December 1982: Habilitation, conferred by the Medical Faculty of the Heinrich-Heine-Universität Düsseldorf. Title of monograph: Ca-antagonists of the aliphatic amine type and structurally related heart-active drugs – investigations on pharmacological and physicochemical properties

Since 1984: Contributing Editor of “Drugs of Today” and “Drugs of the Future”

January 1989–October 1990: Guest scientist at the Department for Pharmacochemistry, Vrije Universiteit, Amsterdam, NL (Prof. Dr. Henk Timmerman)

November 1990–July 2012: Professorship for Molecular Drug Research at the Heinrich-Heine-Universität Düsseldorf until his retirement on July 9, 2012

Since 1993: Editor of the book series “Methods and Principles in Medicinal Chemistry,” Wiley-VCH, Weinheim, together with Hugo Kubinyi and Henk Timmerman (and since 2001 with Gerd Folkers)

Since 2001: Regional editor of Mini-Reviews in Medicinal Chemistry

Since 2005: Editorial board member of Medicinal Chemistry and Current Computer-Aided Drug Design

October–November 2011: Visiting professor at the School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Switzerland

His work as the series editor of his book series will perpetuate his memory in the pharmaceutical community worldwide. His book series has now established itself as an internationally recognized standard, and millions of scientists will continue to see his name and appreciate his work in the future.

With the death of Raimund we lose a part of the spirit of the book series, which is very difficult to get over. But his footprint on the volumes published so far will be documented forever and thus remain a valuable part of scholarship.

¹ http://www.raimund-mannhold.de/curriculum-vitae/


Dear Raimund, in addition to your content-related input, we will also miss the extremely precise planning for your beloved book series.

We promise to continue your and now our book series “Methods and Principles in Medicinal Chemistry” in your spirit, even beyond volume 100.

With deep sadness, but filled with the thought of carrying your spirit on,

Bonn, Porto, and Aachen, July 2023

A Personal Foreword

When we think about computers to assist drug discovery, what comes to mind for most of us are the algorithms and graphics to calculate and visualize all sorts of molecular properties. What is less obvious is the knowledge that can be produced from the data itself. Today, a large amount and a vast diversity of data related to medicinal chemistry and drug discovery are available. With a few clicks, anyone can freely access downloadable raw datasets or browse more sophisticated structured databases.

This book aims to provide the reader with a detailed overview of open-access databases and datasets for drug discovery. While the picture is inevitably incomplete, it depicts a wide variety of resources from the most generalist to the most specialized. The volume begins with a rich annotated list of data sources considered of importance for (computer-aided) drug discovery and concludes with argued user perspectives. The core of the book consists of seven diverse and high-quality resources presented by their developers, categorized in Small molecules or Macromolecular targets and diseases. We have encouraged the authors to provide as much practical advice as possible with step-by-step guides and helpful use cases for medicinal chemists. Here we would like to express our deep gratitude to all the expert contributors for their remarkable commitment and admirable patience.

It all started in 2019 when the late Professor Raimund Mannhold contacted us for this project. The book is dedicated to his memory.

The process itself has been a long and chaotic journey through the COVID-19 pandemic.

Let us warmly thank the series editor, Dr. Helmut Buschmann, and the people at Wiley-VCH without whom this would not have been possible, in particular Dr. Frank Weinreich, Stefany Volk, Satvinder Kaur, and Aswini Murugadass.

We wish you a pleasant and instructive reading!

Antoine Daina, Michael Przewosny, and Vincent Zoete

Open Access Databases and Datasets for Computer-Aided Drug Design. A Short List Used in the Molecular Modelling Group of the SIB

Antoine Daina¹, María José Ojeda-Montes¹, Maiia E. Bragina², Alessandro Cuozzo², Ute F. Röhrig¹, Marta A. S. Perez¹, and Vincent Zoete¹,²

¹ SIB Swiss Institute of Bioinformatics, Molecular Modeling Group, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland

² University of Lausanne, Ludwig Institute for Cancer Research, Department of Oncology UNIL-CHUV, Route de la Corniche 9A, CH-1066 Epalinges, Switzerland

The role of computer-aided drug design (CADD) in modern drug discovery [1–15] is to support its various processes, including hit finding, hit-to-lead, lead optimization, and the activities leading up to preclinical trials, through numerous in silico predictors and filters. These tools have a wide variety of objectives, such as enriching the families of molecules that will be submitted to experimental screening with potentially active compounds, identifying molecules that may be problematic, such as toxic moieties or those with nonspecific activities, generating ideas on the chemical modifications to be made to the compounds to increase their affinity for the therapeutic target or to improve their pharmacokinetics [16–19], or finally assisting in the various selection processes aimed at identifying and promoting the most promising molecules. These approaches are generally divided into two main families [20].

Structure-based approaches [8, 21–23] use the three-dimensional structure of the targeted protein, for example, to estimate with docking software how, and how strongly, a small molecule will bind to it. Avoiding the need to rely solely on an experimental method (e.g. X-ray crystallography, NMR, or cryo-electron microscopy) to obtain this information makes it possible to process a large number of molecules very quickly and at a moderate cost. In turn, this information can be used to determine how to modify the chemical structure of a small molecule to rationally optimize the intermolecular interactions with the protein target. It is then possible to select the most promising compounds for experimental validation, creating a cyclic optimization process thanks to this feedback loop between in silico and in vitro approaches.

Open Access Databases and Datasets for Drug Discovery, First Edition. Edited by Antoine Daina, Michael Przewosny, and Vincent Zoete. © 2024 WILEY-VCH GmbH. Published 2024 by WILEY-VCH GmbH.

Ligand-based approaches take advantage of already known molecules with certain bioactivities or physicochemical properties in order to derive the information necessary to predict the bioactivity or properties of other compounds, real or virtual. Indeed, CADD has been a pioneering research area in the development and application of machine learning methods [24–32], with the emergence, as early as the 1960s [33], of quantitative structure–activity relationships (QSAR [34]) or quantitative structure–property relationships (QSPR).
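To make the idea of a quantitative structure–activity relationship concrete, the following minimal sketch fits a straight line between a single molecular descriptor and an activity value by ordinary least squares, then uses the fit to predict the activity of a new compound. The function names, the choice of a one-descriptor linear model, and the toy numbers are assumptions made purely for illustration; they are not taken from the cited QSAR literature, and the data are synthetic rather than experimental.

```python
from typing import List, Tuple


def fit_linear_qsar(descriptor: List[float], activity: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of: activity = slope * descriptor + intercept."""
    n = len(descriptor)
    if n != len(activity) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(slope: float, intercept: float, descriptor_value: float) -> float:
    """Apply the fitted line to a compound that was not in the training set."""
    return slope * descriptor_value + intercept


# Synthetic training set (NOT experimental data): activity = 2 * descriptor + 1.
toy_descriptor = [0.0, 1.0, 2.0, 3.0]
toy_activity = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_linear_qsar(toy_descriptor, toy_activity)
predicted = predict_activity(slope, intercept, 4.0)
```

Real QSAR and QSPR models typically combine many descriptors and are validated on compounds held out from training; the single-descriptor fit above only shows the principle of deriving a predictive relationship from known molecules.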

To perform these tasks, CADD benefits from numerous databases and datasets of small molecules, bioactivities and biological processes, 3D structures of small compounds and biomacromolecules, or molecular properties, some of which are related to pharmacokinetics or toxicity [13, 35–38]. Created in 1971, the Protein Data Bank (PDB) [39], which stores the three-dimensional structural data of large biological molecules such as proteins and nucleic acids, is a precursor in the field of freely and publicly available databases with possible applications in CADD. Currently managed by the wwPDB [40] organization and its five members, RCSB PDB [41], PDBe [42], PDBj [43], EMDB [44] and BMRB [45], the PDB continues to provide the CADD community with numerous valuable 3D structures of therapeutically relevant proteins in the apo form or in complex with small drug-like molecules, which can be used to nurture structure-based approaches. Several subsets involving such structures have been created over time, for instance, to provide reference sets to benchmark docking software, such as the Astex [46] or the Iridium [47] datasets.

For a very long time, ligand-based approaches were generally limited to the use of small datasets, collected on a case-by-case basis during specific drug design projects, thus precluding their application beyond the building of focused models with limited scope. This situation dramatically changed during the 2000s with the rise of large-scale databases created specifically for the benefit of drug discovery in general and CADD in particular. ChEMBL [48, 49], released in 2008, and PubChem [50], released in 2004, which collect molecules and their activities in biological assays systematically extracted from medicinal chemistry literature, patent publications, or experimental high-throughput screening programs, are certainly among the forerunners of this trend. Such databases paved the way for CADD approaches addressing, for instance, the prediction of bioactivities on a very large scale, including ligand-based methods. ZINC [51], freely accessible from 2004, is another large-scale database of small molecules, this time prepared especially for virtual screening. This important resource focuses on the compilation and storage of commercially available chemical compounds. DrugBank [52], whose first version dates back to 2006, is an example of a database gathering extensive curated, high-quality information about a group of molecules of biological interest, in this case mainly, but not exclusively, approved or developmental drugs. Although smaller than ChEMBL or PubChem, for instance, this type of resource, because of the quality, structure, and practicality of the information provided, also plays a critical role in the development of new CADD techniques and filters, or in more direct applications in virtual screening.

Researchers working in CADD can be considered to have two main activities: one consists of designing, validating, and benchmarking new in silico approaches; the other is applying existing tools to support drug discovery projects. The nature of the databases reflects this duality. Some are clearly oriented toward practical application. With virtual screening in mind, this is the case for resources gathering a large number of commercial or virtual molecules, such as ZINC [51] or GDB-17 [53], whose main purpose is to serve as a source of molecules to feed virtual screening campaigns. At the opposite end of the spectrum, we find molecular sets constructed specifically for benchmarking screening methods, such as DUD-E [54] or DEKOIS [55]. These contain a limited number of compounds, known to be active or inactive on certain protein targets, and carefully chosen to avoid any bias in the many molecular properties that would allow screening software to identify the active ones too easily. Between these two extremes, we can find databases, such as ChEMBL, PubChem, or TCRD/Pharos [56], containing a large number of known bioactive molecules. These generalist databases can not only be used to develop a large range of CADD methods, including screening or reverse screening approaches, such as Similarity Ensemble Approach (SEA) [57, 58] or SwissTargetPrediction [59, 60], but also constitute a source of real molecules to be virtually screened.
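As an illustration of how a benchmarking set of known actives and inactives can be used, the sketch below computes an enrichment factor: the proportion of actives found in the top-ranked fraction of a screened list, divided by the proportion of actives in the whole set. The enrichment factor is a commonly used metric but is not named in the text above, so its use here is an assumption, as are the function name and the toy ranking.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], top_fraction: float) -> float:
    """Enrichment factor for a ranked list (best-scored compound first).

    ranked_is_active[i] is True when the compound at rank i is a known active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Number of compounds inspected in the top fraction (at least one).
    n_top = max(1, int(round(n_total * top_fraction)))
    hits_in_top = sum(ranked_is_active[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy ranking (illustrative only): 10 compounds, 2 of them active.
toy_ranking = [True, False, True, False, False, False, False, False, False, False]
ef_top20 = enrichment_factor(toy_ranking, 0.2)
```

A value above 1 indicates that the screening method ranks actives earlier than random selection would; in the toy ranking, one of the two top-ranked compounds is active, against two actives among ten compounds overall.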

By definition, the interest of many CADD-related databases lies in their capacity to store a possibly large quantity of molecules, along with useful annotations, and in their efficient distribution to the public. This was made possible by the development and dissemination of widely accepted specific file formats. The most common formats for representing molecules as strings are SMILES [61, 62] and InChI [63, 64]. These one-line formats have the great advantage of using little disk or memory resources, facilitating the storage and rapid transfer of large numbers of molecules. It should be noted, however, that several SMILES strings can represent the same molecule. This can be problematic and potentially generate redundancy when compounds from different sources are gathered. To avoid this kind of situation, it is possible to produce canonical SMILES with suitable software, which are unique for each molecule within a given canonicalization scheme, or to use the UniChem [65] database, which provides pointers between the molecules of most common databases. Structure-based approaches, such as molecular docking, 3D fingerprinting [66], or pharmacophores [67, 68], require a spatial representation of small molecules. The most frequently employed file definitions, including three-dimensional atomic coordinates, are the Structure-Data File (SDF), the MDL Mol, and Tripos Mol2 formats. Compounds are often available in such formats in the major small-molecule databases, such as ZINC [51], ChemSpider [69], or DrugBank [52], which allows their direct use in 3D-based approaches. Other formats are available to store 3D structures of biomacromolecules, taking advantage of the fact that large biomolecules are based on the repetition of a small number of residue types. The PDB and mmCIF [70] formats are among the standards and are provided by the wwPDB consortium, and by other major databases of 3D structures of macromolecules, including PDB-REDO [71, 72], as well as the SWISS-MODEL [73], MODBASE [74], and AlphaFold [75, 76] repositories of structural models.
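The redundancy problem described above can be sketched as follows. Producing canonical SMILES requires a cheminformatics toolkit, so this example swaps in a different, simpler technique: it assumes that each source already supplies a standard InChIKey with every record and merges records on that key. The record layout, the source names, and the function name are hypothetical.

```python
from typing import Dict, Iterable, Set


def merge_by_inchikey(records: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Group records from several sources by InChIKey.

    Each record is a dict with the (assumed) keys 'source', 'smiles', 'inchikey'.
    Returns, for every InChIKey, the SMILES spellings and the sources seen.
    """
    merged: Dict[str, Dict[str, Set[str]]] = {}
    for record in records:
        entry = merged.setdefault(record["inchikey"], {"smiles": set(), "sources": set()})
        entry["smiles"].add(record["smiles"])
        entry["sources"].add(record["source"])
    return merged


# Two spellings of ethanol from two hypothetical sources, plus aspirin.
records = [
    {"source": "source_a", "smiles": "CCO", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source": "source_b", "smiles": "OCC", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source": "source_b", "smiles": "CC(=O)Oc1ccccc1C(=O)O",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
]

merged = merge_by_inchikey(records)
n_unique_molecules = len(merged)
```

Here the two ethanol records collapse into a single entry that keeps both SMILES spellings and both sources, so three input records yield two distinct molecules. When no shared key is available, canonicalizing the SMILES with one consistent toolkit, or looking up cross-references in UniChem, serves the same purpose.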

To be valuable in the context of CADD, a database should meet several criteria in addition to the nature of its content. These criteria are very close to the findability, accessibility, interoperability, and reuse (FAIR) principles [77].

First, a database must be maintained and made available for the long term, ideally via a persistent URL, so that it can be employed for sustainable projects and developments. Unfortunately, a large fraction of new databases and datasets disappear only a few years after their initial release, due to lack of resources to maintain them or lack of interest. Attwood and colleagues studied the 18-year survival status of 326 databases published before 1997 and found that 62.3% were dead, 14.4% were archived (and not updated), and only 23.3% were still alive under their original identity or after rebranding [78]. This first analysis was independently confirmed by Finkelstein et al., who found that of the 518 original databases published in the journal Database between 2009 and 2016, 35% were already no longer accessible in 2020 [79], and by Imker, who observed that among the 1727 databases published between 1991 and 2016 in Nucleic Acids Research’s “Database Issue,” 40% were dead in 2018 [80]. They found that databases with higher citation counts and from researchers with higher h-index within renowned institutions were more likely to survive. In addition to straightforward online accessibility over the long term, databases should ideally be regularly updated to include the latest useful information. In order to make this process efficient and compatible with the reproducibility of the research projects that need the databases, these updates should be clearly versioned and previous releases archived for the long term. In addition, unique identifiers should be assigned to individual database entries and maintained persistently across all versions.
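The requirement that identifiers persist across versions can be checked mechanically. The sketch below compares the entry identifiers of two releases and reports which were retained, dropped, or newly added; a non-empty list of dropped identifiers would flag a break in persistence. The function name and the identifier strings are hypothetical and do not correspond to any real database.

```python
from typing import Dict, Iterable, List


def compare_releases(old_ids: Iterable[str], new_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Report how entry identifiers changed between two database releases."""
    old, new = set(old_ids), set(new_ids)
    return {
        "retained": sorted(old & new),  # present in both releases
        "dropped": sorted(old - new),   # identifiers that did not persist
        "added": sorted(new - old),     # entries introduced by the update
    }


# Hypothetical identifiers from two archived releases of the same database.
release_1 = ["DB0001", "DB0002", "DB0003"]
release_2 = ["DB0001", "DB0003", "DB0004"]

report = compare_releases(release_1, release_2)
```

Such a comparison is only possible when previous releases remain archived, which is one practical reason for the versioning recommendation above.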

Second, the database should be easily searchable and retrievable. Most of those mentioned in this chapter can be accessed via a graphical user interface (GUI) developed to browse and search data easily, for instance by typing keywords in a search box, providing a query molecule in SMILES format or as a file, or by drawing compounds or molecular fragments within a molecular sketcher. Such interfaces are particularly efficient for searching for information about a few given molecules and for displaying them in a well-designed graphical representation. However, such interfaces become inefficient when a project requires a large amount of data, which will eventually have to be analyzed by the user through dedicated scripts and programs. In these cases, the information should be searchable and retrievable in bulk from the command line, for example, with an API through specific search and download commands. Ideally, the whole database content should be downloadable for local use by classic database management systems, such as MySQL or PostgreSQL, in order to be easily deployed and managed on the computers of advanced users.
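Programmatic retrieval of the kind described above might look like the following sketch, which builds request URLs for a list of PDB entries and downloads their JSON summaries using only the Python standard library. The base URL and the `pdb/entry/summary` path are assumptions based on the publicly documented PDBe REST API and should be checked against the current documentation before use; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Iterable

# Assumed base URL of the PDBe REST API; verify against the official documentation.
PDBE_API_BASE = "https://www.ebi.ac.uk/pdbe/api"


def entry_summary_url(pdb_id: str) -> str:
    """Build the (assumed) summary endpoint URL for one four-character PDB code."""
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    return f"{PDBE_API_BASE}/pdb/entry/summary/{pdb_id}"


def fetch_entry_summaries(pdb_ids: Iterable[str], timeout: float = 30.0) -> Dict[str, object]:
    """Download and decode the JSON summary of each entry (requires network access)."""
    results: Dict[str, object] = {}
    for pdb_id in pdb_ids:
        with urllib.request.urlopen(entry_summary_url(pdb_id), timeout=timeout) as response:
            results[pdb_id] = json.load(response)
    return results


example_url = entry_summary_url(" 1CBS ")
```

Calling `fetch_entry_summaries(["1cbs", "4hhb"])` from a script would retrieve both entries in one run, which is the kind of batch access that a GUI cannot offer conveniently. For very large requests, downloading the full database dump is usually preferable to issuing many individual API calls.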

Third, CADD databases and datasets should use renowned and well-accepted formats to store and deliver molecules to the users. As mentioned above, several string and file formats are already available for this purpose, including SMILES, InChI, SDF, Mol, Mol2, PDB, and mmCIF. These formats are readily processed by most CADD software, making the use of the content of the databases or datasets straightforward.
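To show why a well-specified format is easy to process, here is a minimal sketch of a parser for a single ATOM or HETATM record of the legacy PDB format, which stores each field in fixed columns. Only a subset of the fields is read, the `Atom` type and function name are hypothetical, and the sample record uses placeholder coordinates rather than values from a real entry. Production code would normally rely on an established parsing library instead.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_record(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record (column numbers below are 1-based)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain_id=line[21],             # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate
        y=float(line[38:46]),          # columns 39-46: y coordinate
        z=float(line[46:54]),          # columns 47-54: z coordinate
        element=line[76:78].strip(),   # columns 77-78: element symbol
    )


# Illustrative record assembled field by field so the column widths are explicit.
# The coordinates, occupancy, and B-factor are placeholders, not real data.
SAMPLE_LINE = (
    f"{'ATOM':<6}{1:>5} {' N':<4} {'MET':>3} {'A'}{1:>4}{'':4}"
    f"{12.5:8.3f}{-3.25:8.3f}{0.75:8.3f}{1.0:6.2f}{20.0:6.2f}"
    f"{'':10}{'N':>2}"
)
atom = parse_atom_record(SAMPLE_LINE)
```

Because every field sits at a known position, the parser needs no tokenization or guessing, which is part of what makes such formats straightforward for CADD software to consume.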

Fourth, to make interoperability between databases easier, they should include, as far as possible, well-accepted unique identifiers from long-standing key players in the field. For instance, the UniProt [81] ID provides a valuable solution to identify proteins. In addition, small molecules can be identified in many cases by one of the identifiers present in UniChem. This does not prevent the authors of new databases from creating their own unique identifiers, for more flexibility. For example, ChEMBL uses its own unique identifier for proteins and ensures interoperability with other resources by providing a file mapping these ChEMBL IDs to UniProt [81] IDs.
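A mapping file of this kind can be turned into a lookup table in a few lines. The sketch below assumes a tab-separated layout with the UniProt accession in the first column and the database-specific target identifier in the second, and with comment lines starting with `#`; this layout is an assumption and should be checked against the actual file. The identifiers in the sample text are placeholders, not real accessions.

```python
from collections import defaultdict
from typing import Dict, Set


def parse_id_mapping(text: str) -> Dict[str, Set[str]]:
    """Map each UniProt accession to the set of target identifiers listed for it.

    Assumed layout: tab-separated, column 1 = UniProt accession,
    column 2 = database-specific target ID, lines starting with '#' are comments.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines rather than guessing their meaning
        mapping[fields[0].strip()].add(fields[1].strip())
    return dict(mapping)


# Placeholder content (hypothetical identifiers, not real accessions).
sample_text = (
    "# example mapping file\n"
    "UNIPROT_A\tTARGET_0001\tExample protein A\tSINGLE PROTEIN\n"
    "UNIPROT_A\tTARGET_0002\tExample complex\tPROTEIN COMPLEX\n"
    "UNIPROT_B\tTARGET_0003\tExample protein B\tSINGLE PROTEIN\n"
)
uniprot_to_targets = parse_id_mapping(sample_text)
```

Storing a set of identifiers per accession reflects the fact that the relation need not be one-to-one: a protein may, for example, appear both as a single target and as a member of a complex.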

Fifth, accurate information regarding the origin of the data stored in the database or dataset should be provided, as well as a detailed description of the manual or automatic curation processes applied to it.

Sixth, databases and datasets should have a clear usage license. Free and open-access resources are often favored in academic environments, where funding may be limited, because they increase the visibility, maximize the use and impact of data, and facilitate the reuse of research results (Table 1.1).

Table 1.1 List of databases and datasets, along with their main usage and URL. When appropriate, the key purpose is indicated: training and validation of new approaches, or application. VS: virtual screening.

Databases of experimentally determined 3D structures of biomacromolecules and related resources

Name: PDBe
Main usages: Docking; structure-based VS; target prediction; binding free energy estimation (application, training, and validation)
Description: As a member of the wwPDB, PDBe collects, organizes, and disseminates data on biological macromolecular structures. Contains more than 190,000 entries.
Availability/URL: Can be freely searched here: https://www.ebi.ac.uk/pdbe. REST API: https://www.ebi.ac.uk/pdbe/pdbe-rest-api. Can be downloaded here: https://www.ebi.ac.uk/pdbe/services/ftp-access
References: [42]

Name: PDB-REDO
Main usages: Docking; structure-based VS; target prediction; binding free energy estimation (application, training, and validation)
Description: The PDB-REDO databank contains optimized versions of existing PDB entries with electron density maps, a description of model changes, and a wealth of model validation data.
Availability/URL: Can be freely searched here: https://pdb-redo.eu. API and download here: https://pdb-redo.eu/download-info.html
References: [71, 72]

Name: Chemical Component Dictionary
Main usages: Docking; ligand-based VS; structure-based VS (application, training, and validation)
Description: External reference file describing all residue and small molecule components found in PDB entries, maintained by the wwPDB Foundation.
Availability/URL: Freely accessible here: https://www.wwpdb.org/data/ccd
References: [82]

Name: Ligand Expo
Main usages: Docking; ligand-based VS; structure-based VS (application, training, and validation)
Description: Provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank (about 37,000 as of 2022). Maintained by the RCSB.
Availability/URL: Freely accessible here: http://ligand-expo.rcsb.org. Downloadable here in mmCIF, SDF, MOL, PDB, SMILES, and InChI: http://ligand-expo.rcsb.org/lddownload.html
References: [83]

Name: PDBeChem
Main usages: Docking; ligand-based VS; structure-based VS (application, training, and validation)
Description: Provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank (more than 38,000 as of 2022). Maintained by PDB Europe.
Availability/URL: Freely accessible here: https://www.ebi.ac.uk/pdbesrv/pdbechem/

Databases of modeled 3D structures of biomacromolecules

Name: AlphaFold Protein Structure Database
Main usages: Docking; structure-based VS (application)
Description: AlphaFold DB provides 200 million protein 3D structures predicted by AlphaFold, covering the proteomes of 48 organisms including humans.
Availability/URL: Can be freely searched here: https://alphafold.ebi.ac.uk. Sets of models can be downloaded here: https://alphafold.ebi.ac.uk/download
References: [75, 76]

Name: ModBase
Main usages: Docking; structure-based VS (application)
Description: Database of annotated comparative protein structure models obtained using the MODELLER program.
Availability/URL: Can be freely searched here: https://modbase.compbio.ucsf.edu
References: [74]

Name: SWISS-MODEL Repository
Main usages: Docking; structure-based VS (application)
Description: Database of annotated 3D protein structure models generated by the SWISS-MODEL homology-modeling pipeline. Contains 2,250,005 models from SWISS-MODEL for UniProtKB targets as well as 180,763 structures from PDB with mapping to UniProtKB.
Availability/URL: Can be freely searched here: https://swissmodel.expasy.org/repository
References: [73]

Databases of experimentally determined 3D structures of small molecules

Name: Cambridge Structural Database (CSD)
Main usages: Ligand-based VS; structure-based VS
Description: The CSD repository contains over one million accurate 3D structures of small organic and metal–organic molecules from X-ray and neutron diffraction analysis. Simple search is free; more advanced options require a license.
Availability/URL: Freely accessible here: https://www.ccdc.cam.ac.uk/solutions/csd-core/components/csd/

Name: COD
Main usages: Ligand-based VS; structure-based VS
Description: COD (Crystallography Open Database) provides a collection of 491,107 crystal structures of organic, inorganic, and metal–organic compounds and minerals, excluding biopolymers.
Availability/URL: Freely accessible here: http://www.crystallography.net/cod

Data and information on proteins

Name: UniProtKB/Swiss-Prot
Main usages: Target prediction; target validation
Description: UniProtKB/Swiss-Prot is a manually annotated, nonredundant protein sequence database that aims to provide all known relevant information about a particular protein. By combining numerous resources, the database became one of the major tools for biomedical research and drug target identification.
Availability/URL: Can be freely searched here: https://www.uniprot.org. Can be downloaded freely here: https://www.uniprot.org/uniprotkb?query=*

Name: neXtProt
Main usages: Target prediction; target validation
Description: neXtProt is a comprehensive human-centric discovery platform, offering its users a seamless integration and navigation through protein-related data, for instance, function relationships with other diseases and molecular partners like drugs or chemicals. A section, in particular, is dedicated to protein–protein and protein–drug interaction data.
Availability/URL: Can be freely searched here: https://www.nextprot.org
References: [87]

Name: TCRD/Pharos
Main usages: Ligand-based VS; structure-based VS; target prediction; binding free energy estimation (application, training, and validation)
Description: The Target Central Resource Database (TCRD) contains information about human targets, with special emphasis on poorly characterized proteins that can potentially be modulated using small molecules or biologics. Pharos is the web interface.
Availability/URL: Freely accessible here: https://pharos.nih.gov/. TCRD can be downloaded here: http://juniper.health.unm.edu/tcrd/download/

Data and information on drugs

Name: CancerDrugs_DB
Main usages: Licensed cancer drugs
Description: Open access database of licensed cancer drugs with links to DrugBank and ChEMBL IDs as well as information on targets and associated disease.
Availability/URL: Freely accessible here: http://www.redo-project.org/cancerdrugs-db/. A machine-readable version of this database can be downloaded here: https://acfdata.coworks.be/cancerdrugsdb.txt

The ReDO database of repurposing candidates in oncology can be accessed here: https://www.anticancerfund.org/en/redo-db
