Shuichi Shinmura


New Theory of Discriminant Analysis After R. Fisher
Advanced Research by the Feature-Selection Method for Microarray Data


New Theory of Discriminant Analysis After R. Fisher

Shuichi Shinmura
Faculty of Economics
Seikei University
Musashino-shi, Tokyo
Japan

ISBN 978-981-10-2163-3    ISBN 978-981-10-2164-0 (eBook)
DOI 10.1007/978-981-10-2164-0

Library of Congress Control Number: 2016947390

© Springer Science+Business Media Singapore 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Science+Business Media Singapore Pte Ltd.
Preface
This book introduces the new theory of discriminant analysis based on mathematical programming (MP)-based optimal linear discriminant functions (OLDFs) (hereafter, "the Theory") after R. Fisher. There are five serious problems of discriminant analysis, described in Sect. 1.1.2. I develop five OLDFs in Sect. 1.3. An OLDF based on a minimum number of misclassifications (minimum NM, MNM) criterion using integer programming (IP-OLDF) reveals four relevant facts in Sect. 1.3.3. IP-OLDF clarifies the relation between NM and LDF, in addition to the monotonic decrease of MNM. IP-OLDF and an OLDF using linear programming (LP-OLDF) are compared with Fisher's LDF and a quadratic discriminant function (QDF) using the Iris data in Chap. 2 and the cephalo-pelvic disproportion (CPD) data in Chap. 3. However, because IP-OLDF may not find the true MNM if the data do not satisfy the general position, as revealed by the Student data in Chap. 4 (Problem 1), I develop Revised IP-OLDF, Revised LP-OLDF, and Revised IPLP-OLDF, which is a mixture model of Revised LP-OLDF and Revised IP-OLDF. Only Revised IP-OLDF can find the true MNM corresponding to an interior point of the optimal convex polyhedron (optimal CP, OCP) defined on the discriminant coefficient space in Sect. 1.3. Because all LDFs except Revised IP-OLDF cannot discriminate the cases on the discriminant hyperplane exactly (Problem 1), the NMs of these LDFs may not be correct. IP-OLDF finds that the Swiss banknote data in Chap. 6, having six variables, are linearly separable data (LSD) and that the two variables (X4, X6) form the minimum linearly separable model, by examination of all 63 models made from the six independent variables. Revised IP-OLDF later confirms this result. By the monotonic decrease of MNM, the 16 models including (X4, X6) are linearly separable models. This fact is very important for understanding gene analysis. Only Revised IP-OLDF and a hard-margin support vector machine (H-SVM) can discriminate LSD theoretically (Problem 2). Problem 3 is the defect of the generalized inverse of variance-covariance matrices that causes trouble for QDF and regularized discriminant analysis (RDA). I solve Problem 3, which is explained by the pass/fail determinations using 18 examination scores in Chap. 5. Although these data are LSD, the error rates of Fisher's LDF and QDF are very high because these datasets do not satisfy Fisher's assumption. These facts point to a serious problem: we had better re-evaluate the discriminant results of Fisher's LDF and QDF. In particular, we should re-evaluate medical diagnoses and various ratings, because these data have the same character as the test data, having many cases on the discriminant hyperplane.

Because Fisher never formulated an equation for the standard error (SE) of error rates and discriminant coefficients (Problem 4), I develop a 100-fold cross-validation for small sample method (hereafter, "the Method 1"). The Method 1 offers the 95% confidence interval (CI) of the discriminant coefficients and error rates. Moreover, I develop a powerful model selection procedure, namely the best model with the minimum mean of error rates in the validation samples (M2). The best models of Revised IP-OLDF are better than those of the other seven LDFs using six datasets, including the Japanese-automobile data in addition to the above five datasets. Therefore, I mistakenly believed that I had established the Theory in 2015. However, when Revised IP-OLDF discriminated six microarray datasets (the datasets) in November 2015, it could naturally select features. Although Revised IP-OLDF can make feature selection naturally for the Swiss banknote data and Japanese-automobile data in Chap. 7, I did not consider this a very important fact, because the best model already offers a useful model selection procedure for common data. For more than ten years, many researchers have struggled with the analysis of gene datasets, because there are huge numbers of genes and it is difficult to analyze them by common statistical methods (Problem 5). I develop a Matroska feature-selection method (hereafter, "the Method 2") and a LINGO program. The Method 2 reveals that the dataset consists of several disjoint small linearly separable subspaces (small Matroskas, SMs) and another high-dimensional subspace that is not linearly separable. Therefore, we can analyze each SM by ordinary statistical methods. We found Problem 5 in November 2015 and solved it in December 2015.
The book represents my life's work and research, to which I have dedicated over 44 years of my life. After graduating from Kyoto University in 1971, I was employed by SCSK Corp. in Japan as a system integrator. Naoji Tuda, the grandson of the second-generation general director Teigo Iba of Sumitomo Zaibatsu, was my boss, and he believed that medical engineering (ME) was an important target for the information-processing industries. Through his decision, I became a member of the project for the automatic diagnostic system of electrocardiogram (ECG) data with the Osaka Center for Cancer and Cardiovascular Diseases and NEC. The project leader, Dr. Yutaka Nomura, ordered me to develop the medical diagnostic logic for ECG data using Fisher's LDF and QDF. Although I had hoped to become a mathematical researcher when I was a senior student in high school, I failed the entrance examination of the graduate school at Kyoto University because I spent too much time on the activities of the university swimming club. Although I did not become a mathematical researcher, I started research in ME. The research I conducted from 1971 to 1974 using Fisher's LDF and QDF was inferior to his experimental decision tree logic. Initially, I believed that my statistical ability was poor. However, I soon realized that Fisher's assumption was too strict for medical diagnosis. I proposed the earth model (Shinmura, 1984)1 for
1 See the references in Chap. 1.
medical diagnosis instead of Fisher's assumption. This experience gave me the motivation to develop the Theory. Shinmura et al. (1973, 1974) proposed a spectrum diagnosis using Bayesian theory that was the first trial of the Theory. However, logistic regression was more suitable for the earth model.
Shimizu et al. (1975) asked me to analyze photochemical air pollution data by Hayashi's quantification theory, and this became my first paper. Dr. Takaichirou Suzuki, leader of the Epidemiology Group, provided me with several themes concerning many types of cancers (Shinmura et al. 1983).
In 1975, I met Prof. Akihiko Miyake from the Nihon Medical School at the workshop organized by Dr. Shigekoto Kaihara, Professor Emeritus of the Medical School of Tokyo University. Miyake and Shinmura (1976) studied the relationship between the population and sample error rates of Fisher's LDF. Next, Miyake and Shinmura (1979) developed an OLDF based on the MNM criterion by a heuristic approach. Shinmura and Miyake (1979) discriminated the CPD data with collinearities. After we revised the paper two or three times, a statistical journal rejected it. However, Miyake and Shinmura (1980) was accepted by the Japanese Society for Medical and Biological Engineering (JSMBE). The former editors judged that OLDF based on the MNM criterion overestimated the validation sample, whereas Fisher's LDF did not, because Fisher's LDF was derived from the normal distribution without examination of real data. I was deeply disappointed that many statisticians disliked reviewing real data and started their research from a normal distribution, because it was very comfortable for them to work without the examination of real data (lotus eating). However, I could not develop a second trial of the Theory because of poor computer power and a defect in the heuristic approach.
Shinmura et al. (1987) analyzed the specific substance mycobacterium (SSM, commonly known as Maruyama vaccine). From 270,000 patients, we categorized 152,289 cancer patients into four postoperative groups. The patients who were administered SSM within one year after surgery were divided into four groups, every three months from the start of the SSM administration. We assumed that SSM was only water without side effects, and this was the null hypothesis. The survival time of the first group was longer than that of the fourth group from nine to 12 months after surgery, and the null hypothesis was rejected.
In 1994, Prof. Kazunori Yamaguchi and Michiko Watanabe strongly recommended that I apply for the position at Seikei University. After organizing the 9th Symposium of JSCS in SCSK at Ryogoku, near Ryogoku Kokugikan, in March 1995, I became a professor at the Economic Department in April of the same year. Dr. Tokuhide Doi presented a long-term care insurance system that employed a decision tree method, as advised by me. (Dr. Kaihara planned this system as an advisor to the Ministry of Health and Welfare, and I advised Dr. Doi to use the decision tree.)
In 1997, Prof. Tomoyuki Tarumi advised me to obtain a doctorate degree in science at his graduate school. Without examining the previous research, I developed IP-OLDF and LP-OLDF, which discriminated the Iris data, the CPD data, and 115 random number datasets. IP-OLDF found two relevant facts about the Theory. Therefore, we confirmed that the MNM criterion was essential for discriminant analysis, and we completed the Theory in 2015. The Theory is as useful for the gene datasets as for ordinary datasets. Readers can download all my research from researchmap and the Theory from ResearchGate.
https://www.researchgate.net/profile/Shuichi_Shinmura
http://researchmap.jp/read0049917/?lang=english
Musashino-shi, Japan
Shuichi Shinmura
Acknowledgments
I wish to thank all researchers who contributed to this book: Linus Schrage, Kevin Cunningham, Hitoshi Ichikawa, John Sall, Noriki Inoue, Kyoko Takenaka, Masaichi Okada, Naohiro Masukawa, Aki Ishii, Ian B. Jeffery, Tomoyuki Tarumi, Yutaka Tanaka, Kazunori Yamaguchi, Michiko Watanabe, Yasuo Ohashi, Akihiko Miyake, Shigekoto Kaihara, Akira Ooshima, Takaichirou Suzuki, Tadahiko Shimizu, Tatuo Aonuma, Kunio Tanabe, Hiroshi Yanai, Toji Makino, Jirou Kondou, Hiroshi Takamori, Hidenori Morimura, Atsuhiro Hayashi, Iebun Yun, Hirotaka Nakayama, Mika Satou, Masahiro Mizuta, Souichirou Moridaira, Yutaka Nomura, and Naoji Tuda.
I am grateful to my family, and in particular for the legacy of my late father, who supported the research: Otojirou Shinmura, Reiko Shinmura, Makiko Shinmura, Hideki Shinmura, and Kana Shinmura.
I would like to thank Editage (www.editage.jp) for English language editing.
Contents

1.3.2 Before and After SVM
1.3.3 IP-OLDF and Four New Facts of Discriminant Analysis
1.5.2 Pass/Fail Determination
2 Iris Data and Fisher
2.3.1 Comparison of MNM and Eight NMs
2.3.3 LINGO Program 1: Six MP-Based LDFs
2.4.1 Four Trials to Obtain Validation Sample
2.4.1.1 Generate Training and Validation Samples
2.4.1.2 20,000 Normal Random Sampling
2.4.2 Best Model Comparison
3 Cephalo-Pelvic Disproportion Data with Collinearities
3.3 100-Fold Cross-Validation
3.3.1 Best Model
4.4.1 Comparison of MNM and Nine NMs
5.3.1 MNM and Nine NMs
5.3.2 Error Rate Means (M1 and M2)
5.4 Pass/Fail Determination by Examination Scores (90% Level in 2012)
5.4.1 MNM and Nine NMs
5.4.2 Error Rate Means (M1 and M2)
5.4.3 95% CI of Discriminant Coefficient
5.5 Pass/Fail Determination by Examination Scores (10% Level in 2012)
5.5.1 MNM and Nine NMs
5.5.2 Error Rate Means (M1 and M2)
5.5.3 95% CI of Discriminant Coefficient
6 Best Model for Swiss Banknote Data
6.1 Introduction
6.2 Swiss Banknote Data
6.2.1 Data Outlook
6.2.2 Comparison of Seven LDFs for Original Data
6.3 100-Fold Cross-Validation for Small Sample Method
6.3.1 Best Model Comparison
6.3.2 95% CI of Discriminant Coefficient
6.3.2.1 Consideration of 27 Models
6.3.2.2 Revised IP-OLDF
6.3.2.3 Hard-Margin SVM (H-SVM) and Other LDFs
6.4 Explanation 1 for Swiss Banknote Data
6.4.1 Matroska in Linearly Separable Data
6.4.2 Explanation 1 of Method 2 by Swiss Banknote Data
6.5 Summary
7.2.2 Comparison of Nine Discriminant Functions for Non-LSD
7.2.3 Consideration of Statistical Analysis
7.3 100-Fold Cross-Validation (Method 1)
7.3.1 Comparison of Best Model
7.3.2 95% CI of Coefficients by Six MP-Based LDFs
7.3.2.1 Revised IP-OLDF Versus H-SVM
7.3.2.2 Revised IPLP-OLDF, Revised LP-OLDF, and Other LDFs
7.3.3 95% CI of Coefficients by Fisher's LDF and Logistic Regression
7.4 Matroska Feature-Selection Method (Method 2)
7.4.1 Feature-Selection by Revised IP-OLDF
7.4.2 Coefficient of H-SVM and SVM4
8 Matroska Feature-Selection Method for Microarray Dataset (Method 2)
8.1 Introduction
8.2 Matroska Feature-Selection Method (Method 2)
8.2.1 Short Story to Establish Method 2
8.2.2 Explanation of Method 2 by Alon et al. Dataset
8.2.2.1 Feature-Selection by Eight LDFs
8.2.2.2 Results of Alon et al. Dataset Using the LINGO Program
8.2.3 Summary of Six Microarray Datasets in 2016
8.2.4 Summary of Six Datasets in 2015
8.3 Results of the Golub et al. Dataset
8.3.1 Outlook of Method 2 by the LINGO Program 3
8.3.2 First Trial to Find the Basic Gene Sets
8.3.3 Another BGS in the Fifth SM
8.4 How to Analyze the First BGS
8.5 Statistical Analysis of SM1
8.5.1 One-Way ANOVA
9 LINGO Program 2 of Method 1
9.1 Introduction
9.2 Natural (Mathematical) Notation by LINGO
9.3 Iris Data in Excel
9.4 Six LDFs by LINGO
9.5 Discrimination of Iris Data by LINGO
9.6 How to Generate Resampling Samples and Prepare Data in Excel File
9.7 Set Model by LINGO
Symbols

Statistical Discriminant Functions by JMP

JMP: Statistical software supported by the JMP division of SAS Institute, Japan
JMP script: JMP script that solves Fisher's LDF and logistic regression by Method 1
LDF: Linear discriminant functions, such as Fisher's LDF, logistic regression, two OLDFs, three revised OLDFs, and three SVMs
Fisher's LDF: Fisher's linear discriminant function under Fisher's assumption
Logist: Logistic regression; in the tables, "Logist" is often used
QDF*: Quadratic discriminant function
RDA*: Regularized discriminant analysis
* QDF and RDA discriminate ordinary data in this book
Mathematical Programming (MP) by LINGO and What's Best!

What's Best!: Excel add-in solver
LINGO: MP solver that can solve LP, IP, QP, NLP, and stochastic programming
LINGO Program 1: LINGO program that solves the original data by six MP-based LDFs, explained in Sect. 2.3.3
LINGO Program 2: LINGO program that solves six MP-based LDFs by Method 1, explained in Chap. 9
LINGO Program 3: LINGO program that solves six MP-based LDFs by Method 2
LP: Linear programming; develops Revised LP-OLDF
IP: Integer programming; develops Revised IP-OLDF
QP: Quadratic programming; develops three SVMs
NLP: Nonlinear programming; defines Fisher's LDF

MP-based LDFs

SVM: Support vector machine
H-SVM: Hard-margin SVM
S-SVM: Soft-margin SVM
SVM4**: S-SVM for penalty c = 10,000
SVM1**: S-SVM for penalty c = 1
** Because there is no rule to decide a proper "c", we compare the results of SVM4 and SVM1
OLDF: Optimal LDF
LSD: Linearly separable data, the MNM of which is zero; LSD includes several linearly separable models (or subspaces)
Matroska: In gene analysis, we call all linearly separable spaces and subspaces Matroskas
Big Matroska: The microarray dataset is LSD and includes smaller Matroskas in it by the monotonic decrease of MNM
SM: Small Matroska, found by LINGO Program 3 (not explained in this book)
BGS: Basic gene set or subspace; the smallest Matroska in each SM
NM: Number of misclassifications
MNM: Minimum NM
CP: Convex polyhedron; the interior point of a CP has a unique NM and discriminates the same cases; defined by IP-OLDF, not Revised IP-OLDF
OCP: Optimal CP; the interior point of the OCP has the unique MNM
IP-OLDF: OLDF based on the MNM criterion using IP; if the data are not in general position, IP-OLDF may not find the true MNM
Revised IP-OLDF: Finds the interior point of the OCP and solves Problem 1
LP-OLDF: OLDF using LP; one of the L1-norm LDFs
Revised LP-OLDF: One of the L1-norm LDFs; although it is faster than other MP-based LDFs, it is weak against Problem 1
Revised IPLP-OLDF: A mixture model of Revised LP-OLDF in the first step and Revised IP-OLDF in the second step

DATA

Data: n cases by p independent variables
xi: The i-th p-dimensional independent variable vector (for i = 1, ..., n)
yi: Object variable; yi = 1 for class 1 and yi = -1 for class 2
Hi(b): Hi(b) = yi × (txi × b + 1) is a linear hyperplane (for i = 1, ..., n) that divides the p-dimensional coefficient space into a finite number of CPs (two half-planes, Hi(b) < 0 and Hi(b) > 0)
Hi(b) < 0: The minus half-plane of Hi(b). If Hi(bk) < 0, then Hi(bk) = yi × (txi × bk + 1) = yi × (tbk × xi + 1) < 0 and case xi is misclassified. If an interior point bk is located in h minus half-planes, NM = h; such an LDF misclassifies the same h cases

Ordinary or common data

Iris data: Fisher evaluated Fisher's LDF by these data
CPD data: Cephalo-pelvic disproportion data with collinearities
Student data: Pass/fail determination using student attributes
LSD: Linearly separable data that include linearly separable models; in gene analysis, we call LSD and linearly separable models Matroskas
Swiss banknote data: IP-OLDF finds these data are LSD; we explain Problems 2 and 5
Test data: Pass/fail determination using examination scores; these datasets are LSD, and we explain a trivial LDF
Japanese-automobile data: LSD; we explain Problems 3 and 5
The datasets: Six microarray datasets
Theory and Method

Theory: New theory of discriminant analysis after R. Fisher
Method 1: 100-fold cross-validation for small sample method
Method 2: Matroska feature-selection method for microarray datasets
M1: The mean of error rates in the training samples
M2: The mean of error rates in the validation samples
Best model: The M2 of the best model is minimum among all possible models of each LDF
LOO procedure: A leave-one-out model selection procedure
The best model: The model with minimum M2, used instead of LOO by Method 1
Diff1: The difference defined as (NM of the nine discriminant functions - MNM)
Diff: The difference defined as (M2 - M1)
M1Diff: The difference defined as (M1 of the nine discriminant functions - M1 of Revised IP-OLDF)
M2Diff: The difference defined as (M2 of the nine discriminant functions - M2 of Revised IP-OLDF)
Five Problems of Discriminant Analysis

Problem 1: All LDFs, with the exception of Revised IP-OLDF, cannot discriminate the cases on the discriminant hyperplane. The NMs of these LDFs may not be correct
Problem 2: All LDFs, with the exception of H-SVM and Revised IP-OLDF, cannot recognize LSD theoretically. Although Revised LP-OLDF and Revised IPLP-OLDF can often discriminate LSD, we never discuss them in Chap. 8 for this reason
Problem 3: The defect of the generalized inverse matrix technique; QDF misclassifies all cases as the other class in a particular case. Adding small random noise to the constant values solves Problem 3
Problem 4: Fisher never formulated an equation for the standard error of the error rate and discriminant coefficients. Method 1 offers the 95% confidence interval (CI) for the error rate and coefficients
Problem 5: For more than ten years, many researchers have struggled to analyze the microarray dataset that is LSD. Only Revised IP-OLDF can make feature selection naturally. I develop the Matroska feature-selection method (Method 2), which finds a surprising structure of the microarray dataset: the disjoint union of several small linearly separable subspaces (small Matroskas, SMs). Now we can analyze each SM very quickly. The student linearly separable data, Swiss banknote data, and Japanese-automobile data show the natural feature selection of Revised IP-OLDF. Therefore, I recommend that researchers of feature-selection methods, such as LASSO, evaluate and compare their theories using these datasets in Chaps. 4 and 6-8. I omit the results of the pass/fail determination using examination scores, which consist of only four variables
Chapter 1
New Theory of Discriminant Analysis

1.1 Introduction

1.1.1 Theory Theme
This book introduces a new theory of discriminant analysis (hereafter, "the Theory") after R. Fisher. This chapter explains how to solve the five serious problems of discriminant analysis. To the best of my knowledge, this is the first book that compares eight linear discriminant functions (LDFs) using several different types of data. These eight LDFs are as follows: Fisher's LDF (Fisher 1936, 1956), logistic regression (Cox 1958), hard-margin SVM (H-SVM) (Vapnik 1995), two soft-margin SVMs (S-SVMs), namely SVM4 (penalty c = 10,000) and SVM1 (penalty c = 1), and three optimal LDFs (OLDFs). At first, I develop an OLDF based on a minimum number of misclassifications (minimum NM (MNM)) criterion using integer programming (IP-OLDF) and an OLDF using linear programming (LP-OLDF) (Shinmura 2000b, 2003, 2004, 2005, 2007). However, because I find a defect of IP-OLDF, I develop three revised OLDFs: Revised IP-OLDF (Shinmura 2010a, 2011a), Revised LP-OLDF, and Revised IPLP-OLDF (Shinmura 2010b, 2014b). The Iris data in Chap. 2 are critical test data because Fisher evaluated Fisher's LDF with these data (Anderson 1945). The Cephalo-Pelvic Disproportion (CPD) data (Miyake and Shinmura 1980) in Chap. 3 are medical data with three collinearities. Although the Student data in Chap. 4 are a small data sample (Shinmura 2010a), we can understand Problem 1 because the data are not in general position. The 18 pass/fail determinations using examination scores in Chap. 5 are linearly separable data (LSD). None of the LDFs, with the exception of H-SVM and Revised IP-OLDF, can discriminate LSD theoretically. I demonstrate that the 18 error rates of Fisher's LDF and the quadratic discriminant function (QDF) are very high (Shinmura 2011b); nevertheless, these data are LSD. Moreover, seven LDFs, with the exception of Fisher's LDF, become trivial LDFs (Shinmura 2015b). The Swiss banknote data (Flury and Riedwyl 1988) in Chap. 6 and
the Japanese-automobile data (Shinmura 2016c) in Chap. 7 are also LSD. Although I develop a Matroska feature-selection method for microarray datasets (Method 2), it is difficult to understand the meaning of Method 2 if we do not know LSD discrimination very well. I call LSD a big Matroska. Just as a big Matroska includes several small Matroskas, the microarray dataset (the datasets) includes several linearly separable subspaces (small Matroskas (SMs)) within it (the largest Matroska). Therefore, I explain this idea using common data in Chaps. 6 and 7. When I discriminate the datasets, only Revised IP-OLDF can select features naturally and finds the surprising structure of the datasets (Shinmura 2015e-s, 2016b). Moreover, I develop a 100-fold cross-validation for small sample method (Method 1) (Shinmura 2010a, 2013, 2014c) instead of the leave-one-out (LOO) procedure (Lachenbruch and Mickey 1968). We can obtain two error rate means, M1 and M2, from the training and validation samples, respectively, and propose a simple model selection procedure that selects the best model with minimum M2. The best model of Revised IP-OLDF is better than the seven other M2s for the previous data, except for the Iris data.
We cannot discriminate cases on the discriminant hyperplane (Problem 1). Only Revised IP-OLDF can solve Problem 1. Moreover, only H-SVM and Revised IP-OLDF can discriminate LSD theoretically (Problem 2). Problem 3 is the defect of the generalized inverse matrix technique and of QDF misclassifying all cases to the other class in a particular case. I solve Problem 3. Fisher never formulated an equation for the standard errors (SEs) of the error rate and discriminant coefficients (Problem 4). The Method 1 offers the 95% confidence interval (CI) of the error rate and coefficients. For more than ten years, many researchers have struggled to analyze the dataset that is LSD (Problem 5). Only Revised IP-OLDF can make feature selection naturally. The Method 2 finds the surprising structure of the dataset, which is the disjoint union of several small gene subspaces (SMs) that are linearly separable models. If we can repair the specific genes found by Method 2, we might overcome cancer diseases. Now, we can analyze each SM very quickly. We call the linearly separable model in gene analysis a "Matroska." If the datasets are LSD, the full model is the largest Matroska that contains all smaller Matroskas within it. We already know that the smallest Matroska (the basic gene set or subspace (BGS)) can describe the Matroska structure completely by the monotonic decrease of MNM. On the other hand, LASSO (Buhlmann and Geer 2011; Simon et al. 2013) attempts to make a feature selection similar to Method 2. This book offers useful datasets and results for LASSO researchers from the following perspective:
1. Can an LDF obtained by LASSO discriminate three different types of LSD, such as the Swiss banknote data, the Japanese-automobile data, and the six microarray datasets, exactly?
2. Can an LDF obtained by LASSO find the Matroska structure correctly and list all BGSs?

If LASSO cannot find SMs or BGSs in the dataset, it cannot explain the data structure.
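As one concrete way to start on these two questions, the following minimal sketch (my own illustration, not code from the book) uses an L1-penalized logistic regression as a LASSO-type classifier and checks whether the genes it selects separate the two classes without error; the loader load_microarray is a hypothetical placeholder for whichever of the six datasets is being tested.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def lasso_check(X, y):
        """Fit an L1-penalized logistic regression and report whether the
        selected genes separate the two classes without error."""
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
        clf.fit(X, y)
        selected = np.flatnonzero(clf.coef_[0])      # genes with nonzero weight
        errors = np.sum(clf.predict(X) != y)         # NM on the training data
        return selected, errors

    # Hypothetical usage: X is an (n cases) x (p genes) matrix, y in {1, -1}.
    # X, y = load_microarray("golub")                # placeholder loader
    # genes, nm = lasso_check(X, y)
    # print(len(genes), "genes selected, NM =", nm)  # LSD requires NM == 0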
1.1.2 Five Problems

The Theory discusses only binary (two-class; class 1 or class 2) discrimination by eight LDFs: Revised IP-OLDF, Revised LP-OLDF, Revised IPLP-OLDF, H-SVM, SVM4, SVM1, Fisher's LDF, and logistic regression. The values of class 1 and class 2 are 1 and -1, respectively. We consider these values as the object variable of discriminant analysis and regression analysis. Let f(x) be an LDF and f(xi) be the discriminant score for xi. Although there are many difficult statistics in discriminant analysis, we should focus on the discriminant rule, which is quite direct: If yi × f(xi) > 0, xi is classified into class 1/class 2 correctly. If yi × f(xi) < 0, xi is misclassified. If yi × f(xi) = 0, we cannot discriminate xi correctly. This understanding is most important for discriminant analysis. There are five serious problems hidden in this simplistic scenario (Shinmura 2014a, 2015c, d).
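A minimal sketch of this discriminant rule (my own illustration, not code from the book) counts the misclassifications (NM) and, separately, the cases that lie exactly on the discriminant hyperplane:

    import numpy as np

    def discriminant_rule(f_scores, y):
        """Apply the sign-based discriminant rule to the scores f(x_i).
        y[i] is 1 for class 1 and -1 for class 2."""
        s = y * f_scores
        nm = np.sum(s < 0)             # misclassified cases
        h = np.sum(s == 0)             # cases on the hyperplane: undecidable
        return nm, h

    # Example: the third case sits exactly on the hyperplane (score 0).
    y = np.array([1, 1, -1, -1])
    f = np.array([2.0, -0.5, 0.0, -1.2])
    print(discriminant_rule(f, y))     # (1, 1): one NM plus one case with f = 0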
1.1.2.1 Problem 1

We cannot adequately discriminate the cases where xi lies on the discriminant hyperplane (f(xi) = 0). The Student data in Chap. 4 show this fact clearly. Thus far, this has been an unresolved problem. However, most researchers classify these cases into class 1 without a logical reason. They misunderstand the discriminant rule as follows: If f(xi) >= 0, xi is classified into class 1 correctly. If f(xi) < 0, xi is classified into class 2 properly. There are two mistakes in their rule. The first mistake is to classify the cases on the discriminant hyperplane into class 1 without a logical explanation. The second mistake is that we cannot determine a priori that cases with a positive discriminant score are classified into class 1 and those with a negative value into class 2, because the data determine this, not the researchers. Other statisticians propose deciding Problem 1 randomly (i.e., akin to throwing dice) because statistics is the study of probabilities. If users knew of this claim, they might be surprised and disappointed in discriminant analysis. In particular, medical doctors might be upset because they do not gamble with medical diagnoses, given that they seriously attempt to discriminate the cases near the discriminant hyperplane. Most statistical researchers are unaware of this fact of medical diagnosis. If we consider a pass/fail determination using the scores of four tests, where the passing mark is 50 points, we can obtain a trivial LDF such as f = T1 + T2 + T3 + T4 - 50. If f >= 0, a given student has passed the examination. On the other hand, if f < 0, the student has failed the examination. Because we can describe the discriminant rule clearly by the (independent) variables, we can correctly include a student on the discriminant hyperplane in the passing class. We have ignored this unresolved problem until now. The proposed Revised IP-OLDF based on MNM can treat Problem 1 appropriately (Shinmura 2010a). Indeed, with the exception of Revised IP-OLDF, no LDFs can correctly count the number of misclassifications (NMs). Therefore, we must count the number of cases where f(xi) = 0 and display this number "h" alongside the NM of all LDFs in the output. We must estimate a true NM that might increase to (NM + h). After I showed many examples of Problem 1, some statisticians claimed that the probability of cases on the discriminant hyperplane is zero, without a theoretical reason. They erroneously believe that we discriminate data on a continuous space.
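The trivial LDF above can be verified concretely; in this sketch (my own illustration), a student with exactly 50 total points falls on the discriminant hyperplane and is correctly passed because the rule is written directly in the variables:

    def pass_fail(t1, t2, t3, t4, passing_mark=50):
        """Trivial LDF f = T1 + T2 + T3 + T4 - 50 for a pass/fail decision."""
        f = t1 + t2 + t3 + t4 - passing_mark
        return "pass" if f >= 0 else "fail"   # f == 0 is a pass by definition

    print(pass_fail(20, 10, 15, 5))   # total is exactly 50 -> "pass"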
1.1.2.2 Problem 2

Only H-SVM and Revised IP-OLDF can recognize LSD theoretically.1 Other LDFs might not discriminate LSD exactly. When IP-OLDF discriminates the Swiss banknote data in Chap. 6, I find that these data are LSD. In addition, the Japanese-automobile data in Chap. 7 are LSD. Through both data, I explain the Matroska feature-selection method (Method 2) in Chap. 8. We can obtain examination scores easily, and these datasets are also LSD. Moreover, there is a trivial LSD. However, several LDFs cannot determine pass/fail using examination scores correctly (Shinmura 2015b). In particular, the error rates of Fisher's LDF and QDF are very high. Table 1.4 lists all 18 error rates of Fisher's LDF and QDF that are not zero in the pass/fail determinations from 2010 to 2012. This fact suggests that we should review the discriminant analyses of past important research, because the error rates may decrease. In medical diagnosis, researchers gave up studies whose error rates were over ten percent. However, Revised IP-OLDF may tell them that their error rates are zero. Moreover, discriminant functions that cannot discriminate LSD correctly are not helpful for gene analysis.
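To make the theoretical claim concrete, recall the standard hard-margin SVM formulation (a standard form from the SVM literature, not reproduced from the book's own equations):

    \min_{b, b_0} \; \frac{1}{2}\,\|b\|^2 \quad \text{subject to} \quad y_i\,({}^t x_i\, b + b_0) \ge 1 \quad (i = 1, \dots, n)

The constraints are feasible if and only if some hyperplane separates the two classes, so H-SVM recognizes LSD by definition; for non-LSD the constraint set is empty and the solver reports an error, which matches the remark in Sect. 1.3.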
1.1.2.3 Problem 3

If the variance-covariance matrix is singular, Fisher's LDF and QDF cannot be calculated because the inverse matrices do not exist. Because JMP (Sall et al. 2004) adopted the generalized inverse matrix technique, I had believed that Fisher's LDF and QDF could be calculated with the generalized inverse matrix without problems. When I discriminated math examination scores among 56 examination datasets from the National Center for University Entrance Examinations (NCUEE), QDF and a regularized discriminant analysis (RDA) (Friedman 1989) misclassified all students in the passing class as the failing class. If we exchange class 1 and class 2, QDF and RDA misclassify all students in the failing class as the passing class, as decided by the JMP specification. When QDF caused serious problems with problematic data, JMP switched QDF to RDA automatically. After three years of surveys, I found that RDA and QDF do not work correctly in a particular case where the values of a variable that belongs to one class are constant, because all the students in the passing class answered the particular question correctly. If users can select appropriate options for a modified RDA developed for this particular case, RDA works better than the QDF listed in Table 1.5, which is explained by the results of the Japanese-automobile data. However, JMP does not currently offer a modified QDF. Therefore, I judged that this was the defect of the generalized inverse matrix. If we add slight random noise to the constant value, QDF can discriminate the data exactly. Because it is basic statistical knowledge for us that data always vary, and because I trusted the quality of JMP, I needed three years to find the reason. Problem 3 has provided a warning for our statistical understanding: data always change.

1 Empirically, Revised LP-OLDF can discriminate LSD correctly. However, it is very weak against Problem 1. Logistic regression and SVM4 discriminate LSD correctly in many examinations. Fisher's LDF, QDF, and SVM1 perform poorly for LSD discrimination. I recommend that researchers review their old studies that used these three discriminant functions.
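A minimal sketch of this noise workaround (my own illustration in Python; the book applies it within JMP): a variable that is constant within one class makes that class's variance-covariance matrix singular, and a slight jitter restores invertibility without materially changing the discrimination.

    import numpy as np

    def add_jitter(X, scale=1e-6, seed=0):
        """Add slight random noise to every column so that no within-class
        variable is exactly constant (the case that breaks QDF/RDA)."""
        rng = np.random.default_rng(seed)
        return X + rng.normal(0.0, scale, size=X.shape)

    # Example: in the passing class every student answered item 3 correctly,
    # so column 3 is the constant 1 and the covariance matrix is singular.
    X_pass = np.array([[45., 30., 1.], [50., 28., 1.], [40., 35., 1.]])
    print(np.linalg.matrix_rank(np.cov(X_pass.T)))              # 2: singular
    print(np.linalg.matrix_rank(np.cov(add_jitter(X_pass).T)))  # 3: invertible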
1.1.2.4 Problem 4

Some statisticians erroneously believe that discriminant analysis is an inferential statistical method similar to regression analysis. However, Fisher never formulated an equation of SEs for the discriminant coefficients or error rates. Nonetheless, if we use the indicator yi of the mathematical programming-based linear discriminant functions (MP-based LDFs) in Eq. (1.7) as the object variable and analyze the data by regression analysis, the obtained regression coefficients are proportional to the coefficients of Fisher's LDF by the plug-in rule 1. Therefore, we can use model selection procedures such as stepwise procedures and all-possible-combination models (Goodnight 1978) with statistics such as AIC, BIC, and Cp from regression analysis. In this book, I propose Method 1 and a new model selection procedure, the best model. I set k = 100 and select the model with minimum M2 as the best model; this is a very direct and powerful model selection procedure compared with LOO. First, we select the best model for each LDF. Next, we select the model with minimum M2 among the six MP-based LDFs as the final best model. We claim that the final best model has generalization ability. Moreover, we obtain the 95% CI of the discriminant coefficients. Although I could demonstrate in 2010 that the best model was useful (Shinmura 2010a), I could not explain the useful meaning of the 95% CI of the discriminant coefficients before 2014. However, if we divide all coefficients by the LDF intercept and set the intercept to one, the six MP-based LDFs and logistic regression become trivial LDFs, and only Fisher's LDF is far from trivial (Shinmura 2015b). Moreover, I can explain the useful meaning of the 95% CI for the Swiss banknote and Japanese-automobile data (Shinmura 2016a, c) more precisely.
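The proportionality to Fisher's LDF is easy to check numerically. The following sketch (my own illustration with synthetic data, not code from the book) regresses the +1/-1 object variable on the independent variables and compares the direction of the regression coefficients with the plug-in direction Sigma^{-1}(m1 - m2):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    X1 = rng.multivariate_normal([1.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], n)
    X2 = rng.multivariate_normal([-1.0, 0.5], [[1.0, 0.3], [0.3, 1.0]], n)
    X = np.vstack([X1, X2])
    y = np.r_[np.ones(n), -np.ones(n)]               # object variable: +1 / -1

    # Regression of y on X (with intercept) via least squares.
    A = np.column_stack([np.ones(2 * n), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]  # drop the intercept

    # Plug-in Fisher direction: pooled covariance inverse times mean difference.
    S = 0.5 * (np.cov(X1.T) + np.cov(X2.T))
    fisher = np.linalg.solve(S, X1.mean(0) - X2.mean(0))

    print(beta / fisher)   # both entries print (nearly) the same constant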
1.1.2.5 Problem 5

For more than ten years, many researchers have struggled to analyze the datasets (Problem 5). However, to the best of my knowledge, there has been no research on LSD discrimination thus far. I examine five different types of LSD: the Swiss banknote data, the pass/fail determinations of 18 examination datasets, the Japanese-automobile data, the student linearly separable data, and the six microarray datasets. When I discriminate the datasets, most of the coefficients of Revised IP-OLDF become zero. Only Revised IP-OLDF can select features naturally and finds the surprising structure of the datasets. The datasets are Alon et al. (1999), Chiaretti et al. (2004), Golub et al. (1999), Shipp et al. (2002), Singh et al. (2002), and Tian et al. (2003). Jeffery et al. (2006) analyzed these datasets and uploaded them on their homepage.2 Ishii et al. (2014) analyzed these datasets by principal component analysis (PCA). I find the Matroska structure in the datasets, with an MNM of zero. The Method 2 can decompose the high-dimensional gene space into several small Matroskas (SMs) (Shinmura 2015e-s, 2016a). We can analyze these SMs by ordinary statistical methods such as the t-test, one-way ANOVA, cluster analysis, and PCA. Because there has been no research on LSD discrimination thus far (to the best of our knowledge), many researchers have struggled and have not obtained good results. I explain Method 2 with the results of the Swiss banknote data in Chap. 6 and the Japanese-automobile data in Chap. 7, because Revised IP-OLDF can select variables naturally for ordinary data.
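A schematic sketch of this decomposition (my own outline, not the book's implementation: the book uses LINGO Program 3 with Revised IP-OLDF, for which scikit-learn's LinearSVC with a large penalty serves here as a rough hard-margin stand-in, and the exact zero-coefficient pattern of Revised IP-OLDF is imitated with a magnitude threshold): repeatedly find a linearly separable gene subset, set it aside as one SM, and continue on the remaining genes until no separable subset is left.

    import numpy as np
    from sklearn.svm import LinearSVC

    def matroska_sketch(X, y, max_sm=10):
        """Peel off disjoint linearly separable gene subsets (SMs).
        X: (cases x genes), y: labels in {1, -1}."""
        remaining = list(range(X.shape[1]))
        sms = []
        for _ in range(max_sm):
            clf = LinearSVC(C=1e6, max_iter=100000)  # large C approximates H-SVM
            clf.fit(X[:, remaining], y)
            if np.any(clf.predict(X[:, remaining]) != y):
                break                           # remaining genes are not LSD
            w = np.abs(clf.coef_[0])
            picked = [remaining[j] for j in np.flatnonzero(w > 1e-6)]
            sms.append(picked)                  # one small Matroska (SM)
            remaining = [j for j in remaining if j not in picked]
        return sms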
1.1.2.6 Summary

Revised IP-OLDF solves Problems 1, 2, and 5. Problem 3 is the defect of the generalized inverse matrix technique, and QDF now causes Problem 3. If we add slight random noise to the constant value, we can solve Problem 3 easily. I propose Method 1 and compare two statistical LDFs by JMP script and six MP-based LDFs by the LINGO Program 2 (Schrage 2006) using six different types of data. Through many results, I can confirm that Method 1 solves Problem 4 using a computer-intensive approach. Problem 5 is the complex analysis of microarray datasets. Only Revised IP-OLDF can make feature selection of the datasets naturally and find that the datasets consist of several disjoint unions of SMs. We can analyze each SM in the dataset easily because each SM is a small gene subspace. It is quite strange that the three SVMs cannot select features naturally.
1.2 Motivation for Our Research

1.2.1 Contribution by Fisher

Fisher described Fisher's LDF using variance-covariance matrices and founded the statistical discriminant theory. He assumed that the two classes (or groups) have the same variance-covariance matrices and that the two means are different (Fisher's assumption). However, because Fisher's assumption is too strict for actual data, QDF was defined for two classes having different variance-covariance matrices. This fact indicates that statisticians are aware that there exist data that do not satisfy Fisher's assumption. Moreover, multiclass discrimination that uses the
2 http://www.bioinf.ucd.ie/people/ian/
Mahalanobis distance has been proposed. In quality control, Taguchi and Jugulum (2002) considered that one class (the normal state) has a variance-covariance matrix, and the other class (the uncontrolled state) consists of only one case. They discriminated data through multiclass discrimination and claimed that a typical uncontrolled case is far from the normal state, with a large Mahalanobis distance. Their claim is similar to the "earth model" in medical diagnosis (Shinmura 1984). Because statistical software packages easily implement these discriminant functions based on variance-covariance matrices, we apply discriminant analysis to many applications in science, technology, and industry, such as medical diagnosis, pattern recognition, and various ratings. However, real data rarely satisfy Fisher's assumption. Therefore, it is well known that logistic regression is better than Fisher's LDF and QDF because it does not assume a particular theoretical distribution, such as a normal distribution. It is very strange and unfortunate that there is no discussion of this matter by researchers and users of logistic regression.
1.2.2 Defect of Fisher's Assumption for Medical Diagnosis

After graduating from Kyoto University in 1971, I became a member of the project that developed the automatic diagnostic system for electrocardiogram (ECG) data from 1971 to 1974. The project leader, who was a medical doctor, requested that I discriminate over ten3 abnormal symptoms from the normal symptom using Fisher's LDF and QDF. Our four years of research were inferior to the medical doctor's experimental decision tree logic. First, I believed that my results using Fisher's LDF and QDF were inferior to the decision tree logic results because my knowledge and experience were poor. Later, I realized that Fisher's assumption was not adequate for medical diagnosis. I summarize the two reasons for my failure below. On the other hand, there is no actual test for Fisher's assumption. I demonstrate that the NM of Fisher's LDF is close to MNM for the Iris data. We can use this trend instead of the test statistics of Fisher's hypothesis.

First Reason: In medical diagnosis, typical cases of abnormal symptoms are far from the discriminant hyperplane. I explained medical diagnosis with the "earth model," where the normal symptom is the land, abnormal symptoms are the mountains, and the discriminant hyperplanes are the horizon. The Mahalanobis-Taguchi strategy is similar to the earth model. This claim violates Fisher's assumption. In the statistical concept, we understand that the typical cases of both classes are the two averages of two normal distributions. Therefore, I believed that discriminant functions based on the variance-covariance matrices are not adequate for medical diagnosis and developed a spectrum diagnostic method (Shinmura et al. 1973, 1974). I knew that logistic regression is remarkably successful in medical diagnosis and understood that it is superior to the spectrum diagnostic method.
3 I cannot recollect the exact number of abnormal symptoms.
Currently, Japanese medical researchers discriminate data by logistic regression instead of Fisher's LDF and QDF. I regret that, as researchers and users of logistic regression, they did not discuss my claim.

Second Reason: There are many cases close to the discriminant hyperplane. I concluded that Fisher's LDF and QDF are fragile for the discrimination of particular data, such as pass/fail determination using examination scores (Shinmura 2011b) and the rating of bonds, stocks, and estates, in addition to medical data. These data also have the characteristic feature of having many cases close to the discriminant hyperplane. None of the LDFs, with the exception of Revised IP-OLDF, can discriminate the cases on the discriminant hyperplane correctly (Problem 1). Recently, because I could not access medical data for our research, I used pass/fail determination with examination scores instead of medical data.
1.2.3 Research Outlook

After 1975, I discriminated many data using Fisher's LDF, QDF, logistic regression, multiclass discrimination using the Mahalanobis distance, decision tree logic (or partitioning), and the quantification theory developed by Dr. Hayashi (Shimizu et al. 1975; Nomura and Shinmura 1978; Shinmura et al. 1983). Through these studies, I found Problems 1 and 4 (Shinmura 2014a, 2015c, d). In 1973, we developed the spectrum diagnostic method using Bayesian theory. However, logistic regression was more sophisticated than the spectrum diagnostic method. Next, we developed an OLDF based on the MNM criterion (Miyake and Shinmura 1979, 1980; Shinmura and Miyake 1979), which was a heuristic approach. Because Warmack and Gonzalez (1973) compared several discriminant functions, their research encouraged ours. We were not able to advance the research because we had low computer power and because of the defect of the heuristic approach.

Starting in 1997, I developed IP-OLDF (Shinmura 1998, 2000a, b; Shinmura and Tarumi 2000). Because I defined IP-OLDF in the discriminant coefficient space, I found two important facts of discriminant analysis. The first is OCP. The second is "the monotonic decrease of MNM." However, there was a serious defect in IP-OLDF for the Student data, which are not in general position. If data are not in general position, IP-OLDF might not find the vertex of a true OCP. This defect means that the obtained MNM might not be the true MNM; Problem 1 caused this defect. In 2007, Revised IP-OLDF solved the defect because it can find the interior point of the true OCP and avoid Problem 1. Therefore, I could solve Problem 1 completely. Until 2007, I was not able to evaluate the eight LDFs using validation samples because our research data were small samples.
After 2007, I developed Method 1. Through this breakthrough, I was able to solve Problem 4 and ended the basic research. Revised IP-OLDF solves Problems 1 and 2. Although I could evaluate the eight LDFs by M2, I could not explain the useful meaning of the 95% CI of the discriminant coefficients. After 2010, I started applied research on LSD discrimination. I found that Problem 3 is the defect of the generalized inverse matrix technique through the pass/fail determination that uses examination scores (Shinmura 2011b). With regard to IP-OLDF, I set the intercept of IP-OLDF to one and was able to obtain the two important facts, namely OCP and the monotonic decrease of MNM. Therefore, I divided all coefficients by the intercept and set the intercept to one. Through this second breakthrough, seven LDFs, with the exception of Fisher's LDF, became trivial LDFs in the pass/fail determination that uses examination scores, and I was able to explain the useful meaning of the coefficients of Revised IP-OLDF using the Swiss banknote data. Therefore, I had solved the four problems and could confirm the end of our research. However, when I discriminated the Shipp et al. dataset in October 2015, I found that Revised IP-OLDF can make feature selection naturally and can solve Problem 5 quickly.
1.2.4 Method 1 and Problem 4

If we set k = 100 in the Method 1, we can obtain 100 LDFs and 100 error rates from the training and validation samples. From the 100 LDFs, we obtain the 95% CI of the discriminant coefficients. From the 100 error rates, we obtain the 95% CI of the error rates and two means of error rates, M1 and M2, from the training and validation samples, respectively. We consider the model with minimum M2 among all possible combination models to be the best model. This standard is a direct and powerful model selection procedure compared with the LOO procedure.
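A minimal sketch of this resampling loop (my own illustration: a bootstrap split stands in for the book's resampling scheme, which is described in Chap. 9, and scikit-learn's LinearDiscriminantAnalysis stands in for the eight LDFs): k = 100 pairs of training and validation samples yield 100 fitted coefficient vectors and 100 pairs of error rates, from which M1, M2, and percentile CIs follow.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def method1_sketch(X, y, k=100, seed=0):
        """100-fold cross-validation for small samples: resample k times,
        fit an LDF, and collect coefficients and error rates."""
        rng = np.random.default_rng(seed)
        n = len(y)
        coefs, e_train, e_val = [], [], []
        for _ in range(k):
            idx = rng.integers(0, n, n)               # bootstrap training sample
            val = np.setdiff1d(np.arange(n), idx)     # out-of-sample cases
            clf = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
            coefs.append(clf.coef_[0])
            e_train.append(np.mean(clf.predict(X[idx]) != y[idx]))
            e_val.append(np.mean(clf.predict(X[val]) != y[val]))
        m1, m2 = np.mean(e_train), np.mean(e_val)     # M1 and M2
        ci = np.percentile(coefs, [2.5, 97.5], axis=0)  # 95% CI per coefficient
        return m1, m2, ci

Running this over all possible combinations of variables and keeping the model with minimum m2 corresponds to the best model selection described above.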
We should distinguish such computer-intensive approaches from traditional inferential statistics with the SE equation based on the normal distribution. Statisticians without computer power established inferential statistics manually. Today, we can utilize the power of a computer with statistical and MP solvers, such as JMP and LINGO. I developed the Method 1 (Program 2) of Fisher's LDF and logistic regression with the JMP script supported by the JMP division of SAS Institute Japan. In addition, I developed Method 1 for the six MP-based LDFs with LINGO: Revised IP-OLDF, Revised IPLP-OLDF, Revised LP-OLDF, H-SVM, SVM4, and SVM1. I explain the LINGO Program 2 in Chap. 9. Researchers who want to analyze their research data can obtain the 95% CI for the error rate and discriminant coefficients. These statistics provide a precise and deterministic judgment on model selection compared with the LOO procedure. Until this point, I could not validate and evaluate Revised IP-OLDF against the seven other LDFs because I only had small original data and no validation samples. Researchers with small samples can validate and assess their research data with Method 1 and the best model.
Miyake and Shinmura (1976) discussed the "error rates of linear discriminant functions" by the traditional approach. On the other hand, Konishi and Honda (1992) discussed "error rate estimation using the bootstrap method." Their computer-intensive approaches are not traditional inferential statistics and do not offer the 95% CI of the error rates and coefficients for individual data. Although logistic regression outputs the 95% CI of the coefficients through the maximum likelihood method proposed by R. Fisher, this is also a computer-intensive approach. On the other hand, we can select the best model and obtain the 95% CI of the error rates and coefficients for the six LDFs by the Method 1 and the best model. Many researchers who want to discriminate small samples thus have a philosopher's stone.
1.3 Discriminant Functions

I compare two statistical LDFs by JMP and six MP-based LDFs by LINGO. I omit a kernel SVM because it is a nonlinear discriminant function. However, I evaluate QDF and RDA with the eight LDFs only for the original six different data, with the exception of the datasets. Next, I compare the two statistical LDFs and six MP-based LDFs on resampling samples if the data are LSD. If the data are not LSD, we cannot discriminate the data by H-SVM because it causes an error for non-LSD.
1.3.1 Statistical Discriminant Functions

Fisher defined Fisher's LDF by maximization of the variance ratio (between/within classes) in Eq. (1.1). Nonlinear programming (NLP) can solve this equation.
If we accept Fisher's assumption, the same LDF is obtained in Eq. (1.2) by another plug-in rule 2. This equation defines Fisher's LDF explicitly, whereas Eq. (1.1) defines the LDF implicitly. Therefore, statistical software packages adopt this equation. Some statisticians erroneously believe that discriminant analysis is inferential statistics, similar to regression analysis. Discriminant analysis is not traditional inferential statistics based on the normal distribution because there are no SEs for the discriminant coefficients and error rates (Problem 4). Therefore, Lachenbruch and Mickey proposed the LOO procedure for selecting a good discriminant model, as indicated in Table 1.6.
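The display equations were lost from this copy; their standard textbook forms, which I believe correspond to Eqs. (1.1) and (1.2), are

    \max_{b} \; \frac{\{{}^t b\,(m_1 - m_2)\}^2}{{}^t b\,\Sigma\,b}    (1.1)

    f(x) = {}^t\{x - (m_1 + m_2)/2\}\,\Sigma^{-1}(m_1 - m_2)    (1.2)

where m1 and m2 are the two class means and Sigma is the common (pooled) variance-covariance matrix under Fisher's assumption.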
Most real data do not satisfy Fisher's assumption. When the variance-covariance matrices of the two classes are not the same (Sigma1 != Sigma2), the QDF defined in Eq. (1.3) can be used. This fact is critical for us. Previous statisticians have known that most real data do not satisfy Fisher's assumption. We use the Mahalanobis distance in Eq. (1.4) for the discrimination of multiclasses. The Mahalanobis-Taguchi method of quality control is one of its applications.
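Again supplying the standard forms (my reconstruction of what Eqs. (1.3) and (1.4) state, not reproduced from the book):

    f_i(x) = -\tfrac{1}{2}\,{}^t(x - m_i)\,\Sigma_i^{-1}(x - m_i) - \tfrac{1}{2}\log|\Sigma_i| + \log P_i    (1.3)

    D^2(x, m_i) = {}^t(x - m_i)\,\Sigma_i^{-1}(x - m_i)    (1.4)

A case is assigned to the class with the largest quadratic score (1.3), or the smallest Mahalanobis distance (1.4); Pi denotes the prior probability of class i.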
We use Fisher's LDF and QDF in many areas, but they cannot be calculated when some variables remain constant. There are three cases. First, some variables that belong to both classes are the same constant. Second, some variables that belong to both classes are constant but different. Third, some variables that belong to one class are constant. Most statistical software packages exclude all variables in these three cases. On the other hand, JMP enhances QDF using the generalized inverse matrix technique. Therefore, QDF can treat the first and second cases correctly, but cannot manage the third case properly (Problem 3).
Recently, the logistic regression in Eq. (1.5) has been used instead of Fisher's LDF and QDF for two reasons. First, it is well known that the error rate of logistic regression is often less than that of Fisher's LDF and QDF because it is derived from real data instead of some normal distribution free from reality. Let "p" be the probability of belonging to the class of diseases. If the value of some variable is increasing/decreasing, "p" increases from zero (normal class) to one (abnormal class). This representation is very useful in medical diagnosis, as well as for ratings of real estate and bonds. On the contrary, Fisher's LDF assumes that the cases close to the average of the diseases are the representative cases of the diseases' class. Medical doctors never accept this claim. Although the maximum-likelihood procedure calculates the SEs of the logistic coefficients, we should distinguish this computer-intensive approach from the traditional inferential statistics based on a theoretical distribution derived manually. Firth (1993) indicated that the SEs of logistic coefficients become large and the convergence calculation becomes unstable for LSD. If I observe the following points: (1) I can find NM = 0 by changing the discriminant hyperplane on the ROC curve, (2) MNM = 0, (3) the SEs become large, and (4) the convergence calculation becomes unstable, I can determine that logistic regression can recognize LSD. I confirm that logistic regression can almost recognize LSD by this tedious work.
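The logistic model referenced as Eq. (1.5) has the standard form (my reconstruction):

    p = \frac{1}{1 + \exp\{-({}^t b\,x + b_0)\}}    (1.5)

where p is the probability of belonging to the disease class, and the coefficients (b, b0) are estimated by maximum likelihood.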
1.3.2 Before and After SVM
There are many types of research on MP-based discriminant analysis. Glover (1990) defined many linear programming (LP) discriminant models. Rubin (1997) proposed MP-based discriminant functions using IP. Stam (1997) summarized Lp-norm discriminant methods in 1997 and answered the question, "Why have statisticians rarely used Lp-norm methods?" He provided four reasons: communication, promotion, and terminology; software availability; the relative accuracy of