Machine Translation

12th China Workshop, CWMT 2016

Urumqi, China, August 25–26, 2016

Revised Selected Papers

Communications in Computer and Information Science 668

Commenced Publication in 2007

Founding and Former Series Editors: Alfredo Cuzzocrea, Dominik Ślęzak, and Xiaokang Yang

Editorial Board

Simone Diniz Junqueira Barbosa

Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil

Phoebe Chen

La Trobe University, Melbourne, Australia

Xiaoyong Du

Renmin University of China, Beijing, China

Joaquim Filipe

Polytechnic Institute of Setúbal, Setúbal, Portugal

Orhun Kara

TÜBİTAK BİLGEM and Middle East Technical University, Ankara, Turkey

Igor Kotenko

St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia

Ting Liu

Harbin Institute of Technology (HIT), Harbin, China

Krishna M. Sivalingam

Indian Institute of Technology Madras, Chennai, India

Takashi Washio

Osaka University, Osaka, Japan

More information about this series at http://www.springer.com/series/7899


ISSN 1865-0929, ISSN 1865-0937 (electronic)
Communications in Computer and Information Science

ISBN 978-981-10-3634-7, ISBN 978-981-10-3635-4 (eBook)
DOI 10.1007/978-981-10-3635-4

Library of Congress Control Number: 2016963154

© Springer Nature Singapore Pte Ltd. 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Following the previous successful workshops in the series, the 12th China Workshop on Machine Translation (CWMT) was held during August 25–26, 2016, in Urumqi, China. This workshop provides an opportunity for researchers and practitioners to communicate and exchange ideas, and aims to improve the research of machine translation in China.

We were absolutely thrilled that the conference received 76 submissions. All of them were carefully reviewed in a double-blind manner and each paper was assigned to at least two independent reviewers. Finally, 15 Chinese and nine English papers were accepted, yielding an overall acceptance rate of 31.6%. With an English version of one Chinese paper, this proceedings volume comprises ten publications. As is traditionally the case with CWMT, the papers cover a wide range of subjects, including statistical MT, hybrid MT, MT evaluation, post-editing, alignment, as well as inducing bilingual knowledge from corpora.

This year, CWMT 2016 featured three keynote speeches delivered by renowned experts in the field of MT and four invited talks by young researchers exploring cutting-edge technologies. A common topic highlighted in these talks was the emergence of neural-based MT paradigms, seemingly overtaking the statistical MT approach that first appeared in this conference 11 years ago.

There are a number of people we would like to thank. Firstly, this conference would not have been possible without the enormous efforts of Dr. Chengqing Zong, Dr. Le Sun, Prof. Tiejun Zhao, Prof. Jingbo Zhu, and Prof. Xiaodong Shi. We would especially like to thank Springer for publishing the proceedings again (for the second time in the CWMT series). Secondly, our heartfelt thanks go to Prof. Zhang Min, the general chair, and the members of the Program Committee, who, as usual, did sterling work over and above what might have reasonably been expected from them. Finally, we would like to thank Dr. Yating Yang and her team at the Xinjiang Technical Institute of Physics and Chemistry, CAS, whose excellent conference organization impressed the attendees of CWMT 2016.

August 2016
Muyun Yang

Organization

Steering Committee

Chengqing Zong, Institute of Automation of Chinese Academy of Sciences, China
Le Sun, Institute of Software Chinese Academy of Sciences, China
Tiejun Zhao, Harbin Institute of Technology, China
Jingbo Zhu, Northeastern University, China
Xiaodong Shi, Xiamen University, China

Conference Chair

Min Zhang, Soochow University, China

Program Chair

Muyun Yang, Harbin Institute of Technology, China

Publication Chair

Shujie Liu, Microsoft Research Asia

Program Committee

Hailong Cao, Harbin Institute of Technology, China
Yidong Chen, Xiamen University, China
Yufeng Chen, Beijing Jiaotong University, China
Chong Feng, Beijing Institute of Technology, China
Yuhang Guo, Beijing Institute of Technology, China
Yanqing He, Institute of Scientific and Technical Information of China, China
Zhongjun He, Baidu, Inc.
Shujian Huang, Nanjing University, China
Wenbin Jiang, Institute of Computing Technology, Chinese Academy of Sciences, China
Hongfei Jiang, Dinfo Beijing Technology Development Co., Ltd., China
Jianfeng Li, USTC iflytek Co., Ltd.
Lemao Liu, National Institute of Information and Communications Technology, China
Shujie Liu, Microsoft Research Asia
Yang Liu, Tsinghua University
Weihua Luo, Alibaba Inc.
Cunli Mao, Kunming University of Science and Technology
Haitao Mi, IBM T.J. Watson Research Center, USA
Hideya Mino, National Institute of Information and Communications Technology
Jinsong Su, Xiamen University, China
Zhaopeng Tu, Noah's Ark Lab
Tong Xiao, Northeastern University
Deyi Xiong, Soochow University, China
Mo Yu, IBM T.J. Watson Research Center, USA
Yating Yang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China
Conghui Zhu, Harbin Institute of Technology, China
Hao Zhang, Google, Inc.
Jiajun Zhang, Institute of Automation of Chinese Academy of Sciences, China
Yu Zhou, Institute of Automation of Chinese Academy of Sciences, China
Yun Zhu, Beijing Normal University, China

Local Organization Chair

Yating Yang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China

Local Organizing Committee

Xiaobo Wang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China
Wei Yang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China
Rui Dong, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China
Chenggang Mi, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China
Kamali, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China

Organizers

Chinese Information Processing Society of China

Co-organizer

Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences

Sponsors

GTC Technology Co., Ltd.
Guangxi Daring E-Commerce Services Co., Ltd.
Shenyang YaTrans Network Technology Co., Ltd.
Beijing Lingosail Tech Co., Ltd.

Contents

MinKSR: A Novel MT Evaluation Metric for Coordinating Human Translators with the CAT-Oriented Input Method
Guoping Huang, Chunlu Zhao, Hongyuan Ma, Yu Zhou, and Jiajun Zhang

Pivot-Based Semantic Splicing for Neural Machine Translation
Di Liu, Conghui Zhu, Tiejun Zhao, Xiaoxue Wang, and Muyun Yang

Re-ranking for Bilingual Lexicon Extraction with Bi-directional Linear Transformation from Comparable Corpora
Chunyue Zhang and Tiejun Zhao

Learning Bilingual Sentence Representations for Quality Estimation of Machine Translation
Junguo Zhu, Muyun Yang, Sheng Li, and Tiejun Zhao

Research on Domain Adaptation for SMT Based on Specific Domain Knowledge
Yanqing He, Liang Ding, and Ying Li

Automatic Construction of Domain Terminology Knowledge Base for HowNet Based on the Headword
Chuang Wu, Lin Wang, Na Ye, Guiping Zhang, and Dongfeng Cai

Building the Vietnamese Phrase Treebank by Improved Probabilistic Context-Free Grammars
Ying Li, Jianyi Guo, Zhengtao Yu, Yantuan Xian, and Yonghua Wen

Classifying Commas for Patent Machine Translation
Hongzheng Li and Yun Zhu

BLEUS-syn: Cilin-Based Smoothed BLEU
Junting Yu, Wuying Liu, Hongye He, and Mianzhu Yi

Research on the Calculation Method of Semantic Similarity Based on Concept Hierarchy
Kai Wang

Author Index

MinKSR: A Novel MT Evaluation Metric for Coordinating Human Translators with the CAT-Oriented Input Method

Guoping Huang1,2(B), Chunlu Zhao3, Hongyuan Ma3, Yu Zhou1, and Jiajun Zhang1

1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
{guoping.huang,yzhou,jjzhang}@nlpr.ia.ac.cn
2 University of Chinese Academy of Sciences, Beijing, China
3 CNCERT/CC, Beijing, China
chunluzhao@cert.org.cn, mahongyuan@foxmail.com

Abstract. To improve the efficiency of human translation, there is increasing interest in applying machine translation (MT) to computer-assisted translation (CAT). The recently proposed CAT-oriented input method is a typical approach of this kind: it helps translators save a significant number of keystrokes by exploiting MT deep information, such as n-best candidates, hypotheses and translation rules. To save even more keystrokes, we propose in this paper a novel MT evaluation metric for coordinating human translators with the input method. This evaluation metric takes MT deep information into account and makes longer perfect fragments correspond to fewer keystrokes. Extensive experiments show that, by accurately capturing deep information for the CAT-oriented input method, the novel evaluation metric enables MT to substantially reduce the keystrokes of the translating process, and that it significantly improves the productivity of human translation compared with BLEU and TER.

1 Introduction

Computer-assisted translation (CAT) is a common form of language translation in which a human translator uses software to perform and facilitate the translation process. To improve the efficiency of human translation, bridging machine translation (MT) and CAT has drawn more and more attention. For instance, the recently proposed CAT-oriented input method, called CoCat in [7], is a typical approach of this kind. The input method can exploit deep information used by the underlying statistical machine translation (SMT) system, including translation rules, decoding hypotheses and n-best candidates, to significantly save keystrokes and speed up translating for languages with complex characters, such as Chinese and Japanese. In the CAT scenario, MT results vary considerably in quality. As a result, the most notable advantage of the CAT-oriented input method is that translators don't have to proofread such MT results.

© Springer Nature Singapore Pte Ltd. 2016. M. Yang and S. Liu (Eds.): CWMT 2016, CCIS 668, pp. 1–13, 2016. DOI: 10.1007/978-981-10-3635-4_1

Fig. 1. The overview of the CoCat input method and MT evaluation metric.

According to [7], Fig. 1 demonstrates how the CoCat input method works in translation from English to Chinese. In Zone A, if the translator adopts the widely accepted Google Pinyin to perform translation from scratch, the abbreviated Chinese typing letters "zgklgg" (the Chinese Pinyin acronym) cannot elicit the correct translation, because Google Pinyin cannot perceive what exactly the current user is translating. Instead, in Zone B, the CoCat input method can correctly decode the same abbreviated letters into the desired result (in English, "China considers to reform") with the help of translation rules and hypotheses (in Zone C), all of which are used by the underlying SMT decoder (in Zone E). What's more, the CoCat input method provides an n-gram prediction list (with ordinals starting from 5 in Zone B) based on the n-best list (in Zone D) provided by the SMT decoder. In this way, the CAT-oriented input method greatly improves the productivity of human translation.

However, the input method still has a lot of room for improvement in keystroke saving and needs to be further upgraded. As we can see, both the input method decoding results and the n-gram prediction lists heavily depend on the intermediate and final MT results, such as n-best candidates, hypotheses and translation rules. Obviously, better MT results can help the CAT-oriented input method.

Coordinating human translators with the CAT-oriented input method raises the question of how to generate and rank the intermediate and final MT results. Inspired by Fig. 1, the most natural way is to select a more appropriate MT evaluation metric. Under the guidance of a better evaluation metric, the SMT decoder is expected to generate more suitable deep information required by the CAT-oriented input method, and human translators can work more efficiently.

However, almost all existing evaluation metrics, such as BLEU [12], TER [13], KSR [11] and KSMR [1], do not take such useful deep information into account.

To achieve our goals, we propose in this paper a novel MT evaluation metric which considers deep information used by the underlying MT system, including n-best candidates, hypotheses and translation rules. It makes longer perfect fragments embedded in deep information correspond to fewer keystrokes. We call this metric MinKSR: Minimum Keystroke Saving Ratio. The word "minimum" means that the value of MinKSR shows how many keystrokes can be saved at least. The higher the value of MinKSR, the more keystrokes can be saved.

With the guidance of MinKSR, the MT system is tuned to generate the most "suited" deep information from the point of view of the CAT-oriented input method, rather than the general "best" final results in terms of BLEU and TER. That is to say, MinKSR is a special MT evaluation metric for coordinating human translators with CAT-oriented input methods.

In contrast to the existing metrics, MinKSR aims at estimating the minimum number of keystrokes that the underlying MT system can save when the translation is typed with the CAT-oriented input method. As far as we know, MinKSR is the first MT evaluation metric for the CAT-based input method.

In summary, this paper makes the following contributions:

(1) MinKSR evaluates both the final MT results and the intermediate MT results, including n-best candidates, hypotheses and translation rules.

(2) Under the guidance of MinKSR, the MT system generates more useful deep information for coordinating human translators with the CAT-oriented input method. It helps the input method substantially and reliably reduce the keystrokes of the translating process.

(3) MinKSR is an upgraded version of BLEU. It optimizes the intermediate MT results required by the CAT-oriented input method, while the BLEU scores of the final MT results remain almost unchanged.

2 Background

Fig. 2. The comparison of key sequences based on the MT tuned by BLEU and MinKSR.

In Fig. 1, the SMT decoder scores translation rules, translation hypotheses and n-best candidates using a linear model. Generally speaking, the features of the linear model are the probabilities from language models, translation models, and reordering models, plus other features. Tuning is the process of finding the optimal weights for this linear model. In the SMT tuning process, the performance of the MT system is usually measured with a certain evaluation metric which compares the final MT results with the specified references. Then, during SMT decoding, the optimal weights directly influence the deep information required by the CAT-oriented input method.
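The role of the tuned weights can be illustrated with a small sketch. It assumes a toy linear model with three features named `lm`, `tm` and `reorder`; the feature names, values and weights are hypothetical and are not taken from the paper. It only shows that changing the weights changes the order of the candidates the decoder would hand to the input method.

```python
def linear_score(features, weights):
    """Weighted sum of feature values (log-probabilities in a typical SMT model)."""
    return sum(weights[name] * value for name, value in features.items())


def rank_candidates(candidates, weights):
    """Sort (text, features) pairs from the highest to the lowest model score."""
    return sorted(candidates, key=lambda c: linear_score(c[1], weights), reverse=True)


# Hypothetical feature values for two translation candidates.
candidates = [
    ("candidate A", {"lm": -2.0, "tm": -1.0, "reorder": -0.5}),
    ("candidate B", {"lm": -1.0, "tm": -3.0, "reorder": -0.5}),
]

# Two hypothetical weight vectors, standing in for tuning toward different metrics.
weights_1 = {"lm": 0.5, "tm": 0.3, "reorder": 0.2}
weights_2 = {"lm": 0.8, "tm": 0.1, "reorder": 0.1}

print(rank_candidates(candidates, weights_1)[0][0])
print(rank_candidates(candidates, weights_2)[0][0])
```

With the first weight vector candidate A is ranked first, with the second candidate B is. This is the sense in which the choice of tuning metric reaches the n-best list.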

The existing MT evaluation metrics, including BLEU, TER, KSR and KSMR, fail to cover such deep information. The well-known corpus-level metric BLEU is based on n-gram matching between the final MT output and the reference translations. Another popular metric, TER, measures the amount of editing needed to modify the MT output so that it exactly matches a reference translation, and works well in the post-editing scenario [2, 10, 16]. KSR (keystroke ratio) and KSMR (keystroke ratio plus mouse-action ratio) are used to estimate the effort needed to produce correct translations, especially in the IMT scenario [4]. Moreover, KSMR and KSR do not take account of languages with complex characters, such as Chinese and Japanese, for which an input method is required.
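To make "n-gram matching" concrete, the following minimal sketch computes the clipped (modified) n-gram precision that BLEU is built on, for a single candidate and a single reference. It is only one ingredient of BLEU: the geometric mean over n-gram orders, multiple references and the brevity penalty are left out, and the example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


candidate = "the the the cat".split()
reference = "the cat sat".split()
print(modified_precision(candidate, reference, 1))
print(modified_precision(candidate, reference, 2))
```

Because the score looks only at the final output string, nothing in it rewards useful hypotheses or translation rules that never surface in that string.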

In the CAT scenario with the input method, we find that translators prefer to directly select the correct n-gram predictions. As a result, it is important to generate perfect beginnings and longer matched fragments, even if the BLEU or TER scores hardly change.

To illustrate how MinKSR works in the CAT scenario, let's consider the example in Fig. 2. The human translation refers to the target sentence the translator has in mind. For the sake of simplicity, deep information of the SMT decoder and the full n-best candidates are omitted. As we can see, it takes 13 keystrokes to input the human translation with the MT system tuned by BLEU, as shown in Fig. 2(a).

In contrast, in Fig. 2(b), the number of keystrokes is reduced to 5 by using MinKSR. The key steps are Steps 1 and 3. Step 1 generates the ideal prediction by grasping the perfect beginning fragment, and Step 3 hits the perfect prediction (in English, "the welfare system of civil service") by searching the optimized MT deep information. This indicates that deep information is very important for coordinating human translators with the CAT-oriented input method.

3 The MinKSR Metric

The purpose of the MinKSR metric is to measure how many keystrokes can be saved at least by the MT system integrated in the MT-based input method. The core idea of MinKSR is to map longer perfect fragments to fewer keystrokes. To evaluate the MT system automatically, MinKSR is calculated from empirical keystrokes rather than practical keystrokes. There are three sufficient statistics for calculating MinKSR:

(1) $mk_{norm}(t_1^m)$: the count of minimum keystrokes to input the human translation $t_1^m = t_1 t_2 \ldots t_m$ using the general input method character by character without the aid of the MT system.

(2) $ek(Q, t_1^m)$: the count of empirical keystrokes to input the human translation $t_1^m$ using the MT-based input method with the aid of the intermediate and final MT result $Q$.

(3) $pk(t_1^m)$: the count of minimum ideal keystrokes to input the human translation $t_1^m$ using the MT input method with the aid of the perfect MT candidate $c_1^n = t_1^m$.

Let $s_1^j = s_1 s_2 \ldots s_j$ denote the source sentence, $C$ denote the n-best list $\{c_1^n = c_1 c_2 \ldots c_n\}_1^{Z_1}$, $H$ denote the hypothesis lists $\{h_1^{Z_2}\}$, and $L$ denote the translation rule lists $\{l_1^{Z_3}\}$, where $j$ is the number of words in the source sentence, $Z_1$ is the length limit of the n-best list, $Z_2$ is the length limit of the hypothesis list for each phrase, and $Z_3$ is the length limit of the translation rule list for each phrase. Then the MT intermediate and final result can be defined as the triple $Q = (C, H, L)$. The number of phrases is $\frac{j \times (j+1)}{2}$ for phrase-based SMT systems. Given the reference translation $t_1^m = t_1 t_2 \ldots t_m$ of the source sentence, MinKSR, $r$, is given as follows:

$$ r(Q, t_1^m) = \frac{mk_{norm}(t_1^m) - ek(Q, t_1^m)}{mk_{norm}(t_1^m) - pk(t_1^m)} \qquad (1) $$
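A minimal sketch of the sentence-level ratio follows. It assumes Eq. 1 is the number of keystrokes actually saved, $mk_{norm} - ek$, divided by the number that a perfect MT candidate would save, $mk_{norm} - pk$; this reading is consistent with the later statement that the score lies between 0 and 1 and equals 1 for a perfect result. The values 13 and 5 for `ek` are the keystroke counts quoted for Fig. 2, while `mk_norm = 20` and `pk = 4` are hypothetical numbers chosen only for illustration.

```python
def minksr(mk_norm, ek, pk):
    """Sentence-level keystroke saving ratio under the assumed reading of Eq. 1."""
    achievable = mk_norm - pk
    if achievable <= 0:
        raise ValueError("mk_norm must exceed pk for the ratio to be defined")
    return (mk_norm - ek) / achievable


# ek values from the Fig. 2 discussion; mk_norm and pk are hypothetical.
print(minksr(mk_norm=20, ek=13, pk=4))
print(minksr(mk_norm=20, ek=5, pk=4))
```

Under these illustrative numbers the BLEU-tuned system realises less than half of the achievable saving, while the MinKSR-tuned system comes close to all of it.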

If there is only the candidate $c_1^n = c_1 c_2 \ldots c_n$ in the MT result without deep information, MinKSR degenerates into a general MT evaluation metric and performs similarly to BLEU. If there is more than one reference, we simply select the minimum $ek(Q, t_1^m)$. To calculate the three terms in Eq. 1, we introduce the following notation:

– $sn$: the number of separator characters between words. $sn = 0$ for Chinese, $sn = 1$ for English (the space between words).

– $kc$: the count of keystrokes to select a certain candidate or prediction from the input method. In general, $kc = 1$ without page turning.

– $rk$: a ratio obtained by dividing the number of characters in a word by the number of keystrokes needed with the help of an input method. The value of $rk$ varies from language to language. For Chinese, in this paper, we simply choose $rk = 2$ to guarantee that MinKSR is indeed a lower bound.

Then we can count the three sufficient statistics in Eq. 1 as follows:

(1) The count of minimum keystrokes using a general input method without the MT system:

where $mkw_{norm}(t_i)$ denotes the minimum keystrokes of word $t_i$ using a general input method:

where $len(t_i)$ is the number of characters in $t_i$.

(2) The count of empirical keystrokes using the MT-based input method with the MT system:

$$ ek(Q, t_1^m) = \min_{cp \in CP(t_1^m)} \sum_{i=1}^{q} ek(Q, cp_i) \qquad (4) $$

where $CP(t_1^m)$ denotes the set of all partitions, each of which breaks the human translation $t_1^m$ into non-empty contiguous sub-sequences, $cp$ denotes a specific partition member of the set $CP(t_1^m)$, $q$ is the number of sub-sequences in $cp$, and $ek(Q, cp_i)$ denotes the empirical keystrokes to input the sub-sequence $cp_i$. Let $P$ denote the n-gram prediction list; then $ek(Q, cp_i)$ can be defined as:

$$ ek(Q, cp_i) = \begin{cases} kc & cp_i \in P \\ len(t_i) + kc & cp_i \notin P,\ cp_i \in Q \\ mkw_{norm}(cp_i) + kc & cp_i \notin P,\ cp_i \notin Q \end{cases} \qquad (5) $$
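The minimum over all partitions does not require enumerating them: a dynamic program over prefix lengths gives the same value. The sketch below makes several assumptions that are not confirmed by the text: the total cost of a partition is taken to be the plain sum of its piece costs (separator keystrokes are ignored), $Q$ is flattened into a set of fragments, the length term in the second case is read as the character length of the fragment, and `mkw_norm` is passed in as a caller-supplied function because its exact form is not given here. The tokens and the cost function in the example are hypothetical.

```python
def empirical_keystrokes(words, predictions, deep_info, mkw_norm, kc=1):
    """Minimum total piece cost over all contiguous partitions of `words`."""

    def piece_cost(piece):
        if piece in predictions:          # fragment offered in the prediction list P
            return kc
        if piece in deep_info:            # fragment found in the MT deep information Q
            return sum(len(w) for w in piece) + kc
        return mkw_norm(piece) + kc       # fragment typed without MT help

    m = len(words)
    best = [0] + [float("inf")] * m       # best[k]: cheapest way to input words[:k]
    for end in range(1, m + 1):
        for start in range(end):
            piece = tuple(words[start:end])
            best[end] = min(best[end], best[start] + piece_cost(piece))
    return best[m]


# Hypothetical example: placeholder tokens and a made-up typing cost
# of two keystrokes per character for fragments typed from scratch.
words = ["w1", "w2", "w3", "w4"]
predictions = {("w1", "w2")}
deep_info = {("w3",)}
cost = empirical_keystrokes(
    words, predictions, deep_info,
    mkw_norm=lambda piece: 2 * sum(len(w) for w in piece),
)
print(cost)
```

The cheapest partition here selects the predicted pair, takes the third token from the deep information, and types the last token from scratch, which is the behaviour the piecewise definition is meant to reward.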

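(3) Given the maximum length $W$ of the n-gram prediction list, the count of minimum idealized keystrokes using the MT-based input method with the aid of the perfect MT candidate $t_1^m$:

$$ pk(t_1^m) = \begin{cases} \frac{m}{W} \times (kc + sn) - sn & m \bmod W = 0 \\ \left\lfloor \frac{m}{W} \right\rfloor \times (kc + sn) + kc & m \bmod W \neq 0 \end{cases} \qquad (6) $$

The default value of $W$ is 4 in this paper.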
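The idealized count can be sketched as follows, assuming the reading that every full prediction of $W$ words costs one selection plus one separator, that the final separator is not typed, and that a leftover shorter prediction costs one extra selection. The function name is illustrative.

```python
def ideal_keystrokes(m, W=4, kc=1, sn=0):
    """Keystrokes needed when every prediction is perfect (assumed reading of Eq. 6)."""
    if m <= 0 or W <= 0:
        raise ValueError("m and W must be positive")
    full, rest = divmod(m, W)
    if rest == 0:
        # m/W selections, each followed by a separator except the last one
        return full * (kc + sn) - sn
    # full predictions with separators, plus one selection for the remainder
    return full * (kc + sn) + kc


print(ideal_keystrokes(8))          # Chinese-style setting, sn = 0
print(ideal_keystrokes(9))
print(ideal_keystrokes(8, sn=1))    # English-style setting, sn = 1
```

For an 8-word translation with the default $W = 4$ and $sn = 0$, two selections suffice; a ninth word adds one more selection.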

In conclusion, with the sufficient statistics above, the sentence-level value of MinKSR can be calculated by Eq. 1. In addition, the corpus-level value of MinKSR is given by the equation:

where $T$ denotes the set of translation references.

Both $pk(t_1^m)$ and $ek(Q, t_1^m)$ emphasize n-gram matching. As a result, MinKSR is an extension of BLEU. MinKSR optimizes the deep information required by the CAT-oriented input method, including n-best candidates, hypotheses, translation rules, and the perfect beginnings of the final MT results, while the BLEU scores of the final MT results remain almost unchanged.

What's more, following Eq. 3, MinKSR can be adjusted to suit other languages by changing the value of $rk$ according to [3, 5].

4 MinKSR with Length Penalty

We have considered word choice and word order in the baseline MinKSR. Now we focus on the length of MT candidates. Let $c$ be the average length of the final MT candidates in the n-best list and $t$ be the reference length. Inspired by BLEU, we compute the brevity penalty BP:

Then,

The value of MinKSR ranges from 0 to 1. The higher the value of MinKSR, the more keystrokes can be saved. The perfect MT result for the source sentence will attain a score of 1 if it is identical to the human translation.
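As a rough sketch of how a BLEU-inspired length penalty could enter the score, the code below assumes two things that the text does not confirm: that BP has the same form as BLEU's brevity penalty (1 when the candidates are longer than the reference, $e^{1 - t/c}$ otherwise), and that the penalized score is simply BP multiplied by the baseline MinKSR value $r$. Both function names are illustrative.

```python
import math


def brevity_penalty(c, t):
    """BLEU-style brevity penalty (assumed form): c is the average candidate
    length, t the reference length."""
    if c <= 0:
        return 0.0
    return 1.0 if c > t else math.exp(1.0 - t / c)


def minksr_with_penalty(r, c, t):
    """Baseline MinKSR value r scaled by the penalty (assumed combination)."""
    return brevity_penalty(c, t) * r


print(brevity_penalty(12, 10))             # candidates longer than reference
print(minksr_with_penalty(0.8, 5, 10))     # candidates half as long as reference
```

Under this assumed form, candidates at least as long as the reference are not penalized, while short candidates are discounted exponentially, and a score that started in the range 0 to 1 stays in that range.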

5 Experiments

We conduct three kinds of experiments, namely comparison tests, correlation tests and human productivity tests, to compare MinKSR with two popular metrics, BLEU and TER. To obtain a comprehensive understanding, we measure human productivity from three perspectives: translation time, keystrokes and translation quality.

5.1 Experimental Setup

The experiments are conducted on English-to-Chinese translation. The statistical significance test is performed by the re-sampling approach [8].
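One common re-sampling test in MT evaluation is the paired bootstrap. The sketch below is a simplified illustration of that idea and not necessarily the exact procedure used in these experiments: it assumes per-sentence scores that can be summed, whereas a corpus-level metric such as BLEU would have to be recomputed on every resampled test set. The scores in the example are invented.

```python
import random


def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """Fraction of resampled test sets on which system A's total score beats B's."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        # draw n sentence indices with replacement, shared by both systems
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples


# Hypothetical per-sentence scores for two systems on the same test sentences.
system_a = [0.6, 0.7, 0.8, 0.5]
system_b = [0.5, 0.6, 0.7, 0.4]
print(paired_bootstrap(system_a, system_b))
```

If system A wins on at least 95% of the resampled sets, the difference is usually reported as significant at p < 0.05.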


Firstly, we re-implement the CoCat input method and a similar phrase-based MT system according to [7, 14]. The integrated MT system is trained on about 10,000,000 parallel sentence pairs of English-Chinese news.

Secondly, in order to conduct the productivity tests, we re-implement a similar CAT platform according to [7]. This platform allows us to analyze the translation time, keystrokes and translation quality in detail afterwards.

Next, we introduce the practitioners and experimental data of the human productivity tests:

Professional Translation Practitioners: Following convention, we recruited 12 professional translators for our study. We divided the 12 translators evenly into 4 groups (A/B/C/D). Each translator translated the same set of sentences from English to Chinese. All of the professional translators are native Chinese speakers.

Human Translation Experimental Data: We chose 480 sentences from the China news of China Daily (prior to December 2014) as the test set for human translators. This test set contains 11,869 English words. Each sentence ranges from 23 to 26 words. Then, we split the test data into 12 subsets randomly and evenly, as shown in Table 1. In Table 1, each subset contains 40 sentences and is used for one metric. All the translators in the same group run the exact same test.
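The random, even split described above can be reproduced with a few lines. This is only a sketch of the general technique: the function name, the fixed seed and the use of integer sentence identifiers are assumptions, and how subsets are then grouped into M1 to M4 and paired with metrics follows the tables rather than this code.

```python
import random


def split_evenly(items, parts, seed=0):
    """Shuffle `items` and cut them into `parts` subsets of equal size."""
    shuffled = list(items)
    if len(shuffled) % parts != 0:
        raise ValueError("item count must be divisible by the number of parts")
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // parts
    return [shuffled[i * size:(i + 1) * size] for i in range(parts)]


# 480 sentence identifiers split into 12 subsets of 40 sentences each.
subsets = split_evenly(range(480), 12)
print(len(subsets), len(subsets[0]))
```

Fixing the seed makes the assignment reproducible, which matters when the same subsets must be handed to different translator groups.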

The professional translators were asked to translate the text with four different assistant tools: (1) the Google Pinyin ("Google"); (2) the CoCat input method ("CoCat"); (3) post-editing with the Google Pinyin ("PE+Google"); (4) post-editing with the CoCat input method ("PE+CoCat"). Naturally, each human translator should translate different sentences when using different assistant tools and evaluation metrics. Table 2 shows the details of the permutation of assignments, inspired by previous works [6, 9].

Table 1. The statistics of the 4 groups of human translation test subset data M1/M2/M3/M4. Each group of test data contains 3 subsets, and each subset contains 40 sentences for one metric.

English-Chinese: #translators 12; male/female 6/6

Table 2. The permutation of assignments for each metric. Translation subsets M1–M4 are assigned to the human translator groups A–D under the various assistances.

Table 3. The comparison of BLEU, TER and MinKSR.

• Parts 1 and 2 are parallel experiments.

• "**" means the scores are significantly better than the corresponding previous lines with p < 0.05.

In the real world, there are many factors which may influence our experimental results, such as the different characteristics of the translators. To exclude translation-irrelevant factors and retain consistency, we process the user data strictly following [7].

5.2 Results and Analysis

(1) The Comparison Tests. To gain a general understanding of MinKSR, we first compare test results with two popular metrics, BLEU and TER. We choose a set of 4,040 English news sentences (56,149 words) from China Daily, which was translated into Chinese (81,113 characters, 36,995 words) by professional translators, and randomly split it into two parts, i.e., the repeated experiments "Part 1" and "Part 2" in Table 3. Each part is randomly divided into two groups: a development set (Dev) including 1,000 sentence pairs, and a test set (Test) including 1,020 pairs. The integrated MT system is tuned on the corresponding development set using ZMERT [15] with the objective of optimizing BLEU, TER and MinKSR respectively. The corresponding systems and results are denoted as "BLEU", "TER" and "MinKSR". Then, all translation results of the development sets and test sets are evaluated with BLEU, TER and MinKSR. In addition, we count the number of sentences which have perfect beginning fragments, and the results are labeled "Perfect Begin." We report all the results in Table 3.


If we focus only on the bold figures (e.g., 22.78 vs. 22.62) in Table 3, we can find that MinKSR performs very similarly to BLEU on corpus-level evaluation, while the difference between MinKSR and TER is much bigger (e.g., 22.62 vs. 21.49). This is reasonable, since both MinKSR and BLEU emphasize n-gram matching, as mentioned before. In contrast to BLEU and TER, the figures in Table 3 show that we can gain at least 1.79 and 0.23 MinKSR points by tuning the MT system with MinKSR on the test set. The scores show that it is possible to save more keystrokes by switching to a more fitting evaluation metric.

If we focus on the underlined figures in Table 3, we can find that MinKSR increases the share of perfect beginning fragments by at least 7.5 and 10.7% over TER and BLEU. As mentioned before, perfect beginning fragments are very important in the CAT scenario. Thus, the results are very significant.

To sum up, MinKSR is an extension of BLEU and performs very similarly to BLEU. It optimizes the intermediate MT results, such as the perfect beginning fragments, required by the CAT-oriented input method, while the BLEU scores of the final MT results remain almost unchanged.

(2) The Correlation Tests. We further test whether MinKSR scores are positively correlated with the practical keystroke saving ratio (PKSR) of the translation process. The translators retyped 2,040 pre-translated target (Chinese) sentences of the test set under different helper settings (a total of 8 times): the Google Pinyin input method (denoted as "Google"), the pure CoCat input method without MT ("CoCat-MT"), CoCat with the MT system but n-gram prediction disabled ("CoCat(−P)+MT"), and full-featured CoCat with the MT system ("CoCat(+P)+MT"). We report all the results in Table 4. The bold figures in Table 4 reveal that the CAT-oriented input method integrated with MinKSR gains over 1.10 and 0.44 PKSR points compared to TER and BLEU on the test set.

The correlation tests show that there is indeed a positive correlation between the MinKSR scores and the practical keystroke saving ratio.

Table 4. The practical keystroke saving ratio (%) based on the MT system tuned by BLEU, MinKSR and TER.

• Parts 1 and 2 are the repeated experiments.

• "**" means the scores are significantly better than the corresponding previous columns with p < 0.05.

(3) The Human Productivity Tests. Finally, we test the performance of the three metrics on the ultimate goal of MinKSR, namely, improving the productivity of human translators. We analyze human productivity in terms of translation time, keystrokes and translation quality. To improve robustness, we average the result values of repeated measurements. All the results are reported in Table 5. For clarity, the comparison statistics of translation time, keystrokes and translation quality over the various assistances are reported in Fig. 3. As we can see in Fig. 3, on average, human translators are faster and also achieve better translation quality using CoCat with MinKSR (translating from scratch or post-editing). Thus, the results in Table 5 further verify the feasibility of the MinKSR evaluation metric and the CAT-oriented input method.

Table 5. Translation time, keystrokes and translation quality.

"**" means the scores are significantly better than the corresponding previous lines with p < 0.01.

Fortranslationtimeandkeystrokes,theunderlinefiguresandtheboldfiguresinTable 5 showthatourproposedMinKSRalwayshelpshumantranslatorsusingCoCatinputmethodsignificantly(with p< 0.01),savingmorethan 1.59%timeandover4.85%keystrokescomparedwiththestrongbaseline,i.e., (line“MinKSRPE+CoCat”vs.line“BLEUPE+CoCat”andline“MinKSR PE+CoCat”vs.line“TERPE+CoCat”).

Fortranslationquality,thefiguresinTable 5 demonstratethatMinKSRcan helphumantranslatorsusingtheCoCatinputmethodimprovethetranslation qualitysignificantlyaswell(with p< 0 01)bymorethan1.6absoluteBLEU scoresoverthestrongbaseline,i.e.,(line“MinKSRPE+CoCat”vs.line“BLEU PE+CoCat”).

Inshort,thehumanproductivitytestsestablishthatMinKSRimproves actuallytheproductivityofhumantranslationsusingtheCAT-orientedinput method.

Insummary,theresultsofalltheaboveexperimentsareverypromising. UndertheguidanceofMinKSR,theunderlyingMTgeneratesmoreusefuldeep informationfortheCAT-orientedinputmethod.IntheCATscenario,MinKSR helpstheinputmethodreducesubstantiallyandfirmlythekeystrokesoftranslatingprocessforcoordinatinghumantranslators,andsignificantlyimprovesthe actualproductivityofhumantranslations.

Fig. 3. Comparisons of translation time (seconds), keystrokes and quality (BLEU).

6 Conclusion

In this paper, we proposed the MinKSR evaluation metric for coordinating human translators with the CAT-oriented input method. MinKSR evaluates both the final MT results and the intermediate results (e.g., n-best candidates, hypotheses and translation rules), and estimates the minimum number of keystrokes that the integrated MT system can save. MinKSR is an extension of BLEU. Under the guidance of MinKSR, the MT system generates more useful deep information for the input method. Experiments have shown that MinKSR helps the CAT-oriented input method substantially reduce the keystrokes of the translation process and significantly improve the productivity of human translation. Furthermore, MinKSR has been smoothly combined with the input method, the existing MT system and the CAT platform.

Acknowledgments. The research work has been partially funded by the Natural Science Foundation of China (NSFC) under Grant No. 61403379 and No. 61402123.


Pivot-Based Semantic Splicing for Neural Machine Translation

Harbin Institute of Technology, Harbin 150001, China
{Liudi,chzhu,tjzhao,wangxiaoxue,ymy}@mtlab.hit.edu.cn

Abstract. Current neural machine translation (NMT) usually extracts a fixed-length semantic representation for the source sentence, and then depends on this representation to generate the corresponding target translation. In this paper, we propose a pivot-based semantic splicing model (PBSSM) to obtain a semantic representation that includes more translation information for the source sentence, thus improving the translation performance of NMT. The spliced semantic representation is derived from the source languages of a trilingual parallel corpus by the pivot-based NMT. Besides, the proposed PBSSM only depends on one source language to generate its semantic representation during the encoding process. We integrate it into the NMT architecture. Experiments on the English-Japanese translation task show that our model achieves a substantial improvement of up to 22.9% (3.74 BLEU) over the baseline.

Keywords: Neural machine translation · Pivot-based translation · Semantic splicing

1 Introduction

Neural machine translation systems implemented as encoder-decoder networks with recurrent neural networks (Mikolov et al. 2010; Rumelhart et al. 1988; Sundermeyer et al. 2012) have achieved impressive performance in many translation tasks (Sutskever et al. 2014; Cho et al. 2014a). Current neural machine translation (NMT) methods usually extract a fixed-length semantic representation for the source sentence, and then generate the corresponding target translation depending on the representation (Sutskever et al. 2014). Obviously, the semantic representation obtained by the encoder is essential to NMT.

In order to obtain a more effective semantic vector, many researchers use multilingual parallel corpora to train a system that consists of multiple encoders and multiple decoders (Luong et al. 2015a; Dong et al. 2015; Ando and Zhang 2004; Cohn et al. 2007). Despite their success, these methods center around learning a semantic representation that depends on multilingual input to the encoder. However, they neglect the equivalent translation information between the multilingual inputs. This work shows that the equivalent translation information is beneficial for NMT.

In this paper, we put forward the pivot-based NMT model, which significantly improves the translation quality of English to Japanese. Based on the pivot-based NMT model, we further enrich the semantic representation, proposing a pivot-based semantic splicing model (PBSSM) that achieves a substantial improvement of up to 3.74 BLEU points over the baseline.

© Springer Nature Singapore Pte Ltd. 2016
M. Yang and S. Liu (Eds.): CWMT 2016, CCIS 668, pp. 14–24, 2016.
DOI: 10.1007/978-981-10-3635-4_2

2 Background

In this section, we mainly introduce pivot-based translation and the NMT model with an attention mechanism, RNN-Search (Bahdanau et al. 2015). On the basis of this work, we put forward the pivot-based NMT model and its semantic splicing extension model (PBSSM).

2.1 Pivot-Based Machine Translation

When bilingual parallel corpora from the source language to the target language are lacking, overall translation performance degrades. To solve the problem caused by the lack of parallel corpora, a pivot language is introduced. The pivot language acts as an intermediary that establishes a bridge from the source language to the target language. The pivot-based translation model is shown in Fig. 1.

The representative research methods of pivot-based translation can be divided into the phrase-based translation method (Cohn et al. 2007), the sentence-based translation method (Utiyama et al. 2007) and the corpus-based method (Hua et al. 2009). At present, research on pivot-based translation is mainly carried out in Statistical Machine Translation (SMT). Inspired by the success of NMT, we implement pivot-based machine translation with neural networks, and propose the pivot-based NMT model.

2.2 Attention-Based Neural Machine Translation

The basic NMT model consists of an encoder and a decoder. The encoder reads and encodes a source sentence, a sequence of vectors $x = (x_1, x_2, \ldots, x_m)$, into a semantic representation $c$. The decoder then generates one target word $y_j$ $(1 \le j \le n)$ at a time from the encoded semantic representation $c$. Motivated by the observation in (Cho et al. 2014a), Bahdanau et al. adopted an attention mechanism in the NMT model and proposed the attention-based neural machine translation model (Bahdanau et al. 2015) (Fig. 2).

Fig. 1. The pivot-based translation architecture

Fig. 2. The attention-based NMT model

The attention mechanism in the translation task allows the model to learn to align words when translating. The encoder of this model is constructed from a bidirectional recurrent neural network (BiRNN) (Schuster et al. 1997), which consists of a forward RNN $\overrightarrow{f}$ and a reverse RNN $\overleftarrow{f}$. When the encoder reads an input source sentence $x = (x_1, x_2, \ldots, x_m)$, the forward RNN $\overrightarrow{f}$ calculates a forward sequence of hidden states $(\overrightarrow{h}_1, \ldots, \overrightarrow{h}_m)$, and the reverse RNN $\overleftarrow{f}$ computes a backward sequence $(\overleftarrow{h}_1, \ldots, \overleftarrow{h}_m)$. At each position $j$ of the source sentence $x$, the annotation vector $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$ is obtained by concatenating the hidden states $\overrightarrow{h}_j$ and $\overleftarrow{h}_j$. The decoder generates a corresponding translation $y = (y_1, \ldots, y_n)$ with the beam-search algorithm. Given the encoded semantic representation and all the previously predicted words $(y_1, \ldots, y_{i-1})$, the decoder uses Eq. (1) to predict the next target word $y_i$:

$$p(y_i \mid y_1, \ldots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i) \tag{1}$$

where $g$ is a nonlinear function that outputs the probability of $y_i$, and $s_i$ is the decoder hidden state at time $i$, which is computed by Eq. (2):

$$s_i = f(s_{i-1}, y_{i-1}, c_i) \tag{2}$$

where $f$ is a nonlinear function and $c_i$ is related to the hidden states of the input sentence and is calculated by Eq. (3):

$$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j \tag{3}$$

where $T_x$ is the length of the source sentence and $\alpha_{ij}$ is computed by Eq. (4):

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} \tag{4}$$

where $e_{ij} = a(s_{i-1}, h_j)$ is an alignment model. It calculates a relevance score that measures how well the $j$-th encoded semantic representation of the input matches the output at position $i$. The score is computed from the decoder hidden state $s_{i-1}$ and the $j$-th hidden state $h_j$ of the encoder. The alignment model $a$ is jointly trained with all other parameters.
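The attention step referred to in Eq. (3) and Eq. (4) can be illustrated with a small sketch. It assumes the usual softmax normalization of Bahdanau et al. (2015) for the weights, and it replaces the trained alignment model $a$ with a plain dot product purely for illustration; the function names (`softmax`, `dot_score`, `context_vector`) and the toy vectors are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Normalize raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(s_prev, h_j):
    """Stand-in for the alignment model a(s_{i-1}, h_j).

    Assumption: a dot product is used here only to keep the sketch short;
    in the paper the alignment model is a trained component.
    """
    return sum(a * b for a, b in zip(s_prev, h_j))


def context_vector(s_prev, annotations, score=dot_score):
    """Return (weights, context) for one decoding step.

    weights[j] plays the role of alpha_ij and context the role of c_i,
    i.e. the weighted sum of the encoder annotations h_j.
    """
    e = [score(s_prev, h) for h in annotations]
    alpha = softmax(e)
    dim = len(annotations[0])
    c = [sum(w * h[d] for w, h in zip(alpha, annotations)) for d in range(dim)]
    return alpha, c


# Toy example: three source positions with 2-dimensional annotations.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, c = context_vector([2.0, 0.0], annotations)
```

With the previous decoder state `[2.0, 0.0]`, the first and third annotations receive equal scores and therefore equal weights, while the second annotation is down-weighted.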

3 The Framework of Semantic Splicing Extension

Our baseline is implemented with the attention-based neural machine translation model. In order to improve the translation performance of English to Japanese, we utilize multiple parallel corpora to strengthen the representation of the source sentence with the method of pivot-based translation, as illustrated in Fig. 3.

In Fig. 3, ① refers to a typical NMT structure, ② is an encoder process, and ③ is a decoder process. The part enclosed by the red dotted line is a typical structure of pivot-based translation, which consists of ①, ② and ③. It is the pivot-based NMT model that we propose. In addition, to enrich the semantic representation, we put the semantic representation of the source language (② on the left side) and that of the pivot language (② on the right side) together to get an extended semantic representation $c$, and then use $c$ to generate the target translation with the decoder (illustrated as ③). This is the other model that we propose, PBSSM.

Fig. 3. The framework of semantic splicing extension (Color figure online)

3.1 Pivot-Based Neural Machine Translation Model

When bilingual parallel corpora between the source language and the target language are scarce, the model performs poorly (shown in dotted lines in Fig. 1). Building on the research on pivot-based machine translation and neural machine translation, we combine their advantages to improve translation performance. We introduce a pivot language between the source language and the target language, because there exist rich parallel corpora from the source language to the pivot language and from the pivot language to the target language, which is crucial to the whole translation process.

In Fig. 4, Model 1 and Model 2 adopt the attention-based NMT model described in Sect. 2.2. After we finish training the two separate models, Model 1 can be used to translate the source language into the pivot language, and Model 2 is then used to translate the pivot language into the target language. Because Model 1 and Model 2 are two separate models, we try to use as much of the available source-to-pivot corpus as possible to improve the translation performance from the source language to the pivot language, and thereby that of the whole translation system.
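The two-stage pipeline of Model 1 followed by Model 2 can be sketched as a simple function composition. The sketch below is only illustrative: the word-lookup "models", the helper names (`make_lookup_translator`, `pivot_translate`) and the English, German and French toy vocabulary are hypothetical stand-ins for trained attention-based NMT systems, and are not the language pairs or models used in the paper.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator (stand-in for a trained NMT model)."""
    def translate(sentence: str) -> str:
        # Unknown words are copied through unchanged.
        return " ".join(table.get(word, word) for word in sentence.split())
    return translate


def pivot_translate(sentence: str, model_1: Translator, model_2: Translator) -> str:
    """Translate source -> pivot with model_1, then pivot -> target with model_2."""
    pivot_sentence = model_1(sentence)
    return model_2(pivot_sentence)


# Hypothetical toy models: source = English, pivot = German, target = French.
model_1 = make_lookup_translator({"the": "die", "cat": "Katze", "eats": "frisst"})
model_2 = make_lookup_translator({"die": "le", "Katze": "chat", "frisst": "mange"})

target = pivot_translate("the cat eats", model_1, model_2)
```

Because the two models are independent, either one can be retrained or replaced without touching the other; the cost is that any error made by Model 1 is passed on to Model 2.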

3.2 Pivot-Based Semantic Splicing Model

Most neural machine translation models are trained on bilingual parallel corpora. When we have multilingual parallel corpora, we can make full use of them to improve translation performance. Orhan et al. (2016) propose an attention-based encoder-decoder network that admits a shared attention mechanism with multiple encoders and decoders. They show experimentally that using the shared translation system with multiple parallel corpora can improve the system’s performance with less data.

Some of our datasets are trilingual parallel corpora. We can exploit the semantic similarity between the parallel corpora and treat a bilingual parallel corpus as input, which increases the input information and extends the semantic representation. This is shown in Fig. 5.

To extend the semantic representation, the system needs bilingual parallel corpora $lan\_src_1$ and $lan\_src_2$ as input. We use a function $\phi$ to establish a connection between the encoded vector $c'$ from $lan\_src_1$ and the other encoded vector $c''$ from $lan\_src_2$; thus we obtain from $\phi$ a new vector $c$ that represents the semantics of the bilingual source languages. Then the decoder generates the target language with $c$. As illustrated in Eq. (3), calculating the semantic representation is associated with the hidden states of the encoder. When calculating the hidden states, we create the connection between the hidden state $h'$ of $lan\_src_1$ and the other hidden state $h''$ of $lan\_src_2$, as displayed in Eq. (5) to Eq. (10), where

Before the forward hidden states are calculated by the forward RNN, they are randomly initialized with $\overrightarrow{h'_0}$; calculating $\overrightarrow{h'_i}$, $1 \le i \le T_{x'}$, with Eq. (9); initializing $\overrightarrow{h''_0}$

Fig. 5. The structure of extended semantic vector
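The idea of splicing two encoded vectors into one extended representation can be illustrated as follows. The exact form of the connecting function is given by Eq. (5) to Eq. (10), which are not reproduced in this excerpt, so the two variants below (concatenation and element-wise averaging) are assumptions chosen only to show what such a function might look like; the names `splice_concat` and `splice_mean` are hypothetical.

```python
from typing import List, Sequence


def splice_concat(c_src1: Sequence[float], c_src2: Sequence[float]) -> List[float]:
    """Assumed splicing function: concatenate the two encoded vectors.

    The result has dimension len(c_src1) + len(c_src2), so the decoder
    input size grows accordingly.
    """
    return list(c_src1) + list(c_src2)


def splice_mean(c_src1: Sequence[float], c_src2: Sequence[float]) -> List[float]:
    """Alternative assumed splicing function: element-wise average.

    Keeps the original dimension, but requires both vectors to have
    the same length.
    """
    if len(c_src1) != len(c_src2):
        raise ValueError("both encoded vectors must have the same dimension")
    return [(a + b) / 2.0 for a, b in zip(c_src1, c_src2)]


# Toy encoded vectors for the two source-side inputs (hypothetical values).
c_prime = [1.0, 3.0]
c_double_prime = [3.0, 5.0]

c_concat = splice_concat(c_prime, c_double_prime)
c_mean = splice_mean(c_prime, c_double_prime)
```

Concatenation preserves all information from both inputs at the cost of a larger decoder input, whereas averaging keeps the dimension fixed but mixes the two representations.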


This nondescript and beautiful species of the genus Scilla, is allied to Scilla præcox of Willdenow; but appears to differ in too many particulars, to admit of their being united. It is a native of Siberia, increases slowly by the root, but sometimes ripens seeds in this country. It commences flowering in the beginning of February, before the common Scilla bifolia; but continues in beauty long after that plant is past; and although perfectly hardy, its flowers are liable to be injured by strong frosts, unless occasionally protected. It thrives well in a light soil, and warm situation; but, like all dwarf plants, appears to most advantage in a pot: and indeed, succeeds best with the treatment usually given to alpine plants.

PLATE CCCLXVI.

GERANIUM BARBATUM. Var. Undulatum.

Bearded-leaved Geranium. Var. Waved-petalled.

CLASS XVI. ORDER IV.

MONADELPHIA DECANDRIA. Threads united. Ten Chives.

ESSENTIAL GENERIC CHARACTER.

Monogyna. Stigmata quinque. Fructus rostratus, pentacoccus.

One Pointal. Five summits. Fruit furnished with long awns, five dry berries.

SPECIFIC CHARACTER, &C.

Geranium. Foliis pinnatis, incisuris pinnarum aristatis barbatisque, petalis omnibus flavicantibus, rubro-notatis undulatisque.

Geranium. With winged leaves, the segments aristated and bearded, all the petals yellowish, marked with red, and undulated.

D. Pinnæ foliorum inæqualiter incisæ, incisuris acuminatis, barbatisque. Scapus ramosus. Flores umbellati. Petala omnia linearia, obtusa, elongata, recurvata, valde undulata, flavicantia, basi fere ad medium lætissime rubra. Stamina fertilia quinque.

REFERENCE TO THE PLATE.

1. The Empalement.

2. The Chives and Pointal.

3. The same magnified.

4. The Pointal magnified.

This plant was sent from the Cape to the collection of George Hibbert, Esq. at Clapham, where our drawing was taken in September; it is no more than a variety, although a very beautiful one, of the Geranium barbatum of this work, of which one variety has already been figured on plate 323. It is a green-house plant, and requires the same treatment as the other tuberous rooted species.

PLATE CCCLXVII.

ANAGALLIS GRANDIFLORA.

Great-flowered Pimpernel.

CLASS V. ORDER I.

PENTANDRIA MONOGYNIA. Five Chives. One Pointal.

ESSENTIAL GENERIC CHARACTER.

Capsula 1-locularis, circumscissa. Corolla rotata. Stamina hirsuta. Stigma capitatum.

Capsule one-celled, cut round. Corolla wheel-shaped. Chives hairy. Summit headed.

SPECIFIC CHARACTER, &C.

Anagallis, foliis ternatis cordato-ovatis acuminatis.

Anagallis, with leaves in threes heart-egg-shaped acuminated.

D. Radix annua. Rami elongati, effusi, procumbentes, angulati, superne simplices. Folia ternatim verticillata, remota, elliptica, acuminata, amplexicaulia, utrinque 3-5-lineata. Pedunculi ternatim verticillati, axillares, filiformes, primo patuli, demum sæpe recurvi, foliis duplo longiores. Calyx 4-rarius 5-phyllus, foliolis lanceolatis, acuminatis, carinatis, marginibus membranaceis. Corolla 5-rarius 4-petala, petalis basi confluentibus, patulis, orbiculatis, coccineis, basi intus nigris. Filamenta 5 hirsuta, atropurpurascentia, petalis multo breviora. Germen pallidum. Stylus pergracilis purpureus antheras luteas superans. Stigma simplex, capitulatum, viride.

REFERENCE TO THE PLATE.

1. A peduncle and calyx.

2. The corolla spread open.

3. The seed-bud and pointal, and summit magnified.

This new and elegant species of Anagallis, the largest and most showy of that genus hitherto discovered, was introduced into England, we believe, in the last year; but by whom, or from what country, we have not yet satisfactorily ascertained. It is reported to be of African origin, and to have come to England from the Paris garden. Our figure was made from a plant trained up near three feet high, in Lady De Clifford’s collection at Paddington, where it is treated as a green-house plant. We have not yet seen it produce good seeds, although apparently an annual plant; but it is easily increased by cuttings in the usual way. In every thing except size, and in having more entire petals, it very much resembles Anagallis arvensis; a plant truly remarkable for being the only one indigenous to Britain (the Poppies excepted) with scarlet flowers.

PLATE CCCLXVIII.

MELANTHIUM MASSONIÆFOLIUM.

Massonia-leaved Melanthium.

CLASS VI. ORDER I.

HEXANDRIA TRIGYNIA. Six Chives. Three Pointals.

ESSENTIAL GENERIC CHARACTER.

Calyx 0. Corolla infera, 6-petala, petalis staminiferis.

No Cup. Corolla beneath, 6-petalled, with the petals staminiferous.

SPECIFIC CHARACTER, &C.

Melanthium, foliis subrotundis prostratis sulcato-striatis, floribus spicatis.

Melanthium, with roundish prostrate sulcato-striated leaves, and spiked flowers.

D. Folia duo humi appressa, subrotunda, viridia, acumine obsoleto recurvato, striisque sulcatis parallelis circiter 12; subtus glabra pallidiora. Flores in spica perbracteata ut in Eucomide. Scapus clavatus teres. Bracteæ ovato-acuminatæ, magnæ; superiores paulo minores. Flores sessiles, bracteis multoties breviores, virides. Corolla hexapetaloidea, vix aperta, petalis sublanceolatis, obtusis, erectis, apicem versus incurvis. Filamenta brevissima, compressa, collo corollæ imposita, basi confluentia. Antheræ erectæ obsoletæ. Germen alato-triangulare, desinens in stylos 3 obsoletissimos, subulatos, et fere adnatos.

REFERENCE TO THE PLATE.

1. A floral leaf.

2. The corolla cut open.

3. A back view of the same.

4. The seed-bud and obsolete styles.

This singular plant appears to us to be a new, but somewhat anomalous species of the Genus Melanthium; which, as it at present stands, unquestionably contains several Genera. It is a native of the Cape, and a Green-house plant; and prior to flowering possesses altogether the appearance of a Massonia, and thrives very well with the treatment of one. Our drawing of it was taken from fine plants in the Hibbertian collection in the month of March.

PLATE CCCLXIX.

EUCOMIS PURPUREOCAULIS.

Purple-stalked Eucomis.

CLASS VI. ORDER I.

HEXANDRIA MONOGYNIA. Six Chives. One Pointal.

ESSENTIAL GENERIC CHARACTER.

Corolla infera, 6-partita, persistens. Filamenta nectario adnata.

Corolla beneath, 6-parted, persistent. Chives conjoined to the base of the corolla, forming a nectary.

SPECIFIC CHARACTER.

Eucomis, scapo clavato, foliis multifariis expansis orbiculato-spatulatis.

Eucomis, with a clavated scape, leaves pointing many ways expanded orbicular-spatula-shaped.

D. Radix ut in affinibus. Folia 5-7, multifaria, expansa, demum prostrata, orbiculato-spatulata, vel subinde multo angustiora, viridia, obsolete sulcato-lineata et lucida, marginibus minute cartilagineis, glabriusculis; subtus pallidiora, lucidiora, magisque sulcata. Scapus claviformis, perbrevis, crassus, atro-purpureus; intra flores valde contractus, viridis, purpureoque punctatus. Flores spicati, conferti, sessiles, sæpe adscendentes, unibracteati. Bracteæ imæ obcuneatæ, subrecurvæ, submembranaceæ, et sæpe purpurascentes; sensim minores; summæ longiores, lineari-lanceolatæ, purpureo-marginatæ, steriles; in coronam foliolorum perelegantem supra flores collectæ. Corolla hexapetaloidea, petalis subæqualibus, lineari-oblongis, vix attenuatis, viridibus. Filamenta 6, basi petalorum valde connata, subulata, compressa; superne incurvata. Antheræ flavescentes; post florescentiam fuscæ, pendulæ, ad apices petalorum vix attingentes. Germen sulcato-triangulare. Stylus flexuosoadscendens, teres, vix subulatus, filamentis multo brevior. Stigma nullum sive inconspicuum.

REFERENCE TO THE PLATE.

1. One of the lower floral leaves.

2. The corolla cut open.

3. The seed-bud and pointal.

We find no account of this fine plant in any publication we have consulted: it is closely allied to Eucomis regia, but differs sufficiently from that species in the shape of its leaves, and the smoothness of their margins. It is a greenhouse plant, and was lately introduced from the Cape by G. Hibbert, Esq. from a plant in whose collection our drawing was taken in the month of March.

PLATE CCCLXX.

POLYGALA TERETIFOLIA.

Cylindric-leaved Milkwort.

CLASS XVII. ORDER III.

DIADELPHIA OCTANDRIA. Two Brotherhoods. Eight Chives.

ESSENTIAL GENERIC CHARACTER.

Calyx 5-phyllus; foliolis duobus alæformibus, coloratis. Legumen obcordatum, biloculare.

Cup 5-leaved; with two of the leaves like wings, coloured. Pod inverse heart-shaped, two-celled.

SPECIFIC CHARACTER.

Polygala, floribus cristatis, racemis terminalibus paucifloris, alis calycinis ovatis acutiusculis multinerviis, caule fruticoso, foliis lineari-subulatis. Willd. Sp. Pl. 882.

Polygala, with cristated flowers, racemes terminal few-flowered, calyx-wings ovate acutish many-nerved, shrubby stem, and linear-awl-shaped leaves.

D. Ramuli filiformes, patuli, canescentes. Folia sparsa, sæpe conferta, recurva, et falcata, linearia, obtusa, marginibus revolutis, ut in Erica; supra canescentia: subtus cana. Racemi 2-5 flori. Pedunculi pubescentes.

REFERENCE TO THE PLATE.

1. A leaf.

2. The under surface of the same magnified.

3. The exterior part of the cup.

4. One of the wing-like leaves of the cup, outside.

5. The same inside.

6. The chives, keel and banner spread open.

7. The keel and its crest detached.

8. The same magnified.

9. The chives and banner.

10. The same magnified.

11. The seed-bud and pointal.

12. The same magnified.

Our plate represents the true species of Polygala, which we promised in our account of P. stipulacea. It is a green-house shrub, and rather delicate; yet may, with care, be propagated by cuttings; but is at present very scarce in this country. Its native country is the Cape. Our drawing was taken from a plant in the Clapham collection in the summer of 1803.

PLATE CCCLXXI.

POLYGALA ALOPECUROIDES.

Fox-tail Milkwort.

CLASS XVII. ORDER III.

DIADELPHIA OCTANDRIA. Two Brotherhoods. Eight Chives.

ESSENTIAL GENERIC CHARACTER.

Calyx 5-phyllus, foliolis duobus alæformibus, coloratis. Legumen obcordatum, biloculare.

Cup 5-leaved, with two of the leaves like wings, coloured. Pod inverse heart-shaped, two-celled.

SPECIFIC CHARACTER, &C.

Polygala, floribus imberbibus, pedunculis solitariis axillaribus, foliis fasciculatis ovatis mucronatis margine ciliatis. Willd. Sp. Pl. 890.

P. floribus imberbibus lateralibus, foliis fasciculatis lanceolatis mucronatis villosis. Thunb. Prod. 121.

Polygala, with flowers beardless, peduncles solitary axillary, leaves fascicled egg-shaped mucronated and ciliated on the margin.

P. with flowers beardless lateral, leaves fasciculated lance-shaped mucronated and villose.

D. Suffrutex elegans, ramulis hirtis. Folia valde conferta, fasciculata, pone medium recurva, 5-6 in singulo fasciculo, infimo majore, latiore, stipuliforme; omnia lineari-lanceolata, mucronata, hirta, ad margines valde ciliata. Flores axillares, sessiles, solitarii, minuti, purpurascentes, serrulati.

REFERENCE TO THE PLATE.

1. The empalement magnified.

2. The keel magnified.

3. One of the wings magnified.

4. The chives and pointal magnified.

5. The pointal detached and magnified.

This is the Polygala of the Heisteria family, which we last month engaged to lay before our readers. They will now have an opportunity, from our copious dissections of the flowers, of judging of the great and numerous generical differences which exist between a genuine Polygala and the discarded genus Heisteria. All the Heisteriæ we have yet had an opportunity of examining are heptandrous; all the true Polygalæ octandrous: but these are the least of their distinctions.

The Fox-tail Milkwort is a very elegant shrub of the green-house kind, and is often in flower. It was recently raised from Cape seeds in the Clapham collection, and is at present, we believe, in no other: thrives well in a mixture of bog earth and loam, and is capable of propagation by cuttings.

PLATE CCCLXXII.

MIMOSA PURPUREA.

Soldier Bush Mimosa.

CLASS XXIII. ORDER I.

POLYGAMIA MONOECIA. Various Dispositions. Upon one Plant.

ESSENTIAL GENERIC CHARACTER.

H. Calyx 5-dentatus. Cor. 5-fida. Stam. 5 seu plura. Pist. 1. Legumen.

Mascul. Calyx 5-dentatus. Cor. 5-fida. Stam. 5, 10, plura.

H. Cup 5-toothed. Blos. 5-cleft. Chives 5 or more. Pointal one. A Pod.

Male. Cup 5-toothed. Blos. 5-cleft. Chives, 5, 10, or more.

SPECIFIC CHARACTER, &C.

Mimosa, inermis, foliis conjugatis pinnatis, foliolis intimis minoribus. Linn. Sp. Pl. ed. 3. p. 1500.

M. foliis tergeminis. Plum. Ic. t. 10. f. 2.

Mimosa, unarmed, with leaves conjugate pinnate, and the inner leaflets smaller.

M. with leaves three times twinned.

REFERENCE TO THE PLATE.

1. A single flower.

2. The cup.

3. The blossom.

4. The seed-bud and pointal.

The Mimosa purpurea is a native of the West Indies, and is there known by the expressive appellation of Soldier Bush; from the plants being sometimes almost covered with their bright red-purple flowers, in which state it is said they are visible, and even cognizable, on the sides of hills, at the distance of a mile.
