Constructing DOM hierarchy from real-world website screen captures

Thisaranie Kaluarachchi

University of Colombo School of Computing, Colombo 0700, Sri Lanka

Abstract - Web UI/UX developers typically devote significant effort to designing and developing a desired website, as the website design is constantly reviewed and modified by end users or stakeholders. This repetition is a time-consuming and labor-intensive step in website development. In this paper, a novel automatic DOM hierarchy construction approach based on computer vision and machine learning is proposed to reduce the overhead of the iterative web UI prototyping process by creating the DOM for a website, which can eventually be translated into various front-end web frameworks. The proposed approach constructs the DOM hierarchy from a screen capture of a real-world website as a website mock-up design artifact. It detects and classifies web GUI elements in a website mock-up design artifact using computer vision techniques and a CNN-driven image classifier. To construct the DOM hierarchy for a website, heuristic algorithms are proposed that arrange classified web GUI elements in a tree structure. An in-depth analysis of the approach reveals that it is capable of (I) accurately detecting and categorizing web GUI elements in a screen capture of a real-world website and (II) generating DOM hierarchies similar to those designed by web developers.

Key Words: website mock-up design artifact, web GUI elements detection, web GUI elements classification, heuristic algorithm, automatic DOM hierarchy construction

1. INTRODUCTION

The Document Object Model (DOM) is an Application Programming Interface (API) for documents written in HTML and XML markup languages. It is an essential part of front-end web development. It provides a way for programmers to interact with and manipulate the structure of a website. With the help of the DOM, web developers can access and modify different parts of a webpage. The final mock-up design sketch may not always accurately represent the look and feel of the final webpage that the client expects. If the web UI/UX designer does not refine a mock-up design sketch before it reaches the developers, incorrect assumptions may be made during the iterative prototype development phase. As a result, the sketched prototype of the website may not satisfy the client's expectations. The mock-up design sketch then goes through iterations of different copies to fine-tune the prototype until an agreement is reached with the client. Functional GUI code should be iteratively modified with the DOM structure accordingly.

Web UI/UX developers typically devote significant effort to designing and developing a desired website, as the website design is constantly reviewed and modified by end users or stakeholders. This repetition is costly in terms of time and resources. As a result of these challenges, as well as the ever-changing web development trends with front-end code automation, an automatic website generation concept has emerged [1], partially to automate the web design and development workflow. Automatic website generation is a versatile solution to minimize the burden on the web design and development process while dramatically minimizing web development costs [2-4] and accidental mistakes [5].

According to a recent literature review on automatic website generation [1], researchers have invented several strategies to automate the UI prototyping aspect in the website development process, such as GUI element classification from visual designs, DOM hierarchy construction, and HTML code generation. However, automating the UI prototyping aspect of the website development process is still a big challenge. This motivates this research work, which presents a novel automatic DOM hierarchy construction approach from screen captures of real-world websites to reduce, to some extent, the burden and time wastage of the iterative web UI prototyping process.

As such, this research presents an algorithm to construct the DOM of a website from mock-ups. For the research baseline, three steps are considered necessary to create a functional approach: (I) identify GUI elements in a website screen capture, (II) determine web GUI elements and their related attributes, and (III) generate the corresponding DOM hierarchy as the last step. These steps are organized in a pipeline, and each step can be modified independently, facilitating the development and experimentation of different modules. The DOM construction approach presented in this paper was compared with the state-of-the-art webpage generation tools Sketch2Code [14] and pix2code [2]. Comparing the code they generate with the constructed DOM is, therefore, valid, and the results indicate that the approach produces DOM hierarchies closer to the target DOM hierarchies than Sketch2Code and pix2code.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 11 Issue: 12 | Dec 2024 www.irjet.net p-ISSN: 2395-0072

The remainder of this paper is organized as follows: Section 2 conducts a literature review of the current state-of-the-art methods researchers have used to automate the web UI prototyping process. Section 3 focuses on the design science research implementation of the proposed DOM hierarchy construction approach. Section 4 evaluates the approach by structurally comparing constructed DOM hierarchies to actual DOM hierarchies constructed by professional web developers. Section 5 expands on the discussion of results, and finally, Section 6 concludes the research by looking at the limitations and future directions of the proposed DOM hierarchy construction approach.

2. LITERATURE REVIEW

With website development trends changing yearly, the automatic website generation trend assists UI/UX developers in automating aspects of the web UI prototyping process, providing a versatile solution to significantly reduce the burden and time wasted in web design and development processes [1]. Two strategies researchers have invented to automate the web UI prototyping process are considered to motivate the proposed automatic DOM hierarchy construction approach. They are web GUI elements extraction and classification from visual designs [2,3,6-15] and DOM hierarchical construction [2,6,7,11,12,14,15].

2.1 Web GUI Elements Extraction and Classification

The related works show recent methods of extracting and classifying web GUI elements via computer vision (CV) and machine learning techniques. Classical CV techniques detect and classify the positions, sizes, and types of web GUI elements from visual designs. The studies [6,7] utilize various CV techniques, such as contour detection, edge detection, colour filtering, and line detection, to robustly classify web GUI elements, such as images, texts, titles, buttons, menus, inputs, etc., in an image. A drawback of CV techniques is that extending them to new GUI elements is challenging and time-consuming.

With advances in machine learning, new families of machine learning methods have emerged. Lately, the Convolutional Neural Network (CNN) has been immensely popular due to the introduction of deep learning. CNN is a supervised machine learning approach that can automatically learn robust, salient image category features from many labelled training images. Therefore, the neural network is used to automate image classification for websites. The studies [2,3,8-15] developed various CNN-powered vision models to detect and classify web GUI elements from visual designs.

The DOM helps developers access and change different parts of the page, such as headings, paragraphs, images, and forms. Therefore, the DOM can be thought of as a way to organize and control all the elements that make up a webpage. In the DOM, every element on a webpage, like a heading or a paragraph, is treated as an object. Using objects, the DOM provides a consistent and organized way for developers to interact with webpage elements. Therefore, classifying the web GUI elements of a webpage is an important step in automatically constructing a DOM hierarchy from a website mock-up design artifact.

2.2 Document Object Model (DOM) Hierarchical Construction

The related works show methods of constructing a suitable DOM hierarchy for a website with proper grouping of web GUI elements in web GUI containers. Beltramelli [2] uses a Domain Specific Language (DSL) language model based on a Long Short-Term Memory (LSTM) network to encode classified web GUI elements and their semantic relationships into an intermediate DSL representation that represents the DOM hierarchy of the website. Alexander Robinson [6] introduced a recursive algorithm that groups nested web GUI elements to build predefined web GUI containers. Finally, the hierarchical tree structure of the website is built by recursively grouping the classified web GUI containers. Nguyen et al. [7] introduced a method to detect and group mobile GUI elements into containers that generate hierarchical sub-trees. Then, the UI hierarchy for mobile applications is gradually constructed by merging those hierarchical sub-trees.

Yun et al. [12] proposed a method to construct the hierarchical structure of a UI sketch as an XML file utilizing the YOLO deep neural network, which extracted the structural and descriptive information of UI elements. Microsoft Sketch2Code [14] uses a layout construction algorithm to construct the DOM hierarchy of a website. Ferreira et al. [11] introduced a hierarchy reconstruction algorithm that iteratively integrates classified web GUI elements and containers to build the DOM hierarchy for a website. Similarly, to build the GUI skeleton from mobile design mock-ups, Chen C. et al. [15] utilized a Recurrent Neural Network (RNN) encoder and an RNN decoder with visual features of mobile GUI elements to produce the GUI skeleton as a token sequence representation, which is eventually converted to a UI hierarchy through a Depth First Traversal (DFT) algorithm.

Although generating a hierarchical UI representation is an active research field as suggested by these breakthroughs, constructing the DOM from visual inputs is still a nearly unexplored research area. Some research leverages machine learning to construct the DOM, whereas other methods rely on heuristic algorithms. Our research presents another method of constructing the DOM from a screen capture of a website by combining machine learning and heuristic algorithms.

3. APPROACH DESCRIPTION

The research study aims to develop a novel automatic DOM hierarchy construction approach that constructs DOM representations of websites from screen captures of real-world websites. Therefore, design science research methodology is followed to design and investigate a novel automatic DOM hierarchy construction approach as an artifact. Firstly, a series of image processing techniques (Canny's edge detection algorithm, morphological operations, etc.) is applied to infer the boundaries of web GUI elements and to extract images of each GUI element from website screen captures. The output of this phase is a set of X-axis and Y-axis coordinates on a website screen capture. Based on the derived X-axis and Y-axis coordinates, a collection of web GUI element images cropped from the input website screen capture is generated and fed to the next phase to be classified into web domain-specific element categories.
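The hand-off from detection to classification described above can be illustrated with a small sketch. It is not the paper's MATLAB implementation: it assumes the screen capture is a row-major list of pixel rows and that detection has already produced bounding boxes; `BoundingBox` and `crop_elements` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical structure)."""
    x: int       # left edge (X-axis coordinate)
    y: int       # top edge (Y-axis coordinate)
    width: int
    height: int


def crop_elements(image: Sequence[Sequence[int]],
                  boxes: Sequence[BoundingBox]) -> List[List[List[int]]]:
    """Return one cropped sub-image per bounding box, clipped to the image."""
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    crops = []
    for b in boxes:
        x0, y0 = max(b.x, 0), max(b.y, 0)
        x1, y1 = min(b.x + b.width, img_w), min(b.y + b.height, img_h)
        crops.append([list(row[x0:x1]) for row in image[y0:y1]])
    return crops


# Toy 4x6 "screen capture" whose pixel value is 10 * row + column.
screen = [[10 * r + c for c in range(6)] for r in range(4)]
crops = crop_elements(screen, [BoundingBox(x=1, y=1, width=3, height=2)])
print(crops[0])
```

Each cropped sub-image would then be passed to the classifier in the next phase.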

After identifying the bounding boxes of web GUI elements in a website mock-up design artifact, the cropped images of those elements are classified into their respective web domain-specific categories. For that, a CNN image classifier is trained to classify the images of web GUI elements. However, no existing dataset contains labelled atomic web GUI elements. Therefore, a dataset is generated, including all structural information necessary to construct an artifact to detect and classify web GUI elements. Screen captures of current real-world websites are considered and categorized into three datasets used by the proposed CNN to train, validate and test the image classifier. As the initial prototype of this approach, pure HTML5 and CSS-supported websites are targeted, and 620 real-world website blocks are collected from HTML website template providers on the internet as of December 2022.

Website blocks developed in pure HTML5 and CSS are manually selected. The following are excluded: HTML5 and CSS-supported website blocks developed in frameworks such as Bootstrap or PHP; website blocks with non-standard web GUI elements (gaming websites, for example, tend to use non-standard interactive web GUI elements that can be complicated to categorize); and duplicate websites that belong to more than one category. Similarly, while online advertisements are a regular feature of most prominent websites, they are manually blocked when screen captures are taken to prevent detecting non-native elements such as flash, videos and frames. The website screen captures are taken using PhantomJS in MATLAB, a third-party screenshot capturer module, or with browser extensions such as GoFullPage for the Google Chrome browser and the Lightshot screenshot tool for the Firefox browser. To maintain the consistency of height and width of all screen captures, the websites displaying in landscape orientation were reviewed, and their screen captures were verified at a resolution of 1280x1024.

To train the proposed CNN image classifier, a large set of images of GUI elements cropped from screen captures of real-world website blocks and annotated with their appropriate web domain-specific types is required. For that, a web scraping tool is used to manually traverse the GUI of the target website in a manner similar to the Depth First Search (DFS) algorithm to capture the bounding rectangle coordinates, giving the precise location, width, and height, and the meta information of each GUI element, which are then stored in XML format. This information is utilized in MATLAB to extract individual sub-images of each web GUI element displayed on the website mock-up design artifacts saved in the dataset. Finally, the individual sub-images of each GUI element are manually and precisely labelled with their respective web domain-specific types. This process produced a dataset containing screen captures of 620 distinct real-world website blocks with 33,120 sub-images of web GUI elements from the 12 most prominent native classes.

The web GUI element images dataset is segmented into training, validation, and testing datasets for use with the suggested CNN image classifier. The segmentation methodology described in [16] is followed to segment web GUI element images into training, validation, and testing datasets in proportions of 75%, 15%, and 10%, respectively. The suggested CNN architecture is similar to the architecture of the AlexNet CNN [17], which is one of the most commonly used convolutional neural networks for image classification, but it has two fewer convolutional layers (3 instead of 5). The suggested CNN is developed in MATLAB utilizing three toolkits: Neural Network [18], Computer Vision [19], and Parallel Computing [20]. After training the CNN image classifier, new unseen images of web GUI elements can be fed to the classifier, resulting in a series of classification scores corresponding to each element class. With the proposed approach, the web GUI element class with the highest degree of confidence is assigned as the label for a given target image. Section 4 evaluates the classification precision of the proposed CNN image classifier using the web GUI elements dataset described in this section.
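The 75%/15%/10% segmentation and the highest-confidence labelling rule can be sketched as follows. This is an illustration only: the shuffling, the fixed seed, and the names `split_dataset` and `top1_label` are assumptions, since the paper defers the segmentation details to [16] and implements the classifier in MATLAB.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_dataset(items: Sequence, seed: int = 0,
                  train_pct: int = 75, val_pct: int = 15) -> Tuple[List, List, List]:
    """Shuffle and split into training, validation, and testing subsets.

    Whatever is left after the training and validation shares
    (10% by default) becomes the testing subset.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # assumed: random assignment
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def top1_label(scores: Dict[str, float]) -> str:
    """Pick the class with the highest classification score."""
    return max(scores, key=scores.get)


# 33,120 stand-in items, matching the size of the dataset in the text.
train_set, val_set, test_set = split_dataset(range(33120))
print(len(train_set), len(val_set), len(test_set))

# Made-up classification scores for one element image.
print(top1_label({"Button": 0.71, "Hyperlink": 0.22, "Label": 0.07}))
```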

The final stage in the approach is to build the DOM hierarchy of a website with the proper grouping of web GUI elements in web GUI containers. HTML5 offers six semantic layout components (header, nav, section, article, aside, and footer) as GUI containers to define different parts of a webpage. Therefore, an algorithm is designed to derive HTML5 semantic layout components by grouping corresponding GUI elements. The algorithm requires each GUI element's coordinates, width, and height in the input website image. The algorithm traverses the image along the X- and Y-axis directions to draw circumscribed rectangles, which are rectangles that touch at least one point of the GUI element with each of their sides, while all pixels of the GUI element are contained within the internal area of the rectangle or on its borders.

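A circumscribed rectangle Rect_i is defined as a pair of coordinates, the top-left corner (x1r, y1r) and the bottom-right corner (x2r, y2r), that satisfy the following expression. Assume that the coordinates of a pixel p are denoted by xp and yp, and the image of a GUI element is denoted by ele_i (top-left corner (x1, y1) and bottom-right corner (x2, y2)).

\[
(x_{1r}, y_{1r}),\ (x_{2r}, y_{2r}) :\quad
\forall p \in ele_i \,\bigl( x_{1r} \le x_p \le x_{2r} \ \wedge\ y_{1r} \le y_p \le y_{2r} \bigr)
\ \wedge\
\exists p \in ele_i \,( x_p = x_{1r} ) \ \wedge\ \exists p \in ele_i \,( x_p = x_{2r} )
\ \wedge\ \exists p \in ele_i \,( y_p = y_{1r} ) \ \wedge\ \exists p \in ele_i \,( y_p = y_{2r} )
\]

That is, every pixel of the GUI element lies inside the rectangle or on its border, and each side of the rectangle touches at least one pixel of the element.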
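The definition of a circumscribed rectangle can be checked mechanically. The sketch below is illustrative only and assumes a GUI element is given as a list of (x, y) pixel coordinates; `circumscribed_rectangle` and `is_circumscribed` are hypothetical helper names, not functions from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]            # (xp, yp)
Rect = Tuple[int, int, int, int]   # (x1r, y1r, x2r, y2r)


def circumscribed_rectangle(pixels: List[Pixel]) -> Rect:
    """Smallest axis-aligned rectangle containing every pixel."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def is_circumscribed(rect: Rect, pixels: List[Pixel]) -> bool:
    """True if all pixels are inside and every side touches a pixel."""
    x1r, y1r, x2r, y2r = rect
    inside = all(x1r <= x <= x2r and y1r <= y <= y2r for x, y in pixels)
    touches = (any(x == x1r for x, _ in pixels)
               and any(x == x2r for x, _ in pixels)
               and any(y == y1r for _, y in pixels)
               and any(y == y2r for _, y in pixels))
    return inside and touches


element = [(3, 2), (4, 2), (5, 3), (4, 5)]
rect = circumscribed_rectangle(element)
print(rect, is_circumscribed(rect, element))
```

A rectangle that is too large fails the touching condition, and one that is too small fails the containment condition.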

Although a web GUI element may fall within more than one candidate circumscribed rectangle, every web GUI element should belong to exactly one circumscribed rectangle. Therefore, circumscribed rectangles can be regarded as web GUI containers containing any sequential web GUI elements. In constructing circumscribed rectangles, rectangles that intersect can be ignored because it is assumed that most web GUI containers in simple HTML5 and CSS-supported website blocks do not overlap. If a circumscribed rectangle satisfies the criteria of (I) not intersecting other circumscribed rectangles and (II) being structurally identical to one of the predefined HTML5 semantic layout components, it is extracted as a web GUI container with its associated elements. The algorithm is provided with a list of structural information of predefined HTML5 semantic layout components to ensure the derived containers are unique. The list of HTML5 semantic layout components comprises structural information of semantic layout components (x, y coordinates, width, and height) and their corresponding categories; the structural information is collected using a web scraping tool, and the corresponding categories are manually labelled. Finally, the algorithm generates a collection of HTML5 semantic layout components in the input website image, along with each layout component's structural information, category name, and GUI element composition. The pseudo-code of the algorithm is given in Algorithm 1. The result of the algorithm is depicted in Figure 1 and Figure 2.

Algorithm 1: Web GUI Container Derivation Algorithm

Input: elements - element1(x1, y1, width1, height1) … elementn(xn, yn, widthn, heightn) and HTML_containers - a list of predefined HTML5 semantic layout components

Output: containers (a list of containers with associated elements)

function CONTAINERDERIVATION(elements)
    containers ← []
    temp.containers ← []
    for all … do
        xa ← min(…, …)
        ya ← min(…, …)
        xb ← max((… + …), (…))
        yb ← max((… + …), (…))
        R ← rectangle(xa, ya)(xb, yb)
        if R … then
            … ← rectangle(xa, ya)(xb, yb)
            continue
        end if
        temp.containers[] ← R
    end for
    for all … do
        if … = NULL
            if … belongs to HTML_containers then
                containers ← …
            end if
        end if
    end for
    return containers
end function
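The two filtering criteria described for Algorithm 1 (no intersection with an already accepted rectangle, and a structural match with a predefined HTML5 layout component) can be sketched as below. Several parts are assumptions rather than the paper's method: candidate subsets of elements are passed in by the caller because the listing does not show how they are enumerated, `HTML_containers` is modelled as a name-to-box dictionary, and "structurally identical" is approximated by a pixel tolerance `tol`. All function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Box(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def circumscribe(elements: Sequence[Box]) -> Box:
    """Circumscribed rectangle of a subset of GUI elements."""
    xa = min(e.x for e in elements)
    ya = min(e.y for e in elements)
    xb = max(e.x + e.width for e in elements)
    yb = max(e.y + e.height for e in elements)
    return Box(xa, ya, xb - xa, yb - ya)


def intersects(a: Box, b: Box) -> bool:
    """True if the two rectangles share interior area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def match_category(rect: Box, html_containers: Dict[str, Box],
                   tol: int = 5) -> Optional[str]:
    """Assumed matching rule: all four fields agree within tol pixels."""
    for name, ref in html_containers.items():
        if all(abs(p - q) <= tol for p, q in zip(rect, ref)):
            return name
    return None


def derive_containers(subsets: Sequence[Sequence[Box]],
                      html_containers: Dict[str, Box]
                      ) -> List[Tuple[str, Box, List[Box]]]:
    temp: List[Tuple[Box, List[Box]]] = []
    for subset in subsets:
        rect = circumscribe(subset)
        # Criterion (I): skip rectangles intersecting an accepted one.
        if any(intersects(rect, kept) for kept, _ in temp):
            continue
        temp.append((rect, list(subset)))
    containers = []
    for rect, members in temp:
        # Criterion (II): keep only rectangles matching a known component.
        category = match_category(rect, html_containers)
        if category is not None:
            containers.append((category, rect, members))
    return containers


logo, link = Box(20, 10, 100, 40), Box(400, 15, 300, 30)
text, more = Box(20, 100, 600, 200), Box(20, 320, 600, 80)
known = {"header": Box(20, 10, 680, 40), "section": Box(20, 100, 600, 300)}
result = derive_containers([[logo, link], [text, more], [link, text]], known)
print([category for category, _, _ in result])
```

In this toy input the third subset spans both rows, so its rectangle intersects the two accepted ones and is dropped. Because acceptance depends on what was accepted earlier, the outcome of this sketch is order-dependent.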

Figure 1: Examples of Subsets Identified as Containers on a Website Block

Figure 2: Examples of Subsets Not Identified as Containers on a Website Block

Accordingly, all subsets of web GUI elements with circumscribed rectangles in blue, as shown in Figure 1, are containers and correspond to predefined HTML5 semantic layout components, whereas subsets with circumscribed rectangles in red, shown in Figure 2, are not containers because their rectangles intersect and they do not correspond to any of the predefined HTML5 semantic layout components.

Once the GUI container components are derived, the DOM hierarchy of a website is constructed with the proper arrangement of the derived web GUI containers and elements. For that, an iterative DOM hierarchy construction algorithm is designed to derive a realistic DOM hierarchy for a website gradually by utilizing derived web GUI containers as HTML5 semantic layout components, in which successive web GUI elements are grouped, and then constructing the DOM hierarchy with the proper layout of web GUI elements and containers.

According to the concept of child, parent, leaf, and internal nodes in graph theory, if, for instance, N1 and N2 are web GUI elements and the coordinates of N1 are greater than the coordinates of N2 (N1 > N2), then N2 is a child of parent N1. Accordingly, nodes can be classified into two types based on their orientation property: definite and indefinite. A node is definite if its orientation can be uniquely defined as horizontal or vertical, whereas a node is indefinite if it has both horizontal and vertical orientation simultaneously or its orientation cannot be determined. The structure of a website screen capture is a tree in which each element corresponds to a definite node with a parent-child relationship, and each element must be connected to exactly one parent, except for the root node, which has none. The DOM structure of a website is constructed from a screen capture using that concept through an iterative DOM hierarchy construction algorithm in which GUI containers are constructed as sub-trees, and the final tree is constructed by hierarchically arranging those sub-trees. The pseudo-code of the algorithm is given in Algorithm 2. The result of the algorithm is depicted in Figure 3.

Algorithm 2: DOM Hierarchy Construction Algorithm

Input: containers - a collection of derived HTML5 semantic layout containers for a website mock-up design artifact with their structural information, category name, and web GUI element composition

Output: |containers| == 1 and containers1 - the root node of the tree containing all web GUI containers and web GUI elements for a website mock-up design artifact

function DOMCONSTRUCTION(containers)
    for all … do
        for all … do
            orientation ← ….orientation
            if orientation is Horizontal then
                … ← select the element to the right of …
            else if orientation is Vertical then
                … ← select the element to the bottom of element_{i+1}
            if … is not NULL then
                if … is definite then
                    make … a child of …
                else
                    newElement ← new Element
                    make … a child of newElement
                    make … a child of newElement
                    newElement.orientation ← orientation
                    … ← newElement
                    … ← … \ …
                    i ← i + 1
                end if
            end if
        end for
        k ← k + 1
    end for
    for all … do
        orientation ← ….orientation
        if orientation is Horizontal then
            … ← select the container to the right of …
        else
            … ← select the container to the left of …
        end if
        i ← i + 1
    end for
    return containers1
end function

The algorithm iteratively processes each web GUI container to construct sub-trees with parent-child relationships of elements by merging each web GUI element into adjacent definite web GUI elements at each stage. The hierarchical arrangement of successive GUI elements within each GUI container is performed by evaluating their orientation according to graph theory. Using MATLAB, hierarchy construction heuristics are developed to build sub-trees so that web GUI elements are properly placed in GUI containers. Initially, the algorithm produces a collection of sub-trees with related web GUI elements arranged hierarchically. The algorithm is further extended to construct a realistic DOM hierarchy for a website by iteratively arranging the constructed sub-trees like a tree. The algorithm terminates when exactly one component (containers1) is constructed as the root representing a tree structure of web GUI containers and web GUI elements, each with a corresponding name (see Figure 3).
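The idea of building one sub-tree per container from element orientation, then arranging the sub-trees under a single root, can be sketched as follows. This is a heavily simplified illustration and not a transcription of Algorithm 2: the orientation test based on projection overlap, the "div" wrapper standing in for `newElement`, the "body" root, and the top-to-bottom ordering of containers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    x: int
    y: int
    width: int
    height: int
    orientation: Optional[str] = None   # "horizontal", "vertical", or None
    children: List["Node"] = field(default_factory=list)


def relative_orientation(a: Node, b: Node) -> Optional[str]:
    """Assumed rule: side by side is horizontal, stacked is vertical."""
    x_overlap = a.x < b.x + b.width and b.x < a.x + a.width
    y_overlap = a.y < b.y + b.height and b.y < a.y + a.height
    if y_overlap and not x_overlap:
        return "horizontal"
    if x_overlap and not y_overlap:
        return "vertical"
    return None  # indefinite


def wrap(name: str, nodes: List[Node], orientation: str) -> Node:
    """New parent whose box circumscribes the given nodes."""
    x1 = min(n.x for n in nodes)
    y1 = min(n.y for n in nodes)
    x2 = max(n.x + n.width for n in nodes)
    y2 = max(n.y + n.height for n in nodes)
    return Node(name, x1, y1, x2 - x1, y2 - y1, orientation, list(nodes))


def build_subtree(container: Node, elements: List[Node]) -> Node:
    """Group horizontally adjacent elements into rows, stack the rows."""
    rows: List[List[Node]] = []
    for node in sorted(elements, key=lambda n: (n.y, n.x)):
        if rows and relative_orientation(rows[-1][-1], node) == "horizontal":
            rows[-1].append(node)
        else:
            rows.append([node])
    container.children = [row[0] if len(row) == 1
                          else wrap("div", row, "horizontal") for row in rows]
    container.orientation = "vertical"
    return container


def build_dom(containers: List[Node]) -> Node:
    """Arrange container sub-trees top to bottom under one root."""
    return wrap("body", sorted(containers, key=lambda c: (c.y, c.x)), "vertical")


def preorder(node: Node) -> List[str]:
    return [node.name] + [n for child in node.children for n in preorder(child)]


header = build_subtree(Node("header", 20, 10, 680, 40),
                       [Node("Hyperlink", 400, 15, 300, 30),
                        Node("Image", 20, 10, 100, 40)])
section = build_subtree(Node("section", 20, 100, 600, 300),
                        [Node("Heading", 20, 100, 600, 40),
                         Node("Paragraph", 20, 150, 600, 200)])
root = build_dom([section, header])
print(preorder(root))
```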

Figure 3: A Sample of DOM Hierarchy Constructed as a Tree Structure

4. RESEARCH EVALUATION

The suggested approach is evaluated on two validation strategies: (I) the accuracy of the CNN-driven classifier on web GUI element images classification and (II) the similarity of the DOM hierarchies constructed by the approach to real DOM hierarchies constructed by web developers. For the experiment of the approach, the study consists of (I) a set of 33,120 labelled images of web GUI elements extracted from 620 distinct website screen captures to assess the accuracy of CNN-driven web GUI element classifiers, (II) 64 additional website screen captures (excluded from the dataset used to train and test the CNN image classifier) to serve as website mock-up design artifacts, and (III) two latest approaches on automatic website generation: Sketch2Code [14] and pix2code [2].

4.1 CNN-Driven Image Classifier Effectiveness

Table 1 shows the datasets utilized, broken down by web GUI element class. The suggested CNN was trained on the training dataset and verified on the validation dataset. All the unseen web GUI element images in the testing and validation datasets were collected from the screen captures of real website blocks and were distinct from those in the training dataset. The effectiveness of the proposed CNN-driven image classifier is evaluated by measuring the average top-1 classification precision across all web GUI element classes in the test dataset, as shown below.

\[
\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
\]

Here, True Positives are instances where the CNN image classifier correctly predicts the web GUI element class, whereas False Positives are instances where the classifier incorrectly assigns an image to a web GUI element class. Section 5 describes the classification capabilities of the CNN-driven image classifier, which are presented as a confusion matrix with the precision across classes.
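As a small worked example of this measure, the sketch below counts True Positives and False Positives per predicted class and averages the per-class precision. The labels are made up for illustration, the function names are hypothetical, and skipping classes that are never predicted (whose precision is undefined) is an assumption.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_precision(actual: Sequence[str],
                        predicted: Sequence[str]) -> Dict[str, float]:
    """Precision = TP / (TP + FP), computed for every predicted class."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[p] += 1      # prediction matches the actual class
        else:
            fp[p] += 1      # class p was predicted for another class
    return {c: tp[c] / (tp[c] + fp[c]) for c in set(predicted)}


def average_precision(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Mean of the per-class precision values."""
    scores = per_class_precision(actual, predicted)
    return sum(scores.values()) / len(scores)


actual = ["Button", "Button", "Label", "Image"]
predicted = ["Button", "Label", "Label", "Image"]
print(per_class_precision(actual, predicted))
print(average_precision(actual, predicted))
```

Here one Button image is mislabelled as a Label, which lowers the precision of the Label class only.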

Table 1: Distribution of Labelled Web GUI Element Images on Training, Validation and Testing Datasets

4.2 DOM Hierarchy Construction

The similarity of DOM hierarchies constructed by the approach was measured by comparing them to a ground truth set of DOM hierarchies and a set of DOM hierarchies constructed by two baseline website generation approaches: Sketch2Code [14] and pix2code [2]. For this part of the study, 64 website screen captures were selected from 620 distinct screen captures of real websites by randomly selecting four website blocks from each category to utilize as inputs to our approach, Sketch2Code, and pix2code, from which DOM hierarchies were generated. To compare the DOM hierarchies constructed by each approach, the runtime DOM hierarchies extracted dynamically with Selenium WebDriver from the websites generated by each approach are compared to the set of ground truth DOM hierarchies extracted from the original websites. The Selenium WebDriver representation of the website reflects the automatically generated GUI-related source code for each approach displayed at runtime on the device screen. It can be used to accurately compare the DOM hierarchy construction of web GUI elements and containers for each approach.

The runtime DOM hierarchies of all generated prototype websites for each approach are compared to the original, ground truth runtime DOM hierarchies (extracted with Selenium WebDriver) by deconstructing the DOM trees using pre-order traversal and using the Wagner-Fischer algorithm [21] implementation of Levenshtein edit distance to calculate the similarity between the DOM hierarchical representations of the runtime GUIs. The DOM hierarchies are deconstructed such that the element type and nested order of GUI elements are included in the hierarchy deconstruction. The pre-order traversal is implemented in this way to avoid small deviations in other attributes included in the Selenium WebDriver information, such as pixel values, given that the main goal of this evaluation is to measure hierarchical similarities. The three traditional edit operations supported by Levenshtein distance (insertion, deletion, and substitution) are considered in the measurement of edit distance. To more completely measure the similarity of the DOM hierarchies of prototype websites to the ground truth DOM hierarchies, a weighting schema is introduced that represents a penalty value for each type of edit operation required to transform one GUI element of a DOM into another, wherein, in the default case, each edit operation carries an identical weight of 1/3.
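The comparison procedure can be illustrated with a short sketch: a pre-order deconstruction of each tree into tokens, followed by a Wagner-Fischer dynamic program with one penalty per edit operation. The token format ("depth:tag") is an assumption chosen to keep both the element type and the nesting level, and the trees are toy examples rather than data from the study.

```python
from typing import List, Sequence, Tuple

Tree = Tuple[str, List["Tree"]]   # (element type, children)


def preorder_tokens(tree: Tree, depth: int = 0) -> List[str]:
    """Deconstruct a tree in pre-order, keeping type and nesting depth."""
    tag, children = tree
    tokens = [f"{depth}:{tag}"]
    for child in children:
        tokens.extend(preorder_tokens(child, depth + 1))
    return tokens


def weighted_levenshtein(source: Sequence[str], target: Sequence[str],
                         w_ins: float = 1 / 3, w_del: float = 1 / 3,
                         w_sub: float = 1 / 3) -> float:
    """Wagner-Fischer edit distance with a penalty per operation type."""
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * w_del
    for j in range(1, n + 1):
        dist[0][j] = j * w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if source[i - 1] == target[j - 1] else w_sub
            dist[i][j] = min(dist[i - 1][j] + w_del,      # delete from source
                             dist[i][j - 1] + w_ins,      # insert into source
                             dist[i - 1][j - 1] + sub)    # match or substitute
    return dist[m][n]


target: Tree = ("body", [("header", [("img", []), ("a", [])]),
                         ("section", [("p", [])])])
shallow: Tree = ("body", [("img", []), ("a", []), ("p", [])])

distance = weighted_levenshtein(preorder_tokens(shallow), preorder_tokens(target))
print(preorder_tokens(shallow))
print(distance)
```

Raising `w_ins` relative to the other weights penalizes generated trees that lack inner nodes, which is the effect examined when the insertion penalty is varied.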

5. RESULT DISCUSSION

5.1 Results: CNN-Driven Image Classifier Effectiveness

To evaluate the results produced by the CNN-driven image classifier, a confusion matrix is presented to illustrate the classification precision across the 12 most popular web GUI elements. Table 2 shows the confusion matrix generated by the CNN-driven image classifier. The X-axis of the confusion matrix represents the web GUI element class predicted by the classifier, and the Y-axis represents its actual web GUI element class. The first column of the confusion matrix represents the total number of each web GUI element in the test dataset, while the numbers in the subsequent columns correspond to the percentage of each class on the Y-axis that were classified as web GUI elements on the X-axis. Thus, the diagonal of the matrix, highlighted in green (see Table 2), corresponds to the true classification (i.e., True Positives) of each web GUI element class, whereas False Positives are shown in the other cells.

Table 2: Confusion Matrix for CNN-driven Image Classifier

Column Heading Abbreviations for Web GUI Elements: Heading (H), Paragraph (P), Image (I), Hyperlink (HL), Dropdown Menu (DM), Button (B), Google Map (GM), Text Field (TF), Text Area (TA), Icon (IC), Search Bar (SB) and Label (L).
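The layout described for Table 2 (a total per actual class, followed by the percentage of that class assigned to each predicted class) can be reproduced with a few lines. The sketch uses made-up labels with three of the abbreviations above; `confusion_rows` is a hypothetical helper, not part of the paper's tooling.

```python
from typing import Dict, Sequence, Tuple


def confusion_rows(actual: Sequence[str], predicted: Sequence[str],
                   classes: Sequence[str]
                   ) -> Dict[str, Tuple[int, Dict[str, float]]]:
    """Map each actual class to (total, {predicted class: percentage})."""
    counts = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    rows = {}
    for a in classes:
        total = sum(counts[a].values())
        rows[a] = (total, {p: (100.0 * counts[a][p] / total if total else 0.0)
                           for p in classes})
    return rows


classes = ["B", "HL", "L"]
actual = ["B", "B", "B", "B", "HL", "HL", "L", "L"]
predicted = ["B", "B", "B", "HL", "HL", "HL", "L", "B"]
rows = confusion_rows(actual, predicted, classes)
for name, (total, percentages) in rows.items():
    print(name, total, percentages)
```

The diagonal entries (for example, the B percentage in the B row) are the share of each class that was classified correctly.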

The CNN-driven image classifier achieves a top-1 precision of 89.8% on the prediction of web GUI elements, which is the overall probability of true classifications. Therefore, it is experimentally shown that the CNN-driven image classifier employed in the proposed approach classifies web GUI elements with high precision.

5.2 Results: DOM Hierarchy Construction

An important step in the DOM hierarchy construction process is the automatic generation of a DOM hierarchy, enabling a well-structured layout and the proper visualization of web GUI elements in HTML5 web GUI containers. The evaluation of the approach's DOM hierarchy construction compares it against the two baseline website generation approaches, Sketch2Code and pix2code, by decomposing the runtime DOM hierarchies into trees and measuring the Levenshtein edit distance between the generated trees and target trees as described in Section 4.2. By varying the penalty values prescribed to each edit operation, a more comprehensive understanding of the similarity of the generated DOM hierarchies can be gained by observing, for instance, whether certain DOM hierarchies were more or less shallow than the real websites, by examining the performance of insertion and deletion edits.

Figure 4 compares the DOM hierarchies generated by our approach, Sketch2Code, and pix2code based on tree edit distance (I–III), which measures the minimum cost of editing operations when transforming one tree into another. Each graph illustrates results for a different edit operation, and the proposed approach, Sketch2Code, and pix2code are represented by lines in three different colours with the edit distance shown on the Y-axis and the penalty prescribed to the edit operation on the X-axis.

The graphs show that the Sketch2Code and pix2code approaches tend to produce relatively shallow DOM hierarchies.

Figure 4: Similarity of DOM Hierarchies Constructed by WebDraw, pix2code and Sketch2Code based on Levenshtein Edit Distances: (I) Insertion Edits, (II) Substitution Edits, (III) Deletion Edits

When the penalty for the insertion operation is increased, our approach outperforms both Sketch2Code and pix2code. This is because our approach requires fewer insertion operations into the DOM hierarchies to match the target DOM hierarchies precisely. However, Sketch2Code and pix2code require adding more inner nodes to the DOM trees because they produce very shallow DOM hierarchies. The evaluation shows that the shallow DOM hierarchies produced by the pix2code and Sketch2Code approaches match the target DOM hierarchies and our approach's DOM hierarchies. When rendering a website in a web browser, a shallow DOM hierarchy is preferred, and our approach builds DOM hierarchies that are significantly similar to target DOM hierarchies and tend to be slower than Sketch2Code and pix2code.

6. FUTURE DIRECTIONS AND CONCLUSIONS

Although the proposed approach is an effective tool for constructing DOM hierarchies, it has some practical limitations, some of which present future research opportunities in automatic DOM hierarchy construction. When websites are screen-captured from desktop or laptop computers, the iterative DOM hierarchy construction algorithm is currently tied to the website's appearance on a particular screen size. Although this algorithm can be generalized to work regardless of screen size using display density-independent pixel (dp) values, this is a task for the future.

The current CNN-driven web GUI element classifier can classify incoming images into one of the top 12 web GUI elements. Thus, it currently does not support unusual web GUI element types. Future research studies could examine CNN architectures with greater classification capacity (for instance, CNN architectures in deep learning to classify more web GUI elements with web GUI containers), or researchers could even examine newly emerging architectures such as hierarchical CNNs [22]. The DOM hierarchy construction approach currently operates in two steps to detect and classify web GUI elements from a website mock-up design artifact; however, future research studies will investigate the potential of object detection classifiers in CNN [23,24] that can perform these object detection and classification steps simultaneously.

The ever-growing field of machine learning has led researchers to a significant step forward: improving UI designs to provide a better functional user experience at a lower cost. With the discovery of the approach, researchers can explore object detection-oriented CNN classifiers in the future to provide better support in the web GUI elements detection phase and object classification. Similarly, the approach can be deployed as an automatic website prototype generation tool that converts the constructed DOM hierarchies into functional HTML/CSS code.

The novel automatic DOM hierarchy construction approach presented in this paper will benefit web UI/UX developers by speeding up the iterative web UI prototyping process in their workflow: it automatically generates the DOM for a website in the format of their choice. The approach therefore addresses the need for an automatic DOM hierarchy construction strategy within automatic web UI prototyping, reducing the burden and wasted time of the iterative web UI prototyping process.

ACKNOWLEDGEMENT

This research received no specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

REFERENCES

[1] Kaluarachchi, T., & Wickramasinghe, M. (2023). A systematic literature review on automatic website generation. Journal of Computer Languages, Volume 75, 2023, 101202, ISSN 2590-1184, https://doi.org/10.1016/j.cola.2023.101202

[2] Beltramelli, T. (2018, June). pix2code: Generating code from a graphical user interface screenshot. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems (pp. 1-6).

[3] Suleri, S., Sermuga Pandian, V. P., Shishkovets, S., & Jarke, M. (2019, May). Eve: A sketch-based software prototyping workbench. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-6).

[4] Moran, K., Bernal-Cárdenas, C., Curcio, M., Bonett, R., & Poshyvanyk, D. (2018). Machine learning-based prototyping of graphical user interfaces for mobile apps. IEEE Transactions on Software Engineering, 46(2), 196-221.

[5] Ozkaya, Are DevOps and automation our next silver bullet? IEEE Softw. 36 (4) (2019) 3–95.

[6] Robinson, A. (2019). Sketch2code: Generating a website from a paper mock-up. arXiv preprint arXiv:1905.13750.

[7] T. A. Nguyen and C. Csallner, Reverse Engineering Mobile Application User Interfaces with REMAUI (T). 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, NE, USA, 2015, pp. 248-259, DOI: 10.1109/ASE.2015.32.

[8] Hassan, S., Arya, M., Bhardwaj, U., & Kole, S. (2018). Extraction and Classification of User Interface Components from an Image. International Journal of Pure and Applied Mathematics, 118(24), 1-16.

[9] Aşıroğlu, B., Mete, B. R., Yıldız, E., Nalçakan, Y., Sezen, A., Dağtekin, M., & Ensari, T. (2019, April). Automatic HTML code generation from mock-up images using machine learning techniques. In 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT) (pp. 1-4). IEEE.

[10] Kim, B., Park, S., Won, T., Heo, J., & Kim, B. (2018, October). Deep-learning based web UI automatic programming. In Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems (pp. 64-65).

[11] J. Ferreira, A. Restivo, H. S. Ferreira, Automatically Generating Websites from Hand-Drawn Mock-Ups, In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2021.

[12] Yun, Y. S., Jung, J., Eun, S., So, S. S., & Heo, J. (2019). Detection of GUI elements on sketch images using object detector based on deep neural networks. In Proceedings of the Sixth International Conference on Green and Human Information Technology: ICGHIT 2018 (pp. 86-90). Springer Singapore.

[13] Kaluarachchi, T. T., Kumara, H. M. M. S. D., & Wickramasinghe, M. I. E. (2020). Turning GUI into HTML source code with Image processing and Deep learning. PGIS International Research Congress (RESCON) 2020, Faculty of Science, University of Peradeniya, Sri Lanka.

[14] Sketch2Code - Microsoft AI labs, 2019, https://sketch2code.azurewebsites.net/

[15] Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing, Yang Liu, From UI Design Image to GUI Skeleton: A Neural Machine Translator to Bootstrap Mobile GUI Implementation, in: Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, ACM, New York, NY, USA, 2018, pp. 665–676.

[16] M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. Cham: Springer International Publishing, 2014, pp. 818–833. [Online]. Available: https://doi.org/10.1007/978-3-319-10590-1_53

[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

[18] MATLAB Neural Network Toolbox: https://www.mathworks.com/products/neural-network.html

[19] MATLAB Tesseract OCR Library: https://www.mathworks.com/products/computer-vision.html

[20] MATLAB Parallel Computing Toolbox: https://www.mathworks.com/products/parallel-computing.html

[21] Wagner-Fischer algorithm: https://en.wikipedia.org/wiki/Wagner-Fischer_algorithm

[22] R. Girshick, J. Donahue, T. Darrell, and J. Malik, 2014. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, ser. CVPR ’14. Washington, DC, USA: IEEE Computer Society, pp. 580–587.

[23] S. Ren, K. He, R. Girshick, and J. Sun, 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, ser. NIPS ’15. Cambridge, MA, USA: MIT Press, pp. 91–99. [Online]. Available: http://dl.acm.org/citation.cfm?id=2969239.2969250

[24] Gu, Jiuxiang, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu et al., 2018. Recent Advances in Convolutional Neural Networks. Pattern Recognition, 77, 354-377.

BIOGRAPHIES

Thisaranie Kaluarachchi received a B.Sc. degree in Computer Science from the Faculty of Science at the University of Peradeniya in Kandy, Sri Lanka. She is pursuing her PhD in Computing at the University of Colombo School of Computing in Colombo, Sri Lanka. Her research interests include Artificial Intelligence, Machine Learning, Deep Learning, and Computer Vision.