Reinforcement Learning Strategies for Dynamic Resource Allocation in Cloud-Based Architectures


Abstract

Cloud computing is becoming an increasingly popular approach to providing efficient and scalable solutions for the high demands of modern applications. One of the key challenges in this environment is efficiently allocating resources to meet the varying needs of different users and applications. Reinforcement learning (RL) is a promising approach for tackling this problem, as it allows dynamic adaptation to changing conditions. This paper proposes a framework that uses RL strategies for optimal resource allocation in cloud-based architectures. Our approach uses agents that interact with the cloud environment and learn to allocate resources based on feedback from the environment. These agents also consider different applications' varying resource demands and priorities to optimize the overall resource allocation. Our experiments show that our proposed RL framework outperforms traditional resource allocation methods in terms of utilization and response time. It also adapts well to changing conditions and is robust to unexpected changes in the workload. Our approach has the potential to significantly improve resource utilization, reduce costs, and enhance the user experience in cloud-based architectures. Additionally, our framework can be easily extended to handle complex multi-objective resource allocation problems, making it a versatile approach for dynamic resource allocation in cloud environments.

Keywords: High Demands, Dynamic Adaptation, Allocate Resources, Robust, Reduce Costs

1. Introduction

Dynamic resource allocation is a key feature of cloud-based architectures, which refers to the ability of a system to automatically provision and scale resources depending on the current workload and demand [1]. It allows for efficient and effective utilization of resources and ensures high availability and performance of applications. In a cloud-based architecture, resources such as virtual machines, storage, and network bandwidth are shared among multiple users [2]. As the demand for resources can vary greatly depending on the workload and usage patterns, it is important to have a dynamic system that can allocate resources as needed, rather than a fixed allocation that may result in underutilization or oversubscription [3].

There are several components involved in dynamic resource allocation. The first is the Resource Pool, a pool of resources that can be allocated to applications on demand. This pool can include physical and virtual resources, such as servers and storage devices [4]. The second component is the Resource Manager, responsible for managing and allocating resources from the pool. It receives resource requests from different applications and allocates them based on predefined rules and policies [5].

Dynamic resource allocation refers to assigning computing resources, such as processing power, storage, and memory, to different applications or services based on their current demand. In cloud-based architectures, where resources are shared among multiple users and applications [6], dynamic resource allocation plays a crucial role in ensuring efficient and optimal utilization of resources. However, it also poses several challenges that must be addressed to deploy and operate cloud-based architectures successfully [7]. One of the main issues is the management of resource contention. As multiple applications compete for resources [8], demand may exceed the available supply, resulting in contention. This can lead to performance degradation or system failures if not managed properly [9]. Therefore, cloud providers must implement robust resource allocation algorithms and policies to address this challenge. Another challenge is the difficulty of accurately predicting resource demand [10]. As the demand for resources constantly fluctuates, it is often challenging to determine the exact amount needed to execute an application or service. This can result in either under-provisioning, leading to poor performance, or over-provisioning, resulting in wasted resources and increased costs. The main contributions of the research are as follows:

• Improved efficiency: Dynamic resource allocation in cloud-based architectures enables the allocation of computing, storage, and network resources in a flexible and scalable manner. This allows for efficient utilization of resources, ensuring that no resources are wasted and minimizing overall costs.

Volume: 11 Issue: 10 | Oct 2024 | www.irjet.net | e-ISSN: 2395-0056 | p-ISSN: 2395-0072

• Seamless scalability: With dynamic resource allocation, cloud-based architectures can quickly respond to spikes in workload demand by scaling up resources accordingly. This ensures that services are always available to users and can handle demand fluctuations without interruption.

• Cost optimization: Cloud-based architectures can optimize resource allocation and minimize costs by dynamically allocating resources based on workload demand. This is particularly beneficial for businesses with varying workload demands, as they can avoid over-provisioning and only pay for the resources they need at any given time. This results in significant cost savings for organizations.
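The Resource Pool and Resource Manager components described in the introduction can be sketched as follows. This is a minimal illustration: the class names, the unit-based accounting, and the simple grant-if-available policy are assumptions for the sketch, not part of any specific cloud API.

```python
from dataclasses import dataclass, field

@dataclass
class ResourcePool:
    capacity: int      # total units (e.g., vCPUs) in the pool
    allocated: int = 0

    def available(self) -> int:
        return self.capacity - self.allocated

@dataclass
class ResourceManager:
    pool: ResourcePool
    grants: dict = field(default_factory=dict)

    def request(self, app: str, units: int) -> bool:
        # Simple policy: grant only if the pool can satisfy the request;
        # a real manager would also apply priorities and quotas.
        if units <= self.pool.available():
            self.pool.allocated += units
            self.grants[app] = self.grants.get(app, 0) + units
            return True
        return False  # contention: request denied

    def release(self, app: str) -> None:
        self.pool.allocated -= self.grants.pop(app, 0)

mgr = ResourceManager(ResourcePool(capacity=8))
assert mgr.request("web", 4)        # granted
assert not mgr.request("batch", 6)  # denied: only 4 units remain (contention)
mgr.release("web")
assert mgr.request("batch", 6)      # granted after the release
```

The denied request illustrates the contention problem discussed above: with a fixed pool, demand can exceed supply, which is precisely what an RL-based allocator aims to manage adaptively.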

The remaining part of the research is organized as follows. Chapter 2 describes the recent works related to the research, Chapter 3 describes the proposed model, and Chapter 4 describes the comparative analysis and results. Finally, Chapter 5 describes the conclusion and future scope of the research.

2. Related Works

Schuler, L., et al. [11] have discussed AI-based resource allocation, which uses reinforcement learning, a type of machine learning, to continuously monitor and adjust resource allocation in a serverless environment. This enables efficient autoscaling, automatically increasing or decreasing resources based on real-time demand and optimizing performance and cost. Wei, Y., et al. [12] have discussed an approach that utilizes reinforcement learning to dynamically adjust the resources of a Software-as-a-Service (SaaS) provider in a changing cloud environment. It learns from past data and user interactions to make intelligent predictions and optimize resource allocation, ensuring efficient and cost-effective customer service delivery. Penmetcha, M., et al. [13] have discussed a deep reinforcement learning-based dynamic computational offloading method for cloud robotics, which uses machine learning algorithms to optimize the distribution of tasks between the robot and the cloud, thereby reducing the burden on the robot's computational resources. This allows for efficient and adaptable decision-making in real time, leading to improved overall performance and functionality of cloud robotics systems. Rosenberger, J., et al. [14] have discussed a deep reinforcement learning multi-agent system for resource allocation in industrial IoT systems. This complex decision-making system uses machine learning algorithms to optimize resource allocation in industrial IoT systems. It leverages the power of multiple agents and deep reinforcement learning to dynamically adapt and optimize resource allocation, improving efficiency and performance. Guo, W., et al. [15] have discussed cloud resource scheduling, which refers to the process of allocating virtual resources (e.g., computing power, storage) in a cloud environment. Deep reinforcement learning and imitation learning are advanced machine learning techniques that can optimize this process by using data and experience to make intelligent decisions on resource allocation, resulting in improved efficiency and cost optimization.

Ran, L., et al. [16] have discussed an SLAs-aware online task scheduling method based on deep reinforcement learning, which aims to optimize task allocation and resource utilization in a cloud environment. Deep reinforcement learning can dynamically adjust the scheduling strategy based on incoming tasks and their corresponding SLA requirements, ensuring efficient and reliable task allocation. Mangalampalli, S., et al. [17] have discussed deep reinforcement learning (DRL), a machine learning technique that enables agents to learn optimal decision-making policies through trial and error. In cloud computing, DRL-based task-scheduling algorithms use this approach to allocate computing resources and schedule tasks efficiently and autonomously. Khani, M., et al. [18] have discussed deep reinforcement learning-based resource allocation in multi-access edge computing, a method that uses artificial intelligence techniques to optimize the allocation of computing resources in edge computing networks. It utilizes a feedback loop to continuously learn and make decisions on efficiently distributing resources for different tasks, resulting in improved performance and energy efficiency. Joseph, C. T., et al. [19] have discussed fuzzy reinforcement learning-based microservice allocation, a technique that leverages the principles of fuzzy logic and reinforcement learning to dynamically allocate microservices in cloud computing environments. It uses historical data and user feedback to make intelligent decisions about distributing microservices among available resources, optimizing performance and resource utilization. Jyoti, A., et al. [20] have discussed dynamic resource provisioning in cloud computing, which is the process of automatically allocating or scaling computing resources based on the current demand or load. This can be achieved through load balancing techniques and service broker policies, which help efficiently manage resources and ensure optimal performance for cloud services.

3. Proposed model

The proposed Reinforcement Learning Strategies for Dynamic Resources model is designed to optimize resource allocation, management, and utilization in a dynamic and evolving environment. It utilizes reinforcement learning, an artificial intelligence technique that learns through trial and error, to make decisions and take actions that maximize resource efficiency and performance. The model has three components: the environment, the agent, and the rewards and penalties system.

In this subsection, we define the reward function, the value function, and the Q function of this MDP for reinforcement learning.

We then define the value function of each state, V^π(s), denoting the expected total discounted reward for an agent starting from state s and following the policy π:

V^π(s) = E_π[ Σ_{t=0}^{∞} γ^t r_t | s_0 = s ]    (1)

Among all policies π, there exists an optimal policy π* that maximizes the expected total reward:

V^{π*}(s) = max_π V^π(s)    (2)
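The value function and optimal policy above can be illustrated with a short value-iteration sketch. The two-state MDP below, including its transition probabilities and rewards, is a made-up toy example used only to show how V^π(s) and π* are computed.

```python
import numpy as np

# Toy 2-state, 2-action MDP (all numbers illustrative).
P = np.array([               # P[a, s, s'] : transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
])
R = np.array([               # R[a, s] : immediate reward
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9                  # discount factor

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    V = np.max(R + gamma * (P @ V), axis=0)

# Greedy policy with respect to the converged value function is optimal.
pi_star = np.argmax(R + gamma * (P @ V), axis=0)
```

State 1 with action 1 yields a self-loop with reward 2, so its value converges to 2 / (1 − γ) = 20, and the greedy policy selects action 1 there.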

Within the time constraint period T, the transmission rate of a data packet with a size of B is determined.

The environment includes all the resources and the variables affecting their availability and performance. The agent is the decision-making entity that interacts with the environment and learns from its actions. The rewards and penalties system provides feedback to the agent based on its decisions and actions, encouraging it to learn and improve over time.

The resource allocation problem under consideration can be defined as follows: for all k ∈ K and m ∈ M, the transmission power of the V2V link can be adjusted continuously as a variable.

The local channel information that an agent can observe comprises its own channel gain g_k[n], the interference channels g_{k′,k}[n] from other V2V link transmitters, the interference channel g_{B,k}[n] from the V2I transmitters, and the interference channels g_{m,k}[n] from all V2V transmitters, for all m ∈ M.

This paper focuses on the resource allocation design for the IoV, which involves the continuous control of the transmission power of the V2V link.

Initially, the agent has no prior knowledge about the environment and must explore and interact with it to gather information and learn. Through trial and error, the agent learns which actions result in positive outcomes (rewards) and which in negative consequences (penalties). As it continues to interact with the environment, the agent becomes better at predicting which actions will bring the most significant rewards.
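The trial-and-error loop described above can be sketched with tabular Q-learning and an ε-greedy exploration rule. The two-state toy environment, its reward numbers, and the hyperparameters below are illustrative assumptions, not the paper's setup.

```python
import random

random.seed(0)
ACTIONS = [0, 1]  # e.g., scale-down / scale-up decisions
Q = {(s, a): 0.0 for s in range(2) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.2

def step(state, action):
    # Made-up dynamics: action 1 taken in state 1 earns the highest reward;
    # the next state simply follows the chosen action.
    reward = 1.0 if (state == 1 and action == 1) else 0.1
    return action, reward

state = 0
for _ in range(5000):
    # Explore with probability eps, otherwise exploit the current estimate.
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, r = step(state, action)
    # Q-learning update toward the bootstrapped TD target.
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
    state = nxt
```

After enough interaction, the greedy action in state 1 is the high-reward action, mirroring how the agent "becomes better at predicting which actions will bring the most significant rewards."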

3.1. Construction

Reinforcement Learning (RL) is a machine learning approach that enables agents to learn optimal decision-making strategies through interactions with an environment. In dynamic resource management, RL strategies help agents make effective decisions for allocating and utilizing resources in a changing environment. Fig. 1 shows the construction of the proposed model.


Fig. 1: Construction of the proposed model

The construction of RL strategies for dynamic resource management involves several technical details, including the environment, the agent, the rewards, and the learning algorithms. First, the environment must be modeled accurately to reflect resource availability dynamics and the impact of resource allocation on the environment.

Then, the reward is set to a constant β, which is greater than the maximum V2V link transmission rate.

The reward function in RL is crucial for achieving optimal performance in high-dimensional and complex environments.

The reward function in this paper is designed to balance the trade-off between the total capacity of the V2I link and the V2V link load's probability of successful transmission.
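As a sketch of such a trade-off, the function below combines a V2I capacity term with a β-weighted V2V success term. The single mixing coefficient `lambda_c` and all numeric values are assumptions for illustration, not the paper's exact reward formula.

```python
def reward(v2i_capacity_bps, v2v_success, beta=100.0, lambda_c=0.5):
    """Illustrative weighted reward trading off V2I capacity vs. V2V success.

    v2v_success: True once the V2V payload has been delivered; the constant
    beta rewards completed V2V transmissions, per the text above.
    """
    capacity_term = lambda_c * v2i_capacity_bps
    v2v_term = (1.0 - lambda_c) * (beta if v2v_success else 0.0)
    return capacity_term + v2v_term

# Successful V2V delivery strictly increases the reward at equal V2I capacity.
assert reward(10.0, True) > reward(10.0, False)
```

Raising `lambda_c` biases the agent toward V2I capacity; lowering it biases toward reliable V2V delivery, which is the balance the text describes.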

The Deep Deterministic Policy Gradient (DDPG) algorithm is an approach that uses neural networks to fit the value function.

This involves identifying the relevant parameters and states of the environment, such as the number and types of resources, resource usage patterns, and resource demand. Second, the agent must be designed to interact with the environment and learn optimal resource management strategies.

International Research Journal of Engineering and Technology (IRJET)


The Critic network evaluates the quality of the action selected by the Actor network according to the policy μ(·) using the state-action value function. s_k^t represents the input state of agent k, and γ represents the discount factor of the immediate reward r_k^t.

The DDPG algorithm aims to obtain an optimal policy π* and learn the corresponding state-action value function until it reaches convergence.

The agent's goal is to maximize a reward function, which reflects the desired outcome of resource utilization, such as maximizing efficiency or minimizing cost.

The Actor and Critic networks update their evaluation network parameters based on the input mini-batch samples. The Critic network's loss function can be expressed as in Eq. (3):

L(θ_k^Q) = E[(y_k − Q_k(s_k, a_k; θ_k^Q))²]    (3)

where y_k = r_k + γ Q′_k(s′_k, a′_k) is the target value computed with the target network.

The action-state value function of the target network is denoted as Q′_k(·). If L(θ_k^Q) is continuously differentiable, then θ_k^Q can be updated using the gradient of the loss function.
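A minimal sketch of this Critic update, assuming a linear state-action value function in place of the paper's neural networks; the toy mini-batch, learning rate, and soft target-update coefficient are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 1
dim = state_dim + action_dim

w = rng.normal(size=dim) * 0.1   # Critic parameters theta^Q (linear stand-in)
w_target = w.copy()              # target network Q'_k
gamma, lr, tau = 0.99, 0.05, 0.01

# Toy mini-batch (s, a, r, s', a'); in practice these come from a replay
# buffer, and a' is produced by the (target) Actor network.
s  = rng.normal(size=(8, state_dim))
a  = rng.normal(size=(8, action_dim))
r  = rng.normal(size=(8,))
s2 = rng.normal(size=(8, state_dim))
a2 = rng.normal(size=(8, action_dim))

x  = np.hstack([s, a])           # feature vector [s; a]
x2 = np.hstack([s2, a2])

for _ in range(200):
    y = r + gamma * (x2 @ w_target)      # TD target from the target network
    q = x @ w                            # Q_k(s, a)
    grad = 2 * x.T @ (q - y) / len(y)    # gradient of L = mean (q - y)^2
    w -= lr * grad                       # gradient step on the Critic loss
    w_target = (1 - tau) * w_target + tau * w  # soft target-network update

loss = float(np.mean((x @ w - (r + gamma * (x2 @ w_target))) ** 2))
```

The soft update keeps the target Q′_k slowly tracking the Critic, which is what stabilizes the bootstrapped TD target in DDPG.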

The agent can take action by using various resource allocation and utilization strategies.

3.2. Operating principle

Reinforcement learning (RL) is a type of machine learning that allows an agent to learn and make decisions in an environment through trial and error. In the context of dynamic resource management, RL can be used to optimize the allocation and utilization of resources in real time. The operating principle of reinforcement learning strategies for dynamic resource management involves three main components: the agent, the environment, and the reward system. Fig. 2 shows the operating principle of the proposed model.

Fig. 2: Operating principle of the proposed model


The agent is the decision-making entity that interacts with the environment. It takes actions based on its current state and receives a reward or penalty from the environment for each action. The environment represents the system or process being optimized, such as a computer network or manufacturing system.

Task reallocation and service migration can be triggered when the controller detects any significant QoS degradation or new requirements of IoT users, which is beyond the focus of this paper.

We employ a widely used edge computing model in which the computing latency depends on the required computing capacity and the allocated computing capacity:

t_comp = c / f    (10)

where c denotes the computing capacity required by a task and f the computing capacity allocated to it.

The environment is dynamic and constantly changing, creating a complex and uncertain setting for the agent to learn from. The reward system provides feedback to the agent based on its actions. The agent's goal is to maximize its total reward over time, so it learns to take actions that lead to high rewards and avoid actions that lead to penalties.
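Under the common assumption that computing latency is the required computing capacity divided by the allocated capacity, the edge computing model mentioned above reduces to a one-line calculation; the numbers below are illustrative placeholders.

```python
def computing_latency(required_cycles, allocated_cycles_per_s):
    """Latency of a task: required CPU cycles / allocated CPU capacity."""
    return required_cycles / allocated_cycles_per_s

# Doubling the allocated capacity halves the computing latency.
t1 = computing_latency(2e9, 1e9)  # 2.0 s
t2 = computing_latency(2e9, 2e9)  # 1.0 s
assert t1 == 2 * t2
```

This inverse relationship is what gives the RL agent its lever: allocating more capacity to a latency-critical task directly reduces its completion time, at the cost of capacity available to other tasks.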

4. Result and Discussion

The proposed model, Distributed Distributional Deterministic Policy Gradients (D4PG), has been compared with the existing Trust Region Policy Optimization (TRPO), Deterministic Policy Gradient (DPG), and Soft Actor-Critic (SAC) algorithms.

4.1. Convergence Rate: This parameter measures the speed at which the reinforcement learning algorithm can learn and adapt to changing resource allocation situations in the cloud-based architecture. A higher convergence rate indicates that the algorithm can quickly and effectively adjust resource allocation based on real-time changes, improving overall system performance. Fig. 3 shows the comparison of convergence rate.

Fig. 3: Comparison of Convergence Rate


4.2. Resource Utilization: The resource utilization parameter measures the efficiency of the reinforcement learning strategy in utilizing and allocating resources within the cloud-based architecture. A higher resource utilization rate indicates a more efficient use of available resources, resulting in improved system performance and cost savings. Fig. 4 shows the comparison of resource utilization.

Fig. 4: Comparison of Resource Utilization

4.3. Quality of Service (QoS): This parameter measures the ability of the reinforcement learning strategy to maintain and improve the quality of service for users of the cloud-based architecture. This includes factors such as response time, availability, and reliability. Fig. 5 shows the comparison of quality of service.

Fig. 5: Comparison of Quality of Service
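The utilization metric above can be computed as busy capacity divided by total capacity, averaged over measurement intervals; the sample numbers below are made up for illustration.

```python
# (busy units, total units) observed per measurement interval -- toy data.
samples = [(6, 8), (7, 8), (5, 8), (8, 8)]

# Average utilization across intervals: mean of busy/total ratios.
utilization = sum(busy / total for busy, total in samples) / len(samples)
# -> 0.8125, i.e. 81.25% average utilization
```

A value close to 1.0 indicates the allocator keeps provisioned resources busy; persistently low values signal over-provisioning and wasted cost.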


4.4. Robustness: The robustness parameter measures the ability of the reinforcement learning strategy to handle unexpected and unpredictable changes in the cloud-based architecture, such as sudden changes in workload or resource failures. Fig. 6 shows the comparison of robustness.

5. Conclusion

In this paper, we explored the potential of reinforcement learning (RL) strategies for optimizing dynamic resource allocation in cloud-based architectures. The proposed RL framework demonstrated the ability to adapt to varying workload demands while efficiently managing cloud resources. By learning from real-time feedback, the RL agents improved resource utilization, reduced response times, and achieved higher performance compared to traditional allocation methods. This dynamic approach not only enhances the scalability and robustness of cloud systems but also offers a pathway for cost optimization by ensuring resources are provisioned according to precise demand levels. The experimental results validated the framework's capability to handle unexpected changes in workloads, highlighting its robustness in a fluctuating cloud environment. Furthermore, the framework can be extended to more complex multi-objective resource allocation problems, paving the way for future research in cloud computing optimization.

Overall, the implementation of reinforcement learning strategies for resource allocation provides a promising solution to the challenges faced by modern cloud infrastructures, offering significant improvements in efficiency, performance, and scalability. Further research is encouraged to refine the model, address potential security concerns, and explore its applications in diverse cloud environments.

References

[1] Thein, T., Myo, M. M., Parvin, S., & Gawanmeh, A. (2020). Reinforcement learning based methodology for energy-efficient resource allocation in cloud data centers. Journal of King Saud University - Computer and Information Sciences, 32(10), 1127-1139.

Fig. 6: Comparison of Robustness


[2] Chen, X., Zhu, F., Chen, Z., Min, G., Zheng, X., & Rong, C. (2020). Resource allocation for cloud-based software services using prediction-enabled feedback control with reinforcement learning. IEEE Transactions on Cloud Computing, 10(2), 1117-1129.

[3] Belgacem, A., Mahmoudi, S., & Kihl, M. (2022). Intelligent multi-agent reinforcement learning model for resources allocation in cloud computing. Journal of King Saud University - Computer and Information Sciences, 34(6), 2391-2404.

[4] Yang, Z., Nguyen, P., Jin, H., & Nahrstedt, K. (2019, July). MIRAS: Model-based reinforcement learning for microservice resource allocation over scientific workflows. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) (pp. 122-132). IEEE.

[5] Chen, Y., Sun, Y., Wang, C., & Taleb, T. (2022). Dynamic task allocation and service migration in edge-cloud IoT system based on deep reinforcement learning. IEEE Internet of Things Journal, 9(18), 16742-16757.

[6] Desai, B., & Patil, K. (2023). Reinforcement learning-based load balancing with large language models and edge intelligence for dynamic cloud environments. Journal of Innovative Technologies, 6(1), 1-13.

[7] Nascimento, A., Olimpio, V., Silva, V., Paes, A., & de Oliveira, D. (2019, May). A reinforcement learning scheduling strategy for parallel cloud-based workflows. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 817-824). IEEE.

[8] Nascimento, A., Olimpio, V., Silva, V., Paes, A., & de Oliveira, D. (2019, May). A reinforcement learning scheduling strategy for parallel cloud-based workflows. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 817-824). IEEE.

[9] Xu, Z., Tang, J., Yin, C., Wang, Y., Xue, G., Wang, J., & Gursoy, M. C. (2020). ReCARL: Resource allocation in cloud RANs with deep reinforcement learning. IEEE Transactions on Mobile Computing, 21(7), 2533-2545.

[10] Asghari, A., Sohrabi, M. K., & Yaghmaee, F. (2020). A cloud resource management framework for multiple online scientific workflows using cooperative reinforcement learning agents. Computer Networks, 179, 107340.

[11] Schuler, L., Jamil, S., & Kühl, N. (2021, May). AI-based resource allocation: Reinforcement learning for adaptive auto-scaling in serverless environments. In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 804-811). IEEE.

[12] Wei, Y., Kudenko, D., Liu, S., Pan, L., Wu, L., & Meng, X. (2019). A reinforcement learning based auto-scaling approach for SaaS providers in dynamic cloud environment. Mathematical Problems in Engineering, 2019(1), 5080647.

[13] Penmetcha, M., & Min, B. C. (2021). A deep reinforcement learning-based dynamic computational offloading method for cloud robotics. IEEE Access, 9, 60265-60279.

[14] Rosenberger, J., Urlaub, M., Rauterberg, F., Lutz, T., Selig, A., Bühren, M., & Schramm, D. (2022). Deep reinforcement learning multi-agent system for resource allocation in industrial Internet of Things. Sensors, 22(11), 4099.

[15] Guo, W., Tian, W., Ye, Y., Xu, L., & Wu, K. (2020). Cloud resource scheduling with deep reinforcement learning and imitation learning. IEEE Internet of Things Journal, 8(5), 3576-3586.

[16] Ran, L., Shi, X., & Shang, M. (2019, August). SLAs-aware online task scheduling based on deep reinforcement learning method in cloud environment. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (pp. 1518-1525). IEEE.

[17] Mangalampalli, S., Karri, G. R., Kumar, M., Khalaf, O. I., Romero, C. A. T., & Sahib, G. A. (2024). DRLBTSA: Deep reinforcement learning based task-scheduling algorithm in cloud computing. Multimedia Tools and Applications, 83(3), 8359-8387.

[18] Khani, M., Sadr, M. M., & Jamali, S. (2024). Deep reinforcement learning-based resource allocation in multi-access edge computing. Concurrency and Computation: Practice and Experience, 36(15), e7995.

[19] Joseph, C. T., Martin, J. P., Chandrasekaran, K., & Kandasamy, A. (2019, October). Fuzzy reinforcement learning based microservice allocation in cloud computing environments. In TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON) (pp. 1559-1563). IEEE.

[20] Jyoti, A., & Shrimali, M. (2020). Dynamic provisioning of resources based on load balancing and service broker policy in cloud computing. Cluster Computing, 23(1), 377-395.
