Multi-Objective Reinforcement Learning for Resource-Optimal LLM Serving in SaaS Clouds

Keywords: Large Language Models, SaaS, Adaptive Precision, Energy Efficiency, Coherence, Factuality, Model Serving.

Vol. 13 No. 11 (2025)
Engineering and Computer Science
November 16, 2025

Large Language Models (LLMs) have become a cornerstone of current Software as a Service (SaaS) solutions, making intelligent automation and analytics possible. Their inference cost, however, remains high, and cloud service providers consequently struggle to scale LLM-backed services. Adaptive Precision Scaling (APS) is a strategy that adapts computational precision during execution. This paper describes a newly proposed APS architecture for SaaS and proposes a taxonomy of precision scaling to give a clearer understanding of precision adaptivity.
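
To make the idea of adapting precision during execution concrete, the following is a minimal sketch of a per-request precision selector. It is not the architecture proposed in the paper: the precision levels (`int4`, `int8`, `fp16`), the `quality_sensitive` flag, the `gpu_utilization` signal, and the thresholds are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical precision levels, ordered from cheapest to most precise.
PRECISIONS = ("int4", "int8", "fp16")


@dataclass
class Request:
    prompt: str
    quality_sensitive: bool  # hypothetical flag, e.g. set per tenant or per task


def choose_precision(request: Request, gpu_utilization: float) -> str:
    """Pick a precision for one request from the current load.

    The thresholds below are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= gpu_utilization <= 1.0:
        raise ValueError("gpu_utilization must be in [0, 1]")
    if request.quality_sensitive:
        return "fp16"  # never trade precision for quality-critical requests
    if gpu_utilization >= 0.9:
        return "int4"  # heavy load: cheapest precision
    if gpu_utilization >= 0.6:
        return "int8"  # moderate load: intermediate precision
    return "fp16"      # light load: full half-precision


if __name__ == "__main__":
    for load in (0.3, 0.7, 0.95):
        print(load, choose_precision(Request("summarize this", False), load))
```

A rule-based selector like this only illustrates the control point where precision is chosen at run time. A deployed system would replace the fixed thresholds with whatever policy it actually uses.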