Prompt-Layered Architecture: A New Stack for AI-First Product Design
As large language models power a new generation of applications, there is growing demand for architectural frameworks that treat prompts as modular, orchestratable, and extensible parts of a software system. Traditional approaches to AI integration have treated prompt engineering as an ad hoc, application-specific task, disconnected from systematic design principles and software architecture standards. This paper introduces the Prompt-Layered Architecture (PLA), a new architectural style that elevates prompts to first-class citizens of the software stack. PLA provides for the composition, management, and orchestration of prompts through modular layers, enabling AI-first products that are scalable, testable, and extensible.
We formalize the PLA model as four core layers: the Prompt Composition Layer, the Prompt Orchestration Layer, the Response Interpretation Layer, and the Domain Memory Layer. Together, these layers support reuse of prompt templates, structured routing of model outputs, persistence of memory across interaction chains, and alignment with business logic and user context. Inspired by traditional layered software architectures, PLA introduces prompt versioning, API-driven prompt abstraction, and test scaffolding for verifying LLM behavior.
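To make the layering concrete, the sketch below models each layer as a minimal Python class. All names here (PromptTemplate, Orchestrator, and so on) are illustrative assumptions for exposition, not the paper's actual API; the intent is only to show how composition, orchestration, interpretation, and memory can sit behind narrow, separately testable interfaces.

```python
from dataclasses import dataclass


@dataclass
class PromptTemplate:
    """A named, versioned, reusable prompt (Prompt Composition Layer)."""
    name: str
    version: str
    template: str

    def compose(self, **variables: str) -> str:
        # Fill template slots such as {context} or {text}.
        return self.template.format(**variables)


class DomainMemory:
    """Persists state across interaction chains (Domain Memory Layer)."""
    def __init__(self) -> None:
        self._turns: list[str] = []

    def remember(self, text: str) -> None:
        self._turns.append(text)

    def context(self) -> str:
        return "\n".join(self._turns)


class ResponseInterpreter:
    """Routes raw model output into structured form (Response Interpretation Layer)."""
    def interpret(self, raw: str) -> dict:
        return {"text": raw.strip(), "length": len(raw)}


class Orchestrator:
    """Chains the other layers around an LLM call (Prompt Orchestration Layer)."""
    def __init__(self, llm, memory: DomainMemory,
                 interpreter: ResponseInterpreter) -> None:
        self.llm = llm  # any callable mapping a prompt string to a completion string
        self.memory = memory
        self.interpreter = interpreter

    def run(self, template: PromptTemplate, **variables: str) -> dict:
        prompt = template.compose(context=self.memory.context(), **variables)
        raw = self.llm(prompt)
        self.memory.remember(raw)
        return self.interpreter.interpret(raw)
```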
To validate the design, we develop a prototype implementation on top of the OpenAI GPT APIs and evaluate PLA against flat prompt-based systems on modularity metrics, reusability benchmarks, and the cognitive load placed on prompt engineers. The results demonstrate that PLA improves maintainability while accelerating the integration of AI capabilities across distributed services. The paper also presents several architecture diagrams and orchestration examples in Python, and discusses how PLA fills the gap left by emerging frameworks such as LangChain, AutoGPT, and prompt programming compilers.
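As a hedged illustration of the kind of Python orchestration the prototype involves, the following wires the layer classes from the previous sketch to the OpenAI chat completions endpoint via the openai>=1.0 Python SDK. The model name and every identifier beyond the SDK calls themselves are assumptions, not the paper's code.

```python
from openai import OpenAI  # assumes the openai>=1.0 Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def call_gpt(prompt: str) -> str:
    """Minimal LLM callable satisfying the Orchestrator's interface above."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # the model choice is an assumption, not the paper's
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""


# Wire the layer classes from the previous sketch into one pipeline.
memory = DomainMemory()
orchestrator = Orchestrator(call_gpt, memory, ResponseInterpreter())
template = PromptTemplate(
    name="summarize",
    version="1.0.0",
    template="Context so far:\n{context}\n\nSummarize: {text}",
)
result = orchestrator.run(template, text="Layered prompt architectures separate concerns.")
print(result["text"])
```

Because the orchestrator accepts any string-to-string callable, the same pipeline can be pointed at a different model provider, or at a stub, for testing prompt behavior without live API calls.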
By formalizing prompts as composable units of architecture, this research provides a blueprint for building scalable AI-first applications with structured reasoning, state awareness, and prompt governance.
References
1. T. Brown et al., "Language models are few-shot learners," in Proc. NeurIPS, 2020, pp. 1877–1901.
2. S. H. Bach et al., "PromptSource: An integrated development environment and repository for natural language prompts," in Proc. ACL: System Demonstrations, 2022. [Online]. Available: https://github.com/bigscience-workshop/promptsource
3. PromptLayer, "PromptLayer – Prompt logging and versioning," 2023. [Online]. Available: https://www.promptlayer.com
4. LangChain, "LangChain documentation," 2023. [Online]. Available: https://docs.langchain.com
5. Significant Gravitas, "AutoGPT: Autonomous GPT-4 experiment," GitHub, 2023. [Online]. Available: https://github.com/Torantulino/Auto-GPT
6. Yohei Nakajima, "BabyAGI: AI-powered task management," GitHub, 2023. [Online]. Available: https://github.com/yoheinakajima/babyagi
7. O. Khattab et al., "DSPy: An interpretable programming model for building LLM pipelines," arXiv preprint arXiv:2305.14247, 2023.
8. Microsoft, "Guidance: A declarative language for controlling large language models," GitHub, 2023. [Online]. Available: https://github.com/microsoft/guidance
9. Superagent Team, "Superagent: Build LLM-powered agents in minutes," GitHub, 2023. [Online]. Available: https://github.com/homanp/superagent
10. E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.
11. D. Sculley et al., "Hidden technical debt in machine learning systems," in Proc. NeurIPS, 2015, pp. 2503–2511.
12. T. Wolf et al., "Transformers: State-of-the-art natural language processing," in Proc. EMNLP: System Demonstrations, 2020, pp. 38–45.
13. OpenAI, "OpenAI API documentation," 2024. [Online]. Available: https://platform.openai.com/docs
14. Pinecone, "Pinecone vector database," 2024. [Online]. Available: https://www.pinecone.io
15. J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL, 2019, pp. 4171–4186.
16. R. Parrish and J. Steinhardt, "Prompt engineering best practices," OpenAI Technical Report, 2022.
17. M. Mitchell et al., "Model cards for model reporting," in Proc. FAT*, 2019, pp. 220–229.
18. A. Tamkin et al., "Understanding the capabilities, limitations, and societal impact of large language models," arXiv preprint arXiv:2102.02503, 2021.
19. D. Hudson et al., "Composable systems for language model orchestration," in Proc. ACM FAccT, 2022.
20. J. Wei et al., "Chain-of-thought prompting elicits reasoning in large language models," arXiv preprint arXiv:2201.11903, 2022.
21. M. Nye et al., "Show your work: Scratchpads for intermediate computation with language models," in Proc. NeurIPS, 2021.
22. C. Wang et al., "FLAML: A fast and lightweight AutoML library," in Proc. MLSys, 2021.
23. OpenAI, "GPT-4 Technical Report," Tech. Rep., 2023. [Online]. Available: https://openai.com/research/gpt-4
24. C. Olston, S. F. R. Kaplan, and A. Elmeleegy, "Dataflow programming and its relevance to AI systems," in Proc. CIDR, 2021.
25. M. Bansal and D. Lee, "Task decomposition in NLP agents," in Proc. ACL, 2022, pp. 456–468.
26. J. Kreps, "Microservices and DevOps: Re-thinking software architecture," InfoQ, 2021. [Online]. Available: https://www.infoq.com/articles/microservices-devops-architecture
27. F. Chollet, "On the measure of intelligence," arXiv preprint arXiv:1911.01547, 2019.
28. L. Weidinger et al., "Ethical and social risks of harm from language models," arXiv preprint arXiv:2112.04359, 2021.
29. M. Reynolds et al., "LLMOps: Building production LLM systems," arXiv preprint arXiv:2307.09288, 2023.
30. A. Zimek, E. Schubert, and H.-P. Kriegel, "A survey on unsupervised outlier detection in high-dimensional numerical data," Stat. Anal. Data Mining, vol. 5, no. 5, pp. 363–387, 2012.
Copyright (c) 2024 Savi Khatri

This work is licensed under a Creative Commons Attribution 4.0 International License.