A recent study's deep dive into the US government's massively complex Unified Command Plan finds that LLMs could be the answer to the cybersecurity professional's forecasting needs — with human guidance.

As much as cybersecurity professionals might prefer otherwise, ours is a field defined by reaction — just ask the average cyber analyst or threat hunter about their day to get a sense of how true this is. We are a technology-centric field in a functional sense, but technology evolves interactively alongside trends in economics, sociopolitical conditions, and geopolitical fluctuations.

Unfortunately, this means that cybersecurity planning gets hit coming and going. Our need to forecast is arguably greater than that of other security professions because this interplay of systemic effects makes risk harder to value. But forecasting itself is also more difficult and the results less trustworthy because of the ambiguity about which factors might prove impactful over time. This gives rise to another contradiction: Operational perspective alone is never sufficient to address questions about the changing shape of cybersecurity, despite obviously being its most critical professional component.

So, what's a CISO to do? A recent report from the Center for Strategic and International Studies (CSIS) represents one of the first robust examples of how CISOs and cybersecurity executives might leverage generative AI to think more clearly about future cyber risk and the architecture of approach it implies. The report looked at how large language models (LLMs) could provide a different view of future Unified Command Plans (UCP) — secret documents compiled every couple of years within the Pentagon that delineate responsibilities, mission priorities, and functional capacities for all areas of defense establishment operation.

AI solutions: Promising or problematic?

The CSIS study questions whether the structure and architecture of national defense planning in the United States should be scrapped in favor of something better in the years ahead, and it suggests that effective analytic use of LLMs is less about the machine itself and more about the humans in the loop.

Perhaps the most common refrain of the past few years is that artificial intelligence stands to paint a clearer picture of the future. But to what degree is that refrain just a sales pitch? After all, any data scientist will tell you that generative AI, in particular, is in the simplest sense just a sophisticated sentence constructor. LLMs don't understand the meaning behind the questions we ask, so they offer imitation rather than intuition. If cybersecurity's key problem statement is that the past will only ever partially relate to future insecurity, how can a tool that uses past data to see patterns going forward be the gamechanger that so much marketing predicts?

While uses of LLMs for forecasting are still relatively limited across industry and research environments, at least one recent and far-ranging study offers robust lessons for anyone interested in leveraging AI to resolve ambiguity about the cyber future:

- Datasets must be curated and tailored to multiple distinct-but-interrelated interest areas.
- Clear procedures are needed to discover the right questions and query pipelines to deploy.
- A focus on iterative, recursive engagements with models is required to clearly see useful patterns in underlying data and purge the impact of idiosyncratic outputs.
Questioning the Unified Command Plan

The UCP reflects the interests and the negotiations of innumerable offices, decision-makers, service elements, and subcultures, all of which are blended to create something both prone to idiosyncrasy and resistant to massive alteration. In short, UCPs are meant to be plans for the operation of national security tomorrow but rarely depart from the models of today except to emphasize pet projects.

Naturally, the tribal negotiation of the UCP makes it open to significant criticism along the same lines that cybersecurity professionals might use to critique common conversations on the future of cyber practice. Whether thinking about CERTs or the function of a company's security division, for instance, would we be better off organizing security as a functional set of missions in line with geographic, sectoral, issue-specific, or other factors? Regardless of the answer, surely the model we prefer today might not work as well in two decades. One need only look at transformations in global security architectures since the end of the Cold War or 9/11 to see the validity of such a critique.

Should today's senior decision-makers really have so much say in driving our view of how to grapple with future insecurity? Advocates of more dynamic thinking about future risk simply point out that the strongest voices in forecasting are usually those burdened with aggregating the most conflicting inputs to the process. Surely such narrow chokepoints for decision-making limit the views that are ultimately incorporated into future planning.

How the CSIS study used LLMs to rethink the UCP

To achieve their goals, the authors of the CSIS study built a series of datasets optimized for retrieval-augmented generation (RAG). This process allows researchers to change how LLMs weight information that is available for retrieval, essentially streamlining the capacity of a model to offer more expertly tailored returns. A range of different parameters were used to define and demarcate the datasets.

The authors then queried the models involved, deploying a cautious and deliberate methodology of building knowledge foundations before extracting narrower forecasts. First, models were asked to summarize and characterize common criticisms of the UCP. Then, the AI was told to use these critiques as a baseline for illustrative exploration of competing options for reforming and transforming the UCP. Here, the models were directed to utilize competing datasets that had been curated to represent categories of knowledge that would (or should) typically drive Pentagon analytics, such as budgetary data or the strategic documentation of key adversaries.

At this stage, the LLMs were asked iteratively (four separate times) to develop scenarios in which distinct future alternatives to the UCP would make sense. This process was repeated several times to tease out points of divergence and convergence. In each instance, the LLM was prompted to speak to key (often competing) ideas that drive discourse on national security, from deterrence theory to social constructivism and grand narratives found in military history. Finally, the models' outputs were presented in a number of different ways for decision-maker consumption, from traditional reporting to AI-generated visualizations using Midjourney.
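In code, the staged methodology the study describes might look something like the following minimal sketch. The dataset names, prompts, and the retrieve and call_llm helpers are illustrative stand-ins for a team's own vector store and model client, not the study's actual tooling.

```python
# Minimal sketch of a staged, iterative RAG querying workflow.
# `retrieve` and `call_llm` are placeholders: swap in your own vector
# store and model client. Dataset names are illustrative assumptions.
from collections import defaultdict

DATASETS = ["ucp_history", "adversary_doctrine", "deterrence_theory",
            "budget_net_assessment", "operational_art", "ir_theory"]

def retrieve(dataset: str, query: str, k: int = 5) -> list[str]:
    """Placeholder: return the k most relevant passages from one curated corpus."""
    return [f"[{dataset} passage relevant to: {query}]"]

def call_llm(prompt: str, context: list[str]) -> str:
    """Placeholder: one completion grounded in the retrieved context."""
    return f"[model output for: {prompt[:60]}...]"

# Stage 1: build a knowledge foundation before asking for forecasts.
criticisms = call_llm(
    "Summarize and characterize common criticisms of the UCP.",
    context=[p for d in DATASETS for p in retrieve(d, "criticisms of the UCP")],
)

# Stage 2: explore alternatives iteratively (the study used four passes),
# steering each pass toward a different curated dataset.
scenarios = defaultdict(list)
for run in range(4):
    for dataset in DATASETS:
        scenarios[dataset].append(call_llm(
            "Using these critiques as a baseline, develop a scenario in which "
            f"a distinct alternative to the UCP would make sense.\n{criticisms}",
            context=retrieve(dataset, "alternatives to the UCP"),
        ))

# Stage 3: hand the raw outputs to humans for cross-comparison and briefing.
```

The ordering is the point: foundations before forecasts, and human cross-comparison as the terminal step rather than an afterthought.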
LLMs can effectively help forecast alternative cyber futures

In many ways, the use of LLMs as described by this and other recent studies is a kind of AI-infused adaptation of scenario planning. The opportunities to more effectively forecast risk and predict future functional needs beyond such traditional forecasting are clear, however. LLMs offer rapid returns. Careful curation of underlying datasets can provide a real proxy for expert inputs that both limits the need to pay for teams of analysts and can be manipulated to explore (and excise) potential bias. Moreover, the dissection of iterated outputs can ensure human intuition influences outcomes in a truly replicable fashion. And producing alternative findings for decision-maker consumption in different mediums can make communicating the threat of pathological thinking easier.

Perhaps the most critical step in using an LLM effectively for forecasting that is customized to the experience of a single company or conglomerate of related organizations is the curation of multiple datasets that can proxy for different dimensions of the knowledge base a CISO wants to bring to bear. Through retrieval-augmented generation, an LLM can be optimized for exactly the kind of synthesis that then breeds true analysis via interactions with human researchers.

What lessons should security teams take from the UCP study?

What might this look like for a cybersecurity team? The UCP study provides some excellent lessons. The study operationalizes six bodies of thought and wide-ranging information in six distinct datasets. These are, briefly, documentation pertaining to:

- The nature and history of UCP development itself
- Adversary military strategy and doctrine (China, specifically)
- Deterrence theory
- Budgetary, resource, and net assessment
- Operational art and practice
- Scholarly theory about the function of international relations

These categories of data reflect the core baskets of thinking and information that experts identified as ideally constituting the foundation of broad calculations pertaining to the UCP. Most importantly, they are mutually exclusive. While one can find net assessments that couch their analyses in discussions of deterrence theory or developments in adversary capability, these datasets are curated to omit such conflation. The data are tagged by humans in some cases to provide a supervised understanding of what is considered seminal. Each dataset passes through several rounds of review to ensure balance in coverage of relevant perspectives (e.g., theoretical lenses or competing assessments of adversary strategy). In short, the value of expert human input is clear: Datasets must be developed that reflect a commitment to avoiding parochial bias in the process along several lines.
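To make that curation discipline concrete, here is a minimal sketch of how such mutually exclusive "baskets" might be represented before indexing. The schema and the admit helper are assumptions for illustration; the study does not prescribe a data structure.

```python
# Illustrative sketch only: one narrow "basket" per knowledge dimension,
# with human tagging and review tracked before anything is indexed for RAG.
from dataclasses import dataclass, field

@dataclass
class CuratedDataset:
    name: str                          # one deliberately narrow knowledge basket
    documents: list[str] = field(default_factory=list)
    seminal: dict[str, str] = field(default_factory=dict)  # human tags: doc -> why it matters
    review_rounds_completed: int = 0   # balance reviews finished before indexing

    def admit(self, doc: str, overlaps_other_basket: bool) -> bool:
        """Enforce mutual exclusivity editorially: a net assessment couched in
        deterrence theory belongs in exactly one basket, with overlap excised."""
        if overlaps_other_basket:
            return False
        self.documents.append(doc)
        return True

BASKETS = [CuratedDataset(n) for n in (
    "ucp_history", "adversary_doctrine", "deterrence_theory",
    "budget_and_net_assessment", "operational_art", "ir_theory")]
```

Tracking tags and review rounds explicitly keeps the human curation auditable, which matters when a model's outputs later need to be traced back to their sources.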
Rules of thumb for cybersecurity pros using AI forecasting

In cybersecurity, datasets might adopt a similar approach — of course, a company's specific questions should determine exactly where these lines are drawn. But for any cyber professional interested in simply establishing a basis for using AI in this way, some rules of thumb seem obvious.

First, optimized LLMs must be fed a baseline of company practice that encompasses standards of operation and operational history. This should include information not only about the firm seeking to think more clearly about its future but also data on organizations with similar profiles.

Second, LLMs require exposure to data on developments in regulatory, legal, and normative standards of practice within cybersecurity as a global practice. This implies two distinct datasets for any firm optimizing its models: one narrowly cast around a company's sectoral identity, such as commercial retail or banking services, and another containing legal, scholarly, and other expert documentation on issues of global cyber governance.

Third, LLMs require threat data, which can be tailored to include a corpus of documentation from two sources. On the one hand, the reporting and technical information published by cybersecurity firms like Mandiant or FireEye should form the core of descriptive datasets on the evolution of the global cyber threat environment. On the other hand, research on cyber conflict as a geopolitical arena of contestation and as a novel evolution of national intelligence architectures should be arrayed in its own distinct dataset as a critical step toward aligning perspectives on cybersecurity that often fail to meet in real life.

Fourth, LLMs will require data on the business realities of cybersecurity in context. This means memoranda and direct net assessments that characterize the state of the art in technological capabilities and the resources that must underlie them for effective deployment. Future-oriented assessments of the same can be included in this category.

And finally, LLMs asked about the future of digital insecurity will need some cross-sectional optimization around the societal impacts of web technologies as a transformative force. Few forecasters in the 1990s would have foreseen the social media turn of the 2000s that laid the foundation for today's rising challenges of malign influence and disinformation. Know-how on the intersection of technology with psychology, public opinion, and sociological issues can inject much-needed context into otherwise rote operational analysis.

Injecting human intuition via iteration

Datasets can and should be curated over time in line with the added nuance and specificity that security planners bring with their questioning. But this foundation is the kind of optimization regime that can allow planners to take the next important step in deploying LLMs for effective forward-looking analysis: iterative querying.

Here, after constructing a descriptive knowledge base for the LLM to weight against and draw from, it is imperative that planners begin to engage their model with a synthesis-first set of expectations. Query the model to explore trends in the underlying data relevant to the mission or organization in question. The goal here is discovery. The humans in the loop must define baselines that reasonably comport with their expectations and that can then be used to let an LLM loose in offering "creative" pattern analysis. Significantly, if the outputs of a model at this testing phase seem overly idiosyncratic, it is here that human intuition must make itself most clearly felt: by returning to curate the parameters of underlying datasets and by exhaustively mapping the pathways an LLM takes to reach its results.
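As a sketch of what separating convergent themes from idiosyncratic one-offs might look like, the snippet below compares iterated outputs pairwise. The lexical similarity measure and threshold are illustrative assumptions; a real team would likely substitute embedding-based comparison.

```python
# Illustrative sketch: flag iterated runs that diverge from all the others.
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Crude lexical overlap; swap in embeddings for real use."""
    return SequenceMatcher(None, a, b).ratio()

def assess_runs(outputs: list[str], floor: float = 0.35) -> dict:
    """Score every pair of iterated outputs, then flag runs that sit below
    the similarity floor against all peers as idiosyncratic."""
    scores = {(i, j): similarity(outputs[i], outputs[j])
              for i, j in combinations(range(len(outputs)), 2)}
    outliers = [i for i in range(len(outputs))
                if all(scores[tuple(sorted((i, j)))] < floor
                       for j in range(len(outputs)) if j != i)]
    return {"pairwise": scores, "idiosyncratic_runs": outliers}
```

If a run lands among the idiosyncratic outliers, the advice above applies: trace how the model got there and re-curate the underlying dataset rather than averaging the oddity away.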
The decision-maker moment: Rich findings to invite rich questioning

LLMs that have been so thoroughly optimized can be used for forecasting and related analyses. Here, as before, the key is iteration. Different at this stage, however, must be the focus on the decision-maker. Exploring key questions about cybersecurity function, transformations, and relevant exogenous factors inevitably has to be couched in terms understood by decision-makers.

A key takeaway from the UCP study is that LLM outputs must be dissected and analyzed to understand points of convergence and divergence. Doing so allows planners to place their own weight on variables that appear critical in determining the shape of some suppositions versus others. Then, so armed, planners can inject these findings directly into decision-maker briefings as an alternative to just directly reporting the outputs of a few AI models. In other words, it is the cross-comparative analysis of how LLMs come to individually interesting conclusions that matters, rather than the generated scenarios or suggestions themselves.

The bottom line: Avoiding the AI CISO

When it comes to using LLMs effectively for cybersecurity planning, the bottom line is clear: Planners and executives must avoid the AI CISO. Simply put, the AI CISO concept describes circumstances where an organization uses AI without effectively incorporating humans into not only the decision-making loop but also conversations about underlying ethical, methodological, and technical practice. The result would be the rise of AI systems as de facto decision-makers. Not Skynet or HAL 9000, of course, but support systems to which we delegate too much of what goes into decision-making.

This recent study and others like it lay out initial best practices for avoiding that outcome. They make the case that effective use of LLMs for robust forecasting and analysis means having humans in the loop at every stage of deployment. More importantly, they make the case that this engagement has to reflect the full range of human expertise — from specialist know-how to investigative skills and marketing savvy — to get the most out of the machine.