1. Introduction
Information is abundant in contemporary research and professional contexts, yet transforming that information into clear, structured, and visually engaging formats remains a persistent challenge. Whether in academia, industry, or technical fields, the ability to present findings coherently is essential. However, documentation is often laborious, demanding significant cognitive and creative effort to translate abstract, exploratory thinking into polished reports, presentations, or academic posters. Tasks such as layout design, adherence to formatting guidelines, and content structuring frequently slow the creative process and divert attention from ideation. VisRep (Visualisation Report) addresses these challenges as both a formatting assistant and a full-fledged visualisation design and ideation recording system. Its contribution lies in automating the capture and structuring of design thinking from design sessions while maintaining the richness and freedom of predesign exploration. Predesign, the critical phase before formal design begins, is where ideas are voiced, re-evaluated, sketched, and rephrased. It is the ambient, messy background where the design direction is shaped. Existing tools like Teams AI or note-taking applications may record meetings or transcribe speech but lack domain-specific visualisation awareness. They do not understand the role of a chart, the reason a design was chosen, or how a diagram supports a specific task or audience. They cannot integrate sketches, questions, or design reasoning into one cohesive artefact. VisRep fills this gap by offering a design-aware, AI-driven workflow that captures, organises, and refines evolving ideas. Users can speak freely, sketch rough concepts, or respond to structured prompts. The system transcribes the conversation, anchors content around eleven targeted design questions, and integrates visual and verbal input into a structured deliverable: a report, PowerPoint presentation, or poster.
Crucially, it does so with an awareness of the visualisation context: who the audience is, what task the visualisation must support, what outputs are needed, and how the design aligns with broader goals. In this way, VisRep transforms a chaotic ideation session into a shareable, revisitable, and explainable design narrative. VisRep is not just about producing documentation—it is an ideation companion, preserving ideas during the design process’s most fragile and formative stages. Researchers, educators, and professionals regularly generate insights that vanish without a structured mechanism for capture. VisRep ensures that these early contributions are not lost but structured into interactive, visually compelling formats supporting reflection, iteration, and communication. Its intelligent system highlights key insights, clusters related ideas, and eliminates the burden of repetitive formatting—freeing users to focus on the essence of their design. The integration of talk-aloud sessions, sketch capture, and structured generation makes VisRep uniquely suited to visualisation: a field where meaning is often communicated through a blend of narrative, structure, and visual form. This research tackles the following core challenges in the visualisation design process:
How can ideation be captured naturally without disrupting creativity?
How can multiple design ideas be structured, compared, and refined?
How should requirements—such as audience, intent, and output format—be integrated early, rather than added later?
How can exploratory thinking be turned into professional-grade deliverables without manual restructuring?
How can transparency and explainability be embedded into the design process?
These are not merely documentation questions but involve design provenance challenges, cognitive articulation, and practical knowledge transfer. VisRep offers a response through four primary contributions: (1) a novel predesign prompting framework that facilitates natural ideation capture, (2) a curated set of eleven reflective questions that structure and ground user thinking, (3) an interactive application that records, processes, and refines multimodal design input, and (4) a collection of markdown-based output templates tailored to presentations, posters, and reports. These elements create a new paradigm for documenting early-stage visualisation thinking—bridging verbal reasoning, visual sketching, and AI-enhanced structuring to transform informal ideation into coherent, actionable, and shareable outputs.
2. Background
In visualisation and design processes, the predesign phase, where ideas are raw, exploratory, and often unstructured, is crucial yet frequently overlooked in formal documentation. This stage, common across fields from academic research to product design, is where teams articulate goals, sketch early concepts, raise uncertainties, and frame the design problem. The background noise of design (informal conversations, scattered sketches, and voice notes) shapes the essence of what follows. Existing tools like Teams AI can record meetings and transcribe discussions, but they lack awareness of visualisation intent. They do not understand what diagrams mean, how design decisions evolve, or how to integrate sketches and verbal reasoning into a meaningful, structured output. There remains no standard way to report on predesign that combines narrative, visual, and structural clarity. VisRep addresses this gap by capturing early ideation through talk-aloud transcripts, descriptions of user sketches, and structured reflections, and then transforming them into usable formats such as reports, presentations, or posters. Rather than discarding the ambiguity and richness of predesign, VisRep preserves and organises these elements, enabling users to iterate, clarify, and ultimately communicate their design process in full.
A critical component of VisRep’s methodology is its reliance on Think-Aloud (or talk-aloud) protocols, a technique widely used in cognitive science, usability studies, and human–computer interaction research. Think-Aloud protocols allow users to verbalise their thoughts while engaging in a task, making implicit cognitive processes explicit. By integrating this approach into VisRep’s workflow, the system captures both the structured components of a report and the user’s conceptual reasoning, intent, and iterative thought process. The benefit of the Think-Aloud protocol is that it helps prevent information loss during ideation by ensuring that raw, unfiltered thoughts are retained and structured meaningfully. Researchers have long advocated Think-Aloud protocols for understanding cognitive processes and improving system usability [1]. When individuals articulate their thoughts during problem-solving or design processes, their explanations become richer and self-reinforcing [2]. This cognitive externalisation is particularly useful in structured reporting, where clarity and precision are paramount. By allowing users to talk through their ideas, VisRep ensures that even abstract, evolving concepts are captured before they fade from memory.
Moving beyond conceptualisation, VisRep explores the origins and rationale behind an idea through the questions, “How did you arrive at this idea? Was there a particular moment or insight that led to it?” These questions delve into the research methodology, background, and inspiration behind the work. Cognitive psychology research suggests that people recall episodic memory fragments when explaining their thought processes [3]. Capturing this aspect ensures that contextual narratives, often absent in purely technical documentation, are preserved in structured outputs. To further break down an idea into structured components, VisRep includes the question, “What are the essential components that make this work?” Following this question, it translates ideas into distinct sections using markdown headers. This reflects the natural cognitive process of chunking information, a concept supported by Shannon’s Information Theory, which posits that structuring information into discrete, meaningful units enhances retention and communication efficiency [4]. Recognising the importance of visual elements in structured reporting, VisRep integrates the question, “If you had to explain this visually, what elements would you highlight?” Encouraging users to think about diagrams, figures, and spatial relationships is crucial to design thinking and data visualisation [5]. Many complex concepts benefit from graphical representation, and by prompting users to identify key visual elements, VisRep ensures that structured outputs include compelling visual aids that enhance comprehension. Beyond the conceptual and structural aspects, VisRep also explores real-world applicability by asking, “How do you imagine this being used in the real world?” This question ensures that users articulate practical applications, use cases, and broader implications. Research in design process theory suggests that anchoring ideas to real-world contexts improves clarity and audience engagement [6]. VisRep further refines the structured reporting process by prompting users to reflect on potential improvements: “If someone were to improve or expand on this, what would they focus on?” This question introduces future directions, discussion points, and opportunities for iteration, mirroring established practices in iterative design and knowledge evolution [7].
Unlike traditional systems that impose rigid templates, VisRep allows users to tailor their reports dynamically through the question, “Do you see this working better as a deep dive or a quick overview?” The AI uses this response to determine the level of detail required, so that users receive an output that matches their needs. This approach aligns with adaptive information retrieval, which suggests that interactive systems should adjust content complexity based on user intent [8]. To further refine tone, depth, and structure, VisRep asks, “Would you like to emphasise clarity, storytelling, or technical depth?” This question aligns with rhetorical structure theory (RST) research, which explores how communicative intent influences textual organisation [9]. By allowing users to specify their preferred emphasis, VisRep helps outputs resonate with target audiences. Finally, the inclusion of visuals, examples, or comparisons is addressed with the question, “Would visuals, examples, or comparisons help explain this?” If the user responds affirmatively, VisRep suggests appropriate figures, tables, or graphical representations. This is particularly relevant in visualisation research, where the integration of images significantly enhances comprehension and retention [10].
By structuring the ideation and reporting process around Think-Aloud protocols and cognitive science principles, VisRep optimises information capture, organisation, and communication, ensuring that reports and presentations are well-structured and rich in conceptual depth and visual clarity. The remainder of this paper will explore the methodological and technical foundations of VisRep in greater detail, examining how these structured prompts translate into functional, automated reporting.
3. Related Work
Creativity and ideation are inherently interconnected processes that drive innovation across various domains, from research and design to problem solving and visualisation.
3.1. Creativity and Ideation
Creativity generates novel and valuable ideas, while ideation structures them into actionable solutions. Without structure, creativity can become chaotic; without creativity, ideation may become predictable. Integrating the two ensures originality and practical development [11]. Inspiration, a key driver of creativity, involves evocation, motivation, and transcendence, highlighting the role of external stimuli and internal drive in elevating ideas [12]. Ideation captures and refines these moments systematically. Frameworks that support creativity promote free exploration while guiding ideas into coherent outputs [13]. In visualisation and design, structured ideation helps avoid premature fixation, encouraging iterative development [14,15]. Structured methodologies begin with divergent thinking and guide towards application, supporting innovation through clarity and refinement [13]. In problem solving, especially with complex or ill-structured challenges, flexible yet guided ideation is essential [16]. The balance between structure and creativity transforms raw insight into meaningful solutions [15]. A structured ideation process mitigates risks like idea loss or premature convergence. Iterative stages (discovery, abstraction, refinement, and deployment) enable adaptability and focus [17]. This balance fosters environments where creativity can thrive without chaos or rigidity [11]. Creativity provides the initial spark; ideation ensures follow-through. Applying structured methods ensures ideas are both conceived and effectively implemented in visualisation and design contexts, reinforcing their value in innovation-driven disciplines [12].
3.2. Idea Generation Methodologies
Generating ideas is central to creativity and problem solving, but raw insights must be developed through structured methodologies. Effective frameworks balance divergent thinking (exploring possibilities) and convergent thinking (refining ideas) to avoid unfocused or impractical outcomes [18]. A key challenge is transitioning from conceptual thinking to structured outputs. Cognitive scaffolding and feedback loops help individuals organise and evaluate their thoughts while maintaining flexibility [19]. Rather than stifling creativity, constraints can enhance it by narrowing focus and stimulating alternative strategies. Visualisation plays a vital role in externalising thoughts and revealing unexpected connections. Analogical reasoning and visual comparisons support bisociation, the blending of concepts from different domains to drive innovation [20]. Structured visualisation ensures that abstract ideas are transformed into precise representations [21]. Frameworks like the Genex model structure ideation into four stages (collect, relate, create, and donate) [18], grounding creativity in context and knowledge integration. Cognitive Flexibility Theory also stresses the importance of adaptability in ill-structured domains [16]. Research confirms that well-defined constraints can boost originality by encouraging focused exploration [19]. In learning environments, structured problem framing further stimulates ideation [16]. The cognitive process of inspiration moves through recognisable stages, from insight to motivation, which structured methods help preserve and refine [22]. This ensures that ideas are both sparked and systematically developed into meaningful innovations. Visualisation methodologies reinforce the value of structure in idea generation. Selecting appropriate representations, from simple charts to network graphs, depends on effective categorisation [21]. Guidelines for quantitative visualisation emphasise clarity and simplicity [23]. Computational tools can support ideation by offering interactive, structured environments. Research in human–computer interaction suggests that balancing open-ended exploration with structured organisation keeps users engaged and productive [18]. Structured ideation ensures creativity is not lost in chaos but captured and refined. By leveraging flexible models, constraint-based strategies, and digital tools, idea generation becomes a disciplined, repeatable process that fosters impactful solutions.
3.3. Critical Thinking and Lateral Thinking
Critical and lateral thinking are complementary cognitive processes essential for generating ideas that are novel, practical, and logically sound. Critical thinking involves analysis, evaluation, and logical reasoning, ensuring that ideas are coherent and feasible [24]. Without it, ideation risks producing shallow or unworkable concepts. In contrast, lateral thinking encourages nonlinear approaches and unconventional connections to break out of cognitive rigidity and discover unexpected insights [25,26]. It fosters creativity by challenging assumptions, reframing problems, and leveraging analogies to generate diverse, original ideas [19]. While critical thinking refines and grounds ideas, lateral thinking expands possibilities, especially in ill-structured problem spaces where novel approaches are vital. These thinking styles work in tandem: lateral thinking generates possibilities, and critical thinking filters them through logical and ethical constraints [27]. The Six Thinking Hats framework exemplifies this integration by assigning distinct roles to each mode, such as the green hat for creativity and the black hat for critical judgment [26]. This alternation prevents bias and supports balanced ideation. This dual approach is crucial in visualisation, where designs must be innovative and analytically robust [21,24]. Creative cognition research confirms that innovation stems from iterating between divergent (lateral) and convergent (critical) thinking [27]. This structured interplay avoids stagnation and premature conclusions, ensuring ideas remain flexible yet refined. This balance is essential in innovation-driven fields. Lateral thinking introduces fresh perspectives, while critical thinking structures them into actionable, logical outcomes. Methodologies like Six Thinking Hats and findings from creative cognition offer practical means to embed this dynamic into ideation workflows, promoting sustainable and impactful innovation.
3.4. Where Do Good Ideas Come From?
Good ideas rarely emerge from isolated moments; they result from iterative, structured processes. Linus Pauling’s assertion that “the best way to have a good idea is to have lots of ideas” underscores the need for volume and systematic refinement [28]. Ideas evolve through feedback, collaboration, and exposure to diverse perspectives, making network connectivity critical for innovation [29,30]. Effective ideation relies on structured methods to organise insights and guide development. Wallas’s four-stage creative process (Preparation, Incubation, Illumination, Verification) remains a foundational model [31]. Each phase plays a role, from gathering knowledge to refining insights. Without structured capture and refinement, many ideas remain underdeveloped. Talk-aloud methodologies enhance ideation by externalising thought, enabling more coherent and flexible exploration [32]. Verbalising thoughts fosters fluid idea development and reflects how innovation often arises through interaction, not isolation [28]. Structured evaluation is essential for turning raw concepts into viable solutions. Studies suggest evaluating ideas based on novelty, workability, relevance, and specificity [32]. Without structure, valuable insights may be overlooked or lost in excess. Cross-pollination through brainstorming and social interaction fosters environments where ideas can mature and be refined [29]. Combined with reflection, talk-aloud methods support iterative ideation and deeper engagement [33]. Cognitive incubation also plays a key role: stepping away allows ideas to unconsciously integrate and evolve, often leading to breakthrough insights [31]. Innovation stems from integrating structured processes, reflective thinking, and collaborative discourse [30]. Embracing these principles allows for a sustainable and repeatable approach to generating high-quality, impactful ideas.
3.5. How People Work: Structuralists vs. Conceptualists
People engage with creativity and design through differing cognitive orientations: structuralists favour order and logic, while conceptualists embrace ambiguity and fluidity [34]. Structuralists believe that insights emerge from analysing underlying systems and prefer structured methodologies, such as heuristics and iterative design frameworks [35,36]. They thrive in environments guided by explicit constraints that enhance clarity and coherence. In contrast, conceptualists view ideation as an evolving, exploratory process. Aligned with traditions in art and philosophy, they emphasise emergent meaning through abstraction and reinterpretation [37,38]. They gravitate towards open-ended problem solving, valuing flexibility over fixed structures. This divide is evident in design methodologies: structuralists often follow models like the Double Diamond, while conceptualists lean towards iterative, nonlinear processes. Linguistics and visual art reflect similar distinctions: some traditions favour formal grammar or structured visuals, while others prioritise interpretive meaning and context [30,36,39]. Despite differences, both orientations offer value. The most effective ideation frameworks integrate structure and flexibility, guiding thought while allowing conceptual expansion [28]. This balance is particularly relevant in interdisciplinary work. Cognitive flexibility, the ability to shift between structured and fluid thinking, enhances creative problem solving [32]. Systems that support this adaptability accommodate diverse cognitive styles, empowering users to work in ways that align with their preferences. The goal is not to enforce one thinking style but to design environments where structure and abstraction coexist [29,33]. By embracing both perspectives, we foster inclusive and innovative spaces for ideation and design.
3.6. The Role of Structured Creativity, Idea Preservation, and Recall
Structured methodologies enhance creative outcomes by ensuring that ideas are preserved, refined, and developed into usable outputs. Rather than restricting free thought, structure prevents innovative insights from fading before they can be applied [40]. One major challenge in creative work is cognitive overload: generating many ideas without a system to capture and organise them. Research on note-taking shows that externalising thoughts supports understanding and long-term recall, helping individuals offload mental effort while retaining valuable insights [41,42,43]. Tools like mood boards and concept maps extend structured note-taking, aiding recall and enabling nonlinear exploration of abstract ideas [32,44]. Despite their benefits, these tools are often underutilised. Structured methods also protect ideas from premature judgment and organisational pressures that can stifle innovation [45]. In environments driven by efficiency and standardisation, creativity may be suppressed, not intentionally but due to a lack of space for exploration. Deliberate methods for ideation and review counteract this, allowing new ideas to mature [46]. Recording ideas alone is not enough; review and retrieval are equally vital. Structured systems like concept mapping and iterative reflection support memory and help recontextualise earlier ideas into new solutions [47,48]. This prevents forgotten insights and encourages continual development [41]. Shared creativity also thrives through iteration; collaborative idea sharing and networked creativity are most effective when supported by revisitation and revision [29].
Investment theory suggests that early-stage ideas may initially seem unremarkable but gain significance with time and nurturing [46]. Structured creativity overcomes memory limits, avoids overload, and builds a sustainable innovation process [49]. By combining note-taking, visual tools, and structured review, individuals move beyond fleeting inspiration towards meaningful execution. When supported with systematic preservation and recall, creativity evolves into a lasting, impactful practice.
Recent advances in human–AI collaboration have explored how structured prompts and computational reasoning can support presentation generation and information structuring tasks. For example, slide generation from computational notebooks has been shown to benefit from outline-based guidance, enabling systems to maintain coherence and improve content flow by aligning outputs with author intent [50]. Similarly, collaborative tools that blend user-generated outlines with AI-driven formatting have demonstrated the effectiveness of modular design scaffolds in producing slide decks that retain narrative intent and domain specificity [51]. These approaches support the growing recognition that structured intermediate representations such as outlines, summaries, or tagged inputs can enhance AI interpretability and output quality, especially in design or reporting contexts.
4. The VisRep Framework
Figure 1 shows how the framework uses a talk-aloud session and structured prompting to give an LLM the best opportunity to take all the data from a design session and produce a meaningful output. VisRep is designed to guide users through an interactive and structured content generation process that captures, refines, and transforms ideas into high-quality visual outputs. The system follows a modular architecture that begins with a talk-aloud stage, which allows users to freely verbalise their thoughts before engaging with AI-driven text processing, content structuring, and visual formatting, each operating as an independent yet interconnected component. This talk-aloud phase serves as a cognitive reinforcement mechanism, ensuring that key ideas are naturally repeated, clarified, and contextualised before being processed by the AI. Users interact with the system through a command-line-inspired terminal interface, where structured questions prompt further elaboration, reinforcing critical details and filtering out irrelevant information through repetition-driven refinement. The AI then converts these structured responses into markdown-based representations, which are formatted into various outputs, including reports, slides, and posters. Integrating verbal ideation with structured questioning ensures that creative insights are preserved and refined through iterative reinforcement, maintaining clarity and coherence throughout content generation.
To facilitate this process, VisRep employs a set of eleven targeted questions designed to extract key aspects of a user’s concept while maintaining flexibility and adaptability. The system begins by asking foundational questions to establish the core idea, such as “Tell me about what you are working on. What is the core idea or challenge?” These enquiries provide a high-level overview and ensure the content’s fundamental purpose is well defined. It follows with “What makes this unique or different?” to capture distinguishing features and innovative elements. Since misunderstandings often arise in complex topics, the system asks, “If someone were to misunderstand this, what would they likely get wrong?” to clarify potential misconceptions and ensure precise communication. Once the foundation is established, the system digs deeper into the design process by exploring the origins and structure of the idea. The questions “How did you arrive at this idea? Was there a particular moment or insight that led to it?” help extract the user’s thought process, previous iterations, and influences that contributed to the idea’s development. To break down the core structural components, the system asks, “What are the essential components that make this work?”, which helps organise the response into logical markdown headers. Recognising the importance of visualisation in effective communication, VisRep also prompts users with “If you had to explain this visually, what elements would you highlight?”, ensuring that key graphical or illustrative elements are captured for use in slides and posters. Moving beyond conceptualisation, the system then explores real-world applications and future development. The question “How do you imagine this being used in the real world?” guides the user towards practical applications and expected impact. 
At the same time, the question “If someone were to improve or expand on this, what would they focus on?” encourages reflection on limitations, potential enhancements, and next steps in the evolution of the idea. Without explicitly forcing a rigid format, VisRep adapts its output structure based on user intent and content type. To determine the depth and scope of the final output, the system inquires, “Do you see this working better as a deep dive or a quick overview?”, allowing the AI to adjust the level of detail. It further refines presentation style by asking, “Would you like to emphasise clarity, storytelling, or technical depth?”, ensuring that the generated output aligns with the user’s intended communication style. Finally, to enhance clarity and engagement, the system asks, “Would visuals, examples, or comparisons help explain this?”, prompting the AI to suggest additional figures, tables, or illustrative content. The strength of this approach lies in its balance between structured guidance and creative flexibility. By employing a questioning strategy rooted in cognitive science, information theory, and design process methodologies, VisRep ensures that users generate high-quality structured content with minimal cognitive overhead. The system does not impose a fixed structure but instead adapts to natural thought processes, extracting rich, meaningful responses while preserving the fluidity of idea development. By aligning with recognised principles such as Hierarchical Task Analysis (HTA), Mental Model Theory, and the Double Diamond model, VisRep effectively mirrors real-world design workflows and enhances the efficiency of content creation.
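As an illustration of how the answers to the last three questions could steer the output, the following is a minimal sketch, not the VisRep implementation. The `OutputPreferences` class, the `parse_preferences` function, and the keyword matching are all assumptions introduced here; the actual system delegates this interpretation to the AI.

```python
from dataclasses import dataclass


@dataclass
class OutputPreferences:
    """Hypothetical container for the three presentation-related answers."""
    depth: str         # "deep_dive" or "overview"
    emphasis: str      # "clarity", "storytelling", or "technical"
    use_visuals: bool  # whether to suggest figures, tables, or comparisons


def parse_preferences(depth_answer: str,
                      emphasis_answer: str,
                      visuals_answer: str) -> OutputPreferences:
    """Map free-text answers onto coarse settings via simple keyword checks.

    This keyword matching is an assumption made for illustration only.
    """
    depth = "deep_dive" if "deep" in depth_answer.lower() else "overview"

    emphasis_text = emphasis_answer.lower()
    if "story" in emphasis_text:
        emphasis = "storytelling"
    elif "technical" in emphasis_text:
        emphasis = "technical"
    else:
        emphasis = "clarity"  # default when nothing else is recognised

    affirmative = ("yes", "sure", "definitely")
    use_visuals = visuals_answer.strip().lower().startswith(affirmative)
    return OutputPreferences(depth, emphasis, use_visuals)


if __name__ == "__main__":
    prefs = parse_preferences("A deep dive, please",
                              "Storytelling mostly",
                              "Yes, a comparison table")
    print(prefs)
```

A keyword heuristic like this is brittle with free speech, which is one reason to leave the interpretation to a language model in practice; the sketch only shows the shape of the resulting settings.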
The final markdown-based output is then processed into a structured document, ensuring that ideas are preserved, refined, and accessible for future use. Unlike traditional brainstorming or note-taking, where insights can be lost or remain unstructured, VisRep guarantees that each session results in a tangible, organised output. This methodology aligns with research on cognitive effort in note-taking, demonstrating that structured externalisation of ideas significantly enhances recall and creative engagement. By ensuring that each interaction results in a coherent, formatted document, the system eliminates the need for users to rely on memory or disorganised notes, allowing for seamless recall and iteration over time.
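To make the step from markdown to a slide-style deliverable concrete, here is a small sketch under stated assumptions: that the intermediate markdown uses `## ` headers for sections, and that each section becomes one slide. The function name `split_into_slides` and the dictionary layout are hypothetical and are not taken from the VisRep code.

```python
def split_into_slides(markdown: str) -> list[dict]:
    """Split markdown into slides, one per second-level header.

    Assumption: sections start with '## '. Text before the first such
    header (for example the document title) is ignored in this sketch.
    """
    slides: list[dict] = []
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new section header opens a new slide.
            current = {"title": line[3:].strip(), "body": []}
            slides.append(current)
        elif current is not None and line.strip():
            # Non-empty lines become body lines of the current slide.
            current["body"].append(line.strip())
    return slides


if __name__ == "__main__":
    sample = "# Title\n\n## Core idea\n\nA dashboard.\n\n## Audience\n\nAnalysts.\nManagers.\n"
    for slide in split_into_slides(sample):
        print(slide["title"], "->", slide["body"])
```

The same section list could feed a report or poster template; only the final layout step would differ.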
4.1. Eleven Questions
To support reflective ideation without imposing rigid structure, VisRep employs a set of eleven carefully designed questions (see Table 1) that align with established theories in cognitive science, information theory, and design process methodology. These questions were chosen to mimic how individuals naturally articulate design ideas, progressing from conceptual overview to detailed structure, real-world application, and presentation strategy. Rather than asking users to categorise or formalise their ideas prematurely, the questions guide them through a conversational reflection that elicits meaningful, structured responses. The framework draws from Hierarchical Task Analysis to capture layered thinking [52], Mental Model Theory to anticipate and clarify misconceptions [53], and Information Theory to maximise insight from minimal prompts [4]. Together, these prompts enable the system to extract high-quality, context-rich input that can be mapped onto structured markdown formats, supporting automated output generation without sacrificing creativity or narrative flow.
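The mapping from answers to markdown can be sketched as follows. The question wording is taken from the text; the short section headings, the `QUESTIONS` constant, and the `render_markdown` function are assumptions made for illustration and may differ from the templates VisRep actually uses.

```python
# The eleven questions as quoted in the paper, each paired with a
# hypothetical section heading (the headings are assumptions).
QUESTIONS = [
    ("Core idea", "Tell me about what you are working on. What is the core idea or challenge?"),
    ("What makes it unique", "What makes this unique or different?"),
    ("Likely misunderstandings", "If someone were to misunderstand this, what would they likely get wrong?"),
    ("Origins", "How did you arrive at this idea? Was there a particular moment or insight that led to it?"),
    ("Essential components", "What are the essential components that make this work?"),
    ("Visual elements", "If you had to explain this visually, what elements would you highlight?"),
    ("Real-world use", "How do you imagine this being used in the real world?"),
    ("Future work", "If someone were to improve or expand on this, what would they focus on?"),
    ("Depth", "Do you see this working better as a deep dive or a quick overview?"),
    ("Emphasis", "Would you like to emphasise clarity, storytelling, or technical depth?"),
    ("Supporting material", "Would visuals, examples, or comparisons help explain this?"),
]


def render_markdown(title: str, answers: list[str]) -> str:
    """Render one markdown section per answered question."""
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(answers)}")
    lines = [f"# {title}", ""]
    for (heading, question), answer in zip(QUESTIONS, answers):
        if not answer.strip():
            continue  # skip unanswered questions rather than emit empty sections
        lines += [f"## {heading}", "", f"*{question}*", "", answer.strip(), ""]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = ["A live air-quality dashboard for schools."] + [""] * 10
    print(render_markdown("Design session", demo))
```

Keeping the prompts as plain data makes it straightforward to reorder or reword them without touching the rendering step.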
4.2. User Groups
VisRep was designed with multiple user groups in mind, each with distinct but overlapping needs in the context of ideation, structuring, and visualisation. The system’s flexible, question-driven approach supports a range of scenarios from early-stage concept development to structured report generation, making it suitable for students, researchers, and educators. For students, VisRep acts as a reflective learning tool. The eleven questions map directly onto key stages of the visualisation design process, helping students deconstruct a design brief, articulate the purpose of a visualisation, and evaluate potential misunderstandings. By prompting structured elaboration, VisRep reinforces critical thinking and enhances understanding of core visualisation principles through active engagement rather than passive learning. For researchers, VisRep serves as a memory scaffold and idea organiser. Academic work often involves juggling multiple concurrent ideas over long periods. By capturing design rationale and conceptual structure in a consistent markdown format, VisRep enables researchers to return to dormant projects with minimal reorientation. The reflective prompts help articulate novel contributions, track intellectual evolution, and generate ready-to-share material for publications, talks, or collaborations. For educators and academics, VisRep provides a pedagogical scaffold for teaching visualisation principles. The eleven questions embody foundational concepts such as design intent, visual encoding, audience needs, and evaluation criteria. By structuring classroom activities or formative assessments around these prompts, educators can guide students through complex design challenges while encouraging introspection and iteration. The consistent format also simplifies review and feedback by providing clear checkpoints in student thinking.
VisRep accommodates these user groups and operates as a bridge between informal ideation and structured output. It aligns its questioning framework with diverse cognitive workflows while maintaining a universal format for downstream processing.
5. The VisRep Implementation
VisRep is implemented as a web-based application using Dash 3.1 and Python 3.12, combining AI-assisted transcription, structured prompting, and visual output generation to support the design and reporting process (see Figure 2, Figure 3 and Figure 4). While this is an early prototype, we plan to keep advancing the project and include a full prototype in future publications.
5.1. The Components
Before any AI-driven processing occurs, VisRep is built upon two foundational components that structure ideation in a lightweight, human-centred way: a real-time transcription module and a fixed set of reflective questions. These elements are essential in grounding the system in user intention, allowing for creative expression while creating a scaffold for structured thinking. The first is the transcription module, which passively captures the verbal ideation process during talk-aloud sessions. Users are encouraged to verbalise thoughts naturally as they sketch, plan, or brainstorm. The system does not interrupt or interpret at this stage; instead, it listens silently, storing every sentence for later use. This phase supports cognitive externalisation, enabling users to offload ideas in raw form while maintaining a continuous flow of thought. The goal is to preserve the spontaneity and richness of unfiltered ideation—ideas often lost when not immediately documented.
The second core component is a structured prompt framework, which appears after the talk-aloud session concludes. This consists of eleven carefully designed questions to guide users through structured reflection. These questions move from conceptual framing (e.g., “What is the core idea or challenge?”) to clarification (“What makes this unique?”) and then to output-oriented thinking (“What visuals or formats would best communicate this?”). The questions do not aim to constrain creativity but rather to focus it, ensuring that important insights are verbalised, organised, and contextualised. This two-part structure (verbal capture followed by guided reflection) bridges free ideation and AI-assisted structuring. By combining passive recording with active prompting, VisRep ensures that users have a clear foundation from which automated processes can begin to generate visual and textual outputs. These components reduce the risk of idea loss and make the design process more transparent, repeatable, and adaptable across multiple projects and audiences.
5.2. Large Language Model
Once the raw transcript and structured question responses are captured, the next stage in VisRep’s pipeline involves transforming these informal, fragmented thoughts into coherent, structured text using a large language model (LLM). This stage marks the system’s shift from human-guided ideation to AI-assisted synthesis. Although the initial input may be disjointed, containing pauses, repetition, and incomplete ideas, the LLM processes these data and produces refined, expanded narratives better suited for formal outputs such as reports, presentations, and posters. To understand how this process works, it is helpful to consider what an LLM is and how it functions. A large language model is a type of artificial intelligence trained on vast amounts of text from books, articles, websites, and other sources. Rather than storing predefined answers, the model learns patterns in language: how words relate to each other, how ideas flow, and how context shapes meaning. It does this by operating like a highly advanced form of predictive text. Much like the autocomplete feature in messaging apps, an LLM attempts to predict what word, sentence, or paragraph should come next based on the input received. The difference, however, is one of scale and sophistication. LLMs operate with billions of parameters, allowing them to model not only basic grammar but also nuanced tone, logical structure, and conceptual depth. In the context of VisRep, this predictive ability is used to reframe the ideation session into a clear, structured narrative.
For instance, a user might speak a fragmented line like “um…the birds… terns and cranes… going north more often, maybe climate thing?” The model takes this input—along with answers from the structured questions such as “What is the core idea?”—and expands it into something like the following: “This study investigates how climate change has influenced northward shifts in the migration patterns of species such as Arctic terns and cranes, using satellite tracking data collected over the past two decades”. The LLM’s task is not to invent new content but to interpret the user’s intent and articulate it more clearly, drawing on its academic, technical, and natural language training.
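As an illustration of how a transcript fragment and the structured answers could be combined before being sent to the model, the following is a minimal sketch rather than VisRep's actual code. The function name `build_refinement_prompt`, the instruction wording, and the sample answer are assumptions made for this example; only the transcript fragment and the question text come from the description above.

```python
def build_refinement_prompt(transcript_lines, answers):
    """Combine raw talk-aloud lines and question answers into one prompt string.

    Hypothetical helper: the instruction wording is an assumption, not
    the prompt used by VisRep.
    """
    # Keep every non-empty transcript line so no spoken idea is dropped.
    transcript_block = "\n".join(
        f"- {line.strip()}" for line in transcript_lines if line.strip()
    )
    # Pair each question with its answer so the model can use it as an anchor.
    answers_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers.items())
    return (
        "Rewrite the speaker's fragmented notes as clear academic prose.\n"
        "Do not add facts that are absent from the notes or answers.\n\n"
        f"Transcript fragments:\n{transcript_block}\n\n"
        f"Structured answers:\n{answers_block}\n"
    )


prompt = build_refinement_prompt(
    ["um…the birds… terns and cranes… going north more often, maybe climate thing?"],
    # The answer text below is invented for illustration.
    {"What is the core idea or challenge?": "Climate change is shifting bird migration northward."},
)
print(prompt)
```

The explicit instruction not to add absent facts mirrors the stated goal that the model interprets intent rather than inventing content; whether such an instruction is sufficient in practice is not something this sketch establishes.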
Once the raw content is refined, the next challenge is structural decomposition: segmenting the improved output into formats suitable for dissemination. LLMs are particularly well suited to this because they can adjust tone, depth, and structure based on the intended output. For instance, the same core idea may be expressed differently depending on whether the final product is a PowerPoint slide, a poster, or a written report. A slide might require short, punchy headings and visual cues, while a report demands detailed paragraphs with citations and narrative flow. The LLM can adapt the same base content into these different forms, intelligently selecting what to expand, summarise, and highlight. This flexibility comes from the model’s ability to interpret what was said and why it was said, using the Q&A inputs as thematic anchors. For example, the answer to “What makes this unique?” will guide the LLM to emphasise novelty or methodological innovation, while “What visuals would help explain this?” helps the model highlight elements most suitable for visual encoding. This allows VisRep to produce multiple structured outputs from a single session, each tailored to its format. A poster might foreground bullet-point insights and visuals; a PowerPoint deck might become a linear story with visual transitions; a report might break content into abstract, background, method, findings, and conclusion, with each paragraph derived from and linked to the structured prompts. Consistency and tone control are key benefits of using an LLM in this process. Human ideation sessions often shift registers, from technical to casual language or from specific details to vague generalisations. The LLM smooths these transitions, ensuring the final text is readable, coherent, and aligned with the expected tone of academic or professional communication. It can also act as an editor, removing redundancy, clarifying ambiguous phrasing, and suggesting improvements without losing the user’s original intent.
In VisRep, the LLM is not used to replace the user’s ideas but to amplify and structure them. The system maintains traceability between original inputs and final outputs, allowing users to review, refine, and even revise their content post-generation. This supports iterative design and ensures transparency—users remain in control of their narrative, while the AI functions as a collaborator rather than a black box. In summary, integrating a large language model in VisRep is a transformative middle layer: converting informal spoken thoughts and reflective responses into polished, structured content. By leveraging the predictive and generative power of LLMs, the system bridges the gap between ideation and communication—enabling users to turn spontaneous insights into well-formed, format-ready outputs that are meaningful and coherent.
5.3. Stylesheet
While the primary focus of structured content generation is capturing and refining ideas, an equally important stage in the process is applying a consistent style to AI-generated outputs. This ensures that reports, slides, and posters maintain visual coherence while remaining adaptable to different presentation formats. Each output type in the system (whether a markdown-driven report, PowerPoint presentation, or research poster) follows a dedicated stylesheet that governs layout, typography, colour schemes, and structural elements. This approach enhances readability and professionalism and reinforces the importance of visual identity in effective communication. The styling process operates as a distinct but integrated stage of the content pipeline. Once the AI generates structured text and visual elements, the system maps the content to predefined style templates, ensuring that key elements such as headings, emphasis markers, figures, and tables align with the conventions of their respective formats. Reports, for instance, prioritise readability and hierarchical organisation using structured headings, consistent spacing, and academic-style citations. PowerPoint slides, by contrast, emphasise concise content, visual hierarchy, and slide transitions to enhance audience engagement. Posters demand a highly visual layout, balancing text density with graphical clarity to accommodate research communication standards. This structured application of stylesheets introduces an additional layer of automation in visual formatting, reducing the manual effort required to refine and standardise output designs. While this may not represent a groundbreaking contribution to visualisation research, it underscores the importance of adaptable, predefined formatting in AI-assisted content generation. 
By ensuring that AI-generated outputs retain structural integrity and aesthetic consistency, this methodology reinforces the idea that content presentation is just as critical as content generation, particularly in academic and professional settings.
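The mapping from generated content to a per-format stylesheet can be pictured with a small lookup table. The sketch below is illustrative only: the `StyleSheet` fields, the font names, the sizes, and the word limits are all assumed values chosen for the example, not VisRep's actual style definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleSheet:
    """Hypothetical style settings for one output format."""
    heading_font: str
    body_font: str
    body_size_pt: int
    max_words_per_block: int


# Assumed values for illustration; the real stylesheets are not specified here.
STYLESHEETS = {
    "report": StyleSheet("Georgia", "Georgia", 11, 250),
    "slides": StyleSheet("Helvetica", "Helvetica", 24, 40),
    "poster": StyleSheet("Helvetica", "Helvetica", 32, 80),
}


def style_block(output_format: str, heading: str, body: str) -> dict:
    """Attach the format's style to a content block and flag overly long text."""
    sheet = STYLESHEETS[output_format]
    word_count = len(body.split())
    return {
        "heading": heading,
        "body": body,
        "heading_font": sheet.heading_font,
        "body_font": sheet.body_font,
        "body_size_pt": sheet.body_size_pt,
        # A slide tolerates far less text than a report paragraph.
        "too_long": word_count > sheet.max_words_per_block,
    }
```

Keeping the style data separate from the content means the same block of text can be passed through `style_block` once per target format, which is one plausible way to realise the separation between content generation and presentation described above.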
5.4. Prompt Engineering
Prompt engineering is a critical layer within the VisRep system, shaping how raw input (transcripts and reflective question responses) is converted into well-structured outputs [54]. Rather than issuing vague or open-ended instructions to the language model, VisRep uses carefully engineered prompts incorporating format-specific templates, contextual guidance, and iteration loops to ensure coherence, clarity, and relevance [55]. These prompts provide the large language model (LLM) with a detailed sense of what the output should “look like,” acting as both instruction and scaffold for generation rather than ad hoc tasks [56]. One of the most effective techniques used in this process is the inclusion of markdown-style structural samples for each output format. Each output type within VisRep, whether a slide deck, a report, or a poster, has a different rhetorical goal and structural convention. To help the LLM meet these expectations, VisRep supplies a lightweight markdown template within the prompt itself. These templates define the sections that should be included, the style in which the content should be delivered, and the level of depth expected in each part. For example, a prompt for a slide deck might include the following:
## Slide 1: Title Slide
- Title:
- Subtitle:
- Presenter:

## Slide 2: Problem Statement
- Heading:
- Key Points:

## Slide 3: Method
- Heading:
- 2–3 Bullet Points:
This approach communicates the “shape” of the content to the model without hardcoding rigid structures. It allows the LLM to understand what the user wants to say and how it should be organised and presented to suit the format. Similarly, reports use more formal structures, guiding the model to generate cohesive paragraphs under each section header as outlined below:
# Title
## Abstract
[Summarise key findings in 3–5 sentences.]

## Introduction
[Explain the research motivation and background.]

## Methodology
[Describe how the data was collected and analysed.]

## Results
[Present key findings with optional visual references.]

## Conclusion
[Summarise the significance and next steps.]
Posters, on the other hand, focus on telling a story [57], and they have a visual and spatial logic that differs from both reports and slides. To support poster generation, VisRep uses markdown prompts such as those shown below:
# Poster Title
## Summary
[A short paragraph summarising the project.]

## Key Findings (use bullet points)
- Insight 1
- Insight 2
- Insight 3

## Visual Suggestions
- Heatmap showing migration shifts
- Comparative chart of bird species

## Quotes or Highlights
“Migration patterns are shifting due to rising temperatures.”

## Contact/References
[Author names, emails, relevant links.]
By embedding markdown structures directly into the prompt, VisRep sets clear expectations for the LLM, reducing ambiguity and increasing the likelihood of generating usable outputs on the first attempt. This method effectively “trains the model on the spot,” aligning it with specific documentation goals without requiring formal fine-tuning. Each markdown section is anchored to the talk-aloud transcript or one of the eleven structured questions. For example, the answer to “What is the core idea or challenge?” may populate the introduction in a report or the first slide of a presentation. This keeps the final content grounded in user input, promoting consistency and reducing the risk of hallucinations. To further align tone and depth with different formats, VisRep applies role conditioning within prompts. Depending on the context, the LLM may be asked to behave as a scientific editor, a visualisation designer, or a presentation coach. This enables the same content to be appropriately adapted across formats, from formal paragraphs to visual summaries. Although structured prompts are effective, occasional misalignment still occurs. VisRep addresses this through a lightweight verification and correction step. A targeted follow-up prompt is issued to amend the output if sections are missing or off-topic. These corrections are minimal due to the well-defined scaffold, enabling fast, iterative refinement.
Finally, structuring output in markdown supports modular reuse. Without rephrasing or duplication, bullet points, summaries, and visual suggestions can be moved between formats—slides, posters, and reports. This modularity is central to VisRep’s vision of turning spontaneous ideation into structured, shareable, and adaptable content across diverse outputs.
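To make the idea of modular reuse concrete, the sketch below splits a markdown document into named sections and carries the bullet points of one section into a different format. It is a minimal example under assumptions: the helper names (`split_sections`, `bullets`, `findings_slide`) and the output heading "Slide: Key Findings" are invented for illustration, and the parser handles only simple headings and "- " bullets.

```python
def split_sections(markdown: str) -> dict[str, list[str]]:
    """Map each heading title to the non-empty lines that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


def bullets(lines: list[str]) -> list[str]:
    """Extract the text of '- ' bullet lines."""
    return [line[2:].strip() for line in lines if line.startswith("- ")]


def findings_slide(poster_md: str) -> str:
    """Reuse a poster's key findings as a slide, without rephrasing them."""
    sections = split_sections(poster_md)
    key = next((name for name in sections if name.startswith("Key Findings")), None)
    items = bullets(sections[key]) if key is not None else []
    return "\n".join(["## Slide: Key Findings"] + [f"- {item}" for item in items])


poster = """# Poster Title
## Summary
A short paragraph summarising the project.

## Key Findings (use bullet points)
- Insight 1
- Insight 2
- Insight 3
"""
print(findings_slide(poster))
```

Because the bullets are copied verbatim, the slide and the poster stay consistent with each other, which is the property the modular markdown structure is meant to provide.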
5.5. UI
VisRep adopts a minimalist terminal-style interface to focus user attention solely on reflective input and response. Unlike conventional web interfaces cluttered with panels, icons, or menus, the terminal aesthetic serves a specific cognitive function: it eliminates distraction and foregrounds the act of structured ideation. This aligns with the system’s goal of supporting uninterrupted, introspective engagement rather than exploratory browsing or interface manipulation. The decision to use a terminal metaphor is not purely stylistic. It reflects a principle of single-purpose clarity: every UI element in VisRep exists to encourage user reflection or display the structured progression of ideas. The lack of visual clutter reinforces the system’s role as a thinking companion rather than a graphical design tool. From a systems perspective, this choice also reduces overhead, simplifies cross-platform deployment, and reinforces traceability between input and output. In addition, the aesthetic evokes historical associations with early AI interfaces and command-line systems, interfaces that demanded direct, purposeful interaction. While this lineage is not central to VisRep’s technical function, it resonates with the broader design ethos, emphasising intentional, dialogic engagement over aesthetic complexity.
Importantly, the terminal UI integrates seamlessly with VisRep’s broader workflow, acting as an input collector and feedback surface. It guides users through reflective prompts and reveals their structured responses in real-time, creating a sense of momentum and clarity throughout the ideation process. This approach reduces the cognitive load for users less familiar with conventional design tools, making VisRep accessible and usable without prior training.
6. Usage Scenario and Testing
In this scenario, two researchers, Alex and Doyle (see Figure 5), are working on a long-term project examining how climate change affects bird migration patterns. Sitting together in an informal setting, they begin the ideation process not by writing or sketching but by simply talking. Over coffee, they dive into their observations: two decades of satellite tracking data combined with environmental information like temperature shifts and seasonal anomalies. They speculate, interrupt each other, revisit past insights, and make loosely formed suggestions about possible visualisations. Alex suggests using animated maps to show route changes over time; Doyle adds that a comparative view across species like Arctic terns and swallows could reveal critical patterns. Their dialogue is raw, disorganised, and unfiltered: the kind of session that often fuels excellent ideas but is notoriously difficult to document effectively. This is where VisRep begins to work. Rather than waiting for a polished idea, the system captures the entire design conversation in real time. The talk-aloud session is transcribed, with key elements such as repeated ideas, visual concepts, or expressed uncertainties tagged for downstream relevance.
The transcript becomes more than a flat record; it is a living design artefact. Each phrase, pause, or speculative leap contributes to the foundation of structured ideation. VisRep’s unique strength lies in this stage, which documents the cognitive landscape of design before structure or documentation gets in the way. Following the discussion, Alex and Doyle transition into a more reflective phase. Here, VisRep introduces eleven structured questions that serve as a bridge between chaotic brainstorming and deliberate articulation. These questions ask for the following clarifications: What is the core idea? What makes this project unique? Who is the audience? What output formats are expected? Each answer helps to scaffold future content generation. For example, when asked, “Does this work better as a deep dive or a quick overview?”, the researchers clarify the need for a detailed scientific report and an engaging poster for outreach. These preferences are embedded into the design trajectory, informing the structure and tone for future outputs. Importantly, VisRep does not summarise or reinterpret during this process. Instead, it prepares: the transcript is layered with metadata, the Q&A responses are tied to specific output elements, and the system waits for the user’s approval before initiating the next phase. The researchers are not forced to constrain their creative process to match documentation requirements. Instead, their evolving thought process is preserved faithfully, ready to be refined later. When Alex and Doyle return to their project two weeks later, they do not start over. They resume where they left off, with a detailed record of what was said, why it mattered, and how they planned to communicate it. This approach demonstrates VisRep’s core contribution: it captures design insight before it fades and preserves it in a structured, contextualised way. 
Traditionally, these early-stage moments of ideation are either lost or later reconstructed inaccurately from memory. VisRep reduces that fragility by embedding structure into creativity without sacrificing the fluid, exploratory nature of early design sessions. VisRep’s evaluation methodology builds directly on this workflow. The system was tested through comparative studies, contrasting AI-assisted outputs with traditional, manually generated content. The key metrics were how fast users could produce reports, slides, and posters and how well these outputs communicated their core insights. The structured transcript and Q&A provide the raw material, but the transformation into usable content occurs through a large language model (LLM) guided by prompt engineering and markdown templates. This modular system allows users to generate nine-slide presentations, single-page scientific posters, or detailed technical reports, all tailored to the original content and audience. A defining aspect of VisRep is its dual accommodation of different cognitive working styles. Conceptualists, those who think aloud and iterate fluidly, benefit most from the initial transcription and talk-aloud stage. They can freely express ideas without prematurely categorising them. In contrast, structuralists, those who prefer organised thinking and methodical progression, are supported by the eleven-question framework, which systematises content without stifling ideation. This balance makes VisRep broadly usable across disciplines and personal work styles. Crucially, VisRep also excels in visual formatting. It converts structured ideas into visually compelling outputs using predefined style guides and modular markdown grammar. The system handles layout, typography, and visual hierarchy, producing consistent deliverables with minimal user effort. 
Whether generating a report with an academic tone and citations or a poster with bold headlines and comparative images, VisRep ensures that outputs are functional and engaging. This reduces the burden of formatting and allows users to focus on refining their ideas instead of fighting with document structure. The comparative studies indicate that users completed their documentation tasks faster with VisRep than with traditional methods, with improved structure and consistency. The AI-assisted system flagged missing sections, clarified vague language, and adjusted tone depending on the output type. It could also respond to secondary prompts like “Add a missing results section” or “Simplify the language for a general audience,” making it easy to iterate without starting from scratch. This evaluation also highlights VisRep’s contribution to reproducibility and design provenance. By embedding metadata in transcripts, tagging relationships across ideas, and recording decisions at each phase, the system creates a complete, traceable record of the design process. This is especially valuable in academic settings where transparency, reproducibility, and clarity of communication are vital.
VisRep’s impact is not limited to text alone. The system is being expanded to support real-time visual storytelling and context-aware image generation for posters and slides. Users will soon be able to fine-tune the visual design of outputs, bridging the final gap between AI structuring and user-driven aesthetics. Performance optimisations are also underway to reduce current processing times to near real-time for more seamless interaction. Ultimately, VisRep offers a paradigm shift in how design thinking is captured, processed, and presented. By bridging the cognitive gap between raw ideation and structured output, it transforms the often-fragmented design process into a continuous, repeatable, and intelligible workflow. From initial conversation to final presentation, VisRep provides a guided, intelligent framework for turning thoughts into structured, impactful documentation. Whether used by researchers, educators, or industry professionals, VisRep enhances creativity, reduces cognitive load, and produces clear, consistent, and communicatively powerful results.
7. Future Work
Future developments of VisRep will focus on integrating computer vision techniques to interpret sketching activities performed on suspended surfaces, whiteboards, or other physical media during design sessions. Currently, image selection within VisRep is based on a lightweight, deterministic approach: a predefined catalogue of thematic images is maintained, and the system selects the image corresponding to the keyword or phrase with the highest frequency across the transcript and reflective responses. While effective for matching core ideas with illustrative visuals, this method does not dynamically generate new visuals. Integrating an image-based large language model (LLM) for real-time visual generation remains an exciting direction for future work. However, current limitations in such models, particularly their inability to produce accurate, meaningful, or domain-specific visualisations, pose a significant challenge. Unlike structured charts or scientific diagrams, generative models often prioritise aesthetic qualities over factual correctness, making them less suitable for visualisation tasks where precision and clarity are paramount. Therefore, VisRep’s current version deliberately omits automatic image generation to maintain the integrity and relevance of visual content. In the future, hybrid methods may be explored to combine keyword-driven selection with visual LLM enhancements, providing both relevance and flexibility. By fusing visual input with verbal transcription and structured Q&A responses, the long-term vision is to construct a rich, multimodal dataset capturing the full context of the session. This data will be processed by a large language model conditioned in real time and trained on the fly to interpret visual and conceptual elements. The result will be highly customised, format-specific outputs that reflect the spontaneity of ideation while preserving structure and clarity. 
This vision advances VisRep’s goal of supporting freeform, creative thinking with intelligent, structured documentation. A more detailed evaluation of the completed system would also strengthen the evidence for the quality of, and the need for, a system like VisRep.
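The current keyword-driven image selection described in this section can be approximated in a few lines. The sketch below is an assumption-laden illustration rather than the system's code: the catalogue contents, the file paths, and the function name `select_image` are hypothetical, and whole-word, case-insensitive counting is one of several reasonable ways to measure keyword frequency.

```python
import re

# Hypothetical catalogue mapping a keyword to a predefined thematic image.
IMAGE_CATALOGUE = {
    "migration": "images/migration_map.png",
    "climate": "images/climate_trend.png",
    "satellite": "images/satellite_tracking.png",
}


def select_image(texts, catalogue=IMAGE_CATALOGUE):
    """Return the image whose keyword occurs most often across the texts.

    Counts whole-word, case-insensitive matches over the transcript and
    the reflective responses. Ties go to the keyword listed first in the
    catalogue, so the choice stays deterministic. Returns None when the
    catalogue is empty or no keyword occurs at all.
    """
    if not catalogue:
        return None
    combined = " ".join(texts).lower()
    counts = {
        keyword: len(re.findall(rf"\b{re.escape(keyword.lower())}\b", combined))
        for keyword in catalogue
    }
    best = max(counts, key=counts.get)
    return catalogue[best] if counts[best] > 0 else None
```

For example, calling `select_image` on a transcript that mentions migration twice and climate once would pick the migration image. The approach never generates a new visual, which is consistent with the deliberate omission of automatic image generation noted above.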
8. Discussion and Conclusions
One of the most compelling future applications of VisRep is its potential role in education, particularly in fostering creativity and structured thinking in students. Traditional note-taking has declined in modern classrooms, often replaced by passive learning methods or digital slides that do not encourage engagement with the material. Research has shown that active note-taking enhances comprehension and retention, yet many students struggle with structuring their notes effectively. VisRep could be leveraged to transform lecture content into structured, interactive visualisations, helping students engage more deeply with lessons. Automatically generating structured reports or visual summaries of class discussions could be a real-time learning aid, reinforcing key concepts and encouraging students to develop creative interpretations. Instead of merely transcribing information, students could interact with their notes, refining and expanding on them over time, leading to stronger conceptual understanding and long-term knowledge retention.
Beyond individual creativity, VisRep’s ability to store and structure ideas over time suggests its potential as a long-term knowledge management tool. Many great ideas are lost due to a lack of organisation or forgotten over time. By providing a structured repository for ideation, VisRep allows users to return to past ideas, refine them, and develop them further. This is particularly valuable in academic research, product development, and long-term creative projects, where ideas must evolve before reaching their full potential. The ability to track iterations of an idea, visualise its growth, and recall past insights positions VisRep as more than just a documentation tool—it becomes a dynamic system for creative evolution, helping individuals and teams build on previous work rather than starting from scratch. A further expansion of VisRep’s capabilities could involve collaborative intelligence, where users are connected based on similar ideas, research topics, or creative interests. By analysing structured inputs, VisRep could suggest potential collaborators, research groups, or professional networks that align with a user’s area of interest. This could lead to more effective interdisciplinary collaborations, where individuals with complementary skills and shared objectives can be brought together through AI-driven idea-matching mechanisms. The potential for VisRep to act as a networking tool for researchers and creatives introduces a broader vision: a system that not only structures and refines ideas but also facilitates meaningful connections between individuals working on related problems. This research project set out to answer fundamental questions about managing the design process, organising ideas, integrating requirements, structuring reports across different formats, and ensuring transparency in idea refinement. 
Through the development and evaluation of VisRep, we have demonstrated a practical framework for structured creativity, combining AI-driven refinement with user-guided ideation. The system successfully balances structured and unstructured thought, enabling researchers, educators, and professionals to capture, organise, and visually refine their ideas.
VisRep’s modular architecture ensures that structured reporting and visualisation are efficient and adaptable. It accommodates different cognitive styles, providing conceptualists a space for free ideation while supporting structuralists with guided refinement. Integrating talk-aloud sessions, structured questioning, and visual formatting enables a fluid transition from raw ideas to polished outputs, addressing the challenge of documenting and communicating ideas effectively. The evaluation of VisRep has demonstrated its efficiency in structured content generation, its impact on visualisation clarity, and its ability to preserve ideas for long-term use, reinforcing the importance of structured methodologies such as the Double Diamond model in automated reporting. The future of VisRep lies in expanding its functionality, including real-time style customisation, enhanced visualisations, and AI-assisted collaboration networking. By addressing the decline of note-taking in education, enhancing long-term idea storage, and potentially fostering new research collaborations, VisRep positions itself as a powerful tool for structured ideation, creativity, and visualisation. In doing so, it advances the field of automated report generation, offering a new paradigm for capturing, refining, and preserving knowledge in an increasingly digital world.