Artificial intelligence is often criticized with claims that it can only repeat its training data, and therefore always produces plagiarized and average output. Is there any truth to these claims?
Claim 1: AI retrieves its answers from a database
I’ve encountered this claim often. The idea is that AI retrieves answers from its database, and thus it plagiarizes or fails to find the correct answer. Large language models and image-generating AI models do not, by default, have access to any kind of database. Instead, these models have learned to generate responses independently. The image or poem produced by AI, for example, does not exist as-is in any database.
Today, large language models can indeed be connected to a database. Currently, the most common technique for doing this is retrieval-augmented generation (RAG). In this setup, the model retrieves information from a database to support its answer, but it still writes the response itself.
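To make the division of labor concrete, here is a minimal sketch of the RAG pattern: retrieve relevant text first, then hand it to the model as context. The documents, the naive word-overlap scoring, and the prompt template are illustrative stand-ins, not any particular system's implementation.

```python
# Minimal RAG sketch: retrieval supplies context; the model still
# writes the answer. Documents and scoring are toy placeholders.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble the prompt the language model would actually receive."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RAG retrieves documents to support generation.",
    "Diffusion models denoise images step by step.",
]
print(build_prompt("How does RAG support generation?", docs))
```

The key point the sketch illustrates: retrieval only changes what the model reads before answering, not who writes the answer.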
Claim 2: AI only produces average answers
This claim is more complex, as there are many types of generative AI models. Images are often produced using diffusion models, which begin with a random mess of pixels and gradually transform that noise into a better image. The AI aims to reach some sort of average optimal output, so its tendency is toward the mean.
Diffusion models run iteratively – each iteration produces a better image, one that is also closer to the average. Somewhere between the initial noise and the fully converged result lies an iteration where the model produces good images that haven't yet collapsed into uniform, average-looking ones. These images are by no means simply average, even if they inevitably share something with the optimal average.
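The iterative refinement described above can be caricatured with a toy loop: each step pulls a noisy "image" (here just a list of numbers) a fraction of the way toward a target. Early iterations are noisy, late ones converge, and the interesting outputs sit in between. The target values, step count, and update rule are made up for illustration; a real diffusion model learns its denoising steps rather than interpolating toward a known answer.

```python
import random

def denoise_steps(target, steps=10, rate=0.5, seed=0):
    """Start from pure noise and move a fraction (rate) of the
    remaining distance toward the target on every iteration.
    Returns the full history so intermediate steps can be inspected."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]   # random mess to begin with
    history = [x[:]]
    for _ in range(steps):
        x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
        history.append(x[:])
    return history

history = denoise_steps(target=[1.0, -1.0, 0.5])
# each successive entry in history is closer to the target than the last
```

Stopping the loop a few steps early is the rough analogue of the "good but not yet averaged" images the text describes.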
What about large language models? They also aim to produce the best possible answer, which often results in an average-like response depending on the prompt. However, large language models expose a sampling parameter called temperature, which controls how predictable or creative the responses are. At the extremes, adjusting the temperature can make the model generate either extremely bland text or pure nonsensical gibberish.
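The effect of temperature can be shown with the standard softmax formula used when sampling the next token. The logit values below are made-up example scores; the mechanism is the general one.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to a probability distribution.
    Low temperature sharpens it (predictable, bland picks);
    high temperature flattens it toward uniform (more random,
    eventually gibberish)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                 # hypothetical token scores
cold = softmax_with_temperature(logits, 0.1)   # near-deterministic
hot = softmax_with_temperature(logits, 10.0)   # near-uniform
```

At temperature 0.1 almost all probability mass lands on the top token, which is why low-temperature output reads as safe and average; at temperature 10 the three options become nearly interchangeable.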
Emergent intelligence
The intelligence of large language models is emergent. They can generalize from what they've learned to completely new tasks. This simply means that AI models can generate responses to questions they've never encountered in their training data. These responses are not merely average repetitions of what's already been learned; the AI is not simply mimicking its data like a parrot.
Image-generating AI models do not show the same level of emergent intelligence, as their training data influences their output more heavily than with text models. It can often be nearly impossible to get certain kinds of images from them.
Average or not?
The claim that AI only produces average responses oversimplifies things. Training data influences AI more or less depending on the model, but that doesn’t mean AI is only capable of producing dull, obvious answers. AI also doesn’t just repeat what it has learned, since it’s trained to provide responses to problems it has never encountered before.