Author: Yehor Tereshchenko
How Metropolia software students worked with hardware and AI and made it to the top 5 of Finland’s largest defense hackathon
On the battlefield, drones act as the eyes of soldiers, seeing further, and as their brains, thinking faster than human operators can react. FPV (first-person view) flying does not allow for a calm second pass over the scene. There are literally milliseconds to make the best possible decision: whether to keep observing, reacquire a target or commit to a tactic. The cost of misreading the situation is high: a failed mission.

That pressure is why perception alone is not enough. Seeing the target is not the same as understanding its environment under noise, obstruction, and uncertainty. In real-world operations, the system must process sensor data on board and act in a way that is both explainable and safe, without relying on an operator for every frame.

At the Aalto Defense × Junction Hackathon in May 2026, a group of Metropolia students, Yehor Tereshchenko, Nadiia Haidash and Diana Antoniuk, took on exactly that problem: building a real-time perception and reasoning system for autonomous drones using edge compute and sensor data. Rather than creating another demo that only draws YOLO bounding boxes, the goal for the weekend was to develop mission-level behavior, including stable observation, cautious orbit-style tactics and searching when confidence drops.

Our journey

Hackathon time is not calendar time. A task that looks like “two hours” becomes a day; a day becomes a night on the floor. This is how our weekend actually went.

[Image: The team discussing the project]

Friday - “It’s just wiring.”
As with every hardware project, the start was optimistic. The task was to connect the NVIDIA Jetson Orin Nano Super to the drone through a maze of connectors and adapters. In parallel, software development began with the repository structure, the first scripts, and the idea that we would have a camera and something to detect by Saturday night.

[Image: Designing a path on the floor]

Friday night to Saturday, 02:00 and 04:00 - runs to the store.
When the bench is missing the one connector that makes the story possible, there's no time for debate. You go to the store. A mission for parts at 2 a.m., and again at 4 a.m., is a hackathon genre of its own. Everything was kept on the floor and we kept going. Sleep was a rumour.

[Image: First trip to the store]
[Image: The calm walk to the store 2.0]

Saturday morning - the battery that refused to work.
The Jetson would not reliably power up from the battery path we had engineered. A whole day disappeared into rethinking the power tree, soldering, an ammeter, a step-up module, and the occasional smell of something that was not supposed to get that hot. Fingers included. The solution was almost insultingly simple: stop fighting the universe and drop the unnecessary conversion stages. That lesson alone made the Saturday worthwhile.

[Image: Creative chaos]

Saturday day and night - software catches up.
Camera on the Jetson. Detection coming alive: first shaky, then repeatable. Simple object detection became a pipeline; the pipeline became telemetry; telemetry became something we could show without apologizing.

[Image: Object detection with the first versions of the software]

Sunday, 05:00 - leave for the forest.
After working through Saturday night, we left at around 5 a.m. to film a demo video in real outdoor conditions: wind, background noise and a person walking through the frame like a real target. It was cold and early, but absolutely worth it.

[Image: Filming the demo video in real outdoor conditions]

Sunday morning - “this is what we built.”
Forest footage, moving targets, the stack still running: see the scene, update belief, show phase and macro on the dashboard. Real conditions do not care about architecture diagrams.

Sunday, one hour before the deadline.
Of course, the polished submission layer came last. In the final hour, we pulled together the application materials (the story, the screenshots and the pitch deck) because builders build first and explain second.

Sunday - the pitch.
Minutes before going on stage, something broke. Again. You don't cancel; you think carefully, fix what you can and demonstrate what works. We gave a live pitch to everyone at the event, showing the detection, the mission state and the loop updating in real time.

[Image: The team pitching]

It was a really rewarding experience. We were one of the top five teams competing for acceleration. Presenting in front of the entire hackathon audience did not feel like a consolation prize. For a project developed over the weekend, it proved that the idea could withstand the challenges of hardware, deadlines and public presentation. Although we didn't win the overall prize, we received feedback saying that we had exceeded the scope of the challenge. We can't call it a loss - it was a win for us! Hackathons teach you two things at once: ship the demo and learn how big the idea wants to become.

What was actually built

Our weekend build focused on vision-follow and mission tactics on edge hardware, demonstrated in simulation and bench conditions with live telemetry, as a foundation for higher-stakes use cases.

[Image: Happily holding a drone that survived a pitch]

Perception transforms the camera stream into structured evidence. Using an onboard camera, class detection, person-centric filtering and AI-based computer vision, the system identifies and tracks relevant targets over time. These outputs are packaged as observations: Is someone present? Is there motion, occlusion or instability? This layer answers the question, “What does the machine think it sees right now?”

[Image: System detecting]

Reasoning sits on top of this. A decision engine evaluates those observations, then selects a mission intent and a high-level mode, such as holding and observing, following cautiously, or resuming search when belief weakens. This is intentionally above low-level motor commands. The flight-critical loop stays deterministic and fast; optional slower AI narration runs beside it, not inside it.

See. Decide. Commit.
Repeat. The system recognises and interprets pixels as evidence. It decides on a tactic within constraints. It then commits, with the control layer and executor translating intent into safe cues. Then it repeats this process for every frame.

LLM narration runs on a separate slow path: a background thread snapshots telemetry every ~6 seconds and calls Ollama (a ~3B-class model). The prompt is a compressed slice of mission state and loop health. The reply is advisory only; it never blocks the hot path. If Ollama is unreachable, the flight loop and planner keep going; the UI simply shows that the narrative channel failed.

Nano-class models for fast object detection were the spine: low latency on the Jetson, person-first for vision-follow, with room to add classes and heads. Pose and “vital” structure were on the roadmap: pose estimation and finer body structure to move from a box on a human to a model of where the system is looking.

[Image: Object detection]

We designed toward a richer battlespace picture:
Ammunition and gear cues: helmets, armor, ballistic goggles, ear protection; weapon families (rifle, shotgun, pistol, EM weapons, drone counter-UAS nets, and similar categories as training labels).
Obscurants and cover: structured cover, masking nets, smoke/fog/haze - detection under degraded visibility, not only clear-sky demos.
Not just “human”: separating soldiers, civilians, operators, volunteers, press; status bands (alive, lightly injured, KIA-class outcomes) and risk of escalation along a timeline - ethically fraught, technically interesting, and exactly the kind of problem defense AI forces you to confront carefully.

[Image: The system tracking a target in real outdoor conditions]

Under the hood, AeroRozum runs on an NVIDIA Jetson Orin Nano Super devkit, wired to a drone stack and USB cameras. Connecting it to a drone stack and a moving-target scenario was more complicated than the architecture diagrams suggested.
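To make the split between the fast loop and the slow narration path described above more concrete, here is a minimal Python sketch. It is not the AeroRozum code: the `Observation` fields, the mode names, the thresholds, and the `ask_llm` hook (a stand-in for wherever the call to Ollama would go) are all assumptions made for illustration.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    person_present: bool
    confidence: float  # detector confidence in [0, 1]
    occluded: bool


def choose_mode(obs: Observation, belief: float) -> str:
    """Fast, deterministic mode selection; cheap enough to call on every frame."""
    # Thresholds are illustrative assumptions, not the team's tuned values.
    if not obs.person_present or belief < 0.3:
        return "SEARCH"
    if obs.occluded or obs.confidence < 0.6:
        return "HOLD_OBSERVE"
    return "FOLLOW_CAUTIOUS"


class Narrator(threading.Thread):
    """Slow advisory path that runs beside the flight loop, never inside it."""

    def __init__(self, get_snapshot: Callable[[], dict],
                 ask_llm: Callable[[str], str], period_s: float = 6.0):
        super().__init__(daemon=True)
        self._get_snapshot = get_snapshot
        self._ask_llm = ask_llm  # hypothetical hook, e.g. an HTTP call to Ollama
        self._period_s = period_s
        self._halt = threading.Event()
        self.latest_text: Optional[str] = None
        self.failed = False

    def tick(self) -> None:
        """One narration cycle: snapshot telemetry, ask the model, store the reply."""
        snapshot = self._get_snapshot()
        try:
            self.latest_text = self._ask_llm(f"Mission state: {snapshot}")
            self.failed = False
        except Exception:
            # Model unreachable: flag it for the UI and keep the last narrative.
            self.failed = True

    def run(self) -> None:
        # Repeat one cycle per period until stop() is called.
        while not self._halt.is_set():
            self.tick()
            self._halt.wait(self._period_s)

    def stop(self) -> None:
        self._halt.set()
```

In this sketch the per-frame loop would only call `choose_mode` and read `latest_text` and `failed` for display, so a slow or failing model can delay the narrative but not the decision.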
[Image: The hardware stack]

The concept we developed was an on-device, VLA-style vision-language model rather than a cloud-only copilot for drones. The idea is that a fast detection pipeline feeds a structured world state, while a mission planner LLM retains contextual information about the payload, mission intent, surroundings and recent history, ensuring that recommendations remain grounded in what the edge stack can actually perceive. This combination of milliseconds for perception, seconds for narration and planning, and deterministic control in between is the architectural design.

[Image: The team working]

Lessons learnt

We arrived as students, fuelled by previous courses, caffeine and the kind of teamwork that only comes when time is of the essence. We learnt that when we think “this will take an hour”, we should mentally translate that to “maybe a day”, and that this is normal. That power electronics can humble you faster than coding. That debugging for 20 minutes before a demo is not a failure - it's part of the job. That redoing a drone power path while the judges are scheduling the next team is still an achievement if you walk on stage and demonstrate your progress. That AI on the edge is not just one model - it's a combination of detection and belief, macros, slow language and logs, and the art lies in keeping those elements separate and together at the same time.

We are tired, we are proud, and we are not finished yet! From here, the path involves field hardening, a cleaner camera setup on the Jetson, a tighter progression from simulation towards cautious hardware integration, and more time spent outside with the same approach: measure, explain, repeat.

If we want autonomy to feel understandable rather than mystical, then we must treat every frame as a decision. We built AeroRozum to make those decisions visible.

The Team

The contents of this blog reflect the collective effort of Metropolia students Yehor Tereshchenko, Nadiia Haidash and Diana Antoniuk, who participated in the hackathon.
With gratitude to Metropolia for the foundation that let students attempt something this ambitious; thanks to Aalto Defense, Junction, partners, mentors, and every team that shared the floor with us.

[Image: An attempt to set up a servo drop mechanism at 3 a.m.; it was not included in the final setup]

Metropolia students created an AI context-aware NPC during the Supercell Hackathon
Non-player characters (NPCs) in games have typically followed one rule: players act, and the character reacts, within strict, predefined limits. They often guide the player through a story predefined by its creator: the main storyline, a side quest, and so on. Although the details vary, there is often little diversity in the path players actually take. It can be fun the first time, but it becomes boring on replay, and even at the start players have limited ability to go outside the "borders" defined by the developer. That is where AI can help, because it can react to almost anything, and its reaction is unique even in the same circumstances. So why not use it in game development to make the final product and the user experience even more inspiring?

What happens when that reactivity becomes context-aware, unscripted, and emotionally intelligent? That's exactly what a group of Metropolia students set out to explore during the Supercell AI Hackathon, hosted by Junction in Helsinki this May. Their entry, developed in just over 24 hours, landed an impressive 3rd place among nearly 50 international teams and offered a glimpse into the future of interactive AI.

[Image: The team after receiving the "Supercell award"]

The teammates worked on the Myllypuro campus for two days: on Friday, they brainstormed ideas and looked for inspiration, and on Saturday they focused on intensive prototype development. After several hours of productive discussion, they came up with the idea for the following reactive-NPC prototype:

[Image: “Purr-suit of Attention” illustration showing the cat and its witch AI-companion]

A game where the player isn't in control - the AI is

The team's prototype, Purr-suit of Attention, flips the classic power dynamic between player and NPC.
Instead of commanding the world, the player steps into the paws of a curious cat living with an AI-powered fantasy witch - a non-playable character that responds dynamically to everything the cat does. The goal of the game is to use different actions to explain to the AI what the cat wants and to get the NPC to perform a certain action.

So how does the cat-witch (player-to-NPC) interaction go? Meow? She speaks. Knock over a bottle? She sighs, laughs, or reacts with surprise. Jump on a table? She might ignore it, become indignant, or think that the cat is hungry. Meowing, scratching the front door, staring at it for a long time? The NPC will definitely think that the cat wants to go inside, but will she open the door? It depends on her mood and a bunch of other possible factors that cannot be described through "if-else statements".

[Image: The cat meows at a locked door as the witch offers to open it]

The twist? None of these reactions are scripted. They're generated in real time using a large language model, meaning each playthrough is unique and emotionally rich. And the most impressive thing is that the player has no limits or boundaries! The "cat" can do whatever it likes, and the AI will interpret the player's actions by itself.

How does it work?

The system relies on a full-stack integration of modern tools:
1. Unity 6 and C# for gameplay mechanics, API calls and animation control.
[Image: Unity Editor scene view with the cat, witch, and setup]
2. Python (Flask) for backend logic, handling game events and states, and LLM calls.
3. Google Gemini for generating context-aware intent and speech, and for choosing which animation to play.
[Image: Split-screen of the Python backend code alongside the Unity Play window and debug logs]
4. Sesame Voice for natural, spoken responses.

Actions taken by the cat are tagged and flagged as events (meow, jump, scratch, etc.) and states (looking at and near objects).
These are processed by the backend, where the AI evaluates them in context using techniques like high-low temperature, prompt enhancement (system, static and dynamic prompts; a moving "IMPORTANT" flag), reattempts, and random selection according to weighting parameters from the AI response. The result? A witch NPC that doesn't just respond - she feels truly alive (if a little "slow-witted", because responses from the free LLM API take some time 😊).

[Image: Unity editor with the cat script visualization and the backend answer in the logs]

The Metropolia team worked closely together, handling different roles from backend AI logic to design and animation scripting. The final prototype, developed from scratch in just one day, impressed judges and participants with its creativity, interactivity, and potential for further development.

More Than a Game

[Image: The cat sleeps under the task list while the AI-witch teases, “Already sleeping kitty?”]

The significance of this project goes beyond the podium. It reflects a broader shift happening in AI and game design: toward responsive, emotional, and emergent behavior in digital characters. As large language models become more accessible, faster and more controllable, the line between code and personality continues to blur. This isn't just a whimsical experiment; it's a prototype for a world where games don't just entertain - they converse, react, and surprise.

🎮 Watch the gameplay demo
📄 Junction submission
🏆 Award ceremony moment

[Image: Final cozy art: the witch sips a warm drink by the fire as the cat naps]

The contents of this blog reflect the collective effort of Metropolia students (Yehor Tereshchenko, Artur Roos, Unai San Segundo, Kartik Patel) who participated in the hackathon. With gratitude to Metropolia for the opportunity to work from Myllypuro's co-working spaces (especially on Saturday), for the knowledge gained in earlier studies, and for bringing the team members together as first-year students; thanks also to Supercell, Junction, and all participants.
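
A note on the backend loop described under "How it works?": the sketch below shows one way events and states could be turned into a prompt, retried on a malformed reply, and resolved by weighted random selection. It is a guess at the general shape, not the team's Flask code. The JSON reply format, the `ask_llm` callable (a stand-in for the Gemini call), and the reading of "high-low temperature" as "start high, retry low" are all assumptions.

```python
import json
import random
from typing import Callable, List, Optional


def build_prompt(events: List[str], states: List[str]) -> str:
    """Combine a static system prompt with the dynamic game context."""
    system = "You are a witch NPC living with a cat."
    dynamic = f"Recent cat events: {events}. Current states: {states}."
    # Assumed reply format: {"options": [{"action": ..., "weight": ...}, ...]}
    important = "IMPORTANT: reply only with JSON containing an 'options' list."
    return f"{system}\n{dynamic}\n{important}"


def pick_reaction(events: List[str], states: List[str],
                  ask_llm: Callable[[str, float], str],
                  rng: random.Random,
                  max_attempts: int = 3) -> Optional[dict]:
    """Ask the model for weighted options and pick one at random by weight."""
    prompt = build_prompt(events, states)
    temperature = 1.0  # first attempt: high temperature for varied reactions
    for _ in range(max_attempts):
        raw = ask_llm(prompt, temperature)
        try:
            options = json.loads(raw)["options"]
            weights = [float(option["weight"]) for option in options]
            return rng.choices(options, weights=weights, k=1)[0]
        except (ValueError, KeyError, TypeError, IndexError):
            # Malformed reply: reattempt with a low, more predictable temperature.
            temperature = 0.2
    return None  # caller falls back to a default animation or line
```

Passing the random generator in as `rng` keeps the selection reproducible in tests while still giving varied reactions in play.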