On the battlefield, drones act as the eyes of soldiers, seeing further, and as their brains, thinking faster than human operators can react. FPV (first-person view) flying does not allow for a calm second pass of the scene. There are literally milliseconds in which to make the best possible decision: whether to keep observing, reacquire a target or commit to a tactic. The cost of misreading the situation is high – a failed mission.
That pressure is why perception alone is not enough. Seeing the target is not the same as understanding its environment under noise, obstruction, and uncertainty. In real-world operations, the system must process sensor data on board and act in a way that is both explainable and safe, without relying on an operator every frame.
At the Aalto Defense × Junction Hackathon in May 2026, a group of Metropolia students, Yehor Tereshchenko, Nadiia Haidash and Diana Antoniuk, took on exactly that problem: building a real-time perception and reasoning system for autonomous drones using edge compute and sensor data. Rather than creating another demo that only draws YOLO bounding boxes, the goal for the weekend was to develop mission-level behavior, including stable observation, cautious orbit-style tactics and searching when confidence drops.
Our journey
Hackathon time is not calendar time. A task that looks like “two hours” becomes a day; a day becomes a night on the floor. This is how our weekend actually went.
Friday – “It’s just wiring.”
As with every hardware project, the start was optimistic. The task was to connect the NVIDIA Jetson Orin Nano Super to the drone through a maze of connectors and adapters. In parallel, the software development began with the repository structure, the first scripts, and the idea that we would have a camera and a working detector by Saturday night.
Friday night to Saturday, 02:00 and 04:00 – runs to the store.
When the bench is missing that one connector that makes the story possible, there’s no time for debate. You go to the store. Heading out at 2 a.m. and again at 4 a.m. on a mission for parts is a hackathon genre of its own. Everything was kept on the floor and we kept going. Sleep was a rumour.
Saturday morning – the battery that refused to work.
The Jetson would not reliably power up from the battery path we had engineered. A whole day disappeared into rethinking the power tree, soldering, an ammeter, a step-up module, and the occasional smell of something that was not supposed to get that hot. Fingers included. The solution was almost insultingly simple: stop fighting the universe with unnecessary conversion stages. That lesson alone made the Saturday worthwhile.
Saturday day and night – software catches up.
Camera on Jetson. Detection coming alive – first shaky, then repeatable. Simple object detection became a pipeline; the pipeline became telemetry; telemetry became something we could show without apologizing.
Sunday, 05:00 – leave for the forest.
After working through Saturday night, we left at around 5 a.m. to film a demo video in real outdoor conditions – wind, background noise and a person walking through the frame like a real target. It was cold and early, but absolutely worth it.
Sunday morning – “this is what we built.”
Forest footage, moving targets, the stack still running: see the scene, update belief, show phase and macro on the dashboard. Real conditions do not care about architecture diagrams.
Sunday, one hour before the deadline.
Of course, the polished submission layer came last. In the final hour, we pulled together the application materials – the story, the screenshots and the pitch deck – because builders build first and explain second.
Sunday – the pitch.
Minutes before going on stage, something broke. Again. You don’t cancel; you think carefully, fix what you can and demonstrate what works. We gave a live pitch to everyone at the event, showing the detection, mission state and the loop updating in real time.
It was a really rewarding experience. We were one of the top five teams competing for acceleration. Presenting in front of the entire hackathon audience did not feel like a consolation prize. For a project developed over the weekend, it proved that the idea could withstand the challenges of hardware, deadlines and public presentation.
Although we didn’t win the overall prize, we received feedback saying that we had exceeded the scope of the challenge. We can’t call it a loss – it was a win for us!
Hackathons teach you two things at once: ship the demo and learn how big the idea wants to become.
What was actually built
Our weekend build focused on vision-follow and mission tactics on edge hardware, demonstrated in simulation and bench conditions with live telemetry, as a foundation for higher-stakes use cases.
Perception transforms the camera stream into structured evidence. Using an onboard camera, class detection, person-centric filtering and AI-based computer vision, the system identifies and tracks relevant targets over time. These outputs are packaged as observations: Is someone present? Is there motion, occlusion or instability? This layer answers the question, “What does the machine think it sees right now?”
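To make the idea of "observations" concrete, here is a minimal sketch of how raw detections could be folded into one structured record per frame. It is an illustration, not the AeroRozum code: the names `Detection`, `Observation` and `summarize`, and the thresholds `min_confidence` and `jitter_px`, are all assumptions, and occlusion handling is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical shape)."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Observation:
    """Structured evidence for one frame (hypothetical shape)."""
    person_present: bool
    confidence: float
    motion_px: float   # centre displacement since the previous frame
    unstable: bool     # True when the target jumped more than jitter_px


def box_center(box: Tuple[float, float, float, float]) -> Point:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def summarize(
    detections: List[Detection],
    previous_center: Optional[Point] = None,
    min_confidence: float = 0.4,   # assumed threshold
    jitter_px: float = 80.0,       # assumed threshold
) -> Tuple[Observation, Optional[Point]]:
    """Person-first filtering: keep the most confident person, ignore the rest."""
    people = [
        d for d in detections
        if d.label == "person" and d.confidence >= min_confidence
    ]
    if not people:
        return Observation(False, 0.0, 0.0, False), None

    best = max(people, key=lambda d: d.confidence)
    center = box_center(best.box)

    motion = 0.0
    if previous_center is not None:
        dx = center[0] - previous_center[0]
        dy = center[1] - previous_center[1]
        motion = (dx * dx + dy * dy) ** 0.5

    return Observation(True, best.confidence, motion, motion > jitter_px), center
```

The caller would pass the returned centre back in as `previous_center` on the next frame, so each observation carries a little history without the perception layer having to know anything about tactics.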
Reasoning sits on top of this. A decision engine evaluates those observations, then selects a mission intent and high-level modes, such as holding and observing, following cautiously, or resuming search when belief weakens. This is intentionally above low-level motor commands. The flight-critical loop stays deterministic and fast; optional slower AI narration runs beside it, not inside it.
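A decision engine of this kind can be sketched as a small deterministic state selector. The version below is only an assumption about how it might look: belief is modelled as an exponentially smoothed detection confidence, and the names `Mode` and `DecisionEngine` and the values `alpha`, `follow_above` and `search_below` are hypothetical, not taken from the project.

```python
from enum import Enum


class Mode(Enum):
    HOLD_OBSERVE = "hold_observe"
    FOLLOW_CAUTIOUS = "follow_cautious"
    SEARCH = "search"


class DecisionEngine:
    """Maps per-frame evidence to a high-level mode; no motor commands here."""

    def __init__(self, alpha: float = 0.3,
                 follow_above: float = 0.6,
                 search_below: float = 0.25) -> None:
        self.alpha = alpha                # smoothing factor (assumed)
        self.follow_above = follow_above  # belief needed to follow (assumed)
        self.search_below = search_below  # belief below which we search (assumed)
        self.belief = 0.0
        self.mode = Mode.SEARCH

    def update(self, person_present: bool, confidence: float) -> Mode:
        # A missed detection counts as zero evidence, so belief decays
        # gradually instead of collapsing on a single bad frame.
        evidence = confidence if person_present else 0.0
        self.belief = (1.0 - self.alpha) * self.belief + self.alpha * evidence

        if self.belief >= self.follow_above:
            self.mode = Mode.FOLLOW_CAUTIOUS
        elif self.belief < self.search_below:
            self.mode = Mode.SEARCH
        else:
            self.mode = Mode.HOLD_OBSERVE
        return self.mode
```

Because the update is a few arithmetic operations and comparisons, it runs in constant time per frame and gives the same output for the same input sequence, which is the property the flight-critical loop needs.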
See. Decide. Commit. Repeat.
The system recognises and interprets pixels as evidence. It decides on a tactic within constraints. It then commits, with the control and executor translating intent into safe cues. Then it repeats this process for every frame.
LLM narration runs on a separate slow path: a background thread snapshots telemetry every ~6 seconds and calls Ollama (a ~3B-class model). The prompt is a compressed slice of mission state and loop health. The reply is advisory only; it never blocks the hot path. If Ollama is unreachable, the flight loop and planner keep going; the UI simply shows that the narrative channel failed.
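The slow path can be sketched with a plain background thread. This is a minimal illustration under stated assumptions, not the project's implementation: `SlowNarrator`, `get_snapshot` and `narrate` are hypothetical names, and `narrate` stands in for whatever function actually sends the prompt to Ollama, so the sketch makes no claim about that client's API.

```python
import threading
from typing import Any, Callable, Dict, Optional


class SlowNarrator:
    """Periodically narrates telemetry off the hot path; failures are recorded, not raised."""

    def __init__(self,
                 get_snapshot: Callable[[], Dict[str, Any]],
                 narrate: Callable[[Dict[str, Any]], str],
                 period_s: float = 6.0) -> None:
        self._get_snapshot = get_snapshot  # cheap copy of mission state
        self._narrate = narrate            # slow call, e.g. a local LLM (assumed)
        self._period_s = period_s
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._latest: Optional[str] = None
        self._failed = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join(timeout=2.0)

    def status(self) -> Dict[str, Any]:
        """What the UI reads: last narration text and whether the channel failed."""
        with self._lock:
            return {"text": self._latest, "failed": self._failed}

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                text = self._narrate(self._get_snapshot())
                with self._lock:
                    self._latest, self._failed = text, False
            except Exception:
                # Model unreachable or slow: flag it and carry on.
                with self._lock:
                    self._failed = True
            # Event.wait doubles as an interruptible sleep.
            self._stop.wait(self._period_s)
```

The flight loop never calls into this class except to read `status()`, which only takes a short lock, so a hung or missing model cannot stall perception or planning.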
Nano-class models for fast object detection were the spine: low latency on Jetson, person-first for vision-follow, with room to add classes and heads. Pose and “vital” structure were on the roadmap – pose estimation and finer body structure to move from a box on a human to a model of where the system is looking.
We designed toward a richer battlespace picture:
- Ammunition and gear cues: helmets, armor, ballistic goggles, ear protection; weapon families (rifle, shotgun, pistol, EM weapons, drone counter-UAS nets, and similar categories as training labels).
- Obscurants and cover: structured cover, masking nets, smoke/fog/haze – detection under degraded visibility, not only clear sky demos.
- Not just “human”: separating soldiers, civilians, operators, volunteers, press; status bands (alive, lightly injured, KIA-class outcomes) and risk of escalation along a timeline – ethically fraught, technically interesting, and exactly the kind of problem defense AI forces you to confront carefully.
Under the hood, AeroRozum runs on an NVIDIA Jetson Orin Nano Super devkit, wired to a drone stack and USB cameras. Getting that setup to work in a moving-target scenario was more complicated than the architecture diagrams suggested.
We also developed the concept of an on-device, VLA-style vision-language model, rather than a cloud-only Copilot for drones. The idea is that a fast detection pipeline feeds structured world state, while a mission planner LLM retains contextual information about the payload, mission intent, surroundings and recent history, so that recommendations remain grounded in what the edge stack can actually perceive.
This combination – milliseconds for perception, seconds for narration/planning and deterministic control in between – is the architectural design.
Lessons learnt
We arrived as students, fuelled by previous courses, caffeine and the kind of teamwork that only comes when time is of the essence.
- That when we think “this will take an hour”, we should mentally translate it to “maybe a day”, and that this is normal.
- That power electronics can humble you faster than coding.
- That debugging for 20 minutes before a demo is not a failure – it’s part of the job. Redoing a drone power path while the judges are scheduling the next team is still an achievement if you walk on stage and demonstrate your progress.
- That AI on the edge is not just one model – it’s a combination of detection and belief, macros, slow language and logs, and the art lies in keeping those elements separate and together at the same time.
We are tired, we are proud, and we are not finished yet!
From here, the path involves field hardening, cleaner camera setup on Jetson, tighter progression from simulation towards cautious hardware integration, and spending more time outside with the same approach: measure, explain, repeat.
If we want autonomy to feel understandable rather than mystical, then we must treat every frame as a decision. We built AeroRozum to make those decisions visible.
The Team
The contents of this blog reflect the collective effort of Metropolia students Yehor Tereshchenko, Nadiia Haidash and Diana Antoniuk, who participated in the hackathon. With gratitude to Metropolia for the foundation that let students attempt something this ambitious; thanks to Aalto Defense, Junction, partners, mentors, and every team that shared the floor with us.