Sam Swift-Glasman outlines Megasets’ experience participating in the CAM Scale-Up UK Programme.
Learn more about their speciality, the testing process and the results.
Generative AI and machine learning models are becoming ubiquitous in every system
and every industry in the world. Validating these systems, and the data sets used
to train them, is crucial in safety-critical applications such as autonomous transport.
A growing global trend is the use of synthetic data, both to fill coverage gaps in data sets and to validate models.
Gartner predicts that by 2030 the majority of the data used to train AI will be synthetically generated.
The danger is that these models are being trained on massive data sets
which may not be representative of real-world data. They may be biased, and they may not encapsulate and serve every member of society. This is where synthetic data, data
that is generated algorithmically, procedurally and digitally, becomes incredibly useful.
As Sam explains:
“There might be difficulties with data sets for privacy, for security. Maybe it’s very hard
to find certain objects in the real world. Maybe they’re very rare, but we still need to represent them and we need to make sure that our systems are robust against these.”
Megasets are focused on bringing rigor to synthetic data sets. Just because a data set is synthetic doesn’t mean it shouldn’t go through the same rigorous process that would be the norm for any physical system.
Every part of their platform is unit tested so that different users can interrogate the data sets, and the components of their data sets, ensuring accountability all the way through.
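As a loose illustration only (Megasets’ actual platform and schema are not public, so the record type, labels and image size below are assumptions), a unit test over a data-set component might assert that every annotation carries a known object class and a bounding box that stays inside the image frame:

```python
from dataclasses import dataclass

# Hypothetical annotation record; Megasets' real schema is not public.
@dataclass
class BoxAnnotation:
    label: str   # object class, e.g. "pedestrian"
    x: float     # top-left corner, pixels
    y: float
    w: float     # box width, pixels
    h: float     # box height, pixels

KNOWN_LABELS = {"pedestrian", "cyclist", "vehicle"}
IMG_W, IMG_H = 1920, 1080  # assumed camera resolution

def validate(ann: BoxAnnotation) -> bool:
    """Check one annotation: known label, positive size, fully inside the frame."""
    return (
        ann.label in KNOWN_LABELS
        and ann.w > 0 and ann.h > 0
        and 0 <= ann.x and ann.x + ann.w <= IMG_W
        and 0 <= ann.y and ann.y + ann.h <= IMG_H
    )

# Every record in a generated batch must pass before the set is released.
batch = [
    BoxAnnotation("pedestrian", 100, 200, 50, 120),
    BoxAnnotation("vehicle", 800, 500, 300, 150),
]
assert all(validate(a) for a in batch)
```

Checks like this are cheap to run on every generated batch, which is what makes the accountability described above auditable rather than aspirational.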
Their mission is to make these tools accessible and accountable, helping to solve a myriad of problems in the space. Providing users with reliable data fosters robust problem-solving, enabling thorough root-cause analysis throughout the data set.
To synthesize data analogous to a real-world vehicle platform,
with all the sensors you would typically find on a real system, Megasets have developed synthetic cameras, synthetic lidar and synthetic radar, each of which annotates objects in digital environments. Through simulation and scenario “fuzzing,” they efficiently generate vast data sets.
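As a sketch under assumed parameter names (nothing here comes from Megasets’ platform), scenario fuzzing can be pictured as sampling many jittered variants of a base scenario, each of which would then be rendered through the synthetic sensors:

```python
import random

# Illustrative only: a scenario is a dict of parameters, and "fuzzing"
# samples many perturbed variants of a base scenario to widen coverage.
BASE_SCENARIO = {
    "time_of_day_h": 12.0,    # hours since midnight
    "pedestrian_count": 4,
    "rain_intensity": 0.0,    # 0 = dry, 1 = heavy rain
    "sun_elevation_deg": 45.0,
}

def fuzz(base: dict, n: int, seed: int = 0) -> list[dict]:
    """Generate n scenario variants by jittering each numeric parameter."""
    rng = random.Random(seed)  # seeded, so a data set is reproducible
    variants = []
    for _ in range(n):
        v = dict(base)
        v["time_of_day_h"] = (base["time_of_day_h"] + rng.uniform(-6, 6)) % 24
        v["pedestrian_count"] = max(0, base["pedestrian_count"] + rng.randint(-3, 3))
        v["rain_intensity"] = min(1.0, max(0.0, base["rain_intensity"] + rng.uniform(0, 0.8)))
        v["sun_elevation_deg"] = max(0.0, base["sun_elevation_deg"] + rng.uniform(-30, 30))
        variants.append(v)
    return variants

variants = fuzz(BASE_SCENARIO, n=1000)
```

Even this toy version shows why the approach scales: one hand-authored base scenario fans out into thousands of labelled variants at negligible cost.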
This process yields highly accurate digital twin models, which can be used to train the models in the first place, to test those models within the development cycle and, at the end, to validate them, giving confidence before proceeding to full-scale physical testing at the end of any application. Megasets’ approach exemplifies a commitment to precision and reliability in data synthesis and application.
Sam adds:
“In that sense, we are reconstructing variants of a specific scenario, and studying how different events trigger different scenarios within that scene. So that could be public transportation timetables, school opening times, rush hour, any of those and how we could accurately synthesize and generate reconstructions and simulations of those events.”
For Megasets’ test program there were two pillars.
The first was working with Smart Mobility Living Lab: London, taking the SMLL vehicle, with its complete sensor platform and live camera feeds, around London to capture all the complexity that London’s vulnerable road users (VRUs) and pedestrians entail. This provided an invaluable opportunity to delve deeply into the intricacies of synthetic radar, which is very complex to characterize, yielding a wealth of rich data for analysis.
Secondly, working with HORIBA MIRA, they were able to undergo a very rigorous program of physical testing using the SMLL vehicle.
This phase involved capturing diverse pedestrians with varying skin tones, mobility devices, and assorted objects commonly encountered in street scenes. The data collected served to characterize sensor responses across these scenarios, forming the basis for evaluation against synthetic reconstructions in the project’s next phase.
Megasets were keen to ensure that every data point they captured reflected an emphasis on diversity, inclusivity and accessibility, so that the data sets they generate for real-world scenarios do not exclude large parts of the population.
“For us, we really wanted to be working with the same engineers who are used to testing automotive applications because, at the end of the day, we’re making automotive data sets. So this is really invaluable for us.”
Sam shares:
“Now is the time for this because there are a lot of people who are at the coalface, so to speak, making decisions about what goes into a data set that maybe don’t take into account all of the stakeholders in society.
“We all have unconscious biases, so developers need to rely on a set of rigorous tools that can help make sure, from the beginning of development, that the models they are deploying are robust and that they serve all of the users who will rely on them for day-to-day living.”
The CAM Scale-Up UK Programme has provided accelerated development for Megasets, in terms of internal KPIs and developing metrics with the Testbed engineers. This has given them the confidence that their data sets can be used to bring autonomous systems onto the streets and help bring safe transport to everybody.