AI IS the future of ad filtering and the way to propel forward is to stay ahead of the curve
Acknowledgments: This article was made possible by contributions and collaboration from the entire Machine Learning team at eyeo.
Last year marked a milestone with the launch of Project Moonshot, a bold journey to wield AI’s power for ad filtering. We accomplished our one huge, crazy goal – a moonshot – to create a minimum viable product that implements machine learning to automate ad detection.
Machine learning is not just changing the world; it's shaping the future we've always imagined. I'm proud that eyeo is at the forefront of revolutionizing ad filtering by harnessing the power of AI to create a fair internet for everyone. -Dr. Humera Noor Minhas
This AI-enabled automation not only increases the efficacy of our solution but also makes it more robust, sustainable and scalable. But our journey to the moon isn’t over with a single leap; in fact, this is only the beginning. That success showed us how important it is to keep innovating, transforming and, above all, remaining competitive in today’s evolving ad tech landscape. Artificial intelligence is the future of ad filtering, but it’s important to stay ahead of the curve. This blog post reflects on our journey thus far, sharing the lessons we learned and a glimpse of the road ahead.
Lessons from the frontlines
The digital advertising landscape is dynamic, driven by technological advancements, changing consumer behaviors and shifts in regulation. The speed at which machine learning is evolving in the ad tech landscape is a reflection of the industry's need to stay ahead to remain competitive.
For instance, we’re aware of the recent advances in Generative AI and the potential it holds. There’s been early work on using GPT-based systems for ad filtering; however, the results are mixed and there is a long way to go. At eyeo, we have been collaborating with TU Munich to use GPT-based systems for generating filter lists, which lie at the core of present-day ad-filtering solutions. This has the potential to disrupt the industry and change its ways of working.
We were under the impression that training machine learning models would be the hardest part of the job. We couldn’t have been more wrong: the lion’s share of our time was spent delivering our models to end users. This was further complicated by the recently introduced Manifest V3 (MV3) restrictions. Complying with these changes entailed an overhaul of our delivery, preprocessing and inference mechanisms.
We kept the model binary as lean as possible because extension store policies limit the size of individual files that can be downloaded on the fly. To optimize the module size, we made several adjustments:
Model weight quantization, which almost halves the model size.
Limiting the number of layers in our Graph Convolutional Neural Network and using sparse architectures.
Importing minimal third-party dependencies and implementing customized preprocessing steps to make things more efficient.
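As a rough illustration of the first adjustment, here is a minimal sketch of weight quantization. The article does not say which scheme was used; this sketch assumes weights stored as 32-bit floats are re-encoded as 16-bit floats, which is one scheme consistent with roughly halving the weight payload. The function names and sample weights are hypothetical.

```python
import struct

def pack_float32(weights):
    """Serialize weights as little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)

def pack_float16(weights):
    """Serialize weights as little-endian 16-bit floats (2 bytes each)."""
    return struct.pack(f"<{len(weights)}e", *weights)

def unpack_float16(blob):
    """Decode a float16 payload back into Python floats."""
    count = len(blob) // 2
    return list(struct.unpack(f"<{count}e", blob))

# Hypothetical weights; the first four are exactly representable in float16,
# the last one (0.1) is not and picks up a small rounding error.
weights = [0.5, -1.25, 0.03125, 2.0, 0.1]

full = pack_float32(weights)
half = pack_float16(weights)
restored = unpack_float16(half)

# Largest absolute error introduced by the lower precision.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off is precision for size: the payload is half as large, and each weight carries a small rounding error that has to be checked against model accuracy before shipping.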
To ensure a seamless user experience, our prediction latency SLAs had to be on the order of milliseconds. To achieve this, we leaned on WebGL, which provides an interface to the system’s graphics processor, and limited predictions to the specific parts of the webpage that are likely to contain ads.
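The idea of restricting predictions to likely ad regions can be sketched as a cheap pre-filter that runs before the model. Everything below is an assumption made for illustration: the article does not describe how candidate regions are chosen, and the tag set, size threshold and `Element` type are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Element:
    tag: str
    width: int
    height: int

# Hypothetical heuristics, not taken from the article.
CANDIDATE_TAGS = {"iframe", "img", "ins", "div"}
MIN_AREA = 50 * 50  # skip icons and tracking pixels

def select_candidates(elements):
    """Keep only elements worth sending to the (more expensive) model."""
    return [
        e for e in elements
        if e.tag in CANDIDATE_TAGS and e.width * e.height >= MIN_AREA
    ]

page = [
    Element("p", 600, 80),
    Element("iframe", 300, 250),
    Element("img", 16, 16),
    Element("div", 728, 90),
]
candidates = select_candidates(page)
```

Only the elements in `candidates` would go through inference, so latency scales with the number of plausible ad slots rather than with the size of the page.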
Features and model architecture
From a model choice and architecture perspective, we found Graph Convolutional Neural Networks to outperform computer vision and NLP-based models. They proved more robust, worked directly on the HTML of the webpage and used a set of language-agnostic features. As a classification strategy, graph classification outperformed node classification.
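To make the distinction concrete, here is a minimal sketch of graph classification with a single graph convolution layer. It assumes the widely used propagation rule ReLU(Â H W), with Â the symmetrically normalized adjacency matrix including self-loops, followed by a mean readout over nodes. This is not the architecture described in the article; the toy graph, node features and weights are all hypothetical.

```python
import math

def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]

def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

def graph_probability(edges, features, w_hidden, w_out, bias=0.0):
    """Score the whole graph: convolve, mean-pool the nodes, apply a sigmoid."""
    a_hat = normalized_adjacency(edges, len(features))
    hidden = gcn_layer(a_hat, features, w_hidden)
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]  # mean readout
    logit = sum(p * w for p, w in zip(pooled, w_out)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy DOM subtree: node 0 is the parent of nodes 1 and 2.
edges = [(0, 1), (0, 2)]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hypothetical node features
w_hidden = [[0.5, -0.2], [0.3, 0.8]]             # hypothetical weights
w_out = [1.0, -1.0]

probability = graph_probability(edges, features, w_hidden, w_out)
```

Node classification would instead apply the output weights to each row of `hidden` and emit one label per element. The mean readout collapses the subtree into a single vector, so the model makes one decision per candidate region, and that decision does not depend on the order in which nodes are listed.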
Through extensive feature analysis, we identified the subset of features that are most important for detecting ads. This further reduced our model size, improved prediction latency and made the model more robust to HTML layout modifications. Our experiments bear out the oft-mentioned adage: “A machine learning model is as good as its features”.
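One common way to rank features is permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The article does not say which analysis was used, so the sketch below is only an example of the general approach; the toy model, data and function names are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)

def permutation_importance(model, rows, labels, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        permuted = [row[:col] + [value] + row[col + 1:]
                    for row, value in zip(rows, shuffled)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances

# Toy data: the label depends only on feature 0; feature 1 is irrelevant.
rows = [[float(i % 2), (i * 7 % 5) / 5.0] for i in range(40)]
labels = [i % 2 for i in range(40)]

def toy_model(row):
    return 1 if row[0] > 0.5 else 0

importances = permutation_importance(toy_model, rows, labels)
# Keep only the features whose shuffling actually hurts accuracy.
keep = [col for col, score in enumerate(importances) if score > 0.0]
```

Features with an importance near zero are candidates for removal, which shrinks the input and, as the paragraph above notes, can also shrink the model.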
Some domains actively try to circumvent ad blockers, and this is where we needed to be more vigilant and proactive with our monitoring solutions. We realized we needed a process and protocol for making data-driven decisions on when to update and roll back our models. It is imperative for any production-level AI solution to have data drift detection systems in place, which enable us to gauge model performance in the wild.
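A drift check can be as simple as comparing the distribution of a feature or model score in production against a reference window. The sketch below uses the Population Stability Index (PSI) as one example statistic; the article does not name a method, and the bin count, score range, sample data and the 0.25 alert threshold (a commonly quoted rule of thumb) are assumptions.

```python
import math

def bin_fractions(values, lo, hi, bins):
    """Fraction of values in each equal-width bin; outliers are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]

def population_stability_index(reference, current,
                               lo=0.0, hi=1.0, bins=10, eps=1e-6):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    ref = bin_fractions(reference, lo, hi, bins)
    cur = bin_fractions(current, lo, hi, bins)
    psi = 0.0
    for r, c in zip(ref, cur):
        r, c = max(r, eps), max(c, eps)  # avoid log(0) on empty bins
        psi += (c - r) * math.log(c / r)
    return psi

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb alert level

# Hypothetical model scores: a reference window and a shifted production window.
reference_scores = [i / 100 for i in range(100)]
production_scores = [0.5 + i / 200 for i in range(100)]

psi = population_stability_index(reference_scores, production_scores)
drift_detected = psi > DRIFT_THRESHOLD
```

A check like this could feed the update and rollback protocol described above: a sustained PSI above the threshold would be a signal to investigate, retrain or roll back.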
Unveiling the potential for the future
Looking ahead, we are working on scaling our solutions to the millions of domains out there and providing our end users with the best-in-class real-time ad-filtering experience.
Towards this goal, we are taking the following concrete steps:
Developing the next generation of foundational models for ad-filtering.
Automating the benchmarking of our models and building statistical frameworks to detect concept and data drift.
Finally, as a company focused on protecting user privacy, we are always looking for ways to ensure that our solutions don’t infringe on users’ personal space. To that end, we intend to integrate our privacy-preserving AI algorithms into our anti-tracking extension.
Project Moonshot has been a remarkable journey so far and we’ve learned that continuous innovation, transformation and remaining competitive are the key ingredients to staying at the forefront of the evolving industry. We’re determined to stay ahead of the curve to ensure we continue transforming the internet into a trusted, sustainable and accessible place for all stakeholders.
To find out more about our AI initiatives, join us at the Ad-Filtering Dev Summit in Amsterdam (or online), which will be held on 4-5 October 2023. Dr. Humera Noor Minhas and Parinitha Hirehal’s session “The Now versus The Future: The Impact of AI on Ad Filtering” will highlight the major role AI plays in the future of ad filtering and its potential to disrupt the future course of the industry.