Cable / Telecom News

ANALYSIS: We need trust, transparency, as AI grows


Our information ecosystem is in trouble. Here’s how we can fix it. Part four of four

By Tessa Sproule

WE AS HUMANS have to learn how to communicate with AI. We have to learn what it’s good at helping us with, and what it might mess up if we’re not watching it.

The role of the modern digital citizen of a democracy has become similar to that of the old-school editor: knowing where a piece of information came from, assessing its credibility and potential biases, and framing it within the context of the rest of the information of the day.

There is simply no substitute for human judgement. Algorithms that make decisions need to be audited to uncover biases, both unintentional and overt, and to determine how our AI systems can be adjusted to limit the impact of any biases found.

The role of the modern provider of news and information media is to know how AI is being used to distribute your content. It is our absolute responsibility to know exactly what instructions our machine learning systems are basing their predictions on. We must also know which training data set our AI is learning from; where that training set may be thin; where it may lack diversity in its examples; and how it can be improved upon to deliver the right content recommendation to the person who needs it, when they need it.

There is simply too much at risk, and tremendous opportunity missed, if we don’t.

What we need to do next

Beware of the hype. Today’s AI is not super-competent and all-knowing. Everything AI knows is what we’ve told it. And right now, the media industry has fallen behind in helping AI help us when it comes to news and information content.

Instead of throwing up our hands, we have a narrow opportunity at this very moment to bolster the world’s information ecosystem, putting curious, thoughtful human thinkers back at the centre.

BigTech marketers would like us to believe that AI systems are neutral, highly intelligent and sophisticated. But we simply aren’t there yet. The tech world gets excited about things like “big data” and “data as the new oil”. Data is indeed a resource, but to my mind, we need to be thinking about it as a public resource.

Data signals can be used for good and for bad (intentionally and not). The same data training set could be used by medical researchers to uncover better diagnostic markers for a form of breast cancer, or used by an insurance provider to identify those customers more likely to develop that breast cancer. One machine learning system could use a training set to hire the best candidate for the job; another could unintentionally ignore female applicants because of a biased weighting in its machine learning logic.
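To make the weighting point concrete, here is a minimal, hypothetical sketch (all candidate data, feature names and weights are invented for illustration) showing how a single biased weight on a proxy feature can flip which candidate a scoring system ranks first:

```python
# Hypothetical illustration: the same candidate features scored by two
# linear models. The "biased" model assigns a negative weight to a proxy
# feature (a career gap, often correlated with gender), which is enough
# to flip the ranking.

def score(features, weights):
    """Simple linear score: sum of feature values times their weights."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())

candidates = {
    "candidate_a": {"experience": 8, "test_score": 0.9, "career_gap": 1},
    "candidate_b": {"experience": 7, "test_score": 0.8, "career_gap": 0},
}

# The fair model ignores the proxy feature entirely.
fair_weights = {"experience": 1.0, "test_score": 5.0}
# The biased model penalizes the proxy feature.
biased_weights = {"experience": 1.0, "test_score": 5.0, "career_gap": -3.0}

fair_ranking = sorted(
    candidates, key=lambda c: score(candidates[c], fair_weights), reverse=True)
biased_ranking = sorted(
    candidates, key=lambda c: score(candidates[c], biased_weights), reverse=True)

print(fair_ranking)    # candidate_a first: stronger experience and test score
print(biased_ranking)  # candidate_b first: the career-gap penalty flips the order
```

The numbers are arbitrary; the point is that the bias lives in one weight, which is exactly the kind of thing an audit of the model's logic can surface.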


We need to start with some grunt work: data-tagging. Creating structured data for information video content is not glamorous, but it is the building block of effective machine learning. Since 2014, my company, Vubble, has been doing the critical work of data-tagging news video from the world’s leading news organizations (including CTV News in Canada and Channel 4 News in the UK).
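As a rough sketch of what "structured data" around a news video might look like, here is a hypothetical annotation record with a journalist-review flag. The field names and vocabulary are illustrative assumptions, not Vubble's actual taxonomy:

```python
# Hypothetical sketch of a structured annotation record for a news video.
# Field names are invented for illustration; they are not Vubble's taxonomy.

from dataclasses import dataclass, field

@dataclass
class VideoAnnotation:
    video_id: str
    title: str
    publisher: str
    language: str                                  # e.g. "en" or "fr"
    topics: list = field(default_factory=list)     # controlled-vocabulary tags
    entities: list = field(default_factory=list)   # people, places, organizations
    reviewed_by_journalist: bool = False           # the human-in-the-loop step

def is_training_ready(record: VideoAnnotation) -> bool:
    """Treat a record as usable for training only once a journalist has
    reviewed it and it carries at least one topic tag."""
    return record.reviewed_by_journalist and len(record.topics) > 0

record = VideoAnnotation(
    video_id="vid-001",
    title="Parliament debates broadcasting bill",
    publisher="CTV News",
    language="en",
    topics=["politics", "media-regulation"],
    entities=["Parliament of Canada"],
    reviewed_by_journalist=True,
)

print(is_training_ready(record))  # True
```

The unglamorous part is filling in those topic and entity fields consistently, at scale, across years of video; that consistency is what makes the records usable as machine learning training data.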

Using our unique ‘journalist-in-the-loop’ approach to annotation and our proprietary taxonomy created by journalists and library scientists, Vubble has created what we believe to be the world’s largest data training set for “ambiguous” information video content — the key to unlocking AI that can help us understand what’s happening in video, moving images, and even predicting what is happening in real life. (“Ambiguous” content is an AI term that refers to complex information that requires context for comprehension by humans, and is particularly opaque to the earthworm mind of current AI systems). 

The AI systems that exist today, including Vubble’s, can only predict with slightly better-than-random certainty what is actually happening in a news video. But our AI is getting smarter every day, thanks in large part to the priority we put on transparency, human (journalistic) insight and oversight, and what’s called in the industry “explainable AI” (an emerging field in machine learning that aims to provide overt transparency, accountability and trustworthiness in AI systems).


Our AI has more to learn from every day as we continue to annotate information video from the world’s leading media publishers. In 2019, the Canadian government joined us to help, providing Vubble with funding via the Digital Citizen Initiative to subsidize our cloud-based data-tagging of the long-tail information video from Canada’s major news media companies.

This month, we will launch a public-facing version of this effort, Discovery Box: Canada, a bilingual database of news video, filterable in three ways: linear feed, keyword search and an opt-in algorithm.

What we’re trying to do with Discovery Box: Canada:

  • standardize the annotation of news video content among Canadian creators and publishers
  • expand the diversity of content distribution via Vubble’s proprietary ‘bias spread’ algorithmic approach
  • lift critical thinking among the Canadian public through explainable AI and our content assessment tool, the Vubble Credibility Meter.
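The three access modes can be sketched in miniature. This is a hypothetical illustration (the data and the ranking heuristic are invented, and are not Discovery Box's actual implementation): a linear reverse-chronological feed, a plain keyword search, and an algorithmic ranking that only applies when the user explicitly opts in:

```python
# Hypothetical sketch of three filter modes for a news-video database:
# linear feed, keyword search, and opt-in algorithmic ranking.
# Data and the relevance scores are invented for illustration.

videos = [
    {"id": 1, "title": "Budget day in Ottawa", "published": 3, "relevance": 0.4},
    {"id": 2, "title": "Wildfire update", "published": 2, "relevance": 0.9},
    {"id": 3, "title": "Budget reaction", "published": 1, "relevance": 0.6},
]

def linear_feed(items):
    """Newest first, with no personalization at all."""
    return sorted(items, key=lambda v: v["published"], reverse=True)

def keyword_search(items, query):
    """Plain substring match on titles."""
    return [v for v in items if query.lower() in v["title"].lower()]

def opt_in_ranking(items, opted_in):
    """Algorithmic ranking only when the user explicitly opts in;
    otherwise fall back to the transparent linear feed."""
    if not opted_in:
        return linear_feed(items)
    return sorted(items, key=lambda v: v["relevance"], reverse=True)

print([v["id"] for v in linear_feed(videos)])               # [1, 2, 3]
print([v["id"] for v in keyword_search(videos, "budget")])  # [1, 3]
print([v["id"] for v in opt_in_ranking(videos, True)])      # [2, 3, 1]
```

Making the algorithmic mode opt-in, rather than the default, is one way to keep the human chooser, not the recommender, in control.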

In return, Vubble is providing Canada’s main news media publishers with the structured data our editors have generated around their news video. This structured data is a must-have for quality, reliable AI recommendations. Used thoughtfully and effectively, it will help Canada’s news media as they move from conventional print and broadcast distribution towards AI distribution, ready to make powerful, reliable content recommendations and get the right information in front of the people who need to receive it.

In 2020, it is Vubble’s mission to help Canada’s news media break through the barrier of weak and unstructured data, while building the world’s largest, context-rich data training set for teaching machines to provide top-quality, reliable recommendations at a mass scale.

We’re passionate about the innovation, research and development possibilities that promise to grow from here, from using the Vubble training data set to help companies predict changes in audience usage behaviour, to automating the real-time mass delivery of critical news information across platforms and devices. We’re rolling up our sleeves and developing new distribution tools to meet Canadians where they are, with their needs at the core of our decision-making.

We’re not doing this because we can, we’re doing this because we must

Not long ago, the media industry woke up and realized that we no longer own our relationship with our customers. We no longer run the distribution business that generates profits from our work, and most of us don’t own the relationship with the technology to get our stories out there. When that first Trojan Horse rolled into our industry in the days after September 11, 2001, we began to cede virtually every facet of our industry to BigTech.

No more.


The news media’s relationship with the citizens of our democracies is a partnership — one that requires trust, respect and transparency as AI enters the newsroom. We have a common goal: to help citizens access trustworthy, factual information when and how they need to receive it.

As automation advances into the media business, particularly in the distribution space, it is the responsibility of our entire industry to ensure that we move forward together, in meaningful cooperation, to counter the power and influence of BigTech in the information ecosystem.

At Vubble, we’re committed to building strong and lasting collaborations within the news media around three things: providing structured data around your large libraries of information video content; continuing to build the world’s largest journalist-annotated information video training dataset; and being a ‘sandbox’ of AI R&D, where we can all work together to test new ML methods, try out new training models, and share new learning.

In 2020, if we can find ways to work together, the entire Canadian media industry will be better prepared for AI’s advance into the newsroom. The future hasn’t been written yet. A free and informed society depends on us rolling up our sleeves and getting into the hard work of rewriting our industry’s relationship with AI, because a healthy information ecosystem is the lifeblood of a functioning democracy.

Thanks for reading. If you have thoughts, get in touch. I’d love to hear from you!

Tessa Sproule is the Co-Founder and Co-CEO of Vubble, a media technology company based in Toronto and Waterloo, Canada. Vubble helps media and educational groups (like CTV News, Channel 4 News, Let’s Talk Science) by cloud-annotating news video, building tools for digital distribution and generating deeply personalized recommendations via Vubble’s machine-learning platform.