Full Name
Michael Elhadad
Job Title
Head of R&D
Company
Dalet
Speaker Bio
As a Founder and member of the Management Board, Michael Elhadad has been Director of Research & Development of the Dalet group since 1996.

Michael Elhadad graduated from Ecole Centrale de Paris and has a PhD in Artificial Intelligence from Columbia University.
He is also a Professor of Computer Science at Ben Gurion University, and the author of over 100 academic papers on Artificial Intelligence and Computational Linguistics.
TTC 2021 Speaker Type
Roundtable
Topic and Description
Topic: Pretrained AI Models for the Media Industry – Opportunities & Challenges

Description: AI has come back with great fanfare in the past five years, with unexpectedly good performance on tasks such as speech-to-text, machine translation, and face recognition, thanks to progress in neural networks and deep learning. Yet, overall, the impact of AI on day-to-day operations in the media industry has remained modest. In a few domains, early adopters have reaped operational results: automatic caption generation and automatic metadata extraction to enhance media archives are reaching production-level quality.

The operational cost of deploying AI-based solutions has remained, however, quite high: for most applications, customized AI models must be trained on proprietary data. These models must be trained on large quantities of carefully annotated data; operational procedures must be put in place to ensure the quality of these datasets; and complex technological infrastructure is required to update the models as more data is collected, to prevent quality drift over time. The overall technological complexity and high maintenance costs explain the low rate of adoption we have observed.

There may be some good news coming out of academic labs, though: 2020 has seen the rise of a new paradigm in AI, that of large pre-trained models. The trend emerged first in NLP applications, with the "transformer architecture" first illustrated by the BERT system and then by OpenAI's GPT-3 system. These models are trained in a new way, called "self-supervision", on unannotated text. The training signal is accumulated over unimaginably large training datasets (hundreds of billions of tokens). These "pre-trained models" can then be "fine-tuned" with little effort and little data to create downstream application models. This approach has exploded over the past 18 months in NLP, yielding progress across all sub-fields (machine translation, summarization, semantic search, text classification, question answering, text generation).
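
To make the "fine-tune with little effort and little data" step concrete, here is a minimal sketch using the open-source Hugging Face transformers library (an assumed toolkit, not one named in this description); the tiny labelled dataset, the topic labels, and the output directory are hypothetical, and a real deployment would need a proper train/validation split.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical toy data: a handful of labelled archive snippets. The heavy
# lifting was done during pre-training, so a few examples per class can be
# enough to adapt the model to a downstream task.
texts = [
    "Interview with the finance minister about the new budget",
    "Highlights from last night's championship final",
]
labels = [0, 1]  # illustrative labels: 0 = news, 1 = sports

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps the tokenized snippets in the format the Trainer expects."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-archive-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(encodings, labels),
)
trainer.train()  # minutes on one GPU, rather than weeks of training from scratch
```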

In the past 6 months, the same approach has been generalized to vision, and large pre-trained models trained on images are bringing similar benefits to the vision domain: object segmentation, object recognition, visual question answering, and automatic captioning.
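
As a hedged illustration of the vision side, the sketch below reuses a publicly available pretrained Vision Transformer checkpoint through the Hugging Face pipeline API; the checkpoint name and the frame file are assumptions for illustration only, not assets referenced by the roundtable.

```python
from transformers import pipeline

# Off-the-shelf pretrained vision model; no task-specific training data is
# needed to get a first usable prediction on an archive frame.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

predictions = classifier("archive_frame.jpg")  # hypothetical local frame grab
for p in predictions:
    print(f'{p["label"]}: {p["score"]:.2f}')
```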

We will illustrate the power of this new technology through provocative examples of few-shot training of pretrained models.
We will try to imagine the expected impact of "pre-trained models" for the media industry, analyzing specific use cases (human identity recognition in historical archives, subtitle translation). Finally, we will discuss the key gotchas associated with this new trend: risks for responsible AI, bias and harm, defenses against deepfake content, and the way content production processes may be affected even for non-adopters.
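
As a rough illustration of the "few-shot" idea mentioned above, the sketch below steers a pretrained language model with a handful of in-context examples instead of retraining it; the prompt, the topic labels, and the small GPT-2 checkpoint are assumptions chosen so the example runs anywhere, and they only demonstrate the mechanics rather than GPT-3-level quality.

```python
from transformers import pipeline

# A small, freely downloadable pretrained language model stands in for a
# large one here; only the prompting mechanics are being demonstrated.
generator = pipeline("text-generation", model="gpt2")

# Hypothetical few-shot prompt: two labelled examples, then a new case.
prompt = (
    "Classify each archive clip description by topic.\n"
    "Clip: Interview with the finance minister on the budget. Topic: politics\n"
    "Clip: Highlights from the cup final penalty shoot-out. Topic: sports\n"
    "Clip: Red-carpet arrivals at the film festival. Topic:"
)

completion = generator(prompt, max_new_tokens=3, do_sample=False)
print(completion[0]["generated_text"])  # the model's continuation is its guess
```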