User study for media metadata extraction

At Ericsson Research, we believe that richer metadata on multimedia content is key to enabling people to engage with their media in a more interactive and natural way. In recent years, progress in computer vision and deep content analysis has been phenomenal, but the field remains wide open, and identifying the best use cases through a pure data analytics approach is very challenging. With the help of our colleagues working in Human-Computer Interaction, we determined what video consumers may like and avoided working on the wrong problem.

Have you ever thought how wonderful it would be to have more control over your media content when watching TV? For example, if you are watching a movie with a child, the player could automatically blur or skip scenes that are not child-friendly. Or, while you are watching live TV, the player could automatically mute the sound or blur images that could be disturbing, such as scenes with blood.

In the Machine Intelligence and Media Technologies teams at Ericsson Research Silicon Valley, we believed these kinds of scenarios could become reality with the use of rich content metadata. Metadata is data (information) that describes other data, in this case the information in a video. In the past, video metadata has been generated manually and has often been limited, in the case of movies for example, to title, cast, genre, date, original language, director and producer. More recently, scene-by-scene metadata has gained popularity, for example for enabling more timely advertising.
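To make the idea of scene-by-scene metadata concrete, here is a minimal sketch in Python. It is not our actual data model: the `SceneMetadata` record, its field names, the tag vocabulary and the `scenes_to_flag` helper are all hypothetical, and the sample scenes are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class SceneMetadata:
    """Hypothetical per-scene record; the field names are illustrative only."""
    start_s: float  # scene start, in seconds from the beginning of the video
    end_s: float    # scene end, in seconds
    tags: Set[str] = field(default_factory=set)  # assumed labels, e.g. {"blood"}


def scenes_to_flag(scenes: Iterable[SceneMetadata],
                   unwanted_tags: Iterable[str]) -> List[SceneMetadata]:
    """Return the scenes that carry at least one of the unwanted tags."""
    unwanted = set(unwanted_tags)
    return [scene for scene in scenes if scene.tags & unwanted]


# Made-up example data, not taken from any real video.
scenes = [
    SceneMetadata(0.0, 42.5, {"dialogue"}),
    SceneMetadata(42.5, 61.0, {"blood", "fight"}),
    SceneMetadata(61.0, 120.0, {"landscape"}),
]

flagged = scenes_to_flag(scenes, {"blood"})
for scene in flagged:
    print(f"{scene.start_s:.1f}s to {scene.end_s:.1f}s: {sorted(scene.tags)}")
```

With records like these, a player could look up which time ranges to blur, mute or skip for a given viewer, or simply warn the viewer before such a scene starts.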

What do users really want?

Like most enthusiastic researchers, we were very keen to tackle this challenging problem of automatically extracting scene metadata from video content in the most efficient way, with a focus on using computer vision and machine learning techniques. Realizing the time and effort needed to extract all possible metadata from a given scene, and after many eye-opening discussions with our Human-Computer Interaction (HCI) colleagues at Ericsson, we decided to pause the coding and ask ourselves first: what is it that users really want?

For example, in which contexts do users feel negative emotions toward what they are watching? What strategies have they developed to avoid inappropriate scenes? How can our technology support users' in-the-moment needs and prevent unpleasant encounters, while interrupting their viewing experience as little as possible? Our hope with these questions was to narrow down as much as possible the list of metadata values we would search for, so that we could enable the most meaningful use cases.

Conducting interviews on video habits

This is how our user-study journey started. In close collaboration with the design team in Ericsson’s MediaFirst product team and with HCI experts, and with the help of our intern Jennifer Lee, we conducted a series of studies, including a survey and a semi-structured interview, on people’s video consumption habits. The benefit of semi-structured interviews is that they allow participants to share experiences in their own words, while the researcher can probe their answers for clarification and pursue unexpected lines of inquiry.

Qualitative data was systematically analyzed using coding schemes, which were developed both deductively and inductively from all the information participants provided through the survey and the interviews. The findings reassured us that gathering user perspectives as early as possible is essential to the success of technology innovation.

Unexpected results lead to modified research goals

It turns out that our initial assumption, that people want certain content to be blocked automatically, was not exactly what users were looking for. Our user study showed that users prefer to control what they are about to see rather than have a predefined process decide it for them. This applied to a range of avoidance and seeking behaviors that they wanted to oversee themselves.

It also turns out that parental control attracted the highest potential interest, but not in the form of blocking content automatically. Instead, parents wanted more tools for interacting with their kids about the content being consumed. They preferred engaging with and speaking to their children over attempting to block and control all content.

Another interesting and totally unexpected outcome of the user study drew our attention to users’ media consumption habits in public versus in private. Participants mentioned avoiding certain content in public for fear of being judged. One other unexpected outcome was media watching out of “guilt”, because a friend or family member had sent them a video to watch. All of this gave us a new research question to work on: how to efficiently summarize the content of video media in cases where the user does not have time or is not in the right environment to watch it.

In summary, automatic metadata extraction from video content is not trivial. Without knowing exactly what the use case for a killer application was, we would have embarked on the lengthy and difficult process of extracting all possible metadata from the content. By trying this new approach in our research process, however, we determined what users may really want and modified our initial research goals accordingly.

This research project also resulted in a new internal process for user research at Ericsson Research. This process will allow us to quickly confirm or revise our initial research hypotheses in future projects without excessive expenditure of time and resources.
