“We can no longer successfully do our job without the help of automation and artificial intelligence.”

Judy King

Interview with Judy King, Director of Innovation at BBC Monitoring, UK.

Hi Judy, what is your background, and what is included in your current role at BBC Monitoring? 

I joined BBC Monitoring as a researcher in November 1999, following two years teaching English in rural Japan. And I have been there ever since, apart from a brief stint working at the BBC News website.

After 18 years, I really understand what makes BBC Monitoring tick. And this is hugely beneficial in my current role in which I head up the innovation team. Our work is very varied. On any one day we could be running a pilot to try out a new tool with one of our regional teams, collaborating with the BBC’s NewsLabs team on language technology prototypes or advising other BBC teams on using Agile ways of working in the Newsroom.

What differentiates BBC Monitoring from other media monitoring companies?

Many companies in this area are tech firms using AI and machine learning for brand monitoring purposes. We are quite different. We don’t only rely on technology and algorithms to find relevant information.

We employ highly skilled, multilingual journalists who have a deep understanding of the media environment they are covering. This enables them to navigate through the ever-growing number of sources to spot trends and find the stories that matter.

We have a long history of reporting on developments from the world’s media. We have been doing this since the Second World War after all! And we are able to draw on this deep archive to enable our users to make sense of the present.

What are the possibilities and benefits of automation of the editorial workflow?

BBCM’s role is to understand and navigate media ecosystems to find news, spot disinformation and give context to events. Not just one, but many ecosystems, in many languages. And they are changing fast. Gone are the days when you could watch one state TV station and read a couple of newspapers to know what is going on in a country. We can no longer successfully do our job without the help of automation and artificial intelligence.

We use tools to help us keep across social and online sources, but for broadcast media it is much more difficult. There would be huge benefits in giving our journalists access to speech-to-text transcripts of the broadcasts they are watching – in the vernacular language. This would enable us to keep across many more TV sources, find the information that is relevant to our users and spend more time adding context and insight to the output we produce.
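As a rough illustration of such a workflow, the sketch below transcribes a recorded broadcast clip with the open-source Whisper speech-to-text model. The model size, file name and language code are illustrative assumptions; the interview does not say which tools BBC Monitoring actually uses.

```python
# A minimal sketch, assuming the open-source "openai-whisper" package is installed
# (pip install openai-whisper) and that "broadcast_clip.mp3" is a recorded TV segment.
# The model size and language code are illustrative choices, not BBCM's actual setup.
import whisper

model = whisper.load_model("small")  # larger models trade speed for accuracy
result = model.transcribe("broadcast_clip.mp3", language="fa")  # e.g. Persian-language TV

# Full vernacular transcript for the journalist to scan
print(result["text"])

# Timestamped segments make it easy to jump back into the recording
for seg in result["segments"]:
    print(f"[{seg['start']:7.1f}s - {seg['end']:7.1f}s] {seg['text'].strip()}")
```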

What are the challenges of automation?

I think that the main challenge is how to fully integrate automation into the journalists’ daily work. If we were to just bolt it on as another tool available for people to use, without considering the entire workflow, we would not be able to realise all the benefits that introducing speech-to-text and other automated technology could bring.

What would your advice be on how to meet those challenges?

I think it is all about piloting and getting the technology into the workflow as soon as you can.

Of course, you also need to set the right expectations with journalists. The quality of the transcript will not be perfect and you should be clear about that from the start.

But if you can get the technology in front of journalists – even if it is not perfect – then they can start to experiment with how the automated transcripts can help them produce even more creative and original journalism.

When it comes to introducing automation of the editorial workflow, what next steps will we see in the near future that will improve it even further?

I haven’t seen any speech-to-text technology in any language that is perfect (getting people’s names right, for example, is extremely difficult). But there is a lot of focus on language technology at the moment and it is improving all the time. Even now, the accuracy of the transcripts can be improved by coupling them with other technology, such as face recognition and speaker recognition.
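One way to picture the coupling described above is to merge a timestamped transcript with the output of a separate speaker-recognition step. The sketch below is a self-contained illustration with invented data: it simply attributes each transcript segment to whichever speaker overlaps it most in time.

```python
# A minimal sketch: attribute transcript segments to speakers by time overlap.
# Both input lists are assumed to come from separate, already-run tools
# (speech-to-text and speaker recognition); the data here is invented.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(transcript, speakers):
    """Attach the best-overlapping speaker label to each transcript segment."""
    labelled = []
    for seg in transcript:
        best = max(speakers,
                   key=lambda sp: overlap(seg["start"], seg["end"], sp["start"], sp["end"]),
                   default=None)
        labelled.append({**seg, "speaker": best["label"] if best else "unknown"})
    return labelled

transcript = [{"start": 0.0, "end": 4.2, "text": "Good evening, here is the news."},
              {"start": 4.5, "end": 9.0, "text": "The minister spoke to reporters today."}]
speakers = [{"start": 0.0, "end": 4.3, "label": "anchor"},
            {"start": 4.4, "end": 9.2, "label": "correspondent"}]

for seg in label_segments(transcript, speakers):
    print(f"{seg['speaker']:>13}: {seg['text']}")
```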

There is currently a lot of discussion about “fake news.” What do you think about the balance of that discussion between the focus on fake news and the focus on real news (where all the facts are correct)?

It is an extremely complicated picture. In many cases it is unclear whether what we are seeing is misinformation, which you could describe as the inadvertent sharing of false information, or disinformation, which is the deliberate creation and sharing of false information.

Our journalists are highly skilled at verifying what they see on the media they are covering. In some cases, what they see are efforts by media outlets not just to mislead and misinform, but to sow confusion, undermine public trust in the media and create the impression that you can’t get to the bottom of things – that there is no truth, no facts, just opinions.

Have you recently, or are you planning to, release any new technology-based solutions that will add to or improve services for your clients? 

We are constantly looking for ways to improve our service to our customers. We recently introduced a new “fake news” tag onto our website to enable our users to more easily find articles on disinformation and propaganda. We are about to make improvements to our search functionality, to guide our users even more smoothly through our news and reference content, enabling them to quickly get to the information they need.

When it comes to the actual data behind the analysis that you do, what kind of data or media can be interesting in the future that you do not use for your analysis today? 

In the future I envisage us doing a lot more big data work, analyzing trends and how they develop across time. For example, we could capture a broader swathe of media content than we are currently capable of analyzing and use it to find stories hidden in the data. We would also want to integrate this with our vast archive of monitored media output, which dates back decades.
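A minimal sketch of the kind of trend analysis mentioned above: counting monthly mentions of a topic tag across an archive of monitored items. The records and field names are invented placeholders, not BBC Monitoring data.

```python
# A minimal sketch, using pandas to count monthly mentions of each topic tag
# across monitored items. The records below are invented placeholders.
import pandas as pd

items = pd.DataFrame([
    {"date": "2018-01-12", "topic": "disinformation"},
    {"date": "2018-01-30", "topic": "elections"},
    {"date": "2018-02-03", "topic": "disinformation"},
    {"date": "2018-02-21", "topic": "disinformation"},
    {"date": "2018-03-07", "topic": "elections"},
])
items["date"] = pd.to_datetime(items["date"])

# Mentions of each topic per month: the raw material for spotting trends over time
monthly = (items.set_index("date")
                .groupby("topic")
                .resample("M")
                .size())
print(monthly)
```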

How do you think the monitoring industry will change in the next 5-10 years, and what are the greatest challenges ahead?

If, in the coming years, technology companies continue to make leaps forward in automation and machine learning, transcription and translation will become reliable. I think that will bring the biggest change to the media monitoring industry.

But even if the language technology does improve considerably, non-specialists will still need help to navigate the increasingly complex media environments around the world. BBC Monitoring will continue to develop a reputation as source specialists, guiding our users to what matters to them.

By Renata Ilitsky

“The analysis of the unstructured part of EHRs will represent an essential contribution to advances in drug safety and effectiveness”

Jose Gonzalez

Interview with Jose Gonzalez, CEO of MeaningCloud, a text analytics company based in New York City.

Hi Jose, what is your background and what are your responsibilities in your current role at MeaningCloud?

My background is in the field of Artificial Intelligence. I hold a Ph.D. from the Technical University of Madrid, which I joined in 1985 as an assistant professor and researcher at the AI lab of the School of Telecommunication Engineering.

Years later, in 1998, I founded my first startup, Daedalus, along with other colleagues. We were developing and struggling to sell AI solutions twenty years ago, mainly in two areas: natural language processing and data mining. A good part of our activity consisted of developing our own technology, with the financial support of national and European research programs.

By then, we were dedicating 25% of our revenue to R&D. However, marketing and selling these solutions was tough. The game changed for us when we started deploying our text analytics solutions as a SaaS business on top of AWS in 2011.

Finally, in 2015 we decided to create a new company (MeaningCloud), bringing in new investors, merging in Daedalus and starting a subsidiary in the US. My role as CEO of MeaningCloud involves managing every area of the company, from the technical product roadmap and HR to business development and finance.

What differentiates MeaningCloud from other text analytics companies?

There are a few differentiating elements in our offering; the first one is our deep semantic approach to truly “understand” and interpret any piece of text, extracting not only facts and sentiments, but also relationships, beliefs, desires and intent. It means that we rely on a linguistic approach, complemented with machine learning (including deep learning), to build base models and to generate candidate rules for human curation. This linguistic approach is essential to work in high-value information discovery scenarios, where precision is a must.

The second differentiator is what we call “vertical packs.” It means off-the-shelf industry-oriented solutions to address typical business or industry use cases.

The third one is customization; in Text Analytics, one size does not fit all. Therefore, we empower our customers to add their own dictionaries, classification schemes, and sentiment analysis rules.
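For a sense of what API-level access to this kind of analysis can look like, here is a hedged sketch of a call to a MeaningCloud-style topics-extraction endpoint. The endpoint URL, parameter names and response fields are assumptions based on the public documentation, and the license key is a placeholder.

```python
# A hedged sketch of calling a MeaningCloud-style topics-extraction endpoint.
# Endpoint URL, parameter names and response fields are assumptions based on the
# public documentation; "YOUR_LICENSE_KEY" is a placeholder, not a real key.
import requests

response = requests.post(
    "https://api.meaningcloud.com/topics-2.0",
    data={
        "key": "YOUR_LICENSE_KEY",
        "lang": "en",
        "tt": "ec",  # assumed flag: extract entities and concepts
        "txt": "Acme Bank's new mobile app drew praise from customers in Madrid.",
    },
    timeout=30,
)
data = response.json()

# Print whatever entities and concepts the service reports (field names assumed)
for entity in data.get("entity_list", []):
    print("entity:", entity.get("form"))
for concept in data.get("concept_list", []):
    print("concept:", concept.get("form"))
```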

MeaningCloud is originally a Spanish company, but opened an office in the US a few years back. How has that affected your business?

Three years after incorporating MeaningCloud in the US, we are getting 80% of our revenue from outside Spain. Our most valuable customers are in the US. The move has deeply affected every aspect of our work, starting with the motivation and the renewed ambition of our team, who feel like they are playing in a different league. We have made a special effort to recruit people from abroad, to the point that almost 25% of the company are non-Spanish nationals.

What are your greatest challenges ahead for MeaningCloud when it comes to serving your customer analysis and developing your offer?

Our most valuable customers look for the extraction of very specific insights from any information source. The ability to develop tools to carry out this process for a particular purpose, with the required coverage and precision, and within acceptable time and costs, is our most important challenge today.

You work a lot with the pharmaceutical industry; can you please share what you do for them and how their needs differ from other industries when it comes to text analysis?

In pharma and healthcare, we address some general problems from the vantage point of having integrated and developed, over the years, a substantial set of multilingual resources (medical terminology, thesauri, clinical codes) and tools for understanding the language of health. For instance, we have in place market intelligence solutions to unveil opportunities and threats in real time from digital sources.

A second area is pharmacovigilance (also called drug safety), the practice of monitoring the effects of medical drugs after they have been licensed for use, especially in order to identify and evaluate previously unreported adverse reactions. We apply text analytics to identify episodes of interaction between drugs, adverse effects, etc., from reported cases, specialized forums or scientific literature.

The third area is what we call “Voice of the Patient” analytics, a specialization of the more classic “Voice of the Customer” analytics that we have been carrying out in the retail, banking and telecom industries.

A promising new area that is currently under development is around “Real World Evidence.” RWE is information on healthcare that is derived from sources outside clinical research settings (the clinical trials carried out to obtain drug approval), including electronic health records (EHRs), claims and billing data, product and disease registries, and data gathered through personal devices and health applications.

Automatic analysis of such sources allows us to know how specific drugs perform within different population groups, in patients with differing disease severity, in conditions that require other medications, in long-term treatments, etc.
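To make the pharmacovigilance and Real World Evidence use cases more concrete, here is a deliberately simple, dictionary-based sketch that flags sentences in which a drug name and an adverse-event term co-occur. The term lists and example text are invented, and a production system would be far more sophisticated.

```python
# A minimal, dictionary-based sketch: flag sentences where a drug name and an
# adverse-event term co-occur. Term lists and text are invented examples;
# requires spaCy and its small English model (python -m spacy download en_core_web_sm).
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("DRUG", [nlp.make_doc(t) for t in ["ibuprofen", "warfarin"]])
matcher.add("ADVERSE_EVENT", [nlp.make_doc(t) for t in ["nausea", "bleeding", "dizziness"]])

text = ("Two weeks after starting warfarin the patient reported unexpected bleeding. "
        "Ibuprofen was well tolerated.")
doc = nlp(text)

# Group matches by sentence, then report sentences containing both term types
hits = {}
for match_id, start, end in matcher(doc):
    sent = doc[start:end].sent
    hits.setdefault(sent.start, {"sent": sent, "labels": set()})
    hits[sent.start]["labels"].add(nlp.vocab.strings[match_id])

for info in hits.values():
    if {"DRUG", "ADVERSE_EVENT"} <= info["labels"]:
        print("possible adverse event:", info["sent"].text)
```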

How has your clients’ perception about what text analysis can do for them changed over the years?

In the past, it was difficult to convince our customers on the business side of the effectiveness and integrability of the technology. At the same time, their IT departments were afraid of integration risks and costs. This situation changed completely a few years ago with the availability of SaaS solutions. These days, our business customers play with our text analytics functions inside their own Excel spreadsheets, and technical users simply call our APIs from their own software environment, whatever it is.

Have you recently, or are you about to, release any new technology-based solutions that will add to or improve services for your clients? If so, what solutions, and how will your customers benefit from them?

We follow a roadmap of continuous improvement to the functionality and usability of our technology. Last month, we added a new API to our offering, the “Deep Categorization API.” It assigns one or more categories to a text by finding snippets that match advanced semantic patterns and contexts, expressed in a powerful (but simple) language built from macros and rules.

This technology has allowed us to market new services, our vertical packs. Vertical packs are solutions intended for specific industries. The first four packs are for the analysis of the Voice of the Customer (including different flavors for the retail, banking and insurance scenarios) and for Voice of the Employee analytics.
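To give a flavour of the rule-driven categorization behind the Deep Categorization API and the vertical packs, here is a toy sketch. The rule format is invented for illustration and is not MeaningCloud's actual rule language.

```python
# A toy sketch of rule-based text categorization. The rule format below is
# invented for illustration; it is not MeaningCloud's actual rule language.
import re

# Each category fires if any of its patterns matches the text (case-insensitive).
RULES = {
    "Billing issue":   [r"\b(overcharg\w*|refund|invoice|billing)\b"],
    "Staff praise":    [r"\b(friendly|helpful|great service)\b"],
    "Product quality": [r"\b(broken|defect\w*|stopped working)\b"],
}

def categorize(text):
    """Return the categories whose rules match the text."""
    return [cat for cat, patterns in RULES.items()
            if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)]

feedback = "The staff were very helpful, but I was overcharged on my invoice."
print(categorize(feedback))  # -> ['Billing issue', 'Staff praise']
```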

Regarding languages, in a few weeks, we will be publishing the Nordic Package, to add Danish, Finnish, Norwegian and Swedish to our current language offering. Chinese, Hindi, Arabic and Russian will follow shortly.

Finally, Summarization and Document Structure Analysis APIs will incorporate substantial improvements before the summer.

Is “fake news” a big issue for the text analysis that you do? If so, what are the challenges for the analysis you do for your customers, and how do you cope with them? 

We all know how difficult it is to filter noise in social media in general. This noise may take many different forms: vacuous, idiotic, fanatical, insulting, manipulative or simply false messages.

However, this landscape does not differ too much from what happens already in offline media. Depending on the nature of our work and the purpose of a particular client, we may be forced to filter out some kind of noise, but we cannot tell, obviously, if an individual piece of news is truthful or not. The only means to do that involves analyzing the origin and the spreading mechanisms of information across networks, something that we are not currently doing.

Regarding this topic, I would first rely on education. As educated digital citizens, we should develop abilities to distinguish honest, reliable, sensible and relevant sources of information and opinion.

When it comes to the actual data behind the text analysis that you do, what kind of data or media can be interesting in the future that you don’t analyze today?

I would bet on Electronic Health Records. What we do now is on a minimal scale. On May 6, the US National Institutes of Health launched the research program “All of Us,” whose aim is to get one million volunteers to contribute their physical, genomic and electronic health record data. It is the starting signal for the most ambitious precision medicine initiative so far. The analysis of the unstructured part of EHRs will represent an essential contribution to advances in drug safety and effectiveness.

How do you think the text analysis industry will change in the next 5 years, and what are the greatest challenges ahead?

The long-term challenges (beyond five years) have to do with our ability to interpret any communication act: discovering, reasoning about and reacting to the facts, beliefs, emotions, desires, intentions and values of people and artificial agents. Despite the current hype around Artificial Intelligence, we are still far, far away from that goal.

How do you foresee the changes and developments for MeaningCloud over the next 5 years?

We will keep on following our dream, which is going deeper and deeper in extracting the meaning of all kinds of unstructured content. The next step will be a more powerful approach to the extraction of relationships from text. Stay tuned!

By Renata Ilitsky