A few weeks ago we announced several initiatives to make our blog data even better, so that our API clients can get more compelling and complete insights into the blogosphere. As we put it: our goal is to have the best blog data in Europe in terms of coverage, quality and immediacy.
Now we have more good news for our API clients: until recently, only 4 months of historic data were available through our Analytics API. As of now, we have extended that to a full 12 months, at no extra charge. With that change, we can provide companies and organisations with even better, more complete data.
At the same time we have also upgraded our search language, making it possible to search for single characters; previously, two-character words were the shortest possible query. This is something our customers have requested, and we are happy to finally release it. The upgrade also makes the data available faster: before, it took up to 15 minutes from finding a blog post to making it searchable in Analytics; going forward, it will take only seconds.
And by the way, our goal is to make all our data starting from 2006 available through our Analytics API, so there is more to come!
For all users of Twingly.com, the upgrade means that the public blog search can provide 12 months of historic data as well.
Every second, a huge and ever-increasing amount of data is published on the web. Gavagai, a Twingly Data client based in Stockholm, has developed a technology to read, aggregate and understand this content. Fredrik Olsson, the company's Chief Data Officer, gives some insights into this fascinating business and into what the startup is able to do with the blog data it collects.
At Gavagai, you do some sophisticated stuff. Please tell us in a few sentences what your business is all about. It’s about continuously reading tremendously large and dynamic text streams, and delivering timely and actionable intelligence based on the aggregation of information therein. Of course, what is actionable depends on the information needs you as an actor in a particular domain have, be it brand management, assessing threat levels for targets-at-risk, or keeping track of the sentiment towards a particular tradable asset. Example information needs that you are able to address using Ethersource, our system, include:
* How is my brand perceived in comparison to my competitors’?
* Why are my customers unsubscribing from the services that I’m offering?
* When is the best time to launch this particular advertising campaign?
* How is the campaign recently launched by my competitor received among my target audience?
* Where is it most likely that the on-line protests against a certain phenomenon will be publicly manifested in terms of a demonstration?
What’s the founding story of Gavagai?
Gavagai was founded in 2008 by my colleagues Jussi Karlgren and Magnus Sahlgren, as a spin-off from the Swedish Institute of Computer Science (SICS). Gavagai was formed as a response to the many inquiries Magnus and Jussi received from people outside SICS regarding their research. Gavagai has been operational in its current incarnation since late 2010.
You are one of Twingly’s Data clients, which means you use our API to access data from Swedish- and English-speaking blogs. Why do you need this information, and what do you use it for?
We read data from Twingly 24/7. In particular, the Twingly live feed gives what we believe to be a very good coverage of Swedish blogs, which of course is very important to us in meeting the kinds of information needs outlined above, expressed by domestic actors.
Do you have any insights about this data from blogs in Swedish and English that you want to share? Some surprising fact or observation?
One epiphany we had some time ago was that we’re now able to aggregate and inspect attitudes and opinions of a population as a whole that are not necessarily visible in any of its parts. For instance, we can clearly see that Swedish bloggers are optimistic during holidays and weekends, something which is very hard to assess from the posts of any one individual. Analogously, we also pick up on aversive or hostile tendencies in the online population towards a given subject, where it is hard to identify all the facets of the tendency in any one individual. For example, we recently set up a Xenophobic Tracker using, among other things, the Swedish blogosphere as input; the propensity for violent expressions in that context does not make for pretty reading.
But it’s not the peak items that we’re most pleased with. With Ethersource, we can pick up and note weak signals and tendencies where other methods fail.
What type of companies or organisations use your services?
The kinds of actors that require actionable intelligence in their efforts to manage brands, make informed decision based on the ‘temperature’ of an on-line population as a whole, keep track of the general mood in the markets, or trade with specific assets.
Your title is “Chief Data Officer”. That’s not too common, is it? Do you think every company will need a CDO in the future?
No, I don’t think every company will need a CDO in the future. Hopefully, companies will be able to scale down on their data management activities, perhaps due to their use of tools and techniques such as Ethersource, and instead focus on their core business. Much the same way we are able to focus on our core business by obtaining data from Twingly instead of harvesting it all ourselves.
Big Data is one of the hottest buzzwords right now, which is a field you are active in. What’s the potential and biggest challenges of the increasing amount of data?
We’re currently concerned with human-generated text, so this response should be read in that light.
The biggest challenge with Big Data is to stop focusing on Big Data. Big Data will, by virtue of the prevailing definition, always be slightly too big to handle with common tools. This has mainly resulted in people being obsessed with processing speed and the ability to store large amounts of data. Few, if any, have focused on a layer in the so-called Big Data Stack that so far has been missing: the Semantic Processing Layer. The key challenge for Big Data is to reach the point where it is easy and swift to turn massive data streams into actionable intelligence: knowledge that you and your organization can act upon in order to obtain a competitive advantage. To put it another way: the key challenge of Big Data is to be of service.
Being a researcher by training and at heart, I believe that we’ve yet to imagine the biggest potential there is in harnessing truly Big Data. Let’s talk about that in a few years, when a more representative sample of the world’s population is active on-line. Then we’ll be able to find the collective answers to questions for mankind that we’re not able to think of now.
What’s on your roadmap for the upcoming years? Where do you see the biggest growth and potential for Gavagai?
We’ve got very exciting times ahead of us! Ethersource is already unique in the way it is able to read amounts of text that would overwhelm traditional language processing methods, handle multiple (all) languages, in real-time, and learn from variations in the input in an unsupervised manner.
Our development plans involve some fairly hefty stuff. In the short term, we’ll roll out a game changer in terms of a way of identifying the many meanings of a given concept, and use that information to disambiguate expressions of that concept as they appear in social media. For instance, imagine that you are a brand manager for Apple, Visa, “3” or some other brand with an inherently ambiguous and common name: How do you go about monitoring the attitudes and opinions towards the meaning of the word that constitutes your brand, and only that meaning? There is a solution…
The biggest growth and potential for Gavagai is as a supplier of the Ethersource technology to other companies, such as analytics firms, trading desks, governmental agencies etc., that already have an infrastructure in place but lack the competitive edge that the ability to understand and make sense of large text streams in multiple languages gives. Ethersource is an implementation of the Semantic Processing Layer of the Big Data Stack, and we intend to position it as such.
Everybody is talking about data from social media sources and the opportunities it brings for businesses, if one knows how to use it. But how? is the question often asked. And: do we really have to?
This year, the NEXT Conference in Berlin had data as its main topic, and various sessions discussed how important data has become for businesses. In autumn, the Research & Results conference in Munich will bring together mostly market researchers to discuss the new challenges that collecting data from social media sources brings.
Recently I chatted with someone in classic market research about this, and it became evident that data from Facebook, Twitter and blogs in particular is becoming increasingly important for that field.
Why is data from blogs and social media platforms increasingly important?
Blogs, Twitter and Facebook are only some examples of the platforms (see Ethority’s social media prism to the right) where consumers gather and exchange their views, with their friends as well as with the entire community. So everybody running a business knows that one has to be where one’s customers are in order to understand their needs, develop the products they want and, most importantly, sell those products and offer great customer service.
But one does not only need to be present where one’s customers are. One also wants to listen to what they have to say in general.
The competition is on: how popular is my product in comparison to others? Do people love my brand or do they hate it? Did they ever hear about it? Only those who know that can really optimise their products and services, find their niche, get the most out of it, and even come up with innovative new ideas for products and services, knowing the potential benefits they bring to one’s customers. Maybe one even needs to find new customer groups?
Then, of course the big question is, how on Earth do I find out what’s been said about me or my brand(s)?
There are a lot of media monitoring services around, and the often-quoted dilemma is that no single service satisfies all needs and wishes. Well, I guess that’s why there are so many around in the first place. Some are specialised in a certain niche, e.g. media monitoring for the finance or the travel sector, while others simply try to develop their services into the best all-round solution and compete with each other that way.
Regardless of which way media monitoring companies choose, we at Twingly can deliver blog data to all of them.
In fact, we get quite a few requests about our blog data, and the number of our data customers more than doubled during the last 12 months. These are all our API Customers, as we call them.
So let us tell you a bit more about how we could potentially support your media monitoring business with our blog data!
We actually have three APIs.
Number 1: Livefeed – This is raw blog data at its finest.
You get all blog data as an XML feed as soon as it enters our system. You can then store the data at your end and perform whatever analysis you want. You subscribe per language, and we give you access via an API key. Livefeed is mostly used by our bigger clients, such as Radian6 and Meltwater Buzz, who feed our blog data into their systems as additional sources.
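To make the "store it at your end" step concrete, here is a minimal Python sketch of ingesting one batch of such a feed. Note that the element names (`posts`, `post`, `url`, `title`, `languageCode`) and the payload shape are illustrative assumptions, since this post does not spell out the Livefeed XML schema; the real feed would be fetched with your API key rather than read from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical Livefeed-style payload; element names are placeholders,
# not the documented Twingly schema.
SAMPLE_FEED = """\
<posts>
  <post>
    <url>http://example.blog/entry-1</url>
    <title>First post</title>
    <languageCode>sv</languageCode>
  </post>
  <post>
    <url>http://example.blog/entry-2</url>
    <title>Second post</title>
    <languageCode>en</languageCode>
  </post>
</posts>
"""

def parse_livefeed(xml_text):
    """Turn one XML batch into a list of dicts ready to store locally."""
    root = ET.fromstring(xml_text)
    posts = []
    for post in root.findall("post"):
        posts.append({
            "url": post.findtext("url"),
            "title": post.findtext("title"),
            "language": post.findtext("languageCode"),
        })
    return posts

posts = parse_livefeed(SAMPLE_FEED)
print(len(posts))            # 2
print(posts[0]["language"])  # sv
```

In a real client you would run this in a loop against the live endpoint and write each batch to your own database before analysing it.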
Number 2: Analytics – our search API. This API is based on our blog search: you can just throw a bunch of keywords into the search interface and get results across all languages (or just specific ones), up to 4 months back in history. Twingly Analytics is suitable for media monitoring services that don’t want to deal with the super-technical side of things but want to get results for certain keywords directly out of the pool of blog data. The results come as an XML feed and can then be used for further analysis in their own systems. Silobreaker’s services are a good example of using our search API to find additional blog sources about certain topics – for example their news trends.
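A keyword query against a search API like this usually boils down to building a URL with your API key and search terms. The sketch below shows that step only; the base URL and parameter names (`key`, `searchpattern`, `documentlang`) are placeholders I made up for illustration, not the documented Analytics interface.

```python
from urllib.parse import urlencode

# Placeholder endpoint -- not the real Analytics URL.
BASE_URL = "https://api.twingly.example/analytics"

def build_query(api_key, keywords, language=None):
    """Assemble a hypothetical keyword-search request URL."""
    params = {"key": api_key, "searchpattern": " ".join(keywords)}
    if language:
        # Omitting the language would search across all languages.
        params["documentlang"] = language
    return BASE_URL + "?" + urlencode(params)

url = build_query("MY-KEY", ["twingly", "blog"], language="en")
print(url)
```

The XML result feed would then be fetched from that URL and parsed on your own system, much like the Livefeed example above.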
Number 3: RSS-API – the best of basic!
You probably know that you can search on Twingly.com for a certain keyword (e.g. Twingly as a brand) and subscribe to the results via RSS for free. Now, some of you might want to use this feed commercially and at a higher volume. In that case you can get an API key from us which, for a low monthly fee, allows you to use the feed for your own purposes. That can be media monitoring, but you could also use it to create a customised top list of blogs that talk about your products. The Swedish publisher Norstedts, for example, created a top list of blogged-about books that they curate using the RSS-API.
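Since the results arrive as standard RSS 2.0, building such a top list is straightforward: parse the feed's `item` elements and count how often each blog host appears. The feed content below is invented sample data; the parsing itself uses only the standard RSS structure (`channel`/`item`/`link`).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample of an RSS 2.0 search-result feed.
SAMPLE_RSS = """\
<rss version="2.0">
  <channel>
    <title>Search results</title>
    <item><title>Review: Book A</title><link>http://blog-one.example/a</link></item>
    <item><title>Thoughts on Book A</title><link>http://blog-two.example/a</link></item>
    <item><title>Review: Book B</title><link>http://blog-one.example/b</link></item>
  </channel>
</rss>
"""

def blogs_by_mentions(rss_text):
    """Count result items per blog host -- the basis for a simple top list."""
    root = ET.fromstring(rss_text)
    hosts = Counter()
    for item in root.iter("item"):
        link = item.findtext("link")
        hosts[link.split("/")[2]] += 1  # naive host extraction for the sketch
    return hosts.most_common()

top = blogs_by_mentions(SAMPLE_RSS)
print(top)  # [('blog-one.example', 2), ('blog-two.example', 1)]
```

Point the same logic at the live RSS URL for your keyword and you have a continuously updatable top list.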
Now, if you want to know more about any of these feeds and how they work, you can either contact us or check out our FAQ and get back to us with specific questions.
If you are from Sweden and looking for some companies that do media monitoring, then you might want to check out Mediebevakare.se as well, where a lot of Swedish media monitoring companies present their tools and services.
We at Twingly founded Mediebevakare.se because, like you, we felt that an overview of all the different media monitoring services available was needed. So please check it out and spread the word if you know someone who is looking for a service, or if you know of a tool that should be listed as well.
If you have a media monitoring business on the Swedish market but you are not based here, feel free to set up your presentation on Mediebevakare anyway. We can help you if needed – just let us know!