A cup of coffee gives us 4000 blogs – 2015 in review

Yet another year has passed for Twingly. For us this is a natural time to sit back and reflect on what we have done during 2015.

Most of our continuous efforts are done behind the scenes so we’d like to take this opportunity to share some statistics for you to get an idea of what we do to deliver a great service.

2015 statistics

Twingly is a complex system (machines and humans) that continuously finds new blogs, indexes the posts and delivers data to our API customers as well as to our public search and widgets. We work on lots of projects to achieve our goals. Last year we created 24 new internal projects, bringing us to almost a hundred projects in total.

Out of the 24 projects, 7 were published under an open source license. One example is https://github.com/twingly/ecco, which is used to replicate the blog data from our MySQL servers to the search indices. All open source projects are available at https://github.com/twingly.

Some other statistics from the previous year:

  • 2634 cups of coffee brewed in the office
  • 7717 code commits, 86 397 lines added and 86 706 lines removed
  • 1228 issues created in our bug/feature tracking system (845 closed and 383 still open)
  • 835 support tickets handled

Not everything has gone as planned, though: we’ve had 28 infrastructure issues that required extra work to keep the service up and running. Thanks to our great employees, only some of the incidents affected our users, and all service interruptions were noted on status.twingly.com to help our customers understand what was going on. Each incident is a great opportunity to better understand our system and improve our services and skills.

Our machines worked hard as well:

A few of our hard working servers
  • Received 1 020 088 312 XML-RPC pings
  • Fetched 66.75 TiB from the Internet
  • 39.81 TiB served from our servers
  • 14.22 TiB content served from our CDN
  • Analyzed 1 084 123 550 tweets, looking for blogs

The combined human and machine effort resulted in:

  • Indexed 407 859 721 blog posts
  • Discovered 10 336 602 new blogs (almost 4000 blogs per cup of coffee in the office)
  • 99.91% uptime, average for all our public services. Our API uptime is available on status.twingly.com
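For the curious, the "blogs per cup of coffee" figure can be reproduced with a few lines of Python. This is only a sketch using the numbers listed above; the downtime estimate assumes a 365-day year and treats the averaged uptime as if it applied to a single service, which is a simplification.

```python
# Figures copied from the lists above.
cups_of_coffee = 2634
new_blogs = 10_336_602
lines_added = 86_397
lines_removed = 86_706
uptime_percent = 99.91

# New blogs discovered per cup of coffee brewed in the office.
blogs_per_cup = new_blogs / cups_of_coffee

# Net change in lines of code (negative means we removed more than we added).
net_lines = lines_added - lines_removed

# Assumption: a 365-day year and one service with the average uptime.
downtime_hours = (100 - uptime_percent) / 100 * 365 * 24

print(f"Blogs per cup of coffee: {blogs_per_cup:.0f}")
print(f"Net change in lines of code: {net_lines}")
print(f"Approximate downtime: {downtime_hours:.1f} hours")
```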

Christmas statistics

We took some well-deserved time off around Christmas. Once back at the office we noticed the annual Swedish blog decline over the Christmas weekend. In Sweden it’s most common to celebrate on Christmas Eve (December 24). Most bloggers also took the day off, and the graph of Swedish posts shows a noticeable dip.

Swedish Christmas Celebration on December 24

29.42% of the Swedish posts on Christmas Eve had common Christmas keywords in their title (Jul = Christmas, God = Merry and Julafton = Christmas Eve).
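A keyword share like this could be computed roughly as follows. This is a minimal sketch, not the query we actually ran: the whole-word, case-insensitive matching and the function names are assumptions, and the sample titles are made up.

```python
import re

# Keywords from the text above; matched case-insensitively as whole words.
CHRISTMAS_KEYWORDS = {"jul", "god", "julafton"}


def has_christmas_keyword(title: str) -> bool:
    """Return True if any whole word in the title is a Christmas keyword."""
    words = re.findall(r"\w+", title.lower())
    return any(word in CHRISTMAS_KEYWORDS for word in words)


def keyword_share(titles: list[str]) -> float:
    """Percentage of titles that contain at least one keyword."""
    if not titles:
        return 0.0
    hits = sum(has_christmas_keyword(title) for title in titles)
    return 100 * hits / len(titles)


# Made-up sample titles, not real data.
sample_titles = ["God Jul!", "Julafton hos mormor", "Dagens outfit", "Ny frisyr"]
print(f"{keyword_share(sample_titles):.2f}%")
```

Matching whole words means a title such as "Julklappar" would not count; a real analysis would have to decide whether compound words should be included.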

Thanks for a great 2015!

We are really proud of what we accomplished in 2015 and are eager to evolve our services during 2016.


 

Bonus links to our previous reviews:

Visualizing the blogosphere (again)

If you’ve followed Twingly through the years, you might remember our Windows screensaver. Eight years ago we developed a visualization of the blogosphere that let us see blog activity in real time and get a sense of how the whole globe used blogs to communicate (video of the old screensaver).

Unfortunately the screensaver didn’t age that well: it’s now grumpy and doesn’t want to play nice with modern operating systems. Also, who uses a screensaver nowadays? If you do, please consider putting your screen to sleep instead.

We thought it would be fun to see what we could implement using modern technologies. The result is a web-based globe (using WebGL) that animates blog posts with coordinates in near real time.
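Placing a post on a globe comes down to converting its latitude and longitude into a point on a sphere. The sketch below shows that math in Python for readability; it is not the code behind the globe, and the axis convention (y pointing up, as is common in WebGL scenes) and the function name are assumptions.

```python
import math


def lat_lon_to_xyz(lat_deg: float, lon_deg: float, radius: float = 1.0):
    """Convert latitude/longitude in degrees to a point on a sphere.

    Assumed convention: y is up, longitude 0 lies on the positive x axis.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.sin(lat)
    z = -radius * math.cos(lat) * math.sin(lon)
    return (x, y, z)


# Approximate coordinates for Linköping, used purely as an example.
print(lat_lon_to_xyz(58.41, 15.62))
```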

Check out the Twingly Globe at world.twingly.com.

What do you think?

By Johan Eckerström

Finding the needles: blog discovery

First blogs indexed by Twingly

Twingly has, as mentioned in our last blog post, been indexing blogs since 2006 [1]. A few of you out there might wonder: the Internet is a big place, so how do you find blogs in the enormous haystack? To answer that question, we need to journey back to 2006 and travel up to the present day. We will also take a glimpse at the future of blog indexing here at Twingly.

Before we start, we ought to mention that we have a very important requirement on the blog in order for us to be able to index it: the blog must have a discoverable feed in either the Atom or the RSS format. This requirement, or limitation if you will, alone makes life much easier as we can quickly discard most of the pages found on the Internet.
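To make the feed requirement concrete, here is a minimal sketch of how a page can be checked for a discoverable feed. It relies on the common autodiscovery convention of a link element with rel="alternate" and an Atom or RSS media type; whether our indexers work exactly like this is not stated here, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser

# Media types commonly used for feed autodiscovery.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> feed elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        feed_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and feed_type in FEED_TYPES and href:
            self.feeds.append(href)


def discover_feeds(html: str) -> list[str]:
    """Return the feed URLs advertised by a page (empty list if none)."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feeds


page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
</head><body>Hello</body></html>"""
print(discover_feeds(page))
```

A page that returns an empty list can be discarded right away, which is what makes the requirement such an effective filter.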

In the beginning, there was our “Blog Provider Monitor system”, or “Provider system” for short. Put briefly, the system consists of a set of specialized automatic indexers, also known as crawlers or spiders, with defined rules. For example, one provider might keep watch on a certain blog hotel, while another could watch an aggregated blog top list.

In January 2007 we introduced an interface for automatic pings, XML-RPC ping, which enables blogs to automatically notify us when there is new blog content to be found. This enabled self-hosted blogs, which cannot be found by our Provider system, to find their way into our index.
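For readers who have never seen an XML-RPC ping, the sketch below builds one with Python's standard library. It assumes the widely used weblogUpdates.ping convention (blog name plus blog URL); the exact method our ping interface expects is documented elsewhere, and the blog name, URL and PING_ENDPOINT placeholder are made up.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


payload = build_ping_request("Example blog", "http://blog.example.com/")
print(payload)

# Sending it would look roughly like this; PING_ENDPOINT is a placeholder
# for the ping service URL, so the lines are left commented out.
# server = xmlrpc.client.ServerProxy(PING_ENDPOINT)
# server.weblogUpdates.ping("Example blog", "http://blog.example.com/")
```

Most blog platforms send this kind of request automatically whenever a post is published, which is why it works well for self-hosted blogs.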

In February 2008 manual ping saw the light of day on http://www.twingly.se. The ping page allows bloggers to manually notify us when they want to have their blog indexed by us, further increasing our coverage.

While the three systems mentioned above are great at finding new blogs, they all share a flaw: they have no memory. In theory they could find a blog once, never to find it again. To solve that problem our Automatic Ping system (Autoping) was born. It was fed with blogs that were deemed worthy of continuous monitoring. Blogs put into the automatic system were typically customer requests, blogs found by certain providers and blogs manually added by Twingly.

The first iterations of Autoping were quite rudimentary; our current generation of Autoping came alive in September 2012. It supports balancing (i.e. how often to ping a given blog), duplicate detection and other techniques that ensure that we do not consume more resources than necessary and that we retrieve blogs in a timely fashion.
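The idea behind balancing can be illustrated with a toy scheduler that checks a blog more often the more frequently it posts. This is a sketch under our own assumptions, not how Autoping is implemented: the interval bounds, the "half the average gap" rule and the function name are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical bounds on how often a single blog may be checked.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(days=7)


def next_check_interval(post_times: list[datetime]) -> timedelta:
    """Suggest how long to wait before checking a blog again.

    Uses half of the average gap between past posts, clamped to the
    bounds above. Blogs with too little history get the longest interval.
    """
    if len(post_times) < 2:
        return MAX_INTERVAL
    ordered = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps, timedelta()) / len(gaps)
    return max(MIN_INTERVAL, min(MAX_INTERVAL, average_gap / 2))


daily_blogger = [datetime(2015, 9, day) for day in (1, 2, 3)]
print(next_check_interval(daily_blogger))
```

A blog that posts daily would be checked every twelve hours in this toy model, while a dormant blog would only cost one request per week.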

For the past year we have been working very hard to further increase our blog coverage. This includes projects such as:

  • Fine-tuning and creating new Providers.
  • Finding blogs in outgoing links in newly indexed blog posts (since May 2014). The system was extended to check for outgoing links on the blog’s front page in October 2015.
  • Finding blogs mentioned in social media (several projects during 2015).
  • Re-visiting all of the blogs (over 80 million(!)) in our index, ensuring that we keep the ones that are still alive and active under automatic monitoring (ongoing since September 2015).
  • Ensuring all of our newly discovered blogs are automatically monitored (October 2015).
  • Providers capable of handling web pages that create their content dynamically via JavaScript (November 2015).
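As an illustration of the outgoing-links project in the list above, the following sketch extracts candidate sites from the links in a blog post. It is a simplified example rather than our production code: the function names are invented, and a real system would go on to check each candidate for a feed and apply stricter filtering.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def candidate_blog_urls(post_url: str, post_html: str, known_hosts: set[str]) -> list[str]:
    """Return front-page URLs of linked hosts that are not already known."""
    parser = LinkParser()
    parser.feed(post_html)
    own_host = urlparse(post_url).netloc.lower()
    seen = set()
    candidates = []
    for href in parser.hrefs:
        parts = urlparse(urljoin(post_url, href))
        host = parts.netloc.lower()
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar links
        if not host or host == own_host or host in known_hosts or host in seen:
            continue
        seen.add(host)
        candidates.append(f"{parts.scheme}://{host}/")
    return candidates


post = """<p>Read <a href="http://b.example.org/2015/hello">this</a>,
<a href="https://b.example.org/other">that</a>, <a href="/about">about me</a>,
<a href="mailto:me@example.com">mail</a> and
<a href="http://known.example.net/">an old friend</a>.</p>"""
print(candidate_blog_urls("http://a.example.com/post/1", post, {"known.example.net"}))
```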

We are still not satisfied and the future holds many interesting projects.

Twingly’s Blog AI?

A huge challenge when indexing blogs is to prevent the accidental ingestion of undesirable sites such as news sites and forums. Therefore we have quite strict rules for content reaching our systems through, for example, social media, outgoing links and XML-RPC. Naturally, the strict rules likely make us reject actual blogs. To remedy this we have instituted a “Blog AI” project. As the name implies, we want an automated system that can determine whether a given site on the Internet is a blog or not. The project is split into several parts, and the first part concerns detecting custom domains [2]. We expect to see parts of that system in production soon.

Another challenge is to find and index newly published posts in a timely fashion. As mentioned, we do have our Automatic Ping system, but it makes assumptions about the user’s blogging pattern based on past behavior. To overcome this problem we have started to work on the next generation of Autoping, which will use the PubSubHubbub protocol for blogs that support it. This means that we will be able to index posts instantaneously after they have been published! We hope to have this ready for evaluation soon.
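In PubSubHubbub, a feed advertises its hub with a link element (rel="hub"), and a subscriber asks that hub to push updates to a callback URL. The sketch below shows both steps for an Atom feed using only the standard library. It is an illustration of the protocol, not our upcoming implementation; the feed, hub and callback URLs are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def find_hub_and_topic(feed_xml: str):
    """Return (hub URL, topic URL) advertised by a feed, or None for each."""
    root = ET.fromstring(feed_xml)
    links = {}
    for link in root.iter(f"{ATOM_NS}link"):
        rel = link.get("rel")
        if rel in ("hub", "self") and rel not in links:
            links[rel] = link.get("href")
    return links.get("hub"), links.get("self")


def subscription_body(topic_url: str, callback_url: str) -> str:
    """Form-encoded body of a subscription request to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })


sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="hub" href="https://hub.example.com/"/>
  <link rel="self" href="https://blog.example.com/feed.atom"/>
</feed>"""

hub, topic = find_hub_and_topic(sample_feed)
body = subscription_body(topic, "https://indexer.example.com/callback")
print(hub)
print(body)
```

The body would be sent to the hub as an HTTP POST; the hub then verifies the callback and starts pushing new posts, so no guessing about the blogger's schedule is needed.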

Keep an eye out for more in-depth posts and in the meantime check out our source discovery and ingestion documentation.

By Robin Wallin


  1. 4th of October, 2006 to be precise.
  2. A blog that is hosted on a blog platform but uses its own domain name, e.g. blog.twingly.com, which is in fact hosted on wordpress.com.

We have a new site!

We’re very happy to announce the remake of twingly.com! Besides the fresh looks, we’ve done extensive curation of our content. We have fewer pages and more concise and clear information. The site is also faster and more accessible on new devices.

Twingly.com 2015-02-12
Twingly.com

Our popular blog search engine is still available at twingly.com/search. Twingly’s blog index is bigger and faster than ever and if you want your blog indexed, feel free to send a blog ping.

Technical documentation for our services has moved to developer.twingly.com. The new developer site is updated much more often than the old PDF-based documentation. Now it’s also possible to subscribe to our updates, since it’s also a blog.

Oh and if you missed it, the free beta signup is open for our Blog Relationship Manager – a tool to help companies with Blogger Outreach.

This is just the start, we’ve got more in store for 2015!

Looking forward to a bright 2015

Our blog has been quiet for a while, but with a new year to come we feel it’s about time to wake it up and give you a Twingly update.

In our opinion, the blog as a medium has found its place in the social media landscape. Facebook looks more and more like a news flow of commercial ads. Although Twitter and Instagram are great media for expressing short opinions or thoughts, the blog offers a platform where the owner can elaborate on the subject and really explain why they have a certain opinion or thought about something. The blog offers space to express why a blogger likes a certain jacket, for example, and not only that they like it. The fact that a blog post gets indexed by search engines like Google and can be found later makes the opinion expressed stickier than on other social platforms. That means more people can read the review of the jacket over time.

More and more companies understand the value of blogs and look for bloggers to cooperate with on campaigns and blogger outreach for their products and brands. We at Twingly have worked with blogs for over 7 years now and have noticed that companies have a hard time finding the right bloggers for the right subject. Therefore, during 2013 we started building a tool to make our blog data accessible in a usable way for more companies: a Blog Relationship Manager (BRM) tool that will help companies, especially in e-commerce, to find the right bloggers for their brand and manage them in an easy way.

Since we are managing large sets of data, there are a lot of challenges in the development to get all the bits and bytes right. But the progress has been good and we have a beta tool that is used by quite a large number of our customers today. The feedback has been positive and confirms that we are on the right path with our BRM tool.

We have therefore made the BRM beta trial available for everyone. You can find the signup here: Blog Relationship Manager

This is a big step on a new and fun journey for Twingly that we are all looking forward to, and we hope to have you on board on that journey too.

We wish you a Happy New Year!

Twingly connects Bubbleroom and fashion bloggers

Fashion bloggers regularly link to clothes and accessories on e-commerce websites. With eTrade we offer online shops a great solution to increase the number of incoming links from blogs. Dozens of major e-commerce sites are already integrated with eTrade, and today we have the pleasure of announcing one more: the Nordic fashion retailer Bubbleroom. As of this week, every Bubbleroom product page contains the Twingly widget, which shows incoming links from fashion blogs (example here). Apart from the benefits for the shop, it’s an added value for its customers as well, who can get opinions on specific fashion items from fashion bloggers.

We used this occasion to ask Kaja Braendengen, project manager at Bubbleroom, a couple of questions about the importance of the fashion blogosphere for Bubbleroom. She is also the co-founder of the Norwegian fashion website/blog Modette.no, so she is really an expert on this topic.

Why did Bubbleroom choose Twingly? What are your expectations?
We chose Twingly because we want to work closer with bloggers and it’s a really great service that benefits both us and the bloggers. Our goal is to get more links from bloggers and increase the knowledge of our brand. The links we’ll get will hopefully improve our search engine visibility even more.

How important are blogs for your business?
The blogs and especially the fashion blogs are very important for us – bloggers today have a lot of power in the fashion business and they have a major impact on our target audience, women 15-35 years old.

What would happen if all blogs disappeared from one day to the next?
That would not be good at all, and we would lose a big channel that affects what people know about Bubbleroom and which also is a big sales channel for us.

How do you encourage bloggers to link to Bubbleroom?
We have a lot of collaborations with different fashion bloggers and they often write posts about us and our products and link to our site.

How much time does the company invest into the work with social media in general and blogs in particular?
I work with both social media and blog collaborations. Every day we keep in touch with our customers and style setters through social media, it’s very important to get feedback from this channel as people are honest and direct, and we can respond quickly to their questions. I spend at least a couple of hours a day with these channels and in contact with bloggers.

In Sweden, many of the leading blogs are about fashion. Is it similar in Norway? What are the main differences between both (fashion) blogospheres?
Yes, it’s similar. All the biggest bloggers in Norway write about fashion as the main subject, and that is the most popular theme. The blogospheres are quite similar in my opinion, but the blog phenomenon is much bigger in Sweden.

Do you think that the impact fashion blogs have on the fashion industry and sales will increase even more in the future?
As more and more people get access to the internet, the blogs will become even more influential. Sales will depend on your visibility online so therefore it’s crucial to build loyal relationships with powerful bloggers.

Now providing API clients with 12 months of blog data

A few weeks ago we announced a few initiatives to make our blog data even better so that our API clients can get even more compelling and complete insights into the world of the blogosphere. As we put it: Our goal is to have the best blog data in Europe in terms of coverage, quality and immediacy.

Now we again have good news for our API clients: until recently we only had 4 months of historic data available through our Analytics API. As of now, we are extending that to a full 12 months, at no extra charge. With that change, we can provide companies and organisations with even better, more complete data.

At the same time we also upgraded our search language, which makes it possible to search for single characters. Previously, two-character words were the shortest possible query. This is something that our customers have requested and we are happy to finally release these features. This upgrade also makes the data available faster. Before, it took up to 15 minutes from finding a blog post to making it searchable in Analytics; going forward it will only take seconds.

And by the way, our goal is to make all our data starting from 2006 available through our Analytics API, so there is more to come!

For all users of Twingly.com the upgrade means that the public blog search can provide 12 months of historic data as well.

Blogs have bigger influence on purchase decisions than Facebook has

Technorati, the famous, decade-old blog search engine and blog portal, has released its 2013 Digital Influence Report, which replaces the historical State of the Blogosphere report and deals a lot more with branding and social media marketing. You can download the full PDF here; it’s an interesting read as usual.

We want to highlight one aspect which the Technorati report uncovers and which we find quite fascinating: on page 16 Technorati published a diagram showing which online services are most and least likely to influence online purchases (in the U.S.). The top spot goes to retail sites, where the likelihood that they influence a purchase is, for obvious reasons, very high: 56 percent. Brand sites also influence people’s buying behaviour, with 34 percent. But close behind, with 31.1 percent, come blogs. In other words: blogs influence purchases on the web more than Facebook (30.8 percent), YouTube (27 percent) or Twitter (only 8 percent).

That is quite an astonishing result and proves once again the power of blogs.

Technorati

Given the relevance that blogs have when people make purchasing decisions, the distribution of brands’ social marketing budgets doesn’t seem to follow any logic, as pointed out by Patrick Lambert on his blog. He highlights another chart from the Technorati report that shows the social budget breakdown of brands. The main share of brands’ social budget is being pumped into Facebook campaigns: 56 percent. YouTube and Twitter each get 13 percent of the total money spent on social media marketing, and only 6 percent is being put to use on blogs.

So while blogs have a bigger impact on purchasing decisions than Facebook, they only get a fraction of the brands’ advertising dollars.

That obviously doesn’t make sense, and it presents a big opportunity for all companies and brands that understand the power and relevance of blogs in the social media marketing mix.

All that means that 2013 could be the year when blogs get back into the brands’ focus. According to the statistics above, they should.
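To put the mismatch in numbers, one can divide each channel's influence figure by its share of the social budget. This is a crude, illustrative ratio of two percentages taken from different charts in the report, so it should not be read as a return-on-investment figure.

```python
# Percentages quoted above from the Technorati report.
influence = {"Blogs": 31.1, "Facebook": 30.8, "YouTube": 27.0, "Twitter": 8.0}
budget_share = {"Blogs": 6.0, "Facebook": 56.0, "YouTube": 13.0, "Twitter": 13.0}

# Influence points per budget point; higher means relatively underfunded.
ratios = {name: influence[name] / budget_share[name] for name in influence}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```

By this rough measure, blogs stand out as the channel with the most influence per budget point among the four.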

via Bisonblog

Twingly helps your organisation to launch a blog competition

From time to time some of our clients ask us whether it would be a good idea to launch a competition or sweepstakes involving blogs. Often, our answer is “yes”, since we know how much awareness about specific topics, products or initiatives can be raised through blogs. And many bloggers enjoy the occasional competition, especially if they can win attractive prizes.

Because of that, we decided to offer blog competitions as a service, primarily to our clients but also to other companies. If your organisation is interested in engaging bloggers in a specific issue or campaign, we can support you in making that happen.

This is how Twingly helps you to launch a blog competition:

– Twingly creates a unique profile with your organisation’s branding on Bloggportalen.se, one of Sweden’s biggest meeting places for bloggers and blog readers, with over 118,000 registered bloggers. The profile also includes instructions for bloggers on how to participate in the competition.

– Twingly sends a newsletter to about 10,000 bloggers within the target group of the campaign, telling them about the competition.

– Twingly uses other high-traffic-channels such as the Bloggportalen.se homepage to promote the competition.

– Twingly keeps track of the participation and provides your organisation with the material you need to select the winner as well as with statistics on how the competition went.

Blog competition


One of the companies that launched a blog competition with our help is the leading Nordic e-commerce site CDON.com. The campaign asked bloggers to write a post about their favourite movie or TV show, including a link to the movie’s or show’s product page on CDON.com. The community of Twingly-owned bloggportalen.se then got the chance to vote for their favourite blog post. The winner of a gift card worth 15,000 SEK was chosen by CDON.com based on the number of votes and the movie/TV show review. The bloggers ranked in positions 2 to 5 were each given a gift card worth 1,000 SEK. Here is the final top list.

Blog competitions are a great tool to spread the word about your organisation, product or issue, while at the same time giving something back to the bloggers.

In case you want to learn more about blog competitions and how to launch your own, drop us a line at sales@twingly.com! We look forward to hearing from you.

“All changes should make things better”

To create great products for our users and clients, we need the best developers. Recently, two new hires have joined the Twingly development team. You already “met” Magnus Hörberg. Today, you can learn a bit about our latest addition to the team, Johan Eckerström.

Hi Johan. Please introduce yourself.
I’m a 26-year-old guy originally from the Swedish town of Norrtälje, but for the last six years Linköping (where Twingly has its headquarters) has been my hometown. I still think of myself as a student, even though I quit studying and started working almost a year ago.

Why did you quit studying?
Well, I simply got tired of it and wanted to get practical experience instead. So I joined a small consulting company here in Linköping. We worked on customer-specific software systems. It was great fun, but I’ve always wanted to work on an in-house product. So when I heard that Twingly was hiring I got excited, since Twingly has several products and great knowledge of how to handle lots of data. I thought it would be a great fit.

So handling of data is your special area of interest?
Yes, I get the most satisfaction from software development when I’m building software that refines data and produces some kind of result that can help humans learn more or process more information. Writing something that can handle data in a smart way or at a scale that isn’t possible for a human is a lot more interesting than just automating simple tasks.

When did you start to become fascinated by data?
I think it all started when I learned to program a computer. I wasn’t really into computers and software until I was around 15, when I discovered Unix systems. Shortly after reading about Unix I installed OpenBSD. Once I got exposed to the Unix environment I got really interested in programming and what you can get the computer to do. From there it was a very natural step to work with data and think about what you can do with it. One of the first actually useful programs I wrote was a small manager for my digital photographs. Since I’m a very enthusiastic photographer I get loads of image files, so I needed something to analyze and sort the data. It was pretty basic and just used the EXIF data, but still it helped me manage loads of files. At the university I worked on a few projects outside of the curriculum for student organisations, and I learned a lot from building those systems.

Do you still work on your own projects?
I probably start a new project every week, but only a handful ever get completed during a whole year. : ) It’s mostly small tools that help my everyday life, like analyzing my inbox for digital receipts and summarizing my costs in a spreadsheet, or an SMS service that can aggregate what the nearby restaurants’ menus have to offer and help me decide what to eat for lunch. I’ve stopped trying to build software for my photos since Aperture and Lightroom do such a great job helping me to manage them.

At Twingly, what tasks will you focus on?
I’m primarily a backend developer, so everything behind the scenes. All members of the development team here at Twingly work with both development and operations. One of the great benefits of working at Twingly is that the tasks are so diverse. During a work day I might deploy new servers, make infrastructure changes for the systems and develop new features for our products. We have a lot of interesting projects planned for 2013. There will be lots of changes to the infrastructure and lots of work to support new features for our customers.

What motivates you the most, what helps you to find the best solutions and creative ideas to solve problems?
My primary motivation is the end user: I want to build services that help others do their business and simplify their workflow. All changes should make things better, and this applies to the software I write. If the solutions are small and elegant, I can more easily make changes and improve the experience for the end user. Since we are a small team, it’s very easy to try different tools and find the right tool for the job. I prefer to discuss the hard problems over a cup of coffee with my peers, to find new angles from which to attack the problem. I get a lot of inspiration and ideas from following the developer community on GitHub, and I try to read all the technical articles I can get my hands on. Even if something has nothing to do with the work I’m doing right now, it always comes in handy in the future.

What will the web look like in five years, and how will that influence your work as a developer?
I hope the web will continue to evolve as it has during the last five years. I sincerely wish that the standards for making web sites will keep evolving and that the browsers and tools will evolve in a coherent way, since that would make my work easier. : ) As for new services and sites, I think that the HTML5 standards will enable people to build a lot of awesome products that haven’t been possible in the past. As for my work as a developer, I will be able to build new solutions that are even more useful and easier to use. I have great hopes for the future of the web!