
How Tumblr And Pinterest Are Fueling The Image Intelligence Problem

This article is more than 10 years old.

The Crippling Image Intelligence Problem is a four-part series that traces the evolution of content creation from its infancy on Facebook through its logical development into more substantive mediums, such as the new, massively popular, low-friction, low-investment content creation platforms Tumblr, Pinterest, and Posterous. The deluge of newly generated content is fundamentally shifting from text-dominated content to image-intensive content. This creates two main problems:

1) The content creation platforms powering this content explosion are geared toward generating audiences of scale, which incentivizes the prolific creation of low-investment, transient content dominated by ‘dumb-images.’

2) Current image search technology, dominated by Google Image Search, is not positioned to address the dearth of contextual information within images that scalable diffusion & discovery of content would require - meaning that an image-dominated web has a crippling image intelligence problem.

Part 1, How Tumblr Drove the Evolution of Content into an Image Dominated Experience, illustrated the rise of content creation platforms driving massive increases in content & the progression of content into an image-dominated medium. This article focuses on how the frictionless, low-investment strategic posture of content platforms fuels the image intelligence problem & why this is a problem for the web as a whole.

Part 2, Tumblr & Pinterest Are Fueling the Image Intelligence Problem, is a deep-dive investigation into the forces that proliferate dumb-images (i.e. images that are valuable only to the viewer, are not indexed in a way that creates value for the entire audience, and do not present a monetizable proposition for a sustainable business) and into the dearth of technology solutions for creating image intelligence.

===================================================

Matthew A. Carroll currently runs an outdoor brand, Cloven Footwear (raised $4.1m in Nov '10), and sits on the board of two tech startups in San Francisco, California. You can follow (and show some social love) via @Fail_Harder, FailHarder on Facebook, and Quora.

===================================================

This analysis is going to focus on understanding the scope of the problem with image intelligence from the perspective of style and fashion. There are three main reasons why:

1. Consumer fashion brands are becoming increasingly reliant on individual content creators to promote a brand in the marketplace. As such, the image intelligence problem is a significant challenge for consumer fashion brands, and solving it offers a significant monetizable value proposition. Additionally, I currently run a men’s footwear brand, Cloven Footwear, and have a substantial interest in figuring out this solution for purposes of running a better company.

2. Style & fashion are dictated by herd-mentality trend diffusion through populations that are highly influenced by ‘expert’ resource trust attribution (if the men’s style editor for GQ says it’s cool - it must be cool) and by visual pattern recognition of ascribable, congruent demographics:

One of the most important aspects of fashion is that it is a highly emotional experience for purchasers, both in buying the products & in the payoff of their use. Humans are inherently social animals & have a powerful desire to be accepted by the herd (i.e. social & professional networks) - thereby, your choice of what you wear is a direct reflection of who you are as an individual (or how you want to be perceived as a member of the herd).

Over several hundred thousand years of evolution, humans have learned to draw heavily on visual cues to assess the character risk of the people we interact with &, therefore, to use apparel to establish an individual’s “similar to me” visual references. Fashion is one of the main external manifestations of who we are as individuals and serves as the impetus for visual feedback to observers about our personalities.

FailHarder’s Three Future Waves of Innovation in E-Commerce

3. The bulk of content creation (particularly on Tumblr) is in the world of fashion and style, where the majority of content is dominated by images contextualizing the article while text has been relegated to a supporting role. This means that extracting insight (i.e. intelligence) from images is a rapidly growing problem with incredible breadth of application across content generation. To emphasize the scope of this problem, the following graphic illustrates the top 10 tags on Tumblr in 2011 - nearly all of these tags are image-related content, with fashion / style (i.e. vintage in the graphic) falling into the top 5 slots:

This world of content creation in style and fashion can broadly be broken up into two main categories:

Traditional Editorial: These are the heavy hitters in the world of fashion & style content creation. The majority of them are “carry overs” from the era of print media that have leveraged their offline market presence to build substantial online followings. Additionally, this category of content creators has a definite (*cough* dying) revenue model - display advertising based on the x00,000s to x,000,000s of print subscribers transitioned into online followers.

Independent Style Creators: This is the new medium of content creation, dominated by empowered individuals who leverage the demand generation prowess of content platforms to build substantive online followings through aesthetic curation. These individuals structure their passion for style & fashion into a marketable content product & leverage the demand generation ecosystems of content platforms like Tumblr, Pinterest, or Lookbook.nu to deliver Audiences of Scale in a way that enables them to compete with large, autocratic traditional channels. This category generally has followings in the x,000s to x0,000s & does not have a definable revenue model.

[Quick Summary] The Rise of Individual Content Creators

Part 1, How Tumblr Drove the Evolution of Content into an Image Dominated Experience, illustrates how the assimilation of Facebook into the lives of modern web users established a basic fluency with the content creation process through user-driven engagement with the web via status updates, comments, and Facebook Likes. As content creation on this micro-scale flourished along with user proficiency & sophistication, content creation platforms emerged to give this newly empowered generation of users a more substantive medium, with tools that empower expression and generate massive audiences to hear them.

One of the most significant platforms to revolutionize the content creation process was Tumblr, which collapsed the largest barriers in the content creation & distribution process:

1. Zero Technical Investment: Prior to Tumblr, the process of setting up a content channel (i.e. a blog) was dominated by WordPress & was a fairly complex process for the average web user. Tumblr’s hosted environment meant that any user could sign up and have a personal-style blog up and running in a matter of seconds.

2. Solving Audience Generation Problems: Tumblr was the first of these platforms to solve the audience acquisition problem, by building a centralized content feed that proactively introduced other platform users to new content & thereby created an internal demand source that rewarded content creators.

3. Reblogging Collapses Content Creation Investment Costs: A Tumblr Reblog takes a piece of content, publishes it to the Reblogger’s Tumblr blog & applies the Reblogger’s style - backgrounds, text, formatting, etc. - thereby collapsing the information-asymmetric competitive advantage of a piece of unique content (i.e. an article exists on MY site & previously you had to go there, or share a link that directs traffic there, to consume the content).

In the case of Tumblr, the platform enabled users to voraciously latch on to new content & massively expand the audience while each syndicator enjoyed a certain level of brand development by virtue of the syndicated consumption taking place on the syndicator’s blog in a custom themed environment.

The Problem with the Rise of Individual Content Creation Domination

[Author’s Note: Although I am criticizing Tumblr & Pinterest pretty heavily, I have an immense amount of respect for their incredible accomplishments and the insanely talented people who work there. They have empowered anyone to have a voice and be heard on an unprecedented scale. My main critique is that they should be striving for far more. In this new world, these platforms are in the morality business - and as my old buddy Thomas Aquinas says:

This is one reason why the village is eclipsed by political society, which proves much more useful to human beings because of its greater size and much more elaborate governmental structure. There is, however, a far more important reason why political society comes into existence. In addition to yielding greater protection and economic benefits, it also enhances the moral and intellectual lives of human beings. By identifying with a political community, human beings begin to see the world in broader terms than the mere satisfaction of their bodily desires and physical needs

Commentary on the Politics Book 1, Lesson 1 [31]

With the explosion of content, these platforms fuel low-investment content, which exposes some significant problems.

The Lower the Friction, the Lower the Sense of Ownership:

Tumblr & Pinterest built products optimized for turning common web users into content creators at as close to absolute ZERO cost to end-users as possible - meaning that these platforms have made it possible for all users to create new content nodes with very little investment & effort. This has led to a nuclear explosion of content, as Reblogs & Pins generated huge audiences of users fueled by a social-reward system of community social proof that validates limited investments. In this optimized world, where creating content costs very little (a click of the Reblog button, in the case of Tumblr, creates branded content for distribution to followers), what is the point of taking care of content? This low-friction, low-cost model (cost measured as the cognitive effort, time, or work invested in creating a piece of content) creates two very significant problems:

This represents an economic trade-off whereby the content creator gains exposure through the Reblogger’s network in exchange for a post-Reblog source attribution. For individual content creators, this trade-off is necessary to build a sufficient audience of scale for a piece of content. However, the near-absence of investment required to generate the content lowers the sense of ownership, and with it the motivation to catalog & understand it.

The lower the investment cost, the lower the onus on each syndicator to classify and contextualize the content - meaning that the demand generation model is a fire hose (redistributed to all followers rather than segmented by interest), so content loses significant value as it diffuses through the feed over time.

Resource Constraints Drive Intelligence Dearth

Traditional content creators like GQ or Monocle magazines have a revenue model that lets them hire and distribute the workload of publishing a magazine across specialized contributors (i.e. photographers, stylists, writers, editors, and graphic designers). All of these contributors immerse themselves in the publication’s niche, in both their work and their social networks (if I worked for GQ, odds are that I am buddies with the folks at Details, Esquire, or Maxim). They are consistently interacting with one another about style trends and structuring the language as it relates to particular developments in the marketplace.

Editorial Standards: This editorial class can subsequently guide how their readers relate to a particular trend or subject matter by establishing a formalized classification. For example, a particular editorial team must develop a structured method of communicating “what goes where” - that articles will be filed in category x or category y.

Distributed Experiential Knowledge: The benefit of a traditional editorial content creation system is that numerous players are constantly working to make the content more effective. That means that what works - including classification of articles, language, and phraseology - is consistently refined to tweak the message (content product / market fit), leverage industry-standard best practices, and position content most effectively for discovery through search (i.e. using tools like Google Keyword Tool).

Additionally, you have specialized vertical stakeholders who have the experiential knowledge about which keywords work best (come on - do you really think the average Tumblr blogger is going to hit up Google Keyword Tool for each post?).

Now, let’s switch roles to independent Tumblr content creators. This is a world dominated by volume, & most independent content creators are struggling to pump out enough content to generate traffic & followers. Consequently, this category is at an inherent & substantial disadvantage in imbuing content with any intelligence and maximizing an article’s effectiveness, owing to resource constraints (i.e. only one person) & high investment costs (i.e. I have to read 100s of blogs about SEO/PPC/Copy-Blogging/Emerging Social Strategies to become an expert in every subject, in contrast to the highly specialized & experienced paid folks at the traditional channels - which forces every user to re-learn every mistake the hard way [something that I personally am a fan of, but realize is impractical in the aggregate]).

Furthermore, tag classification is highly subjective. For example, here is how I would classify two different “looks” in style.

These classifications are highly subjective, based on what I see in an image & my experiential knowledge - it’s all based on my frameworks. However, they are hardly codified, nor are they necessarily how the proverbial “you” would refer to these images.
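To make the subjectivity point concrete, one common way to quantify how much two people agree when tagging the same image is Jaccard similarity: the tags they share divided by all the tags either of them used. This is a minimal sketch; both tag sets are hypothetical stand-ins, not the author’s actual classifications.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tags divided by all distinct tags used by either tagger."""
    if not a and not b:
        return 1.0  # two empty tag sets trivially agree
    return len(a & b) / len(a | b)


# Hypothetical tags two different people might give the same outfit photo.
my_tags = {"preppy", "heritage", "oxford cloth", "autumn"}
your_tags = {"menswear", "autumn", "casual", "blue shirt"}

print(round(jaccard(my_tags, your_tags), 3))  # -> 0.143
```

A score of 1.0 would mean identical tagging; here the two taggers overlap on one tag out of seven, so a search built on one person’s vocabulary would mostly miss the other’s.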

Transient Content:

Quora and Flickr are probably two of the very few internet ecosystems where the cost to play the game is very high: it takes a lot of work, skill, and earnest interaction to jockey for an identity within these communities. The content deluge created by Tumblr incentivizes users to create new content to get into the Tumblr feed rather than to discover old content.

The primary goal is to create new content nodes without fundamentally imbuing new value in the network. This is especially a problem with respect to image-dominated platforms like Tumblr & Pinterest, as opposed to text/editorial-heavy platforms like Quora that can offer text-based context.

The most value comes from hitting the right timing windows (absent workflow demand drivers like email notifications or product notifications like a Twitter @ reference). Hence the classic maxim: quantity has a quality all its own &, like it or not, volume is what generates traffic.

2.4   Current Computer Vision (CV) Technology Isn’t There:

In previous major tech innovation periods (e.g. Microsoft’s Win95 era in the late 90s, Google’s ascension to Search dominance in the early-00s, and the Apple iPhone’s decimation of the mobile establishment in the late-00s), major tech giants’ R&D & product development divisions blazed the trail in solving huge technical challenges, as Google did with MapReduce & BigTable, whose published designs inspired the Hadoop & HBase systems that Facebook later built on. As these majors capitalized on their solutions to these technical barriers, they disseminated the experiential knowledge & technological know-how (in the case of Microsoft & Google), championing a new wave of innovation by entrepreneurs & developers. Additionally, Apple’s iPhone, Google’s Android, and the Google Apps Marketplace cultivated a substantive demand source for monetizing a platform’s necessary vertical product-line extensions and built entirely new marketplaces overnight.

With respect to image search, the bulk of hardcore technical development is being done on the low-value personal photo side (i.e. Facebook Photos of you and your friends at dinner - no offense). That being said, the Facebook Photos team is making incredible technical progress in handling massive volumes of user uploads (i.e. roughly 7.6 billion photos per month: 250m photo uploads per day * 30.4368 days per month).
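The monthly figure is a straightforward unit conversion, taking the 250m-per-day upload rate at face value and using 30.4368 days as the average length of a month (about 365.24 / 12):

```latex
\[
250 \times 10^{6}\,\frac{\text{photos}}{\text{day}}
\times 30.4368\,\frac{\text{days}}{\text{month}}
= 7.6092 \times 10^{9}\,\frac{\text{photos}}{\text{month}}
\approx 7.6\ \text{billion photos per month}
\]
```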

Facebook has enabled the technical developments needed to deliver photos & images at scale, but very little is being done in the contextual understanding of photos in more substantive content creation.

I’ll be the first to admit that I am pretty JV (Junior Varsity) in my development skills, but I have a lot of fun keeping up with and reading most of the academic literature published about what’s going on in Computer Vision (CV) & Image Recognition (IR) (TechCrunch recently profiled some of the Google-financed research coming out of CMU at bitly.com/sOne7i). A few relevant papers:

Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Ensemble of Exemplar-SVMs for Object Detection and Beyond

TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context

This graphic illustrates the current state of image search technology for providing programmatic context to an image:

That’s all fine & dandy - but it doesn’t do much to substantively inform images for purposes of attribution. Based on the prevailing categorical level of search intelligence, let’s take a look at how that “would” apply to a standard grey sweater.

This is the primary reason that image proliferation as the dominant aspect of content presents a huge problem for which we do not have a good technical solution. As the content generation platforms enjoyed voracious growth cycles, they were strategically focused on building the platforms to scale for user growth. This was necessary, but now we have billions of pieces of content from which we are increasingly unable to extract insight/intelligence.

Notice how they all have ostensibly the same categorical tags? This means that the computer (i.e. the one performing the search and enabling insight) has essentially ZERO identifying information with which to differentiate one sweater from another once you remove it from these “training” images.

For example, if I take a photo of myself wearing a grey v-neck sweater, image search cannot give the user any intelligent feedback about who makes the sweater, which is what would deliver value. The valuable insight must come from people: me as the content creator (the photo taker) & the person posting the photo to Tumblr, in order to guide other users & help the entire community extract value from it.
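A minimal sketch of why identical categorical tags leave a search system nothing to discriminate on, assuming a simple tag-based inverted index (an illustrative model, not how Google Image Search actually works). The file names and category labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical category-level labels a classifier might assign to three
# different grey sweaters (illustrative assumptions, not real output).
images = {
    "sweater_a.jpg": {"clothing", "sweater", "grey", "person"},
    "sweater_b.jpg": {"clothing", "sweater", "grey", "person"},
    "sweater_c.jpg": {"clothing", "sweater", "grey", "person"},
}


def build_index(tagged: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build an inverted index mapping each tag to the images carrying it."""
    index = defaultdict(set)
    for name, tags in tagged.items():
        for tag in tags:
            index[tag].add(name)
    return index


def search(index: dict[str, set[str]], query: set[str]) -> set[str]:
    """Return the images that carry every tag in the query (AND semantics)."""
    results = None
    for tag in query:
        hits = index.get(tag, set())  # .get does not add keys to a defaultdict
        results = set(hits) if results is None else results & hits
    return results if results is not None else set()


index = build_index(images)
print(sorted(search(index, {"grey", "sweater"})))
# -> ['sweater_a.jpg', 'sweater_b.jpg', 'sweater_c.jpg']

# Count distinct tag signatures: 1 means the index cannot tell them apart.
print(len({frozenset(tags) for tags in images.values()}))  # -> 1
```

Every query built from these tags returns all three sweaters: the brand, fit, or material that would separate them never made it into the index, which is exactly the information a person posting the photo would have to supply.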

2.5   Understanding the Scope of the Image Intelligence Problem

Content Creation Is Image-Dominated, Not Text-Based

To illustrate this image search problem, here are three searches for the same user-created content piece, which show how its dissemination through a content creation platform (in this case Pinterest) destroys any embedded intelligence in the photo:

1. Content Screenshot & Visual Search: I took a screenshot of the formatted content article as it would be discovered through the “Feed” - in this case the Pinterest product feed. The logic is to define a pure image stripped of embedded tags & source qualities of a

2. Platform Download & Search: The image was downloaded from the platform (as it would be on Tumblr) and subsequently uploaded to Google Image Search. The logic is to retain the embedded tags that a content creation platform would append to an image.

3. Content Creator Source Download & Search: The content piece was traced back to the original source - the acquisition point of the content piece in the content creation platform. The logic is to have the original source’s embedded information for Image Search facilitation.

Now - let’s see how each of these three image searches performs in delivering relevant information about the original content piece.

Now - let’s analyze how these same image searches yield entirely different results:

What does this disparity tell us about the current state of image search? It sucks - there is almost no intelligence baked into image search.