blinkx under the hood


The Video Lifecycle is an involved, multi-step process that requires technical solutions and services at every stage. blinkx is the only technology company to offer a full suite of solutions for media owners’ and publishers’ requirements at every step of the Video Lifecycle.



LOCATE PROCESS INDEX CONTROL MONETIZE DELIVER


LOCATE Today, most video content is captured and stored in encoded digital files. These files may be stored in a private content management solution on a secure, private network or they may be publicly available on the Internet. The remainder of video content is maintained as legacy archives in non-digital form. blinkx has developed the blinkx distributed Video Fetch Server (dVFS) and blinkx offline Video Fetch Server (oVFS) to traverse and identify content in digital and non-digital forms, respectively.

BLINKX DISTRIBUTED VIDEO FETCH SERVER (dVFS) The blinkx distributed Video Fetch Server is a configurable server that is capable of interfacing with a variety of video repositories (including over 50 distinct database and content management systems) in order to locate and extract video that is contained within them. The blinkx dVFS is architected as a parallel server that runs multiple spider modules which each trawl the repositories to be indexed. While some private systems can be relatively straightforward to interface with, offering some form of public interface or export function, others are extremely complex to spider. The blinkx dVFS is able to automatically aggregate from all of these systems with minimal manual configuration.


The award-winning blinkx dVFS Web Edition Spider has played a major role in building blinkx’s video index - now the largest in the world. The Web Edition module is available exclusively to our customers for spidering any sites that are of interest to them. As with the standard dVFS, the Web Edition is fully parallelized in order to improve scalability and is built on a complex process architecture that supports dynamic resource allocation and module self-replication in order to automatically match the scale of the task set before it. The spider trawls link structure, automatically biasing towards likely sources of video content in order to improve video yield.
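The biasing behaviour described above can be pictured as a priority-queue crawl frontier that surfaces likely video sources first. The scoring heuristic and URL hints below are illustrative assumptions, not blinkx's actual spider logic:

```python
import heapq
from urllib.parse import urlparse

# Hypothetical heuristic: tokens that suggest a URL leads to video content.
VIDEO_HINTS = ("video", "watch", "clip", "media", "player")

def video_score(url: str) -> int:
    """Count video-suggestive tokens in the URL host and path."""
    parsed = urlparse(url.lower())
    text = parsed.netloc + parsed.path
    return sum(text.count(hint) for hint in VIDEO_HINTS)

class Frontier:
    """Priority queue that biases crawling toward likely video sources."""
    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker keeps ordering stable

    def push(self, url: str):
        if url in self._seen:
            return
        self._seen.add(url)
        # Negate the score: heapq pops the smallest item first.
        heapq.heappush(self._heap, (-video_score(url), self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

frontier = Frontier()
frontier.push("http://example.com/about")
frontier.push("http://example.com/video/watch/clip123")
frontier.push("http://example.com/news")
```

In a real spider the scoring signal would of course be far richer (anchor text, page structure, historical yield per host), but the frontier shape is the same.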

In addition to static and dynamically generated HTML, the blinkx dVFS is capable of analyzing and processing popular Web scripting languages (including JavaScript/AJAX and Flash) and uses variable substitution and code pre-compiling techniques to infer content that is hidden in dynamically generated pages.



BLINKX OFFLINE VIDEO FETCH SERVER (oVFS) If stored on tape or disk, offline video content must be captured and encoded before it can be used within a digital medium such as the Internet. The blinkx offline Video Fetch Server (oVFS) facilitates this process by interfacing with standard video capture cards in order to capture streams of analog content from offline stores (such as tapes) or directly from live-air broadcast.


- Parallel and massively scalable automated spiders that intelligently focus to find audio/video content
- Renders pages fully in memory, allowing videos’ contextual metadata to be indexed
- Ability to index any video from any page, regardless of format (including Flash)
- Automatic generation of thumbnails, previews and word-timing
- Agnostic approach: metadata, video analysis, speech recognition and closed captioning
- blinkx dVFS supports MySQL, Oracle and all other common databases and standard content management systems
- blinkx oVFS interfaces directly with offline video sources including most tape and disk formats, satellite, cable and terrestrial broadcast
- Native support for over 50 common database and content management systems



PROCESS Once a digital video has been found or created from an analog original, the next step is analysis. This process involves breaking down the piece of content into several of its constituent components and processing each one to fully understand the video’s overall meaning. The resulting data can serve as a foundation for functions such as search, organization, selection or suggestion.


BLINKX VIDEOLOGGER VIDEO ANALYSIS MODULE blinkx processes video content using a multithreaded server technology known as the VideoLogger. In constant development for the past ten years, blinkx’s VideoLogger is a central management service that is responsible for marshalling each piece of content through one or more analysis modules that extract information. The VideoLogger can be used in isolation but is usually coupled directly with the Video Fetch Server (VFS) modules. The VideoLogger analysis modules use a variety of advanced image and audio analysis techniques to automatically extract information about a video in real-time, facilitating the creation of a rich index that accurately describes the video content. This precise, time-stamped index provides fine-grained access to the video content that can be used to efficiently search and locate a specific video segment for playback. Used together, the VFS and VideoLogger modules can simultaneously index and digitize (encode) input content to transform the video and audio assets into accessible, Web-ready content. Existing methods of making audio and video searchable rely on either textual metadata (added by professional editors or end-users as part of a ‘tagsonomy’) or closed caption data that is added during the television production process. Both of these approaches are significantly flawed.


Metadata is descriptive information (e.g., summaries and tags) created by a video’s original editor but often omits aspects of a video that may be of interest to others. Typically, it is only applied on a per-video basis, offering generic summaries of a video, and is therefore an ineffective method of providing users with precise, granular descriptions of the subtle details in a clip. Additionally, the practice of free tagging, especially when opened to a community, is prone to spamming - where users falsely apply descriptors to content to subvert the search process.

Closed captioning is flawed primarily because it is generated by human transcribers who can suffer from high error rates. Furthermore, closed captioning is extremely rare on the Internet; recent research by blinkx suggests that less than 0.001% of all Internet video content contains any closed captioning. Even in cases where closed captioning exists, the majority of these videos only have basic titles that mark a content segment’s beginning and end - rather than a complete transcript. Where they do exist, the blinkx VideoLogger extracts and uses metadata and closed captioning as the first step in the indexing process. In addition, blinkx’s technology utilizes advanced speech recognition and visual analysis techniques to analyze and understand the spoken word and visual content of an audio/video file, ensuring unparalleled comprehension of online multimedia content. The VideoLogger can also control the encoding process using third-party encoders that output in popular formats including MP4, Flash, Real and Windows Media. Using an output module, the VideoLogger controls both indexing and encoding processes to ensure synchronization between the metadata captured from the video asset and the associated digital file. The output of these analytical modules is stored as further metadata tracks alongside the digitally encoded content itself; not only does blinkx know what was said, blinkx knows exactly when it was said.
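The time-stamped metadata track described above can be pictured as an inverted index keyed by word, whose postings carry playback offsets as well as video identifiers. The transcript format and method names here are hypothetical:

```python
from collections import defaultdict

class TimeStampedIndex:
    """Maps each spoken word to (video_id, seconds) positions, so a
    query can jump straight to the moment a word was said.
    A minimal sketch, not blinkx's actual index structure."""
    def __init__(self):
        self._postings = defaultdict(list)

    def add_transcript(self, video_id, words):
        # words: iterable of (word, seconds_offset) pairs
        for word, t in words:
            self._postings[word.lower()].append((video_id, t))

    def when_said(self, word):
        """Return every (video_id, seconds) where the word was spoken."""
        return self._postings.get(word.lower(), [])

idx = TimeStampedIndex()
idx.add_transcript("clip-1", [("budget", 12.5), ("election", 47.0)])
idx.add_transcript("clip-2", [("election", 3.2)])
```

A playback front end could seek directly to any returned offset, which is exactly the "fine-grained access" the text describes.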

Once the video or audio stream has been indexed, encoded and analyzed, the digital video files and the descriptive analysis output are stored in the blinkx video index.



AUDIO ANALYSIS MODULE blinkx’s audio analysis technology uses advanced statistical methods to deal with all aspects of processing the digital audio signal from an audio or video stream. It employs a wide range of recognition technologies – from keyword- and phrase-spotting to continuous-vocabulary speech, speaker and language recognition. In order to analyze the spoken words of an audio or video stream, blinkx uses audio analysis techniques that are based on neural network technology and Hidden Markov Models (HMMs) to construct an efficient acoustic model that can provide a fast, accurate and dynamic solution within rapidly changing acoustic environments, such as radio and television.

The technology is based on decomposing digitized speech into its phonetic constructs. The phonetic sequence is then analyzed in conjunction with the acoustic model and statistical probabilities to calculate the most probable sequence of words and utterances. LARGE VOCABULARY RECOGNITION Unlike traditional speech recognition systems which have fixed vocabularies, blinkx supports large text corpora, including hundreds of millions of words that can train the system and refine its accuracy according to the specific requirements of its customers.
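The decoding step - choosing the most probable word sequence given a phonetic sequence and an acoustic model - is classically done with the Viterbi algorithm over an HMM. Below is a toy sketch with invented "ship"/"sheep" probabilities; it illustrates the algorithm, not blinkx's production decoder:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence.
    A toy stand-in for the acoustic decoding described above."""
    # Each layer maps state -> (best probability so far, best path so far).
    best = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs],
                 best[-1][prev][1] + [s])
                for prev in states)
            layer[s] = (prob, path)
        best.append(layer)
    return max(best[-1].values())[1]

# Hypothetical two-word model: which word best explains two phonemes?
states = ("ship", "sheep")
start_p = {"ship": 0.5, "sheep": 0.5}
trans_p = {"ship": {"ship": 0.6, "sheep": 0.4},
           "sheep": {"ship": 0.4, "sheep": 0.6}}
emit_p = {"ship": {"ih": 0.8, "iy": 0.2},
          "sheep": {"ih": 0.2, "iy": 0.8}}
path = viterbi(["ih", "iy"], states, start_p, trans_p, emit_p)
```

A real decoder works over phoneme lattices with language-model scores folded in, but the dynamic-programming core is the same.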


Using patented predictive technology, blinkx’s speech analyzer can offer users the benefits of a large vocabulary speech recognition system without the overhead of a vast search space. Rather than relying solely on existing metadata to describe an audio or video clip, blinkx has the ability to retrieve a wide range of multimedia content based on the words spoken in the television or radio clip. SPEAKER INDEPENDENCE Whereas other approaches require training data from specific speakers to realize their full potential, blinkx performs consistently well across a wide variety of previously unrecognized speech sources. Transcription of speech and segmentation by speaker requires no initial per-speaker training because blinkx’s underlying technology was developed as a tool to maintain inter-speaker independence - not to be a single-user transcription tool. NON-DICTATED SPEECH Information feeds, such as news broadcasts and radio, are often difficult to transcribe due to noisy conditions and less-than-perfect articulation. blinkx’s sophisticated signal processing and statistical techniques enable the transcription engine to filter out extraneous noise, compensate for low volume levels and probabilistically predict intended dialogue.


PHONEME-LEVEL PHRASE AND WORD SPOTTING blinkx breaks down all forms of speech into phrases, words and phonemes (the smallest sound units in a particular language), offering an exceptional granularity of understanding. MULTIPLE LANGUAGE MODEL ARCHITECTURE blinkx’s core technology is entirely language independent, enabling multiple languages to be simultaneously processed and searched. blinkx combines both phonetic and conceptual methods to overcome the limitations inherent in traditional approaches, combining more accurate language recognition with effective information retrieval. In addition to supporting traditional legacy techniques such as keyword-spotting and Boolean protocols, blinkx VideoLogger enables users to search audio data from a range of sources using multilingual natural language queries. IMAGE ANALYSIS MODULE blinkx VideoLogger’s advanced video capture and analysis technology also utilizes neural networks and HMMs to optimize the encoding of content in real-time. A comprehensive range of media analysis plug-ins allows for the automatic creation of metadata and the ability to search entire media streams or clips by a range of parameters such as audio, scene, speaker, location, key frame, image, on-screen text, face, token and concept.



By making video easy to identify, locate and re-use, blinkx’s VideoLogger allows the elements to be assembled and repurposed faster and with greater accuracy than ever before. blinkx is capable of a wide range of intelligent video analytics functions, including:

LOGO AND SCENE-CHANGE DETECTION The blinkx VideoLogger is capable of automatically detecting, analyzing and interpreting all activity within video data and can, for example, interpret and understand the significance of specific images or note scene changes. Using advanced techniques, the blinkx VideoLogger identifies and categorizes objects in a scene by size, shape, color, speed, direction, location and time of day. Additionally, it utilizes techniques such as comparing object histories, motion detection, object sizing, object tracking, object counting and behavioral analysis, putting each object and motion in context.
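One common building block for scene-change detection is comparing coarse intensity histograms of consecutive frames and flagging sharp jumps. The sketch below simplifies frames to flat lists of grayscale pixels; the bin count and threshold are illustrative, not tuned values:

```python
def histogram(frame, bins=4, levels=256):
    """Coarse normalized intensity histogram of a grayscale frame
    (the frame is modeled as a flat list of pixel values)."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // levels] += 1
    total = len(frame)
    return [c / total for c in counts]

def scene_changes(frames, threshold=0.5):
    """Indices where consecutive frame histograms differ sharply.
    The threshold is an illustrative assumption."""
    cuts = []
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        # L1 distance between the two normalized histograms.
        distance = sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

Production systems add motion compensation and temporal smoothing to avoid false cuts on fades and camera pans, but histogram distance is the usual starting point.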

FACIAL IDENTIFICATION blinkx’s video analysis also offers powerful biometric identification tools to enable facial recognition. However, traditional 2-D facial recognition has fundamental limitations with regard to posture, expression and lighting. The blinkx VideoLogger employs superior three-dimensional recognition techniques rather than 2-D facial matching processes for optimum performance.


ON-SCREEN CHARACTER RECOGNITION Neural networks based on optical character recognition techniques allow the blinkx VideoLogger to support advanced character recognition. Unlike template-matching used by other systems, which is dependent on receiving high-quality images, blinkx’s visual analysis techniques provide much greater tolerance for matching poorly-defined characters. With the ability to integrate with multiple databases and automatically cross-reference and correlate identified characters with other data, the blinkx VideoLogger offers the most sophisticated, comprehensive, end-to-end solution which encompasses every aspect of character recognition together with advanced recording, retrieval and analytical capabilities.




INDEX When a user requests a specific piece of content or a suggestion of something new to watch, that request is processed by the blinkx Index. The Index uses a complex, multi-dimensional, pattern-matching process to compare the request to its records on each piece of available content, and then uses its findings to create a list of the most relevant suggestions. These suggestions are fed back to the user, either in an ordered results list that can be organized in a number of ways, or by creating a channel or playlist of content pieces that can be consumed sequentially. The blinkx Index is a platform-independent server that typically runs as a single virtual service supported by multiple actual nodes, running on distinct physical machines that can be in multiple locations. blinkx’s Index architecture supports linear scalability, redundancy and fault-tolerant functionality and is supported by a fully-featured automated service architecture that is able to automatically identify service requirements and provision resources as required.
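A single virtual service backed by multiple nodes boils down to a fan-out-and-merge query: every node returns its local scored hits and the front end merges them. In the sketch below each node is modeled as a plain dict standing in for a real network call; the scoring scheme is invented for illustration:

```python
def query_cluster(nodes, query, top_k=3):
    """Fan a query out to every index node and merge scored hits.
    Each node is modeled as {query: {doc_id: relevance}}; a real
    deployment would issue parallel network calls instead."""
    merged = {}
    for node in nodes:
        for doc_id, score in node.get(query, {}).items():
            # Keep the best score seen for each document across nodes.
            merged[doc_id] = max(merged.get(doc_id, 0.0), score)
    return sorted(merged, key=lambda d: merged[d], reverse=True)[:top_k]

# Two hypothetical nodes holding overlapping shards of the index.
nodes = [
    {"election": {"clip-1": 0.9, "clip-2": 0.4}},
    {"election": {"clip-3": 0.7, "clip-1": 0.5}},
]
```

Because each node answers independently, adding nodes scales capacity linearly and losing one only removes its shard from the merge, which matches the redundancy claims above.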

CONTROL The blinkx Control layer is accessed through a large family of functions - each of which represents a different way of expressing a particular requirement to the video index. Each causes the index to react in a different way, meeting the need through analysis of the many content records at its disposal. These functions are split into three broad groups: Search, Category and Community.



SEARCH FUNCTIONS These functions allow a user to explore and search the blinkx Index with a high degree of control and accuracy. blinkx’s technology supports keyword search, Boolean search, conceptual search, automatic hyperlinking, fielded or meta-search, federated search, parametric search and guided-navigation.


- keyword - Boolean - phrase - conceptual - contextual - hyperlinked - parametric - guided - clustered
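One of the listed modes, Boolean search, reduces to set operations over an inverted index. Below is a minimal sketch with a hypothetical postings layout and a tiny query syntax (terms joined by implicit AND, `-term` for NOT):

```python
def boolean_search(postings, query):
    """Evaluate a simple conjunctive/negated query against an inverted
    index. An illustrative sketch of one of the search modes above,
    not blinkx's query language."""
    everything = set().union(*postings.values()) if postings else set()
    hits = set(everything)
    for token in query.split():
        if token.startswith("-"):
            hits -= postings.get(token[1:], set())   # NOT term
        else:
            hits &= postings.get(token, set())       # AND term
    return sorted(hits)

# Hypothetical postings: term -> set of matching video ids.
postings = {
    "election": {"clip-1", "clip-2", "clip-3"},
    "debate":   {"clip-2"},
}
```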


CATEGORY FUNCTIONS blinkx Category functionality facilitates the definition of a manually designed or automatically inferred taxonomy or ontology of subjects and topics. This taxonomy can then be used as a basis for the automatic organization of content: incoming media is automatically sorted into suitable categories, allowing for easy retrieval.

- manually defined taxonomy and ontology - automatically generated taxonomy and ontology - automatic categorization - automatic topic clustering

Systems powered by blinkx are capable of intuitively sorting information on a grand scale, with the assistance of Category functions. blinkx’s Category functions also support Topic-based Clustering that can identify trends within incoming data. For example, Topic-based Clustering can spot breaking news topics or suggest a taxonomy that will best describe a particular corpus of video.

- theme spotting - user and community categorization
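Topic-based Clustering can be approximated, very roughly, by grouping each video under its most common corpus-wide term. The data layout below is invented for illustration; the real system works over the full conceptual index rather than bare term lists:

```python
from collections import defaultdict

def cluster_by_topic(videos):
    """Group videos under their most frequent corpus-wide term - a toy
    stand-in for Topic-based Clustering.
    videos: {video_id: [terms describing that video]}"""
    # Count in how many videos each term appears.
    freq = defaultdict(int)
    for terms in videos.values():
        for t in set(terms):
            freq[t] += 1
    clusters = defaultdict(list)
    for vid, terms in videos.items():
        # Assign each video to its most widespread term
        # (ties broken alphabetically for determinism).
        topic = max(set(terms), key=lambda t: (freq[t], t))
        clusters[topic].append(vid)
    return dict(clusters)

videos = {"v1": ["election", "debate"],
          "v2": ["election", "poll"],
          "v3": ["cooking"]}
```

A spike in the size of one cluster over time is exactly the kind of signal the text describes for spotting breaking news topics.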


COMMUNITY FUNCTIONS With the Community functions, a blinkx-powered system can infer profiles based upon the consumption and creation of content by users and groups.

These profiles can be used to support a wide variety of recommendation strategies: alert users to new content that will be of interest to them, automatically identify users who are topic ‘leaders’, suggest user groups based on those who have similar interests and recommend content based upon the creation, consumption and sharing of content.

blinkx supports both implicitly generated profiles that are built on automatic observation of user actions, and explicit profiling based on user-driven preference setting and training. In all cases, blinkx understands that individual users can have many diverse interests - blinkx profiles are multi-faceted by design.

- thematic clustering - explicit and implicit profiling - alerting - community suggestion - group selection - audience/content demand profiling

SORTING AND FILTERING Regardless of the function used, blinkx’s Index results lists can also be sorted and filtered in a number of ways:



SORTING

BY CONTENT PROVIDER: preferential weighting of content from specified sources over all other content providers in an index

BY RELEVANCE OR DATE: view content results based on best possible match to search query or freshness of content

BY ARBITRARY METADATA: sort a results list by any of the fields of metadata which describe each piece of content (e.g., sort by author, security clearance, number of comments or number of views)

FILTERING

SAFE FILTER: pre-populated filter which blocks inappropriate content in order to facilitate family-friendly results lists

QUALITY FILTERS: ensure that only content of certain quality levels (measured by frame rate, resolution, bit rate or destination network latency) is returned

FORMAT FILTERS: block content encoded in formats the user does not wish to view, allowing for return of files only in Flash, Windows Media, RealPlayer or any required combination of content formats
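Applied together, the options above amount to a filter-then-sort pass over a results list. The field names (`format`, `bitrate`, `safe`, `views`) are illustrative, not the real index schema:

```python
results = [
    {"id": "a", "format": "flash", "bitrate": 900,  "safe": True,  "views": 120},
    {"id": "b", "format": "real",  "bitrate": 300,  "safe": True,  "views": 800},
    {"id": "c", "format": "flash", "bitrate": 1200, "safe": False, "views": 50},
]

def refine(results, formats=None, min_bitrate=0, safe_only=False, sort_by="views"):
    """Apply format, quality and safe filters, then sort by any metadata
    field. A sketch under assumed field names, not the real API."""
    kept = [r for r in results
            if (formats is None or r["format"] in formats)
            and r["bitrate"] >= min_bitrate
            and (not safe_only or r["safe"])]
    # Sort descending so "most relevant first" style ordering falls out.
    return [r["id"] for r in sorted(kept, key=lambda r: r[sort_by], reverse=True)]
```

Note that sorting by an arbitrary metadata field (views here, but equally author or comment count) needs no special casing: any key present on the records works.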



MONETIZE Persuasive automated advertising must combine historical, demographic knowledge of a user with an understanding of the content that he or she consumes at a given moment in time. Put simply, knowing who is watching what and why makes it easy to select highly relevant advertising. blinkx’s ad platform, AdHoc, achieves this by first capturing these inputs and then automatically synthesizing them to deliver a selection of advertising that aims to best monetize a particular user or content event.


UNDERSTANDING PEOPLE When combined with a form of user identification harness (such as a user account or Internet browser cookies), blinkx’s profiling technology can follow a specific user over an extended period of time. blinkx’s analysis of users’ media consumption enables it to automatically build a multi-faceted profile for specific users’ interests.

UNDERSTANDING CONTENT As described in the “Process” section, blinkx’s VideoLogger analysis modules can extract and understand all information about a video clip.


UNDERSTANDING INTENT Users often watch the same piece of content for different reasons. The key to capturing the intent of a given user at a specific point in time is to follow the search activity that led them there. If a user finds a piece of content based on search activity through a blinkx-powered engine, blinkx is able to capture both the search leading to the consumption of that content and also the failed searches that preceded it. SYNTHESIS AND SELECTION blinkx’s AdHoc platform combines these inputs with manually defined business rules and applies the resulting analysis to databases of available ads. blinkx partners can decide which ad databases are leveraged – their own, the blinkx platform or third-party ad networks.
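The synthesis step - weighing who is watching, what they are watching and why - can be sketched as a weighted term-overlap score per ad. The weights and ad records below are invented for illustration; AdHoc's actual rules are not public:

```python
def select_ad(ads, profile_terms, content_terms, intent_terms):
    """Score each candidate ad by overlap with the user profile, the
    current content and the search intent, weighting intent highest.
    The weights are illustrative assumptions."""
    def overlap(ad_terms, signal):
        return len(set(ad_terms) & set(signal))
    best, best_score = None, -1.0
    for ad in ads:
        score = (1.0 * overlap(ad["terms"], profile_terms)    # who
                 + 2.0 * overlap(ad["terms"], content_terms)  # what
                 + 3.0 * overlap(ad["terms"], intent_terms))  # why
        if score > best_score:
            best, best_score = ad["id"], score
    return best

# Hypothetical ad inventory tagged with targeting terms.
ads = [
    {"id": "sneakers", "terms": ["sport", "running", "shoes"]},
    {"id": "broker",   "terms": ["finance", "stocks"]},
]
```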


Once relevant ads have been identified, AdHoc can deliver them in a variety of ways. BLINKX AD DELIVERY FORMATS PRE-, POST- AND INTERSTITIAL-ROLL These common ads are full-screen video ads that are played before, after or during a piece of content (respectively). Though extremely arresting, as they play directly in the video player window, roll ads are less popular with viewers because they interfere with the clip playback experience. Roll ads are priced based on an impressions model.

BLINKX UN-ROLL UNIT The blinkx Un-roll unit allows the viewer to engage with a brand continuously throughout a video. The experience begins with a branded curtain that draws back to reveal the video. As the video plays, touch-points such as overlay ads and logos appear at contextually relevant moments within the video, made possible by blinkx’s AdHoc technology. The video ends with a clear call-to-action and the viewer has the option to continue to the advertiser’s Web site. OVERLAY ADVERTISING Like user-initiated ads, overlays appear in the video player. In contrast, however, overlay ads are display advertisements in themselves, containing graphical elements and a form of messaging or call-to-action that is always visible.


While it is possible for users to further investigate by clicking on the ad, overlay advertising is generally priced based on an impressions model. TEXT AND BANNER ADS Text and banner ads are the most traditional form of Internet advertising and are used extensively in non-video content. However, blinkx AdHoc is format agnostic and can therefore apply its understanding of a viewer and content to these ads, selecting keyword and banner ads that are of relevance to the current user and/or content.

VIDEO SEO Online video requires much more sophisticated methods of Search Engine Optimization (SEO) than traditional text-based content, due to complexities and nuances inherent in its form. blinkx’s advanced technology solves this problem. Publishers can take advantage of blinkx’s unrivaled understanding of video content by integrating our technology with their Web properties to greatly augment existing SEO initiatives and strategies. AUTOMATIC VIDEO ANALYSIS AND PROCESSING To optimize video content for search engines, it is necessary to create as much textual information about it as possible, in order to maximize the ways the video can be searched and retrieved. Like traditional text-based technologies, blinkx generates textual information about standard titles, categories, and user-created tags, but in addition, blinkx actually listens to, watches, and reads video content. This means blinkx has the power to analyze and process not only textual content, but also audio and visual video components, using speech recognition and visual analysis technologies. These processes greatly increase the number of words associated with a given video, thus driving more traffic to it. blinkx’s enhanced sources of descriptive data also enrich SEO content by allowing Web developers easy access to video content, with no infrastructure involvement necessary.

ENTITY EXTRACTION blinkx is able to generate a massive amount of textual information about a given piece of video, so it is critical to refine searchable information into the most relevant descriptive units, or entities, that describe the video’s most basic components. blinkx’s technology automatically and accurately assesses context and pulls out key words, so publishers attract the most relevant possible audience to their video. Video entities such as names, statistics, and locations are extracted from information associated with a video object, like comments, titles, tags, and audio. After identifying these entities, blinkx tags a publisher’s content with broad descriptions of the video’s content.
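At its simplest, entity extraction can be sketched as pulling capitalized word runs out of the text associated with a video while discarding sentence-initial words. This naive heuristic is only a stand-in for the contextual analysis described above:

```python
import re

def extract_entities(text):
    """Pull candidate named entities (runs of capitalized words) from
    titles, tags and transcripts. A deliberately naive heuristic -
    real systems use dictionaries, context and statistical models."""
    # Match multi-word capitalized runs first, then single words.
    candidates = re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+|[A-Z][a-z]+", text)
    # Words that start a sentence are capitalized anyway; exclude them.
    sentence_starts = {m.group(1)
                       for m in re.finditer(r"(?:^|[.!?]\s+)([A-Z][a-z]+)", text)}
    return [c for c in candidates if c not in sentence_starts]
```

The extracted strings would then feed page titles, URLs and category assignments as the following paragraph describes.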


These descriptions automatically populate the conventional components of a Web page, like titles, HTML, or optimized URLs, to make a page as accurately searchable as possible. Additionally, these extracted entities can be used to provide superior navigation via site structure, categorization, and taxonomical integration. CONCEPTUAL UNDERSTANDING For optimum SEO, it is essential to understand the key concepts or themes associated with a video’s content, because they determine relevance and ultimately drive navigation. In order to establish these concepts, blinkx automatically extracts and indexes textual information about the video to further its conceptual understanding.

Video assets and generated tags are then scrutinized to identify overall concepts, to yield more accurate, relevant results than are possible with keyword-based search technologies. By recognizing complex concepts within the video, rather than simply scanning inferior tagging, blinkx determines relevance based on actual content, not subjective human interpretation. blinkx delivers advantageous ways for search engines to drive relevant audiences to publisher content by placing it in context, whether in search results or associated links.



DELIVER Once relevant videos and necessary advertising have been matched, the content must be delivered to the viewer. This is a two-step process: first, the multiple pieces of content are displayed in the form of a search results list. Then, once a specific video has been selected for viewing, it must be played to the user.

STEP 1: DISPLAYING VIDEO SEARCH RESULTS It makes sense for search engines to display results for text-based Web pages as text to efficiently assist users. While many pieces of video content also have textual titles and metadata which could be used to provide relevant summaries, these are far-removed representations of the content itself. They force the user to judge moving images and sound simply on the basis of words. In order to allow for efficient user appraisal of results, blinkx has developed “Moving Thumbnail Generation”, a unique, patented display method to summarize video clips.


BLINKX’S MOVING THUMBNAIL GENERATION blinkx’s patented Moving Thumbnail Generation technology analyzes every incoming video file and creates a number of visual thumbnails – short, compressed video segments that represent different points in time of a given clip. Thumbnails are generated either arbitrarily (e.g., every minute, every 10 seconds, at the start and end, etc.) or, more typically, based on specifically identified events within the video (e.g., the utterance of key words, the appearance of a famous face, etc.). Later, when a video is listed as relevant to a user’s search, blinkx not only returns the textual summary and title of the relevant video, but it also displays the Moving Thumbnail that most closely demonstrates why a given video is relevant to the search.
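Selecting the thumbnail segment that best demonstrates why a video is relevant can be sketched as choosing a short window around the first matching spoken-word event, falling back to the clip opening when nothing matches. The parameter names and the five-second window are assumptions:

```python
def pick_thumbnail(events, query_terms, clip_length, window=5.0):
    """Choose a (start, end) preview segment whose spoken words best
    match the query; fall back to the clip opening otherwise.
    events: (word, seconds) pairs from a time-stamped transcript."""
    hits = [t for word, t in events if word.lower() in query_terms]
    if not hits:
        # No query match: default to the opening seconds of the clip.
        return (0.0, min(window, clip_length))
    # Center the preview window on the first matching utterance.
    start = max(0.0, hits[0] - window / 2)
    return (start, min(start + window, clip_length))
```

For a long-form video covering several topics, different queries would therefore surface different preview segments of the same clip, which is the behaviour the text describes.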


For example, in the case of a longer form video that covers more than one topic, the Thumbnail will feature screenshots related to the search query. A user is therefore able to swiftly view and assess how relevant a given video is to their search. STEP 2: DELIVERY OF CONTENT In-browser streamed video is an inexpensive method of delivering content via popular streaming technologies such as Real streaming format, Windows Media, Flash Audio/Video and Apple’s QuickTime. blinkx supports all of these formats and, in addition, can deliver in-browser video using a number of lesser-known standards. Direct streaming of video, on the other hand, is an expensive process.


To guarantee high-quality content and low latency, content providers often use a costly content delivery network that charges a high price per data unit of transferred content. With regard to online video, these costs can accumulate rapidly, especially in the case of popular clips that are viewed by thousands or millions of viewers in short periods of time. To maintain both quality and latency while dramatically reducing bandwidth costs, blinkx has built BBTV (blinkx Broadband TV) - a peer-to-peer streaming and download technology that vastly improves the efficiency of providing large-footprint content to a user.

BLINKX HYBRID PEER-TO-PEER STREAMING blinkx’s hybrid peer-to-peer technology supports the on-demand streaming of television-quality content over a typical home broadband connection. BBTV uses seed servers to originate content and, as increasing numbers of users select and watch a given media file, redistributes the content across the network of peers. Peer-to-peer streaming shares distribution costs at various levels of a network hierarchy, resulting in lower delivery costs without sacrificing playback quality.


BLINKX DRIP-DOWNLOADING blinkx’s drip-download technology is a large-file download manager that allows content owners to slowly download media to a user’s computer over a period of time. The drip-download takes place unobtrusively in the background, automatically throttling its download rate based on user activity, thus minimizing any impact on the user’s productivity. Drip-downloading is an efficient, low-impact method of delivering extremely high-quality content - quality that streaming, even with a peer-to-peer approach, simply cannot deliver. BBTV’s drip-download service is typically used to deliver HD or DVD-quality films at full feature length.
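The activity-based throttling can be sketched as a rate policy that backs off while the user is busy. The activity tiers and bandwidth shares below are invented for illustration, not BBTV's actual policy:

```python
def drip_rate(link_capacity_kbps, user_activity):
    """Pick a background download rate that backs off while the user is
    active. Tiers and shares are illustrative assumptions."""
    if user_activity == "idle":
        share = 0.8       # machine idle: use most of the link
    elif user_activity == "browsing":
        share = 0.3       # light use: stay out of the way
    else:
        share = 0.05      # heavy use (streaming, gaming...): trickle only
    return round(link_capacity_kbps * share)
```

A real download manager would re-sample activity continuously and ramp the rate smoothly rather than switching between fixed tiers.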

BLINKX DIGITAL RIGHTS MANAGEMENT

In addition to employing blinkx’s peer-to-peer and drip-download technologies, BBTV also supports the Microsoft Windows Digital Rights Management (DRM) architecture. This technology ensures secure handling of owners’ and licensors’ content: only authorized viewers can watch DRM-protected content, and it can be re-broadcast or shared only when explicitly allowed.
