POINT OF view
Metadata, Content Models, and Taxonomies: Strategies for Smart Content Management By: Laura Lerner, Senior Manager of Content Strategy, SapientNitro
TRUE or false?
• Only developers care about metadata.
• A content model is the same thing as a content matrix.
• It doesn’t matter how content is managed as long as users get what they need.
• Taxonomy is another name for a site map.
If you answered true to any of these or aren’t quite sure, then you’ve come to the right place. Let’s clear up any confusion on metadata, content models, and taxonomies and put to rest any questions you have about what they are, the relationships and differences between them, and how they help make digital experiences work.

METADATA
Metadata is structured information that describes, explains, locates, or makes it easier to retrieve, use, or manage an information resource. And an information resource, in this case, is something content related: text, images, video, sound, or anything that conveys meaning to a user.
Metadata has many benefits, including:
• Enabling machine readability when describing things in their constituent parts,
• Powering content management and flexible delivery to maintain content as a strategic asset, and
• Enabling effective retrieval through disambiguation, that is, removing the guesswork about what content means.
There are three flavors of metadata. They include administrative (date created, author, and version), descriptive (for example, category, audience, and keywords), and structural metadata (for example, title, file type, and related documents).
In this example from IMDB.com, you can see a few pieces of metadata for just one image (the movie poster) on this page, including the height, the alternate text, and the title. That’s just a small portion of the metadata that’s been captured for this page.
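As a rough illustration of the three flavors of metadata, the sketch below groups the fields of a single image record by flavor. The field names follow the examples given above; every value is a made-up placeholder, not data taken from IMDB.com.

```python
# Hypothetical metadata record for one image; all values are placeholders.
image_metadata = {
    "administrative": {
        "date_created": "2012-01-15",
        "author": "Content Team",
        "version": "1.0",
    },
    "descriptive": {
        "category": "Movie Poster",
        "audience": "General",
        "keywords": ["poster", "film"],
    },
    "structural": {
        "title": "Example Movie Poster",
        "file_type": "image/jpeg",
        "related_documents": ["movie-page-001"],
    },
}


def flatten(metadata):
    """Return a flat {field: flavor} lookup showing which flavor owns each field."""
    return {
        field: flavor
        for flavor, fields in metadata.items()
        for field in fields
    }


field_flavors = flatten(image_metadata)
print(field_flavors["author"])     # administrative
print(field_flavors["file_type"])  # structural
```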
IDEA ENGINEERS
© Sapient Corporation, 2012
CONTENT MODEL
The content model is meant to create order from all that metadata. A content model:
• Specifies the metadata that is required to manage a digital experience.
• Provides the content blueprint and architecture for static and dynamic content by capturing system data and rules.
• Enables CMS development by documenting requirements.
So, let’s take a look at a high-level view, using IMDB.com again as an example. We’ll start with content types, the basic structural pieces of any content strategy. IMDB’s content types include general information like Movie, Studio, Actor, and Director. Content types, even though they’re separate, can still have relationships with each other. For example, a Movie is produced by a Studio, is directed by a Director, and stars several Actors, who have appeared in many other Movies and worked with several Directors.
But how do you know if you have a good content model? A high-quality content model adopts applicable best practices and should:
• Enable reuse by separating content from presentation. This ensures that content is operationally efficient and saves resources and money.
• Provide and enforce governance by marrying user and content manager needs. Working within specified requirements helps streamline decision-making and ensure optimal content performance.
• Enable the active management of content. You should know how long content has been there, if it’s viable, and who wrote it (among other performance measurements) in order to track and adjust it.
• Document business rules that power content delivery and retrieval. Relationships and categories classify content so that users can find and access it and so that it lives in the appropriate channels.
• Be built with a widely accepted standard, such as Dublin Core. Using a standard framework helps speed up development and offers a great library of attributes and classification types.

When do I need to create a content model?
• Create a content model when there’s a new experience design or CMS and content management requirements haven’t been documented.
• Don’t create a content model when an existing CMS is in use, no updates will be made, or content will migrate 1-to-1.
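To make the content-type relationships from the IMDB.com example more concrete, here is a minimal sketch that models Movie, Studio, Director, and Actor as separate types that reference one another. The class layout and the `movies_featuring` helper are assumptions made for illustration only; they do not describe how IMDB.com is actually built, and the records are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Studio:
    name: str


@dataclass
class Director:
    name: str


@dataclass
class Actor:
    name: str


@dataclass
class Movie:
    title: str
    studio: Studio        # a Movie is produced by a Studio
    director: Director    # a Movie is directed by a Director
    cast: List[Actor] = field(default_factory=list)  # a Movie stars several Actors


def movies_featuring(actor, movies):
    """Follow the Movie-to-Actor relationship in reverse."""
    return [movie.title for movie in movies if actor in movie.cast]


# Placeholder records, not real catalog data.
studio = Studio("Example Studio")
director = Director("Example Director")
lead = Actor("Example Actor")
catalog = [
    Movie("First Film", studio, director, [lead]),
    Movie("Second Film", studio, director, [lead, Actor("Another Actor")]),
    Movie("Third Film", studio, director, []),
]

print(movies_featuring(lead, catalog))  # ['First Film', 'Second Film']
```

Keeping each content type separate and linking them by reference is what lets one Actor record appear on many Movie pages without being duplicated.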
Now let’s look at the content model from a more in-depth, tactical level. Starting with content types, a content model must also describe the attributes of those content types, which can also require controlled vocabularies and taxonomies. If we continue with the IMDB.com example, attributes of the Movie content type include the Title and Genre.
The content model also includes the system rules behind content types and their attributes. These rules include any piece of logic that is needed for a particular attribute. Examples of rules include:
• Default values. These are the values that are prepopulated for content authors. For example, the content type Press Release might have the attribute Distribution to indicate what audiences can view the release. The default value of Distribution might be set to Public (rather than Internal Only) because content managers expect that most press releases will be made publicly available.
• Required fields. These are the attributes that must be completed for a content type to be saved or published. For example, a content type might include both the Title and Subtitle attributes, but only the Title must be authored.
• Format or validation. These rules provide authors guidance about the form the content or metadata must take for a particular attribute. Common validation rules include character counts, acceptable characters, and date format. For example, if a content type includes an Expiration Date attribute, the rule for the attribute would specify the required format, such as mm/dd/yyyy or DD Month YYYY.
• Repeatable content. The content model should specify which attributes can have multiple values. This is especially common when an attribute uses a taxonomy of values so that content managers can assign multiple categories to a particular content type.
• User editable. This rule specifies that an author, rather than the CMS, can edit the value of an attribute. For instance, the date of when content is created may be system generated, but the title of that content would be user editable.
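The rule types above (default values, required fields, format or validation, repeatable content, and user editable) can be sketched as data attached to each attribute. The example below is a simplified illustration for the Press Release content type; the `Attribute` class, the field names, the 100-character title limit, and the helper functions are all assumptions made for this sketch and do not reflect any particular CMS.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Attribute:
    name: str
    required: bool = False            # required field
    default: Any = None               # default value prepopulated for authors
    pattern: Optional[str] = None     # format rule, expressed as a regular expression
    max_length: Optional[int] = None  # character-count rule
    repeatable: bool = False          # may hold multiple values
    user_editable: bool = True        # False means the CMS sets the value


# Hypothetical content model for the Press Release content type.
PRESS_RELEASE = [
    Attribute("title", required=True, max_length=100),
    Attribute("subtitle"),
    Attribute("distribution", default="Public"),
    Attribute("expiration_date", pattern=r"\d{2}/\d{2}/\d{4}"),  # mm/dd/yyyy
    Attribute("categories", repeatable=True),
    Attribute("date_created", user_editable=False),
]


def apply_defaults(model, entry):
    """Fill in default values for attributes the author left out."""
    filled = dict(entry)
    for attr in model:
        if attr.name not in filled and attr.default is not None:
            filled[attr.name] = attr.default
    return filled


def author_fields(model):
    """List the attributes an author (rather than the CMS) may edit."""
    return [attr.name for attr in model if attr.user_editable]


def validate(model, entry):
    """Return a list of rule violations; an empty list means the entry can be saved."""
    errors = []
    for attr in model:
        value = entry.get(attr.name)
        if value in (None, "", []):
            if attr.required:
                errors.append(f"{attr.name} is required")
            continue
        values = value if isinstance(value, list) else [value]
        if len(values) > 1 and not attr.repeatable:
            errors.append(f"{attr.name} accepts a single value")
        for item in values:
            if attr.max_length is not None and len(str(item)) > attr.max_length:
                errors.append(f"{attr.name} exceeds {attr.max_length} characters")
            if attr.pattern and not re.fullmatch(attr.pattern, str(item)):
                errors.append(f"{attr.name} has an invalid format")
    return errors


draft = apply_defaults(
    PRESS_RELEASE,
    {"subtitle": "Quarterly update", "expiration_date": "2012-12-31"},
)
print(draft["distribution"])           # Public
print(validate(PRESS_RELEASE, draft))  # ['title is required', 'expiration_date has an invalid format']
```

Expressing rules as data rather than as form-specific code is one way to keep them documented in a single place, which is the role the content model plays for CMS development.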
An optional piece of the content model is a content entry template. These templates help technology developers create the CMS forms that authors will use to create content within the CMS. They show the preferred layout of attributes per content type.
So with all this information and all this complexity around creating a content model, where do you start? Well, you start by asking a lot of questions to fully develop a working, scalable, sustainable content model. Questions like:
• What content and metadata already exist?
• What standards are in use?
• How will users find, access, and share this content?
• Will translation and localization be required?
• How is content acquired, created, approved, updated, and retired?
• How will content performance be measured?
• How will the content be stored?
• How does the CMS assemble and manage content?
• How will content be retrieved?
Once you have a good understanding of the pieces within a content ecosystem, you will want to distill that input into core content types. To do that, look at the content and information you’ve assembled through four basic lenses:
1. Uniqueness. Uniqueness is judged by the type of information the content contains, where it appears, and its functional requirements. For instance, an article contains a different type of information than an image gallery. Because these two distinct types of information have different functional requirements, they constitute two different content types.
2. Reuse. Reuse also helps inform what becomes a content type. A good example is a press release; you may want to reuse pieces of one to feed other portions of the site. And while you might store all press releases in the news section, you might want to be able to consume the title in other areas of the site, such as a widget on a homepage.
3. Presentation. While a good content model should usually separate content and presentation, sometimes we need to have very fine control over the treatment of text or images. So when we have very specific delivery requirements, looking at the layout and the format is important.
4. Source. Decide where the information is coming from. Is a particular business unit producing it? Does a third party provide the content? Because the content source can drive requirements around how the content is managed, it can be easier to segment this content by type.
Once you’ve defined the content types, you can then break down the attributes. Take it step-by-step:
1. Itemize distinct pieces of content (such as a title, body, and image or some combination thereof).
2. Describe what the content is (the content’s aboutness, in terms of subject, audience, or purpose).
3. Define the relationships (how this content type will interact with other content types and user experiences).
4. Track its lifecycle (what you need to know about how it was created, changed, and archived).
5. Understand ownership (who created, updated, and approved it).
6. Specify where content goes and how to find it.
Typical attribute examples include the title, description, and keywords (all important from an SEO perspective), as well as subject/category, author, date published, language, and unique ID. These attributes are crucial because they let the system that’s managing and delivering the content identify it clearly.

TAXONOMY
So what about taxonomy? At its highest level, taxonomy is a hierarchical list of controlled terms (that the business has defined and approved) that supports the classification, management, delivery, and retrieval of content. The content model specifies when a new or refined taxonomy is required. That means that taxonomies can’t replace a content model, and you should never build a content model around taxonomy.
Sometimes, due to its hierarchical nature, taxonomy is confused with a site map. A site map is how a user experiences a digital presence, often including user-facing terms, shortcuts, and redundancies that we wouldn’t include from a content management, or taxonomy, perspective. It’s important to note that developing taxonomies is a whole process in itself, a process of art and science that’s out of this paper’s scope.
Given all that, what’s the point of taxonomies anyway? If they’re not site maps, why do we need them? Take a look at these three examples that illustrate good use.
1. A parametric search. This is essentially powering a search that enables a user to filter and manipulate results using multiple parameters. If you look at Gettyimages.com, for instance, under the music tab, you’ll see that files are classified multiple ways: by genre, themes, lyrics, and so on. Those are all controlled groups of metadata terms (taxonomies) assigned to music files behind the scenes to enable users to search and refine through those parameters.
2. A recommendation engine. At TED.com, suggestions for content that might also be of interest are offered, based on talks that have been previously viewed by the user. For instance, if you watch a TED Talk about archaeology in space, the metadata assigned to that talk (an Archaeology tag from a subject taxonomy) signals the user’s interest in the subject and can be used to recommend other content that might be of interest as well.
3. Related content. Related content is another way taxonomies are used. At IMDB.com, viewing a particular movie’s page can also provide the user with links to photos and videos tagged with that movie title, creating a relationship from one content type to others.
There’s a lot more taxonomies can do, especially on the back end, and we’ve only scratched the surface by looking at a few examples of how they work for users on the front end. But for now, let’s switch gears and look at taxonomy before it’s implemented in a content management system. Sometimes, it’s not so “behind the scenes” and we do see things that end up in the navigation.
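As a rough sketch of how the front-end uses listed above can rest on taxonomy terms, the example below tags a few items with controlled terms and then filters and recommends by those tags. The catalog, the facet names, and both functions are hypothetical; this is not how Gettyimages.com, TED.com, or IMDB.com implement these features.

```python
# Placeholder catalog: each item is tagged with terms from controlled vocabularies.
TRACKS = [
    {"id": "t1", "genre": {"Jazz"}, "theme": {"Travel", "Night"}},
    {"id": "t2", "genre": {"Rock"}, "theme": {"Travel"}},
    {"id": "t3", "genre": {"Jazz"}, "theme": {"Celebration"}},
]


def parametric_search(items, **filters):
    """Keep items whose tags include the requested term for every parameter given."""
    return [
        item["id"]
        for item in items
        if all(term in item.get(facet, set()) for facet, term in filters.items())
    ]


def recommend(items, viewed_id, facet):
    """Suggest other items that share at least one term with the viewed item."""
    viewed = next(item for item in items if item["id"] == viewed_id)
    return [
        item["id"]
        for item in items
        if item["id"] != viewed_id and viewed[facet] & item[facet]
    ]


print(parametric_search(TRACKS, genre="Jazz"))                  # ['t1', 't3']
print(parametric_search(TRACKS, genre="Jazz", theme="Travel"))  # ['t1']
print(recommend(TRACKS, "t1", "theme"))                         # ['t2']
```

Both functions depend on the tags coming from a controlled list: if one author typed "Jazz" and another typed "jazz music", neither the filter nor the recommendation would connect the two items.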
For instance, at Dodge.com, there’s a vehicle taxonomy that goes through the models and breaks them down into trim levels, which shows a good example of taxonomy being represented in navigation. So there’s a hierarchy from the make to the model to the more specific styling categories.
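A hierarchy like this one, from make to model to trim level, can be sketched as a nested structure of controlled terms. The terms below are placeholders rather than actual Dodge models or trims, and `find_path` is a hypothetical helper that returns the chain of broader terms above a given term, which is the kind of information a navigation breadcrumb needs.

```python
# Placeholder vehicle taxonomy: make -> model -> trim level.
TAXONOMY = {
    "Example Make": {
        "Model A": {"Base": {}, "Sport": {}},
        "Model B": {"Touring": {}},
    }
}


def find_path(tree, term, trail=()):
    """Return the chain of broader terms leading to `term`, or None if it is not in the taxonomy."""
    for label, children in tree.items():
        path = trail + (label,)
        if label == term:
            return list(path)
        found = find_path(children, term, path)
        if found:
            return found
    return None


print(find_path(TAXONOMY, "Sport"))                 # ['Example Make', 'Model A', 'Sport']
print(" > ".join(find_path(TAXONOMY, "Touring")))   # Example Make > Model B > Touring
print(find_path(TAXONOMY, "Coupe"))                 # None
```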
Taxonomies are a big deal. From a North American perspective, we’ve got the American National Standards Institute (ANSI), the National Information Standards Organization (NISO), and the International Organization for Standardization (ISO), which document guidelines for controlled vocabularies and taxonomies, such as ANSI/NISO Z39.19. Many organizations adhere to the recommendations and standards they provide.
BRINGING IT ALL TOGETHER
Now that you understand these three pieces of the puzzle, all that’s left is to bring it together. NPR is a great example of a solid, multichannel content model in action. From a Google search, notice how NPR propagates one item of content across multiple channels. So the metadata behind this powers a website, Twitter feed, tablet, and smartphone presence; this content is reused across all these channels through a presentation layer and scaling. NPR also uses this model to relate all its types of content, including audio files from its radio programming. In this way, NPR is managing this particular story everywhere it needs it to go. And since NPR uses a highly structured model with robust metadata behind it, NPR’s content can even be found on channels that NPR doesn’t control.
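The idea of one structured content item feeding several channels can be sketched as follows. This is only an illustration of separating content from presentation; the story fields, the channel names, the 140-character limit, and the render functions are assumptions, and nothing here describes NPR’s actual system.

```python
from html import escape

# One structured content item; all values are placeholders.
STORY = {
    "title": "Example Story Headline",
    "teaser": "A one-sentence summary of the story.",
    "body": "Full text of the story goes here.",
    "audio_url": "https://example.com/audio/story.mp3",
}


def render_web(story):
    """Full presentation for a website."""
    return f"<h1>{escape(story['title'])}</h1><p>{escape(story['body'])}</p>"


def render_tweet(story, limit=140):
    """Short presentation for a character-limited feed (limit is an assumed value)."""
    text = f"{story['title']}: {story['teaser']}"
    return text if len(text) <= limit else text[: limit - 3] + "..."


def render_mobile(story):
    """Compact payload for a tablet or smartphone app."""
    return {
        "headline": story["title"],
        "summary": story["teaser"],
        "audio": story["audio_url"],
    }


CHANNELS = {"web": render_web, "twitter": render_tweet, "mobile": render_mobile}

# The same stored item is presented differently on each channel.
outputs = {name: render(STORY) for name, render in CHANNELS.items()}
print(outputs["twitter"])  # Example Story Headline: A one-sentence summary of the story.
```

Because each channel reads the same fields, a correction to the stored title shows up everywhere the next time the item is rendered.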
IN SUMMARY
Metadata gets content from point A to points B through Z. It is a superstructure that lets us deliver visual experiences to users and get the business results we need. The content model defines required metadata by marrying the needs of content consumers and managers. And the attributes (including taxonomies) provide the guardrails for valuable, well-maintained experiences. While metadata, content models, and taxonomies can sound like a bunch of technological gobbledygook, they’re really the magic behind all the experiences we deliver, so it’s time to put them in the limelight where they belong.
About the Author
A Senior Manager of Content Strategy, Laura Lerner is the Midwest regional lead for Content Strategy based in our Chicago, IL office and lead moderator of our global content strategy community. Her nearly 15 years of experience in enterprise content management, editorial strategy, and business process management has resulted in her modeling more content, both personally and professionally, than she cares to remember.
Laura Lerner