Table of Contents

Executive Summary ... 4
Annual Report Recommendations ... 6
    Meaningful Information is the New Currency of Business ... 7
    Evidence-Based Decision Making ... 9
Message from the Chief Business Architect ... 12
Enterprise Business Information Management ... 13
    Data Governance ... 13
        Background ... 13
        Governance Framework ... 14
        EBIM Data Governance ... 14
        EBIM Data Stewards ... 15
        EBIM Data Governance Progress and Update ... 15
    Data Standardization ... 19
        Background ... 19
        Metadata Management ... 20
        Data Registry ... 22
        EBIM Data Standardization Progress and Update ... 23
    Information Repositories ... 29
        Background ... 30
        Knowledge Hierarchy ... 30
        Systems Thinking and Data Quality ... 33
        EBIM Information Repositories Progress and Update ... 34
Annual Survey Results ... 37
    Background ... 37
    Survey Results ... 37
    RACI Matrix ... 42
Acknowledgements ... 45
Terms Glossary ... 46
Appendix ... 48
    Pre-EBIM Background ... 49
    Data Steward Process Model ... 50
    Data Steward Information Repository Dimensional Model Recommendation ... 51
    Annual Survey Worksheet and RACI Matrix ... 53
    Data Standardization Artifacts ... 54
Table of Figures

Figure 1 Enterprise Business Information Management (EBIM) program ... 13
Figure 2 Distribution of Responsible Business Lines ... 16
Figure 3 Component Reusability Index ... 17
Figure 4 Data Steward Technical Subcommittees in 2012 ... 18
Figure 5 Screen snapshot of Data Registry ... 23
Figure 6 Standardization Index ... 25
Figure 7 Diagram of the Industry and Government Standards Organizations managed within the EBIM program ... 27
Figure 8 Value of EBIM program ... 29
Figure 9 Knowledge Hierarchy ... 31
Figure 10 Flow of Knowledge ... 32
Figure 11 Information Repository and Data Standardization Gap ... 34
Figure 12 Information Repository Areas ... 35
Figure 13 Data Topic Categories and Sub-Categories ... 38
Figure 14 Data Categories in Priority Groups ... 40
Figure 15 Data Categories and Sub-Categories in Priority Groups ... 41
Figure 16 RACI Matrix Diagram ... 43
Figure 17 Data Standardization Initial Challenges ... 49
Figure 18 Data Steward Process Model ... 50
Figure 19 Annual Survey RACI Matrix ... 53
Figure 20 Data Registry Index Page ... 54
Figure 21 Data Registry Profile Page ... 55
Figure 22 Data Exchange Factory Page ... 56
Figure 23 Data Exchange Factory Component Browser ... 56
Figure 24 Data Exchange Factory Change Log ... 57
Figure 25 Data Exchange Factory Mind Map ... 57
Executive Summary

As global research clearly states, there is a strong correlation between the use of international industry standards and economically strong industries and countries[1]. Global research likewise points to a strong value-added benefit from relying on international industry data standards in public sector financial management[2]. As countries continue to respond to the financial crisis, they continue to embrace transparency and open government principles; in fact, by the end of 2012, 28% of all countries worldwide had endorsed open government principles and the use of international industry data standards[3]. Because the private sector has witnessed enormous economic gains both in using industry standards and in delivering digital services securely and efficiently, the US Federal Government has shifted towards becoming an information-centric organization, embracing the strategic and tactical value of the primacy of data and emphasizing the importance of using international industry standards where appropriate[4].

The global demand for more data at information-centric organizations has its root cause in several areas: improving decision-making and gaining insight into performance optimization by breaking down siloed pockets of information; untangling the complexity of disparate operational data realities that are impacted by external factors like regulations or global drivers; developing a strategic and operational focus to manage the intricacies of data as it transforms into business-relevant facts to be analyzed and synthesized into business-focused meaningful information; rationalizing an "information cacophony" of internal and external data owners into a masterpiece symphony that provides a path for organizational longevity and continued business relevance; and understanding and exploiting the intricate web of interdependencies of trusted, meaningful information[5].

The value of an information-centric organization is witnessed in the Fortune 500 Top 20, a ranking of the best corporations. Each of the Top 20 has revealed a "quicksilver agility, rapidly shifting their product mix and producing more goods at little new cost"[6]. While "first-to-market" was the mantra of the last century, this century, as the Fortune 500 Top 20 reveal, is about "smart-to-market" and relying on the primacy of data: that the data be trusted, in context, and fit for use for business-relevant analysis and for the synthesis of meaningful information. Even though the Fortune 500 Top 20 organizations unmistakably achieved their success via several factors, one of the factors most highly cited was their ability to capitalize on the primacy of data to be agile, to be innovative, to create new markets, and to thrive.

As most other forward-thinking US Federal Government agencies have done, this Bureau has embraced the importance of the primacy of data by investing in several initiatives: establishing the position of Chief Business
[1] "Today's State of the Art Global Solutions for CEO's", ISO International Standards Report, http://www.iso.org/iso/ceo_brochure.pdf
[2] "Public Financial Management Responses to an Economically Challenging World", International Consortium on Governmental Financial Management and Grant Thornton, 2011, http://www.gt.com/staticfiles/GTCom/Public%20sector/ICGFM/2011%20ICGFM%20Global%20Financial%20Managers%20Survey%20Report.pdf
[3] "Open Government Declaration", Open Government Initiative, http://www.opengovpartnership.org/open-government-declaration
[4] "Digital Government: Building a 21st Century Platform to Better Serve the American People", OMB, 23 May 2012, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government-strategy.pdf
[5] "The Role of the Chief Data Officer in Financial Services", Cap Gemini Insights and Resources, June 2012, http://www.capgemini.com/insights-and-resources/by-publication/the-role-of-the-chief-data-officer-in-financial-services/
[6] "Fortune 500 Top 20", 2012, http://finance.yahoo.com/news/fortune-500-top-20.html
Architect; approving the Enterprise Business Information Management (EBIM) program, including the Data Stewards program, Data Standardization (enterprise metadata management), and Information Repositories; establishing the Business Intelligence and Analytics Division and centralizing an enterprise approach to Business Intelligence; establishing the Business Integration Division to unlock and exploit the importance of application development and standards implementation for the enterprise; supporting deeper integration of security policies amid a call to publicly release data; pursuing new technologies like cloud and mobile computing in the lines of business; and steering lines of business towards building Information Repositories to make data available to stakeholders (publicly accessible and privately accessible).

While this Annual Report reflects only one part of the intricate tapestry of an information-centric organization, the EBIM program, it also records a year of successful partnerships with the Business Intelligence and Analytics Division, the Business Integration Division, US Federal Government outreach efforts, the Federal Reserve Banks, and industry experts. Even though there have been several areas of cooperation and, more importantly, of information-sharing among these partnerships, there is an opportunity, as the Data Steward Annual Report Recommendations make clear, to orchestrate these initiatives at a strategic executive level.

The purpose of this Annual Report is to provide meaningful metrics and a narrative of the progress of the EBIM program and the successful partnerships formed, to report on the data/information management annual survey, and to make recommendations to EBIM stakeholders for the future of the program. Here are some metrics from this past year's activity:

• 699 unique data elements were approved by the Data Stewards (representing a total of approximately 40,000 fields of metadata)
• 25% of all data elements approved this past year originated from an industry standard
• Just under 10% of all data elements originated from an international industry standard
• 46% of all approved data element content was rated either Highly Reusable or Medium Reusable, reflecting good direction towards enterprise interoperability and federal government interoperability
• The Revenue Collection Management (RCM) line of business implements the most industry standard data elements
• The Data Stewards Subcommittees formed successful partnerships with the Business Intelligence and Analytics Division, the Business Integration Division, the Office of Financial Innovation and Transformation, and Federal Reserve Bank subject matter experts
• All Information Repositories have implemented or will implement the Data Stewards-approved information-sharing platform
• None of the Governmentwide Accounting (GWA) or Debt Management Services (DMS) stewarded content originated from an industry standard

The Data Stewards Annual Report uses benchmarks from a global community of highly successful organizations with a proven ability to transform data into actionable results. The core of the EBIM program was founded on a world-class metadata management industry specification, which is continually improved upon to integrate both the rigor of global industry best practices and the agility of emerging, innovative ideas and processes.
Additionally, extracting value from data isn't simply a private-sector benefit or one specific to a particular industry; it is an organizational benefit regardless of culture, language, or economy. In this light, comparing our management of data to that of highly successful global organizations reflects the essence and core of an information-centric organization. For these reasons, the Data Stewards Annual Report Recommendations comprise the best possible balance of a constellation of factors that include internal, Departmental, US Federal Government, financial services sector, and global industry aspects.
Annual Report Recommendations

"If we could first know where we are and whither we are tending, we could better judge what to do and how to do it." — Abraham Lincoln, June 16, 1858

The summer of 1858 was pivotal for Abraham Lincoln: not only did he win the Republican nomination for the Senate, but he also forged a very public path that would culminate within three years in the Civil War. In essence, Lincoln, as history would record, knew exactly where he was "tending". What he lacked then, as expressed in the quote, was the ability to know how the country would respond. As Lincoln knew where he was "tending", so has the leadership of this Bureau, because it has repeatedly made investments in the areas of Business Intelligence, Business Analytics, Data Security, Business Process, and Data Standardization: ingredients towards becoming an information-centric organization. These ingredients are essential because private sector organizations as well as public sector organizations are entering a new world where meaningful information is the currency of the global enterprise[7]. Taking action towards meaningful information as the currency of the global enterprise is critical to this Bureau for the following reasons:
• Alignment with global public sector initiatives that emphasize analysis ("in the public sector, analytics must become a core management competency"[8])
• Defense against the kind of fiscal collapse that sparked a global recession[9]
• Alignment with US Federal Government guidance ("Information as a national asset"[10])
• Alignment with leading research analysts and industry associations
There is a global focus on the primacy of data. In the insurance industry, a global survey of industry leaders found that 49% anticipated new sources and techniques in the use of data analytics to be a key competitive differentiator, and 14% expected their emphasis on analytics to mature to the point where a majority of decisions are automated[11]. Information-centric organizations in the global insurance space can only reach these commanding heights by ensuring that their data governance and metadata management disciplines provide the means necessary to build an analytics framework. Ultimately, any analytics framework is based on trust. Business leaders depend on Business Intelligence teams to extract meaningful information from their data. Business Intelligence teams depend on business leaders to supply key performance indicators for them to track. Those key performance indicators can only be accurate if the data that goes into the reporting of these metrics is accurately and precisely defined. And the data cannot be accurately and precisely defined unless there has been a systematic process for managing the metadata of the data (content semantics) and the metadata of the high-level business processes (context semantics) through governance and enterprise conformance. Simply put, if the metadata isn't right, then the tower of trust crumbles.
[7] "Information is the New Currency of Business", Data Direct Network Analyst Reports, 2011, http://www.ddn.com/en/analystreports/information-new-currency-business-says-jean-luc-chatelain
[8] "The Power of Analytics for Public Sector", IBM, http://www.ibm.com/smarterplanet/global/files/es__es_es__cities__the_power_of_analytics_for_public_sector.pdf
[9] "Financial Stability - New Council and Research Office Should Strengthen the Accountability and Transparency of Their Decisions", GAO, September 2012, http://www.gao.gov/assets/650/648064.pdf
[10] "National Strategy for Information Sharing and Safeguarding", Information Sharing Environment, http://www.whitehouse.gov/sites/default/files/docs/2012sharingstrategy_1.pdf
[11] "Top Insurance Industry Issues 2012", PwC, http://www.pwc.com/en_US/us/insurance/publications/assets/pwc-topinsurance-industry-issues-2012.pdf
As this Annual Report demonstrates, becoming an information-centric organization has involved scaling unprecedented terrain and navigating uncharted waters. The Enterprise Business Information Management (EBIM) program has returned significant value to the lines of business that have made commitments to joining the Data Stewards. In fact, Payment Management (PM) has based its Payment Information Repository (PIR) data model on the standardization work of the Data Stewards; Revenue Collection Management (RCM) has made a wholesale investment in standardizing its data on the basis of that same work; and Debt Management Services (DMS) has even committed to enforcing Data Stewards-approved standards in its transactional systems. These accomplishments would only have been possible if the value of data standardization within this Bureau resonated not just with technologists but also with business leaders. As these business decisions indicate, these accomplishments towards standardization are a positive benefit for this Bureau and for its customers, and the same kinds of accomplishments are witnessed in global organizations, both private and public. As the insurance industry reveals, there is a growing global reliance on the primacy of data for useful decision-making. The ultimate goal of the EBIM program isn't to standardize data for the sake of data but to remove obstacles to the synthesis of meaningful information, to empower the analysis of trusted data, and to assist decision-makers in arriving at decisions using evidence-based facts in context with their business processes and business drivers. These goals reflect not only the goals of this program but also the transformative power of data for the US Federal Government, as the Under Secretary of Commerce for Standards and Technology, Patrick Gallagher, stated at a symposium for public and private sector business and technology experts: "[we] are really looking at a new paradigm, a place of data primacy, where everything starts with the consideration of the data rather than consideration of the technology. This is a real shift from the way [the business of government has] historically thought about this... we've moved to designing these technologies around the data and in some cases defining the problems around the data". The Data Steward Annual Report Recommendations that follow are based on the research conducted after a year of activity within the EBIM program and are compiled into two tightly interconnected sections reflecting the difference in breadth and depth of the work needed to resolve them. These Recommendations reflect a balance of this Bureau's business reality and a continuous investigation of global data benchmarks, not solely as a platform for meaningful information but to better understand the tidal shift from information scarcity to information excess and to view data in this way: "[d]ata are becoming the new raw material of business: an economic input almost on a par with capital and labor"[12].
Meaningful Information is the New Currency of Business

In the Metaphysics, Aristotle concluded that synergies existed where the "totality [was] not, as it were, a mere heap"[13] but where the whole is greater than the sum of the parts. This was a pivotal insight: it provided a greater sense of truth because, for Aristotle, it identified the proper granularity of decomposition. At a smaller scale but in much the same way, after an inaugural year of implementation, the three components of the EBIM program represent not just a decomposition of the attainment of meaningful information but a truly synergistic effect.
[12] "Data, data, everywhere", The Economist, http://www.economist.com/node/15557443
[13] Metaphysics, Book VIII, Aristotle, http://classics.mit.edu/Aristotle/metaphysics.8.viii.html
The three components of the EBIM program – Data Governance, Data Standardization, and Information Repositories – align with the concepts of knowledge management: people (governance), process (standardization), and technology (information repositories). This was a purposeful alignment: data isn't solely a data management problem but also a knowledge management opportunity. The structure of the EBIM program and the methods used to implement it have, since its inception, revolved around the management of knowledge, systems thinking, quality management, and root cause analysis – disciplines necessary to understand context, meaning, and purpose. As the EBIM Information Repositories section clearly states, the Bureau's metadata must be managed with precision and within a world-class metadata management framework; for this reason, the EBIM program has based its metadata management framework on the time-proven and globally implemented ISO 11179 specification. But even so, the EBIM program doesn't end with metadata or data; it ends with knowledge, and knowledge for this Bureau is materialized through a Business Intelligence and Analytics program that has made great strides in demystifying and articulating Business Intelligence principles and goals.

As a future information-centric organization, this Bureau is investing in the tools, talent, and techniques that bridge where we are today with where we want to go in the future. This takes leadership: to steer executive discussions towards becoming an information-centric organization; to corral Department-level issues relevant to the primacy of data and to the synthesis of meaningful information, not just for this Bureau but with an eye towards the US Federal Government; to be properly funded with an appropriate business support team that interacts seamlessly with users across all lines of business; to be accountable for the direction of standardization, transparency, and open government data; and to align business and IT initiatives around information as a strategic asset. For these reasons, the Data Stewards make these recommendations:

Recommendation #1: Identify an Executive Champion for Strategic Information Stewardship

Global private-sector information-centric organizations like the Fortune 500 Top 20, as well as a few US Federal Government agencies, continue to invest in the executive establishment of a Chief Data Officer (CDO)[14] – a single seat of authority that is staffed and budgeted to be accountable and responsible for the strategic direction of data, analytics, and meaningful information. In a similar way, the benefits of an Executive Champion resemble the proven, consistent successes of a CDO in private-sector organizations.
The benefits of this recommendation include: an end-to-end, strategically focused data and information management program, including best practices, process controls, resource allocations, and priorities; business and IT alignment around information as a strategic asset; a focus on data quality, reduced operational costs, enhanced resource utilization, the required lineage and traceability, and greater overall efficiency in the management and consumption of information assets across this Bureau; more accurate and timely decision-making, which would allow this Bureau to reduce risk, lower regulatory capital requirements, be lean, and capitalize on new business opportunities; and finally, because the Executive Champion would lead the strategic information management program to examine and understand the business from multiple perspectives, the ability to transform cost centers into shared service centers by relying on meaningful information to proactively address emerging opportunities and to improve customer experience in order to increase loyalty and to create and deliver new services across channels[15]. This recommendation is intended to establish an Executive Champion who is accountable for the above and for compelling standardization for this Bureau, whether through an existing Executive forum or a new one, for not only the
[14] "Making the Case for a Chief Data Officer", Michael Vizard, http://www.itbusinessedge.com/cm/blogs/vizard/making-the-case-for-a-chief-data-officer/?cs=46314
[15] "The Role of the Chief Data Officer in Financial Services", Cap Gemini Insights and Resources, June 2012, http://www.capgemini.com/insights-and-resources/by-publication/the-role-of-the-chief-data-officer-in-financial-services/
data/information architecture but also for other closely linked architectures, including the infrastructure architecture and the technical architecture.

Recommendation #2: Establish a Cross-Government Data Standardization Forum

During the last decade there has been a growing demand from the public to understand federal financial data and how and where tax dollars were spent; providing transparency into federal finances has become one of the greatest challenges facing government managers today. However, the lack of consistent data structures across agency-managed financial systems contributes to an inability to provide government-wide transparency. In addition, the significant variation in data structures inhibits the government's ability to achieve economies of scale through the cost-effective use of shared service solutions and the consolidation of financial management activities at a department or government-wide level. Jointly with the Data Stewards program, the Treasury Office of Financial Innovation and Transformation (OFIT) is engaged in a data normalization initiative to transform data from disparate agency systems into existing standardized data formats, contents, and definitions. The recommendation is that the Executive Champion sponsor this work and, collectively with the Data Stewards and with OFIT, spearhead an appropriate governance forum to deliberate, discuss, and decide transparency and financial management issues across the US Federal Government, with other international public sector partners, and with industry experts.
Evidence-Based Decision Making

When questioned about his knowledge of certain facts related to Iraq, former US Secretary of Defense Donald Rumsfeld responded in this way: "there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know."[16] Mr. Rumsfeld's statement frames a certainty-to-uncertainty gap: what "we know" is microscopic when compared to what "we don't know". His statement applies as much to the private sector as it does to the public sector. Great organizations aren't rewarded solely for making decisions based on "what they know"; they are rewarded for making decisions in spite of it, through a relentless pursuit of understanding the ecosystem in which they function. In the book How the Mighty Fall, the author writes, "[e]very institution is vulnerable, no matter how great. No matter how much you have achieved; no matter how far you have gone; no matter how much power you have garnered, you are vulnerable to decline. There is no law of nature that the most powerful will inevitably remain at the top; anyone can fall and most eventually do."[17]

The bridge between what "we know" and what "we don't know" isn't simply one to cross; it is one to build carefully and systematically, with precision and accuracy. That bridge needs to withstand the certainties of the environment - the strategic vision and goals and transparency in an austere budget climate - along with the uncertainties of a larger ecosystem - another global catastrophic financial failure. That bridge needs to withstand the ordinary and the extraordinary and span the tried with the untried. For this Bureau, when it comes to building an information-centric bridge, that bridge is based on the first two recommendations above but is also reinforced by the following recommendations.

Recommendation #3: Collaborate with the Chief Security Officer (CSO) and Chief Information Security Officer (CISO) to Develop Meaningful Data Security Policies
[16] US Department of Defense, Former US Secretary Donald Rumsfeld, Remarks at a DoD News Briefing, http://www.defense.gov/transcripts/transcript.aspx?transcriptid=2636
[17] "How the Mighty Fall", Collins, Jim, http://www.amazon.com/How-The-Mighty-Fall-Companies/dp/0977326411
Establishing meaningful data security policies for an information-centric organization is not only a requirement to protect data but also a mission-related strategy for this Bureau and for the US Federal Government. As he clearly articulated in the "National Strategy for Information Sharing and Safeguarding"[18], President Obama stated, "This [document] aims to strike the proper balance between sharing information with those who need it to keep our country safe and safeguarding it from those who would do us harm. While these two priorities—sharing and safeguarding—are often seen as mutually exclusive, in reality they are mutually reinforcing. This Strategy, therefore, emphasizes how strengthening the protection of classified and sensitive information can help to build confidence and trust so that such information can be shared with authorized users." This recommendation is meant for stakeholders to approve the expansion of the EBIM program so that two enterprise functions work jointly and directly with the Executive Champion to govern data security policies: 1) security policy management and enforcement, from the Chief Security Officer and Chief Information Security Officer; and 2) the work of the Data Stewards, the safekeepers of enterprise data standardization and metadata management.

Recommendation #4: Conduct an Information Repository Maturity Assessment (Joint Partnership by Data Stewards and Business Intelligence and Analytics Division)

Since the Data Stewards have embraced the business direction to share data among the Information Repositories and the FIR by establishing an ongoing Technical Subcommittee to continue the build-out of the information-sharing platform, it has become evident that the enterprise Information Repositories are at various stages of maturity. Earlier this past year, the Business Intelligence and Analytics Division surveyed senior management and discovered that the lines of business were investing in talent and tools to extract meaningful information from their repositories. While this survey shed light on the data access area of Information Repositories, this recommendation seeks to shed light on the data staging and presentation areas of Information Repositories. Meaningful information cannot be trusted or synthesized if the underlying metadata, processes, and platforms have not been tuned to optimize both enterprise data quality and query performance management. The recommendation is that the Data Stewards, following the lead of the Executive Champion, conduct an Information Repository Maturity Assessment as soon as possible, jointly with the Business Intelligence and Analytics Division. The Data Stewards will report on the assessment to the Executive Champion and will periodically reassess the maturity progress of the Information Repositories.

Recommendation #5: Establish an Ongoing Information Repository Forum to Share Information-Centric Concepts like Architecture, Analytics, and Techniques (Joint Partnership by Data Stewards, Business Intelligence and Analytics Division, and Business Integration Division)

Historically, the successes of this Bureau have largely been operations based, meaning operational activities like processing receivables, issuing payments, or producing statutory documents.
To be an information-centric organization, the future successes of this Bureau will be based on an information-savvy strategy that exploits undiscovered analytical opportunities, enhances current services and operating models, and establishes new offerings and services. These strategic initiatives should be driven by an Executive Champion but implemented by an appropriate forum that is uniquely qualified to interpret those initiatives for the Information Repositories, to measure the success or failure of those initiatives and report back to the Executive Champion, and to be a continuous part of the dialogue towards those information-centric strategic decisions. This recommendation is meant for stakeholders to approve this work so that the Data Stewards, working in conjunction with and under the direction of the Executive Champion, are accountable for implementing information-centric concepts like architecture, analytics, and techniques to complete this work and partner with the Business
[18] "National Strategy for Information Sharing and Safeguarding", http://www.whitehouse.gov/sites/default/files/docs/2012sharingstrategy_1.pdf
Intelligence and Analytics Division, Business Integration Division, and subject matter experts at the Federal Reserve Banks, financial agents, and industry experts.
Message from the Chief Business Architect

Data. It is everywhere. It is inherently in every system that operates within our Bureau and inherently in every document that supports those systems. It is in email, it is in social media, it is at work, it is at home. It is ubiquitous. Data even spawns more data. It is like a virus - it can spread anywhere, at any time, in any place, under any condition. It is resilient. And as unmanageable as it might seem, our responsibility is to manage data. In fact, our responsibility is to capitalize upon it.

In truth, data did not magically appear as a critical enterprise asset. In a sense, the importance of data has been apparent since the dawn of corporate business. By the early twentieth century, Frederick Taylor, the father of scientific management, was so enamored with data that its lure led him towards industrial efficiency and world renown. Even though data (as a science) has been a part of corporate business for over a century, it wasn't until the last sixty years that the true value of data was within reach. By then, technology had matured enough to take advantage of volumes of data, thus heralding the age of information. Taylor wasn't amassing data for data's sake; he was amassing data to synthesize meaningful information for business use. There is nothing new about transforming vast amounts of data into meaningful information. The value of this kind of transformation, as Taylor demonstrated, has been a corporate benefit for at least a century. The difference for any modern corporate business isn't improving
upon the data-to-information transformation; the difference is the speed and scale at which that transformation occurs. And optimizing that transformation is the difference between organizations that succeed and ones that fail, between organizations that will last and ones that will crumble. To be successful at optimizing the transformation from data to meaningful information, organizations must embrace three simple rules: data must be governed; data must be managed; and data must retain context. These three rules are simple, yet they contain a great deal of complexity hidden underneath that veil of simplicity. This is why the work of the Enterprise Business Information Management (EBIM) program is critically important to the future of this Bureau. It is not by chance that the EBIM program has these same three simple rules at its epicenter: data is governed via the Data Stewards; data is managed via an industry-standard metadata management framework; and data context is preserved via the Data Registry. I believe that this Bureau is in a much better strategic position to uncover meaningful information because of the success of the EBIM program. There is still much to do - we are at the beginning of this transformative journey, and there are still important matters for our leadership to decide about the implementation of the EBIM program. I remain as optimistic as ever that, as an organization, we have opportunities to be innovative, to find efficiencies, and to be resourceful with the kind of meaningful information that can be synthesized from our data.
Michael J. Murray
Chief Business Architect
Bureau of the Fiscal Service
Enterprise Business Information Management

In 2010, the legacy Financial Management Service (FMS) Executive Board (EB) approved the Enterprise Business Information Management (EBIM) program as submitted and presented by the Enterprise Architecture Division (EAD) within the then-named Business Architecture Office (BAO) Assistant Commissioner (AC) line of business. Today, the AC accountable and responsible for the EBIM program is the Chief Information Officer (CIO) within the Information and Security Services (ISS) line of business, specifically the Business Information Architecture Branch (BIAB) within the EAD, under the guidance of the Chief Business Architect[19].

Figure 1 Enterprise Business Information Management (EBIM) program
As the above diagram illustrates, the EBIM program consists fundamentally of three components: Data Governance, Data Standardization, and the Information Repositories. At the time of the EB approval only the Financial Information Repository (FIR) was cited (this was expanded to include all Information Repositories). In tandem, these three components reflected the best thinking at that time to better understand how data was to be managed as an enterprise asset, what data was to be managed, and why the future of FMS as an analytical service provider was inextricably linked to the management of data.
Data Governance

Background

Of the three components of the EBIM program, the Data Governance component required a commitment from each of the legacy FMS lines of business to submit a representative (or "Data Steward") who would represent their line of business on equal footing with representatives from the other lines of business. First and foremost, the EBIM Data Governance component was to be fulfilled by business analysts as opposed to data architects[20]. This decision propelled the component to focus on the business nature of data rather than solely on the technology-
[19] For more information on the Pre-EBIM Background, please refer to the Appendix.
[20] Forrester, "Best Practices: Establish Your Metadata Plan", Peyret et al., http://www.forrester.com/Best+Practices+Establish+Your+Metadata+Plan/fulltext/-/E-RES57055?docid=57055
nature of data. In effect, the EBIM Data Governance component would establish - and could only be successful as - the bridge between business-specific and technology-specific perspectives.

Governance Framework

The governance framework within EBIM Data Governance has been defined unilaterally as: the process of decision-making (decision rights and accountability) and the process by which decisions are implemented (business alignment, measured)[21]. From this definition, the EBIM program identified three fundamental characteristics that are still employed today: (1) to resolve conflicts and make decisions - authority; (2) to discuss and describe the proper functioning and acceptance of those decisions - legitimacy; and (3) to invoke the efficacy of, and achievement of, consensus - participation[22]. These three characteristics - authority, legitimacy, and participation - have remained critical to the success of the governance program, not only because of their broad appeal to any governance program but because they represent the true value of the governance component to the EBIM program. Authority was granted via the legacy-FMS EB approval of the EBIM program; legitimacy was earned via the discussions that surrounded the standardization of content; and participation was implemented throughout the process to ensure that all decisions were reached through consensus.

EBIM Data Governance

The working definition of EBIM Data Governance is seen through the lens of these three characteristics - authority, legitimacy, and participation - but in much finer detail. EBIM Data Governance is defined as a strategic initiative involving multiple business lines and is the operating discipline for managing data and information as a key enterprise asset. This operating discipline includes the organization, processes, and tools for establishing and exercising decision rights regarding the valuation and management of data[23]. The key principle within this definition is the emphasis on the management of enterprise operational data and information. At the time of the legacy-FMS EB approval to implement the EBIM program, the relevant mandate was to define the common data elements uniformly and to ensure the implementation of those common data elements. With this specific focus, the EBIM Data Governance component would provide definitions of common data elements (including data exchanges, business rules, business processes, business glossary, and metadata) that would be arrived at via consensus. Because of this, the key metric of the success of the EBIM Data Governance program was and continues to be consensus-based interoperability, which would be achieved both by implementing existing industry standards and by creating industry-strength data standards. The value of the focus on consensus-based interoperability is that it constrains the work, output, and documentation of the EBIM Data Governance program. Without this kind of constraint, the value of the program would be immeasurable, interminable, and would clearly lack the focus and discipline needed to return meaningful value to the EBIM stakeholders, who included the Chief Business Architect (CBA) and each AC that nominated a Data Steward.
[21] United Nations Economic and Social Commission for Asia and the Pacific (ESCAP); and Gartner, "Gartner Defines 'Governance'", 3 September 2012 (G00237914)
[22] United Nations Development Programme - Latin America, governance defined.
[23] Based on the Data Governance definition from the National Association of State Chief Information Officers (NASCIO). NASCIO Data Governance Series Part 1: http://www.nascio.org/publications/documents/NASCIO-DataGovernance-Part1.pdf
EBIM Data Stewards

The representatives from the legacy FMS lines of business within the governance component of the EBIM program are called "Data Stewards". A Data Steward is defined as a person delegated the responsibility for managing a specific set of data resources[24]. While this definition is broad and has its roots in, and serves, the international data standardization community, members of the Data Steward program refined it for the Bureau of the Fiscal Service (BFS). These refinements include that a Data Steward is responsible for representing the interests of their line of business and should be business-focused, with business-related responsibilities. In addition to being business-focused, a Data Steward must be multi-lingual: speak the language of business and know the language of technology. The main responsibility of a Data Steward is to represent the interests of their line of business. By doing so, not only could consensus-based interoperability be achieved within the governance component, but a steward could, through process, collaboration, and a steadfast weekly meeting schedule, participate within a forum that ensures the sharing and understanding of their business processes and business data needs with the other Data Stewards.

EBIM Data Governance Progress and Update

After a year's worth of Data Stewards work, the governance framework based on the three characteristics - authority, legitimacy, and participation - has remained relevant. It is anticipated that these characteristics will continue to influence the future of the Data Stewards' activity, although two primary events in 2012 related to the FMS and BPD consolidation have brought them into question: 1) the consolidation has widened the potential reach of all three characteristics (this would expand the breadth of the Data Stewards into the BPD lines of business); and 2) the consolidation has introduced an exploratory enterprise-wide task to review the overall organizational governance (this would impact the reporting of the Data Stewards' activity). Consolidation aside, the work of the EBIM Data Governance component has been extremely successful. The work of voting on and approving content began immediately on 9 January 2012 and continued unabated until 17 December 2012, exceeding the original goals of the program, which were to approve two large data exchanges. While the "Anticipated Schedule of Activity" document for the Data Stewards was modified throughout the year, it reflected an ongoing desire from the lines of business to standardize content and to conform to the standardized content. The table below lists the lines of business that submitted data exchanges this past year.

Table 1 Distribution of Submitted Content
Line of Business    Number of Data Exchanges Submitted
GWA                 6*
DMS                 3*
PM                  2
RCM                 2

*Includes content not yet approved by the Data Stewards.
The above table reflects a healthy desire from the lines of business to have their content reviewed by the Data Stewards for standardization. This past year, every data exchange submitted by a line of business for review and approval by the Data Stewards involved a change to the original submission; for some lines of business those changes were more significant than for others. In other words, every submission had to modify
[24] International Organization for Standardization (ISO) "Data Steward" definition.
some of its data elements in order to conform to the data standards as approved by the Data Stewards. This was a critical step towards the reusability of common components and ensured a path towards interoperability. To better understand the diagram below, which illustrates the distribution of responsibility across the lines of business at the data element level, here is an example: within the Payment Management (PM) line of business, two data exchanges were submitted for review - the Treasury Disbursing Office (TDO) payment request detail record interface (payment request) and the Non-Treasury Disbursing Office (NTDO) payment detail record interface (paid payment). Each of these data exchanges contains several data elements with a great deal of similarity; included within those data exchanges are the Treasury Account Symbol component data elements[25][26], which are stewarded by GWA, the line of business responsible for them. This is an example of a line of business (PM in this example) data exchange that contains data elements managed by another line of business (GWA in this example).

Figure 2 Distribution of Responsible Business Lines
[Pie chart: Information and Security Services 32%, Revenue Collections Management 30%, Government-wide Accounting 20%, Debt Management 12%, Payment Management 6%]
As is evident from this diagram, Information and Security Services (ISS) stewards (is responsible for) the highest percentage of standardized data elements. This is due to two fundamental reasons: 1) ISS is responsible for ensuring that highly reusable data elements like "Country Code", "Currency Code", or "DUNS Number" are properly managed and in alignment with industry standards as managed by ISO, the World Wide Web Consortium (W3C), or Dun & Bradstreet (D&B); and 2) ISS is responsible when several lines of business use a common non-industry-standard data element like "Identifier", "Bank Account Number", or "Error Code" in their data exchanges. This annual diagram will evolve every year as more content becomes standardized: if more content from a specific line of business is submitted, the responsibility of that line of business to "steward" its data elements increases. Another conclusion to draw from this diagram is that it reflects the level of commitment by each line of business towards the standardization of content - the higher the percentage, the greater the dependency on standardization.
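For readers who want to see the mechanics behind a chart like Figure 2, the short sketch below shows one way such a distribution could be tallied from a list of approved data elements, each tagged with its responsible (stewarding) line of business. It is purely illustrative: the element names, steward assignments, and the helper function are hypothetical and are not drawn from the Data Registry or from the Data Stewards' actual tooling.

    # Hypothetical sketch (not the Bureau's actual tooling): tallying a Figure 2-style
    # distribution of responsible business lines from approved data elements.
    from collections import Counter

    # Illustrative sample only; element names and steward assignments are invented.
    approved_elements = [
        {"name": "Treasury Account Symbol", "steward": "GWA"},
        {"name": "Country Code",            "steward": "ISS"},
        {"name": "Currency Code",           "steward": "ISS"},
        {"name": "Payment Amount",          "steward": "PM"},
        {"name": "Debt Referral Date",      "steward": "DMS"},
        {"name": "Collection Channel Code", "steward": "RCM"},
    ]

    def steward_distribution(elements):
        """Return each responsible line of business's share of data elements, as a percentage."""
        counts = Counter(e["steward"] for e in elements)
        total = sum(counts.values())
        return {lob: round(100 * n / total, 1) for lob, n in counts.items()}

    print(steward_distribution(approved_elements))
    # {'GWA': 16.7, 'ISS': 33.3, 'PM': 16.7, 'DMS': 16.7, 'RCM': 16.7}

Run over the full set of Data Stewards-approved elements rather than this invented sample, the same tally would yield the percentages shown in Figure 2.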
[25] "Memorandum for all CFOs and Deputy CFOs", 14 January 2011, http://www.fms.treas.gov/ccmm/final-CFO-letter-01-14-11.pdf
[26] The GWA Treasury Account Symbol component is itself an example of a "Class" datatype, or aggregate data element, because it is composed of multiple underlying data elements. http://fms.treas.gov/registry/TreasuryFiscalService/2519/index.html
The key metric that directly reflects the work of the Data Stewards is the measurement and management of consensus-based interoperability. The diagram below reflects the level of interoperability by quantifying the distribution of reusable data elements within all of the data exchanges approved this past year.

Figure 3 Component Reusability Index
[Pie chart: Highly Reusable Content 20%, Medium Reusable Content 26%, Low Reusable Content 53%]
Prior to the work of the Data Stewards, the "Low Reusable Content" percentage was much higher, reflecting a fractured, non-centralized, non-governed approach to the exchange of data between the systems of the different lines of business. Even though the past efforts of the Enterprise Data Architecture (EDA) team, which was principally responsible for the standardization of data elements via XML data exchanges, were successful in standardizing content, that content was never brought to a governance body such as the Data Stewards, where all lines of business could assure its reusability and interoperability. In other words, prior to the Data Stewards meetings, there was no governance forum in which to discuss the sharing of data elements. Perhaps more significantly, before this analysis the extent of any reusability among the data exchanges reviewed this past year was simply unknown. The analysis by the Data Stewards to identify reusable content was only possible because each steward reviewed each and every data element via a methodical process[27]. Even more revealing, even under the rigor of the methodical multi-step process to identify sameness and difference (semantic equivalence) between data elements, just under half of the 1600 data elements reviewed this past year were categorized as either "Highly Reusable" or "Medium Reusable", indicating the right trend toward achieving interoperability. The skew towards a high percentage of "Low Reusable Content" is attributed to two factors: 1) some content will not be reused (single-purpose data elements are common in industry as well as in government); and 2) some content remains to be reused (the evolution of standardization within a line of business will, as it matures, reuse single-purpose data elements). The deliberation of semantic equivalence of data elements was mostly negotiated within the pre-voting phase between the submitting content provider - typically, the voting and alternate Data Stewards submit content from one of their line of business systems along with their business process or data subject matter experts - and the BIAB team. In
27. Explained further in the EBIM Data Standardization section.
In other words, by the time the Data Stewards were ready to vote on the data elements within a data exchange, the data elements had been deeply scrutinized and broadly vetted for reusability.28 The Data Stewards meetings, first and foremost, have provided an exchange of ideas based on business processes and business data requirements towards the purpose of achieving consensus-based interoperability. Achieving interoperability isn't solely a technical metadata mandate; it is also, and more importantly, a business-driven mandate, so that the outputs of the Data Stewards make clear and unambiguous the meaning of terms, definitions, rules, and relationships that are an integral aspect of standardization. As an example of the importance of reaching business-based consensus on interoperability, the diagram below depicts the Subcommittees created this past year for the purpose of identifying and better understanding a particular issue.
Figure 4 Data Steward Technical Subcommittees in 2012
The Subcommittees provided an appropriate forum to bring in subject matter experts from this Bureau (as is evident from the joint partnerships formed this past year), from the Federal Reserve Banks, from other financial institutions, and even from industry, so that their recommendations could be pushed forward to the general Data Stewards for approval or rework. The two Technical Subcommittees (XML Namespaces and Conformed Dimensional Modeling) were able to reach unanimous consensus (with active participation by ISS, GWA, RCM, PM, and DMS) and propose recommendations that were subsequently approved unanimously by all voting Data Stewards. As implementers of the EBIM Data Governance component, the Data Stewards and the subcommittees that have been spawned this past year have been, and continue to be, accountable and responsible for achieving consensus-based interoperability. As has been evident throughout the year, this has not been a simple matter and has involved several subject matter experts injecting insight into the use of and purpose for business processes and business data.
28. Please see the Appendix for more information on the tasks and responsible parties involved in standardizing content.
As a testament to the importance of consensus-based interoperability, 95% of all voting decisions by the Data Stewards were unanimous. The few decisions that weren't unanimous were very technical in nature and required dedicated time to carve the seemingly amorphous issues into understandable process and data segments.29
Lastly, the Data Stewards are at the center of the EBIM program. The Data Stewards are the caretakers of data standardization: they represent their lines of business judiciously and willingly, participate actively in approving content, and are responsible for ensuring the implementation of that content.
Data Standardization
As EBIM Data Governance is best understood when viewed as an underlying platform for the exchange of ideas based on business processes and business data, EBIM Data Standardization is best understood when viewed as a tightly integrated information framework. In essence, a good, solid data standardization program becomes the "Rosetta Stone" for understanding any organization, sector, or industry.30
Background
In a speech last year, Secretary Panetta summarized what was commonly understood by his followers and by the media to be his "pet project" in this way: "We live in a global world...Languages are the key to understanding that world. If we are going to advance stability in some of the countries we are fighting in today, we have to be able to understand what motivates those countries, what motivates their people, and to understand their culture, beliefs, faiths, ideologies, hatreds and loves."31 Secretary Panetta concluded that language and cultural training are critical to the nation's economic, diplomatic, and security interests. Even though he cited the importance of language as a strategic interest to the nation, Secretary Panetta wasn't just referring to knowing the content of language; his speech was instead referring to the importance of knowing the context of language. In fact, there is nothing that humans communicate that does not have context.32 Without the right context of language and culture, it would be impossible to understand the nuance of what another is trying to say or to write. Without the right context, we would be, quite literally, unable to communicate.
Translating between languages is hard, never fully accurate, and always fraught with unintended consequences. The reason is that good translation is more than just translating words; it has to include and account for the context of culture, the context of the individual speaking and the context of the individual listening, the context of
29. Data Steward Revenue Collections Management/TRS data standardization discussions concerning "Account Number" and "Account Classification" data elements.
30. Federal Enterprise Architecture, "Data Reference Model", http://www.whitehouse.gov/sites/default/files/omb/assets/egov_docs/DRM_2_0_Final.pdf
31. US Department of Defense, US Secretary Leon Panetta Remarks at the Defense Language Institute Foreign Language Center, http://www.defense.gov/news/newsarticle.aspx?id=65118
32. "Human Communication: Principles, Context, and Skills", Book, et al, http://www.amazon.com/Human-CommunicationPrinciples-Context-Skills/dp/0312398492
the action and the interaction, the context of the environment, the context of the surroundings, and the context of the situation. The challenge of understanding any language isn't solely in the translation of words; rather, it is in the maintaining of context. Secretary Panetta knows this and asserts that understanding language and culture is in our national interest, on par with military might, economic power, and political diplomacy.33
Context is to human communication what metadata is to digital communication. There is simply no substitute for context in successful human communication, and there is no substitute for metadata in successful digital communication. For human communication or digital communication, decisions rely upon context, knowledge needs context,34 information requires context, and facts are engulfed in context. Just like the critical importance of context in language, metadata carries the essence of digital communication by supplying the meaning of terms, the purpose of terms, the rules for terms, and the relationships between terms. In speaking and in writing, humans cannot communicate without knowing the right context. In producing content (e.g., analytical reports) or in consuming content (e.g., data exchanges), application systems cannot communicate without knowing the right metadata.
Metadata Management
Because of the strategic value, broad applicability, and utility of metadata, different industries and global associations have defined metadata differently. For this reason, the EBIM program has based its definition of metadata on that of the American National Standards Institute (ANSI) National Information Standards Organization (NISO) – a cross-industry and cross-government consortium of organizations – which states that "metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource."35 This broad definition is valuable to both industry and government because it provides a roadmap to better understand information content that is deemed high-value and high-worth. Metadata isn't just about having appropriate categories to better understand a data element; in fact, it is about dissecting and decomposing anything to better understand the parts of any whole, whether that "whole" is a data element, a business process, an IT service, a government regulation, an artifact, or any concept. Without metadata, it would be impossible to understand anything in the digital world. Humans need context to understand meaning and purpose; not unlike humans, machines need metadata to understand meaning and purpose.
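A small, hypothetical illustration of this point: the same raw value is unusable by a consuming system until metadata supplies its meaning and format. The element definition shown is an assumption for illustration only.

```python
from datetime import datetime

# Without metadata, a raw value is ambiguous to a consuming system:
# is "20120930" a date, an identifier, or an amount in cents?
raw_value = "20120930"

# Hypothetical metadata for the element supplies the missing context.
element_metadata = {
    "name": "Fiscal Year End Date",
    "definition": "The last day of the reporting fiscal year.",
    "datatype": "date",
    "format": "YYYYMMDD",
}

def interpret(value, metadata):
    """Interpret a raw value using its metadata; only the date rule is sketched here."""
    if metadata["datatype"] == "date" and metadata["format"] == "YYYYMMDD":
        return datetime.strptime(value, "%Y%m%d").date()
    raise ValueError("no interpretation rule for this metadata")

print(interpret(raw_value, element_metadata))  # 2012-09-30
```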
The management of metadata has a rich history of implementations in industry and government world-wide. Recently, the global importance of metadata management has been spurred by several factors, including (but not limited to): better understanding of the global economic recession and better insight into the potential
33. US Department of Defense, US Secretary Leon Panetta Remarks at the Defense Language Institute Foreign Language Center, http://www.defense.gov/news/newsarticle.aspx?id=65118
34. Please see the Information Repositories section for more information.
35. ANSI NISO, "Understanding Metadata", http://www.niso.org/publications/press/UnderstandingMetadata.pdf
solutions to avert this catastrophe in the future; insight into global sustainability and the unintended consequences of the factors that influence it; and better use of technologies like Master Data Management and Big Data techniques by marketing agencies for slicing and dicing customer traits and behaviors. Each of these factors is fundamentally based on the management of metadata. The most common, successful technique for managing metadata has been the ISO 11179 Metadata Registry standard (ISO 11179). In fact, ANSI NISO recommends the use of ISO 11179 for metadata management.36 The United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT), which crafts data exchange standards for global commerce, has based its entire metadata management on ISO 11179. Likewise, the Universal Business Language (UBL), an international governance body with members from global cross-industry groups, academia, vendors, and government, has based its specification on ISO 11179.37 The breadth of global support for ISO 11179, and the depth and breadth of the standard as a metadata management framework, is incomparable to any other means of metadata management. For these reasons, the EBIM program has based its metadata management framework upon ISO 11179.
ISO 11179 wasn't designed for a specific industry or solely for an academic purpose; instead, it was designed from the beginning to be implemented in any industry, in any sector, in any language. The only constraint, other than technology-imposed constraints, is the ability to understand in breadth and depth the purpose of metadata management. ISO 11179 provides a framework to unequivocally and systematically categorize any content; embed business rules in situ or at any aggregate level; extend the framework for unanticipated causes; implant rigor and discipline in the complex management of code listings (value domains and taxonomy); and surround every managed item within a tightly controlled environment. The value of ISO 11179 as a metadata management framework isn't control; rather, it is resiliency. By decomposing the meaning and purpose of an ISO 11179 administered item (for example a data element, a business glossary term, a business rule, an information asset, a strategic objective, or a business process) within a framework, it is possible to better understand any administered item in its intended use and in its unintended use. The challenge for managing data isn't solely to manage what has been discovered and specified but to be pliable and agile yet durable and robust for the undiscovered, for the uncertain, and for the unrevealed. Relying on ISO 11179 affords the EBIM program the framework to view the management of data not as a data management problem but as a knowledge management opportunity. The data standardization perspective within EBIM (and supported by the metadata management framework) is that data isn't intended only for atomic, transactional purposes.
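As a minimal sketch of the kind of structure ISO 11179 prescribes (a simplified illustration, not the Registry's actual schema), a data element can be treated as an administered item that pairs a concept with a value domain, each carrying its own identifier, definition, and steward:

```python
from dataclasses import dataclass, field

@dataclass
class ValueDomain:
    """A managed code listing of permissible values (simplified ISO 11179 notion)."""
    name: str
    permissible_values: list

@dataclass
class AdministeredItem:
    """Administration details every registered item carries."""
    identifier: str
    definition: str
    steward: str

@dataclass
class DataElement(AdministeredItem):
    """A data element pairs a concept with a representation (its value domain)."""
    concept: str = ""
    value_domain: ValueDomain = field(default_factory=lambda: ValueDomain("", []))

# Hypothetical example: a Country Code element stewarded by ISS and aligned
# with an external industry standard (ISO 3166).
country_code = DataElement(
    identifier="DE-0001",
    definition="The ISO 3166-1 alpha-2 code identifying a country.",
    steward="ISS",
    concept="Country",
    value_domain=ValueDomain("ISO 3166-1 alpha-2", ["US", "CA", "GB"]),
)
print(country_code.steward, country_code.value_domain.name)  # ISS ISO 3166-1 alpha-2
```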
36. ANSI NISO, "A Framework of Guidance for Building Good Digital Collections" (A NISO Recommended Practice), http://framework.niso.org/
37. Universal Business Language, "The Universal Business Language", Bosak, Jon, http://www.ebxml.eu.org/Documents/The_Universal_Business_Language.ppt
Yes, strict data guidelines are necessary to complete a business transaction – the work of the Data Stewards ensures this kind of operational guidance. However, the use of data to generate facts, to be synthesized into meaningful information, and to be culled for knowledge and better decision-making has a great dependency upon the metadata management framework.38 This lens provides the EBIM program a holistic view of information – not just data at an atomic level (defining metadata only for a field) but also the ability to make links between fields, to make anticipated and unanticipated connections across records, and, when fully mature, to infer relationships among other, higher levels of abstraction like information visualizations, data exchanges, and business intelligence reports. To be precise, the very essence of the meaning and purpose of data is to systematically enable the synthesis of meaningful information – and this would not be a reality without the agility and rigor of ISO 11179 as the EBIM metadata management framework.
Data Registry
The EBIM Metadata Management Framework includes two critical perspectives: 1) metadata acquisition (or collection) and management (Metadata Management); and 2) the reporting of the metadata in context (Data Registry). The Data Registry is the web-based solution for reporting the metadata of approved Data Steward content, which this past year has included data exchanges, data elements, value domains (code listings), business glossary terms, responsible stewards, business ownership (identifying the industry standard that manages a data element), and other meaningful metadata for better understanding approved content. The EBIM Data Registry is in alignment with several industry and US Federal Government data registries, such as the National Information Exchange Model (NIEM) Data Registry and the US Environmental Protection Agency (EPA) Data Registry. This alignment is an important criterion for achieving interoperability between industry and government and for beginning to eradicate silos. The below screen snapshot provides the breakdown in functionality of the index listing.
38. Please see the EBIM Information Repositories section to better understand the relationships between data, facts, meaningful information, and knowledge.
Figure 5 Screen snapshot of the Data Registry.39
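The sketch below is a hypothetical rendering of the kind of index entry the Data Registry reports for approved content, together with a simple lookup; the field names are illustrative assumptions, not the Registry's actual layout:

```python
# Hypothetical index entries, loosely modeled on the kinds of metadata the
# report says the Registry carries for approved Data Steward content.
registry = [
    {
        "element": "Currency Code",
        "data_exchange": "Payment File",
        "value_domain": "ISO 4217",
        "business_owner": "ISO",            # industry standard that manages the element
        "responsible_steward": "ISS",
        "glossary_term": "Currency",
    },
    {
        "element": "Treasury Account Symbol",
        "data_exchange": "GWA Reporting",
        "value_domain": "TAS components",
        "business_owner": "This Bureau",
        "responsible_steward": "GWA",
        "glossary_term": "Account",
    },
]

def elements_stewarded_by(steward):
    """List the registry elements for which a given line of business is responsible."""
    return [entry["element"] for entry in registry if entry["responsible_steward"] == steward]

print(elements_stewarded_by("ISS"))  # ['Currency Code']
```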
EBIM Data Standardization Progress and Update
Of the three EBIM components – Data Governance, Data Standardization, and Information Repositories – the EBIM Data Standardization component is the most mature. Several factors leading to the current standardization component are described here: 1) several past failed attempts within legacy FMS at enterprise metadata management have provided lessons-learned opportunities; 2) research analysts like Forrester and Gartner continue to deliver high-value recommendations for the proper governance of data and the importance of metadata management; 3) an ongoing global public sector trend to incorporate open industry standards;40 4) a current trend for organizations to maximize the use of data via technology solutions like Business Intelligence (BI), Master Data Management (MDM), and Big Data;41 and 5) managing for information-centric and shared-platform innovation42 under a global budget control climate by analyzing and mining accessible data sources.43 These factors illustrate the importance of data standardization to global public sector organizations, as seen in the recent UK Cabinet Office announcement that all public sector bodies in the UK must comply with centrally established open standards principles. This mandate was approved and put into law on 1 Nov 2012. Because it has based its
39. Please see the Appendix for more screen snapshots of the Data Registry.
40. "Open Standards for Government Transformation: Enabling Transparency, Security, and Interoperability", The World Bank Institute Workshop 2009, http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTINFORMATIONANDCOMMUNICATIONANDTECHNOLOGIES/EXTEDEVELOPMENT/0,,contentMDK:22103439~menuPK:2643963~pagePK:64020865~piPK:51164185~theSitePK:559460,00.html
41. "Tapping into the Power of Big Data", PwC Technology Forecast 2010, Issue 3, http://www.pwc.com/us/en/technologyforecast/2010/issue3/features/big-data-pg1.jhtml
42. "Digital Government: Building a 21st Century Platform to Better Serve the American People", OMB, 23 May 2012, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government-strategy.pdf
43. "Public Financial Management Responses to an Economically Challenged World", Grant Thornton, January 2011, http://www.gt.com/staticfiles/GTCom/Public%20sector/ICGFM/2011%20ICGFM%20Global%20Financial%20Managers%20Survey%20Report.pdf
metadata management framework upon a metadata schema (Dublin Core44) that is based on ISO 11179,45 the UK Cabinet Office anticipates completing its shift towards open standards in the near future. Partly on account of the rise of data as an economic enabler for a recession-hit global economy, the central governments of several nations, including the US Federal Government, have embraced open government initiatives, such as endorsing and embracing the use of open standards for interoperability between public sector entities and industry.46 To this end, the current Administration has produced, in accordance with the "Digital Government: Building a 21st Century Platform to Better Serve the American People" strategy document, guidance as to the management of government information. The guidance fundamentally states that all government information, much like the UK Cabinet Office announcement, should implement open standards.47
For this Bureau, the Data Stewards have embraced open industry standards not because of a US Federal Government mandate to use open standards but because the nature of financial management is entrenched in industry. Just as the financial industry has for the past century evolved its dependence on industry standards, global public sector entities too have evolved their dependence on industry standards. The nature of financial management data includes reliance upon industry norms and standards: for example, when making a check payment, the PM data elements support the conventional means to issue that check payment; conversely, when receiving a check payment, the RCM data elements support the conventional means to process that revenue collection. The following diagram provides a quick snapshot of the breakdown of Data Steward-approved content as it relates to the reuse of industry standards. This is another diagram whose contents will vary greatly as more lines of business have content reviewed by the Data Stewards. For this Bureau, one out of every four data elements approved by the Data Stewards is maintained by industry associations or industry standards organizations.
44. "e-Government Metadata Standard", August 2006, http://webarchive.nationalarchives.gov.uk/+/http://www.cabinetoffice.gov.uk/media/273711/egmsv3-1.pdf
45. "Dublin Core Metadata Element Set – Reference Description", 1999, http://dublincore.org/documents/1999/07/02/dces/
46. "Open Government Declaration", Open Government Initiative, Sept 2011, http://www.opengovpartnership.org/opengovernment-declaration
47. "Digital Government: Building a 21st Century Platform to Better Serve the American People", May 2012, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html#conclusion
Figure 6 Standardization Index
While other US Federal Agency programs that issue data standards, like the National Information Exchange Model (NIEM) and the Environmental Protection Agency (EPA), do not rely on industry standards to the extent this Bureau does, maintaining this index in the future can only materialize if the lines of business continue to rely on industry standards. The trend towards the second half of the year, as the Data Stewards approved content from GWA and DMS – both of which rely very little, if at all, on industry standards – was to establish government-only, non-industry-based standards. The reason for the industry-friendly approach to data standardization within the EBIM program is that, due to the nature of financial management, the data requirements for issuing payments, for example, are almost entirely based on industry standards (for example, ACH Debits/Credits and Wire payments). Unfortunately, it is anticipated that the creation of government standards will increase and the reliance on industry standards will decrease as more content is approved by the Data Stewards in the upcoming year. This is a reflection of the manner in which the lines of business have managed their data.
Whereas the ISS, PM, and RCM lines of business are strong users of industry standards, all DMS content submitted this past year has been under the ownership of this Bureau (no reuse of industry standards). For DMS, only one data exchange was approved by the Data Stewards this past year – the Treasury Reporting on Receivables (TROR), a report specific to government agencies – and another, the CRS IAI, has been through early phases of refinement. While it might be too early to caution DMS management on the paucity of industry standards in the foreseeable future, the Data Stewards will continue to monitor and report on the use of standardized content within DMS.
Like DMS, GWA has not had any approved Data Steward content this past year that was business-owned by an industry standard. Marginally better, only 4% of GWA data elements were business-owned by OMB (the remaining 96% were business-owned by this Bureau). The context behind the paucity of external standardization sources is
explained by two factors: 1) only a small fraction of the GWA data exchanges were analyzed this past year (so there is a likelihood that data elements will surface that are industry standards; for example, the Minimum Accounting Data Elements (MADE) subsection of the InterAgency Agreement (IAA) data exchange has an opportunity to reuse industry standards); and 2) the GWA line of business issues standards, on behalf of the Department of the Treasury, for compliance by US Federal Agencies – GWA not only drives standardization but, with the support of OMB and the Department of the Treasury, also seeks to enforce the standardization of financial management. Even so, the paucity of industry standards within GWA could be an opportunity for GWA to consider an industry-friendly partnership to spur both dialogue and governance into areas of commonality. This recommendation is supported by findings in the "Public Financial Management Responses to an Economically Challenging World" report that reinforce the importance of international standards for public financial management.48 From that document, the accompanying diagrams were distilled from a global survey of public sector leadership, which reveals that public financial management is shifting away from proprietary standards and towards a reliance on international standards.
48. "Public Financial Management Responses to an Economically Challenging World", International Consortium on Governmental Financial Management and Grant Thornton, 2011, http://www.gt.com/staticfiles/GTCom/Public%20sector/ICGFM/2011%20ICGFM%20Global%20Financial%20Managers%20Survey%20Report.pdf
In general, the breadth of industry standards used within the data of this Bureau is exemplary of the direction in which other global industry associations, like the successful Interactive Financial Exchange (IFX) and the Open Travel Alliance (OTA), have steered their data standardization efforts – reusing data standards to increase interoperability and to widen the possibility of future synergies with other communities. The below diagram provides a snapshot of the various standards bodies referenced within the data of this Bureau.
Figure 7 Diagram of the Industry and Government Standards Organizations managed within the EBIM program
[Pie chart: USPS 28%; Plastic 26%; NACHA 15%; Fedwire 6%; ISO 5%; OMB 4%; ABA 4%; IRS 3%; FRB 3%; FPDS (GSA) 2%; IETF 2%; ANSI 1%; D&B 1%; NIST 0%; NSS 0%; SWIFT 0%]
The content of this diagram is more stable than that of the other metrics provided in this report, in that the number of referenced standards can remain constant or increase as more content is approved by the Data Stewards. The high degree of referenced standards reflects a desire by the Data Stewards to remain as industry-friendly as possible when it comes to the data definitions (metadata management) for this Bureau. This is a distinct strength of the program and a clear indication of the importance of industry-government harmony for the benefit of our daily operations and for the benefit of an information-centric future for this Bureau and for the US Federal Government.
The global outcry for data that provides transparency and accountability can only be met by a clear understanding of the metadata. As global public sector leaders identify data sets that provide transparency and accountability, the meaningfulness of that data set will be measured by these principles as outlined by the Open Government Initiative organization (the central body of management for the
countries pledging to be open with their data):49 1) increase the availability of information about government activities (open standards for interoperability); 2) support civic action (the data economy in partnership with the public and industry); 3) implement the highest standards of professional integrity throughout our administrations; and 4) increase access to new technologies for openness and accountability. On the whole, these principles rest upon the increasing importance not just of data but, more importantly, of the definitions of data, i.e., the metadata. In order for public sector business leaders to open up data that is meaningful for transparency and accountability, it is paramount that those decisions align and reverberate throughout their organizations. As intensely as those leaders seek to open up their data, so must their programs be as intensely focused on the methods to know their data (metadata) and know their business (business process). For this reason, from an information-centric perspective, the precision and accuracy (or quality) of the metadata is paramount to the global outcry for transparency data. Globally, the most successful programs, like the Data.gov and Data.gov.uk efforts, maximize their return on investment by providing not just data with metadata (like business definition and string length) in a non-machine-accessible format (like PDF or Word documents) but by providing metadata in a manner that is embedded with the data (using open standards solutions like XML Schema and Linked Data). At this time, though, the Data Stewards have refrained from mandating that all data exchanges release their formats in open standards solutions; there are true business-benefit trade-offs that would undermine a mandate such as this. While the data exchanges could not be mandated to use open standard solutions, the BIAB (under EA direction) has transformed the non-open-standard data exchanges into open-standard data exchanges for the purpose of documentation and representation. This internal transformation to open standards has several benefits: 1) uniformity of the data acquisition format; 2) easier identification of reusable components once the format has been standardized (a lesson learned from past failed attempts); 3) the ability to build other services on top of the open standard solution (web-site navigation, online mind maps, and searchable change logs are a few examples); and 4) alignment with other standards-setting organizations, federal agencies, and industry associations who envision a greater ease in unlocking the value of their data by automating the discovery of their metadata.
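A hedged sketch of that internal transformation, assuming a made-up fixed-width payment layout: the proprietary record is re-expressed in an open, machine-readable XML form so that its structure and metadata can be discovered automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout for a legacy exchange record:
# record type (2 chars), currency code (3 chars), amount (10 chars).
legacy_record = "01USD0000125000"

field_layout = [
    ("RecordType", 0, 2),
    ("CurrencyCode", 2, 5),
    ("Amount", 5, 15),
]

def to_open_standard(record):
    """Re-express a fixed-width record as XML so its fields are machine-discoverable."""
    root = ET.Element("PaymentRecord")
    for name, start, end in field_layout:
        ET.SubElement(root, name).text = record[start:end]
    return ET.tostring(root, encoding="unicode")

print(to_open_standard(legacy_record))
# <PaymentRecord><RecordType>01</RecordType><CurrencyCode>USD</CurrencyCode><Amount>0000125000</Amount></PaymentRecord>
```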
In effect, because the EBIM Data Standardization component is the glue between Data Governance, Information Repositories, Data Quality, and Business Processes, the value of the EBIM program mostly rests upon it, as displayed in the following diagram.
49. "Open Government Declaration", Open Government Initiative, Sept 2011, http://www.opengovpartnership.org/opengovernment-declaration
Figure 8 Value of EBIM program
Information-Centric Organization: Visualize hindsights, insights, and foresights that create opportunities for better decision-making cured from evidence-based facts.
From Trusted, Meaningful Information to Better Decision Making: Deliver trusted and meaningful information based on business process so that business analytics can discover insights, trend hindsights, and rationalize foresights.
From Data to Trusted, Meaningful Information: Taming the tide of data by understanding and standardizing metadata, business rules, provenance, context, and industry best practices to create trusted information.
Data and Process: Business owns both the data and the process. Integrating the effects of data and process creates a synergistic effect for data standardization.
From an information-centric organization perspective, the value of the EBIM program isn't solely to justify standardization. Instead, the EBIM program is a "means to an end" whose results yield a consistent and interoperable foundation for the synthesis of meaningful information – one where any Business Intelligence tool, any Service Oriented Architecture (SOA) service, or any Application Programming Interface (API) can be assured of quality, trusted data.
Information Repositories
At the time of the initial approval, the legacy FMS EB expressed an understanding of, and deep appreciation for, the difficulty of the center of gravity of data – standardization. Standardization remains central to the EBIM program because it is the common glue that links together these essential parts: 1) it links the Data Stewards with a rigorous yet agile, world-class metadata management framework; 2) it connects the business process with the business intelligence and analytics expectations; 3) it binds transactional data to evidence-based facts; and 4) it is the nexus for interconnecting business initiatives and meaningful information towards an information-centric strategy. Prior to the EBIM approval, an informal but steadfast four-year initiative had accumulated standardized data elements between the PM, RCM, and GWA lines of business. While this informal initiative was successful, it also provided insight into the gaps that needed to be filled to ensure a successful formal initiative. Another important initiative that coincided with the start of the EBIM program was the advent of the Payment Information Repository (PIR). Originally seeded by OMB E-Gov to investigate the possibility of linking a payment with a vendor award, PIR was embarking on a mission that would take the operational payment data and identify common data
elements with an obligation. To accomplish this, it had become apparent that the only government-wide authoritative source possible to link a payment to an obligation would involve integrating with the Federal Procurement Data System (FPDS). FPDS is a system that manages the document-level contract content reported by agencies, where each record contains a unique composite document identifier (a set of four fields based on a Procurement Instrument Identifier – PIID). At that time, the PIR team identified that records sourced from the IPP system contained the FPDS unique composite document identifier fields. By linking a payment record with an IPP record that contained the FPDS unique composite document identifier fields, the PIR team would be able to link a payment file to an obligation and thus satisfy the OMB E-Gov initiative. While PIR commenced, this Bureau had begun work to conceptualize the Financial Information Repository (FIR), with the added backdrop that the Information Repositories would integrate with FIR. These separate but related enterprise initiatives reinforced the importance of an information-centric strategy not just for technology considerations but for broader business/mission considerations, such as this Bureau's response to the DATA Act and the impacts of those decisions upon the management of data. Including these business drivers has been an essential ingredient in the ability of the EBIM program to successfully complete the work of this third EBIM component. For this reason, the purpose of the EBIM Information Repositories program is directly linked to the overall data strategy of this Bureau, which has materialized in these internal and external factors: 1) this Bureau's response to the DATA Act; 2) this Bureau's response to the Dick Gregg Testimony; 3) this Bureau's response to the current Administration's "Digital Government: Building a 21st Century Platform to Better Serve the American People" focus on information-centric strategies, which also includes an emphasis on data security; 4) this Bureau's formation and implementation of the Business Intelligence and Analytics team; 5) this Bureau's response to the shifts and trends of industry best practices for the management of data and techniques for the synthesis of meaningful information; and 6) this Bureau's ongoing partnership with the Office of Financial Innovation and Transformation (OFIT) to interoperate and standardize content with other US Federal Government lines of business. These factors steer the direction of the EBIM Information Repositories solution. With these factors in mind, the goal of the EBIM Information Repositories component is to provide the techniques to optimize the exchange of data across all the lines of business repositories so that internal or external business users can discover new insights, analyze past activity (hindsight), and build the correlations and tolerances towards an evidence-based platform so as to anticipate future outcomes (foresight).
Background
Knowledge Hierarchy
Just as language is vital to human communication, metadata is vital to digital communication.50 To illustrate this, the Knowledge Hierarchy (or Wisdom Hierarchy)51 depicted in the diagram below represents a simplistic overview of the relationships between the various stages leading to knowledge (or Business Intelligence/Business Analytics). The Knowledge Hierarchy in a digital environment begins with metadata, which is at the intersection of both context and understanding.52
The intersection – metadata management – is the core of the Knowledge Hierarchy. If the metadata is wrong, then the understanding and/or context will be wrong and if the understanding and/or context are wrong, then any knowledge gleaned will be wrong. The emphasis on correct metadata translates into
50. Please see the EBIM Data Standardization – Metadata Management section for a more detailed explanation.
51. "The Wisdom Hierarchy: Representations of the DIKW Hierarchy", Rowley, Jennifer, http://jis.sagepub.com/content/33/2/163.abstract
52. "Experience Design", Shedroff, Nathan, http://www.amazon.com/Experience-Design-Shedroff-NathanPaperback/dp/B008AUGHI6
data quality outcomes. If there is a lack of insight into the metadata of an organization's data, then there is a lack of insight into the quality output of that organization: "[t]he organizations that do not have a robust metadata capability will face the same fate as the Neanderthals. They will be overtaken and competed out of existence by the organizations that make effective use of metadata."53
Figure 9 Knowledge Hierarchy
As the diagram makes evident, in order to provide meaning to data, facts are distilled from the data; this process of evaluation doesn't seek to interpret the data into hypothesis or theory but rather represents a systematic, controlled process to gather objective and verifiable observations. This step is supported by several global industries, but none more so than the scientific research community. With the issue of the "Fourth Paradigm: Data-Intensive Scientific Discovery", a seminal book that cobbled together from over seventy leading scientific contributors the quintessential purpose and future of science, the editors (and many contributors) suggest altering the scientific method by replacing the hypothesis-driven nature of their work with a new scientific methodology driven by data.54 Industries becoming data-driven is nothing new, but significantly altering the scientific method, which has its origins in Ancient Egypt and the Greeks of classical antiquity, is; doing so would change the landscape of science and, as the editors posit, would lead to new discoveries built on availability and interoperability.
53. "Metadata is the DNA for Your Organization", Adelman, Sid, http://www.informationmanagement.com/issues/2007_48/10001359-1.html
54. "Fourth Paradigm: Data-Intensive Scientific Discovery", Hey, Tony, et al., http://research.microsoft.com/enus/collaboration/fourthparadigm/4thparadigm_science.pdf
When facts or observations retain the appropriate level of business context and business understanding, they are synthesized into meaningful information. Data, Facts, and Information are seen in the past, as images in a rearview mirror. Knowledge, on the other hand, is seen in the present, as an active accumulation of past experiences in combination with the gathering and connecting of data, facts, and information. Building from the Knowledge Hierarchy, the next diagram represents the hierarchy as a flow where transactional data is cleansed, refined, and harmonized into evidence-based facts so that meaningful information – insight, hindsight, and foresight – can be extracted from a trusted source.
Figure 10 Flow of Knowledge
As select transactional systems transmit data to an Information Repository, the data and the context for that data are retained from the perspective of the transactional system – in the form of a relational, normalized view where that transactional system's goal is to quickly persist the data according to specific rules. Even though there are often standardized data elements that transactional systems must conform to, each transactional system's relational, normalized view retains a particular physical model style and system complexity that is specific to the purpose of that system. The more transactional systems a line of business needs to complete its mission,
the more diversity is introduced, the more complexity is required, and the more variation must be managed. This seeming chaos represents the left-most part of the diagram. In order for a line of business to centrally interpret and act upon its pool of transaction data, that line of business will need to generate evidence-based facts, where those facts are distilled from the transactional systems' data. This is a critical step. In practical terms, this is the entry point into data centralization for a line of business. To be effective for a line of business, and more importantly to be effective for an enterprise, this is where the heterogeneous world of transactional system data is homogenized; where the diversity of transactional system data structures is conformed into a single consistent platform. The purpose of enterprise conformity isn't to lock down the value of the data; it is to liberate the possibility of it. Like the analogy of an iron chain being only as strong as its weakest link, the single consistent platform is only useful when all Information Repositories conform to it. Once conformity is evident, the fruits of an Information Repository provide a trusted source of standardized content in a standardized structure accessible in a standardized manner. This is represented by the right-most part of the diagram. To that end, the goal of the EBIM Information Repository program has been to provide the necessary techniques and insight into establishing this kind of value for each line of business and for the enterprise.
Systems Thinking and Data Quality
The importance of the primacy of data isn't just that there is so much of it (Big Data); the importance of the primacy of data is in deriving value from it (Data Quality and Knowledge Management) in context with business direction and in harmony with external factors (Systems Thinking). This multi-dimensional approach to better understanding data isn't a "nice-to-have"; it is a requirement for a platform for the synthesis of meaningful information. The metadata must be accurate and precise, the content of the data must conform to the metadata, and the evidence-based facts must be calculated based on the metadata and content in context. These must be unquestioned – it is the only means to harness the kind of value that data analytics can reveal. A Business Intelligence tool can only render what has been provided; if the data that has been provided is wrong, the tool will continue to produce wrong results. Fixing the tool isn't the solution; ensuring enterprise conformance to data standards is.
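A minimal, hypothetical sketch of that conformance idea: each value is validated against the approved metadata for its element before any fact is derived from it, so a reporting tool downstream is never handed values that contradict the standard. The element rules shown are assumptions for illustration.

```python
# Hypothetical approved metadata for two standardized data elements.
approved_metadata = {
    "Country Code": {"type": str, "allowed": {"US", "CA", "GB"}},
    "Payment Amount": {"type": float, "minimum": 0.0},
}

def conforms(element, value):
    """Check one value against the approved metadata for its data element."""
    rules = approved_metadata[element]
    if not isinstance(value, rules["type"]):
        return False
    if "allowed" in rules and value not in rules["allowed"]:
        return False
    if "minimum" in rules and value < rules["minimum"]:
        return False
    return True

record = {"Country Code": "US", "Payment Amount": 125.0}
violations = [name for name, value in record.items() if not conforms(name, value)]
print(violations)  # [] means the record conforms; a BI tool downstream can trust it
```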
The importance of knowledge management is a defining criterion for the approach to this EBIM program, but it is by no means the only approach embraced by the EBIM program. To be grounded, the EBIM program has since inception embraced a systems thinking approach. Problems aren't always linear; the cause-and-effect relationships aren't always evident; outcomes aren't always understood with just the facts; and hidden opportunities lie dormant waiting to be discovered. From an engineering perspective, any system is submerged in complexity – from the functionality of our bodies, to smartphones, to government regulations – and the more interconnected systems there are, the more layers of complexity are introduced. The Director of National Intelligence, Dr. James Clapper, stated, "[w]e no longer operate largely on the principle of compartmentalization, that is, sharing information based on 'need to know.' We now start from the imperative of 'responsibility to share,' in order to collaborate with and better support our intelligence consumers—from the White House to the foxhole."55 The concentrated ability to collectively link quality-controlled data sets is an effective path for
55. "How 9/11 Transformed the Intelligence Community", Wall Street Journal, http://online.wsj.com/article/SB10001424053111904537404576554430822300352.html
meaningful information to emerge which, if managed appropriately, leads to a decision-making advantage. The advantages aren't solely for leadership to better gain cost controls or meet current key performance indicators; by exploiting that concentrated ability, it is also about discovering new opportunities, about uncovering new ideas hiding in plain sight, and about linking what we know today with grander possibilities that escape our imagination. To be prepared for this concentrated ability, this program has embraced a systems thinking approach that connects several dimensions.
EBIM Information Repositories Progress and Update
In the summer of 2012, through the governance component of the EBIM program, the Data Stewards approved the creation of the Conformed Dimensional Model Technical Subcommittee (CDMTC). The CDMTC, not unlike the other Technical Subcommittee that was formed in the spring of 2012, was formed to investigate, discuss, and recommend a technical solution for approval by the Data Stewards. In this particular case, the CDMTC membership included voting and alternate Data Stewards (from GWA, RCM, PM, and DMS), data architect subject matter experts from this Bureau and from the FRB Kansas City, and external expert consultation from government experts at GSA and OFR and from industry experts Ralph Kimball (one of the "fathers" of data warehousing), Margy Ross (Kimball Group), and Brian Hopkins (Forrester). From the outset, the CDMTC, as with all other Technical Subcommittees, was given a short time period to formulate a recommendation. The below diagram embodies, at a notional level, a path for how the lines of business Information Repositories were to exchange data with each other and with the FIR, commencing with how the transactional systems' data travels from agency submission to the Information Repositories and onward to the FIR.
Figure 11 Information Repository and Data Standardization Gap
Because of the importance of transparency and open government to the current Administration specifically and to global public sector open government initiatives in general, and because of the legacy FMS EB approval of the EBIM program and of the Business Intelligence and Analytics Division, it was decidedly evident that this Bureau was pursuing an information-centric strategy. For this Bureau, one of the critical steps within an information-centric strategy is to require the systematic amalgamation of operational data for analytical purposes. The diagram above represents that systematic amalgamation as being based upon two vectors: 1) the Data Stewards data standardization effort, as is evident on the rightmost side of the diagram, and 2)
the importance of reporting and analytic query performance among the Information Repositories and the FIR, as is evident from the leftmost side of the diagram. The primary task of this Technical Subcommittee was to bridge the data standardization effort with the importance of reporting and analytic query performance into a cohesive and implementable technique for the Information Repositories to follow.
To better understand the layout of the Information Repositories, the CDMTC prepared the following diagram to illustrate the three basic areas and their functionality: 1) the Data Staging area is a storage area and set of processes that clean, transform, combine, deduplicate, household, archive, and prepare operational data for use; 2) the Data Presentation area is where the Information Repository data is organized, stored, and available for direct querying by users, data access tools, and other analytical applications; and 3) the Data Access area is where a variety of capabilities are provided to business users to leverage the presentation area for analytical decision making. All data access tools query the data in the data presentation area.56 From the diagram, as data sources make their data available to an Information Repository, the data is transformed according to standardized data elements (as approved by the Data Stewards) while retaining its original context and meaning. The next area is where the data aligns with the final recommendation of the CDMTC – the use of conformed dimensional models at all the Information Repositories. The last area represents the universality of access once the data has been fashioned into a platform optimized for data standards and query performance.
Figure 12 Information Repository Areas
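As a hedged illustration of the conformed dimensional modeling recommendation (not the CDMTC's actual design), the sketch below builds one shared Date dimension that two hypothetical fact tables, payments and collections, both reference; because the dimension is identical everywhere, results can be compared across repositories with a single, consistent meaning of "fiscal year."

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One conformed Date dimension shared by every repository's fact tables.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, fiscal_year INTEGER)")
cur.execute("CREATE TABLE fact_payment (date_key INTEGER, amount REAL)")
cur.execute("CREATE TABLE fact_collection (date_key INTEGER, amount REAL)")

cur.execute("INSERT INTO dim_date VALUES (20120930, 2012)")
cur.execute("INSERT INTO fact_payment VALUES (20120930, 500.0)")
cur.execute("INSERT INTO fact_collection VALUES (20120930, 750.0)")

# Because both facts conform to the same dimension, payments and collections
# can be reported side by side with one consistent meaning of "fiscal year".
cur.execute("""
    SELECT d.fiscal_year, SUM(p.amount), SUM(c.amount)
    FROM dim_date d
    JOIN fact_payment p ON p.date_key = d.date_key
    JOIN fact_collection c ON c.date_key = d.date_key
    GROUP BY d.fiscal_year
""")
print(cur.fetchall())  # [(2012, 500.0, 750.0)]
conn.close()
```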
The rationale for the approved Data Stewards recommendation is in the Appendix - Data Steward Information Repository Dimensional Model Recommendation.
56. "The Data Warehouse Toolkit", Second Edition, Kimball, Ralph
Because of the importance of the CDMTC, the Data Stewards approved the continuation of its work and have required updates at every meeting. As a final note on the work of the CDMTC, this work group has had consistent attendance from all the data architects of the Information Repositories (including both this Bureau's data architect employees and those of the Information Repository service providers, such as the FRB). The value of this work group is actively being measured by the Information Repository data architects. As with the EBIM Data Standardization component, the work of the Data Stewards is again central to this component. Consensus-based interoperability has been pivotal to the program and to this component.
Annual Survey Results
Background
As a consequence of providing and servicing the three EBIM components – Data Governance, Data Standardization, and Information Repositories – the Data Stewards are uniquely positioned at the intersection of their lines of business' strategies and goals and the operationalization of data standardization. Because of this unique intersection, and because of a growing interest in the overall management of data, the Data Stewards inaugurated a data-related annual survey of their leadership (not all Data Stewards were able to speak directly with their line of business leadership). This survey afforded the EBIM program an invaluable snapshot into the dynamic interests in the management of data by the leadership of each of the lines of business. To remain relevant to the lines of business that have committed to the Data Steward program, it was important to the EBIM program that the leadership within a line of business respond directly to this survey. In fact, just over 50% of the content submitted was provided by either the AC or the Deputy AC of a line of business. Back when the EBIM program was approved by the legacy FMS EB, one of the clear early goals of the program was to formalize the process of enterprise data standardization through governance driven by the mission of the Bureau. Several external factors (for example, the ongoing budget control climate and ongoing Transparency initiatives like the DATA Act) and internal factors (for example, the FMS-BPD consolidation and the ramp-up and maturity of the information repositories) have broadened the relevance of data as it relates to the synthesis of meaningful information, which itself relates to the importance of hindsight, insight, and foresight for decision-making. These factors have weighed upon the decisions and deliberations of the Data Stewards, the Chief Business Architect (who is the sponsor of the EBIM program), OFIT, and the leadership of the lines of business. The past year has witnessed a flurry of activity in our understanding of the value of Information Repositories as a vehicle for information sharing to synthesize meaningful information from both operational-quality content and analytical-quality content. In effect, this survey was intended to provide the Data Stewards further insight into their leadership's understanding, goals, and visions for the management of data and to juxtapose that input with the EBIM approved initiatives directly related to the three components – Data Governance, Data Standardization, and Information Repositories.
Survey Results
The Data Stewards, collaboratively and with active participation, methodically completed the process of combining and merging similar data topics and then further investigating other data topics to yield a meaningful set of data topic categories – an aggregation of the individual topics that had been collected. The below alphabetical-order listing contains the enumerated categories and a high-level description of each.
Table 2 Alphabetical-order listing of Data Topic Categories
Business Glossary – The listing of mission-related business terms, their definitions (pulled from authorized sources), and other metadata that aid in the understanding of approved Data Steward content; for example, the list would contain the term and definition of "Obligation" or the term and definition of a "Voucher".
Business Process – A business process is a collection of related, structured activities or tasks that produce a specific service or product (serve a particular goal) for a particular customer or customers. A business process is a distinct activity described by its inputs and outputs. A process describes a unique behavior that has a beginning and an end and is performed repeatedly.
Data Analysis – The process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains (depth and breadth).
Data Architecture – The models, policies, rules, or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations (data models, data processes, and information sharing platform).
Data Exchange – The process of taking data structured under a source schema and transforming it into data structured under a target schema, so that the target data is an accurate representation of the source data; for example, this would include the content exchanged between systems (interface) or content that is exchanged from a system to an end user (report).
Data Governance – A strategic initiative involving multiple business lines and the operating discipline for managing data and information as a key enterprise asset. This operating discipline includes organization, processes, and tools for establishing and exercising decision rights regarding the valuation and management of data.
Data Quality – The fitness for use of information; information that meets the requirements of its authors, users, and administrators. An aspect or property of information or an information service that an information customer deems important in order to be considered "quality information." Characteristics include completeness, accuracy, timeliness, understandability, objectivity, and presentation clarity, among others.
Data Security – The protection of data from accidental or malicious acts, usually by taking appropriate actions. These acts may be loss or unauthorized modification, destruction, access, disclosure, or acquisition.
Data Standardization – The process of achieving agreement on common data definitions, stewardship, representation, structures, and other metadata classifications to which all data must conform.
Program/Project Information – The ability to provide cost and revenue/collections information by program or project to respond to Congressional or other inquiries (depth of granularity).
To better understand the above category listing, the Data Stewards subdivided those categories into key sub-categories, reflecting the results of the survey. The diagram below displays the categories, indicated with a bold-line perimeter, and links to each of the sub-categories that belong to it. Figure 13 Data Topic Categories and Sub-Categories
For example, the “Data Governance” category, which drew several individual data topics from each of the lines of business, has four sub-categories, meaning that based on the survey this category could be decomposed into four major items: “Transparency”, “Cross-Government”, “Executive”, and “Financial Information Repository”. The “Transparency” sub-category corresponds to the need for a single authorized business decision to resolve our responses to the DATA Act and US Federal Government Open Government/Transparency initiatives, and to manage the implementation of Dick Gregg’s testimony to Congress.57 The “Cross-Government” sub-category corresponds to the lack of coordinated outreach and financial management data standardization work across the US Federal Government. The “Executive” sub-category corresponds to the data topics that expressed a gap in an accountable and responsible executive champion to lead the management of data for this Bureau. The “Financial Information Repository” sub-category corresponds to the call for a data governance group within FIR.

Here is a description of the process undertaken to arrive at the results of the annual survey. The Data Stewards quantified a relationship between the number of stakeholders, which included the legacy FMS lines of business and OFIT, and the aggregate categories. This correlation provided the insight necessary to disambiguate some of the notions that had been circulating about the management of data and, more importantly, quantified the expectations of their leadership in context. Compartmentalizing the data topics by aggregation was an important factor, but by no means the only factor, in better understanding the survey results. Another dimension for understanding the current data issues was to dissect the data topics by the weight, or importance, given to each data topic; the weight was determined collaboratively by the Data Stewards and again based on the results of the survey. If a category had broad appeal to all stakeholders, the category was given more weight. The diagram below visualizes the data topic categories based on this weighting.
57 “Show Me the Money: Improving the Transparency of Federal Spending”, Gregg, Richard L., http://www.hsgac.senate.gov/hearings/show-me-the-money-improving-the-transparency-of-federal-spending
Figure 14 Data Categories in Priority Groups
[Bar chart: the data topic categories grouped into High Priority, Medium Priority, and Low Priority, with the percentage of stakeholder support shown for each category.]
As with the work of curating the data topic category listing, this illustration aided the Data Stewards’ understanding by dividing the categories into three areas: 1) High Priority includes categories that span multiple lines of business and that would involve modifying current implementations or changes that alter the typical means of resolving data-related issues; 2) Medium Priority includes categories that span multiple lines of business and involve centralizing processes that have been distributed or, in some cases, unmanaged; and 3) Low Priority includes categories that do not span multiple lines of business and that would benefit from specific, localized solutions.

By subdividing the categories by priority, a distinct pattern became noticeably clearer: 1) the Medium Priority items seemed to reflect the issues that needed the most consideration; and 2) the few High Priority items could, when executed and implemented, trigger changes, intentional or unintentional, in the other groups. The cascading effect of the latter pattern confirms an emergent order in the breakdown of these categories – High Priority items should be addressed first, followed by Medium Priority items. As is typical in governance situations, work becomes multi-tracked so as not to halt or slow down progress. The research into the breakdown of these categories indicates that, while the work of data management could be multi-tracked, the high-impact decisions needed to address the High Priority group would benefit greatly from being made before the other priority groups are addressed.

The next diagram combines the observations of the prior two diagrams: it represents the categories overlaid with the results of the priority group analysis.
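Before turning to that diagram, the sketch below gives a rough illustration of the grouping criteria described above: it classifies categories using the two attributes named in the text – whether a category spans multiple lines of business and whether resolving it would alter current implementations. The category names and flag values are illustrative assumptions, not survey results.

```python
# Illustrative sketch of the priority grouping described above (hypothetical data).
from dataclasses import dataclass

@dataclass
class Category:
    name: str
    spans_multiple_lobs: bool      # raised by more than one line of business
    alters_implementations: bool   # resolving it would change current implementations

def priority_group(c: Category) -> str:
    if c.spans_multiple_lobs and c.alters_implementations:
        return "High Priority"
    if c.spans_multiple_lobs:
        return "Medium Priority"   # centralizing distributed or unmanaged processes
    return "Low Priority"          # benefits from a specific, localized solution

# Hypothetical example categories (not actual survey results)
examples = [
    Category("Data Standardization", True, True),
    Category("Data Quality", True, False),
    Category("Business Glossary", False, False),
]
for c in examples:
    print(f"{c.name:<22} -> {priority_group(c)}")
```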
Figure 15 Data Categories and Sub-Categories in Priority Groups
This illustrates the categories not just as an alphabetical listing, but builds upon the enterprise data priority to present the categories logically segmented by priority. The left-most part of the diagram contains the High Priority categories, followed by the Medium Priority categories in the center, with the Low Priority categories on the right. On each side of the diagram is a “Stakeholder Support Scale” – this vertical scale indicates the total number of stakeholders that supported a category; for example, the “Data Security” category in the High Priority group had a higher total stakeholder count than the “Data Quality” category in the Medium Priority group. Towards the bottom of each priority group is a “Strength Scale” – this horizontal scale represents the average number of stakeholders per related data topic; for example, the “Data Analysis – Granularity” sub-category, which is within the High Priority group, ranked stronger than the “Data Governance – Transparency” sub-category. That is, even though the “Data Analysis – Granularity” sub-category had a higher total number of stakeholders, the “Data Governance – Transparency” sub-category had more topics from different lines of business, and this was considered “strength” because of the ubiquity of the topic across the lines of business. Put simply, the vertical axis reflects the number of stakeholders; the horizontal axis reflects the number of similar topics. For categories or sub-categories shared between groups, the diagram renders the highest “Stakeholder Support Scale” of either group and diminishes the weight of the “Strength Scale”. [Caveat: some artistic freedom was needed to render the items crowded in the center of the Medium Priority group.]
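A minimal sketch of how the two scales could be computed from topic-level survey records follows; the records, line-of-business associations, and sub-category names shown are hypothetical placeholders, not the actual survey data.

```python
# Sketch of the two scales described above, computed from topic-level records.
# Each tuple is (category, topic, stakeholders who raised it) -- hypothetical data.
from collections import defaultdict

survey_topics = [
    ("Data Governance", "Transparency", {"GWA", "PM", "RCM"}),
    ("Data Governance", "Executive", {"GWA", "DMS"}),
    ("Data Analysis", "Granularity", {"GWA", "PM", "RCM", "DMS"}),
]

support = defaultdict(set)        # distinct stakeholders per category (Stakeholder Support Scale)
topic_counts = defaultdict(int)   # number of related topics per category
stakeholder_sum = defaultdict(int)

for category, topic, stakeholders in survey_topics:
    support[category] |= stakeholders
    topic_counts[category] += 1
    stakeholder_sum[category] += len(stakeholders)

for category in support:
    total_support = len(support[category])                          # vertical axis
    strength = stakeholder_sum[category] / topic_counts[category]   # horizontal axis (avg per topic)
    print(f"{category:<16} support={total_support}  strength={strength:.1f}")
```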
The above diagram represents a quantifiable roadmap of the pulse of leadership within each line of business and OFIT for the management of data. Within each group, the categories and sub-categories rendered toward the top-left reveal both an explicit, concentrated desire to complete those categories in a timely fashion and an implied desire to be led toward that completion. For example, the “Data Standardization – Data Registry” sub-category scored the highest stakeholder count regardless of priority group. This confirms the relative importance of EBIM Data Standardization to the stakeholders, but it also reveals the importance of the governance support team necessary to make this item successful – proper staffing, appropriate processes, and suitable tools. The “Data Security – Policy” sub-category had the next highest stakeholder count. As with the “Data Standardization – Data Registry” sub-category described above, the expectation of the stakeholders is that “Data Security – Policy” be managed in a similar way, with similarly dedicated staff and appropriate processes and tools.

Another takeaway from the survey results, extracted from the diagram above, is that the top third of every priority group is highly relevant and timely to the stakeholders and is in need of an accountable party to continue – or in most cases to begin – reporting metrics to the stakeholders. For example, “Data Security – Policy”, “Data Governance – Executive”, “Data Standardization – Data Registry”, “Data Architecture – Information Repository”, and “Data Exchange” could be considered “hot” items because of the popularity of the topic to stakeholders and because of the high topic count.

The items in the bottom third of every priority group were items where the stakeholders needed to better understand the nature of the solution at the time the survey was conducted (mid-November). For example, at that time it was unclear how this Bureau would manage the “Transparency” issues related to the DATA Act and, more importantly, to this Bureau’s response to Dick Gregg’s testimony. Since the survey, OFIT, in coordination with representatives from the legacy FMS lines of business, has begun crafting a proposed recommendation for “Transparency” (OFIT is accountable). The surprise entry in the bottom third is the “Data Quality” category, which underpins the importance of data standardization and data governance. Business intelligence solutions cannot derive the kind of business value that senior leadership seeks unless the data quality framework is managed by the enterprise and is meaningful and relevant to stakeholders. While the “Data Quality” category had low marks for stakeholder count and strength, this reflects an opportunity to better communicate the importance of the quality management aspect of information management. Also, while a single “Information Repositories” category did not surface, the Data Stewards applied several categories that encompass this concept – Data Analysis, Data Architecture, and Data Quality. These three categories, as is evident in the diagram above, mostly lie within the Medium Priority group.
RACI Matrix

Because of the insight gained through the visualizations of the topics, categories, and stakeholders, the Data Stewards constructed a Responsible, Accountable, Consulted, and Informed (RACI) matrix, both to better attribute accountability and responsibility and to better understand the way these categories could be implemented within the Bureau. The RACI Matrix needs to be understood in the context of the survey results, as the conclusions drawn from the matrix are intertwined with those results. In effect, the matrix only provides insight into the way the Data Stewards interpreted accountability and responsibility for the categories and sub-categories. Unlike the prior diagrams, the assignment of values within the RACI Matrix was not algorithmic; it was completed via unanimous approval by the Data Stewards and warranted deep discussions on the meaning of the different rows and the intersecting Roles. The purpose of a RACI Matrix is to assign accountability and responsibility for work. The diagram below is an abbreviated version of the Data Stewards RACI Matrix (some columns and rows were removed for brevity); the Appendix – RACI Matrix contains the matrix in its entirety.
Figure 16 RACI Matrix Diagram
As the diagram illustrates, the columns (from left to right) are: Category, Sub-Category, Priority Group, and the four identified Roles (BFS Executive, OFIT, Legacy FMS Business Lines, and BFS Data Stewards). The rows in the RACI Matrix are grouped by Category and color-coded by Priority Group. Each row has a single role that is accountable (“A”). A row may have multiple roles that are responsible (“R”), consulted (“C”), or informed (“I”). If a row does not have a “C” (“Consulted”) value, then the Role that is accountable is also responsible.

The Roles are described here: “BFS Executive” – an executive leader or executive forum empowered to approve potential sweeping changes, to rally support for causes related to data and the synthesis of meaningful information, and to influence the strategic data direction of this Bureau; “Office of Financial Innovation and Transformation” (OFIT) – the office within Treasury Domestic Finance that is a catalyst in the transformation of Federal financial management, in coordination with the CFO Council, by facilitating the development and deployment of innovative, common, cost-effective solutions that improve data quality and create efficiencies; “Legacy FMS Lines of Business” – the collection of lines of business that are represented by the Data Stewards; and “BFS Data Stewards” – the EBIM forum with business representatives from this Bureau’s lines of business.

One conclusion to draw from the RACI Matrix is that, based on the weighting of the topics and stakeholders, the BFS Executive and OFIT Roles were accountable for all the rows in the High Priority group. There is a split between the first two Role columns and the latter two: the rows in which the first two Role columns hold the “A” (accountable) value were calculated to be High Priority, and the rows in which the latter two Role columns hold the “A” value were calculated to be either Medium or Low Priority. In other words, whenever the BFS Executive Role was the accountable party for a row, OFIT would be the responsible party. The same pairing is evident for the Medium and Low Priority groups – regardless of which Role was accountable, more than likely the other Role would be responsible. There are a few noteworthy yet isolated exceptions: “Granularity”, “Information Repositories”, and “Transparency”. Based on the survey, it became clear that these items of interest spanned both the High Priority and Medium Priority groups and impacted the most Roles in the matrix – meaning that both strategic and tactical direction was needed, where the tactical direction could be interpreted as imminent.
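The row conventions described above can be expressed as a simple validation rule: each row names exactly one accountable Role and may mark the others responsible, consulted, or informed. The sketch below illustrates that convention with hypothetical sample rows; it is not the actual matrix content.

```python
# Sketch of the RACI row conventions described above (hypothetical sample rows).
ROLES = ["BFS Executive", "OFIT", "Legacy FMS Business Lines", "BFS Data Stewards"]

sample_rows = [
    {"Category": "Data Governance", "Sub-Category": "Executive", "Priority": "High",
     "BFS Executive": "A", "OFIT": "R", "Legacy FMS Business Lines": "C", "BFS Data Stewards": "I"},
    {"Category": "Data Quality", "Sub-Category": "Metrics", "Priority": "Medium",
     "BFS Executive": "I", "OFIT": "C", "Legacy FMS Business Lines": "A", "BFS Data Stewards": "R"},
]

def validate_row(row: dict) -> None:
    assignments = [row[r] for r in ROLES]
    assert assignments.count("A") == 1, "each row must have exactly one accountable role"
    assert set(assignments) <= {"A", "R", "C", "I"}, "only RACI values are allowed"

for row in sample_rows:
    validate_row(row)
    accountable = next(r for r in ROLES if row[r] == "A")
    print(f"{row['Category']} / {row['Sub-Category']}: accountable = {accountable}")
```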
Another conclusion to draw from the RACI Matrix is that the four identified Roles were the “best fit” to resolve the categories. At the time of the analysis, it was clear that the BFS Executive Role had not been fully identified and defined, nor, in some respects, had the Lines of Business Role. The Data Stewards, in the analysis of the annual survey, identified these two areas as gaps, or as opportunities to be addressed by a wider audience. In fact, the Data Stewards Annual Report Recommendations make this gap clear and address a potential solution: the establishment of an Executive Champion to steer strategic interests and to coordinate with the Data Stewards for internal standardization and with OFIT for government-wide standardization.
Acknowledgements

The Enterprise Business Information Management program would not have been as successful without the hard work and dedication of these individuals and the visionary management team that they support.

Sponsor, Chief Business Architect: Mike Murray, Information and Security Services
Chair, Data Stewards: Marcel Jemio, Information and Security Services
Voting Data Steward: Richard Gassaway, Debt Management Services
Voting Data Steward: Patricia Smith, Governmentwide Accounting
Voting Data Steward: Richard Bauder, Payment Management
Voting Data Steward: Shannon Koppers, Revenue Collection Management
Voting Data Steward: Kim Smith, Management
Alternate Data Steward: Rosa Chan, Debt Management Services
Alternate Data Steward: Gary Bement, Governmentwide Accounting
Alternate Data Steward: Cyndi Pham, Governmentwide Accounting
Alternate Data Steward: Raghu Vallurupalli, Payment Management
Alternate Data Steward: Marisa Schmaeder, Revenue Collection Management
Invited Data Steward: Delores Burkhardt, Office of Financial Innovation and Transformation
Invited Data Steward: Steve Ehle, Bureau of Public Debt (7 Oct 2013)
Invited Data Steward: Ron Graham, Bureau of Public Debt (7 Oct 2013)
Past Data Steward: Gregory Till, Revenue Collection Management
Past Data Steward: Whitney Goss, Governmentwide Accounting
Past Data Steward: Robin Gilliam, Governmentwide Accounting
Invited Experts: Margot Kaeser, Leroy Larkins, Tony Paul, David Saltiel, Donna Morgan, David Castle, Corie Taylor, Shannon Redding, Kwema Ledbetter, Andy Flower, Jeff Hoge, Whitney Goss, Gary Schaetz, Andrew Ganahl, Olu Faokunla, Brian Reeves, Ashu Goel, Dave Stuart, Doug Little, Mark Hasson
Data Stewards Support Team: Brian Brotsos, Donna Morgan, Wendy Wang, Max Dolinsky
Terms Glossary

The EBIM program has baselined all data/information terms and definitions on the ANSI-approved National Committee for Information Technology Standards (NCITS) American National Standard Dictionary of Information Technology (ANSDIT). ANSDIT has been harmonized with ISO/IEC 2382, Information Technology – Vocabulary. The definitions below were extracted from ANSDIT or amended for specific relevance to this Bureau.
Consensus-based Interoperability is the development of high-quality specifications, implementation guides, and other deliverables based on the consensus of the Community of Interest, and should include a high-quality process as defined by the Internet Engineering Task Force (IETF RFC 2026), which states that the high-quality process goals are: technical excellence; prior implementation and testing; clear, concise, and easily understood documentation; openness and fairness; and timeliness. These goals are designed to be fair, open, and objective; to reflect existing (proven) practice; and to be flexible. They are intended to provide a fair, open, and objective basis for developing, evaluating, and adopting standards, and they provide ample opportunity for participation and comment by all interested parties. At each stage of the Data Steward Process Model, a specification is repeatedly discussed and its merits debated in open meetings and/or email. These goals are explicitly aimed at recognizing and adopting generally accepted practices. Thus, a candidate specification must be implemented and tested for correct operation and interoperability by multiple independent parties and utilized in increasingly demanding environments before it can be adopted as a standard. These goals provide a great deal of flexibility to adapt to the wide variety of circumstances that occur in the standardization process.

Data is any representation subject to interpretation, such as through analysis, pattern matching, or metadata, or to which meaning may be assigned, such as by applying social conventions or special agreed-upon codes. Data can be processed by humans or by automatic means.

Data Governance is a strategic initiative involving multiple business lines and is the operating discipline for managing data and information as a key enterprise asset. This operating discipline includes organization, processes, and tools for establishing and exercising decision rights regarding valuation and management of data.

Data Access is the place where a variety of capabilities are provided to business users to leverage the presentation area for analytical decision making. All data access tools query the data in the data presentation area.

Data Presentation is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical applications. The data presentation area contains all that the business community sees and touches via data access tools. Data Marts reside in this area. A Data Mart is a wedge of the overall presentation area pie; a Data Mart presents the data from a single business process (which may cross organizational functions). Dimensional Models thrive in this area: the data is to be presented, stored, and accessed in dimensional schemas. Dimensional Modeling is the most viable technique for delivering data to information repository users. To be useful, the presentation area Data Marts must contain detailed, atomic data. Atomic data provides deep (not shallow) opportunities for unpredictable ad hoc queries. Data Marts may also contain performance-enhancing summary data or aggregates, but it is not sufficient to deliver these summaries without the underlying granular data in dimensional form. All Data Marts should be built using common dimensions and facts (conformed dimensions). Without shared, conformed dimensions and facts, a Data Mart is a standalone stovepipe application; these types of Data Marts perpetuate incompatible views of the enterprise. (A small illustrative sketch of conformed dimensions follows this glossary.)
Data Primacy is the strategic acceptance of the importance of data for the synthesis of meaningful information.

Data Staging is both a storage area and a set of processes commonly referred to as extract-transform-load (ETL). The data staging area is everything between the operational source systems and the data presentation area.

Data Standard is a document that provides requirements, specifications, guidelines, or characteristics that can be used consistently to ensure that materials, products, processes, and services are fit for their purpose.

Data Steward is a person or organization delegated the responsibility for managing a specific set of data resources. A data steward must be multi-lingual: speak the language of business and know the language of technology.

Fact is a statement whose validity is accepted with a high certainty factor.

Governance is the process of decision-making and the process by which decisions are implemented.

Information is (1) the meaning that is currently assigned to data by means of the conventions applied to these data. (2) In information processing, any fact, concept, or meaning derived from data and associated context or selected from knowledge. (3) In a conceptual schema language, any kind of knowledge about things, facts, or concepts of a universe of discourse that is exchangeable among users. (4) In information theory, knowledge that reduces or removes uncertainty about the occurrence of a specific event from a given set of possible events.

Information-Centric Organization is an organization that shifts how it thinks about digital information. Rather than thinking primarily about the final presentation – publishing web pages, mobile applications, or brochures – an information-centric approach focuses on ensuring data and content are accurate, available, and secure. An information-centric organization treats all content as data – turning any unstructured content into structured data – then ensures all structured data are associated with valid metadata. Providing this information through web APIs helps architect for interoperability and openness, and makes data assets freely available for use within agencies, between agencies, in the private sector, or by citizens. This approach also supports device-agnostic security and privacy controls, as attributes can be applied directly to the data and monitored through metadata, enabling agencies to focus on securing the data and not the device.

Interoperability is the ability of two or more systems or components to exchange information and to use the information that has been exchanged (IEEE).

Knowledge is (1) an organized, integrated collection of facts and generalizations (gained through a synthesis of meaningful information). (2) Information representing human experience and expertise. The knowledge built into an expert system includes facts, events, and rules.

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

Semantic Equivalence is a characteristic of information quality that measures the degree to which data stored in multiple places is conceptually equal. Equivalence indicates the data has equal values or is in essence the same.
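To make the conformed-dimension definition above concrete, the sketch below shows two hypothetical Data Marts keying into one shared dimension so that their results line up; the table names, fields, and values are illustrative assumptions, not Bureau data structures.

```python
# Sketch of a conformed dimension shared by two Data Marts (hypothetical data).
# Shared (conformed) dimension used by both marts.
dim_treasury_account = {
    "TAS-001": {"agency": "020", "fiscal_year": 2012},
    "TAS-002": {"agency": "020", "fiscal_year": 2012},
}

# Each fact table covers one business process but keys into the same dimension.
fact_collections = [{"tas": "TAS-001", "collected_amount": 1_500.00}]  # collections mart
fact_payments = [{"tas": "TAS-001", "payment_amount": 900.00}]         # payments mart

def total_by_agency(facts, amount_field):
    """Aggregate a fact table by the agency attribute of the shared dimension."""
    totals = {}
    for row in facts:
        agency = dim_treasury_account[row["tas"]]["agency"]
        totals[agency] = totals.get(agency, 0.0) + row[amount_field]
    return totals

# Because both marts conform to the same dimension, the two results align by agency.
print(total_by_agency(fact_collections, "collected_amount"))
print(total_by_agency(fact_payments, "payment_amount"))
```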
Appendix
Pre-EBIM Background

By 2010, the BIAB within EAD had successfully migrated several legacy FMS major systems within the Revenue Collections Management (RCM), Governmentwide Accounting (GWA), and Payment Management (PM) lines of business to rely on the data standardization initiative (using FMS XML) that began within RCM in 2007. Prior to the legacy FMS EB approval of the EBIM program, RCM (at the time known as “Fed Finance”) had initiated the pursuit of data standardization among its transacting systems. This effort began in earnest in the spring of 2007 within the Accounting and Data Management Division of RCM, and while the focus was to look at RCM data elements, it became evident that there were four types of data challenges to hurdle: 1) Industry Data Standards – a portion of the data elements were defined either by industry organizations such as NACHA or by other lines of business, notably GWA; 2) Data Sharing Guidelines – a portion of the data elements were shared or needed by other lines of business, notably PM; 3) Interoperability Challenges – a portion of the data elements needed to be logically grouped together (hierarchical) and needed to be linked either to other data elements or to other groups (hub-and-spoke linked); and 4) Data Naming and Metadata – a portion of the data elements were defined neither by an industry organization nor by a specific line of business, yet were needed by two or more lines of business. These challenges, while expanding the initial scope of the RCM work, continued to become more apparent.

Figure 17 Data Standardization Initial Challenges
The EDA team, through the use and implementation of a disciplined metadata management framework, was able to defuse, with as little risk as possible, some of the more nagging challenges by providing three services: 1) crafting the schema (technical documentation) for data exchanges based on open industry standards and building helpful tools to navigate the schema files (supporting documentation); 2) implementing an open, industry-standard method of capturing metadata such as business definitions, terms, business rules, and relationships; and 3) laying the groundwork for the establishment of a formal governance group.
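As a loose illustration of the kind of metadata captured under that second service, the sketch below models a single data element record with a definition, steward, business rules, and relationships; the field names and the example content are assumptions for illustration, not the actual FMS XML or metadata repository structures.

```python
# Illustrative sketch of a metadata record for one data element (hypothetical fields and content).
from dataclasses import dataclass, field

@dataclass
class DataElementMetadata:
    name: str                      # standardized business name
    definition: str                # definition pulled from an authorized source
    steward: str                   # responsible line of business
    business_rules: list[str] = field(default_factory=list)
    related_elements: list[str] = field(default_factory=list)

# Hypothetical example record, not an approved Data Registry entry.
voucher_number = DataElementMetadata(
    name="Voucher Number",
    definition="Identifier assigned to a voucher for tracking through payment processing.",
    steward="Payment Management",
    business_rules=["Must be unique within a fiscal year."],
    related_elements=["Obligation", "Treasury Account Symbol"],
)
print(voucher_number)
```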
Data Steward Process Model

This process model was approved by the Data Stewards and contains both the responsible party and the level of activity involved in the tasks culminating in content being published to the Data Registry. The Data Stewards have periodically updated the process model to better reflect the current environment and to simplify the explanation of the tasks involved.

Figure 18 Data Steward Process Model
Data Steward Information Repository Dimensional Model Recommendation

1.0 Recommendation (includes two supporting documents – a list of shortened terms and an enterprise bus matrix template)

1.1 In order to preserve the standardization investment FMS has made in the Data Stewards Committee, it is unanimously recommended that the fields available for BI reporting (Data Presentation area) be standardized.
1.1.1 Fact: FMS has invested in the Enterprise Business Information Management program, of which the Data Stewards, as the sole governance group for FMS, is responsible for managing the enterprise standardization of the data terms.
1.1.2 Fact: to foster consistency and accuracy, any data term must be properly understood in context before it can become meaningful information. The data terms must be properly named, labeled, categorized, and shared in context so that they can be reused. The reusability of a data term enables interoperability.

1.2 To leverage the greatest possible reuse of data terms and to ensure interoperability among the data terms for BI reporting solutions, it is unanimously recommended to use a database as the platform for that standardization.
1.2.1 Fact: the BI tool space is fragmented with niche solutions. FMS already has more than two BI vendors approved in the TRM, and the list of BI vendors will grow by two more – one from FIR and another from DMS.
1.2.2 Fact: rather than rely on a BI tool to provide the standardized terms for information sharing, FMS gains more control by standardizing the name of the field that is accessed by the BI tools. The industry best practice is to use a Data Presentation area where database tables present the standardized data in standardized field names for consumption by any BI tool. This approach provides greater flexibility to information repositories while still retaining control for information sharing.
1.2.3 Fact: not all databases support long-name fields, which is why the group recommends an approach that systematically shortens the standardized terms.

1.3 In order to be systematic in the way different repositories access other repositories, it is unanimously recommended that all repositories provide a consistent method to access data that uses dimensional modeling.
1.3.1 Fact: both the Kimball and Inmon techniques agree that Dimensional Modeling is an effective means to query and access data.
1.3.2 Fact: Dimensional Modeling is a data management best practice that is not unique to a particular industry, even though it had its origins in the retail industry.
1.3.3 Fact: the denormalized technique provides a platform for much simpler queries than querying normalized content.
1.3.4 Fact: the denormalized technique provides a Data Presentation area that is tuned for query performance.

1.4 As the combination of the preservation of standardized terms and the use of Dimensional Modeling is vital to consistency, common understanding, interoperability, and information sharing, it is unanimously recommended that all repositories use, build, and make available Conformed Dimensional Models (sharing common dimensions).
1.4.1 Fact: as the Data Stewards assign business line responsibility to the standardized data elements, the business line responsibilities are preserved in the dimensional models.
1.4.2 Fact: reusability drives interoperability and understandability. The more deviations from common terms, the higher the maintenance cost across the enterprise.
1.4.3 Fact: the fact tables and dimensional models implemented within an information repository should be categorized within a Bus Matrix spreadsheet (a common template is provided). The purpose of the Bus Matrix is to provide simple access to the facts and dimensions implemented, to provide clarity in meaning, and to foster reusability.
1.5 To implement the Conformed Dimensional Modeling technique consistently, it is unanimously recommended that all repositories implement the same set of shortened terms.
1.5.1 Fact: since the terms that have been standardized by the Data Stewards can be lengthy (for human consumption purposes), physical limitations within databases preclude using the standardized name as the database table column name. (An illustrative term-shortening sketch follows this recommendation.)
1.5.2 Dimensional Model Table Column Naming Principles (removed for brevity)

1.6 Creating Dimensional Models
1.6.1 The above recommendations apply to repositories that have already created their dimensional models (since these repositories greatly influenced this recommendation) and to repositories that are at the beginning phases of their construction.

1.7 Updating existing methods
1.7.1 So as not to impede business line progress in their repositories, these recommendations are guidelines for already established, non-conforming solutions. For these instances, all other repositories that request data from non-conforming repositories will have to map the non-conforming table column names to the conformed table column names.
1.7.1.1 The proliferation of non-conforming table column names should be controlled, as every non-conforming table column name is a cost to the other repositories.
1.7.1.2 All shareable table column names that do not conform should continue to be monitored by a governance group, the Data Stewards.

1.8 The timing of the implementation of this recommendation will be determined by the individual information repositories and monitored by a governance group, the Data Stewards.
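To illustrate the systematic shortening that recommendation 1.5 calls for, the sketch below derives a conformed physical column name from a standardized business term using one shared abbreviation list; the abbreviations, length limit, and example term are assumptions, not the approved list of shortened terms.

```python
# Sketch of systematic term shortening (hypothetical abbreviation list and limit).
ABBREVIATIONS = {
    "TREASURY": "TREAS",
    "ACCOUNT": "ACCT",
    "SYMBOL": "SYM",
    "NUMBER": "NBR",
    "AMOUNT": "AMT",
}
MAX_COLUMN_LENGTH = 30  # assumed physical limit for the target databases

def to_column_name(standardized_term: str) -> str:
    """Derive a conformed physical column name from a standardized business term."""
    words = standardized_term.upper().replace("-", " ").split()
    shortened = [ABBREVIATIONS.get(w, w) for w in words]
    column = "_".join(shortened)
    if len(column) > MAX_COLUMN_LENGTH:
        raise ValueError(f"{column!r} still exceeds {MAX_COLUMN_LENGTH} characters")
    return column

# Repositories applying the same rule arrive at the same conformed column name.
print(to_column_name("Treasury Account Symbol Number"))  # TREAS_ACCT_SYM_NBR
```

Because every repository applies the same shared abbreviation list, independently built data marts arrive at identical physical column names for the same standardized term, which is what keeps their dimensional models conformed.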
Annual Survey Worksheet and RACI Matrix

The worksheet for the annual survey is available from the Data Stewards Document Repository. If you would like to download the worksheet, please contact your Data Steward. The worksheet contains all submitted data topics, descriptions, the line of business that submitted them, stakeholders, and notes. The diagram below – the Responsible, Accountable, Consulted, and Informed (RACI) matrix – was completed by the Data Stewards for their planning purposes. This RACI matrix is the result of the Data Stewards annual survey of their leadership about matters related to data. All source information that led to this matrix has been saved in the Data Stewards Document Repository and can be downloaded by any legacy-FMS or legacy-BPD employee (with permission from their Data Steward); preferably, contact your Data Steward for all requested documents.

Figure 19 Annual Survey RACI Matrix
Data Standardization Artifacts

Screen snapshots are provided here to better illustrate the Data Registry and the Data Exchange Factory. Both of these artifacts – the Data Registry and the Data Exchange Factory – originate from the same source: the Financial Metadata Repository (FMR). The FMR manages the versioning, the business rules, the volumes of metadata, the relationships, and the attributions.

Figure 20 Data Registry Index Page
Figure 21 Data Registry Profile Page
Figure 22 Data Exchange Factory Page
Figure 23 Data Exchange Factory Component Browser
Figure 24 Data Exchange Factory Change Log
Figure 25 Data Exchange Factory Mind Map