DRIVING IMPROVEMENT WITH OPERATIONAL RESEARCH AND DECISION ANALYTICS
SPRING 2019
ANALYTIC MODELLING CONTRIBUTES TO IMPROVING ENGLISH AND WELSH WATERWAYS Reliability analysis proves to be an asset
WIND FARMS DESIGNED USING OPTIMISATION
Millions of euros can be saved by a smarter use of resources
REDUCING NOROVIRUS TRANSMISSIONS System Dynamics gives insight to policy makers
EDITORIAL
I enjoy walking along canals – the hills are not too steep. I'm therefore delighted to lead off this issue with an account of how reliability theory has proved useful in helping the Canal & River Trust understand their asset risks, allowing improved prioritisation of their annual expenditure as they maintain, manage and care for the waterways.
Whilst I'm attracted to canals, I'm not attracted to norovirus. Avoiding its transmission is important. The work of David Lane and colleagues suggests that reducing foodborne transmission by 20% would result in a 9% reduction in norovirus infection, but reducing person-to-person transmission has a much greater impact: in this case a 20% reduction would bring down infection by almost 90%. The research reported here could lead to a better understanding of how to reduce the impact of a highly unpleasant infection.
Was Trump's crowd bigger than Obama's crowd at their inaugurations? Many will remember the brouhaha. What may not be widely known is that the science behind the headlines to which Donald Trump took so much exception was provided by Manchester academic Keith Still. He made the headlines three times in one week, culminating in the New York Times headline 'Crowd Scientists Say Women's March in Washington Had 3 Times as Many People as Trump's Inauguration'. You can read about it in this issue.
It's good to see that O.R. and analytics continue to make an impact in many different environments. I hope you enjoy reading these stories and the others, and not just when you are cruising gently down a canal.
Electronic copies of all issues continue to be available at https://issuu.com/orsimpact. For future issues of this free magazine, please subscribe at https://www.theorsociety.com/what-we-do/publications/magazines/impact-magazine/.
Graham Rand
The OR Society is the trading name of the Operational Research Society, which is a registered charity and a company limited by guarantee.
Seymour House, 12 Edward Street, Birmingham, B1 2RX, UK Tel: + 44 (0)121 233 9300, Fax: + 44 (0)121 233 0321 Email: email@theorsociety.com Secretary and General Manager: Gavin Blackett President: John Hopes Editor: Graham Rand g.rand@lancaster.ac.uk Print ISSN: 2058-802X Online ISSN: 2058-8038 www.tandfonline.com/timp Published by Taylor & Francis, an Informa business All Taylor and Francis Group journals are printed on paper from renewable sources by accredited partners.
OPERATIONAL RESEARCH AND DECISION ANALYTICS
Operational Research (O.R.) is the discipline of applying appropriate analytical methods to help those who run organisations make better decisions. It's a 'real world' discipline with a focus on improving the complex systems and processes that underpin everyone's daily life – O.R. is an improvement science. For over 70 years, O.R. has focussed on supporting decision making in a wide range of organisations. It is a major contributor to the development of decision analytics, which has come to prominence because of the availability of big data. Work under the O.R. label continues, though some prefer names such as business analysis, decision analysis, analytics or management science. Whatever the name, O.R. analysts seek to work in partnership with managers and decision makers to achieve desirable outcomes that are informed and evidence-based. As the world has become more complex, problems have become tougher to solve using gut-feel alone, and computers have become increasingly powerful, so O.R. continues to develop new techniques to guide decision-making. The methods used are typically quantitative, tempered with problem structuring methods to resolve problems that have multiple stakeholders and conflicting objectives. Impact aims to encourage further use of O.R. by demonstrating the value of these techniques in every kind of organisation – large and small, private and public, for-profit and not-for-profit. To find out more about how decision analytics could help your organisation make more informed decisions see www.scienceofbetter.co.uk. O.R. is the 'science of better'.
ANALYTICS SUMMIT 2019
Thursday 13 June, IET, London WC2R 0BL | #AS19
#AS19 brings together speakers and exhibitors from the very cutting edge of analytics to deliver a one-day event that is a one-stop shop for learning about how big data and analytics are shaping the future of organisational decision-making. Filled with case studies, workshops and peer networking, the Analytics Summit is not to be missed.
Topics include: ■ Explainable AI ■ User-centred analysis and visualisation ■ Ethics and fairness in AI ■ Movement insights ■ Movement strategies
With speakers from: ■ Women in Data UK ■ QuantumBlack ■ Bath Business Improvement District ■ Consultancies
Early Bird Rates until 30 April:
OR Society and RSS Member – Early bird £150, Standard rate £200
Non-member – Early bird £175, Standard rate £225
BOOK YOUR PLACE: www.theorsociety.com/AS19
The OR Society is a registered charity no. 313713 and company limited by guarantee no. 663819. theorsociety.com
CONTENTS
5 IMPROVING THE WATERWAYS OF ENGLAND AND WALES Ian Griffiths and Sheena Wilson explain how analytic modelling has helped the Canal & River Trust improve prioritisation of its £100m+ annual expenditure
12 SPECTRAL ANALYSIS Brian Clegg gives insight as to how an auction, underpinned by optimisation, repurposed 84 megahertz of spectrum and generated gross revenue of nearly $20 billion
19 CROWD SCIENCE AND CROWD COUNTING G. Keith Still tells us how he counted crowds in Times Square at New Year and at the Trump inauguration and why this is so important for crowd safety
30 IMPROVING PROFITABILITY OF WIND FARMS WITH OPERATIONAL RESEARCH Martina Fischetti explains how Vattenfall, a leader in the wind energy business, uses O.R. techniques to design new offshore farms, leading to savings of millions of euros
35 FORCE MULTIPLIER Andrew Simms informs us of the analytical work being done by military training specialist NSC to ease the Ministry of Defence's decision-making burden
39 GOING VIRAL Brian Clegg describes work for the UK's Food Standards Agency by O.R. researchers modelling the transmission methods of norovirus to better target interventions
4 Seen Elsewhere Analytics making an impact
10 The Data Series – data democratisation Louise Maynard-Atem discusses the value of making digital data available to all employees within an organisation
17 Universities making an impact Brief reports of two postgraduate student projects
25 Making sense of big data using cluster analysis Duncan Greaves explains how cluster analysis can leverage large data sets to discover value and enhance your organisation's insight
45 Inter-ocular Analytics Geoff Royston's focus is on the use of data-based diagrams and charts to impart understanding and inform decisions and actions
DISCLAIMER The Operational Research Society and our publisher Informa UK Limited, trading as Taylor & Francis Group, make every effort to ensure the accuracy of all the information (the “Content”) contained in our publications. However, the Operational Research Society and our publisher Informa UK Limited, trading as Taylor & Francis Group, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by the Operational Research Society or our publisher Informa UK Limited, trading as Taylor & Francis Group. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. The Operational Research Society and our publisher Informa UK Limited, trading as Taylor & Francis Group, shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Reusing Articles in this Magazine
All content is published under a Creative Commons Attribution-NonCommercial-NoDerivatives License which permits noncommercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
SEEN ELSEWHERE
BREAKING THE BANK?
In a 1976 book, Thirteen Against the Bank: The True Story of a Man who Broke the Bank at the Roulette Table with an Infallible System, Norman Leigh claimed to have achieved the impossible and devised a system to consistently return a profit from playing roulette. It sounds too good to be true, and indeed it is, according to Graham Kendall of Nottingham University in an article in Significance (December 2018, pp. 26–29). Using computer simulation, Kendall concludes that the book 'is a work of fiction – which is a shame as it is a very nice story – and that the system it describes cannot, and does not, consistently return a profit'.
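Kendall's simulation code is not reproduced in the Significance article, but the flavour of such an analysis can be captured in a few lines of Python. The sketch below is purely illustrative: it uses a simple doubling (martingale) progression rather than the system Leigh described, and the bankroll, stakes and session length are invented. The point it makes is the general one – with the house edge built into every spin, the average outcome over many simulated sessions is a loss.

```python
import random

def martingale_session(bankroll=1000, base_stake=1, rounds=200, seed=None):
    """Simulate one session of even-money bets on European roulette using a
    doubling (martingale) progression. Illustrative only."""
    rng = random.Random(seed)
    stake = base_stake
    for _ in range(rounds):
        if stake > bankroll:            # cannot cover the next bet
            break
        # 18 of the 37 pockets on a European wheel win an even-money bet
        if rng.random() < 18 / 37:
            bankroll += stake
            stake = base_stake          # reset the stake after a win
        else:
            bankroll -= stake
            stake *= 2                  # double the stake after a loss
    return bankroll

if __name__ == "__main__":
    sessions, start = 10_000, 1000
    mean_final = sum(martingale_session(start) for _ in range(sessions)) / sessions
    print(f"Average final bankroll over {sessions} sessions: {mean_final:.1f} "
          f"(started with {start})")
```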
AI HELPING ROMANCE?
Lunch Actually founder Violet Lim is banking on artificial intelligence to help keep the flame of romance alive. In a partnership with Singapore
Management University and AI Singapore, a ‘romance’ chat bot called Viola has been developed that possesses the ability to send reminders of important occasions such as birthdays, offer personalised restaurant and activity suggestions and dispense relationship advice to potential courting couples. Viola is being fed with 1.1 billion data points and it will continue to draw on romance expertise from humans on a panel of experts. More at: http://bit.ly/romanceAI
DO JET SKIS SHORTEN SPEECHES?
Comedian and talk show host Jimmy Kimmel hosted the Oscar ceremonies in 2017 and 2018. After a ceremony nearly 4 hours long in 2017, it seemed that the usual tactics of mic cuts and music cues were failing to temper the ever-lengthening speeches. So, for the 2018 ceremony, Kimmel offered a jet ski to the person with the shortest acceptance speech (timewise). However, the total runtime in 2018 actually increased by 4 minutes. But was there a significant change in speech word-counts when compared pairwise over all the transcribed 2017 and 2018 speeches in the database? It appears not. A statistical test comparing 24 speeches showed no significant change. On average, the speeches decreased by only 7.6 words year-on-year. (See Significance February 2019, pp. 24–27)
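The article does not reproduce its calculations, but the shape of a paired comparison like this is easy to sketch. In the example below the word counts are invented (and fewer than the 24 speeches actually compared), and the choice of a paired t-test is simply one natural option, not necessarily the test the authors used.

```python
from scipy import stats

# Hypothetical word counts for the same award categories in 2017 and 2018.
# These numbers are invented for illustration; the real analysis used the
# transcribed speeches held in the awards database.
words_2017 = [112, 95, 140, 88, 130, 76, 150, 101, 97, 123, 110, 84]
words_2018 = [105, 99, 128, 90, 121, 80, 139, 100, 95, 118, 104, 88]

result = stats.ttest_rel(words_2017, words_2018)
mean_change = sum(b - a for a, b in zip(words_2017, words_2018)) / len(words_2017)

print(f"Mean change in words per speech: {mean_change:+.1f}")
print(f"Paired t-test: t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# A p-value well above 0.05 is consistent with the article's finding of
# no significant change in speech length.
```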
BUSINESS ANALYTICS AT NEWCASTLE UNIVERSITY
Newcastle University Business School recently welcomed Ian Griffiths, co-founder of whocanfixmycar.com, to help launch a new undergraduate module in business analysis and analytics which sees students working as teams of consultants on a live client project to build and pitch a prototype business analytics system to the company. Commenting on the new offering, module Leader, Dr Rebecca Casey, spoke of ‘creating opportunities for our students to solve real business problems in order to develop key employability competencies in business analysis and business analytics, which are sought-after skills’. More at http://bit.ly/2E55Soe
IMPROVING THE WATERWAYS OF ENGLAND AND WALES
IAN GRIFFITHS AND SHEENA WILSON
THE CANALS AND WATERWAYS OF ENGLAND AND WALES provide spaces and an environment that are beloved by the public, and enjoyed regularly by millions of people. They are undoubtedly a national asset. However, it was not always this way. Originally, they were key transport arteries within
the country and even more important with the economic explosion of the industrial revolution. They then entered a period of decline after the Second World War as they were abandoned and unloved. It took major efforts from dedicated people to save them and return them to their former glory, and
their continued transformation provides a wonderful recreational resource to many.
THE CUSTODIAN OF THE WATERWAYS
This national asset is made up of tens of thousands of structures and hundreds of miles of channels, paths and banks, each one of them an asset that needs to be cared for and maintained. This is the job of the Canal & River Trust. It was formed as a charity in July 2012 from British Waterways. Its mission is to ‘protect, manage and improve the nation’s canals and river navigations for the millions who enjoy them’. The Trust is the guardian of more than 2000 miles of historic inland
waterways across England and Wales. It maintains the nation’s third largest collection of listed structures (more than 2700), World Heritage Sites, Scheduled Ancient Monuments and Sites of Special Scientific Interest (SSSIs), as well as reservoirs, embankments and cuttings and many important wildlife habitats. The value of its assets has been estimated at £18 billion. The Trust is responsible for an eclectic range of assets, ranging from bridges to boat lifts to pumping stations, from mile posts to docks to historical assets such as pill boxes and heritage cranes, from locks to reservoirs to visitor centres, from aqueducts to weirs to stop gates and sluices. It also has to look after environmental habitats in its care,
including populations of bats and voles, as well as green spaces. The water in its canals and rivers is not just used for hosting boats – it is also used for cooling and in fire protection systems, and even drinking: if you have ever had a glass of water supplied by Bristol Water it will most likely have travelled along one of its canals.
WHAT IS INVOLVED
Waterways contribute to the economic, social, environmental and cultural wellbeing of the nation. The Trust's waterways and museums already attract 4.3 million regular users, and it estimates they are visited by 18 to 20 million people each year. This includes boaters, of course, but also walkers, fishermen, cyclists, commuters and many more. But the Trust believes they can be used by many more people from a wide range of communities. It knows that to achieve this, it needs to provide a sustainable, attractive, reliable and safe environment for the public. Maintaining the canals and waterways requires significant investment, and the Trust spends more than £100 million each year on its network. To do this, it relies on a Government grant, commercial fees, donations and other sources of income. It has a duty to manage its assets in the best possible way to realise the benefits and values of waterways for users and stakeholders. Moreover, the Trust is transforming itself from an organisation that manages waterways to one that improves the wellbeing of the nation, that is from a Waterways Trust to a Wellbeing Trust.
THE RECENT PAST
On its formation, the negotiation with Government came with a number of agreed Key Performance Indicators (KPIs) that the Trust has to hit to maintain an initial 15-year grant. The grant continues to 2027 and a renegotiation of the grant is planned around 2022. The Trust knows it has to show how it has managed its stewardship of the waterways with a high level of excellence in order to secure a positive renegotiation with Government. The portfolio of assets the Trust has to maintain is large and extremely diverse. Some of it is highly specialised or requires sympathetic care because of its historic nature. This means that the cost of maintenance is high and would exceed the available budget without judicious targeting of activities. This was a challenge because it was difficult to prove the relative importance of different asset types and the comparative impact of assets degrading and failing. The Trust acknowledges that, historically, it had been working with the view that if it did not have enough money to maintain all its assets then it could at least console itself with understanding the condition of them by inspecting them regularly, perhaps too often, to compensate. The Trust realised that this approach would not be sustainable in the long term.
BRINGING IN THE MODELLERS
In the past, the Trust used a simple approach of grading assets in five subjective levels from good to poor, which were used to broadly prioritise interventions. It also used another prioritisation approach based on a
notification system, where higher numbers of notified issues were more likely to result in an intervention on that asset. This dual approach led to inefficiencies within an asset class and there was no opportunity to trade-off between asset types. It also did not allow for strategic planning. The Trust wanted to apply modern condition-based assessments and reliability centred maintenance. New blood in the organisation saw the opportunity to pull through expertise from other domains, such as the regulated utilities which have developed sophisticated modelling approaches. decisionLab is a player in both the water and power distribution sectors, where it helps companies optimise their investments, and it was selected by the Trust to support its efforts.
The modellers’ focus was on how to convert the condition assessment data and expert opinion into probabilities of failure and degradation rates
THE CHALLENGE
decisionLab’s aim was to enable the Trust to understand the true health of its network and all its assets – both now and projecting into the future – and transform the way it manages them. The Trust wanted an approach that was meaningful to its engineers and justifiable to stakeholders requiring it to have a solid technical basis. It had to be practical in terms of the resource demands and providing outputs that could be used to create intervention plans and investment strategies. It wanted a methodology that was consistent across all asset
classes to facilitate adoption across the organisation and support optimal asset management across the full portfolio. Importantly, it wanted to be able to do more strategic planning, developing long-term asset plans that ensured that the Trust could truly manage risk in a sustainable way and based on a solid foundation. decisionLab also felt that the approach developed had to be ownable by the Trust, both at a senior and an operational level, to maximise the chance of success and enable the Trust to be self-reliant and sustain longevity of the capability. This presented a sizeable set of challenges to the modellers. They needed a consistent approach that could be applied to the extraordinary variety of asset types: structures (cranes, bridges, etc.), buildings, mechanical and electrical equipment, waterways and reservoirs, culverts, weirs and many more; ranging in age from 200+ years old to modern Supervisory Control and Data Acquisition (SCADA) equipment; localised assets as well as linear assets; and very many of a bespoke nature. There were also numerous causes of failure, with each of these potentially having a different consequence and intervention for the same asset type. One of the biggest challenges was a lack of quantitative data both on condition and failure, as the Trust relies on inspection assessments of assets, which have generally been qualitative in nature. However, what the Trust lacked in data it made up for with an abundance of engineering expertise, and the modellers used this to drive the solution, namely to use an engineering-led approach that translated expert judgement into a mathematical framework.
COLLABORATION BETWEEN ENGINEERS AND MATHEMATICIANS
From the beginning, the modellers and engineers worked together. It started with a pilot, covering three asset types: brick arch bridges, culverts and lock gates. This provided enough data within each asset type for the team to develop the modelling approach, and sufficient range to ensure the methodology created was consistent and flexible. An asset lead took responsibility for each asset class within the Trust, provided engineering expertise to the modellers and applied the prototype models they developed to their assets and the data available. decisionLab looked at what others had done. The electrical distribution industry has a well-established modelling approach that is underpinned by a wealth of data, but this need for extensive data meant it was not well suited to application in canals. The Highways Agency has commissioned work on condition-based assessment of bridges and other structures. This seemed much more promising. However, when the team
applied it to the pilot assets, the results were disappointing mainly because it provides two measures – a worst case value and an overall optimistic assessment – neither of which matched the experts’ grading. The modellers did adopt the assessment approach used in the Highways Agency work: a simple two-dimensional matrix for each major component of the asset, where the severity of a degradation feature and its extent were each scored. This was intuitive to the engineers and provided a quantifiable input into the model.
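To see how a severity-extent assessment can become a quantifiable input, consider the toy sketch below. The grid values, component weights and scoring scales are all invented for illustration; the Trust's actual matrices and weightings are not described here.

```python
# Illustrative severity-extent grid: rows are severity (1 = minor cosmetic,
# 4 = structural), columns are extent (1 = isolated, 4 = widespread). The
# entries are invented condition scores on a 0-1 scale, where 1 is as-new.
CONDITION_GRID = [
    [0.95, 0.90, 0.85, 0.80],
    [0.85, 0.75, 0.65, 0.55],
    [0.70, 0.55, 0.40, 0.30],
    [0.50, 0.35, 0.20, 0.10],
]

def component_condition(severity: int, extent: int) -> float:
    """Map a (severity, extent) inspection score to a numeric condition."""
    return CONDITION_GRID[severity - 1][extent - 1]

def asset_condition(component_scores, weights):
    """Combine component conditions into an asset-level score using
    engineer-supplied importance weights (also invented here)."""
    return sum(c * w for c, w in zip(component_scores, weights)) / sum(weights)

# Example: a bridge with three inspected components of differing importance.
components = [component_condition(2, 3), component_condition(1, 1), component_condition(3, 2)]
print(round(asset_condition(components, weights=[3, 1, 2]), 3))
```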
DEVELOPING A COMMON METHODOLOGY
The modellers decided to go back to technical basics, using reliability/ survival theory to underpin the approach, both for the health assessment and future health prediction. The modellers’ focus was on how to convert the condition assessment data and expert opinion into probabilities of failure and degradation rates. They initiated this by asking the engineers their opinions on what was the relative importance
of different features and factors, hypotheses about the numbers of failures if interventions were not carried out, and the impact of failure events. The engineers found this challenging; however, these were initial values to seed the mathematical modelling. The model would then estimate the condition of each individual asset, and predict its future degradation, its probability of failure in any year and the cost of failure. When the model is applied to all the assets of a particular type, it provides estimates on expected numbers of failures in any one year, as well as other metrics, that can be compared with reality and enable the engineer to calibrate the model. Now that the model could estimate the probability of failure of an asset both now and any time in the future, the engineers could then express what the triggers would be for them to intervene on the asset. These thresholds were specified against the two-dimensional severity-extent matrix, which could then be translated into the quantities used by the model. The team developed simple cost models for intervening and also priced the impact of a failure in terms of safety related incidents, repair or replacement, the disruption to the network, environmental damage and compensation to other organisations. This monetised risk-based approach is used in the electricity distribution industry and other sectors, and provides a means of trading off different investments, and the team found it effective here. The asset leads have been using the model to develop their asset strategies, which involves understanding the current health of the assets, predicting how they will degrade over time, and then intervention strategies matching different budget scenarios, including doing essential works only, the
unconstrained case where money is no object, a budget limiting case and, most usefully, where the level of risk within the asset portfolio is kept within a set limit. The engineers use the model for all of these analyses and to support them in developing their strategies.
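A heavily simplified sketch of this style of calculation is shown below. It uses a Weibull survival curve as the underlying reliability model, and everything in it – the asset register, the Weibull parameters and the failure costs – is invented; the Trust's models are considerably richer, but the outputs (probability of failure, expected failures per year and monetised risk) are of the same kind.

```python
import math

def weibull_survival(age, scale, shape):
    """Probability that an asset of this type survives beyond a given age."""
    return math.exp(-((age / scale) ** shape))

def prob_fail_next_year(age, scale, shape):
    """Conditional probability of failure in the coming year, given survival so far."""
    return 1 - weibull_survival(age + 1, scale, shape) / weibull_survival(age, scale, shape)

# Invented asset register: (asset id, current age in years, cost of a failure in £).
assets = [("culvert-001", 80, 250_000),
          ("lockgate-017", 22, 60_000),
          ("bridge-042", 150, 900_000)]
SCALE, SHAPE = 120.0, 2.5   # invented Weibull parameters for this asset class

expected_failures = 0.0
monetised_risk = 0.0
for asset_id, age, failure_cost in assets:
    p = prob_fail_next_year(age, SCALE, SHAPE)
    expected_failures += p
    monetised_risk += p * failure_cost
    print(f"{asset_id}: P(fail next year) = {p:.3f}, risk = £{p * failure_cost:,.0f}")

print(f"Expected failures across the class next year: {expected_failures:.2f}")
print(f"Total monetised risk: £{monetised_risk:,.0f}")
```

Comparing quantities such as the expected number of failures in a year with what engineers actually observe is how a model of this kind can be calibrated.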
This has been a culture shock for the Trust’s engineers
WHAT DIFFERENCE HAS IT MADE?
This has been rather a culture shock for the Trust's engineers. At the start, there were some mutterings around the business that it could not model the assets as it would be too difficult. Some felt that 'we have 200-year-old assets', with no 'as built' drawings, and poor, or in some cases non-existent, information on historic interventions. Engineers did not like being forced to guesstimate and they did have to make many well-educated guesses at the start of their model journey as they were 'asked to use your engineering judgement'. But in the end, the engineers have come to take ownership of their models. Indeed, there was a conscious decision to train all those involved in the models to understand the maths behind the approach so that they are not seen as black boxes. The common feeling now is that a huge amount of subjectivity has been taken out of the condition assessments.
This is a journey that is not yet complete – it has been extended to ten asset classes, including 3,200 km of the canal network itself. But already the work is having an impact: the Trust has been able to optimise the inspection frequency for culverts and other assets are sure to follow suit; inspections are now being designed with digital data collection instead of word reports; and the Trust is beginning to be able to stop collecting data it does not use, and start collecting data it now needs. It also means that formal asset strategies can be produced with confidence, whereas that had never been possible before. Richard Parry, Chief Executive of the Canal & River Trust, says 'the Asset strategy and model work is delivering a step change in the Trust's understanding of our asset risks, enabling the Trust to improve its prioritisation of its £100 million plus annual expenditure to maintain, manage and care for the waterways so that we can make life better by water for the millions of people who use, visit and live alongside them.'
the Trust is beginning to be able to stop collecting data it does not use, and start collecting data it now needs
The journey the Trust and decisionLab have been on has
been exceptionally beneficial for both organisations. The Trust has knowledge that was never available before and is able to plan both operationally and strategically in ways that were previously impossible, and it is feeding into the business planning at all levels of the organisation. decisionLab has developed a process and a model that has very wide applicability, and the approach is currently being piloted on a large naval facility in the UK for a different client. The modellers have also gained a huge respect for Victorian engineers, and the heritage they have left our country for us to benefit from.
Ian Griffiths is the Chief Strategist at decisionLab. He has more than 25 years' experience in science and technology modelling, leading R&D and analytics groups in both government and industry, covering applications in defence, security, transport, insurance, power and infrastructure. He is passionate about using modelling and data science to inform and improve decision making in organisations and businesses.
Sheena Wilson is an Incorporated civil engineer with over thirty years' experience with the Canal & River Trust (formerly British Waterways). She spent the first fifteen years working on the canal infrastructure – building pumping stations and weirs and repairing numerous heritage structures – and the last fifteen in asset management, specialising in data, systems and processes. She has a love of the canals stemming from canal boat holidays with the family, canoeing and, more recently, cycling and walking.
THE DATA SERIES – DATA DEMOCRATISATION
Louise Maynard-Atem
WHAT IS DATA DEMOCRATISATION?
Following on from the first article in my data series about the evolution of programming languages, I want to talk this time about a topic that is increasingly relevant in both a professional and personal context: data democratisation. Many will be familiar with the statistic that over 90% of all data that exists in the world today was created in the last two years; according to the World Economic Forum, the production of data will reach 40 zettabytes in the next 2 years – a zettabyte being 2⁷⁰ bytes. I recently spoke at the Future Technology Summit, as part of the Ingenuity19 event, at Nottingham University about the future of data and what trends would significantly impact the volume of data produced. More and more, we are adding sensors to devices and connecting them to the internet, so the key trend I discussed was the Internet of Things, as this is where the significant volumes of data will most likely originate from in the not so distant future. The unprecedented amounts of data, coupled with the relative shortage of talent in analytical fields, make the argument for making digital data available to all employees within an organisation, so that they can make better and more informed decisions, an increasingly compelling one. This concept is the basis of data democratisation within a business context.
WHAT ARE THE BENEFITS AND CHALLENGES?
Historically data has been ‘owned’ by the IT department, and all other employees have had to go via this team to access a business critical resource. Data democratisation
presents a potential opportunity to improve their decision-making capability, and reach new heights of performance:
• Removing gatekeepers to the data that often create bottlenecks within the company structure;
• Employees would be more empowered to make decisions as the barrier to access is removed; data democratisation also means providing education around what the data is (and perhaps more importantly, isn't) such that the barrier to understanding is also removed;
• Leads to faster decision-making and more agile, data-driven teams; giving employees an increased sense of ownership and responsibility at a localised level, as well as at a company-wide level;
• Customer experience can be greatly improved by access to data at every level of an organisation, and companies that do this well often end up differentiating themselves from their competition.
Although the opportunities that data democratisation presents are vast, there are a number of concerns around the following areas:
• Misinterpretation of data by non-technical staff which may lead to poor decision-making;
• Increased security risk, due to more people having access to data – an increased attack surface and points of potential compromise;
• Duplication of effort across multiple teams due to lack of a centralised analytical function.
Implementing data democratisation means making considerable investments in terms of budget, software and training
HOW DO WE ACHIEVE IT?
Implementing data democratisation means making considerable investments in terms of budget, software and training. There needs to be a clear strategy and measurable success factors as to how data democratisation will impact the company culture and bottom-line; relevant KPIs need to be set up in order to track the progress and impact of data democratisation on a company. Breaking down information silos and linking the data to create a 360° view of customers is a significant transformation undertaking from both a people and software perspective. Increased access to data
means well-defined data quality responsibility is required to ensure that appropriate and defensible decisions can be made off the back of this data. A data governance strategy should be implemented to ensure best practice. Data visualisation becomes a vital tool, particularly for non-analytical staff, giving users the ability to work independently to extract insights that are relevant to their areas.
CASE STUDY: WWW.HARBR.GROUP
There is a wealth of companies trying to help organisations implement a successful data democratisation strategy, but the example I'll cover here is a start-up called HARBR. Their mission is to provide a platform where users can easily distribute and control their data, as well as providing a space for analysis, collaboration and the sharing of ideas. What I particularly like about the HARBR offering is their Store concept. This allows users to subscribe to data feeds, both internal and external, and use the data (dependent on permissions) for as long or as short a time as is necessary. The look and feel of the Store is very much inspired by the app stores we've all become so accustomed to, so using it is already very intuitive. The idea of the store also starts to touch on the idea of data monetisation, which will be discussed in a subsequent article.
Increased access to data means well defined data quality responsibility is required to ensure that appropriate and defensible decisions can be made off the back of this data
WHAT DOES DATA DEMOCRATISATION MEAN IN A CONSUMER CONTEXT?
So far, the discussion has focussed on how organisations can benefit from making data available to all employees,
but how does this impact our personal lives? Consumers are increasingly aware of the value of their data, and expect a return on the provision of personal information, whether that is:
• An expectation that the data will be handled, stored and used in a secure and appropriate fashion;
• Consent for and control over how that data is treated, and which third parties it is shared with;
• A value exchange, whereby people share their data in order to receive a tangible reward.
GDPR has already started to put the power back in the hands of the individual, but there is still no way for people to know what information is held about them, who has access to that information, and what that information is being used for
GDPR, and other legislation that will invariably follow suit, has already started to put the power back in the hands of the individual, but there is still no way for people to know what information is held about them, who has access to that information, and what that information is being used for. In an op-ed for Time Magazine, Tim Cook recently called for a registry for data brokers that buy and sell data from third parties. Such a register would be incredibly challenging to implement, but it would be the first step towards data democratisation from a consumer perspective, with the next steps being to allow people to analyse their own data and make decisions based on that analysis. Louise Maynard-Atem is an innovation specialist in the Data Exchange team at Experian. She is the co-founder of the Corporate Innovation Forum and an active member of the OR Society. She is also an advocate for STEM activities, volunteering with the STEMettes and The Access Project.
SPECTRAL ANALYSIS
BRIAN CLEGG
USE OF MOBILE DATA IS SOARING. By 2017, there were over 7.6 billion mobile connections worldwide. With 4G widely available and 5G emerging, mobile communications put huge demands on the limited radio bandwidth available. As a result, eyes are being cast on the wavelengths allocated to television. It’s hard enough choosing which TV channel to watch – but rearranging US TV stations to release parts of the broadcast spectrum for mobile data provides a whole different level of challenge.
RESHUFFLING THE US AIRWAVES
The Federal Communications Commission (FCC), regulator for US communications, took on the challenge of reshuffling frequency allocations to TV stations, freeing up space for mobiles. The specific focus was the top end of the 600 MHz TV band, ideal for data networks as it is contiguous with existing mobile bands and has excellent range and ability to penetrate walls. As Gary Epstein, former chair of the FCC’s taskforce put it, this is ‘prime beachfront property for mobile broadband services.’
A team from the FCC and a number of US and Canadian universities took on the challenge, supported by the Smith Institute – a mathematical decision support consultancy based at Harwell in Oxfordshire, experts in spectrum assignment. For Smith Institute CTO Robert Leese and his colleagues, the challenge was to ensure the mechanism freed up bandwidth at the least cost, while verifying the optimisation tools used to enable broadcasters to continue their business, avoiding interference between stations. Leese has a degree in mathematics and a PhD in physics. His experience of O.R. began at the Smith Institute: ‘My involvement with O.R. developed through becoming involved in resource allocation problems for radio spectrum, in particular what is generally called the channel assignment problem. This was in the days before spectrum auctions were common, when spectrum management was almost entirely an engineering problem. I realised that the branch of mathematics known as graph theory was highly relevant, and this led to a fruitful line of research. O.R. techniques are now used routinely in channel assignment.’
GOING, GOING, GONE
The main vehicle used by the FCC was an auction. It’s easy to think of auctions as simple marketplaces, but they perform a more sophisticated role. An auction provides information on price sensitivity: it tells us what individuals or companies are prepared to pay. This is why auctions provide an effective means to divide up the airwaves, bringing in large amounts of cash for governments. But the FCC’s requirement added in the complexity
of channel assignment. The auction process had to both manage the supply of channels, enabling TV companies to operate, and distribute the freed-up bandwidth to mobile networks. This ‘Broadcast Incentive Auction’ process was iterative, with two components. First came a so-called reverse auction, which determined how much TV stations would accept to relinquish one or more channels. In effect, it involved TV stations playing a game of chicken. The FCC initially set prices to give up a channel. Leese: ‘Initial levels were set taking into account the coverage areas and coverage populations of each TV station. So, stations covering large areas of urban population would see high initial prices in the reverse auction. Another consideration was to set the prices at levels which would be sufficiently attractive to encourage high participation levels from the broadcasters’. During each round of bidding, the prices offered dropped. Stations could either accept or reject the offer. If they rejected it, they dropped out of the auction and were assigned a channel in the newly established band. If they accepted, they stayed in the auction and the amount offered continued to decrease. At the point there was no longer a channel available for a bidder, the amount offered was the amount the station would be awarded – provided there was funding to cover this.
sophisticated algorithms were used to assess the ability to rearrange the broadcast spectrum and optimise it, ensuring that stations did not interfere with each other
During this stage, sophisticated algorithms were used to assess the ability to rearrange the broadcast spectrum and optimise it, ensuring that stations did not interfere with each other. This could include both stations within the US and across the border in Canada. With sufficient bandwidth freed up, the second part of the process was reached: a ‘forward’ auction amongst mobile networks. Here, prices went up each round until demand dropped off. At this point, the income raised by the forward auction was compared with the funding required to access TV channels. If the bidding was too low, the whole stage was repeated with lower targets for the amount of bandwidth to be released. This process was reiterated until a successful outcome was reached.
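The mechanics of that descending clock can be caricatured in a few lines of code. The sketch below is deliberately simplified: the station names and valuations are invented, prices fall by a fixed percentage for everyone each round, and the interference-feasibility check that dominated the real auction is reduced to counting how many channels remain.

```python
def reverse_auction(values, opening_prices, channels_available, decrement=0.05):
    """Toy descending-clock reverse auction (illustrative only).
    values: station -> private amount it would accept to give up its channel.
    opening_prices: station -> the regulator's opening offer to that station.
    channels_available: how many TV channels remain in the shrunken band.
    Returns {station: price} for stations paid to clear their spectrum."""
    prices = dict(opening_prices)
    active = set(values)   # stations still facing declining offers
    frozen = {}            # stations whose price has stopped falling
    repacked = 0           # stations that dropped out and kept a channel

    while active:
        for station in sorted(active):
            offer = prices[station] * (1 - decrement)
            if offer < values[station]:
                # The station would reject this lower offer. In the real auction a
                # full interference-feasibility check decides whether it can be
                # repacked; here we simply count the channels left.
                if repacked < channels_available:
                    repacked += 1
                    active.discard(station)            # exits and keeps broadcasting
                else:
                    frozen[station] = prices[station]  # no room: its price freezes and
                    active.discard(station)            # it is paid to clear spectrum
            else:
                prices[station] = offer                # accepts the lower offer, stays in
    return frozen

# Invented example: five stations, but room for only two in the repacked band.
values = {"WAAA": 40e6, "WBBB": 25e6, "WCCC": 15e6, "WDDD": 8e6, "WEEE": 5e6}
opening = {s: 90e6 for s in values}
print(reverse_auction(values, opening, channels_available=2))
```

In this invented example the two stations with the highest valuations reject the falling offers first and keep a channel in the repacked band; the remaining three have their prices frozen and appear in the output as provisional winners.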
THE OPTIMISATION PROBLEM
The most complex aspect of the process was optimising the rearrangement of the spectrum to identify whether there were sufficient channels available for the bidding TV stations. Many optimisation problems are so complex that all we can hope for is a good solution, rather than the best. A familiar example is the 'travelling salesman' problem – optimising the route between a number of destinations on a journey. For anything other than a trivial situation, optimisation of this kind of problem can be too complex to complete in a feasible timescale. Thankfully, techniques have been developed to approximate quickly to the best solution. The FCC's aim was to provide a new, non-interfering channel for each TV station that wanted one, while freeing up some bandwidth. TV stations which did not want to lose a channel had to be offered one which was in the same frequency band as the one they previously held – the US TV spectrum is divided into three bands of 6 MHz channels: UHF (channel 14 and above), High-VHF (channels 7–13) and Low-VHF (channels 2–6). Of course, it was possible that no station would want to give up their channels, but in the US, broadcast TV viewership is small compared with cable and satellite, suggesting (as proved to be the case) that TV companies would relinquish bandwidth for financial incentives.
The algorithm to make safe channel assignments used the structures mathematicians call graphs – a network of points known as nodes connected by lines called edges, as illustrated in Figure 1. Here, the graph consisted of a node for each TV station, connected by edges which indicate which stations would interfere should they be given the same channel. The problem then became to colour in the nodes in such a way that no directly connected nodes had the same colour, while minimising the number of colours used. The more colours required, the more channels that had to be assigned.
FIGURE 1 ILLUSTRATING THE GRAPH COLOURING PROBLEM: PAIRS OF NODES JOINED BY AN EDGE MUST RECEIVE DIFFERENT COLOURS (© Smith Institute)
Like the travelling salesman, this problem can be impossible to truly optimise. It doesn't help that radio can cause interference over significant distances, depending on the power of the transmitter and the terrain. In the US, the maximum distance between a pair of potentially interfering stations is a remarkable 420 km. There are also a huge number of nodes – 2,900 for the US and Canada, with on average 34 edges connecting them to other nodes, as can be seen in Figure 2. Things get particularly difficult in dense urban
areas where there tend to be more stations and it was recognised that some TV channels would have to edge into the mobile band. As is usual in optimisation, the algorithm had to deal with two types of requirement: constraints and objective functions. Constraints are unbreakable requirements, whereas objective functions are nice-to-haves. Imagine, for example, that you are trying to get the best possible car for your money – a decision most of us would like to optimise. Constraints might be that the car has to be available, affordable, safe and legal. Objective functions might relate to the car's comfort, performance, prestige and appearance.
JUGGLING THE POSSIBILITIES
Over and above the constraints of minimising interference and an upper limit on channels – initially 50 were available – a number of objective functions were included in the model, reflecting FCC policies and goals. To prioritise these, objective functions were successively converted to constraints between optimisations, ensuring that the most important goals were achieved. The priority order was to protect existing coverage, then deal with border issues, support requests from broadcasters (for example staying in a band), stay within the amount of interference acceptable to the mobile bands and minimise the need to relocate stations to new channels (changes usually require new broadcasting equipment or the replacement of antennas).
It is one thing to say that optimisation involves colouring nodes on a graph – another to establish what form that graph should take. The structure was derived from an FCC model that divided the US into around 2.5 million 2-km square cells, then predicted the signal strength of each station in cells within its range, determining potential for interference. There was also an objective function to ensure, as much as possible, that anyone who received a particular station could continue to do so.
FIGURE 2 PAIRWISE CONSTRAINTS REQUIRED BETWEEN STATIONS IN THE US AND CANADA TO PROTECT THEM FROM INTERFERENCE (© Smith Institute)
Not surprisingly, the optimisations made during the auction were at the limit of practicality for the timescale the process had to fit. Over 100,000 feasibility checks were required at each step. The best off-the-shelf solutions dealt with 80 per cent of the required problems in 100 seconds – too slow to be effective. A bespoke piece of software known as SATFC was developed at the University of British Columbia, tasked with rapidly finding a feasible solution (should there be one). This used machine learning to home in on the best solving approaches, managing around 95 per cent of test problems in less than 10 seconds.
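The feasibility question that SATFC had to answer so quickly is easy to state: can every station be given a channel from its permitted band so that no two interfering stations share one? The toy backtracking search below poses exactly that question for an invented five-station network. It bears no resemblance to SATFC's engineering and would not scale to 2,900 stations, but it shows the shape of each check.

```python
def feasible_assignment(stations, interferes, channels):
    """Try to give each station a channel from its allowed set so that no two
    interfering stations share a channel. Returns an assignment dict or None.
    stations: list of station ids.
    interferes: set of frozenset({a, b}) pairs that must not share a channel.
    channels: dict mapping each station to the channels it may use (its band)."""
    assignment = {}

    def clashes(station, channel):
        return any(frozenset((station, other)) in interferes and used == channel
                   for other, used in assignment.items())

    def backtrack(i):
        if i == len(stations):
            return True
        station = stations[i]
        for channel in channels[station]:
            if not clashes(station, channel):
                assignment[station] = channel
                if backtrack(i + 1):
                    return True
                del assignment[station]
        return False

    return assignment if backtrack(0) else None

# Invented example: five stations, three UHF channels left after clearing.
interference = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]}
allowed = {s: [14, 15, 16] for s in "ABCDE"}
print(feasible_assignment(list("ABCDE"), interference, allowed))
```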
One key feature of the auction design was that participation for the broadcasters should be as easy as possible
A SUCCESSFUL OUTCOME
Despite its complexity, the auction was successfully completed by February 2017. One essential here was getting stakeholder buy-in. Leese explains: ‘The network operators generally had prior experience of participating in spectrum auctions, and the mechanism in the Broadcast Incentive Auction for them was not very different to what they were already familiar with. The TV broadcasters were in a completely different situation, and the FCC spent a great deal of time with them to make sure that they understood the process and that
all their concerns were addressed. One key feature of the auction design was that participation for the broadcasters should be as easy as possible. They were never asked to select from more than three options at a time. Another key feature was that broadcasters were free to drop out of the process at any time (or not to participate in the first place), safe in the knowledge that they would not end up in a materially worse situation with regard to interference than their situation before the auction.’
the start date for the auction had been set well in advance, so no slippage in timescale was possible and there was no room for things not to work perfectly first time
Thanks to the sophisticated optimisation model, up to six rounds a day were completed as the auction neared completion. This was essential, as the whole process was predicated on having a tight schedule. Leese again: 'If I could pick two biggest challenges, they would be (a) the recognition right at the outset that the start date for the auction had been set well in advance, so no slippage in timescale was possible and (b) that there was no room for things not to work perfectly first time.' As Karla Hoffman of George Mason University, part of the optimisation team at the FCC, put it: 'The O.R. techniques used were essential to the design and implementation of the auction. This coordination with Canada and Mexico ensured that all of North America now has a uniform 600-MHz band… The auction repurposed 84 megahertz of spectrum and generated gross revenue of nearly $20 billion, providing more than $10 billion in capital for the broadcast television industry and over $7 billion for federal deficit reduction. And the FCC is now looking at how the O.R. techniques developed here and the two-sided auction design used for the Incentive Auction can be applied to other spectrum bands to ensure that they are being used efficiently.'
This approach is also being considered by regulators around the globe. Such was its success that it won the 2018 Franz Edelman Award for Achievement in Advanced Analytics, Operations Research and Management Science and the 2018 OR Society's President's Medal. It was no secret that the broadcast and mobile data industries doubted that the FCC was capable of pulling this off – but thanks to sophisticated O.R. techniques, the project proved a remarkable success.
Brian Clegg is a science journalist and author who runs the www.popularscience.co.uk and his own www.brianclegg.net websites. After graduating with a Lancaster University MA in Operational Research in 1977, Brian joined the O.R. Department at British Airways. He left BA in 1994 to set up a creativity training business. He is now primarily a science writer: his latest title is Professor Maxwell's Duplicitous Demon, a scientific biography of James Clerk Maxwell.
JOURNAL OF SIMULATION
Journal of Simulation (JOS) aims to publish both articles and technical notes from researchers and practitioners active in the field of simulation. In JOS, the field of simulation includes the techniques, tools, methods and technologies of the application and the use of discrete-event simulation, agent-based modelling and system dynamics. We are also interested in models that are hybrids of these approaches. JOS encourages theoretical papers that span the breadth of the simulation process, including both modelling and analysis methodologies, as well as practical papers from a wide range of simulation applications in domains including manufacturing, service, defence, health care and general commerce. JOS will particularly seek topics that are not 'mainstream' in nature but interesting and evocative to the simulation community as outlined above.
Particular interest is paid to significant success in the use of simulation. JOS will publish the methodological and technological advances that represent significant progress toward the application of simulation modelling-related theory and/or practice. Other streams of interest will be practical applications that highlight insights into the contemporary practice of simulation modelling; articles that are tutorial in nature or that largely review existing literature as a contribution to the field, and articles based on empirical research such as questionnaire surveys, controlled experiments or more qualitative case studies.
Joint Editors: Christine Currie, University of Southampton, UK; John Fowler, Arizona State University, USA; Loo Hay Lee, National University of Singapore, Singapore
Explore more today… http://bit.ly/2Gg9Zv9
UNIVERSITIES MAKING AN IMPACT
EACH YEAR STUDENTS on MSc programmes in analytical subjects at several UK universities spend their last few months undertaking a project, often for an organisation. These projects can make a significant impact. This issue features reports of projects recently carried out at two of our universities: London School of Economics and Southampton. If you are interested in availing yourself of such an opportunity, please contact the Operational Research Society at email@theorsociety.com
STRUCTURED MARKET ANALYSIS AND STRATEGIC CHOICE (Meizi Chen, LSE, MSc Operational Research and Analytics)
The LSE’s management science masters programme (now Operational Research and Analytics) has sourced anything up to 100 projects in one year with organisations ranging from global giants to relatively tiny but dynamic start-ups. Inevitably, the majority have focused on one application area and often on one skill set. However, the LSE has a strong tradition of research and teaching methodologies to support strategic planning and decision-making in situations where a multidisciplinary approach is needed. It also tries to equip masters students with the ‘soft skills’ needed to champion their specialisms within an organisation, perhaps operating as internal consultant and having to build ‘intelligent customer’ capacity and awareness of what could be achieved. That was very much the case for Meizi’s project, undertaken for a client new to the LSE’s programme, IMI Precision Engineering. The client was looking for support in identifying areas within its sales and marketing
operations where better data analytics and structured decision-making could help shape strategy choice and subsequent focus. Pilots would then hopefully generate valuable results but also demonstrate the benefits to a wide range of internal stakeholders, with potential future roll-out of the best ideas. This was ‘real-life management consultancy’. The details of this project are confidential but Meizi worked across teams to explore ways to approach forthcoming decisions and build the necessary data sets and ran a structured multi-criteria decision analysis (MCDA) trial. The LSE’s project supervisor, David Collier, said ‘Meizi was chosen for this project because of her broad knowledge of suitable methods and her Distinction from Penn State in Supply Chain and Information Systems, but also because of her personality and adaptability. It is hard to come in cold to an unfamiliar organisation in a new country and convince potentially sceptical engineers but somehow Meizi managed it.’
Meizi agreed with that assessment of the challenge but recognised the opportunity. She said ‘the three-month consulting project was a highlight from my time at LSE. It was a valuable experience applying sound academic theory and offered, most importantly, a real in-depth task to immerse myself in. It was a great opportunity for me to improve my technical, communication, and analytical skills, but also gave me the confidence to start my career after graduation’. Francesca Dematteis for IMI Precision Engineering said ‘I enjoyed the experience of mentoring Meizi. We were really pleased with her and with our first experience of a summer masters project. Exploring possibilities inevitably means a few false starts but we learned a lot from the process and it really did inform the choices we subsequently made’. Meizi is now back home in China with an important role in a fastgrowing education business but she still keeps in touch with the friends she made through the project.
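The details of the client work are confidential, but the basic shape of a weighted-sum MCDA is easy to illustrate. In the sketch below every option, criterion, weight and score is invented; a real exercise would also involve careful structuring of the criteria with stakeholders and sensitivity analysis on the weights.

```python
# Toy weighted-sum MCDA. Options, criteria, weights and scores are all invented.
criteria = {"market size": 0.40, "fit with strategy": 0.35, "ease of data collection": 0.25}

# Scores on a 0-100 scale, typically elicited from stakeholders in a workshop.
options = {
    "Segment A": {"market size": 70, "fit with strategy": 60, "ease of data collection": 80},
    "Segment B": {"market size": 90, "fit with strategy": 50, "ease of data collection": 40},
    "Segment C": {"market size": 55, "fit with strategy": 85, "ease of data collection": 70},
}

def weighted_score(scores, weights):
    """Aggregate criterion scores into a single value using the weights."""
    return sum(weights[c] * scores[c] for c in weights)

ranked = sorted(options, key=lambda o: weighted_score(options[o], criteria), reverse=True)
for option in ranked:
    print(f"{option}: {weighted_score(options[option], criteria):.1f}")
```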
DISCRETE STATE MODEL FOR BANKNOTE LIFE (Emily Loizidou, University of Southampton, MSc Operational Research and Finance)
The aim of Emily's project was to model the process of banknotes deteriorating through different fitness states. The quality of circulating banknotes is an important consideration for central banks and variations in performance can have significant implications for the management of the cash cycle. De La Rue is the largest commercial producer of security documents, working with governments, central banks and commercial organisations in more than 140 countries. The company provides central banks with the tools, including the software DLR Analytics™, they need to make data-based decisions. Modelling the demand for new banknotes is a challenge faced by all central banks. The requirements for new banknotes vary between denominations and are influenced by circulation volume forecasts, seasonal patterns, policy changes and the banknotes' performance in circulation. There is limited research in modelling banknote deterioration, and very few central banks have the capability to monitor the quality of individual banknotes using serial numbers as they move through their life in the cash cycle. The implications of having heavily worn banknotes in circulation can be severe.
Central banks set fitness standards that they consider acceptable to maintain a certain quality of banknotes in circulation. When banknotes are returned from circulation to be examined or sorted, they will either pass or fail according to these criteria. Small changes to the standards can have large implications for the number of banknotes assessed as no longer suitable for circulation and ultimately destroyed. It is therefore important to be able to model how banknotes wear over time, both to support more accurate banknote demand forecasting and to provide the ability to fine-tune banknote fitness standards whilst understanding the direct impact on destruction volumes. A discrete state model was used to represent the transitions between different fitness levels for banknotes. There are two main categories for banknote fitness: (1) fit and (2) not fit. Each category can be subdivided into additional levels. All banknotes will begin their life in the first state and move to other states according to their deterioration path. A semi-Markov model was used to replicate the different progressions of banknotes through the discrete states. Varying different parameters allows the model to be applied to different cash cycles which can have distinct characteristics. It can forecast the number of banknotes in each of the fitness states and allows an assessment of the overall quality of banknotes in circulation at a given time. The ability to vary parameters will allow "what if analyses" to be completed. Dr Simon K Jones, Forecast Manager at De La Rue, stated that 'it is essential that the quality of banknotes in circulation is monitored closely by central banks. A large volume of worn or damaged banknotes can make security features difficult to use and potentially counterfeit rates may increase. De La Rue equip Central Banks with traditional forecasting methods in their cash cycle software DLR Analytics™. There is an opportunity to enhance these traditional methodologies to incorporate the quality of banknotes. This advanced mathematical approach enables the fitness of banknotes to be predicted and can provide an insight into the impact of "what if scenarios" such as considering adjusting inspection frequencies or modifying fitness standards. Extensions to the DLR Analytics application are underway and the inclusion of this model is on the roadmap.'
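To make the idea concrete, the sketch below simulates banknotes moving through discrete fitness states with random sojourn times. The states, transition probabilities and sojourn ranges are invented for illustration only; they are not De La Rue's calibrated parameters, and this is not the DLR Analytics model.

```python
# Minimal semi-Markov sketch of banknote deterioration. All states, transition
# probabilities and sojourn times are illustrative assumptions.
import random

# Embedded Markov chain over fitness states (each row of probabilities sums to 1).
TRANSITIONS = {
    "fit_high":   {"fit_medium": 0.90, "unfit": 0.10},
    "fit_medium": {"fit_low":    0.85, "unfit": 0.15},
    "fit_low":    {"unfit":      1.00},
}
# Semi-Markov element: months spent in a state before the next transition (min, max).
SOJOURN_MONTHS = {"fit_high": (6, 18), "fit_medium": (4, 12), "fit_low": (2, 8)}


def note_lifetime(rng: random.Random) -> int:
    """Simulate one banknote from issue until it fails the fitness standard."""
    state, months = "fit_high", 0
    while state != "unfit":
        lo, hi = SOJOURN_MONTHS[state]
        months += rng.randint(lo, hi)                       # random dwell time
        nxt = TRANSITIONS[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return months


rng = random.Random(1)
lifetimes = [note_lifetime(rng) for _ in range(20_000)]
print(f"mean lifetime: {sum(lifetimes) / len(lifetimes):.1f} months")
# A crude 'what if': tightening the standard (treating fit_low as unfit)
# can be explored by editing TRANSITIONS and re-running the simulation.
```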
CROWD SCIENCE AND CROWD COUNTING
G. KEITH STILL
ESTIMATING CROWD SIZE, both a priori and in real-time, is an essential element for planning, and maintaining, crowd safety in places of public assembly. Where major, city wide events are planned, there is often an associated bragging right and marketing hype that surrounds the estimated crowd size. In this article, we outline how to estimate crowd sizes, using crowd dynamics, and why the numbers are important to get right.
TIMES SQUARE
Late December 2016, I was asked by the Washington Post to assist with
an article on the number of people attending the New Year Event in Times Square. They were claiming that over 2,000,000 people would be present in Times Square at the midnight moment. I was asked to evaluate the 'safe' capacity, and outline the methodology of how to evaluate capacity for this kind of major, city centre event. The article appeared on 31st December 2016. Overestimating the crowd size is a common issue, and several cities have similarly overestimated their actual crowd attendance, which leads to further exaggeration and upwards estimates. Why is this a problem? Does it really matter?
How do you estimate attendance?
Let us consider the fundamentals of crowded spaces and the human ellipse. The projected area of an average person is approximately 50 cm × 30 cm (taking a 95-percentile average). If we assume close packing, in a row-by-row and orderly manner, then between 4 and 5 (average) people per square metre would be reasonable. (In the UK Guide to Safety at Sports Grounds, the maximum packing density of a viewing area is defined as 4.7 people per square metre). Using this as an approximation, we can easily define the area capacity of any place of public assembly. It is simply the area times 4.7 (people per square metre) as a maximum value. Of course, we need to factor safety into this, such as having walkways and barriers to prevent overcrowding at any specific point in the system, as seen in Figure 1. As an estimate for area, we can use Google Earth Pro, which gives the total area available in Times Square as 8,557 square metres.
Assuming this area could be packed to 4.7 people per square metre, the maximum capacity would be 4.7 × 8,557 square metres, i.e. 40,218 people. Of course, this does not include any safety considerations.
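As a worked check of that arithmetic (the area and density figures are the ones quoted above; the snippet itself is only illustrative):

```python
# Worked example of the area-times-density capacity estimate described above.
area_m2 = 8_557        # Times Square area from Google Earth Pro (figure quoted above)
max_density = 4.7      # people per square metre (UK Guide to Safety at Sports Grounds)

max_capacity = area_m2 * max_density
print(f"theoretical maximum: {max_capacity:,.0f} people")            # ~40,218
print(f"claimed attendance is {2_000_000 / max_capacity:.0f}x that figure")
```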
Crowd safety considerations
You simply do not pack people into these kinds of spaces like sardines. Barriers are used to create pens; between the pens there are spaces for emergency vehicles and crowd management/police/security. There are stages and other infrastructure in place. Places of public assembly also need monitoring for security, as crowds can be a terrorist target, especially at large public events such as Times Square at New Year. As can be seen from various site images, there is evidence of planning for crowd safety and crowd management: there are barriers, with contingency spaces, and media reports of controlled filling, pen-by-pen, to prevent any dangerous overcrowding. This means that the maximum capacity
is less than 40,218 and far less than the Mayor’s public relations claim of 2,000,000 people.
Evaluating crowd capacity
Over the last 30 years of developing crowd safety and risk analysis tools, there has been one recurring theme, one common question I get asked: 'If I don't know how many people are coming to my event, how do I plan for crowd safety?' You generally only need four fundamental bits of information to answer that question.
Routes – what direction will the crowd approach/depart the event space?
Areas – what areas are available for the crowd?
Movement – over what period of time do the crowds arrive at/leave the site?
People – what do you know of the profile of the crowd?

We call this a RAMP Analysis (Routes, Areas, Movement, People/profile), and use this for planning, risk assessment and evaluating real-time events. We can illustrate this using a well-known example from one of our projects.
FIGURE 1 PREPARING FOR NEW YEAR CELEBRATIONS IN TIMES SQUARE

TRUMP INAUGURATION
Following the Washington Post analysis of New Year in Times
Square (2016/17), we were approached by the New York Times to give them some background on crowded spaces, and specifically to estimate how many people were at Abraham Lincoln's inauguration (in 1861, from a few historical images). Using the density count from those images, projecting the area using Google Earth Pro and using anthropometric data from historical records (people were approximately 20% smaller back then), we could estimate, from very limited data, that there were approximately 250,000 people present. The New York Times then asked if we could do this in real-time for the Trump inauguration in 2017. We had a few days to prepare for a real-time analysis and check the available data. Firstly, the areas for the crowds were shown on the White House websites, and we had images of the main viewing area, including the screening points. Routing maps had been published showing how to get to the Washington Mall, with updates on any congestion/delays.
We knew we would also have live car park data posted on various social media sites. We also knew the Washington Metro would publish transit data at approximately 11am and we were monitoring social media for reports of any problems accessing the metro. Social media can be a useful indicator of the mood/profile of the crowd moving towards the site. This provided three independent sources of information for processing: car park fill rates, metro utilisation, reports of any delays. Coupled with the live TV coverage, showing the Mall filling, we had corroborating evidence that would support any real-time evaluation of crowd size. The initial problem was the reality gap, a ‘huge’ reality gap, between the claims that there will be 3,000,000 people at the inauguration and the available area, packing density and transport capacity. We can make a simple a priori approximation for this. The ‘claims’ vastly exceeded the capacity, a story we were all too familiar with from other major events.
FIGURE 2 THREE FUNDAMENTAL ARRIVAL PROFILES FOR MAJOR EVENTS
Inauguration crowd estimates (a priori)
The ticketed area for the inauguration was 65,000 square metres and the mall area was a little over 200,000 square metres. These spaces do not pack to capacity as people are viewing large screens and people leave spaces to facilitate viewing. For example, an aerial fireworks display would typically pack to 1–2 people per square metre and viewing large screens 2–3 people per square metre, etc.
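A small sketch of that a priori check, using the areas quoted above and a range of viewing densities; the density scenarios here are indicative assumptions rather than measured values.

```python
# A priori capacity check for the inauguration viewing areas. Areas are the
# figures quoted above; the density scenarios are assumptions.
ticketed_m2 = 65_000
mall_m2 = 200_000                 # "a little over 200,000 square metres"
total_m2 = ticketed_m2 + mall_m2

densities = {                     # people per square metre (assumed scenarios)
    "fireworks-style viewing": 1.5,
    "watching large screens":  2.5,
    "absolute safe maximum":   4.7,
}
for label, d in densities.items():
    print(f"{label:>24}: {total_m2 * d:>9,.0f} people")
# Even the absolute maximum (4.7/m2, about 1,245,500) is far below the 3,000,000 claim.
```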
Even at maximum safe capacity (4.7 people per square metre), this gives a maximum capacity of 1,245,500, far less than the claims of 3m. When politicians argue against facts, mathematics always wins in the end. At this stage in the project, we had the basic information sources, all gathered within 12 hours. My colleague, Marcel Altenburg, and I set up 4 computers recording the various live video feeds from the major news channels, had the New York Times reporters on the site taking photographs of key locations, and had data feeds from social media (including reports from the Secret Service of any queueing issues – there were none). Metro and car park data provide a valuable insight as their filling rates are a precursor to crowd movements. To explain that, we need to look at the 'M' in the RAMP Analysis: Movement over time. There are three fundamental arrival profiles for major events: early, transport limited, or late. To illustrate these, the ideal curves shown in Figure 2 give us a reference guide.
For example, the early arrival rate would be typical of a celebrity appearance, where getting the best position, close to the front of the stage, is governed by how early you arrive on site. A late arrival event is typical of fireworks and football matches, where the transport capacity, parking and seating arrangements are all well-travelled and times are known in advance. Transport-limited events, such as the Sydney Olympics, where you can only access the site via trains or buses, produce a flatter arrival profile at the maximum transport capacity. In essence, knowing what the curve looks like and how the transport system operates can provide valuable insights for large-scale events in city centres. We have deployed real-time fill
predictors on a number of city-wide events. It is not exactly rocket science: analysing the network, the routes people take to the site and, in many cases, reducing the options for ingress to condition behaviour for egress (typically the crowd will exit using their ingress routes) are all part of the crowd sciences, using RAMP analysis. For the inauguration, we didn't know how this crowd might arrive on the site, but we had the video feeds from the Washington Memorial, which allowed us to monitor the crowd build-up over time. We had live feeds from the car parks and the social media reports from the metro (no congestion). Using all three inputs (video, car parks and the Secret Service social media reports of site access), we had a clear image of how this site was filling in real-time. The curve was low, flat, not exponential, observed over several hours (from 6 hours prior to the speech at noon Washington time). To cut a long afternoon of data gathering short, the results of the analysis of area build-up and car park fill rates are presented in Figure 3, which shows the relative areas of the Obama and Trump crowds. After 5 hours of monitoring the arrivals and the car parks, and measuring the areas (every 30 minutes, we assessed how much more area was occupied/filling with people), we had a clear image of the 'M' in the RAMP analysis: low, flat, non-exponential and 1/3rd of the Obama crowd, 37% of the metro ridership, 1/3rd of the number of buses in transit, 1/3rd of the occupied area.
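In essence, the real-time comparison reduces to ratios of independent indicators. A minimal sketch, using the rounded ratios reported above (the underlying data feeds are not public):

```python
# Combining independent real-time indicators into a relative crowd estimate.
# Ratios are the rounded figures reported above; the raw feeds are not public.
indicators = {
    "occupied viewing area": 0.33,   # relative to the Obama 2009 inauguration
    "metro ridership":       0.37,
    "buses in transit":      0.33,
}
relative_size = sum(indicators.values()) / len(indicators)
print(f"Trump crowd was roughly {relative_size:.0%} of the Obama 2009 crowd")
# Agreement between independent sources is what gives confidence in the estimate;
# a large spread between them would be a signal to go back and check the data.
```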
Media reports
Following the Lincoln article, we had the front page of the New York Times with the, now famous, image of the Obama/Trump crowd comparison. In real-time, working with the reporters and photographers from the New York Times, we estimated that this crowd
FIGURE 3 AREAS OCCUPIED AT THE OBAMA AND TRUMP INAUGURATIONS. IMAGE © 2017 DIGITALGLOBE, GOOGLE
was 1/3rd of the number of people (using the occupied area) compared to the Obama inauguration (claimed as the inauguration crowd record) at 11.15am, 45 minutes before the speech. Knowing the arrival profile/curve was key to understanding the event fill rate. This image was circulated around the world, with over 100,000 retweets in the first few hours of its publication.
CAN YOU DO IT AGAIN, TOMORROW?
The project was complete, though the story continued into the following day when we were asked to count the crowds for the Women's March on the 21st January. Using the RAMP analysis, now with only a few hours of preparation, we were able to estimate the numbers at the Women's March at 470,000 (at 2.00pm, just prior to the march starting). Again, this made the headlines (now three times in the same week). We didn't know at the time that another organisation was also crowd counting, using high-resolution images (from hot air balloons) and physically counting heads. This took their team over a week to complete, and their count at 2.45pm (during the march) was 440,000 +/- 50,000 people. In real-time, we had a confidence level of +/- 5% based on the correlation of bus data, metro data, fill rates (occupied areas) and photographs from the New York Times photographic team (13 photographers around the site at various locations). The final estimate was that approximately three times as many protesters attended the march as attended the Trump inauguration.
CROWD COUNTING – AGAIN, AND AGAIN
We have been asked to evaluate crowd sizes at a wide range of events, from the Super Bowl victory parade (720,000, not the 3,200,000 claimed) to the recent New Year celebration in Times Square (not the 2,000,000 claimed), to name a few examples from 2018. As part of our MSc in Crowd Safety and Risk Analysis at MMU, we task the international students with evaluating famous events and estimating actual attendance against the political claims. Typically, crowds are vastly overestimated, with claims 5 to 10 times more than the site can accommodate. So, why is crowd counting important?
Assume you are the event planner for a city-wide event, and you need to provide security, screening, safety considerations, toilets, food and beverage, etc. If the resources coped with a crowd of size x, but the Council think the crowd was of size 10x, then they have an unrealistic view of the capability of resources, which may impact future events. Realistic crowd counts are essential to provide the balance of safety and security, to balance the experience with the
expectations, and to maintain the resources necessary to sustain future events. It is essential to get those numbers right. In April 2018 the College of Policing, having worked closely with me in its development, introduced mandatory training in policing events for all Public Order/Public Safety commanders and their advisors. The training provides a full day of crowd science content, giving them knowledge and understanding of key crowd science tools before providing the opportunity to test their learning in bespoke scenarios. The understanding of crowd science, developed by completing the course, enables commanders and advisors to recognise potential crowd safety issues at the planning stage of an operation, which in turn allows them to highlight any risk and where appropriate put measures in place to mitigate it. Keith Still is Professor of Crowd Science at Manchester Metropolitan University (UK) where he developed and delivers an online MSc programme in Crowd Safety and Risk Analysis. His work focusses on the use of planning tools for places of public assembly and major events. Keith has consulted on some of the world’s largest and most challenging crowd safety projects. Keith has developed and taught a wide range of short courses for all groups involved with the planning, licencing and management of public spaces which draw on his extensive experience of planning major events, his various research projects and application of crowd safety and risk analysis over the last 30 years.
Certified Analytics Professional (CAP®)

The OR Society now offers Certified Analytics Professional, an exam-based analytics qualification established by INFORMS in the USA. This is complementary to our own accreditation programme, and does not create an exclusive 'one-or-the-other' choice.

What is CAP?

CAP is the premier global professional certification for analytics practitioners. Those who meet CAP's high standards and pass the rigorous exam distinguish themselves and create greater opportunities for career enhancement. Earning the CAP credential requires meeting the eligibility requirements for experience and education, effective mastering of "soft skills", committing to the CAP Code of Ethics, and passing the CAP exam. For organisations seeking to enhance their ability to transform complex data into valuable insights and actions, CAP provides a trusted means to identify, recruit, and retain the very best analytics talent.

CAPs in the workforce

■ 20% of Fortune 100 companies have CAPs on staff, including Bank of America, General Motors, Boeing, Chevron, DuPont, IBM, JPMorgan Chase, Lockheed Martin, UPS, and more.
■ Visit www.certifiedanalytics.org to search the database for CAP professionals.
■ Add CAP Preferred to your job postings to receive résumés from the most eligible analytics professionals.

CAP Benefits

■ Building Capability: CAP-accredited employees and professionals provide the unique capability to leverage the power and promise of analytics.
■ Driving Credibility: Having CAP professionals on your team demonstrates you have the top talent in place and are committed to the highest ethical standards of analytics practice.
■ Creating Opportunity: Encouraging employees and professionals to pursue their CAP certification creates new opportunities for success and provides new avenues for organisational analytics-based growth.

Programme at a glance

■ Globally recognised credential based on the practice of analytics professionals
■ Vendor and software neutral
■ Created by teams of subject matter experts from practice, academia and government
■ Focused on seven domains of the analytics process: I Business Problem Framing; II Analytics Problem Framing; III Data; IV Methodology Selection; V Model Building; VI Deployment; VII Lifecycle Management
■ Computer-based exam administered through the CAP test vendor network
■ Managed by the Institute for Operations Research and the Management Sciences (INFORMS)

Why should I hire CAPs?

■ Proven pre-qualified analytics talent
■ Improves your organisation's analytics capability
■ Maintains continuous professional development
■ Provides long-term professional growth
■ Increases your competitive advantage

Are you eligible?

Earning the CAP credential includes meeting eligibility requirements for experience and education.

Degree Level          Degree Area         Experience
MSc/MA or Higher      Related Area        3 Years
BSc/BA                Related Area        5 Years
BSc/BA                Non-Related Area    7 Years

Fees and pricing

For OR Society members, fees will be billed in sterling equivalent at the prevailing PayPal exchange rate.

Exam Fee: OR Society Member                                            $495    £380*
Exam Fee: Non-members                                                  $695    £533*
Annual Maintenance Fee (payable beginning 4th year of certification)   $100    £77*
Member Re-examination Fee                                              $300    £230*
Non-member Re-examination Fee                                          $400    £307*
Processing Fee on Approved Refunds                                     $100    £77*
Appeals Processing Fee                                                 $150    £115*

*GBP conversion estimated as at 6 September 2017.

For more information about the CAP programme: www.theorsociety.com/cap, 0121 233 9300, www.certifiedanalytics.org, info@certifiedanalytics.org
MAKING SENSE OF BIG DATA USING CLUSTER ANALYSIS
DUNCAN GREAVES
IN A FORMER LIFE, before academia called, I was gainfully employed as an information architect in one of the largest cash processing businesses in the world. One big problem with this job was that there was lots of data – terabytes of the stuff but precious little information. In common with other analysts, how to use and interpret big data to guide business decision making was one of the biggest challenges. Distributed processing and cloud computing allow advanced storage and analytical capabilities to be available to organisations of all sizes. Although such capabilities bring a welcome reduction in the capital costs of computing, the increased revenue expenditure that these changes bring necessitates an understanding of how big data from distributed storage can be translated into meaningful insights that can help the bottom line. Although the large size of ‘Big Data’ sets is the dimension that stands out, as the clue is in the name, the defining characteristics could be said to be the six V’s, namely Volume (large), Velocity (rapidly accruing), Variety (different formats), Veracity (information reliability), Variability (complexity and structure) and Value (the data has low value density). Although cloud solutions are able to assist with storage and on-demand computing power, understanding the data collected by revealing the structure of the information is the key to unlocking the value that is sparsely contained within. In order to extract
value from Big Data it needs to be made smaller. This is where Cluster Analysis techniques can help to classify and group objects based on attributes. Cluster analysis seeks to apply a common approach to grouping objects based on their relationships to each other by assessing the structure from a set of user selected characteristics. It is a generic name for a group of multivariate data analysis techniques that utilise a cluster variate, a mathematical representation that compares the similarity of objects. Cluster analysis also mimics a task that is an innate skill of the problem solver – pattern recognition. By recognising patterns in the data, analysts can overcome the mountains of data to reduce the number of observations needed and generate hypotheses about the structure of the problem that can be tested.
Cluster analysis is particularly useful in scenarios related to choosing between different business options, where the decision depends on the
similarity (or difference between) groups. It can be applied to describing or producing a taxonomy of objects, for simplifying data where there are many observations, or for identifying relationships between objects. Common business examples include customer segmentation, utility or power usage and customer behaviour profiling. Advanced clustering techniques are also employed in more technologically sophisticated applications like image, number plate or facial recognition systems, but the principle of classification based on a known or assumed cluster variate remains the same.
PROBLEM DEFINITION

Let me illustrate the usefulness of cluster analysis from my cash processing days. During processing, cash is counted and deposited to a customer account. The problem was how to produce the most accurate estimates of how much 'effort' was used to process a deposit. Previous pricing mechanisms were based solely on the cash value of the deposit, but this masked wide variation in the quality of the money presented. An algorithm was required that could evaluate this processing effort based on both the value and the processes used, and clustering analysis was used to determine the parameters of the resulting algorithm. The amount of labour employed in counting the deposits is the determining factor in selecting the pricing point at which the depositing bank is charged. When cash arrives at a depot, it is sorted and counted based on the way the cash is presented. Deposits can be 'clean', in that they contain single denomination notes, mixed notes, and foreign notes. Mixed cash and coins are said to be 'semi-clean'. Large quantities of poor quality notes and large amounts of coin constitute 'unclean' deposits due to the increased manual processing required. Cash is also automatically screened for out-of-date notes, coins and forgeries, the presence of which automatically increases the amount of intervention that is required.

The quality of the received deposits directly influences the methods used to process the cash: large automated machines are able to handle 'clean' deposits, semi-automatic methods are used to process 'semi-clean' deposits using desktop cash counters like those you may see in a bank branch, and large amounts of 'unclean' coin need to be counted using specialist machines, while tatty notes have to be manually withdrawn and replaced.

DESIGN ISSUES

The design of a cluster analysis for big data is aided by the fact that finding an adequate sample size is rarely a problem. What is more important is that the sample chosen is sufficient to capture a representation of all relevant small groups that may be within the population. This is critical in the detection of outliers. In this case, it was important to ensure that the sample included sites at airports, so that processors capturing foreign currency could be evaluated. Outliers can represent small legitimate groups, like the amount of Turkish lira processed, or may represent under-sampled groups, like £50 notes processed in poorer areas, which are legitimate but more prevalent in affluent areas. Non-legitimate presentations included seaside currency 'joke' notes that had to be excluded. Depending on the purpose and method of clustering, it is always necessary to screen the sample data for outliers to ensure the results capture, and are representative of, the whole population and are not unduly affected by outliers.

Prior analysis and understanding of the business process, data structures, and the aims of the project ensured that there was considerable consensus between the stakeholders about the way in which the clustering could be approached.
PATTERN OR PROXIMITY
Similarity matching is the process by which the correspondence between objects can be assessed. The similarity is computed by comparing each observation to the grouping criteria specified by the analyst. A procedure then allocates all similar objects into the same cluster. Correlation procedures utilise correlation coefficients to assess how closely the profiles of objects resemble each other and match the criteria. In the example, the quality of a deposit was based on both metric (value) and non-metric (quality) data, that is the cash value and the make-up of the deposit, and correlation-based matching was therefore the method used to evaluate the data.
Dependent upon the types of data being compared, distance measures may be a more appropriate measure of similarity, and methods to achieve cluster similarity may be based on the Euclidean (i.e. crow fly) distance between data points or the Manhattan distance (based on a grid network)
between data points. Other distance measures are available to the analyst and can be utilised to represent the data patterns. Basing clusters solely on simple distance metrics such as cash value, which take into account only the magnitude of the values, can disguise wide variations in other criteria such as deposit quality. When choosing a procedure for analysis, the analyst should be aware of whether pattern or proximity is the primary objective of cluster definition. Regardless of the method used, the resulting clusters should exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity.
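A small illustration of the two distance measures mentioned above, applied to a pair of deposits described by standardised attributes; the attribute vectors are invented for the example.

```python
# Euclidean vs Manhattan distance between two (already standardised) deposits.
# The attribute vectors are invented purely for illustration.
from math import sqrt

deposit_a = [1.2, -0.4, 0.8]   # e.g. standardised value, note quality, coin share
deposit_b = [0.9,  1.1, -0.2]

euclidean = sqrt(sum((x - y) ** 2 for x, y in zip(deposit_a, deposit_b)))
manhattan = sum(abs(x - y) for x, y in zip(deposit_a, deposit_b))

print(f"Euclidean ('crow fly') distance: {euclidean:.2f}")
print(f"Manhattan ('grid') distance:     {manhattan:.2f}")
```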
STANDARDISATION
Different data sets may need a standardisation process: converting each variable (or distance) to a standardised score assists the comparison of values when different scales are used for each variable type. Standardisation of variables in time or date formats to a numeric format was used to assess how long operations took. In questionnaire scenarios the standardisation of responses within rows (case standardisation) can be of use to level respondent scores and mitigate response-style effects. Counting financial deposits is aided by the fact that cash in the UK is standardised to GBP. However, processing foreign deposits in a UK cash centre involves a currency conversion and adding exchange fees as part of the process.
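A minimal example of that kind of standardisation, converting one variable to z-scores so that it can sit alongside variables measured on other scales; the values are invented.

```python
# Standardising a variable to z-scores before clustering. Values are invented.
from statistics import mean, stdev

processing_minutes = [12.0, 7.5, 31.0, 9.0, 14.5, 50.0]   # e.g. derived from timestamps
mu, sigma = mean(processing_minutes), stdev(processing_minutes)
z_scores = [(x - mu) / sigma for x in processing_minutes]
print([round(z, 2) for z in z_scores])
# Each variable (value, quality score, processing time, ...) would be standardised
# the same way, so that no single scale dominates the distance measure.
```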
ASSUMPTIONS
Unlike some other statistical inference techniques, clustering does not require the data to be
evenly represented across the scale used (normally distributed), or the values to have any relationship with each other (linearity). Checking input variables for multicollinearity, where the variables are so closely correlated that they essentially measure the same thing, was accommodated by using only the foreign currency values or the converted rates instead of both, because they represented the same values related by the exchange rate. Alternative methods of dealing with highly correlated data include choosing a distance measure that compensates for correlation (e.g. the Mahalanobis distance, which measures the number of standard deviations from a point to the mean of a distribution) to ensure that multicollinearity bias is not introduced into the analysis.
ALGORITHM SELECTION
The clustering itself is performed by partitioning the data based on the attributes of the data according to the similarity rules employed, in order to produce groups of data that are similar in pattern or proximity. A hierarchical clustering algorithm was used to partition process data by using a divisive method that successively subdivided the sample into smaller groups based on attributes (see Figure 1) to form a tree like structure. This method was chosen because the patterns and taxonomy of the data were well known to the analysts involved. Alternative methods of clustering use agglomerative methods that successively group the data ‘up’ from single observations based on the criteria. These algorithms are best suited to multiple criteria evaluation scenarios and where distance and proximity (as opposed to pattern)
measures of grouping are used, and the choice of which algorithm to use is dependent upon the situation. Other alternative algorithms involve non-hierarchical clustering, whereby clusters are seeded from initial values and observations are assigned to the clusters with the aim of maximising the similarity within clusters and maximising the distance between clusters. A family of algorithms known as K-means works by iteratively optimising and reassigning observations to the clusters. Hierarchical methods offer the advantages of speed and simplicity. However, they are also biased by the presence of outliers, high storage requirements, and the potentially misleading effects of early selection combinations. Non-hierarchical methods can overcome some of these barriers, but work best when the seed points are already known or can be reasonably specified from prior practice or theory. A blended approach of using a hierarchical algorithm, removing outliers, then re-clustering using a non-hierarchical algorithm can help to compensate for the weaknesses of both approaches.

FIGURE 1 RESULTS OF A HIERARCHICAL CLUSTERING ALGORITHM
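For illustration only, here is a compact K-means-style partitioning of invented two-attribute deposit data in plain Python. It is a sketch of the non-hierarchical idea described above, not the algorithm or data used in the project.

```python
# Tiny K-means illustration on invented 2-D deposit data (standardised value,
# standardised quality score). A sketch of the idea, not the production method.
import random
from math import dist

data = [(-1.2, -0.9), (-1.0, -1.1), (-0.8, -0.7),   # low value, poor quality
        (0.1, 0.2), (0.3, -0.1), (0.0, 0.1),        # mid-range deposits
        (1.1, 1.0), (1.3, 0.9), (0.9, 1.2)]         # high value, clean
k = 3
rng = random.Random(0)
centroids = rng.sample(data, k)                      # seed points

for _ in range(20):                                  # iterate assign/update steps
    clusters = [[] for _ in range(k)]
    for point in data:
        nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
        clusters[nearest].append(point)
    centroids = [
        tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
        for i, cluster in enumerate(clusters)
    ]

for i, cluster in enumerate(clusters):
    print(f"cluster {i}: {cluster}")
```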
STOPPING RULES
The analyst should always be aware of how many resultant clusters are required, to give sufficient discrimination between groups whilst maintaining a useful and informative analysis. Single observation clusters need to be checked for validity, and removed or regrouped to ensure that the number of clusters created is manageable. Many software programs contain stopping rules that set limits on the number of heterogeneous clusters that can be created. These stopping rules determine that the clusters arrived at are sufficiently distinct from each other, in terms of percentage, variance or statistical difference. If the output is to be used to inform a change to a system or process, then the number of clusters arrived at must be intuitive and of manageable size.

INTERPRETATION
When the clustering process is completed, it is possible to interpret the clusters and to assign labels to them according to the entities they represent. In the example, the clusters can be further subdivided based on the quality and process used, so were assigned labels such as ‘Clean-Auto’, ‘Mixed-Manual’. Labelling means that it is immediately obvious to other users what the grouping represents, and is especially useful when communicating the results of the analysis to others, creating a common vocabulary of understanding.
VALIDATING AND PROFILING
Validation involves ensuring that an optimal solution has been arrived at and that it is representative and generalisable. This is achieved by using the cluster analysis on several different samples in order to cross validate the results that were obtained. This ensures that rare observations and edge cases are captured and classified correctly and that the proposed classifications are stable.
Profiling using new or unseen observations ensured the predictive classification of the criteria used and helped prepare the operationalisation of the solution.

SUMMARY

This example shows how clustering techniques were used to establish the classification criteria for a new Activity Based Costing (ABC) solution utilising big data obtained from a distributed, multi-system processing capability, bringing clarity, unity and added value to the pricing systems of a core business process. The process, as always, was aided by close collaboration between the users that know the meaning of the process and the administrators of the systems that aid the understanding of the data. It's possible that Clustering, Collaboration and Co-operation could well be the three C's of leveraging the six V's of big data into insights and innovation. Collecting and curating large data sets is a by-product of modern digital environments. If your organisation is one of those that is concerned about how to leverage large data sets to gain value then cluster analysis may be a step towards enhancing your insight capability.

Duncan Greaves is a PhD researcher at Coventry University studying Cybersecurity Management, with a focus on how businesses can influence trust formation in digital environments. Although his writing is that of a self-identified data geek, his first love is the study of sedimentary geology and fossils.

FOR FURTHER READING

Hair, J.F., W.C. Black, B.J. Babin and R.E. Anderson (2010). Multivariate Data Analysis: A Global Perspective. Pearson.
IMPROVING PROFITABILITY OF WIND FARMS WITH OPERATIONAL RESEARCH
MARTINA FISCHETTI
WIND ENERGY IS A FAST-EVOLVING FIELD that has attracted a lot of attention and investment in recent decades. In an increasingly competitive market, it is very important to minimise establishment costs and to increase production profits already in the design phase of new wind farms. Therefore, Vattenfall, which is a leader in the wind energy business, uses state-of-the-art Operational Research (O.R.) techniques in the design phase of new offshore farms. Our work proved that millions of euros
can be saved in practice by a smarter use of resources, using proper optimisation tools rather than the traditional and more manual approach. If, the last time you were flying, you were lucky enough to fly in clear weather over the ocean, you may have passed over an offshore wind farm and admired the turbines – all neatly arranged in long straight rows, symmetric from all angles. But you will see less of that in the future, as lining up turbines is most often really bad for performance, because turbines will almost inevitably be casting wind shadows on other turbines in the farm, as can be seen in the lead image. In the future, you will see a new kind of 'beauty' – drawn by math and optimised in production.
THE IMPORTANCE OF MINIMISING INTERFERENCE BETWEEN TURBINES
If we imagine two turbines placed one in front of the other along a certain wind direction, it is easy to imagine that the upwind turbine will screen the downwind turbine off from the wind. From a wind energy production perspective, this is a big loss, as the second turbine cannot produce as much as it would have done with unhindered access to the wind. We want all wind turbines to be clear of any wind shadow, the so-called wake effect, which could take away part of their capacity to generate electricity. Obviously, if you have to place two turbines near each other, and you have only one wind direction, it is not that difficult to avoid them casting shadows on each other. But if you are asked to place 50 or 80 wind turbines in a wind farm site where the wind changes continuously, how will you then find the optimal location for all turbines? Only a few years ago, the engineers at Vattenfall had to use standard commercial tools and,
not least, their own experience to position turbines in a wind farm. These tools were not able to incorporate all the limitations and constraints in the design work, so quite a substantial amount of manual engineering work had to be done. Now we use state-of-the-art Operational Research techniques to design modern wind farms. The question of optimal turbine allocation was addressed in a Ph.D. project and the resulting tools were fine-tuned in production mode at Vattenfall. New Operational Research models and algorithms were developed to optimise the position of wind turbines in a wind farm as well as the type and routing of cables to connect them.
WIND FARM LAYOUT: A CHALLENGING DISCIPLINE
Before Vattenfall starts its design work, it is given a pre-defined area offshore and all the relevant information about it, including wind statistics, sea bed conditions and various limitations such as ship wrecks, existing cable routes, etc. A team of experts in the company preselects a turbine model that they want to use on the site and, based on the overall capacity of the site allowed by the grid operator, they define how many turbines they want to install. At this point all the ‘LEGO bricks’ are there, but the biggest challenge remains: how should they be put together to create the best layout where all factors are taken into account? How should all the statistical information be used to increase production and profitability?
The optimisation is challenging. On the one hand, the magnitude of the instances in real-world applications is typically very large. On the other hand, there are some uncertain parameters such as wind conditions that are typical of stochastic problems. Finally, a number of non-linear effects have to be taken into account in the definition of the problem. Amongst these, the most relevant is the interaction between turbines. An ad-hoc mathematical model was designed in Vattenfall to crack the nut and design a farm where the total power production was maximised (explicitly considering the wake effect), whilst all the constraints arising in practical applications were taken into consideration.
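To give a flavour of the layout problem, the sketch below greedily places turbines on a candidate grid so that assumed production minus an assumed pairwise wake penalty stays high. The candidate grid, the interference function and all numbers are invented for illustration; Vattenfall's actual model, constraints and data are far richer and are not described here.

```python
# Toy sketch of the layout idea only: pick N turbine positions from candidate
# sites so that production minus pairwise wake interference is high. The greedy
# rule, the interference function and all numbers are purely illustrative.
from math import hypot

candidates = [(x * 500.0, y * 500.0) for x in range(6) for y in range(6)]  # 500 m grid
n_turbines = 10
base_production = 1.0                       # per-turbine production, arbitrary units


def interference(p, q):
    """Assumed wake penalty that decays with distance (illustrative only)."""
    d = hypot(p[0] - q[0], p[1] - q[1])
    return 0.5 / (1.0 + (d / 500.0) ** 2)


chosen = []
while len(chosen) < n_turbines:
    # Greedily add the candidate whose marginal value (production minus the
    # extra interference it creates with already-placed turbines) is largest.
    best = max(
        (c for c in candidates if c not in chosen),
        key=lambda c: base_production - sum(interference(c, t) for t in chosen),
    )
    chosen.append(best)

total = n_turbines * base_production - sum(
    interference(p, q) for i, p in enumerate(chosen) for q in chosen[i + 1:]
)
print(f"greedy layout value: {total:.2f} (ideal, interference-free value: {n_turbines:.2f})")
```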
Direct impact: million-euro savings and revenue boost
Our layouts proved to be, on average, about 10 million euros more profitable for each wind farm. In Figure 1, we illustrate an example of optimisation of a real-world farm. The costs of foundations (different colours in the plot on the left) are explicitly considered in our optimisation. Experts in Vattenfall compared our layout (red dots in the plot on the right) with the one obtained with the traditional process (blue dots), proving savings in the order of 12.6 million euro in the farm’s lifetime through increased production and reduced cost of foundations.
FIGURE 1 POSITION-RELATED COSTS (FIRST PLOT) USED TO GENERATE THE OPTIMISED LAYOUT (RED DOTS IN SECOND PLOT)
CABLE ROUTING TAKING A NEW DIRECTION
Once the turbine positioning is defined, the next step in the wind farm design process is to optimise the way the turbines are connected through cables. The turbines are connected through lower-voltage cables to one or more offshore substations (the inter-array cable connection), and a single high-voltage cable (the export cable) is used to transport the electricity from the substation to shore, where the grid transports the power to the end-customers. The challenge of optimising the inter-array cable routing lies in deciding how to connect the turbines and what kind of cable to use for each connection. The optimiser is provided with a set of possible cables, each of them characterised by a different price per metre, a different capacity and different electrical characteristics. The optimiser must decide not only which turbine to connect to which turbine, but also what kind of cable to use for the connection, minimising both immediate and long-term costs. Different constraints arise in practical applications, such as obstacles in the area that the route needs to avoid, limitations on the maximum number of cable connections at the turbines as well as at the substation(s)
and capacity limitations for each cable type. The optimisation is further complicated by the noncrossing constraint: a cable placed over another one has a high risk of damaging the existing cable, resulting in huge production and revenue losses. For this reason, cable crossings are generally forbidden in practice. We developed an original mathematical model for this problem, together with a new heuristic framework for difficult real-life cases. Our layouts proved to be not only very different from the manual layouts, but also 1–5 million euros cheaper for each wind farm. In the example of Figure 2, the optimised layout is 1.5 million euro cheaper than the manual solution.
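As a highly simplified illustration of the routing idea, the sketch below connects invented turbine positions to a substation with a minimum spanning tree, a proxy for 'shortest total cable'. The real problem also selects cable types, respects capacity limits and forbids crossings, none of which is modelled here.

```python
# Simplified proxy for inter-array cable routing: connect all turbines to the
# substation with minimum total cable length using Prim's algorithm. Cable type
# choice, capacities and the non-crossing rule are deliberately left out, and
# the coordinates are invented.
from math import hypot

substation = (0.0, 0.0)
turbines = [(600.0, 200.0), (1200.0, 250.0), (500.0, 900.0),
            (1100.0, 950.0), (1700.0, 600.0)]
nodes = [substation] + turbines

connected = {0}                        # start the tree at the substation
edges = []
while len(connected) < len(nodes):
    # Pick the cheapest edge joining the current tree to a node outside it.
    i, j = min(
        ((i, j) for i in connected for j in range(len(nodes)) if j not in connected),
        key=lambda ij: hypot(nodes[ij[0]][0] - nodes[ij[1]][0],
                             nodes[ij[0]][1] - nodes[ij[1]][1]),
    )
    connected.add(j)
    edges.append((i, j))

total_m = sum(hypot(nodes[i][0] - nodes[j][0], nodes[i][1] - nodes[j][1])
              for i, j in edges)
print(f"cable edges (node indices): {edges}")
print(f"total cable length: {total_m:.0f} m")
```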
VATTENFALL WIND FARM LAYOUT LIFTED TO THE NEXT LEVEL
The tools are now fully integrated into Vattenfall’s wind farm design process, allowing not only for very large gains, but also for a more agile overall design process. A new ‘scenario’ team has been established to use the O.R. tools for what-if analyses, where different layout options for future farms are quickly evaluated and more informed decisions made.
The results obtained for turbine and cable layout within this project were extremely successful compared to commercially available software and exceeded all expectations. ‘The usage of these tools developed by Martina Fischetti and her colleagues in the wind business can contribute more than 10 million euros in increased productivity and reduced costs over the lifetime of each wind farm, allowing us to be more competitive in the energy market. This is a good example of a smarter way to increase profitability of our business whilst also reducing our costs’, Vattenfall’s CFO, Anna Borg, comments. Vattenfall has several times tested and compared the tools with similar simulation tools and models available on the market.
‘Time and time again, it turns out that Martina’s models provide better turbine locations. The tools we can buy in the commercial market do not achieve results that are comparable to her models. In other words, we have something that is fast and better than any of the things we can buy’, Head of System Design in Vattenfall, Thomas Hjort, emphasises. The optimisation tools have been used for all recent Vattenfall tenders, including Denmark’s 600 MW Kriegers Flak offshore wind project (with a record low bid of 49.9 euros per MWh in 2016) and for Hollandse Kust Zuid, in the Netherlands in 2018 (first project in the world to be built without any subsidies). All these successes allowed reductions in the cost of energy for the whole wind sector, taking a huge step
FIGURE 2 DANISH HORNS REV 1: MANUAL LAYOUT (LEFT PLOT) VS OPTIMISED LAYOUT (RIGHT PLOT). IMAGE © VATTENFALL
forward in making offshore wind power more competitive against fossil-based energy sources.
Indirect impact: streamlined process and innovative thinking

Before these optimisation tools were developed, the definition of a single layout was a very time-consuming and experience-based process. Many of the practical limitations were not even considered in the commercial software and had to be calculated one by one in a multi-stage post-processing workflow. This was very much a pen and paper job, with a team of highly experienced engineers working for weeks to obtain suboptimal solutions. Such a process very much limited the number of simulations it was possible to perform on a wind farm project, and even experienced engineers would find it difficult to get all factors included. When, on the other hand, all the constraints have been coded into the layout optimiser, it is just a matter of running
the tools with the desired input data. It also means that it is possible to make a lot of what-if analyses for each individual wind farm project, for instance with different turbine types or different number of turbines, in order to find the best business case for each project.
‘If we have results in relation to a specific number of turbines, then we can quickly test what it will mean, if we set up an extra turbine, or if we fail to set one up. We now have a tool that can respond quickly and give us optimised results. We can test different solutions and see what output they provide’, says Thomas Hjort. This allowed Vattenfall to think out of the box and try many different
combinations for each wind farm, resulting in more competitive wind farms. ‘It was amazing for me to see how mathematics and technical knowledge could, together, make such an impact in our daily work’, says Jesper Runge Kristoffersen, team leader in the System Design Group. ‘Being able to quantify the impact of design choices already at the design phase is of key importance to make informed decisions, that will impact the final business case’. The optimisers also offered Vattenfall flexibility and capability to test the newest options on the market. This is very important in this fast-developing sector. We have to remember that the farms we are designing today will be erected in 3–10 years, so we have to be flexible and include in the tenders many of the elements that may not yet be fully developed in order to get the best price. Technology is indeed evolving at a fast pace: turbines constantly become bigger and more efficient and new types of foundations and cables are designed. With the new optimisers and the other models that have been developed around them, it
is now possible to think more innovatively, for instance putting a 15 MW turbine into the model instead of the present state-of-the-art 8 MW turbine and seeing how that will affect the costs and performance of the farm. Without a crystal ball available to see how the future will be, Vattenfall must try to make the best qualified guesses in cooperation with the wind turbine developers, companies constructing foundations and other suppliers. This of course requires close cooperation and proactive discussions with the manufacturers on research activities and future outlook, so that Vattenfall will be able to not only pick a turbine from the shelf of the turbine manufacturers, but also to discuss with them, and develop technology in the interest of both parties. Today, when Vattenfall discusses technologies and components with suppliers, a Vattenfall staff member in the meeting room will be running our optimisation tools live, to properly evaluate any proposal which can improve the business case. In this way, the cost/benefit can be quickly assessed, and new solutions can be proposed that could be cost-effective for the company. This open discussion with the suppliers has never been seen before in Vattenfall.

Martina Fischetti is Lead Engineer in the System Design group, Vattenfall BA Wind. She obtained a Ph.D. in Operational Research, entitled Mathematical Programming Models and Algorithms for Offshore Wind Farm Design, from the Technical University of Denmark in 2018. Her Ph.D. work on the optimisation of wind farm design and cable routing has been awarded various international prizes, such as the Erhvervsforskerprisen (Best Industrial Ph.D.) (2019), the AIRO Best Application Paper award (2018), the Best Student Paper Award at the ICORES conference (2017), and finalist positions at the EURO Excellence in Practice award (2018) and at the prestigious INFORMS Franz Edelman award.

FOR FURTHER READING

Fischetti, M. and M. Monaci (2016). Proximity search heuristics for wind farm optimal layout. Journal of Heuristics 22: 459–474.

Fischetti, M. and D. Pisinger (2018). Optimizing wind farm cable routing considering power losses. European Journal of Operational Research 270: 917–930.
FORCE MULTIPLIER
ANDREW SIMMS
BEING CHARGED WITH ANSWERING THE STRATEGIC QUESTIONS posed by protecting the UK's security, independence and interests at home and abroad does not mean those serving in Britain's Armed Forces are immune to the pressures placed on other professions. Budgetary restraints and widely-reported recruitment and retention problems, coupled with the need to meet emerging threats from states, including Russia, mean those in uniform face the inevitable challenge of doing more with less, despite the severity of the consequences of making a wrong call.
Fortunately, the Services have an analytical ally in NSC – a simulation and training specialist on a mission to ease Defence’s decision-making burden and give the military the intellectual edge on the physical and digital battlefields of tomorrow. The company was responsible for introducing computer-based wargaming to the former Army Staff College in the 1980s and it has been using its mathematics and modelling expertise ever since to ensure the Armed Forces remain greater than the sum of their parts. Figure 1, for instance, shows the digital equivalent of a traditional strategic map table.

FIGURE 1 MAP USED IN COMPUTER-BASED WARGAMING (© NSC)

Over the past three decades, NSC’s analytical acumen has been deployed across Defence’s arms and agencies and assisted in procurement processes, measuring the effectiveness of capabilities, experimentation and the training of tens of thousands of front-line personnel. The Defence Science and Technology Laboratory (Dstl) is among the beneficiaries of the Surrey-based firm, which boasts a permanent staff in excess of 40. NSC’s wide spectrum of support has ranged from qualitative visualisations to the development and evolution of campaign-level simulation models, with a focus on ensuring study teams are able to rely on the availability, utility and performance of these tools for years to come. The company has worked on three of the five mission critical models used by Dstl’s Defence and Security Analysis division – WISE, CAEn and HLCM – re-engineering code to operate in the latest computing environments and utilising multi-processor and multi-threaded approaches to increase performance. Combined with its improved user interfaces, NSC’s endeavours have delivered dramatic results, with some tasks that once took analysts ten minutes now taking just ten seconds.
In respect of CAEn [Close Action Environment], a multi-sided, computer-based wargame representing all-arms close combat battles up to company group level, simulation runtime has been enhanced by a factor of 4.9. Such a significant upgrade has enabled Dstl to deliver studies at a scale previously impossible and consequently broadened the scope and variety of studies CAEn can now undertake on behalf of Defence.
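The article does not describe how these models are implemented internally, so the sketch below is only a generic illustration of the principle behind such speed-ups: independent stochastic replications of a simulation parallelise cleanly across processor cores. The toy attrition model and the function names are invented for this example.

```python
import random
from multiprocessing import Pool


def run_replication(seed: int) -> float:
    """One independent replication of a toy stochastic attrition model.
    Stands in for a single run of a much larger combat simulation."""
    rng = random.Random(seed)
    blue, red = 100.0, 90.0
    while blue > 0 and red > 0:
        blue -= rng.uniform(0.1, 1.5)   # attrition inflicted on blue
        red -= rng.uniform(0.1, 2.0)    # attrition inflicted on red
    return max(blue, 0.0)               # surviving blue strength


if __name__ == "__main__":
    seeds = range(1000)
    # Replications are independent, so they can be farmed out across all cores.
    with Pool() as pool:
        results = pool.map(run_replication, seeds)
    print(f"Mean surviving blue strength: {sum(results) / len(results):.1f}")
```

Because each replication has its own seed and shares no state, the batch divides across however many cores are available, which is the kind of re-engineering that can turn a ten-minute run set into seconds.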
MISSION PLANNER
NSC’s detail for digits has also delivered Mission Planner – a decision-making, machine-learning toolset which uses Artificial Intelligence (AI) and is currently applied at the tactical level of combat. Potentially easing the demand on Dstl’s operational research community, it could act as a force multiplier, supporting high-intensity warfighting simulations by reducing – or in some cases eliminating – the need for complex pre-scripting of simulated combat units or human-in-the-loop interactors.
Mission Planner exploits two stochastic optimisation AI techniques – genetic programming and a novel implementation of simulated annealing – and its algorithms are employed in a generic architecture that allows simple application to different problems. The toolkit represents a significant upgrade to wargaming, an approach which was in danger of becoming prohibitively labour intensive as it evolved to reflect the fast-paced, flexible battle rhythm required for modern warfighting. Rather than requiring analysts to generate the large number of order sets associated with manoeuvring units in a simulated battlespace, Mission Planner’s AI solution can quickly translate a commander’s high-level intent into low-level military objectives. Two decades ago such a simulation would have required in excess of 15 people inputting orders. Today only one is needed. Although already key in the present, this capacity generator is set to become even more valuable in the future as the UK’s Armed Forces adapt to the challenges of hybrid warfare. The need to emulate a blend of conventional, political, cyber and information warfare strategies creates huge data demands. Collaboration has always been a common theme of conflict, and any AI applied in the military domain, such as simulated units, actors or autonomous systems, must realistically collaborate to achieve a goal. Mission Planner achieves this effect through the use of military syntax in its algorithms that produce plans for tactical problems which resemble human decision making. While cutting-edge, NSC’s proficiency in employing AI in simulation is not a new development. The Surrey-based firm demonstrated its pedigree in the field in 2000 when it used AI to automate the air planning and air tasking roles of an operational
level simulation of a joint – land, air and maritime – battlespace. Rather than the previous four people required to enter data for each country represented in the computer-assisted exercise, the innovation allowed for a single user per alliance to enter a joint prioritised target list and have the AI – using simulated annealing – determine the most appropriate action from the air assets available.
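Mission Planner’s actual algorithms and military syntax are not published in the article, so the following is only a rough sketch of the simulated-annealing idea applied to a toy version of the air-tasking problem just described; the asset names, target priorities and scoring function are all invented for illustration.

```python
import math
import random

# Hypothetical prioritised target list (higher weight = higher priority) and
# hypothetical effectiveness of each air asset against each target.
TARGETS = {"bridge": 5.0, "radar": 3.0, "depot": 2.0, "convoy": 1.0}
ASSETS = ["jet_1", "jet_2", "uav_1"]
EFFECT = {(a, t): random.uniform(0.3, 1.0) for a in ASSETS for t in TARGETS}


def score(assignment: dict) -> float:
    """Total priority-weighted effectiveness of an asset-to-target plan.
    In this toy version more than one asset may strike the same target."""
    return sum(TARGETS[t] * EFFECT[(a, t)] for a, t in assignment.items())


def neighbour(assignment: dict) -> dict:
    """Perturb the plan: retask one randomly chosen asset."""
    new = dict(assignment)
    new[random.choice(ASSETS)] = random.choice(list(TARGETS))
    return new


def simulated_annealing(iters: int = 5000, temp: float = 1.0, cooling: float = 0.999) -> dict:
    current = {a: random.choice(list(TARGETS)) for a in ASSETS}
    best = current
    for _ in range(iters):
        candidate = neighbour(current)
        delta = score(candidate) - score(current)
        # Always accept improvements; accept worse plans with a probability
        # that shrinks as the temperature cools.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
            if score(current) > score(best):
                best = current
        temp *= cooling
    return best


if __name__ == "__main__":
    plan = simulated_annealing()
    print("Tasking:", plan, "score:", round(score(plan), 2))
```

The design point is the acceptance rule: early on, a high temperature lets the search accept worse plans and escape poor early choices; as it cools, the search settles on a strong allocation against the prioritised list.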
HIGH-LEVEL SUPPORT
Given the depth of its experience in mathematical modelling, NSC is unsurprisingly quick to champion the substantial merits of operational research to as wide an Armed Forces audience as possible and is responsible for refining future commanders’ understanding of the role it can play in informing military decision making. The company has provided experienced analysts to the UK’s Joint Services Command and Staff College in Shrivenham for more than three years, exposing officers to the scientific and mathematical methods and skilled people available to support headquarters staff on operations. Demonstrating how analytical research can calculate likely outcomes of military actions, NSC’s teams supply answers to questions posed by students during the exercise phases of the college’s Higher Command Staff, Advanced Command and Intermediate Command and Staff courses. And NSC’s promotion of operational research is not confined to Britain’s shores. One of the company’s operational analysts spent two extended spells in The Hague helping NATO to identify the resources
needed by the multinational organisation to meet its operational ambitions. He formed part of the NATO Communication and Information Agency team tasked with quantifying the military assets and efforts required from each of the Alliance’s member nations. Working closely with defence planners, the NSC secondee assisted in consolidating and mining data from 28 contributing countries to define future minimum capability requirements and identify any potential shortfalls. Another member of the NSC team deployed to La Spezia in Italy to support NATO’s Centre for Maritime Research and Experimentation in the development of a strategy for its Multinational Shared Training concept. The role involved reviewing proposals for the structure of a combined action and implementation plan and examining the performance parameters of NATO stakeholders.
EXPERT ANALYSIS
Of course, the military’s embrace of operational research to deliver success in theatres of conflict is nothing new. It has been part of British military planning since the Second World War, when the use of mathematical models included showing that sailing supply ships in convoys reduced vulnerability to submarine attacks and that harmonising the machine guns on fighter aircraft increased lethality in a dogfight. In modern theatres it can be used to support a wide breadth of decision-making, from calculating the probability of success in land, maritime, air and counter-insurgency operations to answering logistical questions on movement and consumption rates. However, despite the demand for and prevalence of operational research in a military context, NSC is one of only a handful of companies to specialise in the field. Steve Yates, Account Director for Research and Cyber at NSC, says the firm’s emergence as a prime provider to the Armed Forces is
down to more than just its numerical nous and coding capabilities. ‘In addition to our understanding of languages such as Java, Python and C++, we speak our customers’ language’, he explained. ‘As a company we’ve worked with the military for more than a quarter of a century, been embedded in its training establishments, helped it to refine and develop enduring models, and we employ a large number of former Service personnel. Consequently, we don’t just crunch numbers, we understand the questions being asked, the problems posed and the context of any answers; we can translate computer language into military speak. There is no “analyst/soldier” divide and that is one of our greatest strengths’. This close working relationship can be seen in the lead image.
VIRTUAL TRAINING
NSC’s comfort at the cutting-edge of technology is demonstrated by the British Army’s Unit Based Virtual Training (UBVT) system (see Figure 2). Designed and delivered as a managed service by the firm, UBVT immerses soldiers in high-fidelity terrains featuring authentic equipment and enables units to practise fire and manoeuvre, command and control, and tactics, techniques and procedures in preparation for live training opportunities and operational deployments. The innovative system, which is available to all Regular and Reserve units and will provide virtual training to troops until at least 2020, has already been used by more than 2000 Servicemen and women at locations across the UK, Germany and Cyprus.

FIGURE 2 THE BRITISH ARMY’S UNIT BASED VIRTUAL TRAINING (UBVT) SYSTEM (© NSC)

In an era of fiscal difficulty, simulation is essential to the Armed Forces. Rolling out brigades and battlegroups onto a real-world training area is an incredibly expensive and time-consuming business, whereas the cost of a unit using UBVT for a week is equivalent to the fuel used by a single tank on exercise. Being able to benefit the British Forces’ bottom line clearly puts providers of synthetic training in the box seat, and the continual advancement of augmented and virtual reality platforms hints at an even brighter future for the simulation and modelling sector.

LEARNING LESSONS

However, despite spearheading the adoption of such technologies, NSC also has one eye firmly trained on the past. True to the company’s operational research roots, the consultancy provides the Centre for Historical Analysis and Conflict Research (CHACR) with its director, research staff and technical and project management experts. An independent military think-tank, CHACR is tasked with ensuring the British Army is not out-thought on operations through the considered analysis of the lessons of the past.

NSC’S MODUS OPERANDI

NSC chief executive Jeremy Spurr commented: ‘Throughout our own history we’ve had to adapt to change and stay ahead of the trend. While faster, more powerful computers have enabled us to deliver increasingly immersive simulations and more complex models, the use of hi-tech hardware has not become NSC’s default solution. Although we were among the first to see the value of using a computer to support military training, we have always been acutely aware that technology must not become the driver. One would always like bigger, more complex models, but those are not always appropriate. In our field, effective operational research is not about producing data for data’s sake, it is about finding the best means to provide evidence that can be relied upon, understood and used to inform decisions’.
Andrew Simms is the managing director of TylerBale Communications. An experienced journalist and editor, he has worked with and alongside the Armed Forces for nearly two decades and reported from an array of operational theatres, including Iraq, Afghanistan, Kosovo and Macedonia.
GOING VIRAL BRIAN CLEGG
IF, LIKE THE AUTHOR, YOU HAVE EVER SUFFERED FROM NOROVIRUS, also known as the winter vomiting bug, you will know how debilitating it is. The symptoms of explosive vomiting, diarrhoea, headaches and fever cause misery, particularly when the virus breaks out in a confined location such as a cruise ship or hospital. It’s estimated that each year about 3 million infections occur in the UK; worldwide there are between 500 and 1,000 million cases, resulting in over 200,000 deaths, particularly among the weak and vulnerable.
In crude financial terms, there are annual global costs of around $60 billion (£80 million in the UK alone). Getting a better understanding of the different routes by which the disease spreads offers a real hope of improved prevention.
FOODBORNE OR PERSON TO PERSON
My own infection was blamed on a hotel breakfast, but norovirus is particularly good at finding ways to contaminate people and there
are plenty of options available. In practice, norovirus is more commonly spread directly from person to person, but there has been a relatively poor understanding of foodborne routes, which is why the UK’s Food Standards Agency (FSA) commissioned an Operational Research study to model the transmission methods in the hope of better targeting of interventions.
David Lane, an Operational Research expert from Henley Business School, worked alongside colleague Elke Husemann, and Darren Holland and Abdul Khaled from the FSA, to model the transmission mechanisms. Lane is Professor of Business Informatics at Henley Business School, with mathematics degrees from Bristol and Oxford and a doctorate in mathematical modelling from Oxford, after which he moved on to oil company Shell. The work there involved Operational Research techniques, and Lane enjoyed the challenges, but after a few years wanted an opportunity to have time to develop new approaches. The result was a move back to academia, but with a practical edge gained from his experience. Lane notes: ‘I haven’t really looked back – though perhaps the main development has been my attitude to teaching. Pretty quickly I found teaching very fulfilling and seeing it as a privilege to be able to work with clever young people, talking about things that enthuse me and trying to enthuse
them, all the time watching them get better and make the best of what is inside them. It’s a joy.’ The theoretical/practical mix continues, as is clear in this project. Norovirus itself is a tiny bug from the family Caliciviridae, around 30 nm across. Like all viruses, norovirus is not a complete living organism, but makes use of the molecular machinery within its host’s cells to reproduce, damaging the cells in the process. Norovirus is not treatable using antibiotics and spreads from person to person when as few as ten individual virus particles make their way into a new host, usually via vomit or faeces. Direct transmission can take place through physical contact with a victim, touching a contaminated surface or when the virus particles become airborne. However, norovirus is hardy and can survive for a considerable period of time away from a host. As a result, it can also be spread via secondary routes through food and water. Some involve contact of a human carrier with the foodstuff, but mechanisms can also be surprisingly indirect. For example, norovirus particles in sewage can be ingested by shellfish which, if eaten uncooked, pass the virus on to a new host.
SYSTEM DYNAMICS
The approach taken by the researchers was System Dynamics (SD). This technique, devised at the Massachusetts Institute of Technology by Jay Forrester in the 1950s, uses engineering-style thinking, in a highly creative manner, to simulate business and socioeconomic systems. David Lane: ‘SD is a computer simulation modelling approach which helps users understand the complex behaviours that systems produce over time, with a particular emphasis on the consequences of different policies.’
The team used a common structuring of potential sufferers known as SEIR: susceptible, exposed, infectious and recovered. Their model pulls together the different pathways by which the virus can be transmitted and, where possible, allocates numerical values to compare the risks from different infection routes. The FSA was initially doubtful that the data existed to make the model fully viable, but it was hoped that it would give a better understanding of the opportunities for mitigation, as well as indicating where further research was required. The direct P2P (person to person) aspect of norovirus transmission had already been mapped out by a previous research project commissioned by the FSA in 2004, so it was possible to incorporate an updated version of this structure into the new model, constructed using the software Vensim. This product makes use of the standard SD concepts of stock (for example, levels of money or commodities) and flow (transfer of stock contents from place to place).
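The report’s Vensim equations are not reproduced in the article, but the textbook SEIR backbone the team built on can be written as a small set of stock-and-flow equations; the symbols below are the conventional ones, and this is the simplest form, without the asymptomatic weighting or the foodborne inputs discussed next.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N},\\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E,\\
\frac{dI}{dt} &= \sigma E - \gamma I,\\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
```

Here N is the total population, β the effective contact (exposure) rate, 1/σ the average time from exposure to becoming infectious, and 1/γ the average infectious period. Each equation describes a stock whose level changes only through its inflows and outflows, which is exactly the structure Vensim draws as boxes and pipes.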
An example of the stock and flow aspects of the P2P transmission is shown in Figure 1, where we see the influences on and connections between individuals in the different states. A number of parameters feed into this, including ‘Beta’, which is the number of social encounters a susceptible person engages in which could result in exposure should they encounter an infectious person. There are also tweaking mechanisms such as the ‘Weighting for Infectious Asymptomatics’. This reflects the number of those infected with norovirus but not showing the symptoms, who will shed virus particles and potentially infect others. In Figure 1, a term ‘Theta’ feeds into the exposure rate: this was a catch-all in the earlier study to deal with situations when the virus is delivered via food. The aim of the new study was to replace Theta with a detailed model of foodborne routes that the virus could take from infected person to a new host.

FIGURE 1 STOCK/FLOW DIAGRAM ILLUSTRATING THE PERSON-TO-PERSON SECTOR OF THE MODEL

To build the model, the researchers started with a literature review and created simple models, then in an iterative process with experts from the FSA and other health bodies enhanced the model for each possible sector of foodborne transmission. After interviews, discussions and sharing of the developing models with the experts, a final facilitated workshop was used to discuss functionality and provide feedback. Part of the importance of this process was buy-in from the clients. Rather than just being presented with results, the participants were involved throughout to ensure that they were happy with the models, could provide corrections and were able to feel that they supported the outcomes.

In putting the model together, four different foodborne sectors were identified:

• the contamination of shellfish by virus-carrying sewage discharged into the ocean;
• contamination of fruits, salads and leafy vegetables from ‘sludge’ – processed sewage used as fertiliser;
• contamination in the supply chain of this kind of produce eaten uncooked (e.g., from workers on the farm or packers);
• transport by other foodstuffs contaminated during their production, for example food handled by infected kitchen workers.

In both final sectors, infections in the home and commercial kitchens were separated. Each sector had its own stock and flow diagrams – Figure 2 shows the bivalve shellfish sector.

FIGURE 2 STOCK/FLOW DIAGRAM OF THE ‘BIVALVE SHELLFISH’ SECTOR OF THE MODEL
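To make the stock-and-flow idea concrete, here is a deliberately crude sketch of such a model in code. It is not the FSA/Henley Vensim model: the parameter values, the weighting w for not-yet-symptomatic shedders and the way the foodborne term ‘theta’ is bolted on are all invented for this example, and its output will not reproduce the study’s published figures. It simply illustrates how, once Theta is an explicit input rather than a catch-all, a policy experiment becomes a matter of changing one number and re-running.

```python
def run_model(beta=0.25, theta=2e-5, sigma=0.5, gamma=0.33, w=0.1,
              population=1_000_000, days=365):
    """Toy daily-step SEIR stock-and-flow loop with a person-to-person
    exposure rate (beta) and a separate foodborne exposure term (theta).
    All parameter values are invented for illustration."""
    S, E, I, R = population - 10.0, 0.0, 10.0, 0.0
    total_cases = 0.0
    for _ in range(days):
        # Not-yet-symptomatic (exposed) people also shed, at reduced weight w.
        pressure = (I + w * E) / population
        exposures = beta * S * pressure + theta * S   # P2P flow + foodborne flow
        onsets = sigma * E                            # exposed -> infectious
        recoveries = gamma * I                        # infectious -> recovered
        S -= exposures
        E += exposures - onsets
        I += onsets - recoveries
        R += recoveries
        total_cases += onsets
    return total_cases


if __name__ == "__main__":
    base = run_model()
    food_cut = run_model(theta=2e-5 * 0.8)  # foodborne exposure reduced by 20%
    p2p_cut = run_model(beta=0.25 * 0.8)    # person-to-person exposure reduced by 20%
    print(f"Cases, baseline:      {base:,.0f}")
    print(f"Foodborne cut by 20%: {100 * (1 - food_cut / base):.0f}% fewer cases")
    print(f"P2P cut by 20%:       {100 * (1 - p2p_cut / base):.0f}% fewer cases")
```

Comparing the two runs at the end mirrors, in toy form, the kind of sensitivity experiment described later in the article.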
IMPROVING THE ESTIMATES
One of the useful outcomes of the process was being able to divide the parameters in the model – the values input to make it run – into the four categories shown in Table 1. One dimension here is whether a parameter is fixed – that is, not capable of being altered as a preventative measure, such as the population – or whether it can be changed by taking precautions. The second dimension was whether the value was currently known or unknown, highlighting opportunities for further research. Darren Holland from the FSA commented that the study ‘produced a whole list of knowledge gaps which would need to be filled to run the model properly and understand the issue further.’

TABLE 1 A 2 × 2 ORGANISING FRAMEWORK FOR THE PARAMETERS IN THE MODEL

                                  FIXED            ALTERABLE
KNOWN VALUE                       Fixed, known     Alterable, known
UNKNOWN BUT RESEARCHABLE VALUE    Fixed, unknown   Alterable, unknown

A second important result was getting a better feel for the relative scale of P2P and foodborne routes of infection. This was one of three key parameters where new estimates were developed. These critical values had been estimated previously, but more recent data provided startlingly different numbers. For example, the proportion of those capable of spreading the virus but so far not showing symptoms had been estimated in 2004 at 0.003. But a value of 0.12 from a 2010 study was agreed to be better. Clearly, a value 40 times bigger made a significant difference to the outcome.

The other two changed parameters again illustrate the difficulties arising in collecting accurate data. The simple matter of the incidence rate for norovirus is hard to pin down as the disease often isn’t reported. A 1999 estimate for the annual incidence rate in the UK was 3.7 million, but the new model used a better estimate, based on a large-scale study from 2012, putting the annual incidence rate at 2,905,278. Note that this strangely accurate-sounding number is the result of taking a midpoint in a distribution. The real figures were ‘between 2,418,208 and 3,490,451’ – and this was only to a 95% confidence level, meaning that in 1 case out of 20 the value would fall outside this range.

The final parameter was the proportion of P2P and foodborne cases itself – vital to understand in order to decide the amount of effort put into different prevention measures. Earlier estimates put the fraction of foodborne cases at 0.107, but the most recent calculated value (which also involved clarifying exactly what concept was being measured) was 0.02527 – making the foodborne route significantly less common.

Given these values, it was possible to use the model to see how sensitive the spread of the disease was to changes in parameters. This is where the most fascinating outcomes arose. Although the estimated fraction of foodborne cases is very low, at around 2.5%, a relatively small change in that number can have a disproportionate impact on the overall incidence of the disease, as seen in Figure 3.

FIGURE 3 NOROVIRUS OBSERVED INCIDENCE RATE (MILLIONS OF CASES/YEAR)
The sensitivity analysis also provided a third important discovery: a better understanding of the seasonality implied in the name ‘winter vomiting bug’. Infections definitely peak in the winter months, but researchers thought it unlikely that this was simply due to people spending more time together indoors in closer proximity during the winter. However, the sensitivity analysis showed that the expected level of increase could indeed be put down to that small change in behaviour.
CONTROLLING THE SPREAD
Improvement in understanding is a valuable scientific outcome in its own right, but the FSA undertook this research to help understand how to reduce the spread of norovirus and where interventions might be most beneficial. The model suggests that reducing foodborne transmission by 20% would result in a 9% reduction in all cases – a valuable outcome. This is due, in part, to the reduction of secondary spread of the disease. For example, someone who becomes ill from a food source may then infect others they come into contact with – if they don’t get ill, they can’t spread it. But reducing P2P transmission has a much greater impact. A 10% reduction would bring incidence down by 75% and a 20% reduction would bring norovirus infection down by almost 90%. This led to the recommendation that, while foodborne intervention is important (and often easier to undertake), P2P interventions should not be overlooked, as they could easily be a highly effective way of making a difference. There is still further data to be refined to ensure that the model is working at
its best, and a continuing FSA study is looking at clarifying parameters that currently have unknown values, but the sensitivity to P2P transmission emphasises the importance of improving P2P hygiene – the report’s authors give the example of the installation of more hands-free taps. Lane: ‘the FSA should consider teaming up with others because P2P is no less important than [foodborne] and hence the FSA/NHS Choices initiative’ – this is a forecasting model using surveillance data from surgeries and hospitals and social media comments to forecast potential peaks in cases, so that food hygiene messaging can be timed to have maximum impact. Self-contamination of food is also a consideration. A 2018 report
discovered that bacteria often found in faecal matter were present on the touchscreens of several McDonalds restaurants in the London area. Though this report did not discover norovirus, it is representative of a larger concern that we rarely wash our hands before eating hand-held food outside the home. Making consumers more aware, and making it easier to wash hands before eating, could make inroads into transmission of the disease.
For the moment, as Darren Holland from the FSA explains, the norovirus work has done more than was expected: ‘Our original idea was for a small scale, proof of concept work, while what the team produced was a much more thorough model and report which was a big bonus.’ We won’t eliminate norovirus – but this research could help us better understand how to reduce the impact of a highly unpleasant infection.
FOR FURTHER READING Lane D., E. Husemann, D. Holland and A. Khaled (2019). Understanding foodborne transmission mechanisms for norovirus: A study for the UK’s Food Standards Agency. European Journal of Operational Research 275: 721–736.
Brian Clegg is a science journalist and author who runs the www.popularscience.co.uk and his own www.brianclegg.net websites. After graduating with a Lancaster University MA in Operational Research in 1977, Brian joined the O.R. Department at British Airways. He left BA in 1994 to set up a creativity training business. He is now primarily a science writer: his latest title is Professor Maxwell’s Duplicitous Demon, a scientific biography of James Clerk Maxwell.
INTER-OCULAR ANALYTICS GEOFF ROYSTON

It is said that one picture is worth a thousand words. For the analyst this might be reformulated as one diagram being worth a hundred numbers. The need to present complex quantitative information in a manner that is both accessible and efficient has led to the rise of “infographics,” “data visualization,” and “visual analytics.” This column is prompted and informed by the just-published The Art of Statistics by David Spiegelhalter, Chair of the Winton Centre for Risk and Evidence Communication at the University of Cambridge (and a memorable past OR Society Blackett lecturer) and also by the historic works of “the world’s leading analyst of graphic information,” Edward Tufte, The Visual Display of Quantitative Information and Visual Explanations. In The Art of Statistics Spiegelhalter provides an expert, entertaining and largely non-technical guided tour of key statistical ideas and issues, illuminated throughout by real-world problems ranging from the commonplace, such as the benefit of taking a daily statin pill, to the more exotic, such as the probability that the skeleton found in a Leicester car park really was that of Richard III. A key message in Spiegelhalter’s book is that quantitative analysis is just one component
of a problem-solving cycle which also includes understanding the problem and communicating conclusions. In keeping perhaps with the title of his book, I would like to focus here on the use of data-based diagrams and charts to impart understanding, suggest or test ideas, and inform decisions and actions. Such use is relatively modern, dating back to about 1750. An early example (included in Tufte’s first book) is the visually striking representation of quantitative information in the classic chart, Figure 1, by the engineer Charles Joseph Minard, of Napoleon’s fateful campaign of 1812–13 in Russia. It vividly portrays the destruction of his army, dogged by disease during its advance and by a bitterly cold winter in its retreat, through mapping, across both space and time, the numbers of remaining soldiers as a shaded ribbon of continually shrinking width.

FIGURE 1 MINARD’S CHART OF NAPOLEON’S RUSSIAN CAMPAIGN
GRAPHICS FOR UNDERSTANDING PROBABILITY AND RISK
In an earlier column (“Wrong Numb3rs,” Impact Autumn 2015) I discussed (mis)perceptions about health risks. Many people are confused by traditional expressions of probability and by the difference between absolute and relative risk – including many clinicians, who consequently may advise patients incorrectly – but understand much better when probabilities and risks are expressed in simple visual formats. Spiegelhalter gives an example of this in discussing the health risks of eating a bacon sandwich every day, with an estimated increase in lifetime risk of bowel cancer of 18% (the relative risk) compared to bacon sandwich non-eaters, who have a lifetime bowel cancer risk of 6% (the absolute risk). To illustrate this he uses a simple matrix or array diagram (redrawn here in a basic dot form; see Figure 2) – the absolute risk is shown by shading six out of a hundred dots, and the increase of 18% in relative risk – unfortunately the way in which risk differences are often described in the media, perhaps because it sounds a lot more dramatic than the 1% increase in absolute risk to which it here equates (6% × 1.18 ≈ 7%) – would then be shown by shading just one additional dot. And that is eating 50 g of bacon every single day of every year; maybe enjoying the odd bacon sandwich is a tolerable risk after all!

FIGURE 2 PROBABILITY DOTS
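The arithmetic behind the dot diagram is easy to check for yourself. In the short sketch below, only the 6% baseline risk and the 18% relative increase come from the article; the crude text “dot array” simply mimics the idea of Figure 2.

```python
baseline = 0.06            # lifetime bowel cancer risk for non-eaters (absolute risk)
relative_increase = 0.18   # reported 18% increase for daily bacon eaters (relative risk)

eaters = baseline * (1 + relative_increase)
print(f"Daily bacon eaters: {eaters:.1%} vs non-eaters: {baseline:.1%}")
print(f"Extra absolute risk: {eaters - baseline:.1%}")   # roughly one extra person per 100

# A crude text version of the 100-dot array: 'X' = baseline cases, '+' = the extra case.
dots = ["X"] * round(baseline * 100) + ["+"] * round((eaters - baseline) * 100)
dots += ["."] * (100 - len(dots))
print("\n".join("".join(dots[i:i + 10]) for i in range(0, 100, 10)))
```

Seeing one extra shaded dot among a hundred makes the gap between an “18% increase in risk” and a roughly one-in-a-hundred change hard to miss.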
GRAPHICS FOR SUGGESTING AND TESTING IDEAS

Several simple and revealing graphics discussed in The Art of Statistics, which could have raised life-saving questions (but which, unfortunately, were constructed only after the events), concerned the infamous Dr Harold Shipman, who is thought to have murdered over 200 of his patients. One such chart (see Figure 3, from the Department of Health’s clinical audit report by Professor Richard Baker) shows how the time of death of Shipman’s patients differed markedly from those of patients of other local family doctors. Overwhelmingly, his patients tended to die in the early afternoon. Spiegelhalter points out that the discrepancy does not require subtle analysis, indeed that it can be described as “inter-ocular,” “since it hits you between the eyes.”

FIGURE 3 SHIPMAN: DEATH IN THE AFTERNOON. FROM PROFESSOR RICHARD BAKER, “HAROLD SHIPMAN’S CLINICAL PRACTICE, 1974–1998.” LONDON: HMSO, 2001. CONTAINS PUBLIC SECTOR INFORMATION LICENSED UNDER THE OPEN GOVERNMENT LICENCE V3.0

Another “inter-ocular” graphic can be found in one of Tufte’s historical – indeed historic – examples: the physician John Snow’s map (Figure 4), on which he plotted the distribution of cases of cholera in the London epidemic of 1854. This revealed a concentration of cases in households near to a particular water pump (in Broad Street) but, tellingly, a comparative lack of cases in a nearby workhouse and brewery, which had their own separate water supply, and a few isolated cases further afield, which investigation showed to have been of people from outside the area who nevertheless obtained their water from Broad Street. This supported Snow’s theory that cholera is transmitted not by air, as was commonly believed, but by contaminated water.

FIGURE 4 SNOW’S LONDON CHOLERA MAP
GRAPHICS FOR SUPPORTING DECISION MAKING
Although the most important outcome of Snow’s map concerned epidemiological thinking, it also had an immediate practical result (the famed removal of the handle of the Broad Street pump). So did Florence Nightingale’s influential 1858 chart (Figure 5, oddly not included by Tufte) of soldier mortality in field hospitals in the Crimean War. This indicates the number of deaths that occurred from preventable diseases (in, now faded, blue), those that were the results of wounds (in, now faded, red), and those due to other causes (in black). The graphic demonstrates that many more soldiers died from disease than from wounds, and also shows the decline in these preventable deaths that followed the introduction of Nightingale’s sanitary reforms in the field hospitals. Her chart and Snow’s map were not only clear descriptions, but also led to practical action – they were some of the first visual decision support tools.

FIGURE 5 NIGHTINGALE’S DIAGRAM OF MORTALITY IN THE ARMY IN CRIMEA

A disastrous decision that might well have been avoided by use of simple but revealing graphics was the go-ahead to launch the Challenger space shuttle on January 28th, 1986. Tufte describes how misleading selection and presentation to senior managers of data on booster rocket damage contributed to a decision to launch in freezing cold weather, with tragic results. The immediate cause of the disaster was found to be failure of O-ring seals in joints in the booster rocket casing. The rubber seals had lost resilience in the very low ambient temperature, with consequent catastrophic leakage of burning gases. The engineers, aware of this potential problem, had pointed out that no launch had ever been made below 53 °F and advised not launching at lower temperatures. Why had their advice not been followed? The engineers were thinking analytically (and correctly), but they did not display their thinking in a sufficiently compelling way. In deliberating on whether or not to launch in such cold weather, historical data was presented on launches only where the most worrying damage to the rubber O-rings had occurred, for which there were just two cases – one was on the coldest day of a launch (53 °F) but the other was on a hot day (75 °F). With only two seemingly inconclusive cases to go on, the engineers were overruled. What was missing was a presentation of data on seal damage and launch temperature, not only of the two launches that had shown the most worrying seal damage, but also of the other 22 launches, most of which had shown little or no damage to the booster rocket seals. Tufte remarks that omitting this latter data was “as if John Snow had ignored some areas with cholera and all the cholera-free areas.” Figure 6, recreated from data in Visual Explanations, shows that every launch below 65 °F showed some damage to the vital seals, compared to about 15% of those above that temperature. If such a chart had been shown to NASA, would they have risked launching the shuttle in such cold weather?

FIGURE 6 SPACE SHUTTLE O-RING DAMAGE AND LAUNCH TEMPERATURE

WHAT MAKES FOR A GOOD GRAPHIC?

Tufte states “At their best, graphics are instruments for reasoning about quantitative information…. Of all methods for analysing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful.” Which begs the question, what makes for a well-designed data graphic? Spiegelhalter quotes from another expert in the field of data visualisation, Alberto Cairo (holder of the Chair in Visual Journalism at the School of Communication of the University of Miami and author of the acclaimed book, The Truthful Art: Data, Charts, and Maps for Communication). Cairo’s principles for a good data graphic are:

• it contains reliable information
• the design has been chosen so that relevant patterns become noticeable
• it is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth
• when appropriate, it is organised in a way that enables some exploration

The last feature can be aided by interactive graphics. Obviously these cannot be shown in print, but the TED talks by the late, great Hans Rosling provide some wonderful examples. If you have not seen one, do take a look (https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen), and you will be in for an inter-ocular analytical treat.
Dr Geoff Royston is a former president of the OR Society and a former chair of the UK Government Operational Research Service. He was head of strategic analysis and operational research in the Department of Health for England, where for almost two decades he was the professional lead for a large group of health analysts.