Our Lessons Learnt as a Data-Science Team
By the Allan Gray Data Science Team
It is widely quoted that over 80% of data science projects never make it past the experimental phase into production. Nevertheless, it is also reported that executives continue to place value on these projects despite their low implementation rate. An obvious question is why such a disparity exists between implied value and tangible results.
Our journey as a data science team has been no exception. Looking back over the past three years reveals a trail of valuable lessons on how to unlock true value as a data science team within an established corporate. Although we recognise that every data science team follows a different journey, we believe there are some universal truths that help ensure optimal value is delivered.
To start off, it is important to provide some context for why and how the data science team at Allan Gray was formed. Allan Gray is an investment management company that aims to create long-term wealth for our clients, while placing immense value on client service. The data science team was formed to innovate in the retail space within the business. The team formally sits in the retail IT department, with a cross-functional capability into the rest of the business in areas such as operations, distribution, product development, risk, and client experience. Our team focuses on identifying areas in the business where a data science capability can, for example, optimise a process or provide actionable insights. Our projects are therefore a mix of research and development. It took us a while to get to a state of congruence. Some of the valuable lessons we learnt along the way are summarised below.
Establish a sense of identity
As a new team in the business, it was important to take a step back and reflect on where we could add the most value. Being performance driven, we initially put a lot of pressure on ourselves to deliver something tangible that could be used by the business. We started off almost entirely delivery focused, without assessing what distinguishes us from the other IT teams. Taking a step back made us realise that there is a lot of value in research-driven tasks. We started setting up regular chats with various business units to understand what they were struggling with and where we could help. Soon word was out that there was a nimble team with a cross-functional capability.
While establishing a sense of identity is important, we found that setting the right expectations is crucial. Machine learning is not required for every problem; some problems require only rudimentary analytics to yield the desired output, while others require a combination of both. We soon realised that we needed a team project policy document to better understand what stakeholders required. This document helps determine whether a project is really a data science problem or more of a business intelligence problem, a robotic process automation task or a systems integration task. It is also important to make it clear that there is often a trade-off in accuracy depending on the type of approach you implement. The terms machine learning and data science are often used quite loosely in a business context, with the misconception that employing them may magically solve complex problems. Be open and honest with regard to what data science is and is not.
Ensure there is buy-in from executives
Starting a data science team within an established corporate is likely to be an executive decision. The reasons behind building the data science capability may differ from company to company, and could include strategic intent, experimentation, and innovation. Some companies provide a data science service or product as the core of their business offering. Our team is different in the sense that the business can operate without us. What we provide is supplementary to the rest of the business. This gives us more freedom to experiment and to innovate where possible. Our team has regular check-ins with a steering committee where we can share the latest developments in our field and what we have been working on. These sessions allow the executives from the various departments to share ideas and help us, as a data science team, see the bigger picture. The output of these meetings is a short- and long-term roadmap, which ensures there is a shared vision for the projects we will work on.
Educate and share ideas
Every week our department hosts Tech Thursday. This is a great platform to share ideas with the wider department and potentially interested stakeholders. The format is quite informal: an hour-long slot that anyone from Allan Gray can attend, with any tech-related topic welcome. As a team we regularly give updates at this event, with the aim of introducing general data science topics and giving an overview of what we have been working on over the past couple of months. We generally get quite a bit of interest from other departments because they attended these events. While we present work we have done, potential stakeholders often have an aha moment and join the dots on how our teams could collaborate. Informal chats are just as valuable as presentations or formal meetings. A lot of our best ideas came from spontaneous chats in the canteen or over a coffee. These informal chats do not rely on a formal agenda or a business plan. This arguably allows for more creative thinking without the constraints of meeting a business requirement. As a team, we also learn a lot about the business from these chats while being able to share ideas efficiently.
Fail quickly
Often a lot of emphasis is placed on quick wins; this is important to demonstrate the initial capability of a team, especially when it is a new domain. However, the timelines of various projects may differ drastically. It is therefore important to agree on a definition of done as early as possible. Data science projects frequently go through many iterations, with no guarantee of success. We use a proof-of-concept phase to test whether it is feasible to take a project any further. This enables us to fail quickly and not waste valuable resources trying to force a solution. This is also coupled with setting the right expectations upfront; there is no guarantee that a data science approach will yield the desired outcome.
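One way to make a definition of done concrete is to write it down as explicit thresholds before the proof of concept starts. The sketch below is purely illustrative and is not a description of our internal tooling: the class name, the fields and the three outcomes are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DefinitionOfDone:
    """Hypothetical success criteria agreed with stakeholders up front."""
    metric_name: str     # e.g. "accuracy"; any metric where higher is better
    baseline: float      # score of the current process or a naive approach
    min_uplift: float    # smallest improvement over the baseline worth pursuing
    max_iterations: int  # number of experiment iterations we are willing to spend


def review_proof_of_concept(done: DefinitionOfDone, scores: Sequence[float]) -> str:
    """Return 'proceed', 'continue' or 'stop' given one score per iteration so far."""
    target = done.baseline + done.min_uplift
    if scores and max(scores) >= target:
        return "proceed"   # the agreed target has been reached
    if len(scores) >= done.max_iterations:
        return "stop"      # budget used up without reaching the target: fail quickly
    return "continue"      # still within budget, keep iterating
```

The point of the design is that the stopping rule is fixed before any results are in, so the decision to stop does not get renegotiated every time an experiment comes close.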
Research problems
Research forms a big part of what we do: we approach problems against the grain and with the intent of extracting novel insights from data. Various business units within Allan Gray approach us with a problem, and we have come to realise that a sound approach is to reframe the problem as a research question using the scientific method. The first, and arguably most valuable, step is to formulate a hypothesis for the problem. This gives both the data science team and the business team a yardstick, which acts as a reference point against which to measure success or failure. Doing more research also exposes the team to valuable domain knowledge that could potentially turn into a long-term project in the future.
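As a minimal sketch of how a hypothesis can act as a yardstick, suppose the question has been reframed as "the new approach produces a higher average outcome than the current one". A permutation test is one simple way to check such a claim using only the Python standard library. This is an illustration under assumptions, not the test we necessarily use: the function name, the one-sided framing and the sample numbers are all made up for the example.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(control: Sequence[float], treated: Sequence[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for the hypothesis mean(treated) > mean(control).

    Group labels are shuffled repeatedly; the p-value is the share of shuffles
    whose difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up numbers, for illustration only.
current_process = [10, 11, 9, 10, 12, 11, 10, 9]
new_approach = [14, 15, 13, 16, 14, 15, 13, 14]
p_value = permutation_p_value(current_process, new_approach)
```

Agreeing on the hypothesis and the significance threshold before looking at the data is what turns the result into a shared reference point rather than a matter of opinion.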
Think long-term
It is important to develop solutions that can be improved incrementally, with a focus on feature engineering and enhancement. Reusing models is important to ensure quick turnaround times in the future. It is thus important to keep functions as generic as possible, with sufficient documentation to streamline the process of jumping between various projects. Some data science objectives may also seem unattainable in the short term; there is value in recognising this. There may not currently be sufficient data, or the right kind of data, for a project, but plan ahead and ensure that you do not waste an opportunity now that may be valuable in the future.
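To illustrate what a generic, documented function can look like, here is a small feature-engineering helper that knows nothing about any particular project. It is a sketch rather than code from our codebase; the function name, the row format (a list of dictionaries) and the sample records are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def add_feature(rows: Iterable[Row], name: str,
                compute: Callable[[Row], Any]) -> List[Row]:
    """Return copies of `rows` with one extra key, `name`, set to `compute(row)`.

    The input rows are not modified, so the same raw data can be reused
    across projects with different derived features.
    """
    return [{**row, name: compute(row)} for row in rows]


# Made-up records, for illustration only.
records = [{"visits": 4, "purchases": 1}, {"visits": 10, "purchases": 5}]
with_rate = add_feature(records, "conversion_rate",
                        lambda row: row["purchases"] / row["visits"])
```

Because the project-specific logic lives in the `compute` argument, the helper itself can be moved between projects unchanged.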
Augmentation versus automation
Our team places immense value on augmentation. We acknowledge that automation could improve efficiency and reduce costs. However, keeping a human in the loop and focusing on augmentation has its own benefits. Firstly, there are various tasks that a human simply does better than a machine. Providing augmented insights may free up time in a consultant’s day to focus on more valuable tasks and do less menial work. As a result, new human-defined insights may even be derived from the data. Secondly, new consultants may be trained a lot faster, saving the business considerable time and money.
Keep it simple
A problem is as complicated as you make it. We try not to reinvent the wheel. Instead, we use as many open-source packages as possible. We chose Python as our main programming language, mainly because of the many open-source machine learning libraries available. This allows us to spend more time and energy on understanding the problem, on feature engineering and, if the project makes it into production, on system integration.
We still have a lot to learn as a team, but that is part of the fun.