AIOPS++
How to train your AI algorithm

Successful AI algorithms are built on a foundation of training data, but sourcing data that fits your needs and meets volume requirements is harder than you might think, particularly when it comes to developing AI-driven applications and smart voice assistants.

RICHARD DOWNS, DIRECTOR NORTHERN EUROPE, APPLAUSE

WWW.DIGITALISATIONWORLD.COM | ISSUE II 2021 | COPYRIGHT DIGITALISATION WORLD

Businesses face several challenges when it comes to training their algorithms to respond to real-world scenarios. Sourcing data at scale is extremely challenging. Businesses need access to large, diverse samples, or crowds, of people who are representative of their target market.

It takes a dedicated resource to deliver projects of this scale: in effect, a crowdtesting (or distributed testing) solution that gives businesses access to a global community of skilled testers who work remotely. This model provides an embedded infrastructure that can be scaled up or down to meet requirements.

Enterprises and consumer brands have been using crowdtesting services for over a decade. Crowdtesting has become a well-established model that operates in tandem with in-house teams to complement integrated QA testing. Traditionally used to test apps, websites and other digital properties, crowdtesting has become integral to sourcing the data needed to train AI algorithms. It provides businesses with the scope and scale they need to bring new AI applications to market.

Despite the advantages this model offers, there are still a number of challenges businesses need to address. Here we explore three of the key challenges businesses face when sourcing training data.

1. Quantity of data sources

Enormous amounts of data are required to develop an effective algorithm. In one case, training a smart voice assistant developed for the UK market required over 100,000 voice utterances. Collecting them eventually took contributions from 972 unique people, who were sourced from almost every corner of the UK.

In another example, a business needed to train its AI algorithm to read handwritten documents. The brief was to deliver thousands of unique handwriting samples. The quantity of individuals was a critical