
Machine learning isn’t hard with OpenML

Developed in the heart of the Brainport region and boasting more than 150 thousand users worldwide, the Open Machine Learning platform is nevertheless largely unknown in local industry. With the support of initiator Joaquin Vanschoren, Georgo Angelis from TUE’s High Tech Systems Center and Eindhoven AI Systems Institute wants to change that with his startup PortML.

Alexander Pil


When Joaquin Vanschoren started developing the Open Machine Learning platform about six years ago, it was out of need and out of frustration. As a researcher at KU Leuven, he kept running into the same walls when he wanted to use machine learning techniques. “How can I get access to many datasets? And how can I properly compare different machine learning algorithms?” says Vanschoren, describing some of his daily obstacles. “The challenge was – and still is – that most datasets aren’t accessible or, at least, require weeks of work before they’re useful. Moreover, what’s published in research papers is often very difficult to reproduce, if it’s even possible at all. Especially when there’s a commercial company behind it, they contain a lot of marketing. When you try for yourself, it often doesn’t work.”

Vanschoren started the OpenML platform as an open-source project because his ambition was too big for one person to achieve. His initiative was quickly picked up by the research community and now about twenty people are contributing to the tool. “Mostly volunteers,” says Vanschoren, currently an associate professor at Eindhoven University of Technology. “Initially, they were predominantly PhDs who, like myself, were struggling with the same challenges but now, more and more people from the industry are getting involved.”

Credit: TU Eindhoven

Joaquin Vanschoren from OpenML Foundation: “OpenML is a gathering place for all machine learning research worldwide.”

Credit: TU Eindhoven

Georgo Angelis, High Tech Systems Center: “Within PortML, we’re working on a version of OpenML for the industry.”

Credit: OpenML

SMEs will benefit from the machine learning recipes developed by Michelin chefs.

The basic idea behind OpenML is that it should be an open platform where datasets are easily available and where you can find algorithms that are relevant to your problem. “An accessible interface to all machine learning research,” summarizes Vanschoren. At the moment, OpenML serves a community of about 150 thousand users worldwide. Understandably, a similar tool wasn’t available. “Commercial parties have little interest in transparency. They’d rather keep their cards close to their chest. However, they can benefit from having such a platform for internal use – something we realized quickly during the development. Large companies like Amazon have their own tools, of course, but for most companies and organizations, it’s unfeasible to do it themselves.”

Enterprise version

It’s precisely that last point that triggered Georgo Angelis from TUE’s High Tech Systems Center and the Eindhoven AI Systems Institute (EAISI) to start his own company, PortML, in collaboration with the OpenML Foundation and the Eindhoven university. “In the academic world, OpenML has many users but in industry, the platform is still largely unknown,” says Angelis. Although the tool is available for anyone, and companies could start right away, there’s some reluctance. “That’s understandable considering the open character of OpenML. The uploaded datasets, the models, the algorithms – it’s all public. Commercial companies aren’t too keen on sharing these kinds of data with everyone.”

Speaking with potential industrial users, Angelis notices that many are interested but that they indeed aren’t happy about releasing all their valuable data. “Within PortML, and supported by the OpenML community, we’re working on an enterprise version,” says Angelis. “Currently, we’re in the pilot phase with several companies to really understand which features are required from an industrial point of view.”

“Openness is pivotal for OpenML, but companies want to safeguard their data,” Vanschoren adds. “PortML tries to find the middle ground by building a platform that combines the advantages of access to the latest research with the requirements from industry.”

Michelin chefs

Angelis expects the first beta version of OpenML for industrial users to become available this quarter. “Data scientists are specialists who use every tool they can find to optimize their machine learning flow,” Angelis points out. “They can approach OpenML from their toolset through an API. They can keep using their own environment – whether it’s Python, or R, or any other suite – and use the stand-alone version of OpenML to organize all data, algorithms and models, and make them suitable for reuse.”

How does OpenML work?

OpenML is an online and open platform for machine learning that consists of thousands of datasets, algorithms and tasks. “The core of the platform is formed by more than 21 thousand well-annotated datasets,” says Georgo Angelis from High Tech Systems Center. “That’s a seriously big pool of data directly available for all kinds of machine learning experiments.” Users can upload new datasets themselves, making sure the OpenML platform keeps growing continuously. The second axis in OpenML are the tasks, for instance, for classification, regression or clustering. “You can use the results others have shared to get to an algorithm for your problem,” Angelis explains. “Imagine you want to distinguish apples from pears. Then you go and find the right dataset, look for a similar classification task and see what algorithms others have found. You’ll get an overview of which recipes scored best. You can try those on your own dataset and maybe even improve them. Finally, you share the new recipe in the OpenML database.”

Joaquin Vanschoren from the OpenML Foundation: “OpenML is a gathering place for all machine learning research worldwide. You’ll find problems similar to your own and see what solutions work best.” A luxury problem for OpenML is that it contains so many datasets and tasks that it’s difficult to see the forest for the trees. “Largely, we’ve solved this with an advanced search functionality but we also use machine learning to better organize all data.” This last step is essential because sometimes it’s not enough to cluster datasets and tasks around topics like healthcare, sports or industry. An algorithm working splendidly to map the stars might also be the best solution for recognizing skin conditions. “We tackle that machine learning challenge with machine learning,” says Vanschoren.

Credit: OpenML

OpenML shows which algorithms performed best.
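For readers who think in code, the workflow Angelis describes (pick a task, look at the results others have shared, take the best-scoring recipe as a starting point) can be sketched in a few lines of Python. This is a toy, in-memory illustration and not the OpenML API: the `Run` record, the `best_recipes` helper, the task names and all scores are hypothetical placeholders invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One shared result: a recipe (algorithm plus settings) scored on a task."""
    task: str
    recipe: str
    accuracy: float


def best_recipes(runs, task, top=3):
    """Return the highest-scoring runs for one task, best first."""
    matching = [r for r in runs if r.task == task]
    return sorted(matching, key=lambda r: r.accuracy, reverse=True)[:top]


# Placeholder values for illustration only; these are not real OpenML results.
shared_runs = [
    Run("apples-vs-pears", "recipe_a", 0.81),
    Run("apples-vs-pears", "recipe_b", 0.93),
    Run("apples-vs-pears", "recipe_c", 0.88),
    Run("star-mapping", "recipe_b", 0.75),
]

leaderboard = best_recipes(shared_runs, "apples-vs-pears", top=2)
for run in leaderboard:
    print(f"{run.recipe}: {run.accuracy:.2f}")
```

In this picture, sharing a new recipe simply means appending another `Run` to the shared list, so the next person looking at the same task sees it in the ranking.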

Does OpenML also make sense for smaller companies that more often than not lack an in-house data scientist? Angelis definitely thinks so: “Bigger companies can use OpenML to improve the efficiency of their process, increase the quality and automate several steps. For smaller companies, OpenML lowers the threshold to start with machine learning. With some mouse clicks, they have access to what’s already available and reach a sufficiently good solution. That result may not be next-level, but it will surely be a big step in the right direction. They’ll benefit from the recipes created by Michelin chefs.”

Vanschoren: “When you want to build a machine learning model, you need to make an incredible number of decisions. Which algorithms, which models, which parameters, to name a few. Currently, you need a PhD – or at least someone with a lot of experience in machine learning – to create efficient models. Because there’s so much data and metadata available in OpenML, we can learn from ourselves. We use machine learning to decide what will work and what won’t.” That’s the research field of automatic machine learning, or AutoML, focusing on good search algorithms that find the best solution for a given dataset. “The resulting solution may not be a panacea but for many small and medium-sized companies, it can be very insightful to experience what they can do with machine learning, and it will give them a perfect starting point to build on.”
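To give a feel for what “searching for the best solution for a given dataset” means, here is a minimal sketch, assuming the simplest possible search: try every candidate setting and keep the one that scores best under leave-one-out validation. It is not how OpenML or any particular AutoML tool works; the synthetic dataset, the tiny nearest-neighbour classifier and all function names are assumptions made up for the illustration. Real AutoML systems use smarter search strategies and, as Vanschoren notes, learn from metadata across many datasets.

```python
import random


def make_dataset(n=60, seed=0):
    """Toy 1-D data: label is 1 when x > 0.5, with some label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.1:  # flip a label with probability 0.1
            label = 1 - label
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return int(votes * 2 > k)


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-nearest-neighbours on the data."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == label
    return hits / len(data)


def search(data, candidates):
    """Score every candidate k and return (best_k, all_scores)."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


data = make_dataset()
best_k, scores = search(data, candidates=[1, 3, 5, 7, 9])
print("best k:", best_k)
```

The point of the sketch is the shape of the loop: a space of candidate settings, a way to score each one on the data at hand, and a rule for picking the winner.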

Angelis is actively seeking collaboration with the outside world. The companies interested in the pilot phase are involved in healthcare, manufacturing and mobility. “We at HTSC/EAISI want to contribute to the power of the Brainport region. So obviously, we’ve started with our own network in high tech,” he explains. “Later, we plan to expand to, for instance, telecom or finance. To me, this article is a pitch for the industry. We want to work together with companies to get to the best possible version of OpenML for that target audience.”

OpenML is an open platform to easily share datasets, algorithms and models. Credit: OpenML

