Libelium: The IoT Gamechanger

Change is fundamental to life, and in these rapidly changing times the technological world has been constantly shifting its tectonic plates, unlocking a plethora of opportunities for those with a thirst for innovation. In this cut-throat competitive world, technological innovation and productivity are what it takes to survive the race for success.
After IoT and Big Data stepped into the arena, the floodgates of the tech world have been gushing with opportunities and challenges for many companies. To keep up with this fast-paced world, Libelium has envisioned that IoT will drive a new form of society, with more transparency and more freedom. Thus it has ventured into the world of the Internet of Things, smart cities and M2M platform provision, creating a unique window of opportunity for the future of technology. One of the most successful wireless sensor platform providers, Libelium has created a healthy, rational relationship between developers, integrators and innovators while remaining completely client-centric. Herein, we look at the story of Libelium and what drives this company to continually strive for better and dream bigger.

A man and his ants

David Gascón, Libelium CTO, was only 23 when he observed the behavior of ants, which follow one of the most efficient algorithms for finding the minimum distance between two points. Inspired by this algorithm, David proposed the idea of expanding the limits of the internet, allowing objects and the environment to be part of it through small sensors that send information to the internet in real time. Acting upon the metaphorical light bulb that lit above his head, David made Libelium (which means dragonfly in Latin) his brainchild. Libelium's smart sensor platforms are distinguished by their ability to operate over long distances at low power consumption: they can last up to 10 years on the battery of a single mobile phone. Ever since the inception of the company,
Winter 2019 | Beyond Exclamation
David Gascón, Co-founder & CTO, Libelium
the powering of the IoT revolution became real thanks to visionaries such as Libelium, who realized its potential from the very beginning. The company is now playing in the world's 'first division', and over the next decade its goal will be to get the technology to reach the maximum number of people. Libelium's strategy of offering high value-added services to the entire chain of IoT key players has led it to release a new service called "The Hive", a unique Internet server that allows users to send information from any IoT device to the main worldwide cloud platforms without having to implement each specific cloud protocol or authentication. Offering greater connectivity, enhanced data security, and online device control, the new "Hive" service gives devices direct, secure and reliable connectivity to the cloud while guaranteeing end-to-end data privacy. Using a single HTTPS request, any device can connect to the main worldwide cloud platforms simply by sending its data to The Hive. Security is a key factor: "All the messages are point-to-point encrypted so developers do not have to cope with security issues," points out Gascón.

Thinking beyond the horizon

David Gascón, who is used to working with disruptive technology, has constantly focused on the future and what is coming next, which has been the main driving force in his career. This aspiration for growth has been a dynamic game changer for
him at Libelium as well. Gascón highlights, "The most important thing is the path we follow, more important than the goal. We must always be moving forward." Inspired by ant colonies and other swarm-insect behaviors, David's research focused on wireless sensor networks (now commonly known as the Internet of Things), mesh networks, and self-emergent, auto-organizational complex systems. For his revolutionary innovations, he was awarded MIT's TR35 prize as a top innovator under 35 (2012) and the "Juan López de Peñalver" award by the Spanish Royal Academy of Engineering (2018), which recognizes the most outstanding engineers under 40 in Spain. In 2013, Libelium successfully launched ArduSat, the first open-source satellite that allows private citizens to design their own applications in space. Libelium has been a pioneer in the world of the Internet of Things for over a decade now and has continued to revolutionize the digital world with its vision for the future of technology. Identifying IoT as a breakthrough for the company, Libelium has found new ways to solve the problem of interoperability in any IoT project. Furthermore, it has focused on organizing its relationship with the ecosystem and has launched more partnership programs than ever. The most striking quality of Libelium's smart sensor platforms is that they can be used for a variety of purposes, from monitoring cities to precision agriculture. David Gascón and
Alicia Asín, CTO and CEO of Libelium respectively, see IoT as an untapped goldmine: still at a very early juncture, but one that, with the right set of innovative minds, can be a breakthrough for many consumers and providers. Until now, transactions have mostly been between professionals and companies, but Libelium dreams of making the technology accessible to anyone and everyone in the years to come.

Turning an idea into an accomplishment

Libelium is a strong believer in the ecosystem approach and believes that small companies need to focus on what holds them back and work on those shortcomings to better themselves in the future. Their primary focus used to be on
hardware and sensor devices; it was only later in their journey that they grew to understand the nature of the market and what clients really need. Through this journey of understanding, Libelium uncovered other underlying shortcomings as well. Take system integration: customers did not want to piece things together themselves; they desired a turnkey solution. The chief focus of most companies in the IoT industry was software solutions, which left customers wondering what to do with the data generated by the hardware and collected by the sensors. This is where Libelium stepped in to fulfill its customers' needs; as it became more aware of the market's potential, it gained relevant partners for its business. In order to connect
everyone in the same way, a compelling program was required. That is when the cloud partner program was created, wherein anyone who got a cloud gateway from Libelium could connect to all the partners the company had on the cloud and grow their business.

A bright future

After years of deploying their sensors all over the world, Gascón envisions the sensors forming a collective global brain that equips them with intelligence and decision-making abilities in anticipation of new situations. "There will come a time when there will be billions of sensors that are interconnected," Gascón claims. Libelium can be credited with having a big impact in shaping the IoT industry in different developing countries. For
instance, fish farms in Vietnam were monitored by controlling water quality in ponds and lakes, which reduced the number of lost animals and improved product quality. While IoT's primary advantages are efficiency and cost savings, its other underlying benefits are the generation of new jobs and greater opportunities. A society based on data can optimize many things, and the biggest legacy of IoT will eventually be greater democracy and a stronger economy.
Cloud technology has penetrated every aspect of our lives, both personal and professional. Our music and TV shows are in the cloud. Everything on our phones, from to-do lists to random notes, is backed up and accessible in the cloud. It has transformed the way we work. It has changed the way businesses function, and how teams communicate and collaborate. Glassbox leverages the power of the cloud to be the most secure web and mobile analytics solution. Every month, Glassbox's customers capture and analyze billions of sessions. Each one of them benefits from a single-tenant solution, hosted on its own Virtual Private Cloud, with dedicated resources and computing requirements to ensure maximum security. This sets Glassbox apart from any other vendor in its category.

The beginning of a global leader

Glassbox was founded in 2010, when three seasoned HP/Mercury executives recognized that digital would become the dominant channel. At the time, the market was poorly served and fragmented: there were only quantitative analytics packages, technical IT solutions and lightweight replay solutions. The leading player was using an old technology, complex to deploy, that required extensive ongoing configuration. The founders, Yoav Schreiber, Yaron Gueta and Hanan Blumstein, identified the need for a complete package supporting web and mobile, built for high-volume, data-sensitive environments, that was fast to install and easy to maintain. They left the comfort of corporate life and chose the path of adventure, with humility and dedication. From 2010 to 2015 they grew and sustained the company single-handedly, and at the end of 2015 they raised $5.5 million in Series A funding to spearhead the company's expansion in Europe and across North America. Fast forward to today, and Glassbox enables clients to capture, record, replay and analyze each one of their customers' digital sessions.
By doing so, Glassbox provides clients with visibility in real-time not only into “what” their customers’ digital behaviors are – whether on the web or on their native mobile applications – but also, more importantly into “why” customers behave the way that they do and enables them to take immediate action. Glassbox takes the ‘black box’ idea from the aviation industry – recording everything during a flight – and adds a twist of transparency to it. Just like every plane must have two black boxes recording at all times, to ensure evidential record of everything that happens during a flight, every website and mobile application should have a
Yaron Morgenstern, CEO, Glassbox
glass box, for the best interest of both enterprises and customers. By using Glassbox, organizations can drive business agility and continuously improve customer experience, optimize customer service, drive operational excellence and reduce risk across all their digital channels.

Focusing on outcomes and addressing a myriad of business and technical needs

At its core, Glassbox's cloud-based service offers a comprehensive Digital Customer Management solution which enables clients to quickly focus on the solution's outcomes and address a myriad of business and technical needs across their web and mobile app channels. With its rapid setup, it enables enterprises to record, analyze and replay all interactions with their website and mobile apps, as well as to be alerted when things go wrong, which allows them to action improvements, delight their customers and generate more revenue. Glassbox does all this while meeting the highest security and privacy standards – and all with the consent of the client's customers. In fact, it is a unique single-tenant solution, meaning that for each customer the software is deployed on dedicated infrastructure and uses dedicated resources for each organization's data and computing requirements.
Together Everyone Achieves More

One of the key reasons why Glassbox has been able to sustain such a long period of hyper-growth is, quite simply, the people associated with the company. Glassbox invests in its team to ensure that they stay, grow and develop at Glassbox. Its Research & Development team is based in Israel, where the company's CTO leads them through revolutionary product roadmaps to fulfil its customers' current and future needs. Having the largest enterprises as its customers also means Glassbox is working with the most digitally mature organizations in the world. The firm truly partners with its customers to develop forward-looking capabilities that will support their Digital Customer Management efforts. In 2018, while growing the business exponentially, it was Glassbox's ambition to achieve dominance within the US financial services industry, and the team duly succeeded, with all four of the largest US banks adopting Glassbox Digital Customer Management solutions. The biggest challenges, however, have revolved around scaling up a small startup, very fast, into a structure that services and continues to help Tier 1 enterprises in their Digital Transformation journey. Raising awareness and educating the market on the need to move from the concept of online customer experience analytics to the more complete approach of digital customer management – which Glassbox has pioneered globally – is a continuous effort.

Leading the pack of machine learning applied to analytics
"Together with Glassbox, we work at scale and serve the largest global financial institutions, telecom providers and airlines. The combination of Glassbox with AWS Cloud Machine Learning offers the widest scope of digital data analysis available to organizations and, more importantly, enables them to act on the insights we help them generate," says Yaron Gueta, CTO and Co-Founder, Glassbox. "Not only have we made error reproduction a thing of the past, we exponentially boost our customers' revenues and operational efficiency, while complying with the most demanding regulatory requirements, such as SOC 2 and GDPR," Gueta adds.

Glassbox and AWS: shared ambitions

"This partnership ensures that our customers are always at the cutting edge of machine learning developments and that Glassbox leverages the huge investments
Amazon has made in this domain over the last few years," said Yaron Morgenstern, Glassbox CEO. The partnership is part of Glassbox's ambitious machine learning agenda, which includes out-of-the-box functionality and more advanced capabilities tailored for power users. By feeding AWS Machine Learning – in real time – with the massive amount of digital data captured by Glassbox, enterprises can create and automate a vast range of predictive analytics and workflows that optimize management of the digital customer experience.

Striving to be unique every single time

Today, Glassbox is the only enterprise-grade customer analytics
platform on the market that allows users to automatically capture, index, search, retrieve, replay and derive real-time machine learning-driven insights from data related to every single digital customer journey. It enables online customer experience professionals to receive automatic alerts about customer struggles and technical anomalies and act upon them. Going forward, Glassbox aims to become the global leader in digital customer management solutions, helping enterprises prepare for the digital unknown by employing the most sophisticated big data,
machine learning and artificial intelligence capabilities.

Meet the maestro

Yaron Morgenstern has been Chief Executive Officer at Glassbox since October 2015. Having held senior positions in larger organisations before joining Glassbox, he brings strong business acumen and extremely high standards of work ethic. He is a hands-on CEO who leads by example and communicates the company's strategic direction clearly to the rest of the staff.
Supporting rapid business growth via automation, consolidation, and standardization – all while reducing operational expenses – has become essential. IT businesses need to change the way they think about their IT infrastructure and align business and IT with scalable solutions delivered by experts.
Enter Virtual Tech Gurus, a team of IT professionals that brings forward enterprise-level expertise and deep knowledge to give small- to medium-sized businesses access to the best talent and premier technology. Virtual Tech Gurus is perfectly positioned to assist customers with their cloud services requirements, with expertise in developing solid solutions around Managed IT Services, Cloud Infrastructure (including private, public and hybrid cloud offerings) and Zero Downtime Data Migration. The company's technical gurus work in unison with clients to gather crucial requirements and to design, implement, and support solutions based on emerging virtual and cloud technologies. VTG knows that providing the best possible solutions at an affordable cost equals instant success for its customers.

Bringing a quality, automated analytics solution to the industry

VTG saw a big vacuum in automation and data analytics in the infrastructure space. VTG's goal was to bring a quality, automated analytics solution to meet customer demand, and with that goal VTG was founded back in 2008. Today, Virtual Tech Gurus (VTG) is an IT service provider headquartered in Dallas, Texas, with offices across the United States and India, and core competencies in Cloud Services, Data Center Migration, Cloud Migration, Infrastructure Assessment, and Staffing. VTG partners with leading players in cloud and data center solutions such as EMC, NetApp, Hitachi, VMware, and other renowned companies.
Guru Moorthi, CEO, Virtual Tech Gurus
An Elite Partner of Dell EMC for over nine years, VTG received the coveted 2017 Best Customer Experience Award for its leadership in the Intelligent Data Mobility (IDM) area and its relationship-building prowess. VTG is also a winner of the SMU Cox School of Business Dallas 100 Fastest Growing Company Awards (#10).

One of the most trusted partners to the frontrunners

Today, VTG proudly stands as one of the leading providers of cloud, infrastructure, migration, and data center solutions and has made a name for itself as one of the most trusted partners to the frontrunners in the business. The company's services are briefly explained as follows:

Data Center Migration: Virtual Tech Gurus helps clients migrate applications and workloads to new platforms quickly and easily, with minimal downtime. VTG's skilled project team helps them plan and implement their migration with an end-to-end service, working with the client's team at every stage, from opportunity identification and initial assessment through implementation to ongoing management. The service is suitable for storage, server, application, or Exchange migration, and migration to the cloud. VTG uses its patented automation tool, ZENfra, to ensure a successful end-to-end transition. This helps lower costs by up to 25 percent and reduce migration times by 30 to 40 percent compared to traditional in-house methods.
ZENfra™: VTG's automation tool ZENfra manages and monitors migration projects to cut lead times by 30 to 40 percent and reduce costs by 25 percent compared to traditional in-house methods. ZENfra integrates initial assessment, development of a strategic migration plan, and pre-migration checks to ensure a successful end-to-end transition. By automating the collection of data from log files, ZENfra eliminates the complexity of data capture and reduces the risk of human error. The ZENfra process includes data collection, processing and data generation, reporting, and automation.

Cloud Automation: VTG's skilled project team helps clients plan and implement their migration with an end-to-end service, working with the client's team at every stage, from opportunity identification and initial assessment through implementation to ongoing management. The service is suitable for storage, server, application, or Exchange migration, and migration to the cloud. VTG uses its patented automation tool ZENfra to ensure a successful end-to-end transition.

DevOps: VTG believes in balancing and integrating improvement strategies across people, process, and tools to create meaningful business results, such as keeping your team up to date on cutting-edge practices, utilizing a system-level approach in process design, and creating a prioritized roadmap and tactical project plan. At VTG, the team sees DevOps as an enterprise architecture framework that allows seamless communication between
development and operations teams to deliver highly available and secure infrastructure on time.

A look into the future

VTG's proven processes and best practices, coupled with its innovative automation tools such as ZENfra, allow the company to complete the migration process with less downtime and significant cost savings over its competitors. "We are transforming our company from purely consulting to more solutions- and IP-based offerings. We have been able to add a significant number of Fortune 500 customers using our products, and we are looking to increase adoption and add more data analytics around containers," says Guru Moorthi, Chief Executive Officer of Virtual Tech Gurus.

Notable client testimonials

"Brinder, thanks for all you do and your incredible sense of pride and tenacity to get this thing to move forward with Capital One while balancing everything else. I do not have the words to say thank you enough, so I hope this will say it all. Congratulations and please keep up the great work!" – Software Engineer, EMC Corporation.

"There was nothing in the process that we thought could have been improved, but we were proven wrong. We were extremely happy with the way the project was managed." – Major Motorcycle Manufacturer.
Veera Swamy Arava, CEO & Director, SAT Infotech
Cloud Defines a Digital Leader

Companies everywhere, of all sizes and natures, are embracing new technologies, processes and business approaches to become digital disruptors. From Amazon to Airbnb to the Byju's learning app, examples of this transformation abound.

Way back in 2006, when Amazon launched its very first IaaS offering, it started off by peddling its wares to start-ups and smaller businesses. The reasoning behind the move was that start-ups and smaller firms are "cloud-native" and are quicker to adopt cloud because they don't carry the burden of legacy hardware, unlike their enterprise counterparts. During this phase, many such firms leveraged this strength and out-innovated their larger enterprise rivals with the competitive edge they gained through better use of cloud.

Fast forward to 2018, and there isn't a single organization—small or large, start-up or established firm—that does not leverage the power of cloud computing. Interestingly, the industry is at a crossroads once again. A decade ago, it was about how organizations could redefine the way they implement and consume IT through cloud. Now it is time for organizations to redefine their entire business processes and consumer experiences through the digital transformation journey. Will start-ups continue to have that competitive edge?

Many successful digital organizations already acknowledge the importance of cloud as one of the ingredients for successful digital transformation. According to the Microsoft Asia Digital Transformation Study, around 88 percent of the business leaders surveyed in India agreed that cloud computing is an essential part of their digital transformation strategy, and that the cloud has made it more affordable for companies of all sizes to embark on their digital transformation journey.

As Gartner aptly puts it, "The agility enabled by 'as-a-service' cloud-based technologies allows an enterprise to embrace market and operational changes as a matter of routine." Cost efficiency, scalability, and anytime-anywhere accessibility are some of the fundamental value-adds that cloud brings to digital transformation.
However, organizations are moving to a more mature phase in cloud consumption. The adoption of cloud-native applications (applications designed specifically for cloud architectures), for instance, is considered to have a direct impact on an organization's digital journey. Market studies indicate that the share of new business applications that are cloud-native will more than double in the next two years. Management of multi-cloud environments is another key aspect that organizations are focusing on to enable digital transformation.

The Start-up Edge

Start-ups and SMEs are once again uniquely positioned to leverage the digital transformation phenomenon. In fact, start-ups across the world are adapting to the digital wave in a much faster and more effective manner. A recent study from IDG Communications is proof enough. The study reveals that start-ups (established within the last 10 years) are way ahead of traditional firms in embracing digital transformation. The report shows that 95 percent of start-ups have digital business plans, compared to 87 percent of well-established companies,
while 55 percent of start-ups have adopted such a strategy, compared to just 38 percent of more established firms. The start-up advantage of not having legacy baggage – technological and cultural – is a great boon. Innovation and the adoption of new technologies and tools to achieve better customer experience have historically been second nature to start-ups. The more established firms, on the other hand, will have to go for a complete overhaul of their well-established systems and processes. SMEs, which are habitually
considered 'digital laggards', are busting that myth with rapid adoption of cloud and other third-platform technologies like social, mobile and big data. A case in point is the adoption of cloud-based ERP systems among SMEs. SMEs are the frontrunners in moving critical
applications like ERP systems on to cloud environments. Considering that a cloud-based, intelligent and flexible ERP system is critical for the enablement of digital transformation, SMEs will gain a competitive edge over a period of time.
The introduction of GST (Goods and Services Tax) will further bring the SME sector into the mainstream and drive cloud adoption and digital transformation.
Jay Chapel, CEO, ParkMyCloud
8 Ways to Improve Cloud Automation Through Tagging
Since the beginning of public cloud, users have been attempting to improve cloud automation. This can be driven by laziness, scale, organizational mandate, or some combination of the three. Since the rise of DevOps practices and principles, this "automate everything" approach has become even more popular, as it is one of the main pillars of DevOps. One way you can help sort, filter, and automate your cloud environment is to utilize tags on your cloud resources.

Tagging Methodologies

In the cloud infrastructure world, tags are labels or identifiers attached to your instances. They are a way for you to provide custom metadata to accompany the existing metadata, such as instance family and size, region, VPC, IP information, and more. Tags are created as key/value pairs, although the value is optional if you just want to use the key. For instance, your key could be "Department" with a value of "Finance", or you could have a key of just "Finance". There are four general tag categories, as laid out in the best practices from AWS:
- Technical — This often includes things like the application running on the resource, which cluster it belongs to, or which environment it's running in (such as "dev" or "staging").
- Automation — These tags are read by automated software, and can include things like dates for when to decommission the resource, a flag for opting in or out of a service, or which version of a script or package to install.
- Business and billing — Companies with lots of resources need to track which department or user owns a resource for billing purposes, which customer an instance is serving, or some sort of tracking ID or internal asset-management tag.
- Security — Tags can help with compliance and information security, as well as with access controls for users and roles who may be listing and accessing resources.

In general, more tags are better, even if you aren't actively using those tags just yet. Planning ahead for ways you might search through or group instances and resources can help save headaches down the line. You should also ensure that you standardize your tags by being consistent with capitalization and spelling, and by limiting the scope of both the keys and the values for those keys. Using management and provisioning tools like Terraform or Ansible can automate and maintain your tagging standards.

Automation Methodologies

Once you've got your tagging system implemented and your resources labeled properly, you can really dive into your cloud automation strategy. Many different automation tools can read these tags and utilize them, but here are a few ideas to help make your life better:

- Configuration Management — Tools like Chef, Puppet, Ansible, and Salt are often used for installing and configuring systems once they are provisioned. Tags can determine which settings to change or which configuration bundles to run on the instances.
- Cost Control — This is the automation area we focus on at ParkMyCloud — our platform's automated policies can read the tags on servers, scale groups, and databases to determine which schedule to apply and which team to assign the resource to, among other actions.
- CI/CD — If your build tool (like Jenkins or Bamboo) is set to provision or utilize cloud resources for the build or deployment, you can use tags for the build number or code repository to help with continuous integration or continuous delivery.
- Cloud Account Cleanup — Scripts and tools that help keep your account tidy can use tags that set an end date for the resource, ensuring that only necessary systems stick around long-term. You can also automatically shut down or terminate instances that aren't properly tagged, so you know your resources won't be orphaned.
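As a concrete illustration of tag-driven automation of the cost-control kind described above, here is a minimal Python sketch. The tag keys ("schedule", "environment") and the schedule values are illustrative assumptions for this example, not any vendor's actual API or tag schema.

```python
# Minimal sketch of tag-driven automation: decide, from a resource's tags,
# whether it should be running at a given hour. Tag keys and schedule
# values ("always-on", "office-hours", "never") are illustrative assumptions.

def should_be_running(tags: dict, hour: int, weekday: int) -> bool:
    """Return True if the resource's tags say it should be up now.

    Resources with no schedule tag default to always-on, so untagged
    production systems are never parked by mistake.
    """
    schedule = tags.get("schedule")
    if schedule is None or schedule == "always-on":
        return True
    if schedule == "office-hours":
        # Weekdays 8:00-19:59 (0 = Monday ... 6 = Sunday).
        return weekday < 5 and 8 <= hour < 20
    if schedule == "never":
        return False
    return True  # Unknown schedule values fail open.

# Example resources, tagged as key/value pairs the way the article describes.
resources = {
    "dev-web-01":  {"environment": "dev",  "schedule": "office-hours"},
    "prod-db-01":  {"environment": "prod", "schedule": "always-on"},
    "old-test-01": {"environment": "dev",  "schedule": "never"},
}

# Saturday (weekday 5) at 10:00 -- only the always-on box stays up.
running = [name for name, tags in resources.items()
           if should_be_running(tags, hour=10, weekday=5)]
print(running)  # → ['prod-db-01']
```

A real implementation would read the tags via the cloud provider's API rather than a hard-coded dict, but the decision logic stays the same: the tag is the single source of truth for the policy.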
Conclusion: Tagging Will Improve Your Cloud Automation

As your cloud use grows, implementing cloud automation will be a crucial piece of your infrastructure management. Utilizing tags helps not only with human sorting and searching, but also with automated tasks and scripts. If you are not already tagging your systems, having a strategy for both the tagging and the automation can save you time and money.
ADF Mapping Data Flows for the Impatient — Introduction

Mehdi Modarressi, Cloud Solution Architect, Microsoft
TL;DR

ADF Mapping Data Flow is a new feature of Azure Data Factory (ADF) which allows users to build data transformation pipelines (ELT jobs) using a graphical user interface. Essentially, Data Flow mapping generates Spark code so that the pipeline executes on Spark at scale, without the user writing a line of code, and with the advantages of a GUI for pipeline management, data lineage, query push-down and, most importantly, embedding within existing ADF pipelines like any other activity.
How is Mapping Data Flow different from ADF?
To answer this question, we first need to look briefly at the idea behind ADF and how it works. ADF was built with the sole responsibility of orchestrating and scheduling data pipelines, rather than as a data transformation tool. In other words, Microsoft intentionally decided to separate the role of a job scheduler from the data processing platform. The advantages of this approach are:
• Leave every task to the platform that is best at it. Scheduling and orchestration are done by ADF, while, depending on the type of transformation, you can choose whichever tool does that best.
• It provides full flexibility to combine various processing engines in the same pipeline, depending on the nature of the task, so that the pipeline is both technically and commercially optimal. For example, if a data pipeline starts with an event-driven data source, the preliminary data acquisition and processing can be done in an Azure Function, while the primary data transformation happens in a dedicated data platform like Azure Databricks or Azure Data Lake Analytics.
At the end of the day, this means ADF stays a very light, non-compute-intensive application and, as a result, a very cost-effective one.
Source and sink
A data source is where data is first read from and staged in Spark for further transformation. At the time of writing, four dataset types are available as part of the private preview, with many more to be released in the near future:
• Azure SQL DB
• Azure SQL DW
• Parquet: if you are not familiar with the Parquet file format, I suggest having a look at some of the resources available on the web. In a nutshell, Apache Parquet is a binary, column-oriented file format. The primary advantage of binary file formats over delimited or flat text files is that they include metadata (column names and types) alongside the data. For instance, if we offload information from a database to Parquet files, the column names and types are included in the data files. Also, being a binary format, Parquet eliminates the classic problem of the separator appearing inside a data field and the need for escape characters.
• Delimited text (e.g. CSV)
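The separator problem is easy to see with nothing but the Python standard library. This small sketch shows why delimited text needs quoting for a field that contains the delimiter, a problem a self-describing binary format like Parquet sidesteps entirely:

```python
import csv
import io

# A field that contains the delimiter itself: the classic weakness of
# delimited text that binary formats like Parquet avoid.
rows = [["id", "company"], ["1", "Acme, Inc."]]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# The writer had to quote the field to survive the embedded comma;
# a naive split(",") on that line would produce three fields, not two.
naive = text.splitlines()[1].split(",")

# Only a real CSV reader recovers the original two fields.
parsed = list(csv.reader(io.StringIO(text)))[1]
```

Here `naive` has three elements while `parsed` correctly has two; Parquet never faces this, since field boundaries are not encoded with in-band characters.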
The only exception, where ADF needs to bring in its own compute power and rely on its Integration Runtime, is the Copy Data activity. Copy Activity is when ADF moves data from a source to a target; the source and target can be any data store, whether on-premises, in Azure or in other clouds. Once the data arrives at the destination, the rest of the process must be handed over to one of the dedicated data processing platforms described above. Now it is very easy to answer the question above: as far as ADF is concerned, Mapping Data Flow is just another "dedicated" data processing activity.

ADF Mapping Data Flow concepts
Data streams
Mapping Data Flow is built around the concept of data streams (the same concept used in Apache NiFi). Simply put, every data stream starts with a source, then data flows through a series of transformations and finally arrives at a data sink.
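The source, transformations and sink of a stream can be sketched in plain Python as a chain of generators. This is a conceptual analogy only; Mapping Data Flow builds the same shape graphically and runs it on Spark, and the column names here are made up:

```python
def source():
    # In ADF this would be a dataset such as Azure SQL DB or Parquet.
    yield {"city": "Zaragoza", "temp_c": 20}
    yield {"city": "Seattle", "temp_c": 10}

def derived_column(stream):
    # Analogous to the Derived Column transformation:
    # compute a new column from existing ones.
    for row in stream:
        yield {**row, "temp_f": row["temp_c"] * 9 / 5 + 32}

def sink(stream):
    # In ADF the sink writes to a target data store; here we just collect.
    return list(stream)

result = sink(derived_column(source()))
```

Each stage consumes the previous one lazily, which mirrors how rows flow through a transformation graph rather than being materialized at every step.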
Similarly, data sinks are the datasets used to write the transformed data to the target data store.

Transformations
This is where the actual magic of Mapping Data Flows takes place. Transformations, as the name suggests, are the activities that perform the actual data transformation logic on the data stream. Below is a list of the transformations; if you need to read more, you can refer to this page.

Transform          Description
New Branch         Create a new flow branch with the same data.
Join               Join data from two streams based on a condition.
Conditional Split  Route data into different streams based on conditions.
Union              Collect data from multiple streams.
Lookup             Look up additional data from another stream.
Derived Column     Compute new columns based on existing ones.
Aggregate          Calculate aggregations on the stream.
Surrogate Key      Add a surrogate key column to the output stream, starting from a specific value.
Pivot              Transform row values into individual columns.
Unpivot            Transform column values into individual rows.
Exists             Check the existence of data in another stream.
Select             Choose the columns that flow to the next stream.
Filter             Filter rows in the stream based on a condition.
Sort               Order data in the stream based on column(s).
Alter Row          Mark rows as insert, update, delete or upsert based on conditions.
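To make a couple of these concrete, here are plain-Python sketches of Conditional Split and Aggregate over a stream modelled as a list of dicts. These are illustrative only; in Mapping Data Flow you configure the same operations visually and they run on Spark:

```python
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 5},
]

# Conditional Split: route data into different streams based on a condition.
eu_stream = [r for r in rows if r["region"] == "EU"]
other_stream = [r for r in rows if r["region"] != "EU"]

# Aggregate: calculate aggregations on the stream (total amount per region).
totals = {}
for r in rows:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
```

After a Conditional Split, each branch continues as its own stream, which is why the two lists above are kept separate rather than merged.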
Getting started with Mapping Data Flows
Before proceeding, you will need to request access to the limited private preview of Mapping Data Flows using the web form (http://aka.ms/dataflowpreview). Once your Azure subscription is whitelisted for Mapping Data Flows, you will need to create an Azure Data Factory V2 instance in order to start building your data flow pipelines.
Note: you may have noticed that you previously needed to create an "ADF V2 with data flows" instance in order to access Mapping Data Flows, but this is no longer the case; the feature is now included in ADF V2. When you enter the edit-and-monitor GUI of ADF, you will notice the new Data Flows entry in the sidebar. Once you create a new data flow, a new canvas opens where you can start building your data streams by adding a data source and data transformations.
Debug mode
Azure Data Factory Mapping Data Flow has a debug mode, which can be switched on with the Debug button at the top of the design surface. When designing data flows, switching debug mode on lets you interactively watch the data shape transform while you build and debug your flows (https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-debug-mode). When you turn on debug mode, ADF will, under the hood, create a Spark cluster and harness its compute power to let the developer preview the transformations in real time.

Note: in previous versions of Data Flows you needed to create an Azure Databricks cluster and attach it to ADF through a linked service. You no longer have to bring your own Databricks cluster; ADF manages cluster creation and tear-down, which means you do not even need an Azure Databricks workspace in order to work with Mapping Data Flows (https://github.com/kromerm/adfdataflowdocs/blob/master/adf-data-flow-faq.md).

How I built an SCD Type 2 data pipeline using Azure Mapping Data Flows
With all the concepts of Mapping Data Flows covered, I thought a very good exercise would be to build my own slowly changing dimension Type 2 pipeline using this shiny new toy. Below is an overview of the pipeline, and in the next post I will go deep into how to build a full data pipeline like this using Mapping Data Flows, so stay tuned!
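While the full walkthrough is promised for the next post, the core SCD Type 2 merge logic can be sketched in plain Python. The column names `key`, `value`, `valid_from` and `valid_to` are illustrative, not the author's actual schema:

```python
def scd2_merge(dimension, incoming, today):
    """Slowly changing dimension Type 2: expire changed rows, append new versions.

    dimension: list of dicts with "key", "value", "valid_from",
               "valid_to" (None marks the current version of a key)
    incoming:  list of dicts with "key", "value"
    """
    current = {r["key"]: r for r in dimension if r["valid_to"] is None}
    out = list(dimension)
    for row in incoming:
        old = current.get(row["key"])
        if old is None:
            # New key: insert as a current row.
            out.append({**row, "valid_from": today, "valid_to": None})
        elif old["value"] != row["value"]:
            # Changed: expire the old version (mutates the existing row
            # in place) and insert the new current version.
            old["valid_to"] = today
            out.append({**row, "valid_from": today, "valid_to": None})
        # Unchanged rows are left alone.
    return out
```

In a Mapping Data Flow pipeline, the same shape would typically be built from a Lookup against the existing dimension, an Exists or Conditional Split to separate new, changed and unchanged rows, and an Alter Row to mark updates and inserts.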
Resources
If I managed to get you excited about Mapping Data Flows, start your journey by going through the resources below.
• Microsoft documentation: https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
• Azure Data Factory YouTube channel: https://www.youtube.com/channel/UC2S0k7NeLcEm5_IhHUwpN0g/videos?view=0&sort=dd&shelf_id=1
• Early documentation of ADF Mapping Data Flows on GitHub: https://github.com/kromerm/adfdataflowdocs