
Zero Trust in a Big Data Connected World — The IoT Evolution

TB: Thanks for joining me today, Dr. Engels. Tell us a little bit more about barcodes and RFID tags, and how they have evolved. Maybe start with the Auto-ID Center?

DE: Sure, thanks Trey. At the MIT Auto-ID Center, we developed a really cool technology with which we were literally trying to connect the physical world to the virtual world. In trying to figure out what the killer application for such a technology would be, we ran into folks at the Uniform Code Council (UCC), then subsequently Procter & Gamble and Gillette. We listened to them, processed the feedback, and figured out that the killer application for connecting the physical world and the virtual world is supply chain management.

TB: Well that sounds like a reasonable place to start, and certainly a point of emphasis for our readers.

DE: It was very, very reasonable. But the whole point of that system could be narrowed down to a single question: How do you take physical objects and identify them, automatically connect them to the virtual world, process and read information about them, or maybe just identify them in the virtual world? Then you figure out what it is, what to do with it, and everything else.

TB: That sounds like it wasn’t just classes of things, but rather discrete instances, where everything would have a virtual “license plate”?

DE: Absolutely correct. The ubiquitous barcode that we’re all familiar with, which was originally developed in the 1970s by the UCC, didn't really begin adoption in the retail supermarket until the early 1980s. The catalyst was a label change required by the FDA to include ingredients and calories on all the foods that we buy. In those times, the barcode had a scanning lifespan of one time, at checkout. Fortunately for us now, barcodes can be used in a much broader range of applications, including back-end processing like your Starbucks app.

We were doing these types of things at the Auto-ID Center, back in the time when 96k modems were all the rage. Believe it or not, there are probably several 96k modems and slightly faster modems still in use today. Typically the most remote facilities are on modems and not high-speed internet. With Wi-Fi in the home, and fixed and mobile broadband everywhere, it’s surprising that 96k modem connections are still out there. But they are. Connectivity, though, was not available all the time. This drove us to develop the EPC system around RFID (radio frequency identification) tags that carried the data instead of barcodes.

We made another observation while defining the protocols and developing tags. Tags generate a huge amount of data when RFID readers are reading them, and that data needs to be managed. In the early days we had middleware processes to enable management and filtering closer to the edge. The functionality closer to the edge would push the data into a back-end database. The edge functionality, our EPC servers, was there to capture, collate, and store the data. The result was a local server with information accessible from various applications. These applications could mine the data and then do something more intelligent with it, but also allow that information to be potentially discoverable from somewhere else, whether it was public information or private corporate data, in the US or anywhere in the world.
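To make the edge filtering described above concrete, here is a minimal sketch of one common filtering step: suppressing repeat reads of the same tag before anything is pushed to a back-end database. It is not the Auto-ID Center's middleware. The `TagRead` type, the `filter_reads` function, the reader name, and the 2-second window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRead:
    """One raw observation from a reader (hypothetical structure)."""
    tag_id: str
    reader_id: str
    timestamp: float  # seconds since some reference point


def filter_reads(reads, window=2.0):
    """Drop repeat reads of a tag at the same reader within `window` seconds.

    Assumes `reads` is ordered by timestamp. Returns only the reads
    worth forwarding to the back end.
    """
    last_forwarded = {}  # (tag_id, reader_id) -> timestamp of last forwarded read
    kept = []
    for read in reads:
        key = (read.tag_id, read.reader_id)
        previous = last_forwarded.get(key)
        if previous is None or read.timestamp - previous >= window:
            kept.append(read)
            last_forwarded[key] = read.timestamp
    return kept


raw = [
    TagRead("tag-A", "dock-1", 0.0),
    TagRead("tag-A", "dock-1", 0.5),  # repeat inside the window, suppressed
    TagRead("tag-B", "dock-1", 0.7),
    TagRead("tag-A", "dock-1", 1.9),  # still inside the window, suppressed
    TagRead("tag-A", "dock-1", 2.5),  # window elapsed, forwarded again
]
forwarded = filter_reads(raw)
```

A reader can report the same tag many times per second, so even a simple filter like this one cuts the volume sent upstream considerably.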

That was really the whole system. The EPC is just a unique identifier, which is really a serialized GTIN (Global Trade Item Number).

The number on a barcode today is a GTIN, identifying the class of product. The serialized GTIN adds a serial number that identifies the instance of that product. It's unique to that particular thing in the world. The EPC was just a way to store that information in a common form factor, whether it was digital, human readable, or machine readable. SGTINs allow you to uniquely capture an item as it travels through the supply chain.
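The class-versus-instance distinction can be sketched in a few lines. The check-digit routine below follows the standard GS1 rule (alternating weights of 3 and 1, starting from the digit next to the check digit). The `SerializedGtin` class, the sample GTIN, and the serial numbers are hypothetical and used only for illustration; this is not the EPC tag encoding itself.

```python
from dataclasses import dataclass


def gs1_check_digit(body: str) -> int:
    """Compute the GS1 check digit for a GTIN without its last digit."""
    total = 0
    for i, ch in enumerate(reversed(body)):
        weight = 3 if i % 2 == 0 else 1  # rightmost body digit gets weight 3
        total += int(ch) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(gtin: str) -> bool:
    """True if `gtin` has a legal length and a correct check digit."""
    return (
        gtin.isdigit()
        and len(gtin) in (8, 12, 13, 14)
        and gs1_check_digit(gtin[:-1]) == int(gtin[-1])
    )


@dataclass(frozen=True)
class SerializedGtin:
    """A GTIN (product class) plus a serial number (one specific item)."""
    gtin: str
    serial: str

    def __post_init__(self):
        if not is_valid_gtin(self.gtin):
            raise ValueError(f"invalid GTIN: {self.gtin}")


# Two items of the same product class, distinguished only by serial number.
item_a = SerializedGtin("4006381333931", "0001")
item_b = SerializedGtin("4006381333931", "0002")
same_class = item_a.gtin == item_b.gtin
same_item = item_a == item_b
```

Here `same_class` is true while `same_item` is false: both records describe the same kind of product, but each refers to a different physical object.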

TB: OK, so the first step in merging the physical world with its virtual representation is the serialized GTIN. Was that the only step that needed to occur?

DE: The third part is the financial flows. When you merge the physical and virtual and truly see where things are going, track where they're going, manage where they're going and actually automate a lot of that, you can start automating financial flows, like auditing, controls, and billing and receiving.

TB: Makes a lot of sense. We can track almost anything through the supply chain on a discrete basis. I understand that link to IoT.

DE: Yes, but when you start thinking about tagging the world, there is a lot of data, which creates a lot of questions. Who gets to see that data? How is it viewed? What can be done with it? Who owns it? That is, the data may be about you, but it isn't yours. Or is it owned by the guy that read it? Or is it somewhere in between?

Now you have a huge security problem. How do you protect the data so that only authorized people can read it? How do you ensure that you have some form of privacy? If somebody reads it, how do you protect that information? The good thing about RFID is that it's got a limited amount of data. The bad thing about RFID is that it's got uniquely identifying data even if it just has an identifier. If you carry multiple tags on you, even if they don't have a unique identifier, you now have a digital fingerprint from the combination of tags, like Nike shoes and Levi pants and maybe a designer wallet. We're able to now uniquely identify you, not because you carry a unique identifier, but because you've got this constellation of tags. Then consider one of the benefits of RFID, which is that you don't need line of sight. You can read it at three or four meters. If you want to go to illegal power levels, you can read it at much longer range, or you can stay within legal power levels if the tag is designed for longer range. Your toll tag, which incorporates the EPC protocol we developed, can be read at speed with a standard legal reader from well over 50 feet away.
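The "constellation of tags" point can be illustrated with a toy example. In this sketch every tag names only a product class, and every class is carried by several people, yet each person's combination is still unique. The people, tag names, and population size are invented for illustration.

```python
from collections import Counter

# Hypothetical class-level tag readings. Each value names a product
# class, not an individual item, so no single tag identifies anyone.
people = {
    "person_1": frozenset({"shoe-model-X", "jeans-model-Y", "wallet-model-Z"}),
    "person_2": frozenset({"shoe-model-X", "jeans-model-Y"}),
    "person_3": frozenset({"shoe-model-X", "wallet-model-Z"}),
    "person_4": frozenset({"jeans-model-Y", "wallet-model-Z"}),
}

# How many people carry each individual tag class?
tag_counts = Counter(tag for tags in people.values() for tag in tags)

# How many people carry each exact combination of tag classes?
constellation_counts = Counter(people.values())

every_tag_is_shared = all(n > 1 for n in tag_counts.values())
every_constellation_is_unique = all(n == 1 for n in constellation_counts.values())
```

Both flags come out true for this toy population: no individual tag singles anyone out, but the set of tags a person carries does.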

You now have tags on you that can then be read and used by nefarious people, which really got me into security for these very resource-constrained devices. Revere Security is a startup company that I joined as CTO, where we continued to develop protocols for RFID tags and protocols for SCADA devices, which was our original impetus. We also created a cryptographic cipher algorithm which to our knowledge is yet to be broken, even though we had lots of people really look hard at it. It's extremely lightweight, extremely durable, extremely good. But if you're going to use anything today, I would recommend AES or public key cryptography unless you have special requirements.

I've been doing research and work at the intersection of IoT, cybersecurity and Data Science for decades even though on the surface someone may say “you're doing antenna design and you're doing cybersecurity, and you're doing data science, they have nothing to do with one another.” Except in the world of IoT and cybersecurity and Data Science, all of them are merged together into one because they all are related. How do you move Big Data? You can't just analyze it all in one place because no computer has enough memory for terabytes of data. You’ve got to do it some other way. How do you partition that up? How do you then allow that to be analyzed? That's all part of data engineering. Based on what you know, how do you engineer the infrastructure?

Data science is really a broad field that encompasses your traditional data analytics database management, including all of your data engineering work that you would typically see, like database, data structures, data management, and how to analyze the data. If you're managing the database, setting post data structures, how do you organize your queries into it? The other piece is how do you manage it? So we get into this big data concept. When I started at MIT, the reason we needed the filters in the middle to do something intelligent closer to the edge was because we couldn't handle all the data. We didn't have databases or servers big enough. Today, it is called cloud computing. You don't have to filter everything on the edge, although we do for certain things. When you start thinking about that data science piece you have to think about how do you keep the data clean? Because if you get dirty data, any analysis of dirty data gives you bad results.

TB: Where do analytics fit in?

DE: That's all part of data science. Data science and AI, which is the big buzz word, are just applied statistics and applied data and applied computer science to data. 80% of what you need to know can be obtained through simple basic statistics. You can figure it out and do a lot with that. As you get more complex insight requirements, we start applying the more advanced techniques like neural networks and deep learning approaches, where advanced analysis comes in. It's typically referred to as AI. Today it starts becoming interesting and useful because now you're doing all these insights and you're starting to automate a tremendous amount of decision making.

TB: How does cybersecurity fit in?

DE: Cybersecurity is very much different from Data Science. There's a lot of different types of cybersecurity, different mechanisms you can use. But they all try to do the same thing. In the computer networking world, and the database world, the operating system world, the application world, cybersecurity is all about providing confidentiality, integrity and authentication. The authentication includes authorization. You might be authenticated to do something or access an area, but not authorized to enter a particular room in the area.
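The difference between being authenticated and being authorized can be sketched as two separate checks. Everything named here (`BADGE_SECRETS`, `ROOM_ACCESS`, the users, the rooms) is hypothetical, and a real system would store salted password hashes rather than secrets in code.

```python
import hmac

# Hypothetical credential store and access list, for illustration only.
BADGE_SECRETS = {"alice": "s3cret-a", "bob": "s3cret-b"}
ROOM_ACCESS = {"lobby": {"alice", "bob"}, "server_room": {"alice"}}


def authenticate(user: str, secret: str) -> bool:
    """Authentication: is this really the user they claim to be?"""
    expected = BADGE_SECRETS.get(user)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), secret.encode())


def authorize(user: str, room: str) -> bool:
    """Authorization: is this user allowed into this particular room?"""
    return user in ROOM_ACCESS.get(room, set())


def may_enter(user: str, secret: str, room: str) -> bool:
    """Both checks must pass before the door opens."""
    return authenticate(user, secret) and authorize(user, room)


bob_lobby = may_enter("bob", "s3cret-b", "lobby")
bob_server_room = may_enter("bob", "s3cret-b", "server_room")
impostor = may_enter("alice", "wrong-secret", "server_room")
```

Bob proves who he is in both attempts, but only the lobby is on his access list, so `bob_server_room` is false. The impostor fails earlier, at authentication, even though Alice herself would be authorized.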

In cybersecurity, while the mechanisms are very much the same across the spectrum, you have different algorithms and different protocols. You've got different objectives. Attacks are all pretty much well known. The NIST Framework (National Institute of Standards and Technology “Framework for Improving Critical Infrastructure Cybersecurity”) will tell you all kinds of attacks and defenses against the various attacks. In cybersecurity the human is the weakest link. Consider phishing emails. We do an OK job of identifying them, but they’re still arriving in the inbox. People still click on them, causing malware to be downloaded into the whole security infrastructure. How do you apply the cybersecurity mechanisms and security framework? It's fundamentally flawed in most organizations because of the way that operating systems and applications and networks were initially developed. Cybersecurity was in most cases a complete afterthought.

TB: Why do you say that?

DE: Think about how computer networks have evolved over time. Originally it was a big mainframe sitting in a huge building, the ENIAC, as an example, one of the first electronic digital computers. As we moved on to mainframes, it was okay because input and output required physical punch cards to feed and receive. That
