Is it possible for organizations to use machine learning and big data technologies without compromising the privacy of users? How can companies derive insights from their data while guarding against data breaches and fraudulent attacks in cloud computation? The answer may be confidential computing, an emerging technology that protects data even while it is in use. This technique makes it possible for organizations to securely share, pool and process data in the cloud without exposing any of it to the outside world.
The data lifecycle
With the rise of new data-driven applications, companies are collecting and storing amounts of data that would have seemed impossible a decade ago. Much of this data is privacy-sensitive, so it is crucial to protect it against attacks and breaches. A data breach can also violate the GDPR in countries where it applies, so a system architect's most important task is often to identify sensitive data and determine how best to protect it. Data therefore needs to be protected throughout its lifecycle: at rest, in transit and in use.
Data at rest is inactive data stored in digital form: in databases, data lakes, or other types of storage technology. It is typically protected using techniques such as tokenization, encryption, and access control, so that even if the storage itself is compromised, the contents remain unreadable.
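As a rough illustration of the tokenization technique mentioned above, here is a minimal sketch in Python. The `TokenVault` class is a hypothetical helper invented for this example (it is not any product's API): sensitive values are swapped for random tokens, and the real values live only in a separate, access-controlled vault.

```python
# Minimal tokenization sketch. Records that leave the vault contain
# only opaque tokens; the sensitive originals stay behind.
import secrets


class TokenVault:
    def __init__(self):
        self._vault = {}  # token -> original sensitive value

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(8)  # 16 hex chars, random
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In a real system, access to this lookup would be tightly controlled.
        return self._vault[token]


vault = TokenVault()
record = {"name": "Alice", "card": vault.tokenize("4111-1111-1111-1111")}
# `record` can now be stored or passed around: the card number is not in it,
# and the token is useless without access to the vault.
```

A real deployment would back the vault with a hardened service rather than an in-memory dictionary, but the principle is the same.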
Data in transit is any data moving through the network between applications, servers, or clients. It is protected from unauthorized access and tampering using the TLS protocol (the successor to SSL).
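In code, protecting data in transit usually amounts to a few lines. A sketch using Python's standard `ssl` module is below; the `fetch_over_tls` helper, the raw request bytes, and port 443 are illustrative assumptions, not part of any system described in this article.

```python
# Sketch: protecting data in transit with TLS via Python's standard library.
# A default client context verifies the server's certificate and hostname
# before any application data is exchanged.
import socket
import ssl

context = ssl.create_default_context()  # secure defaults: cert + hostname checks


def fetch_over_tls(host: str, request: bytes) -> bytes:
    with socket.create_connection((host, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)  # encrypted on the wire
            return tls.recv(4096)
```

Note that TLS protects the data only while it travels; once it arrives and is decrypted for processing, the protections discussed next become relevant.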
But what about when data is being computed on, i.e. when it is in use? To run any sort of analysis, the data must first be decrypted into clear text.
The problem—and solution—to protecting data in use
Organizations often need to perform operations on data in use, such as search, query, analysis, and machine learning. However, to do this, the encrypted data from databases must be decrypted into clear text before it can be used for any sort of computation.
Once decrypted, this clear text data gets exposed to the underlying operating system and the host machine, meaning that any malware application running on the host machine could dump the memory contents and steal sensitive information. So even if your data remains encrypted in storage, it becomes vulnerable to exposure in memory during computation.
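The exposure described above can be sketched in a few lines. In this toy example, XOR with a random keystream stands in for a real cipher (it is illustrative only, not secure encryption); the point is that the moment we decrypt in order to compute, the plaintext sits in ordinary process memory.

```python
# Illustrative sketch: data encrypted at rest must be decrypted into
# ordinary process memory before it can be analysed.
import secrets


def xor(data: bytes, key: bytes) -> bytes:
    # XOR keystream: stands in for a real cipher, NOT secure encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = secrets.token_bytes(16)
ciphertext = xor(b"card=4111111111111111", key)  # "at rest": opaque bytes

# To compute on it -- e.g. to check the card prefix -- we must decrypt:
plaintext = xor(ciphertext, key)
assert plaintext.startswith(b"card=4111")

# At this point the clear text lives in the process's address space.
# Any code in the same process, or a debugger or memory dump on the
# host, can read it -- encryption at rest no longer helps.
```

Confidential computing addresses exactly this window of exposure, as described below.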
When these types of computation are hosted on the cloud, this becomes an even bigger risk. Your data is exposed to the vulnerabilities of the host operating system, hypervisor, hardware, and the cloud provider’s orchestration system. As a result, companies dealing with highly sensitive user data such as credit card details, user information, and KYC documents are often reluctant to host computation on the cloud.
Fortunately, confidential computing solves this problem. This emerging technology isolates sensitive data in a secure enclave, or Trusted Execution Environment (TEE), during processing. The contents of the enclave, the data being processed, and the techniques used for computation are accessible only to authorized code and invisible to all external parties, including the cloud provider. This enables organizations to share, pool and process sensitive datasets in the cloud, safe in the knowledge that the data won't fall into the wrong hands.
R3’s new Conclave platform makes it easy to perform confidential computing, whether on your own machine or in the cloud. The only requirement is a CPU that supports Intel SGX enclaves.
With Conclave, you can build applications that securely pool and process data from multiple parties. Conclave-powered solutions are so secure that no one sees the source data without permission—not even the cloud provider. You can see examples of applications you could build with Conclave, as well as tutorials, on our docsite.