Paper 01 source code attestation network a decentralized digital identifier for your source code on by the tech start up channel show

Source Code Attestation Network: A Decentralized Digital Identifier for your source code on a Blockchain Network

Vinod Panicker 1

1 Wipro Technologies vinod.panicker@wipro.com

Abstract. Usage of open source code is at an all-time high. However, the development of code audit systems has not kept pace with the growth in open source adoption. Existing source code auditing systems are centralized and expensive to maintain. Source Code Attestation Network (SCAN) is a proposed code attestation network for self-attesting open source code. SCAN uses Decentralized Digital Identity [1] to determine the provenance of source code. SCAN nodes can be set up either internally within an organization or across multiple organizations and working groups, enabling them to collaborate and help validate claims made about the ownership, origin and license of source code.

Keywords: Blockchain, Open Source, Source Code Attestation Network

Introduction

Sixty-five percent of companies surveyed [2] report that they contribute to open source projects, with fifty-nine percent doing so to gain a competitive edge. Given the amount of open source code generated every day, regular source code audits become essential. Although open source is good for the larger ecosystem, organizations spend a large amount of time auditing the code they develop. Code audit systems are time-consuming, and building one from the ground up is impractical for many organizations. Centralized code audit systems are also not completely transparent, so organizations do not want to rely solely on them. Lack of trust and the competitive nature of the industry prevent organizations from freely sharing details about their source code. Source Code Attestation Network (SCAN) presents an approach to overcome some of these challenges by sharing source code related information through a process of source code self-attestation. SCAN will help determine source code provenance in a transparent manner.

Source Code Attestation Network (SCAN)

2.1 What is a Source Code Attestation Network?

A Source Code Attestation Network (SCAN) is a public permissioned blockchain network that helps validate verifiable claims about the origin, license and usage of source code. SCAN allows organizations and individuals to self-verify their code without having to disclose any other specific details about it.

How does SCAN work?

At the heart of SCAN is a distributed ledger. The ledger has two types of nodes: validator nodes and observer nodes. The nodes are responsible for validating claims about source code origin, usage and license, which are submitted by agents that participants trigger on the SCAN network.

Validator Nodes: write to the ledger and create DID [1] Descriptor Objects (DDOs) [5] for Source Code Units.

Observer Nodes: publish notifications of changes to the DDOs [5] of Source Code Units.
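The division of labour between the two node types can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary in place of the distributed ledger, a simplified DDO with a flat attribute map, and a callback (observer pattern) for notifications; the classes `Ledger`, `ValidatorNode`, `ObserverNode` and the `did:example:` identifier are hypothetical and not part of any SCAN or Hyperledger Indy API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DDO:
    """Hypothetical, minimal DID Descriptor Object for one Source Code Unit."""
    did: str
    attributes: Dict[str, str] = field(default_factory=dict)


class Ledger:
    """Toy in-memory ledger standing in for the distributed ledger."""

    def __init__(self) -> None:
        self._ddos: Dict[str, DDO] = {}
        self._observers: List[Callable[[DDO], None]] = []

    def subscribe(self, callback: Callable[[DDO], None]) -> None:
        # Observer nodes register a callback to hear about DDO changes.
        self._observers.append(callback)

    def write(self, ddo: DDO) -> None:
        # Store (or overwrite) the DDO, then notify every observer.
        self._ddos[ddo.did] = ddo
        for callback in self._observers:
            callback(ddo)

    def get(self, did: str) -> Optional[DDO]:
        return self._ddos.get(did)


class ValidatorNode:
    """Only validators write; consensus is assumed to have already passed."""

    def __init__(self, ledger: Ledger) -> None:
        self.ledger = ledger

    def record_scu(self, did: str, attributes: Dict[str, str]) -> DDO:
        ddo = DDO(did, dict(attributes))
        self.ledger.write(ddo)
        return ddo


class ObserverNode:
    """Collects change notifications; a real node would forward them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: List[str] = []

    def on_change(self, ddo: DDO) -> None:
        self.seen.append(ddo.did)


ledger = Ledger()
observer = ObserverNode("observer-1")
ledger.subscribe(observer.on_change)
ValidatorNode(ledger).record_scu("did:example:scu-1", {"has-duplicate": "false"})
print(observer.seen)  # ['did:example:scu-1']
```

Keeping writes behind the validator and reads/notifications on the observer side mirrors the split described above; a real deployment would replace the dictionary with ledger transactions.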

SCAN Agents: off-chain components that interact with the participants' systems. SCAN agents can also share information about source code units on the ledger.

Peer-to-Peer Attestation: uses special elevated powers to attest any source code unit (SCU) when a verifiable claim is made.

3.1 Source Code Units

Source code pushed to code repositories is broken down into smaller, uniquely identifiable units called Source Code Units (SCUs). Agents generate an Encrypted Decentralized Identifier (DID) for each SCU. Validator nodes write SCU DIDs to the ledger after executing the consensus algorithm. The participant ID and the SCU's DID are used to reference claims and proofs in SCAN. The signature of a source code unit has the following attributes: unit ID, commit ID, organization ID, public repository ID and transaction ID. Additional attributes recorded on the ledger for an SCU DID are “has-duplicate”, “has-valid-license” and “has-usage”.

3.2 Generating Digital ID of Source Code Units

Source Code Unit ID (unit ID) generation is an off-ledger process that uses an intermediate Abstract Syntax Tree (AST) of the code. The SCU's Digital ID is obtained from a hash function that takes the SCU ID, organization ID, commit ID and public repository ID as input.

3.3 Granularity of source code for Digital ID

The SCAN agents that generate Digital IDs take the granularity of the SCU into consideration. The granularity of a source code unit (SCU) is determined by a seed value set in the genesis block of the ledger. The SCU seed value is the minimum computed size of executable bytecode.

3.4 De-duplication of Digital ID

The SCAN connector will have basic code de-duplication capabilities. When the off-ledger de-duplication process executed by the SCAN agents identifies duplicate code, it adds attributes to the DID of the SCU. Validator nodes use the “has-duplicate” attribute as proof when validating duplicate code claims.
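The off-ledger steps of sections 3.2 to 3.4 (granularity, unit ID from an AST, Digital ID from a hash, duplicate flagging) can be sketched together for Python sources. This is an illustration under several assumptions not stated in the paper: each top-level function is treated as a candidate SCU, the seed is compared against the length of CPython bytecode (`co_code`), the unit ID is a SHA-256 of `ast.dump` with the function name blanked, and the Digital ID is a SHA-256 over the four inputs joined with `|`. The helper names are hypothetical.

```python
import ast
import copy
import hashlib
import types
from collections import defaultdict
from typing import Dict, List


def split_units(source: str, seed: int) -> Dict[str, ast.AST]:
    """Treat each top-level function as a candidate SCU and keep only those
    whose compiled bytecode is at least `seed` bytes (assumed granularity rule)."""
    tree = ast.parse(source)
    module_code = compile(tree, "<scu>", "exec")
    sizes = {c.co_name: len(c.co_code)
             for c in module_code.co_consts if isinstance(c, types.CodeType)}
    return {node.name: node for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and sizes.get(node.name, 0) >= seed}


def unit_id(node: ast.AST) -> str:
    """Hash a normalized AST dump. ast.dump omits line/column info by default
    and the AST holds no comments, so layout changes do not alter the ID.
    The function name is blanked so renamed copies still match (an assumption)."""
    clone = copy.deepcopy(node)
    clone.name = "_"
    return hashlib.sha256(ast.dump(clone).encode()).hexdigest()


def digital_id(scu_id: str, org_id: str, commit_id: str, repo_id: str) -> str:
    """Digital ID = hash(SCU ID, organization ID, commit ID, public repository ID).
    The '|' separator and SHA-256 are illustrative choices."""
    payload = "|".join([scu_id, org_id, commit_id, repo_id])
    return hashlib.sha256(payload.encode()).hexdigest()


def build_records(source: str, seed: int, org_id: str,
                  commit_id: str, repo_id: str) -> List[dict]:
    records = []
    for name, node in split_units(source, seed).items():
        uid = unit_id(node)
        records.append({"name": name, "unit_id": uid,
                        "did": digital_id(uid, org_id, commit_id, repo_id)})
    return records


def flag_duplicates(records: List[dict]) -> None:
    """Set 'has-duplicate' on every record whose unit ID occurs more than once."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["unit_id"]].append(rec)
    for group in groups.values():
        for rec in group:
            rec["has-duplicate"] = len(group) > 1


src_a = "def add(a, b):\n    return a + b\n\ndef noop():\n    pass\n"
src_b = "def plus(a, b):  # same logic, new name and spacing\n    return a+b\n"
records = (build_records(src_a, 0, "org-1", "c1", "repo-a")
           + build_records(src_b, 0, "org-2", "c9", "repo-b"))
flag_duplicates(records)
print({r["name"]: r["has-duplicate"] for r in records})
# {'add': True, 'noop': False, 'plus': True}
```

Because the organization, commit and repository are part of the hash, `add` and `plus` share a unit ID (and are flagged as duplicates) while still receiving distinct Digital IDs.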

3.5 Consensus Algorithms

Proof of Code Commit (POCC) is the consensus algorithm employed by the validator nodes. The validator nodes also validate whether the source code unit is still active. The last verified active date is included as a verified attribute on the SCAN ledger. Other possible consensus algorithms are Proof of Open Source Usage (POU) and Proof of Applied Open Source License (POAL). For POU, the validator nodes are configured to validate the de-duplication attribute in the ledger; for POAL, they are configured to validate the license attribute.
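The paper does not specify the POCC voting rule, so the following Python sketch only illustrates one plausible round under stated assumptions: each validator independently checks whether the claimed commit is visible in its view of the public repository, the claim passes when at least a 2/3 share of validators agree, and an accepted claim records the last verified active date. The function name, the quorum value and the result fields are hypothetical.

```python
from datetime import date
from fractions import Fraction
from typing import Iterable, Optional, Set


def proof_of_code_commit(commit_id: str,
                         validator_views: Iterable[Set[str]],
                         quorum: Fraction = Fraction(2, 3),
                         today: Optional[date] = None) -> dict:
    """Hypothetical POCC round.

    Each element of `validator_views` is the set of commit IDs one validator can
    see in the public repository. The claim is accepted when the share of
    validators that can see the commit reaches `quorum`; an accepted claim also
    records the last-verified-active date as a ledger attribute.
    """
    views = list(validator_views)
    votes = sum(1 for view in views if commit_id in view)
    # An empty validator set can never accept a claim.
    accepted = bool(views) and Fraction(votes, len(views)) >= quorum
    result = {"accepted": accepted, "votes": votes, "validators": len(views)}
    if accepted:
        result["last-verified-active"] = (today or date.today()).isoformat()
    return result


views = [{"c1", "c2"}, {"c1"}, {"c2"}]
print(proof_of_code_commit("c1", views, today=date(2024, 1, 1)))
# {'accepted': True, 'votes': 2, 'validators': 3, 'last-verified-active': '2024-01-01'}
```

Using `Fraction` keeps the quorum comparison exact, so a 2-of-3 vote meets a 2/3 threshold without floating-point rounding concerns. POU and POAL could reuse the same round with a different per-validator check (the de-duplication or license attribute).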

Rolling out SCAN for open source code

An organization can choose to deploy SCAN inside its own boundaries, or make it available to the world by becoming a contributing organization to SCAN. In this paper, we focus on operating SCAN as a public permissioned ledger that can give a worldview of open source code in public repositories. Contributing organizations pledge validator and observer nodes in return for access to SCAN.

Bootstrapping a SCAN

The founding organizations of SCAN will operate the ledger and provide the operational support required by the newly formed network. Setting up SCAN is an automated process with the following stages:

- Setting up validator nodes and observer nodes in the blockchain ledger.
- Uploading one-time breeder data to the ledger: the most frequently used SCU DIDs from popular open source repositories.
- Initializing the SCAN agents for participants and Trust Anchors.
- Enabling the Peer-to-Peer Attestation agents.

Once SCAN is up and running, participants can start making claims on SCUs, process verifiable claim requests, and share proof of commit, usage and license for any SCU in the ledger.
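Because the bootstrap stages build on each other, an automated setup can enforce their order. The Python sketch below assumes that each stage runs exactly once, strictly in the listed order, and that claims are accepted only after the last stage; the `Stage` and `Bootstrap` names are hypothetical.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    NODES = 1            # set up validator and observer nodes
    BREEDER_DATA = 2     # upload frequently used SCU DIDs from popular repositories
    AGENTS = 3           # initialize SCAN agents for participants and Trust Anchors
    P2P_ATTESTATION = 4  # enable Peer-to-Peer Attestation agents


class Bootstrap:
    def __init__(self) -> None:
        self.completed: List[Stage] = []

    def run(self, stage: Stage) -> None:
        # Stages must run strictly in order, each exactly once.
        if len(self.completed) == len(Stage):
            raise RuntimeError("bootstrap already complete")
        expected = Stage(len(self.completed) + 1)
        if stage != expected:
            raise RuntimeError(f"expected {expected.name}, got {stage.name}")
        self.completed.append(stage)

    @property
    def accepting_claims(self) -> bool:
        # Claims are only accepted once every stage has finished.
        return len(self.completed) == len(Stage)


network = Bootstrap()
for stage in Stage:
    network.run(stage)
print(network.accepting_claims)  # True
```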

Expanding and Sustaining SCAN

Enrolling more contributing organizations, Trust Anchors and participants helps expand the SCAN network. Founding organizations can vote to accept organizations as Trust Anchors. Trust Anchors have elevated powers to add other Trust Anchors and participants.
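The enrolment rules above can be sketched as a small permission model in Python. The paper does not state a voting threshold, so this sketch assumes a simple majority of founding organizations and ignores votes from non-founders; the `Governance` class and its methods are hypothetical.

```python
from typing import Iterable, Set


class Governance:
    def __init__(self, founders: Iterable[str]) -> None:
        self.founders: Set[str] = set(founders)
        self.trust_anchors: Set[str] = set()
        self.participants: Set[str] = set()

    def vote_trust_anchor(self, candidate: str, approvals: Iterable[str]) -> bool:
        # Only votes cast by founding organizations count (simple majority assumed).
        valid = set(approvals) & self.founders
        if 2 * len(valid) > len(self.founders):
            self.trust_anchors.add(candidate)
            return True
        return False

    def add_by_anchor(self, anchor: str, member: str,
                      as_trust_anchor: bool = False) -> None:
        # Elevated power: only an existing Trust Anchor may add members.
        if anchor not in self.trust_anchors:
            raise PermissionError(f"{anchor} is not a Trust Anchor")
        target = self.trust_anchors if as_trust_anchor else self.participants
        target.add(member)


gov = Governance(["founder-a", "founder-b", "founder-c"])
gov.vote_trust_anchor("org-x", ["founder-a", "founder-b"])
gov.add_by_anchor("org-x", "participant-1")
print(sorted(gov.trust_anchors), sorted(gov.participants))
# ['org-x'] ['participant-1']
```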

Community driven model to govern SCAN

Forming a SCAN foundation to shepherd SCAN is critical. The foundation's members will comprise all the initiating organizations. Members will hold positions in the foundation on a rotational basis and commit to operating validator nodes and SCAN agents for the community. Organizations that are part of the SCAN foundation are responsible for nominating Trust Anchors. They will help keep SCAN operational and build an active SCAN community with enough observer and validator nodes running at all times. Eventually, SCAN will require fewer sponsored nodes as the number of participating nodes increases. The foundation can organize hackathons to seed the ledger with frequently used SCU DIDs for open source projects, improving the efficiency of SCAN, and can organize de-duplication challenges for active public repositories.

Earning Credibility on SCAN

8.1 Participating in SCAN

Organizations, working groups and individuals earn credibility by participating and remaining active in SCAN. A polling mechanism rewards long-running trustee nodes by allocating them more credibility points on SCAN. The mechanism also ensures that a participant scans a predetermined number of SCUs for at least two other organizations before it earns enough runtime credits to start receiving proofs for its own SCU claims. The SCAN Ledger Dashboard will call out organizations, participants and their contributions verified by SCAN.
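A minimal Python sketch of the two rules in this paragraph follows. It assumes that each polling round awards one point to every node still online (so uptime accumulates into credibility) and reads "a predetermined number of SCUs for at least two other organizations" as a total count spread over at least two distinct organizations; both readings and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def poll(credibility: Dict[str, int], online_nodes: Iterable[str]) -> None:
    """One polling round: every node still online earns one credibility point,
    so long-running nodes accumulate more points over time."""
    for node in online_nodes:
        credibility[node] = credibility.get(node, 0) + 1


def eligible_for_proofs(own_org: str, scanned_orgs: Iterable[str],
                        min_scans: int) -> bool:
    """`scanned_orgs` has one entry per SCU the participant scanned, naming the
    organization that owns it. Scans of the participant's own SCUs do not count."""
    others = Counter(org for org in scanned_orgs if org != own_org)
    return sum(others.values()) >= min_scans and len(others) >= 2


credibility: Dict[str, int] = {}
poll(credibility, ["node-1", "node-2"])
poll(credibility, ["node-1"])
print(credibility)  # {'node-1': 2, 'node-2': 1}
print(eligible_for_proofs("org-a", ["org-b", "org-b", "org-c"], min_scans=3))  # True
```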

Why would organizations volunteer to participate in SCAN?

The key benefits to organizations participating in SCAN are:

- Helps organizations work effectively with open source and public code repositories. SCAN offers a simple mechanism for organizations to disclose their open source contributions and copyright without any intermediary organization.
- Helps establish provenance and precedence for code contributed to public repositories. Organizations can do preemptive maintenance of their code and protect their proprietary code against open source infringement. Organizations can also leverage the proofs available on SCAN to check for any unexpected open source code that may have entered their development cycle.
- Helps improve open source developer productivity.
- Focuses on privacy. SCAN has built-in support for zero-knowledge proofs (ZKP) to avoid unnecessary disclosure of source code identity attributes.

Working with Open Source and public code repositories

10.1 Using Public code repositories and collaboration platforms

SCAN agents run on a regular basis to validate selected repositories and update the DIDs of source code units on the ledger. Build systems are configured to trigger the SCAN agents. Agents that run successfully earn their owners a high credibility rating, which in turn gives them credits to process claims. Processing more claims translates into a higher credibility rating.
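The feedback loop between successful agent runs, credits and credibility can be sketched in Python. The amounts are assumptions chosen for illustration: each successful run adds one rating point and one credit, each processed claim consumes one credit and adds one rating point, and failed runs earn nothing. `AgentAccount` is a hypothetical name.

```python
class AgentAccount:
    """Credibility and credit balance for the owner of one SCAN agent."""

    def __init__(self) -> None:
        self.rating = 0
        self.credits = 0

    def record_run(self, succeeded: bool, credits_per_run: int = 1) -> None:
        # Only successful agent runs raise the rating and grant credits.
        if succeeded:
            self.rating += 1
            self.credits += credits_per_run

    def process_claim(self) -> None:
        # Processing a claim spends a credit and further raises the rating.
        if self.credits < 1:
            raise RuntimeError("insufficient credits to process a claim")
        self.credits -= 1
        self.rating += 1


account = AgentAccount()
account.record_run(succeeded=True)
account.process_claim()
print(account.rating, account.credits)  # 2 0
```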

How SCAN can help improve open source developer productivity

SCAN agents can trigger claims directly from nightly builds. This seamless integration with the Continuous Integration (CI) [6] process helps improve overall developer productivity.
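As an illustration of triggering claims from a nightly build, the Python sketch below turns the SCU DIDs produced by a build into claim requests, skipping SCUs that already have a claim on record. The request shape is an assumption: it carries the participant ID and SCU DID (which the paper uses to reference claims), the build's commit ID, and a claim type; `ClaimRequest` and `nightly_claims` are hypothetical and the JSON would be handed to whatever interface a SCAN agent exposes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class ClaimRequest:
    participant_id: str  # participant making the claim
    scu_did: str         # DID of the Source Code Unit
    commit_id: str       # commit produced by the nightly build
    claim_type: str      # "origin", "license" or "usage"


def nightly_claims(participant_id: str, commit_id: str,
                   scu_dids: Iterable[str], already_claimed: Iterable[str]) -> str:
    """Build one origin claim per new SCU DID from tonight's build, as JSON.
    SCU DIDs that already have a claim on record are skipped."""
    new_dids = sorted(set(scu_dids) - set(already_claimed))
    claims = [ClaimRequest(participant_id, did, commit_id, "origin")
              for did in new_dids]
    return json.dumps([asdict(c) for c in claims])


print(nightly_claims("participant-1", "c42", ["did-2", "did-1"], ["did-1"]))
```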

How SCAN avoids unnecessary disclosure

SCAN has built-in support for zero-knowledge proofs (ZKP) [9] to avoid unnecessary disclosure of identity attributes. The SCAN ledger will have an attribute that captures the original contributor's consent. Before sharing public references to an SCU as proof in response to a claim request, the validator node verifies the original contributor's consent on SCAN.
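The consent check performed by the validator node before sharing public references can be sketched in Python. This is a plain access-control gate, not a zero-knowledge proof; it assumes each SCU record carries a boolean consent attribute and a list of public references, and that without consent the reply confirms a match but withholds the references. `SCURecord` and `answer_claim` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SCURecord:
    did: str
    contributor_consent: bool
    public_references: List[str] = field(default_factory=list)


def answer_claim(ledger: Dict[str, SCURecord], did: str) -> dict:
    """Validator-side reply to a claim request (hypothetical shape).

    Public references are disclosed only when the original contributor has
    consented; otherwise the reply confirms a match without revealing them.
    """
    record = ledger.get(did)
    if record is None:
        return {"match": False, "references": []}
    if record.contributor_consent:
        return {"match": True, "references": list(record.public_references)}
    return {"match": True, "references": []}


ledger = {"did:example:1": SCURecord("did:example:1", False, ["https://example.org/repo"])}
print(answer_claim(ledger, "did:example:1"))  # {'match': True, 'references': []}
```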

Future work and Extension

This paper proposes setting up a prototype of a public SCAN network and onboarding founding organizations. In the prototyping phase, the supported repository format will be Git [4], and a basic de-duplication process using an intermediate AST is considered. The blockchain framework considered for the SCAN prototype is Hyperledger Indy [8]. Some possible future work and enhancements are:

- Support for other source code repository formats.
- An advanced de-duplication process.
- Support for other blockchain frameworks.

Consensus algorithms that could eventually be included in the blockchain framework layer are Proof of Open Source Usage (POU) and Proof of Applicable Open Source License (POSL). SCAN would become an open source project on successful completion of the prototype.

Summary

Open source is set to become the way the majority of code is developed. SCAN has the potential to disintermediate the code audit process. SCAN is a shot in the arm for transparent, self-attested open source code usage and development. It has the potential to give due credit to the original authors and contributors of software.

References

1) Decentralized Identifiers (DIDs), https://w3c-ccg.github.io/did-spec/
2) Open Source Survey, https://www.blackducksoftware.com/2016-future-of-open-source
3) Blockchain, http://fortune.com/2016/05/15/leaderless-blockchain-vc-fund/
4) Git, https://git-scm.com/
5) DID Descriptor Object (DDO), http://ldapwiki.com/wiki/DID%20descriptor%20objects
6) Continuous Integration, https://martinfowler.com/articles/continuousIntegration.html
7) Abstract Syntax Tree (AST) and clone detection, http://www.semanticdesigns.com/Company/Publications/ICSM98.pdf
8) Hyperledger Indy, https://www.hyperledger.org/projects/hyperledger-indy
9) Zero-knowledge proofs (ZKP), https://en.wikipedia.org/wiki/Zero-knowledge_proof