Over the past few days, we gave you a general overview of how Ethereum 2.0, or ETH 2.0, works and then showed you ETH 2.0 Staking and the Casper Protocol’s nuances. In this one, we are going to look into another massive feature of ETH 2.0 – Sharding.
Layer 1 vs Layer 2 Scalability
One common criticism of various cryptocurrency and altcoin systems concerns scalability. Put simply, if cryptocurrency and blockchain technology are going to drive the DeFi world of tomorrow, they need to be able to support billions of people. This is something our comprehensive DeFi guide covers in depth, and there are already many solutions. Scalability techniques mainly fall into two categories: layer 2 and layer 1.
Layer-2 scalability
These are off-chain scalability solutions built on top of the blockchain. The idea here is to leave the base layer alone and add extra architecture on top of it. This extra layer handles complex computations, which mitigates the architectural bottlenecks of the base layer. Raiden and Plasma are examples of layer-2 scalability, which we will explore in future articles.
Layer-1 scalability
Scalability techniques that are executed within the blockchain are called layer-1. Increasing the block size and Sharding are the two most well-known layer-1 scalability techniques.
Sharding database
Sharding was initially a technique used to partition bulky databases into more manageable chunks or shards horizontally. Look at this table:
So, do you see what happened here?
There is a large database with 6 rows. By breaking it down, we are converting it into three smaller shards of manageable sizes. This happens only via horizontal partitioning. To understand this, consider the following example.
Consider this table:
Let’s partition this table vertically:
See that? Because of the partition, the table turns into two completely different tables. Horizontal partitioning only changes the table into smaller tables with the same features.
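To make the difference concrete, here is a minimal Python sketch of both kinds of partitioning. The table contents (the `id`, `name`, and `city` columns and their values) are made up for illustration and are not the rows from the tables above; the helper names are hypothetical too.

```python
# Hypothetical six-row table; column names and values are illustrative only.
table = [
    {"id": 1, "name": "Alice", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Lima"},
    {"id": 3, "name": "Carol", "city": "Rome"},
    {"id": 4, "name": "Dave", "city": "Cairo"},
    {"id": 5, "name": "Erin", "city": "Quito"},
    {"id": 6, "name": "Frank", "city": "Seoul"},
]

def horizontal_partition(rows, num_shards):
    """Split the rows into contiguous shards; every shard keeps all columns."""
    size = -(-len(rows) // num_shards)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def vertical_partition(rows, key, left_columns):
    """Split the columns into two tables that share only the key column."""
    left = [{c: r[c] for c in (key, *left_columns)} for r in rows]
    right = [
        {c: v for c, v in r.items() if c == key or c not in left_columns}
        for r in rows
    ]
    return left, right

shards = horizontal_partition(table, 3)                   # three shards, two rows each
left, right = vertical_partition(table, "id", ["name"])   # two differently shaped tables
```

Each horizontal shard still has the `id`, `name`, and `city` columns, just fewer rows. The vertical split, on the other hand, produces one table with `id` and `name` and another with `id` and `city`.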
The same concept can be extended to the blockchain, wherein the chain’s state gets fragmented into smaller and more manageable chunks, called shards.
Why is Sharding used?
One of the biggest problems with cryptocurrencies, and the core reason behind the creation of Ethereum 2.0, is scalability. Ethereum can currently process fewer than 25 transactions per second, which is pretty abysmal. The reasons behind this slow speed are the proof-of-work (POW) consensus protocol and the inherent architectural design of these cryptocurrencies.
ETH 2.0: Sequential vs Parallel processes
The majority of the transactional operations that take place in cryptocurrencies are sequential in nature. Think about how a transaction works:
- The sender initiates the transaction by sending it to the receiver’s public address. They sign the transaction with their digital signature.
- The miners pick up the transaction, verify the signature and check whether the sender has enough balance to fulfill it or not.
- Following that, they add the transaction to their block.
- The block gets added to the blockchain, following which the transaction goes through.
As you can see, the whole process is extremely sequential. Every step depends on the proper fulfillment of the previous step. This problem is compounded further as the network increases in size.
This is why choosing a parallelized process can be a more viable alternative. Breaking up a blockchain state into several shards and processing them in parallel can, in essence, allow you to divide and conquer.
Imagine a network with three nodes: A, B, and C. In a sequential format, each of them would have to verify the whole dataset D individually. With Sharding, however, D would be broken down into three shards: D1, D2, and D3. Each node can take up an individual shard, and all three shards get processed at the same time. Even with just three shards, parallelizing can speed up proceedings quite dramatically.
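Here is a rough Python sketch of that three-node example. The dataset values and the `verify` check are placeholders invented for illustration, and the thread pool only demonstrates how the work is divided; it is not how Ethereum nodes actually run.

```python
from concurrent.futures import ThreadPoolExecutor

def verify(shard):
    """Stand-in for real verification: just check that every amount is positive."""
    return all(amount > 0 for amount in shard)

dataset = [5, 12, 7, 3, 9, 1]                          # D (made-up values)
shards = [dataset[0:2], dataset[2:4], dataset[4:6]]    # D1, D2, D3

# Sequential format: every node has to check all of D.
sequential_work_per_node = len(dataset)

# Sharded format: nodes A, B, and C each check one shard at the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(verify, shards))

sharded_work_per_node = max(len(s) for s in shards)
all_valid = all(results)
```

In this toy setup each node checks two items instead of six, which is where the "divide and conquer" gain comes from.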
However, let’s scale things up to Ethereum’s size. The network currently has more than 6,970 nodes. If optimally executed, the improvement in overall throughput will be immense. ETH 2.0 will eventually be divided into 1,024 shards, and it’s hoped that this will increase network throughput by more than 1,000x, at least in theory.
The cost of hosting a node
Up next, let’s look at another aspect of scalability. As you may already know, Ethereum is a peer-to-peer network. There are no centralized data-centers. The whole network depends on its nodes doing their jobs. In Ethereum, each individual node has the same power and privilege as its other peers. In Ethereum, you can either be a light client or a full node.
Light clients are nodes that download only a portion of the blockchain onto their system. This allows them to verify transaction execution without having to download and maintain the full blockchain.
A full node is any system connected to the main network that has fully downloaded and is regularly maintaining the blockchain. They are pretty much the backbone of the Ethereum network and fulfill the following roles:
- Mining blocks, or ensuring that the correct block reward is given out for each block mined.
- Executing all the consensus rules of the network completely.
- Making sure that the transactions have the right signatures and that the blocks are in the proper data format.
- Finally, and most crucially, making sure that double spending isn’t occurring in the network.
The catch is that Ethereum full nodes must download and maintain the whole blockchain at all times, and the Ethereum blockchain is enormous. It is fast approaching 1 TB in size, so it’s becoming increasingly difficult for regular nodes to store all of the data.
So, how is Sharding going to help here? As per the official Sharding FAQ on GitHub, the key idea is to allow Ethereum to process 10,000+ transactions per second without forcing every node to spend thousands of dollars on hardware. This is why Sharding is such a brilliant solution to this problem: the workload per node decreases significantly.
What is Sharding Ethereum?
Finally, let’s look at how Sharding works in ETH 2.0. The entire state of the Ethereum blockchain is called the “global state.” This state gets broken down into shards, and each of these shards has its own state. The shard states, the shard roots, and the global root together form a Merkle tree.
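So, let’s see what’s happening here. Every node in the tree branches off from one node in the level above: the global state splits into shards, and each shard has its own state. In Merkle-tree terms, the hashing runs the other way. Each parent’s hash is computed from the hashes of its children, so the global root commits to the state of every shard.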
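To see how a single root can commit to many shard states, here is a simplified Python sketch of a binary Merkle root. Treat it as an illustration under assumptions: it uses SHA-256 and a plain binary tree, whereas Ethereum itself relies on Keccak-256 and more elaborate trie structures, and the shard state strings are invented.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is a simplifying assumption for this sketch)."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash pairs of nodes level by level until a single root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Made-up shard states standing in for real shard state roots.
shard_states = [b"shard-0 state", b"shard-1 state", b"shard-2 state", b"shard-3 state"]
global_root = merkle_root(shard_states)

# Changing any single shard state changes the global root.
tampered = list(shard_states)
tampered[2] = b"shard-2 state (modified)"
tampered_root = merkle_root(tampered)
```

Because `tampered_root` differs from `global_root`, anyone holding the global root can detect a change in any shard without storing every shard's data.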
ETH 2.0 Sharding mechanics
When Sharding is activated, the following happens:
- The state gets split into shards.
- Each unique account belongs to one shard.
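As a tiny illustration of "one account, one shard", the Python sketch below assigns an address to a shard by taking it modulo the shard count. This mapping rule and the `shard_of` helper are purely hypothetical; the article does not say how ETH 2.0 actually assigns accounts to shards.

```python
NUM_SHARDS = 64  # illustrative shard count

def shard_of(address: str) -> int:
    """Map a hex account address to exactly one shard (hypothetical rule)."""
    return int(address, 16) % NUM_SHARDS

example_shard = shard_of("0x2b")  # 0x2b is 43 in decimal, so this lands in shard 43
```

Whatever the real rule turns out to be, the key property is the same: the mapping is deterministic, so a given account always lives in the same shard.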
To visualize how this works, let’s take Vitalik Buterin’s example from Devcon.
Imagine that Ethereum has been split into thousands of islands. Each island can do its own thing. Each of the islands has its own unique features, and everyone belonging to that island, i.e. the accounts, can interact with each other AND freely indulge in all its features. If they want to contact other islands, they will have to use some sort of protocol.
Vitalik Buterin
Ethereum 2.0 executes this by creating two levels of interaction.
The First Level
The first level in the shard interaction is the transaction group. Every shard will have its own unique transaction group.
This group is further subdivided into a transaction group header and body.
Transaction Group Header
The group header has a distinct left and right part.
The Left Part has the following components:
- The Shard ID that the transaction group belongs to, in this case “43.”
- The pre-state root, i.e., the state root of that particular shard before the transactions were put inside it.
- The post-state root, i.e., the state root of that shard after the transaction group has been put inside it.
- Finally, a receipt root that acknowledges that the transaction group has entered the root.
The Right Part of the header is a group of randomly chosen validators who verify the transactions inside the shard itself.
Level One Features
So far, we have seen the components that belong in level one. Let’s see how everything comes together:
- Every single transaction specifies the unique ID of the shard it belongs to.
- A transaction that belongs to a specific shard has occurred between two accounts in that shard.
- The level also shows state transition by specifying both the pre and post state root.
The Second Level
Now, let’s look at the second level of ETH 2.0’s Sharding. What you are looking at in the image above is a standard blockchain, but it has two roots instead of one:
- State root: Remember our Merkle tree diagram from before? The state root is the root node of the entire state.
- Transactions group root: The root node of all the transaction groups that are contained inside all the shards of that block.
Level Two Features
- This level acts as a simple blockchain which accepts transaction groups instead of transactions.
- A transaction group is only accepted if the pre-state root matches the shard root in the global state. Plus, every single signature in the transaction group needs to be validated.
- Once a transaction group enters the block, the root recorded for that shard ID in the global state becomes the group’s post-state root.
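The level-two rules can be sketched in a few lines of Python. This is a simplified model, not the actual ETH 2.0 data structures: the `TransactionGroupHeader` fields mirror the header components described earlier, the roots are plain strings, and `signatures_valid` is a stand-in for real signature checks.

```python
from dataclasses import dataclass

@dataclass
class TransactionGroupHeader:
    shard_id: int
    pre_state_root: str
    post_state_root: str
    receipt_root: str
    signatures_valid: bool  # stand-in for verifying the validators' signatures

def accept_group(global_state: dict, header: TransactionGroupHeader) -> bool:
    """Apply the level-two rules; return True if the group was accepted."""
    # Rule 1: the pre-state root must match the shard's root in the global state.
    if global_state.get(header.shard_id) != header.pre_state_root:
        return False
    # Rule 2: every signature in the group must be valid.
    if not header.signatures_valid:
        return False
    # Rule 3: the shard's root in the global state becomes the post-state root.
    global_state[header.shard_id] = header.post_state_root
    return True

global_state = {43: "root-a"}  # shard ID -> current shard root (made-up values)
good = TransactionGroupHeader(43, "root-a", "root-b", "receipts-1", True)
stale = TransactionGroupHeader(43, "root-a", "root-c", "receipts-2", True)

accepted_first = accept_group(global_state, good)     # accepted, root moves to "root-b"
accepted_second = accept_group(global_state, stale)   # rejected, pre-state root is outdated
```

The second group is rejected because it was built on a shard root that is no longer current, which is how this level keeps each shard's history consistent.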
Cross-shard communication
Alright, so now you know how the shards individually work and what they are made up of. However, the last thing Ethereum needs is for these shards to become individual silos of their own. There must be a method with which these shards could effectively talk to one another.
To paint a clearer picture, let’s bring back Buterin’s island analogy. If the islands have to thrive, they need to interact effectively with each other using a particular protocol. Plus, to reduce communication overload and expenses, the islands have to figure out a way to communicate only when needed.
The same principle is true for shard communication. Ethereum developers needed to answer certain questions to ensure effective cross-shard communication:
- How can the shards communicate with each other while providing the same security expected by the Ethereum network?
- How can Sharding deliver the massive scalability expected without compromising on security?
ETH 2.0’s cross-shard communication
ETH 2.0’s cross-shard communication protocol of choice is the “receipt paradigm.”
- As we have explained above, every single transaction in the set generates a receipt in the shard.
- The ETH 2.0 beacon chain will have a distributed shared memory where these receipts will be stored.
- The other shards can see the receipts inside the beacon chain. Due to the blockchain’s immutable property, they won’t be able to tamper with them.
- Hence, shards will be able to benefit from each other without affecting finality.
The two biggest problems with cross-shard communications are operational complexities and latency. Let’s see how ETH 2.0 mitigates these road bumps.
#1 Removing complexities
Vitalik Buterin has announced two proposals to create a fully-sharded Ethereum, with a “relatively minimal consensus-layer framework,” that provides sufficient support to develop complex smart contract frameworks.
Complexity kills
The proposals will:
- Siphon several tasks and responsibilities from the individual shards to the beacon chain.
- Ensure that the shards have their own unique state and execution.
- Reduce the complexity of each shard and maintain various network functionalities.
- Give the shards enough functionality to create an execution environment that supports smart contracts in shards, cross-shard communication, and other features.
- Introduce three new transaction types – NewExecutionScript, NewValidator, and Withdrawal to accomplish these tasks.
Looking into the new transaction types
- NewExecutionScript creates an execution script that can hold ETH.
- NewValidator adds new validators to the system.
- Withdrawal removes validators from the beacon chain.
- The addition and withdrawal of validators are authorized using an execution script and receipt system.
- These innovations will enable Ethereum to conduct cross-shard communication while leveraging layer 2 abstraction to exchange ether and execute smart contracts.
- This takes all the complex operations away from the individual shards and keeps them as uncomplicated as possible.
#2 Communication Latency
To understand how this delay happens, let’s look at how the cross-shard communication works:
- Alice wants to send a token from Shard A to Shard B.
- During this transaction, the token gets burned on Shard A, which saves a record of the address. The value then gets sent over to the destination, Shard B.
- After a bit of a delay, every single shard learns about the state root of the other shards through gossip.
- Once the shards have verified the transaction, Shard B recovers the token receipt from Shard A.
- Finally, Shard B receives the token sent from Shard A.
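The steps above can be sketched as a toy Python model. Everything here is an assumption made for illustration: the `Shard` class, its method names, and the receipt format are hypothetical, and Shard B reads Shard A's receipts directly instead of verifying them against a state root learned through the beacon chain.

```python
class Shard:
    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)
        self.receipts = {}     # receipt id -> (destination shard, recipient, amount)
        self.claimed = set()   # receipt ids already redeemed on this shard

    def burn_and_issue_receipt(self, receipt_id, sender, dest_shard, recipient, amount):
        """Burn the sender's tokens here and record a receipt for the destination."""
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.receipts[receipt_id] = (dest_shard, recipient, amount)

    def claim(self, receipt_id, source):
        """Credit the recipient once this shard can see the source shard's receipt."""
        if receipt_id in self.claimed or receipt_id not in source.receipts:
            return False
        dest_shard, recipient, amount = source.receipts[receipt_id]
        if dest_shard != self.name:
            return False
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        self.claimed.add(receipt_id)  # prevents redeeming the same receipt twice
        return True

shard_a = Shard("A", {"alice": 20})
shard_b = Shard("B", {"bob": 50})

shard_a.burn_and_issue_receipt("r1", "alice", "B", "bob", 20)
first_claim = shard_b.claim("r1", shard_a)    # succeeds, Bob is credited
second_claim = shard_b.claim("r1", shard_a)   # fails, the receipt was already used
```

The gap between `burn_and_issue_receipt` and `claim` is exactly where the latency comes from: Shard B cannot credit Bob until it has learned about Shard A's receipt.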
As you can imagine, this process causes a lot of delays, which damages the user experience and goes against the whole scalability ethos of ETH 2.0. Vitalik explained the solution to this problem with the following example:
“…if Bob has 50 coins on shard B, and Alice sends 20 coins to Bob from shard A, but shard B does not yet know the state of shard A and so cannot fully authenticate the transfer, Bob’s account state temporarily becomes ‘70 coins if the transfer from Alice is genuine, else 50 coins.’ Clients that have the ability to authenticate shard A and shard B can be sure of the “finality” of the transfer (ie. the fact that Bob’s account state will eventually resolve to 70 coins once the transfer can be verified inside the chain) almost immediately, and so their wallets can simply act like Bob already has the 70 coins.”
This solution proposal is called “Fast Cross-Shard Transfers Via Optimistic Receipt Root.”
As soon as a transfer has been verified, the transaction:
- Becomes permanent if valid.
- Gets reverted if not.
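Here is a small Python sketch of the conditional balance from Vitalik’s example. The `OptimisticAccount` class and its method names are hypothetical; it only models the "70 coins if genuine, else 50 coins" bookkeeping, not the actual proposal’s mechanics.

```python
from dataclasses import dataclass, field

@dataclass
class OptimisticAccount:
    confirmed: int
    pending: dict = field(default_factory=dict)  # transfer id -> unverified amount

    def receive_unverified(self, transfer_id, amount):
        """Record an incoming transfer that this shard cannot authenticate yet."""
        self.pending[transfer_id] = amount

    def optimistic_balance(self):
        """Balance assuming every pending transfer turns out to be genuine."""
        return self.confirmed + sum(self.pending.values())

    def resolve(self, transfer_id, genuine):
        """Make the transfer permanent if it was genuine, otherwise drop it."""
        amount = self.pending.pop(transfer_id)
        if genuine:
            self.confirmed += amount

bob = OptimisticAccount(confirmed=50)
bob.receive_unverified("from-alice", 20)
shown_in_wallet = bob.optimistic_balance()   # what a wallet could display right away
bob.resolve("from-alice", genuine=True)      # the transfer is verified later
```

If the transfer had turned out to be invalid, calling `resolve` with `genuine=False` would simply leave Bob at his original 50 coins.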
When will Sharding work?
One of the most crucial technological requirements for executing Sharding properly is the proof-of-stake (POS) consensus algorithm. The reason is that, under proof-of-work, each individual shard would have only a fraction of the original Ethereum chain’s hashrate. A powerful mining pool working on a single shard could then take it over completely and centralize its operations.
ETH 2.0 is rolling out in four phases: Phase 0, Phase 1, Phase 1.5, and Phase 2.
Phase 0
Phase 0 kickstarts the POS implementation by launching the beacon chain. This chain launches its first (genesis) block once the following conditions are met:
- At least 524,288 ETH staked on the network.
- At least 16,384 validators have signed on.
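These two thresholds are really the same condition seen from two sides, since each ETH 2.0 validator stakes 32 ETH. The short Python check below shows the arithmetic; the constant and function names are ours, chosen for illustration.

```python
DEPOSIT_PER_VALIDATOR_ETH = 32      # stake required per validator
MIN_GENESIS_STAKE_ETH = 524_288
MIN_GENESIS_VALIDATORS = 16_384

def genesis_ready(total_staked_eth, validator_count):
    """Return True once both genesis conditions are met."""
    return (total_staked_eth >= MIN_GENESIS_STAKE_ETH
            and validator_count >= MIN_GENESIS_VALIDATORS)

# 16,384 validators x 32 ETH each = 524,288 ETH
stake_from_min_validators = MIN_GENESIS_VALIDATORS * DEPOSIT_PER_VALIDATOR_ETH
```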
Phase 1
This phase can be thought of as the “Sharding phase” and will take place in 2021. The blockchain gets partitioned into 64 shard chains, which run in parallel and continually communicate with each other. By the end of this phase, Ethereum should be able to process transactions in 64 blocks simultaneously. Since the shards distribute the workload, main-chain bloat is reduced considerably.
Phase 1.5
In this phase, the beacon chain gets integrated with the main proof-of-work (POW) chain to create a new POS chain. As defined in the previous phase, the POW chain will end up existing as one of the 64 shard chains.
Keep in mind that the original POW chain’s history will exist, but it will simply exist as any other shard and won’t run the POW consensus mechanism anymore.
Phase 2
The finer details of this phase have still not been fleshed out. However, it is widely believed that this phase will fine-tune features like ether accounts, transactions, transfers, and seamless smart contract execution on the new chain.
ETH 2.0 Shards: Conclusion
As you can imagine, fine-tuning the different features needed for seamless Sharding execution is extremely difficult, which is why it’s a good thing that Ethereum 2.0 is launching in stages. However, once executed, this is going to take Ethereum to unprecedented heights. The biggest knock against cryptocurrencies has been their lack of scalability, which has forced newer protocols to opt for more centralized methods and mechanisms. However, with Sharding, Ethereum will be able to scale up significantly without compromising on decentralization.