Smilo Update — Smilodon Populator upgrade
A major milestone for Smilo, merging upstream with the release of Go Ethereum v1.9.0 to Smilodon Populator — Wolfmoon!
A major milestone for Smilo, we’ve merged upstream with the release of Go Ethereum v1.9.0 (https://blog.ethereum.org/2019/07/10/geth-v1-9-0/) to Smilodon Populator — Wolfmoon (GETH 1.9.0)!
This opened up the way for more features and improved performance and efficiency.
The features with the highest impact are;
· Optimization of CPU and disk IO complexity
· The analysis and optimization of the account and storage trie access patterns across blocks.
· The analysis and optimization of various EVM opcodes, aiming to find outliers both in Geth’s EVM implementation as well as Ethereum’s protocol design in general.
· The analysis and optimization of our database schemas, trying to both remove any redundant data as well as redesign indexes for lower disk use (sometimes at the cost of a slight CPU hit).
· The discovery of a LevelDB compaction overhead during the state sync phase of fast sync.
For more information check the https://github.com/Smilo-platform/go-smilo/releases blog
It’s interesting to realize that the “Performance” section was somewhere at the end of previous announcements, but over the years it became one of the most sought after improvement.
Over the past 6 months, we’ve tried to dissect the different components that are on the critical path of block processing, in an attempt to identify and optimize some of bottlenecks. Among the many improvements, the highest impact ones were:
- The discovery and optimization of a quadratic CPU and disk IO complexity, originating from the Go implementation of LevelDB. This caused Geth to be starved and stalled, exponentially getting worse as the database grew. Huge shoutout to Gary Rong for his relentless efforts, especially as his work is beneficial to the entire Go community.
- The analysis and optimization of the account and storage trie access patterns across blocks. This resulted in stabilizing Geth’s memory usage even during the import of the Shanghai DoS blocks and speeding up overall block processing by concurrent heuristic state prefetching. This work was mostly done by Péter Szilágyi.
- The analysis and optimization of various EVM opcodes, aiming to find outliers both in Geth’s EVM implementation as well as Ethereum’s protocol design in general. This led to both fixes in Geth as well as infos funneled into the Eth 1.x scaling discussions. Shoutout goes to Martin Holst Swende for pioneering this effort.
- The analysis and optimization of our database schemas, trying to both remove any redundant data as well as redesign indexes for lower disk use (sometimes at the cost of a slight CPU hit). Props for these efforts (spanning 6–9 months) go to Alexey Akhunov, Gary Rong, Péter Szilágyi and Matthew Halpern.
- The discovery of a LevelDB compaction overhead during the state sync phase of fast sync. By temporarily allocating pruning caches to fast sync blooms, we’ve been able to short circuit most data accesses in-memory. This work was mostly done by Péter Szilágyi.
Wouldn’t it be amazing if we didn’t have to waste so much precious space on our expensive and sensitive SSDs to run an Ethereum node, and could rather move at least some of the data onto a cheap and durable HDD?
With the v1.9.0 release, Geth separated its database into two parts (done by Péter Szilágyi, Martin Holst Swende and Gary Rong):
Recent blocks, all state and accelerations structures are kept in a fast key-value store (LevelDB) as until now. This is meant to be run on top of an SSD as both disk IO performance is crucial.
Blocks and receipts that are older than a cutoff threshold (3 epochs) are moved out of LevelDB into a custom freezer database, that is backed by a handful of append-only flat files. Since the node rarely needs to read these data, and only ever appends to them, an HDD should be more than suitable to cover it.
A fresh fast sync at block 7.77M placed 79GB of data into the freezer and 60GB of data into LevelDB.
Who doesn’t just love JSON-RPC? Me!
As its name suggests, JSON-RPC is a Remote Procedure Call protocol. Its design goal is to permit calling functions, that do some arbitrary computation on the remote side, after which they return the result of said computation. Of course — the protocol being generic — you can run data queries on top, but there’s no standardized query semantic, so people tend to roll their own.
Without support for flexible queries however, we end up wasting both computational and data transfer resources:
RPC calls that return a lot of data (e.g. eth_getBlock) waste bandwidth if the user is only interested in a handful of fields (e.g. only the header, or even less, only the miner’s address).
RPC calls that return only a bit of data (e.g. eth_getTransactionReceipt) waste CPU capacity if the user is forced to repeat the call multiple times (e.g. retrieving all receipts one-by-one results in loading all of them from disk for each call).
In the case of Ethereum’s JSON-RPC API, the above issues get exacerbated by the mini-reorg nature of the blockchain, as doing multiple queries (e.g. eth_getBalance) need to actually ensure that they execute against the same state and even against the same node (e.g. load balanced backends might have slight sync delays, so can serve different content).
Yes, we could invent a new, super optimal query mechanism that would permit us to retrieve only the data we need, whilst minimizing computational and data transfer overhead… or we could also not-reinvent the wheel (again) and rather use one that’s been proven already: GraphQL.
Wallets, wallets everywhere!
When Ethereum launched in 2015, there was no third party tooling whatsoever, so client implementations needed to be these all-encompassing Swiss army knives. Ranging from peer-to-peer networking, through account management, to contract and user interactions, everything was done by the client. This was necessary, but seriously sub-optimal: accounts and networking don’t go well together security wise, and everything done by a single binary doesn’t permit a composable ecosystem.
We’ve been wanting to do this for at least 2 years now, and Geth v1.9.0 finally ships the work of Martin Holst Swende (with the help of many others): a standalone signer for the entire Ethereum ecosystem called Clef. As simple as a “standalone signer” might sound, Clef is the result of an insane amount of architectural work to make it secure, flexible and composable.
A small release blog post simply cannot do this project justice, but we’ll try nonetheless to at least mention the major features of Clef, the design decisions behind them and how they can enable a whole set of new use cases.
The main reason for creating Clef was to remove account management from Geth (don’t worry, the old way will still work for the foreseeable future). This permits Geth to be an “insecure” network gateway into Ethereum, which should solve many many issues with regard to accidentally exposing accounts via RPC (and unlocking them, the deadly combo).
But hogging all this work for Geth wouldn’t be nice of us. Instead, we designed Clef to be usable by arbitrary programs, so that you can have a single signer securely managing your keys, to which arbitrary applications (e.g. Geth, Parity, Trinity, Metamask, MyCrypto, Augur) can send signing requests to!
To achieve this, Clef exposes a tiny external API (changelog) either via IPC (default) or HTTP. Any program that can access these endpoints (e.g. Geth via IPC, Metamask via HTTP) can send signing requests to Clef, which will prompt the user for manual confirmation. The API is deliberately tiny and uses JSON-RPC, so it should be trivial to support in any project.
Our goal with Clef is not to be “The Geth Signer”, rather we’d like it to become a standalone entity that can be used by any other project.
Light clients are tricky and they make everything more complicated than it should be. The root cause is more philosophical than technical: the best things in life are free, and the second best are cheap. In Ethereum client terms, the “best” clients are those that work with 0 overhead (think Metamask, Infura), the second best are the light clients.
Problem is, trusted servers go against the ethos of the project, but light clients are often too heavy for resource constrained devices (ethash murders your phone battery). Geth v1.9.0 ships a new mode for light clients, called an ultra light client. This mode aims to position itself midway on the security spectrum between a trusted server and a light server, replacing PoW verification with digital signatures from a majority of trusted servers.
With enough signatures from independent entities, you could achieve more than enough security for non-critical DApps. That said, ultra light client mode is not really meant for your average node, rather for projects wishing to ship Geth embedded into their own process. This work was spearheaded by Boris Petrov and Status.
Light clients are dirty little cheats! Instead of downloading and verifying each header from the genesis to chain head, they use a hard coded checkpoint (shipped within Geth) as a starting point. Of course, this checkpoint contains all the necessary infos to cryptographically verify even past headers, so security wise nothing is lost.
Still, as useful as the embedded checkpoints are, they do have their shortcomings:
As the checkpoints are hard coded into our release binaries, older releases will always start syncing from an older block. This is fine for a few months, but eventually it gets annoying. You can, of course, update Geth to fetch a new checkpoint, but that also pulls in all our behavioral changes, which you may not want to do for whatever reason.
Since these checkpoints are embedded into the code, you’re out of luck if you want to support them in your own private network. You’d need to either ship a modified Geth, or configure the checkpoints via a config file, distributing a new one whenever you update the checkpoint. Doable, but not really practical long term.
This is where Gary Rong’s and Zsolt Felföldi’s work comes in to play. Geth v1.9.0 ships support for an on-chain checkpoint oracle. Instead of relying on hard-coded checkpoints, light clients can reach out to untrusted remote light servers (peer-to-peer, no centralized bs) and ask them to return an updated checkpoint stored within an on-chain smart contract. The best part, light clients can cryptographically prove that the returned data was signed by a required number of approved signers!
Wait, how does a light client know who’s authorized to sign an on-chain checkpoint? For networks supported out of the box, Geth ships with hard coded checkpoint oracle addresses and lists of authorized signers (so you’re trusting the same devs who ship Geth itself). For private networks, the oracle details can be specified via a config file.
Although the old and new checkpoint mechanisms look similar (both require hard-coded data in Geth or a config file), the new checkpoint oracle needs to be configured only once and afterwards can be used arbitrarily long to publish new checkpoints.
Ethereum contracts are powerful, but interacting with them is not for the faint of heart. Our checkpoint oracle contract is an especially nasty beast, because a) it goes out of its way to retain security even in the face of chain reorgs; and b) it needs to support sharing and proving checkpoints to not-yet-synced clients.
As we don’t expect anyone (not even ourselves) to manually interact with the checkpoint oracle, Geth v1.9.0 also ships an admin tool specifically for this contract, checkpoint-admin. Note, you’ll only ever need to care about this if you want to run your own checkpoint oracle for your own private network.
Monitoring -> Metrics collection
First thing’s first, metrics need to be gathered before they can be exported and visualized. Geth can be instructed to collect all its known metrics via the — metrics CLI flag. To expose these measurements to the outside world, Geth v1.9.0 features 3 independent mechanisms: ExpVars, InfluxDB and Prometheus.
ExpVars are a somewhat custom means in the Go ecosystem to expose public variables on an HTTP interface. Geth uses its debug pprof endpoint to expose these on. Running Geth with — metrics — pprof will expose the metrics in expvar format at http://127.0.0.1:6060/debug/metrics. Please note, you should never expose the pprof HTTP endpoint to the public internet as it can be used to trigger resource intensive operations!
ExpVars are well-ish supported within the Go ecosystem, but are not the industry standard. A similar mechanism, but with a more standardized format, is the Prometheus endpoint. Running Geth with — metrics — pprof will also expose this format at http://127.0.0.1:6060/debug/metrics/prometheus. Again, please never expose the pprof HTTP endpoint to the public internet! Shoutout to Maxim Krasilnikov for contributing this feature.
Whereas ExpVars and Prometheus are pull based monitoring mechanisms (remote servers pull the data from Geth), we also support push based monitoring via InfluxDB (Geth pushes the data to remote servers). This feature requires a number of CLI flags to be set to configure the database connection (server, database, username, password and Geth instance tag). Please see the METRICS AND STATS OPTIONS section of geth help for details ( — metrics.influxdb and subflags). This work was done by Anton Evangelatov.
Now here’s another piece of legacy infrastructure! Apart from a teeny-tiny modification, Ethereum’s discovery protocol has been specced, implemented and set in stone since pretty much forever. For those wondering what the discovery protocol is all about, it’s the mechanism through which a new node can find other Ethereum nodes on the internet and join them into a global peer-to-peer network.
So… what’s wrong with it then? Didn’t it work well enough until now? If it ain’t broken, don’t fix it and all that?
Well, Ethereum’s original discovery protocol was made for a different time, a time when there was only one chain, when there weren’t private networks, when all nodes in the network were archive nodes. We outgrew these simplistic assumptions, which although is a success story, it also brings new challenges:
- The Ethereum ecosystem nowadays has many public, private and test networks. Although Ethereum mainnet consists of a large number of machines, other networks are generally a lot smaller (e.g. Görli testnet). The discovery protocol doesn’t differentiate between these networks, so connecting to a smaller one is a never ending trial and error of finding unknown peers, connecting to them, then realizing they are on a different network.
- The same original Ethereum network can end up partitioning itself into multiple disjoint pieces, where participants might want to join one piece or the other. Ethereum Classic is one of the main examples here, but a similar issue arises every time a network upgrade (hard fork) passes and some nodes upgrade late. Without information concerning the rules of the network, we again fall back to trial and error connectivity, which is computationally extremely expensive.
- Even if all nodes belong to the same network and all nodes adhere to the same fork rules, there still exists a possibility that peering is hard: if there is connectivity asymmetry, where some nodes depend on services offered by a limited subset of machines (i.e. light clients vs. light servers).
Long term we’re working towards a brand new version of the discovery protocol. Geth’s light clients have been since forever using a PoC version of this, but rolling out such a major change for the entire Ethereum network requires a lot of time and a lot of care. This effort it being piloted primarily by Felix Lange and Frank Szendzielarz in collaboration with Andrei Maiboroda from Aleth/C++, Antoine Toulme with Java, Age Manning from Lighthouse/Rust and Tomasz Stańczak from Nethermind/C#.
Ethereum Node Records
The above was a whole lot of text about something we didn’t ship! What we did ship however, is the Ethereum Node Record (ENR) extension of the new discovery protocol, which can actually run on top of the old protocol too! An ENR is a tiny, 300 byte, arbitrary key-value data set, that nodes can advertise and query via discovery. Although the new discovery protocol will provide fancy ways of sharing these in the network, the old protocol too is capable of directly querying them.
The immediate benefit is that nodes can advertise a lot of metadata about themselves without an expensive TCP + crypto handshake, thus allowing potential peers to filter out unwanted connections without ever making them in the first place! All credits go to Felix Lange for his unwavering efforts on this front!
Ok, ok, we get it, it’s fancy. But what is it actually, you know, useful for, in human-speak?
Geth v1.9.0 ships two extensions to the discovery protocol via ENRs:
- The current discovery protocol is only capable of handling one type of IP address (IPv4 or IPv6). Since most of the internet still operates on IPv4, that’s what peers advertise and share with each other. Even though IPv6 is workable, in practice you cannot find such peers. Felix Lange’s work on advertising both IPv4 and IPv6 addresses via ENRs allows peers to discover and maintain Kademlia routing tables for both IP types. There’s still integration work to be done, but we’re hoping to elevate IPv6 to a first class citizen of Ethereum.
- Finding a Rinkeby node nowadays works analogously to connecting to random websites and checking if they are Google or not. The discovery protocol maintains a soup of internet addresses that speak the Ethereum protocol, but otherwise has no idea which chain or which forks they are on. The only way to figure out, is to connect and see, which is a very expensive shooting-in-the-dark. Péter Szilágyi proposed an extension to ENR which permits nodes to advertise their chain configuration via the discovery protocol, resulting in a 0-RTT mechanism for rejecting surely bad peers.
The most amazing thing however with ENR — and its already implemented extras — is that anyone can write a UDP crawler to index Ethereum nodes, without having to connect to them (most nodes won’t have free slots; and crawlers that do connect via TCP waste costly resources). Having simple access to all the nodes, their IPs/ports, capabilities and chain configurations permits the creation of a brand new discovery protocol based on DNS, allowing nodes with blocked UPD ports (e.g. via Tor) to join the network too!
We’ve had a varying number of bootnodes of varying quality, managed by varying people since the Frontier launch. Although it worked well-ish, from a devops perspective it left a lot to desire, especially when it came to monitoring and maintenance. To go along our Geth v1.9.0 release, we’ve decided to launch a new set of bootnodes that is managed via Terraform and Ansible; and monitored via Datadog and Papertrail. We’ve also enabled them to serve light clients, hopefully bumping the reliability of the light protocol along the way. Huge shoutout to Rafael Matias for his work on this!
Beside all the awesome features enumerated above, there are a few other notable changes that are not large enough to warrant their own section, but nonetheless important enough to explicitly mention.
- The origin check on WebSocket connections ( — wsorigins) is enforced only when the Origin header is present.
- This makes it easier to connect to Geth from non-browser environments such as Node.js, while preventing use of the RPC endpoint from arbitrary websites.
- You can set the maximum gas for eth_call using the — rpc.gascap command line option. This is useful if exposing the JSON-RPC endpoint to the Internet.
- All RPC method invocations are now logged at debug level. Failing methods log as warning so you can always see when something isn’t right.
- Geth v1.9.0 supports the eth_chainId RPC method defined in EIP 695.
- The default peer count is now 50 instead of 25. This change improves sync performance.
- A new CLI tool (cmd/devp2p) was added to the source tree for for debugging P2P networking issues. While we don’t distribute this tool in the alltools archive yet, it’s already very useful to check issues with peer discovery.
- The P2P server now rejects connections from IPs that attempt to connect too frequently.
- A lot of work has gone into improving the abigen tool. Go bindings now support Solidity struct and function pointer arguments. The Java generator is improved as well. The mobile framework can create deploy transactions.
- Significant parts of the go-ethereum repo now build without CGO. Big thanks to Jeremy Schlatter for this work.
For complete list of features included, please check official GETH post: https://blog.ethereum.org/2019/07/10/geth-v1-9-0/
Be part of the Smilo hybrid blockchain movement!