About Hivemind, the recent Steemit outage, and "Fake Nodes"

in #witness, 2 years ago (edited)

Background

The recent Steemit outage has created a bit of community discussion around the API node system, which includes a certain amount of misunderstanding.

I just want to clarify a couple of points about API nodes. This will be a semi-technical post, but I'll try my best to keep it understandable.

Steemit.com API

The main Steemit website currently only uses the official api.steemit.com API endpoint. There's no facility for the end user to change that without deploying their own version of Condenser (the Steemit.com front end codebase).

In the event of any issue with api.steemit.com, the main Steemit.com website is therefore unreliable / unavailable.

Two potential workarounds for this would be allowing the user to select an alternative endpoint, or tweaking the live Steemit.com deployment so that it's able to change to a backup API node (or load balanced set of nodes) if a failure is detected.

Discussing those workarounds is outside the scope of this post. Here, I'm only aiming to educate about the situation.

How a typical API node is configured

A "full" API node typically consists of multiple steemd node installations on a single virtual or dedicated machine. Each instance of steemd has its own blockchain database, as well as its own configuration, since each instance will be running a different set of plugins.

Usually, full nodes will be running an "API" instance of steemd, configured to run the following plugins:

    webserver p2p json_rpc account_by_key reputation market_history
    database_api account_by_key_api network_broadcast_api reputation_api
    market_history_api condenser_api block_api rc_api

Each plugin makes the steemd node respond to a different set of API requests.

Alongside the "API" instance, an "AH" (account history) node will be running with:

    webserver p2p json_rpc account_history_rocksdb
    database_api account_history_api condenser_api

Potentially, a full node will also run a local Hivemind database -- but more on that in a minute.

When an API request is sent to the machine, none of those separate steemd instances answers directly.

Rather, there is a "reverse proxy" called Jussi which listens to all incoming requests and routes them to the appropriate steemd node. A request comes in for an API call on the reputation_api, and it'll be handled by the "API" instance. A request for a call within the account_history_api will go to the "AH" instance.

Jussi forwarding and Hivemind

The way Jussi is configured means that the full node owner can specify which of their local instances should handle which type of request. Each steemd instance is listening on a different address in a virtual network created by Docker, so an example Jussi config would contain lines that look like this:

    ["appbase","http://172.18.0.4:8091"],
    ["appbase.account_history_api","http://172.18.0.2:8092"],
    ["appbase.condenser_api.get_account_history","http://172.18.0.2:8092"],
    ["appbase.condenser_api.get_ops_in_block","http://172.18.0.2:8092"],

You can see that certain specific API requests within the appbase namespace get routed to the AH node, while the rest of that namespace gets routed to the API node.
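To make that routing rule concrete, here is a minimal Python sketch. It assumes that the most specific (longest) matching prefix wins, which is what the example config implies; it is not Jussi's actual code, and the function name `route` is made up for illustration.

```python
# Routing table copied from the example config lines above.
UPSTREAMS = [
    ("appbase", "http://172.18.0.4:8091"),
    ("appbase.account_history_api", "http://172.18.0.2:8092"),
    ("appbase.condenser_api.get_account_history", "http://172.18.0.2:8092"),
    ("appbase.condenser_api.get_ops_in_block", "http://172.18.0.2:8092"),
]


def route(method, upstreams=UPSTREAMS):
    """Return the upstream URL whose prefix is the longest match for `method`.

    `method` is a dotted name such as "appbase.condenser_api.get_content".
    A prefix only matches on whole dotted segments (an assumption).
    """
    parts = method.split(".")
    best_url, best_len = None, -1
    for prefix, url in upstreams:
        prefix_parts = prefix.split(".")
        matches = parts[: len(prefix_parts)] == prefix_parts
        if matches and len(prefix_parts) > best_len:
            best_url, best_len = url, len(prefix_parts)
    if best_url is None:
        raise LookupError(f"no upstream configured for {method!r}")
    return best_url


# Account history calls go to the AH instance...
print(route("appbase.condenser_api.get_account_history"))
# ...while everything else in the namespace falls through to the API instance.
print(route("appbase.reputation_api.get_account_reputations"))
```

Under that assumption, the order of the config lines doesn't matter: the catch-all "appbase" entry only applies when nothing more specific matches.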

Hivemind, which I mentioned above, is a database layer which monitors incoming blocks and extracts certain data to store in a SQL database. This provides much faster access to the data which Hivemind stores than could be achieved by querying the blockchain directly.

For this reason, a certain set of API calls need to be routed to a Hivemind instance.
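As a rough illustration of the idea (not Hivemind's real schema or code), the sketch below indexes comment operations from a few invented blocks into a SQL table and then answers a "posts by author" style query with a single SELECT. The block layout, table name and function names are all assumptions, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical, heavily simplified blocks: each holds a list of operations.
BLOCKS = [
    {"num": 1, "ops": [("comment", {"author": "alice", "permlink": "hello"})]},
    {"num": 2, "ops": [("vote", {"voter": "bob", "author": "alice", "permlink": "hello"}),
                       ("comment", {"author": "bob", "permlink": "intro"})]},
    {"num": 3, "ops": [("comment", {"author": "alice", "permlink": "second"})]},
]


def index_blocks(conn, blocks):
    """Extract comment operations from each block into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(block_num INTEGER, author TEXT, permlink TEXT)"
    )
    for block in blocks:
        for op_type, op in block["ops"]:
            if op_type == "comment":  # skip operations this index doesn't need
                conn.execute(
                    "INSERT INTO posts VALUES (?, ?, ?)",
                    (block["num"], op["author"], op["permlink"]),
                )
    conn.commit()


def posts_by_author(conn, author):
    """Return an author's permlinks, newest block first."""
    rows = conn.execute(
        "SELECT permlink FROM posts WHERE author = ? ORDER BY block_num DESC",
        (author,),
    )
    return [permlink for (permlink,) in rows]


conn = sqlite3.connect(":memory:")
index_blocks(conn, BLOCKS)
print(posts_by_author(conn, "alice"))  # ['second', 'hello']
```

The point is that the query never has to walk the chain: the extraction happens once, as each block arrives, and lookups afterwards are plain database reads.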

In an ideal world, every full API node would run its own local instance of Hivemind. However, at the moment I believe it's more common for API node operators to forward Hivemind requests to the centralised Steemit.com Hivemind API, so in their Jussi config you would see something like this:

    ["appbase.condenser_api.get_discussions_by_author_before_date","https://hivemind.steemit.com"],
    ["appbase.condenser_api.get_post_discussions_by_payout","https://hivemind.steemit.com"],
    ["appbase.condenser_api.get_comment_discussions_by_payout","https://hivemind.steemit.com"],

Why many full nodes aren't running Hivemind

When setting up a new full node, it's not really feasible to just install the steemd instances, configure Jussi, and have the steemds download the entire blockchain from seed nodes. It would take weeks or even months to sync from scratch, since the blockchain is now so large.

For this reason, @ety001 kindly provides a set of bootstrap files. These are regularly updated with archived copies of the blockchain taken from ety001's own running nodes, and they're customised to each provide the correct subset of blockchain info required by the API and AH nodes. ety001 also provides separate downloads for witness nodes and seed nodes.

Unfortunately, following some previous issues with Hivemind, ety001's Hivemind database became corrupted and has to be recreated from scratch. @ety001 is doing this right now, but it'll take some time: at least one more week from now.

In the meantime, new full nodes (of which several have come online in recent weeks) are unable to acquire the Hivemind database. Certainly this is the case for api.steemwow.com; at the moment we forward all Hivemind requests to the official endpoint, but plan to run our own Hivemind as soon as the data is available.

I don't believe there's much point in starting a "replay race" with ety001, as it's extremely unlikely that anyone will be able to overtake his replay. Having said that, I do know that at least 2 people are replaying independently.

Please note that none of this is a criticism of ety001 in any way! It's a massive undertaking on his part to provide the bootstraps, in both time and hosting costs. It's merely an unfortunate centralisation that has occurred because no-one else is willing to provide a bootstrap service.

"Fake Nodes"

If you were paying close attention you may have picked up on something in the above.

Because we can tell Jussi to forward requests elsewhere, it would definitely be possible to create a "fake node" where every incoming request just gets forwarded to the official Steemit.com endpoint:

    ["appbase","https://api.steemit.com"],
    ["appbase.account_history_api","https://api.steemit.com"],
    ["appbase.condenser_api.get_account_history","https://api.steemit.com"],
    ["appbase.condenser_api.get_ops_in_block","https://api.steemit.com"],

From the outside, it would look like a real API node; it's responding to all requests correctly, so long as the steemit.com machines are working.

Do I think that's what the operators of the API nodes on @justyy's list are doing? NO.

This is obviously a potential issue and source of paranoia, and during the recent outage, the fact that most of the independent API nodes failed at the same time as api.steemit.com has only served to fuel that fire.

However, there is a very core point that is not being made.

What actually happened, and why it tells you nothing about whether "Fake Nodes" are even a thing

The recent steemit.com outage was caused by a Hivemind failure. A particular transaction, malformed in a very specific way, appeared on the chain in a new block. Hivemind didn't know how to parse it correctly, and crashed.

Steemit.com went down, and people looking at the independent API nodes saw that many of those had gone down as well.

THIS DOES NOT INDICATE THAT THEY WERE FORWARDING HIVEMIND REQUESTS TO STEEMIT.COM.

This wasn't a case where hivemind.steemit.com failed because of some server issue or power outage. If that had been the case, and independent API nodes had failed too, then yes, that would have indicated they were forwarding requests.

Rather, the actual incident occurred on ALL Hivemind installations simultaneously. As soon as they saw the new block, they crashed.

This is also why all independent Condenser installations didn't fare any better than Steemit.com, whether the API endpoints they were connecting to were forwarding Hivemind requests or not.
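The reasoning can be shown with a toy model. The actual transaction isn't described in this post, so the "malformed" value and the parser bug below are invented placeholders; the only thing the sketch is meant to show is that independent instances running the same code on the same block fail together, without any of them talking to each other.

```python
def parse_op(op):
    """Hypothetical parser with a latent bug: assumes 'value' is always numeric."""
    return int(op["value"])


class Indexer:
    """Stands in for one independently operated Hivemind-like instance."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def process_block(self, block):
        try:
            for op in block:
                parse_op(op)
        except ValueError:
            # Model an unhandled parse error taking the whole instance down.
            self.alive = False


# Three instances with no connection to one another.
indexers = [Indexer("official"), Indexer("independent-a"), Indexer("independent-b")]

good_block = [{"value": "1"}, {"value": "2"}]
# Perfectly acceptable to the chain, but unparseable by this indexer code.
bad_block = [{"value": "1"}, {"value": "not-a-number"}]

# Every instance sees the same blocks in the same order.
for block in (good_block, bad_block):
    for indexer in indexers:
        if indexer.alive:
            indexer.process_block(block)

print({indexer.name: indexer.alive for indexer in indexers})
```

All three end up down after the second block, which from the outside looks the same as three nodes depending on one shared backend.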

@ety001 leapt to the rescue with a fix and was soon able to bring hivemind.steemit.com up again. Other independent nodes would have needed to apply the same fix, available from the Hivemind github repo, in order to bring up their own Hiveminds again.

Steemit.com Mitigation

As mentioned above, some options that could help Steemit.com mitigate this situation in the future could include:

  • Ability for the user to pick an alternative endpoint when using Steemit.com
  • Ability for Steemit.com to automatically fail over to independent / official backup nodes
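For readers who want a picture of the second option, here is a minimal sketch of ordered failover, written in Python purely for illustration (it is not Condenser code). The helper `call_with_failover`, the stand-in transport `fake_send` and the endpoint order are assumptions; a real deployment would plug in an actual HTTP client.

```python
def call_with_failover(endpoints, send, payload):
    """Try each endpoint in order and return the first successful response.

    `send(url, payload)` performs the request and is expected to raise
    OSError (which covers connection errors and timeouts) on failure.
    """
    errors = []
    for url in endpoints:
        try:
            return send(url, payload)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


def fake_send(url, payload):
    """Stand-in transport that simulates the official endpoint being down."""
    if url == "https://api.steemit.com":
        raise OSError("simulated outage")
    return {"jsonrpc": "2.0", "id": payload["id"], "result": f"answered by {url}"}


ENDPOINTS = ["https://api.steemit.com", "https://api.steemwow.com"]
PAYLOAD = {
    "jsonrpc": "2.0",
    "method": "condenser_api.get_dynamic_global_properties",
    "params": [],
    "id": 1,
}

response = call_with_failover(ENDPOINTS, fake_send, PAYLOAD)
print(response["result"])
```

Note that failover of this kind only helps when the backup is healthy; it would not have helped while every Hivemind instance was crashing on the same block.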

I'm not really here to address the pros and cons of those fixes today; I just wanted to bring a little clarity about the landscape.

Steem Community Mitigation

More top witnesses providing bootstraps to enable quick recovery for independent nodes would be excellent to see.

One thing that would definitely help is if Jussi would implement an API to allow us to query its endpoints. If we could at least get a "this API is local" / "this API is forwarding externally" lookup in Jussi, it would help to understand the level of (de)centralisation of the independent full nodes.

However, if you really truly wanted to create a "fake node", that really wouldn't prevent you hacking Jussi to always return "this API is local".
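A minimal sketch of what such a lookup might compute is below. It assumes an upstream counts as "local" when its host is a private or loopback IP address (like the Docker addresses in the earlier example) and "external" when it is a public address or a DNS name. The function names are hypothetical, and nothing like this exists in Jussi as far as this post is concerned.

```python
import ipaddress
from urllib.parse import urlparse


def is_local(url):
    """Treat an upstream as local if its host is a private or loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name such as api.steemit.com: assume an external service.
        return False
    return addr.is_private or addr.is_loopback


def describe_upstreams(upstreams):
    """Map each routing prefix to 'local' or 'forwarding externally'."""
    return {
        prefix: "local" if is_local(url) else "forwarding externally"
        for prefix, url in upstreams
    }


config = [
    ("appbase", "http://172.18.0.4:8091"),
    ("appbase.account_history_api", "http://172.18.0.2:8092"),
    ("appbase.condenser_api.get_post_discussions_by_payout",
     "https://hivemind.steemit.com"),
]

for prefix, status in describe_upstreams(config).items():
    print(f"{prefix}: {status}")
```

As noted above, a report like this is only as trustworthy as the node producing it, so it would help honest operators show their setup rather than catch dishonest ones.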

In more general terms, the SteemWOW team is evaluating a few ideas for enhanced resilience across the whole Steem ecosystem.

Please vote for SteemWOW as a witness!

SteemWOW are committed to providing full STEEM witness services: Witness node with zero block misses, reliable price feed, full API node, fast seed node, and continuing development and user education work.

We'd love to have a witness vote from you to support what we're doing 🙂

Simply go to the Steemit Wallet Witness list, scroll to the bottom, and fill in the form:

vote for us.PNG

Thank you very much for this insightful explanation.

A particular transaction, malformed in a very specific way, appeared on the chain in a new block. Hivemind didn't know how to parse it correctly, and crashed.

This sounds a bit like an attack scenario that definitely needs to be analysed. I hope it will be solved in such a way that attackers will no longer be able to exploit this "issue" and crash Steemit.

Other independent nodes would have needed to apply the same fix, available from the Hivemind github repo, in order to bring up their own Hiveminds again.

Since it affects all independent condenser installations, a patch is absolutely necessary here so that all other nodes can also be updated. However, I have not seen that the repo on github has been updated accordingly.

There's no facility for the end user to change that without deploying their own version of Condenser (the Steemit.com front end codebase).

I ran Condenser locally in developer mode during the Steemit downtime. I assume that the API requests are directed to https://api.steemitdev.com. Evidently the requests could still be answered there, or there is also a dev Hivemind version.

I'd also like to hear more about this malformed transaction, what it was, how it got onto the chain, and whether this points out a vulnerability that needs to be fixed.

While I strongly agree that there's no security through obscurity, and even though @ety001 has already patched the issue, I'd rather not elaborate on the particular transaction here until I've had time to audit the codebase a little deeper. I hope you understand!

Just to add that this tx was completely valid from the chain perspective -- no issue with steemd.

I think it probably has been fixed. I noticed this in github the other day: remove special char from hive role title.

Thanks. I overlooked the commit. Probably because it has not yet been pulled into the branch.

That is the fix, yes :)

I have not seen that the repo on github has been updated accordingly.

The pull request on the repo is still pending. I believe it won't be approved until it meets the Continuous Integration checks.

However, I misspoke regarding operators needing to update from GitHub. Rather, most are running Hivemind in production using this Docker container.

UPDATE: Even though others have a head start, it may in fact still be worth entering The Great Replay Race.

I've kicked off a Hivemind replay on one of @steemwow's big machines (Ryzen 9 3900).

If it manages to replay first, I'll host an archive until everyone who needs it has been able to grab it.

Thanks for the good information 👍

Upvoted! Thank you for supporting witness @jswit.

Hello @rexthetech! You are excellent!

