RE: Properly Decentralising Steem & Cutting Costs by Witnesses running their own Servers & APIs
It is not that simple. You would need to use Jussi to route incoming API requests to deal with sharding a full node. This isn’t without issues. I’m one of the few who run Jussi outside of Steemit Inc.
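Roughly, what Jussi does is route each incoming JSON-RPC call to an upstream node based on the namespace in the method name. Here is a minimal Python sketch of that idea; the upstream URLs and the exact namespace split are made up for illustration, not Jussi's real configuration.

```python
# Sketch of namespace-based JSON-RPC routing (the idea behind Jussi).
# Upstream URLs and the namespace split below are illustrative only.

UPSTREAMS = {
    "account_history_api": "http://10.0.0.11:8090",  # machine running account history
    "tags_api":            "http://10.0.0.12:8090",  # machine running tags/follow
    "default":             "http://10.0.0.13:8090",  # everything else
}

def route(request: dict) -> str:
    """Pick an upstream node for a JSON-RPC request based on its method prefix."""
    method = request.get("method", "")
    namespace = method.split(".", 1)[0] if "." in method else "default"
    return UPSTREAMS.get(namespace, UPSTREAMS["default"])

if __name__ == "__main__":
    req = {"jsonrpc": "2.0",
           "method": "account_history_api.get_account_history",
           "params": {"account": "anyx", "start": -1, "limit": 10}, "id": 1}
    print(route(req))  # -> http://10.0.0.11:8090
```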
You can’t just put home hardware in data centers, and running a full node from your home is a disaster. Not only is it a poor experience for users, you are likely to have many replays (each of which will take days or weeks).
ECC ram is critical to a service that is meant to be online for months at a time.
Colocation costs for 3 machines will likely be near the cost of renting an appropriately sized larger server.
Scaling horizontally with steemd is critical, as more machines are definitely better than bigger ones as long as each meets the minimum requirements. The problem is that you now have three machines to manage, each with a replay time of days/weeks for every outage and patch that comes up. Because you are sharding one node into three, unless you have redundancy you have roughly tripled the risk of downtime. If any one node goes down, users of the node will likely see complete failure, as dapps generally need more than one part of the API. With the time to come back online (replay time) being days/weeks, you could in theory have cascading downtime between nodes that keeps the service offline even longer.
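To put rough numbers on that risk: if each shard is independently up, say, 99% of the time and a dapp needs all three, the combined availability is about 0.99³ ≈ 97%, roughly triple the downtime of a single machine, and that is before counting the days/weeks replay tail. A back-of-the-envelope sketch (the 99% figure is just an assumption):

```python
# Back-of-the-envelope availability of a full node sharded across N machines,
# where a dapp needs every shard to be up. Per-machine uptime is an assumption.

def combined_availability(per_machine_uptime: float, machines: int) -> float:
    """All shards must be up simultaneously; failures assumed independent."""
    return per_machine_uptime ** machines

single = 0.99  # assumed uptime of one machine
sharded = combined_availability(single, 3)

print(f"single machine downtime : {(1 - single) * 100:.1f}%")
print(f"3-shard downtime        : {(1 - sharded) * 100:.1f}%")
# ~1.0% vs ~3.0% -- roughly triple the downtime, before counting replay time.
```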
In the end, it might not even be cheaper, or perhaps only a little bit. If you bought all the hardware upfront like @anyx did, you certainly will have savings, but colocation costs x 3 will likely get you near the rental cost of a proper node.
Hi @markymark. Thanks for your detailed comments. I have responded to each below and would appreciate your further feedback.
Sure there is: you have 3 machines in place of the typical one. Unless you have redundancy by running 2 of each, there will be more downtime. This has nothing to do with the CPU; in fact, many public full nodes outside of Steemit already run on AMD CPUs.
This isn't the time-consuming part of a replay for a full node.
Because it takes days on an Intel Xeon Gold as well, if you don't have a lot of RAM or a super fast array of NVMe drives. A full node replay ranges from about 12 hours to 7 days depending on configuration.
While Internet speed is important (full nodes will easily push 5-20 TB/month of traffic), power is equally important, and a UPS isn't enough to cover outages in storm situations; depending on where you are, 2-5 days without power during the winter isn't uncommon.
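For scale, 5-20 TB/month works out to only a few tens of Mbit/s averaged out, which is why the power side is usually the harder problem at home. A quick conversion (assuming a 30-day month and decimal TB):

```python
# Convert monthly traffic into an average sustained rate
# (assumes 30-day months and 1 TB = 10^12 bytes).
SECONDS_PER_MONTH = 30 * 24 * 3600

def avg_mbps(tb_per_month: float) -> float:
    bits = tb_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6

for tb in (5, 20):
    print(f"{tb:>2} TB/month ~= {avg_mbps(tb):.0f} Mbit/s sustained")
# ~15 and ~62 Mbit/s on average -- peaks will of course be much higher.
```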
A full node is a public service, a consensus witness node is supposed to be private, hidden, and secure. It is not in the best interest of the network to have them combined on a server that is being accessed by the public.
Thanks again. A few clarifications on replay times.
What do you mean by "a lot" of RAM for replay purposes? Is 128 GB not a lot?
NVMe drives rated at up to 3,400 MB/s are super cheap these days. See https://www.newegg.com/Product/Product.aspx?Item=N82E16820147691
Is this what you mean?
No, 128GB is not enough. The shared memory file (which is nothing but RAM) is around 260GB right now.
I am actually testing using various crazy configurations to reduce the amount of RAM that is needed & even managed to get Intel's blessing for the same. ( https://steemit.com/steemdev/@bobinson/accelerating-steem-with-intel-optane )
Optane is better than NVMe in many cases, and if the direct memory access works as expected, full nodes, witness nodes, etc. can be a lot cheaper.
Yes, I had read your Optane post. It will be very interesting to see the replay improvement from NVMe to Optane.
So replay is slightly faster on Optane in comparison to SSD when running in seed and witness modes. But the real challenge is the full nodes, where memory usage is very high and the history files need an additional ~150 GB of disk space. For Steemit too, full nodes seem to be where the majority of the infrastructure expenses come from. There are very few like @anyx, @themarkymark and a handful of others running full nodes apart from Steemit Inc.
Given that replay times are primarily single-core CPU bound, I'm interested in whether the much higher single-core performance of HEDT CPUs with only 64 GB or 128 GB RAM, combined with Optane and/or NVMe, can provide much faster replay times at a much lower price for both witness and full RPC nodes.
If Optane can act as extra RAM (via cache), then the RAM limitations of 64 GB (for LGA 1151 motherboards) and 128 GB (for LGA 2066 motherboards) are overcome, and the dramatically faster single-core performance will dominate.
@bobinson did a replay in a little over 5 hours on a Xeon Gold 6142 with "block_log, block_log.index and shared_memory.bin all on Optane". This CPU is clocked at 2.6 GHz base / 3.7 GHz boost with a single-core PassMark of only 1909 (based on the similar Xeon Gold 6126). That is faster than @anyx's Xeon Gold 6130 (at 1636) but much slower than HEDT CPUs.
By comparison, the i9-9900K scores 2909 single-core on PassMark and the i7-7740X scores 2622. Even the lowly i5-8400 scores 2335.
With the same all-on-Optane setup, these HEDT CPUs with much faster single-core performance could bring replay times under 3 hours.
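As a sanity check, here is the naive scaling I have in mind: take @bobinson's ~5 hour replay on the Xeon Gold 6142 and scale it by the single-core PassMark ratios above. This assumes replay time scales linearly with single-core score and that storage is never the bottleneck, which is optimistic, so treat it only as a rough estimate.

```python
# Naive estimate: scale a known replay time by single-core PassMark ratio.
# Assumes replay is purely single-core bound and storage never bottlenecks.

baseline_hours = 5.0   # @bobinson's all-on-Optane replay on the Xeon Gold 6142
baseline_score = 1909  # single-core PassMark (per the similar Xeon Gold 6126)

hedt_cpus = {"i9-9900K": 2909, "i7-7740X": 2622, "i5-8400": 2335}

for cpu, score in hedt_cpus.items():
    estimate = baseline_hours * baseline_score / score
    print(f"{cpu:<9} -> ~{estimate:.1f} h estimated replay")
# i9-9900K ~3.3 h, i7-7740X ~3.6 h, i5-8400 ~4.1 h under these assumptions.
```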
@themarkymark & @anyx: If replay times are less than 3 hours then multiple cheap HEDT machines (even without ECC) will provide much greater overall uptime & performance than a single Xeon Gold for both witness & full RPC nodes.
I am doing a test with 1.4 TB of memory, which is system memory + Optane in IMDT. Hopefully this will give more details.
Not for a full node it isn't, but if you are breaking modules across machines, it isn't so bad. But unless 100% is in RAM, you will notice a huge increase in replay times.
Most NVMe drives will do nowhere near 3400 MB/s in reality, more like 1000-2400 MB/s: far more than a regular SSD, but nowhere near what they claim, and FAR FAR from real memory speeds.
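The gap that hurts replay is not sequential throughput but access latency, since replay does a huge number of small random accesses into shared memory. Ballpark figures below are my own assumptions, order-of-magnitude only, not benchmarks:

```python
# Order-of-magnitude latency comparison (ballpark assumptions, not benchmarks).
# Per-access latency matters far more for replay than the sequential MB/s on
# the spec sheet, because shared memory access is dominated by random reads.

latencies_ns = {
    "DRAM":                100,      # ~100 ns
    "Optane (approx)":     10_000,   # ~10 us
    "NVMe flash (approx)": 100_000,  # ~100 us
}

dram = latencies_ns["DRAM"]
for name, ns in latencies_ns.items():
    print(f"{name:<20} ~{ns:>8,} ns  ({ns / dram:,.0f}x DRAM)")
```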
Also, 3 machines can create redundancy that a single mega server doesn't have. Heavily used APIs can be on 2 of the 3 machines while lightly used ones sit on only one. If you go to 4 x $2000 machines you get heaps of redundancy for half the annual price of renting a mega server.
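Concretely, something like the layout below is what I mean: every heavily used API lives on at least two of the three machines, the lightly used ones on one. The plugin names are the usual steemd ones, but which machine gets what is just an illustration, not a tested configuration.

```python
# Illustrative split of API plugins across 3 machines, with heavy APIs duplicated.
# Plugin names are the usual steemd ones; the assignment below is made up.

placement = {
    "machine-1": {"condenser_api", "database_api", "account_history"},
    "machine-2": {"condenser_api", "database_api", "tags", "follow"},
    "machine-3": {"account_history", "tags", "follow", "market_history"},
}

heavily_used = {"condenser_api", "database_api", "account_history", "tags", "follow"}

for api in sorted(heavily_used):
    copies = sum(api in plugins for plugins in placement.values())
    status = "OK" if copies >= 2 else "SINGLE POINT OF FAILURE"
    print(f"{api:<16} on {copies} machine(s)  {status}")
```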
Not really, unless you are talking about 3 machines each running ALL plugins. If so, 128GB will be a problem. If they are each running only some of the plugins, then you will have problems: when one goes down, they basically all go down, because your account history may be up but your tags or follows are down, and that breaks whatever app you are using.
Jussi would handle the routing, but it does not have load balancing.
So splitting plugins across three machines increases your risk of downtime. Much like RAID 0: if any disk goes down, you are down.
Re security, can it not be achieved by running the witness node and API node in separate Docker containers or VMs, sharing only the consensus data? You can even use the dual Ethernet on most HEDT motherboards to provide 2 completely separate internet connections and IP addresses (one hidden, one public). Only in the event of one connection going down would they share internet. Redundant internet is a small cost. This is the sort of setup you can only do on your own machines, not in a data center.
Just knowing the IP of a witness node is a security risk. Even if you had multiple IPs they would most likely be on the same subnet and easy to track down.
You can't share consensus data between VMs; they can't both be writing to the blockchain file.
Two diverse internet providers would provide completely different IPs.
Surely only the witness node would be writing to the blockchain file? Isn't that the whole point of dPOS consensus?
The APIs should just be reading from it? Or am I misunderstanding something.
If so then Docker can allow this. https://www.digitalocean.com/community/tutorials/how-to-share-data-between-docker-containers
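To be concrete about what I meant, something like the sketch below, using the Docker Python SDK (docker-py): the witness container gets the blockchain volume read-write, the API container mounts it read-only. The image name, volume name and paths are hypothetical, and whether steemd can actually read a live blockchain directory this way is exactly the question.

```python
# Sketch of the container layout I had in mind, using the docker SDK (docker-py).
# Image, volume name and paths are hypothetical; this shows the volume wiring
# only, not a claim that steemd supports sharing a live blockchain this way.
import docker

client = docker.from_env()
data = client.volumes.create(name="steem-blockchain")

# Witness node: private, writes to the blockchain directory.
client.containers.run(
    "steemd:latest", name="witness", detach=True,
    volumes={data.name: {"bind": "/steem/blockchain", "mode": "rw"}},
)

# API node: public, mounts the same directory read-only.
client.containers.run(
    "steemd:latest", name="api", detach=True,
    volumes={data.name: {"bind": "/steem/blockchain", "mode": "ro"}},
    ports={"8090/tcp": 8090},
)
```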
@anyx your thoughts on these issues would be appreciated.
An RPC node runs the witness plugin, and the blockchain file grows on all nodes, even seed nodes. They cannot share the same file.
While docker images can easily share data, that does not mean the underlying applications can.
I thought you were talking about sharing a witness and full node on the same hardware; now you are talking about two ISPs?
HEDT motherboards often have two Ethernet ports. Connections from two separate ISPs can plug into the two ports, and each VM or Docker container can use a different ISP as its main connection with the other as backup. Only in the event of an outage on one would the witness & API nodes be using the same ISP & IP address range.
Power outages are extremely rare in central Tel Aviv but I know that they are much more common in parts of the US.
Maybe so, but that is only one of many issues, and I can assure you power outages at home are far more common than in a data center, which has virtually zero. Even a 10 second power outage will cost you 18 hours to 14 days of downtime.