RE: I'm starting to learn Node.js and Steem apis: Lorenz curves of post rewards for past two weeks
I think it's important to point out that, compared to historical data on post rewards and payouts, the last couple of weeks have been unusual, to put it mildly. With a lot of the big players weighing in on controversial subjects, a lot more SP is being directed toward rewarding their posts, because they already have wide audiences interested in what they have to say (for good or ill). The platform has been paying a lot of attention to what the movers and shakers have to say, so inevitably more of the voting traffic goes to their work than elsewhere.
Normally, this is where I would say it would be more productive to compare the last week's traffic to the same week a year ago, except that circumstances are completely different now: data gathered during a black swan event can't support assessments and decisions, because there's no baseline to compare it against.
That's not to say that this isn't interesting and important work, because it is. Even if looking at JavaScript makes me physically ill. It takes all kinds. It's a good first cut at getting information out of a complicated system. I despair of you ever getting anything useful out of trying to work out how the curation rewards affect things, but that's not your fault – that's because they're completely mad.
When looking at systems like this, you are always going to see some sort of exponential curve. It's the nature of the beast. With proof-of-stake systems in particular, the more stake you have, the more reason there is to invest more, as you come closer to having enough to determine governance decisions. That's before even counting the ability to direct portions of the reward pool, which is valuable in and of itself. In these kinds of environments, the real question is how much that exponential curve changes from period to period.
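One way to put a single number on how much the curve changes from period to period is a summary statistic derived from the Lorenz curve. The sketch below uses the Gini coefficient, which nobody in this thread mentions; it is a swapped-in choice, not the original post's method. It assumes `payouts` is a plain array of non-negative numbers (one per post, or per author if you sum first), and the function names are hypothetical.

```javascript
// Sketch: Lorenz curve points and Gini coefficient for one period's payouts.
// Assumption: `payouts` is an array of non-negative numbers, one per post
// (or per author, if you sum first).
function lorenzCurve(payouts) {
  const n = payouts.length;
  if (n === 0) throw new RangeError("need at least one payout");
  const sorted = [...payouts].sort((a, b) => a - b); // smallest earners first
  const total = sorted.reduce((sum, x) => sum + x, 0);
  const points = [[0, 0]]; // [cumulative population share, cumulative reward share]
  let cumulative = 0;
  sorted.forEach((x, i) => {
    cumulative += x;
    // If nobody earned anything, treat the period as perfectly equal.
    const share = total === 0 ? (i + 1) / n : cumulative / total;
    points.push([(i + 1) / n, share]);
  });
  return points;
}

// Gini = 1 - 2 * (area under the Lorenz curve); the area is summed as trapezoids.
// 0 means everyone earned the same; the maximum for n entries is (n - 1) / n.
function gini(payouts) {
  const points = lorenzCurve(payouts);
  let area = 0;
  for (let i = 1; i < points.length; i++) {
    const [x0, y0] = points[i - 1];
    const [x1, y1] = points[i];
    area += ((x1 - x0) * (y0 + y1)) / 2;
  }
  return 1 - 2 * area;
}
```

For example, `gini([1, 1, 1, 1])` is 0 (perfect equality) and `gini([0, 0, 0, 1])` is 0.75, the maximum for four entries. Computing it per week and comparing the values gives a direct period-to-period measure, while the Lorenz points remain what you would actually chart.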
It might be helpful to use a log scale for the share percentages on the Y axis. Since we know upfront that it's going to be exponential, the real questions are how it differs over the testing window and how much it deviates from the given log base. Overlaying these curves on a single chart with a logarithmic vertical axis would make it easier to pull good information out of the presentation.
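A rough way to measure how much the data deviates from a given log base is to fit a straight line to the logarithm of each share against its population percentile: if shares grow exponentially, the points fall on a line when the Y axis is logarithmic, and the residuals measure the deviation. This is a minimal sketch using ordinary least squares, assuming `payouts` is an array of numbers. It fits individual shares, not the cumulative shares a Lorenz curve plots, and it drops zero payouts because their logarithm is undefined. The function name is hypothetical.

```javascript
// Sketch: fit log10(share) against population percentile with ordinary least
// squares. Exponentially growing shares give a straight line on a log Y axis;
// the residuals (in log10 units) measure how far the data strays from that line.
function fitLogShares(payouts) {
  const total = payouts.reduce((s, x) => s + x, 0);
  const sorted = [...payouts].sort((a, b) => a - b); // smallest share first
  const n = sorted.length;
  const xs = [];
  const ys = [];
  sorted.forEach((p, i) => {
    if (p > 0) {
      xs.push((i + 1) / n); // population percentile in (0, 1]
      ys.push(Math.log10(p / total)); // log10 of this entry's share of the total
    }
  });
  if (xs.length < 2) throw new RangeError("need at least two positive payouts");
  const m = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / m;
  const meanY = ys.reduce((s, y) => s + y, 0) / m;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < m; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  const residuals = ys.map((y, i) => y - (intercept + slope * xs[i]));
  const rmse = Math.sqrt(residuals.reduce((s, r) => s + r * r, 0) / m);
  // 10 ** slope: factor by which the fitted share grows across the whole
  // 0..1 percentile range.
  return { slope, intercept, growthFactor: 10 ** slope, rmse, residuals };
}
```

For perfectly geometric payouts such as `[1, 2, 4, 8]`, `growthFactor` comes out at 16 (a factor of 2 per step, stretched over the full percentile range) and `rmse` is essentially zero. Tracking both values across weeks would show whether the curve is getting steeper or straying further from a pure exponential.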
Good work.
Yes, the focus on recent weeks wasn't because I think that's the most interesting data, just a way to keep things constrained while the project is in its early phases. Once I have more automation in the workflow and more confidence that everything is working properly, I'll have it look at a longer historical record.
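When it's time to look at a longer record, one approach is to bucket posts by week and compute the same statistic for each bucket, which also yields a time series. This sketch assumes each post is an object with a `created` timestamp string and a numeric `payout`; that shape is hypothetical, as are the function names. Timestamps without a zone suffix are treated as UTC (my understanding is that the Steem API returns them that way, but check against your own data).

```javascript
// Sketch: group posts into UTC weeks (starting Monday) so any per-period
// statistic can be tracked over a longer history.
// Assumption: each post looks like { created: "2018-03-01T12:00:00", payout: 1.5 }.
function weekKey(created) {
  // Treat zone-less timestamps as UTC by appending "Z".
  const iso = /(Z|[+-]\d\d:\d\d)$/.test(created) ? created : created + "Z";
  const d = new Date(iso); // an unparseable string makes toISOString() throw below
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay: 0 = Sunday
  const monday = new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() - daysSinceMonday)
  );
  return monday.toISOString().slice(0, 10); // "YYYY-MM-DD" of that Monday
}

// Returns [{ week, value }] sorted by week, where value = summarize(payouts in that week).
function weeklySeries(posts, summarize) {
  const buckets = new Map();
  for (const post of posts) {
    const key = weekKey(post.created);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(post.payout);
  }
  return [...buckets.keys()]
    .sort() // ISO dates sort correctly as strings
    .map((week) => ({ week, value: summarize(buckets.get(week)) }));
}
```

For example, `weeklySeries(posts, (payouts) => payouts.length)` gives posts per week; passing an inequality measure as `summarize` instead gives a week-by-week series you can chart.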
I've never seen a curve like this on a nonlinear scale before, but I suppose it couldn't hurt to see whether it makes the data easier to interpret. It's not clear to me that the curves should be exponential if we're reasoning from first principles, but from eyeballing these, it does seem to be a good approximation. Here's what the charts from the first post look like when I switch the Y axis to a log scale.
This is a pretty good plan. It's also a really good stage to start thinking about what tools you can use to present your findings graphically a little more clearly. Habits you build at this point will serve you well going forward.
In particular, start looking at ways to put these curves on the same graph so they can be compared visually in a very direct way. Giving each curve a different color and a slightly lower line opacity, so you can see where they cross, is a good start. Also, with a logarithmic Y axis you'll really want more than one horizontal gridline, so the casual observer can see that the progression is logarithmic and tightens as it goes up. No more than eight, or it gets really messy in this amount of space; fewer than five I don't find as useful.
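Choosing the gridlines for a log axis can be automated. This sketch (hypothetical function names, not part of any charting library) picks "nice" values from each power of ten, trying progressively denser sets of mantissas until the number of lines falls within the five-to-eight range suggested above. The resulting values could then be passed as gridline or tick positions to whatever charting tool you end up using.

```javascript
// Sketch: pick gridline values for a logarithmic Y axis, aiming for 5-8 lines.
// Candidate "nice" mantissas per power of ten, from sparsest to densest.
const MANTISSA_SETS = [[1], [1, 3], [1, 2, 5], [1, 2, 3, 5, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]];

// All values m * 10^k (m from `mantissas`) inside [lo, hi], in ascending order.
// Fractional values may carry floating-point noise, so round them for labels.
function logTicks(lo, hi, mantissas) {
  const ticks = [];
  for (let k = Math.floor(Math.log10(lo)); k <= Math.ceil(Math.log10(hi)); k++) {
    for (const m of mantissas) {
      const value = m * 10 ** k;
      if (value >= lo && value <= hi) ticks.push(value);
    }
  }
  return ticks;
}

function logGridlines(lo, hi, minLines = 5, maxLines = 8) {
  if (!(lo > 0 && hi > lo)) throw new RangeError("a log axis needs 0 < lo < hi");
  let best = null;
  for (const mantissas of MANTISSA_SETS) {
    const ticks = logTicks(lo, hi, mantissas);
    if (ticks.length >= minLines && ticks.length <= maxLines) return ticks;
    // Remember the set that comes closest to the target range as a fallback.
    const miss = ticks.length < minLines ? minLines - ticks.length : ticks.length - maxLines;
    if (best === null || miss < best.miss) best = { ticks, miss };
  }
  return best.ticks;
}
```

For example, `logGridlines(1, 1000)` returns `[1, 3, 10, 30, 100, 300, 1000]`: one line per decade would give only four, so the 1-3 set is used instead.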
The other day I was reading a really helpful article on graph presentation. It focused on time series rather than snapshots, but you'll want to be doing time series fairly soon if you keep going down this path.
Looking at what's right in front of us, even without laying these curves over one another, we can see that in the newer data the left side starts further to the right, yet it still retains that interesting upward hook at the end, even on a logarithmic scale. A cynical take might be that a lot of the accounts getting the smallest shares of income stopped posting between those periods – but since we don't know who those accounts are (and some of my poking around a couple of years ago suggests that a lot of the very small post/reward collectors aren't human at all), it's impossible to really interpret what that might mean.
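The cynical hypothesis can be partly checked if the raw data is kept per author: list the authors who were in the bottom slice of the earlier period and have no posts at all in the later one. This is a minimal sketch assuming each period is an array of `{ author, payout }` records (a hypothetical shape) and a hypothetical `bottomFraction` cutoff. It only narrows the list; deciding which of those accounts are bots still takes manual checking.

```javascript
// Sketch: authors in the bottom `bottomFraction` of period A (by summed payout)
// who have no posts in period B.
// Assumption: each period is an array of { author, payout } records.
function totalsByAuthor(posts) {
  const totals = new Map();
  for (const { author, payout } of posts) {
    totals.set(author, (totals.get(author) || 0) + payout);
  }
  return totals;
}

function droppedLowEarners(periodA, periodB, bottomFraction = 0.5) {
  const totalsA = totalsByAuthor(periodA);
  const authorsB = new Set(periodB.map((p) => p.author));
  const ranked = [...totalsA.entries()].sort((a, b) => a[1] - b[1]); // lowest first
  const bottom = ranked.slice(0, Math.ceil(ranked.length * bottomFraction));
  const dropped = bottom.filter(([author]) => !authorsB.has(author));
  return {
    bottomCount: bottom.length,
    droppedCount: dropped.length,
    dropped: dropped.map(([author, total]) => ({ author, total })),
  };
}
```

Comparing `droppedCount` to `bottomCount` gives a rough sense of how much of the shift on the left side of the curve comes from small earners leaving rather than earning more.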
Good stuff.