Curating for Value: How "Follower Network Strength" Improves Steem Post Ranking

in Suggestions Club, 6 months ago (edited)

Background


Image by Google Gemini

Last weekend, I published the Steem Follower Checker GitHub repo and posted about it in the article, Introducing "Steem Follower Checker": A new Open Source browser plugin. This concept actually came out of my work on "suggested votes" in the Steemometer, about three months ago. Today, I want to talk some more about the "Follower Network Strength" metric that's at the heart of the browser plugin.

Before I get into the details of the metric itself, let me cover the reasoning behind it. The reasoning goes something like this:

  1. The goal of the community of Steem curators is (or should be) to correctly rank all Steem posts in terms of value.
    • In this context, "value" does not mean "payout". It's an abstract and subjective ranking. In ranking the posts, the goal of curation is to get the "payout" near the "value".
  2. At a high level, the value of a Steem post is (or should be) primarily determined by two factors:
    1. The quality of the post.
      • This includes aspects like originality, writing quality, surprise factor, relevance, information content, attractiveness of the presentation, etc., and it is best measured by a human, an AI, and/or some combination of the two.
    2. The reach, virality, or engagement of the post.
      • Ideally, we could measure this with view counters, but that information is not available on the blockchain (also, view counters can be gamed), so we need a way to estimate it. @realrobinhood estimates it by counting resteems and comments.
      • The "Follower Network Strength" is another way to estimate this in a way that lets the curator attempt to predict virality as a function of the author's historical ability to build a follower network.
      • The benefit of "Follower Network Strength" in contrast to measuring comments and resteems is that it (hopefully) gives an earlier estimate of a post's potential for virality.
    3. Note that I am not including factors such as token burning or club status in the post value. In my opinion, these should be viewed as levers for audience building, but they do not directly impact the value of a post.
  3. As blockchain usage patterns change and possible improvements are discovered, the "Follower Network Strength" metric will need repeated adjustments.
    • Accordingly, I have set a goal to publish a new version at the end of June, and then quarterly after that.
  4. If curators adopt this metric and it starts to affect authors' payouts, changes will become controversial. So, transparency is required.
  5. The current "Follower Network Strength" score is based on two values that can be easily pulled from the blockchain:
    1. The number of followers
    2. The median reputation of followers
    • These values are combined into a score that ranges from 0.01 to 1.414 (the square root of 2).
  6. The two main improvements that I think are needed for the end of June are:
    1. Follower networks are composed of active and inactive followers, so not all follower counts are equal. For example, someone who raised 2,000 followers in six months should probably get a higher score than someone who raised 2,000 followers in six years, since a higher percentage of the second author's followers are likely to be inactive.
    2. It is too easy for an author to get a maximum score.
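As a rough illustration of how two such values could be combined into a score with the stated range, here is a minimal Python sketch. It is not the plugin's actual formula, which is described in the introductory post. The Euclidean combination, the `normalize` helper, and all four bounds are assumptions, chosen only because they produce a floor of 0.01 and a ceiling of the square root of 2.

```python
import math
from statistics import median

# All four bounds are hypothetical placeholders, not the plugin's settings.
MIN_FOLLOWERS, MAX_FOLLOWERS = 0, 2000
MIN_REPUTATION, MAX_REPUTATION = 25, 70


def normalize(value, low, high):
    """Scale value linearly into [0, 1], clamping anything outside [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def network_strength(follower_count, follower_reputations):
    """Combine follower count and median follower reputation into one score."""
    if not follower_reputations:
        return 0.01
    x = normalize(follower_count, MIN_FOLLOWERS, MAX_FOLLOWERS)
    y = normalize(median(follower_reputations), MIN_REPUTATION, MAX_REPUTATION)
    # Euclidean length of the point (x, y); at most sqrt(2) when both are 1.
    return max(0.01, math.hypot(x, y))


# Both inputs reach their upper bounds, so this prints about 1.414.
print(network_strength(2000, [70, 72, 68]))
```

With a combination like this, any account that reaches both upper bounds gets the maximum, which is one way a top score could end up being too easy to reach if the bounds are set low.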

So there's the background. The current methodology is described in Introducing "Steem Follower Checker": A new Open Source browser plugin, and - in the spirit of transparency - here's how I intend to address those two topics for the next update.

Planned Updates for the Next Version

Attempting to adjust for inactive followers

I have known about the problem of inactive followers for a while. The naïve solution is to check the follower accounts' recent activity and adjust the follower counts accordingly. Unfortunately, this involves numerous network calls, so I believe that it's unsuitable for a browser plugin. As a result, I was stuck for a while without any feasible ideas.

Last week, I had another thought, though.

What we can do without adding much network activity is check the age of the author's account, and then average the number of followers into a time-based average over the life of the account (i.e. new followers per day, new followers per week, new followers per month, etc.). Then, we can replace the follower count with the time-averaged follower count in the initial calculation.

If we do this, then the author who collects 2,000 followers in six months gets a higher score than the author who collects 2,000 followers in six years. Additionally, if an account goes inactive and their follower count stops growing, then their "Follower Network Strength" will gradually decline.
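The time-averaging idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 30.44-day month, the one-month minimum age, and the example dates are all hypothetical and are not taken from the plugin's code.

```python
from datetime import datetime, timezone

DAYS_PER_MONTH = 30.44  # average month length; an assumption for this sketch


def followers_per_month(follower_count, created, now):
    """Average number of followers gained per month over the account's life."""
    age_days = (now - created).total_seconds() / 86400
    # Treat very young accounts as one month old so the rate is not inflated.
    age_months = max(1.0, age_days / DAYS_PER_MONTH)
    return follower_count / age_months


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
# Two hypothetical authors, each with 2,000 followers.
six_months = followers_per_month(2000, datetime(2023, 12, 1, tzinfo=timezone.utc), now)
six_years = followers_per_month(2000, datetime(2018, 6, 1, tzinfo=timezone.utc), now)
print(round(six_months), round(six_years))
```

Because the account age keeps growing in the denominator, an account whose follower count stops increasing sees this rate fall over time, which is the gradual decline described above.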

Making it more challenging for an author to receive a top score

The naïve solution is just to play with the minimum and maximum settings and look at how they change the heatmap. I had actually started down this path before I thought of the time-averaged follower count. However, implementing the time-averaged follower count means that all parameter settings will need to be changed. Therefore, these changes would become obsolete.

Instead, what I'll need to do is to start almost from scratch with the min and max parameters after replacing the total follower count with the time-averaged follower count and then tune the values according to the updated heat maps.

Summary

In conclusion, the planned changes for June are as follows:

  1. Change "follower count" to "time-averaged follower count" as one factor of the "Follower Network Strength" calculation.
  2. Tune parameters to award top scores to a smaller proportion of accounts.

After soliciting feedback from ChatGPT and Google Gemini, one point that I also want to emphasize is that this metric is not intended to be a standalone value. Instead, the idea is for curators to incorporate this metric as just one factor of many that feeds into their voting and following decisions.

Thank you for your attention. Please let me know if you have any feedback or suggestions.


Thank you for your time and attention.

As a general rule, I up-vote comments that demonstrate "proof of reading".




Steve Palmer is an IT professional with three decades of professional experience in data communications and information systems. He holds a bachelor's degree in mathematics, a master's degree in computer science, and a master's degree in information systems and technology management. He has been awarded 3 US patents.


image.png

Pixabay license, source

Reminder


Visit the /promoted page and #burnsteem25 to support the inflation-fighters who are helping to enable decentralized regulation of Steem token supply growth.


Follower networks are composed of active and inactive followers, so not all follower counts are equal. For example, someone who raised 2,000 followers in six months should probably get a higher score than someone who raised 2,000 followers in six years, since a higher percentage of the second author's followers are likely to be inactive.

In principle, I would agree with this, because it can be assumed that new followers are constantly being added to active (even older) accounts.

I have to point out an older quote from you in this context:

I'm seeing that it's probably far too easy to get a top score.

Wouldn't your new approach with a bunch of spam accounts lead to the top even more easily?

and then average the number of followers into a time-based average over the life of the account (i.e. new followers per day, new followers per week, new followers per month, etc.).

I was wondering how you want to determine the number of followers per day (i.e. for a certain period of time), etc. I probably had the same difficulties in understanding as event-horizon. However, a look at your new code revealed to me that you calculate the number over the entire account lifetime.

It would be more interesting and comparable to look at the same time period for all accounts, but this would not be possible with the existing blockchain methods (or only with more network load).
However, Steemchiller's data service could be useful here once again. Among other things, it offers a follower history that can also be filtered by time period:
https://sds0.steemworld.org/followers_api for Follow and Unfollow
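To make the "same time period for all accounts" idea concrete, here is a minimal Python sketch that counts net new followers inside a fixed window. The `(timestamp, action)` event format is hypothetical and is not the response format of the data service mentioned above; fetching and parsing the real follower history is left out.

```python
from datetime import datetime, timedelta, timezone


def net_new_followers(events, window_end, window_days=30):
    """Count follows minus unfollows inside the window ending at window_end.

    events is an iterable of (timestamp, action) pairs, where action is
    "follow" or "unfollow". This shape is hypothetical, not an API response.
    """
    window_start = window_end - timedelta(days=window_days)
    net = 0
    for timestamp, action in events:
        if window_start < timestamp <= window_end:
            if action == "follow":
                net += 1
            elif action == "unfollow":
                net -= 1
    return net
```

Using the same window for every account would make the resulting numbers directly comparable, at the cost of needing the follower history for each account.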

 6 months ago 

Wouldn't your new approach with a bunch of spam accounts lead to the top even more easily?

So far, that doesn't seem to be the case. In general, it has lowered the scores on most of the accounts that I've checked. The big problem with this new scoring has been that there is sort of a dumbbell-shaped distribution, with a number of the big/old/inactive accounts still retaining a top score. The more recent accounts that I've checked all look fairly reasonable, but there are a number of accounts who still appear to have strong follower networks, even though they've been inactive for 4+ years.

However, Steemchiller's data service could be useful here once again. Among other things, it offers a follower history that can also be filtered by time period:
https://sds0.steemworld.org/followers_api for Follow and Unfollow

Thanks! This might be a good way of dealing with that dumbbell shape. I might not get that in until September, though. I think my next step is to write up a script to collect all the accounts for one day, then score and visualize them so I can gauge the score distribution for the accounts who are actively posting. I'm doubtful about whether I'll find time for both before the end of June.

but there are a number of accounts who still appear to have strong follower networks, even though they've been inactive for 4+ years.

That wouldn't really make sense, though.

I might not get that in until September

There's no rush. I also don't have much time at the moment (when it's warmer outside). There are soo many other tasks on my list :-))

collect all the accounts for one day

Did you really mean all accounts? That would of course be a good basis, especially if the activity is recorded at the same time. Once you have the data, you can also work better with minimal changes without having to request the data every time...

 6 months ago 

Here are two visualizations of all the accounts that posted or commented during a 24 hour period from early yesterday until early today using the scoring that I had posted last weekend. With my setup, it takes about 6 hours to collect and score a day's worth of accounts. The median score is about 0.225.

Followers / month across the X-axis, median follower reputations along the Y. Grey is the low end of the scale, orange is the top end.

In this visual, the bigger circles have more accounts in the range. The color gradients are the same as above.

Most of the circles outside the main cluster are size=1. The fair-sized circle at [0, 0] was sort of a surprise. I didn't check, but I'm guessing that the big groups with the 61 median rep and low follower counts are nearly all followed by the same account.

I was able to correctly guess the one in the [392-408, 24-26] bucket. I'm guessing that the 0.01 score tells us something about how the account got its ~33k followers.
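For readers who want to reproduce this kind of summary, here is a minimal Python sketch of bucketing scored accounts into a grid and taking the median score. It is an assumption-laden illustration, not the script used for the visuals above: the bin widths (8 followers per month, 2 reputation points), the function names, and the `(followers_per_month, median_reputation, score)` tuple layout are all hypothetical.

```python
from collections import Counter
from statistics import median


def bucket(value, width):
    """Return the lower edge of the fixed-width bin that contains value."""
    return int(value // width) * width


def summarize(accounts, x_width=8, y_width=2):
    """Count accounts per (followers/month, median reputation) bin.

    accounts is a list of (followers_per_month, median_reputation, score)
    tuples. Returns the per-bin counts and the median score.
    """
    counts = Counter(
        (bucket(fpm, x_width), bucket(rep, y_width)) for fpm, rep, _score in accounts
    )
    return counts, median(score for _fpm, _rep, score in accounts)
```

The per-bin counts correspond to circle sizes in a bubble chart, and the bin's lower edges serve as its label, so an account with 50 followers per month would land in a bin labeled 48 under these widths.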

I think the score already works quite well. The extreme cases at the edges (top and left) will probably be interesting and important for the "settings"... the clusters should be equalised with the "settings" in order to be able to differentiate them better.

I find the value at [48,24] remarkable. Although the follower median is very low and the number of monthly followers is not very high, it has a high score. This must then probably compensate for the low median with an above-average number of followers.

I also find it interesting that the score in the "columns" of the X-axis differs only slightly along the Y-axis. I'm not sure what this means. The score seems to be more influenced by the median than the total number of followers. Does this indicate that over all accounts, the number of new followers could be almost constant, and thus the age of the accounts (for active accounts) could have almost no influence?

 6 months ago 

I think the score already works quite well.

Agreed. There's certainly still room for improvement, but this feels much better to me than the previous version. The big thing that I wanted to do was make it harder to get a top score, and I think I probably accomplished that. IMO, top scores should be pretty rare. In general, I also think that the overall distribution of scores is much better than before, but I can't be sure based on the spotty checking that I did with the May 18 version.

I just updated my Python script and Power BI report so I can now collect a day's worth of comments, score them with both the old and new methods, and then compare the visualizations. I'll run it overnight so that over the weekend I'll be able to see what the two methods look like side by side with the same set of accounts from a full day of posting/commenting.

I find the value at [48,24] remarkable.

Yeah, I'm not sure what I think about this one. On one hand, 48 followers per month (the real value is 50, but it's in the 48 bin) really is uncommonly high, so the score might not be unreasonable. On the other hand, the follower quality still seems low. I reduced the reputation cut-off back down from 30 to 25 in this version. I might bump it back up, at least part way.

Does this indicate that over all accounts, the number of new followers could be almost constant, and thus the age of the accounts (for active accounts) could have almost no influence?

This is kind of what I had in mind by switching to followers per month. The hope was that an author would need to keep gaining followers with similar reputations in order to maintain their score. If an account stops gaining followers (or slows down), then the score would decay. As implemented, it seems to me that new accounts might have a small advantage, but that might not be too bad. It could potentially give a little bit of a boost to new arrivals.

TEAM 5

Congratulations! This post has been upvoted through steemcuratorXX We support quality posts, good comments anywhere, and any tags.



Curated by : @soulfuldreamer

Just installed the extension in my browser and checked my follower strength.

average the number of followers into a time-based average over the life of the account (i.e. new followers per day, new followers per week, new followers per month, etc.).

I'm sorry if I didn't get it right... Do you mean the network strength should be calculated in specified time stamps? Like event-horizon's follower network strength in July 2022 or from Jan 2021 to Sep 2021?

I do have a lot of inactive followers and so does any user who joined years later than me. People keep joining and quitting all the time so if we are to check the overall network strength then shouldn't only active followers be factored in even while calculating the time-averaged follower count?

(If there's absolutely no activity for over 3 months then the follower/user should be rendered inactive.)

These are just my initial thoughts. I'm sure you already considered everything, so my apologies if I sound too naive.

Edited to add:

It just occurred to me that you weren't talking about network strength in a specified time. Instead, the rate at which a user gains new followers in different time frames, right?
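The "no activity for over 3 months" rule suggested in this comment could look like the following Python sketch. It assumes a mapping from follower name to last-activity time is already available; that mapping, the function name, and the 90-day default are hypothetical, and building the mapping is the part that would require a network call per follower.

```python
from datetime import datetime, timedelta, timezone


def count_active_followers(last_activity, now, max_idle_days=90):
    """Count followers whose most recent activity is within max_idle_days of now.

    last_activity maps follower name -> datetime of the latest post, comment,
    or vote. The mapping is hypothetical; building it is the costly part.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sum(1 for seen in last_activity.values() if seen >= cutoff)
```

The filter itself is cheap; the cost lies entirely in collecting one activity timestamp per follower.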

 6 months ago (edited)

Here are my results 🤔:

image.png

Does this mean I have more inactive followers than you?

I think this only means you have fewer followers. I joined earlier so obviously I have more followers and probably many more inactive followers than you.

In the current version of this extension, remlaps hasn't taken inactive followers into account. He's working on it. Once updated, your network strength would be more, in my opinion.

 6 months ago 

This is correct. The current version only looks at total number of followers. The next version will average the follower count over the age of the account in order to get a (very) rough estimate for the declining influence of inactive followers. (Actually counting the inactive followers would involve too much network activity, so for now an estimate is the best I can do.)

It just occurred to me that you weren't talking about network strength in a specified time. Instead, the rate at which a user gains new followers in different time frames, right?

Yes, I think that's right. Basically, I'll be scoring it based on "new followers per month" on one axis, and "median follower reputation" on the other.

cc: @o1eh, FYI

Here's an example with my account, though the June 30 version will continue to change during the next month:

May 18 version | DRAFT June 30 version (still being revised)

Ahh... right. Thank you for explaining. Looking forward to the new version.

 6 months ago 

Thank you, @soulfuldreamer!

 6 months ago 

Thank you, because I did not understand everything that was described in the post 😆

Thank you for your support. (:

This post has been featured in the latest edition of Steem News...

Upvoted. Thank You for sending some of your rewards to @null. It will make Steem stronger.

Hello @remlaps :)

I'm @lyh5926, a member of @h4lab.witness who is a witness to Steem Blockchain.

As a Korean Witness, we provide steem Statistics and Haircuts, and steemit Guides on our site, and we also operate a Voting service.

We ran for witness this time and are currently ranked 21st in witness rank with the support of many witnesses.
With your support, we would like to be the main witness in the top 20 and serve as a helpful witness for the Steem Blockchain.
Please vote for @h4lab.witness :)

We are running our dapp service for steemit users. We offer steem statistics (steem blockchain, haircut, user stats), voting service and APR.
(https://h4lab.com/?lang=en)

It is also listed on dappradar.
(https://dappradar.com/dapp/h4lab)

We also post daily steem stats on X (https://x.com/H4LAB_twit?t=-FOVir_ZvLb9FiUHWxVCOQ&s=09)

We are also now developing Posting Tool & NFT projects.

[H4LAB] Introducing 2024 Our New Project🔥

Witness Vote Link : https://steemitwallet.com/~witnesses

 6 months ago 

Hi, do you run a public API node?

Not yet. However, we will also run API nodes for our project this year, the development of stable posting tools :)
Therefore, we are trying to be the main witnesses and secure financial resources.
