RE: [deardiary] 1 day out of 364 does not make for a goodyear , dear
@checky is written in JavaScript using steemjs, but I'm sure the same overall logic applies to Python and steempy. Your situation is way trickier than mine, though, since you actually deal with upvotes and money transfers, so I feel like you definitely should do error checking! It's not fun, but it's essential, especially when money is involved. I don't know how complicated it is in Python since that's a language I still have to learn, but in JS it's quite easy to check for errors.
The version currently running doesn't actually parse the post's body; it makes use of the users array that some apps add to the post's json_metadata. The next version will parse the body to find mentions. I've been testing it for two days now and I feel like it should be ready to deploy by tomorrow. Mentions found in the post get filtered so that I pick up as few non-Steem mentions as possible. For example, mentions with a social-network-related word close to them are filtered out, and mentions that end with a popular domain extension, like @website.com, are filtered out too (because some people use the '@' character to say 'at'). As for filtering out all existing users, it does check the mentions against a JSON file that contains a user object for each user it encountered while streaming operations. The file fills up pretty quickly. For example, the user object associated with your username is the following (on my computer):
{
"mode": "regular",
"ignored": [],
"delay": 0,
"occurrences": 0,
"mentioned": []
}
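To make the mention filtering described above a bit more concrete, here is a rough sketch of how it could look. It is not @checky's actual code: the regex, the word list, the extension list, and the 30-character window (`SOCIAL_WORDS`, `DOMAIN_EXTENSIONS`, `WINDOW`) are all assumptions made up for illustration.

```javascript
// Hypothetical lists: the words and extensions the real bot uses are not given here.
const SOCIAL_WORDS = ['twitter', 'instagram', 'facebook', 'telegram', 'discord'];
const DOMAIN_EXTENSIONS = ['.com', '.net', '.org', '.io'];
const WINDOW = 30; // characters inspected on each side of a mention (assumed)

// Returns the unique mentions in `body` that survive both filters.
function findMentions(body) {
  // Simplified pattern: lowercase letters, digits, dashes and dots after '@'.
  const pattern = /@([a-z][a-z0-9.-]*[a-z0-9])/g;
  const found = new Set();
  let match;
  while ((match = pattern.exec(body)) !== null) {
    const name = match[1];
    // Skip things like "@website.com", where '@' stands for "at".
    if (DOMAIN_EXTENSIONS.some((ext) => name.endsWith(ext))) continue;
    // Skip mentions that have a social network word close to them.
    const before = body.slice(Math.max(0, match.index - WINDOW), match.index);
    const after = body.slice(pattern.lastIndex, pattern.lastIndex + WINDOW);
    const context = (before + ' ' + after).toLowerCase();
    if (SOCIAL_WORDS.some((word) => context.includes(word))) continue;
    found.add(name);
  }
  return [...found];
}
```

Calling `findMentions` on a post body gives back the candidate usernames that would then be checked against the JSON file. A fixed character window is only one way to define "close to"; counting words instead would work just as well.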
There may be over a million accounts, but most of them are inactive, and since they don't make any operations they are never added to the file. Last time I checked, the JSON file on my computer had about 70k usernames, but the one used by the bot on the server probably has way more than that. If the bot can't find the username in the file, it uses the Steem API to look up the account name (lookup_account_names). If the API returns null for the account, the account doesn't exist and the bot broadcasts a comment. If it returns some data, the account exists, so the bot adds it to the JSON file with the default settings. "occurrences" and "mentioned" don't exist in the version currently running; they are both used for mention correction in the next version of the bot. I'm using Peter Norvig's spelling corrector algorithm, which is a bit of a brute-force approach but works pretty well. "occurrences" is the number of times you were mentioned and "mentioned" is an array of users you mentioned at least once. I may drop that array one day if it makes the file too heavy.
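The lookup flow and the correction step described above could be sketched roughly like this. It's a minimal illustration under a few assumptions, not the bot's real code: `lookupAccount` is a hypothetical stand-in for the lookup_account_names call (passed in as a function so the sketch doesn't depend on any particular steemjs signature), the JSON file is assumed to be loaded as a plain object keyed by username, and the correction is a simplified single-edit take on Norvig's approach that ranks known usernames by "occurrences". When exactly "occurrences" gets incremented is also my guess.

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789-.';

// Default settings for a newly seen account, mirroring the object shown earlier.
function defaultUser() {
  return { mode: 'regular', ignored: [], delay: 0, occurrences: 0, mentioned: [] };
}

// All strings one edit away from `word`: deletes, transposes, replaces, inserts.
function edits1(word) {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const left = word.slice(0, i);
    const right = word.slice(i);
    if (right) results.add(left + right.slice(1)); // delete
    if (right.length > 1) {
      results.add(left + right[1] + right[0] + right.slice(2)); // transpose
    }
    for (const c of ALPHABET) {
      if (right) results.add(left + c + right.slice(1)); // replace
      results.add(left + c + right); // insert
    }
  }
  return results;
}

// Picks the known username one edit away with the most occurrences, or null.
function suggest(name, users) {
  let best = null;
  for (const candidate of edits1(name)) {
    if (candidate === name) continue;
    if (!Object.prototype.hasOwnProperty.call(users, candidate)) continue;
    if (best === null || users[candidate].occurrences > users[best].occurrences) {
      best = candidate;
    }
  }
  return best;
}

// Checks one mention against the in-memory copy of the JSON file.
// `lookupAccount(name)` is hypothetical: it must return a promise that resolves
// to the account data, or to null when the account does not exist.
async function checkMention(name, users, lookupAccount) {
  if (Object.prototype.hasOwnProperty.call(users, name)) {
    users[name].occurrences += 1; // assumed: counted on every mention
    return { exists: true, suggestion: null };
  }
  const account = await lookupAccount(name);
  if (account !== null) {
    // Account exists but was never seen: add it with the default settings.
    users[name] = defaultUser();
    users[name].occurrences = 1;
    return { exists: true, suggestion: null };
  }
  // Account does not exist: this is where the bot would broadcast a comment.
  return { exists: false, suggestion: suggest(name, users) };
}
```

With a file that already knows "alice", checking the mention "alcie" would hit the API fallback, get null back, and come out with "alice" as the suggested correction. Norvig's original also considers candidates two edits away, which is left out here to keep the sketch short.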