Better Identica Stats
Over the past 24 hours I’ve been working on a way to generate a list of legitimate, active Identica users. The problem, of course, is automatically telling spam from non-spam. As in any social network, though, the only things that are any good at telling the two apart are the users themselves: real users don’t follow spam.
Thus, by starting with one user who follows a fairly large group of people, adding everyone they follow to a list, and then adding everyone those users follow to the list, we get what should be most of the real Identica users. I quickly discovered a problem with this: accounts that auto-follow (or have done so in the past) end up following spam accounts, and scanning them pulls the spam into the list. There is no way to tell through the API whether an account is auto-following. However, by blacklisting a few known auto-following accounts and refusing to scan accounts with over 500 followers, I’ve reduced this problem to almost nothing.
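A minimal sketch of the two-level scan described above, under stated assumptions. The functions `get_following` and `get_follower_count` are hypothetical stand-ins for whatever API calls the real scripts make, and the toy graph and account names are made up for illustration. Blacklisted and over-threshold accounts can still appear in the list if someone follows them; they are simply never scanned.

```python
MAX_FOLLOWERS = 500  # accounts above this follower count are never scanned


def crawl(root, get_following, get_follower_count, blacklist, depth=2):
    """Collect accounts reachable from `root` through 'follows' links.

    get_following(user) and get_follower_count(user) are hypothetical
    callables standing in for real API requests.
    """
    seen = set()       # every account added to the list so far
    scanned = set()    # accounts whose following list was already fetched
    frontier = {root}
    for _ in range(depth):
        next_frontier = set()
        for user in frontier:
            if user in scanned or user in blacklist:
                continue  # skip known auto-followers and repeat scans
            if get_follower_count(user) > MAX_FOLLOWERS:
                continue  # skip accounts over the follower threshold
            scanned.add(user)
            for followee in get_following(user):
                if followee not in seen:
                    seen.add(followee)
                    next_frontier.add(followee)
        frontier = next_frontier
    return seen


# Toy data (invented): "bot" auto-follows spam, "big" has too many followers.
TOY_FOLLOWING = {
    "me": ["a", "b", "bot", "big"],
    "a": ["c"],
    "b": ["c", "d"],
    "bot": ["spam1", "spam2"],
    "big": ["spam3"],
}
TOY_FOLLOWER_COUNTS = {"me": 10, "a": 5, "b": 7, "bot": 3, "big": 600}

result = crawl(
    "me",
    get_following=lambda u: TOY_FOLLOWING.get(u, []),
    get_follower_count=lambda u: TOY_FOLLOWER_COUNTS.get(u, 0),
    blacklist={"bot"},
)
print(sorted(result))  # ['a', 'b', 'big', 'bot', 'c', 'd']
```

None of the spam accounts make it into the result, because the only accounts that follow them are never scanned.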
So I started with my own account at the root. I’m following 283 people, and I’ll make a conservative estimate that the average legitimate user follows around 150 users (I will run some analysis on my result set to find what it really is, but it takes a while). The scan came up with 9167 unique users, but to create a more useful result set I whittled this down to the users who have been active in the last month: 3736.
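The whittling-down step might look something like the sketch below. It assumes "active in the last month" means "posted within the last 30 days" and that each user's most recent post time is already known; both the 30-day window and the `last_post` mapping are assumptions for illustration, not details taken from the real scripts.

```python
from datetime import datetime, timedelta


def active_users(last_post, now, window=timedelta(days=30)):
    """Return the users whose latest post falls within `window` of `now`.

    last_post maps a username to a datetime, or to None if the account
    has never posted (a hypothetical data shape).
    """
    cutoff = now - window
    return {
        user
        for user, posted_at in last_post.items()
        if posted_at is not None and posted_at >= cutoff
    }


# Invented example data.
now = datetime(2009, 6, 1)
last_post = {
    "alice": datetime(2009, 5, 28),  # recent
    "bob": datetime(2009, 3, 1),     # too old
    "carol": None,                   # never posted
}
print(sorted(active_users(last_post, now)))  # ['alice']
```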
I couldn’t really speak for degrees of separation on Identica, but given the relatively small number of users, and the fact that 283 * ~150 is 42,450 while the scan yielded only 9167 unique users, I don’t think any further recursive depth is needed at the moment, especially as there is a real issue with running time due to making so many server requests. Having myself at the root of the calculations probably isn’t the best solution, so I’ll be looking around for the user who follows the most non-spam users.
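The arithmetic behind that argument, spelled out. The 150 figure is the estimate from the post, not a measured value, so the ratio is only a rough indication of how much the second-level lists overlap.

```python
following_root = 283          # accounts followed by the root account
assumed_avg_following = 150   # estimated average, not measured
unique_found = 9167           # unique users the scan actually produced

# Upper bound on second-level accounts if no two lists overlapped at all.
upper_bound = following_root * assumed_avg_following

# Fraction of that upper bound that turned out to be distinct accounts.
distinct_ratio = unique_found / upper_bound

print(upper_bound)               # 42450
print(round(distinct_ratio, 3))  # 0.216
```

Only about a fifth of the estimated links lead to an account not already in the list, which fits the idea that two levels already cover most of the connected user base.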
Having a list of Identica’s active non-spam users is extremely useful for calculating statistics. For example, by regularly running the scripts I’ll be able to calculate a percentage growth figure for active usage, and by tracking newcomers to the list I’ll be able to suggest new legitimate users and, more importantly, track the percentage of new users that stick around. These statistics will be available shortly, once a) I’ve got some hosting and b) I’ve finished coding.
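As a rough illustration of those statistics, here is a sketch that treats each run of the scripts as a set of usernames. The function names and the snapshot data are invented for the example; the real scripts may store and compare results quite differently.

```python
def growth_percent(previous, current):
    """Percentage change in list size between two snapshots."""
    if not previous:
        raise ValueError("previous snapshot is empty")
    return (len(current) - len(previous)) / len(previous) * 100.0


def newcomers(previous, current):
    """Users in the current snapshot that were not in the previous one."""
    return current - previous


def retention_percent(cohort, later):
    """Percentage of a newcomer cohort still present in a later snapshot."""
    if not cohort:
        raise ValueError("cohort is empty")
    return len(cohort & later) / len(cohort) * 100.0


# Invented snapshots of the active-user list from three successive runs.
run1 = {"a", "b", "c", "d"}
run2 = {"a", "b", "c", "d", "e", "f"}
run3 = {"a", "b", "c", "e"}

new_in_run2 = newcomers(run1, run2)
print(growth_percent(run1, run2))            # 50.0
print(sorted(new_in_run2))                   # ['e', 'f']
print(retention_percent(new_in_run2, run3))  # 50.0
```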
Because of how these statistics are calculated, there are things Identica users can do to make the results as accurate as possible:
- Take care not to follow spam accounts.
- Send reports of auto-following/spam-following accounts to me so I can blacklist them.
- Follow new users :-) (This will be easier when I can list some!).
My eventual plan is to create a recommendation/suggestions/featured engine for Identica, but I’m still working out the designs for this.
My scripts are all open-source, but as my projects are rated Freedom on the freedom, Freedom, FREEDOM!!1 scale, I’m not going to release the newest updates until they’re in better and more efficient shape. The new code will soon show up here, where you can currently see the code for my real-time search aggregation and the beginnings of the suggested/featured front-end.