Mining the “Influencers” using Graph Neural Networks (GNN)

Dec 13, 2020

Written by Prasun Biswas and Chandan Durgia

Some anniversaries are bittersweet but worth remembering and cherishing. It was three years ago this month that we killed our entrepreneurial dream (maybe temporarily).

For over a year and a half, we had been struggling with our professional and, more importantly, our personal lives. We started our journey with big yet pragmatic goals in mind and strongly believed our idea was an astounding one (it still is). However, we lacked experience in some areas of business management, and it was indeed overwhelming to manage so many core functions: business, finance, HR, IT, marketing, legal and what not!!

Even the worst of fables has glimpses of happiness, and for us, our marketing success is a story we still find worth sharing today. It involved many good, not-so-common commonsense decisions, as well as some decisions that were strongly backed by data. The data scientist in us helped in many ways that many of our entrepreneurial peers struggle with.

One of the many marketing questions we had to answer was: how can we increase the outreach of our products and services?

Like many other entrepreneurs, we were always short of capital, and on the marketing front we found many innovative (and, importantly, cheap) ways that were giving us remarkable success.

Given the nature of the products and services we offered, we unanimously decided to go for a few good influencers, and then the search and analysis started. This time we had to pay a considerable amount, so the answer had to be nothing short of remarkable.

Fast forward: the analysis was simple but somewhat state of the art at that point, and it was worth the effort given the success it brought along. This article captures how we went about it and includes some conceptual and coding details. So, here was our somewhat special sauce.

Before we dive deep into our analysis, let’s have a quick overview of Graph Neural Networks.

In the machine learning landscape, data points are seldom truly independent. In fact, there is typically rich information in the connections between data points, which is lost under the assumption of independence. Imagine the data representing the social network of an individual, as below:

The image shows an arbitrary, unstructured plot of a social media user’s network on a vector space layout. Graph Neural Networks (GNNs) are one of the most efficient ways to represent and analyze this form of data.
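The core idea that lets deep learning operate on graphs is message passing: each node updates its representation by aggregating its neighbors’ features. Below is a minimal plain-Python sketch of one such layer. It uses a simplified mean aggregation (the original GCN uses symmetric degree normalization instead), and the toy graph, features and hand-picked identity weight matrix are hypothetical; a real GNN would learn W, typically with a library such as PyTorch Geometric or DGL.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]

def matvec(W, vec):
    # Multiply a weight matrix (list of rows) by a feature vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in W]

def message_passing_layer(adj, features, W):
    """One simplified GCN-style layer: average self + neighbor features, then apply W and ReLU."""
    updated = {}
    for node, neighbors in adj.items():
        group = [node] + list(neighbors)  # include the node itself (a self-loop)
        dim = len(features[node])
        mean = [sum(features[n][k] for n in group) / len(group) for k in range(dim)]
        updated[node] = relu(matvec(W, mean))
    return updated

# Hypothetical toy network: a chain A - B - C with 2-d node features.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic easy to follow

h1 = message_passing_layer(adj, features, W)
print(h1["B"])  # [0.6666666666666666, 0.6666666666666666]
```

Stacking such layers lets information from nodes several hops away flow into each node’s representation, which is what makes the connections between data points usable by a model.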

Recent research has shown many successful applications of GNNs on tasks such as node classification, link prediction, clustering and learning high-quality embeddings, by allowing deep learning methods to operate directly on graphs. To understand GNNs, it’s important to understand graph theory. Here is an interesting read on the basics:

https://www.analyticsvidhya.com/blog/2018/09/introduction-graph-theory-applications-python/

For our defined geography and target demographic (a certain age group), we started by finding a set of people on Twitter/Facebook who were interested in our area of services, and that became our “universe” graph.
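A minimal sketch of how such a universe graph could be assembled, assuming the raw data is a list of (follower, followee) pairs plus a set of users already flagged as interested. The helper name build_universe, the user ids and the choice to treat every follow as an undirected tie are illustrative assumptions, not the exact pipeline we used.

```python
from collections import defaultdict

def build_universe(follows, interested):
    """Hypothetical helper: keep only follow edges between interested users.

    follows: iterable of (follower, followee) pairs.
    interested: set of user ids that showed interest in our area of services.
    Returns an undirected adjacency dict {user: set_of_neighbors}.
    """
    adj = defaultdict(set)
    for follower, followee in follows:
        if follower == followee:
            continue  # ignore self-loops
        if follower in interested and followee in interested:
            # Assumption: a follow in either direction counts as one undirected tie.
            adj[follower].add(followee)
            adj[followee].add(follower)
    for user in interested:
        adj.setdefault(user, set())  # keep interested users without ties as isolated nodes
    return dict(adj)

# Illustrative data: u9 is not in the interested set, so its edge is dropped.
follows = [("u1", "u2"), ("u2", "u3"), ("u3", "u9"), ("u4", "u1")]
universe = build_universe(follows, {"u1", "u2", "u3", "u4"})
print(sorted(universe["u1"]))  # ['u2', 'u4']
```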

Depending on the budget available, we divided the analysis into two phases:

1. With a smaller budget: find one influencer who can maximize the outreach. (In scope for this article)

2. With a larger budget: find a group of influencers who can maximize the outreach. This problem is not the same as #1, because here the objective is to find influencers who can collectively influence the largest possible network. The measure here is “collective influence”, which is not the sum of individual influences. (Out of scope for this article)
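A toy calculation shows why collective influence is not the sum of individual influence. Purely as an assumption for illustration, an influencer’s reach here is just their set of direct followers, and the account names are hypothetical.

```python
# Hypothetical follower sets; "reach" = distinct users who follow at least one chosen influencer.
followers = {
    "influencer_A": {"u1", "u2", "u3", "u4"},
    "influencer_B": {"u3", "u4", "u5"},
    "influencer_C": {"u6", "u7"},
}

def collective_reach(chosen):
    """Count distinct users reached by a group of influencers (overlaps counted once)."""
    reached = set()
    for name in chosen:
        reached |= followers[name]
    return len(reached)

print(len(followers["influencer_A"]) + len(followers["influencer_B"]))  # 7
print(collective_reach(["influencer_A", "influencer_B"]))  # 5, because u3 and u4 overlap
print(collective_reach(["influencer_A", "influencer_C"]))  # 6
```

Pairing A with the smaller but non-overlapping C reaches more people than pairing A with the larger B, which is why the group problem needs its own treatment rather than simply taking the top individual influencers.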

Finding One Amazing Influencer

Though the objective sounds simple, when we started breaking it down further, we faced the following questions:

1. Should we find the person who has the maximum “direct” connectivity with others? “M” below has the highest “direct” connectivity. In graph theory, this is termed “Degree Centrality”. One can think of “Degree Centrality” as the sum of the number of followers and followings.

2. Should we find the person who might have fewer direct connections but whose average distance to all the other nodes in the network is the smallest? “K” below doesn’t have the highest direct connectivity but is, on average, closest to all the other nodes in the universe. In graph theory, this is termed “Closeness Centrality”. Closeness Centrality ($C_i$) is measured by:

$$C_i = \frac{N - 1}{\sum_{j \neq i} d(i, j)}$$

where $N$ is the number of nodes and $d(i, j)$ is the length of the shortest path between nodes $i$ and $j$.

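3. Should we find the person who sits in the center and acts as a connection between various groups of nodes? “H” below is the center of the “universe”, connecting two big groups on either side. In graph theory, this is termed “Betweenness Centrality”. As one would expect, most of the information flowing between those groups would essentially pass through this node. Betweenness Centrality ($b_i$) can be defined as:

$$b_i = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(i)$ is the number of those paths that pass through node $i$.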

4. Should we find the person who might not have many connections themselves but is connected to other nodes that are highly connected, i.e., to multiple high-degree nodes? “M” below is connected to multiple nodes that are highly connected to others in the universe. In graph theory, this is termed “Eigenvector Centrality”.
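To make the first three measures concrete, here is a small pure-Python sketch on a hypothetical graph of two tightly knit groups joined through a single node “H” (our own toy example, not the network in the figures). Closeness uses the (N - 1) / sum-of-distances form and assumes a connected graph; betweenness uses Brandes’ algorithm and counts each unordered pair of endpoints once. Libraries such as NetworkX offer degree_centrality, closeness_centrality and betweenness_centrality, though their default normalizations may differ from the raw values below.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree_centrality(adj):
    """Number of direct connections of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(N - 1) / sum of shortest-path distances; assumes a connected graph."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical graph: triangle A-B-C and triangle D-E-F, bridged by H (C - H - D).
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "H"],
    "H": ["C", "D"],
    "D": ["H", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
deg, clo, btw = degree_centrality(adj), closeness_centrality(adj), betweenness_centrality(adj)
print(max(deg, key=deg.get))  # C (tied with D at 3 links)
print(max(clo, key=clo.get))  # H
print(max(btw, key=btw.get))  # H
```

Degree favors the best-connected members of each group, while closeness and betweenness both single out the bridge H, which is the distinction the questions above are drawing.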

In layman’s terms, centrality is just a measure of the importance of a node in a graph. When we heard the term, the usual measures of central tendency in statistics (mean, median, mode) came to mind, and in a way we were able to relate the measures above to them.

After deep deliberation, we went ahead with Eigenvector Centrality, as it suited our purpose best: we wanted to find the person who is connected to other influential, high-“degree” nodes.
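Eigenvector centrality scores a node by the scores of its neighbors, $\lambda x_i = \sum_j A_{ij} x_j$, so the ranking comes from the leading eigenvector of the adjacency matrix $A$. Below is a minimal power-iteration sketch, assuming an undirected, connected graph stored as an adjacency dict. Iterating on (A + I) rather than A is our own choice here: it keeps the same leading eigenvector but avoids oscillation on bipartite graphs. The toy graph is hypothetical: node M has only two links, but both go to hubs.

```python
import math

def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I); returns scores normalized to unit Euclidean length."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        # (A + I) x: each node keeps its own score and adds its neighbors' scores
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Hypothetical graph: hubs P and Q each have three followers; M links only to P and Q;
# p1 has three links, but two of them go to leaves r1 and r2.
adj = {
    "P": ["p1", "p2", "p3", "M"], "Q": ["q1", "q2", "q3", "M"], "M": ["P", "Q"],
    "p1": ["P", "r1", "r2"], "p2": ["P"], "p3": ["P"],
    "q1": ["Q"], "q2": ["Q"], "q3": ["Q"], "r1": ["p1"], "r2": ["p1"],
}
scores = eigenvector_centrality(adj)
print(len(adj["M"]), len(adj["p1"]))  # 2 3
print(scores["M"] > scores["p1"])     # True
```

M has fewer links than p1 yet scores higher, because its neighbors are themselves well connected; that is exactly the property we were looking for in an influencer.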

We also learnt that Google’s PageRank is built on a variant of this measure. Our naivety!!
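PageRank can be seen as a variant of eigenvector centrality for directed graphs: each node splits its score evenly across its out-links, and a damping factor (commonly 0.85) mixes in a uniform “random jump” so the iteration converges. A minimal power-iteration sketch, assuming a small directed graph stored as {node: [out-links]}; the graph and node names are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration for PageRank; out_links maps each node to the nodes it links to."""
    nodes = list(out_links)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Every node first receives its share of the random-jump probability.
        new = dict.fromkeys(nodes, (1.0 - damping) / n)
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node (no out-links): spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical graph: "a" -> "c" means a endorses (links to) c.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```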

Interestingly, the influencer our analysis pointed to was one of the top five we already had in mind. However, as every entrepreneur would vouch, when something is backed by data it gives much more confidence, especially when a significant amount of money is at stake.

Luckily, this influencer turned out to be a huge factor in our revenue growth, and we were almost boastful of our analysis and our ability to make data-driven marketing decisions.

In Neo4j, the same can be computed as:

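```cypher
// Legacy Neo4j Graph Algorithms plugin (algo.*); newer setups use the Graph Data Science library instead.
// Arguments: node label, relationship type, config map.
CALL algo.eigenvector.stream('Name', 'LINKS', {})
YIELD nodeId, score
RETURN algo.asNode(nodeId).name AS page, score
ORDER BY score DESC
```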

Finding Multiple Influencers

The next challenge was a bigger one: we wanted to find a set of influencers who could together maximize our outreach. Again, note that this was not just about finding the top “n” individual influencers, which we had already obtained from our first analysis. For this, we used a number of techniques, namely the Linear Threshold Model (LTM), the Independent Cascade Model (ICM) and the Weighted Cascade Model (WCM), and this further improved our brand outreach and revenue significantly. (A detailed article on this is coming soon!)
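Of these cascade models, the Independent Cascade Model is the simplest to sketch: when a node becomes active, it gets a single chance to activate each inactive neighbor with probability p, and the expected spread of a seed set is estimated by Monte Carlo simulation. The sketch assumes a directed graph {account: [accounts it can influence]} and one uniform p; the Weighted Cascade Model is usually described as the same process with p(u, v) = 1 / in-degree(v). Function names, parameters and the toy graph are illustrative, not our actual setup.

```python
import random

def independent_cascade(adj, seeds, p, rng):
    """Simulate one cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                # Each newly active node gets one attempt per still-inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def expected_spread(adj, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of the expected spread of a seed set under ICM."""
    rng = random.Random(seed)
    return sum(independent_cascade(adj, seeds, p, rng) for _ in range(runs)) / runs

# Hypothetical graph: edges point from an account to the accounts it can influence.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
print(expected_spread(graph, ["a"], p=1.0))  # 4.0: with p = 1 every reachable node activates
print(expected_spread(graph, ["a"], p=0.0))  # 1.0: with p = 0 only the seed is active
```

Comparing expected_spread across candidate seed sets, rather than ranking individuals, is what captures the overlap effect behind “collective influence”.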

Nowadays, the use cases for influencer search have expanded significantly, including product promotion, behavior analysis and viral marketing.

For a small brand like ours, with no capacity to spend on radio ads, TV ads or posters, finding a social media influencer with loyal followers turned out to be far more economical and effective, and it helped us survive the most brutal phase of the market.

Human biology is as strange as anything could be. Even while writing this article, we get goosebumps, a silly grin, disappointment and happiness all at once. We still remember the lines we read before we concluded our entrepreneurial journey, for the time being: “No matter how hard the loss, defeat might serve as well as victory to shake the soul and let the glory out.”

Until we bring out our best to the world again!! Wishing you a happy learning experience with us!!

PS: If you are wondering what went wrong, we locked in our input material prices at a certain level, and soon the prices went down as if under the gravity of Jupiter, giving our competitors a clear advantage. One wrong decision, my friend, just one wrong decision!!

Disclaimer: The views expressed in this article are opinions of the authors in their personal capacity and not of their respective employers.

References:

Neo4j Graph Algorithms documentation: centrality algorithms (https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/centrality/)
Efficient collective influence maximization in cascading processes with first-order transitions (https://arxiv.org/pdf/1606.02739.pdf)
Network Centrality: An Introduction (https://arxiv.org/pdf/1901.07901.pdf)
Machine Learning Techniques for brand-influencer Matchmaking on the Instagram Social Network (https://arxiv.org/pdf/1901.05949.pdf)
Theories for influencer identification in complex networks (https://arxiv.org/abs/1707.01594)