Email is a critical element of internal and external business communication, and inefficient tracking and management of it can cripple an organization. By combining data analytics and natural language processing (NLP), Capco were able to build a solution that optimizes processes and frees up employees to work on more complex tasks.
Email is used universally by businesses, their customers and their suppliers. It is an official communications tool, but it was never designed for business. The volumes involved are enormous: in 2018, over 300 million emails were sent or received within the operations function of a single tier 1 global investment bank.
We therefore set out to create a solution that could uncover insights, increase automation, identify potential risks and enable organizations to operate in a more cost-effective and secure way.
To identify opportunities for operational improvement, greater effectiveness and cost reduction, we followed four key steps:
1. Obtain the dataset
The first step was to gather the data. As we didn’t have access to a client’s dataset, we used real open-source data, including more than 500,000 emails from 150 employees.
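The article does not say how the raw emails were loaded, so the following is only a minimal sketch, assuming each email is available as a raw RFC 5322 message string. It uses Python's standard `email` package to flatten one message into a record (message ID, sender, recipients, subject, body) that later steps can analyze. The `parse_email` and `extract_body` helpers, the record field names and the sample message are all hypothetical.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses

def extract_body(msg: Message) -> str:
    """Return the first text/plain part, or the whole payload if the message is not multipart."""
    if not msg.is_multipart():
        return msg.get_payload()
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            return part.get_payload(decode=True).decode(charset, errors="replace")
    return ""

def parse_email(raw: str) -> dict:
    """Flatten one raw message into a record (hypothetical schema) for later analysis."""
    msg = message_from_string(raw)
    # getaddresses splits comma-separated header values and drops display names.
    sender = getaddresses([msg.get("From", "")])[0][1]
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses(recipient_headers) if addr]
    return {
        "message_id": (msg.get("Message-ID") or "").strip(),
        "sender": sender,
        "recipients": recipients,
        "subject": msg.get("Subject", ""),
        "body": extract_body(msg),
    }

# Invented sample message, for illustration only.
RAW_EMAIL = """Message-ID: <1001@example.com>
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Trade confirmation

Please confirm the settlement details for trade 42.
"""

record = parse_email(RAW_EMAIL)
print(record["sender"], record["recipients"])
```

Keeping one flat record per email makes it straightforward to feed the same data into the cleaning, topic-modeling and dashboard steps.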
2. Data cleaning
Before applying data analytics, we used NLP to identify common linguistic patterns and keywords that the algorithm should ignore.
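The exact cleaning rules are not described, so this is a hedged sketch of one common approach, not Capco's method. It assumes two kinds of "patterns to ignore": email boilerplate (quoted reply lines and anything after an "Original Message" separator) and words so frequent across the corpus that they carry no topical signal. The stopword list, the `max_doc_fraction` threshold and all function names are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real project would use a fuller one.
BASE_STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for", "in", "on",
                  "is", "please", "thanks", "regards"}
TOKEN_RE = re.compile(r"[a-z]+")

def strip_boilerplate(body: str) -> str:
    """Drop quoted reply lines and everything after a forwarded/reply separator."""
    kept = []
    for line in body.splitlines():
        if line.startswith(">"):
            continue
        if "original message" in line.lower():
            break
        kept.append(line)
    return "\n".join(kept)

def tokenize(body: str) -> list[str]:
    return TOKEN_RE.findall(strip_boilerplate(body).lower())

def find_corpus_stopwords(docs: list[list[str]], max_doc_fraction: float) -> set[str]:
    """Words appearing in more than max_doc_fraction of documents are treated as noise."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    limit = max_doc_fraction * len(docs)
    return {word for word, count in doc_freq.items() if count > limit}

def clean_corpus(bodies: list[str], max_doc_fraction: float = 0.8):
    tokenized = [tokenize(b) for b in bodies]
    ignore = BASE_STOPWORDS | find_corpus_stopwords(tokenized, max_doc_fraction)
    # Also drop very short tokens, which are rarely meaningful topic words.
    cleaned = [[t for t in tokens if t not in ignore and len(t) > 2] for tokens in tokenized]
    return cleaned, ignore

cleaned, ignored = clean_corpus([
    "Hi team, please confirm the trade.\n> old quoted text",
    "Hi team, the settlement failed.\n-----Original Message-----\nolder thread",
    "Hi team, trade booked in system.",
])
print(cleaned)
```

Learning part of the ignore list from document frequency means greetings such as "hi team", which appear in almost every email, are filtered out without having to be listed by hand.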
3. Automated discovery
The next step was to apply Latent Dirichlet Allocation (LDA), a probabilistic topic model, to extract the main topics from the dataset.
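The article does not say which LDA implementation was used; in practice a library such as gensim or scikit-learn would normally be chosen. To show how LDA works, the following is a minimal, pure-Python collapsed Gibbs sampler, one standard way of fitting the model. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ). The hyperparameter values, function names and toy documents are illustrative assumptions.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (vocab, phi, theta)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # n_dk: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # n_kw: times word w is assigned to topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional for each candidate topic.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Point estimates: topic-word (phi) and document-topic (theta) distributions.
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
           for k in range(K)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def top_words(vocab, phi, n=3):
    """The n most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

toy_docs = [["trade", "settlement", "confirm"] * 3, ["trade", "settlement", "booked"] * 3,
            ["meeting", "lunch", "calendar"] * 3, ["meeting", "calendar", "invite"] * 3]
vocab, phi, theta = lda_gibbs(toy_docs, num_topics=2)
print(top_words(vocab, phi))
```

The input is the cleaned token lists from the previous step; `theta` gives each email's topic mix, which is what a dashboard would link to email IDs.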
4. Data analytics
We built a dashboard to house the real-time data from the email dataset. This allowed us to visualize the relationships between email IDs and specific topics, alongside the network of connections between IDs and behavioral trends.
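The dashboard's internals are not described, so this is only a sketch of the kind of aggregation that could feed those three views, assuming each email record carries a sender, a list of recipients, its dominant LDA topic and a send date. The record schema and function names are hypothetical; the ID network is represented as weighted sender-to-recipient edges.

```python
from collections import Counter, defaultdict

def build_dashboard_views(records):
    """Aggregate email records (hypothetical schema) into network, topic and trend views."""
    edges = Counter()                        # (sender, recipient) -> number of emails
    topics_by_sender = defaultdict(Counter)  # sender -> {topic: number of emails}
    daily_volume = Counter()                 # (sender, date) -> emails sent that day
    for r in records:
        for rcpt in r["recipients"]:
            edges[(r["sender"], rcpt)] += 1
        topics_by_sender[r["sender"]][r["topic"]] += 1
        daily_volume[(r["sender"], r["date"])] += 1
    return edges, topics_by_sender, daily_volume

def dominant_topic(topics_by_sender, sender):
    """Most frequent topic for a sender, or None if the sender has no emails."""
    counts = topics_by_sender.get(sender)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented records, for illustration only.
records = [
    {"sender": "alice", "recipients": ["bob", "carol"], "topic": 0, "date": "2001-05-01"},
    {"sender": "alice", "recipients": ["bob"], "topic": 0, "date": "2001-05-01"},
    {"sender": "bob", "recipients": ["alice"], "topic": 1, "date": "2001-05-02"},
]
edges, topics_by_sender, daily_volume = build_dashboard_views(records)
print(edges.most_common(2), dominant_topic(topics_by_sender, "alice"))
```

Weighted edges like these can be passed directly to a graph-drawing tool, while the per-sender topic and daily-volume counts map naturally onto bar and time-series charts.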
Capco were able to combine NLP with data science to read and understand a test email dataset and the specific data patterns within it. Once we had built the dashboard, we were able to: