The rapid advancement of machine learning (ML) technologies has reached a wide variety of business use cases.  One such use case is the assignment of priority and risk levels to business tasks and issues, in systems such as those that handle IT support tickets, retail customer care complaints, and so on. Figuring out how important something is (the priority level) or what risks are involved (the risk rating) is a key factor in managing tasks efficiently in these systems. ML techniques add value by finding hidden patterns within the data and surfacing insights that even the keenest analyst might miss.

Take the case of an IT support ticketing system.  Tickets are generally assigned priorities by the IT professional handling them, based on that person's assessment of the business impact, so the assignment can vary from person to person.  Even when guidelines exist, priority assignment may not be consistent across the organization. An ML approach offers an alternative: a near real-time, more standardized, automated solution.  It identifies patterns between ticket characteristics and the potential business impact of the underlying issue.  Ticket characteristics could include the description of the issue, the department, the employee's designation, and the ticket or case type.  These patterns are used to predict each ticket's priority level.

Technical Approach 

Here’s a visualized workflow of building an IT ticket priority assignment system:

Fig. 1. Workflow of Priority-Setting Using Machine Learning 

IT ticket information consists of structured and unstructured data.  Structured data includes things like ticket type, employee designation, department, and issue type; it is organized and has a specific, pre-defined format.  Unstructured data, on the other hand, could include the issue description and notes pertaining to the ticket; unlike structured data, it is unorganized and has no pre-defined format.

Preprocessing

The first step in the workflow is to preprocess the data before feeding it to the model. Preprocessing differs for structured and unstructured data.

Structured Data
  1. Mean/Mode Imputation – N/A, NULL, or blank values in the data are replaced using the mean (average) or mode value, calculated statistically for each feature or column.
  2. Categorical Encoding and Normalization – Features, especially those containing categories, are converted to numeric values and then scaled.
  3. Feature Removal – Unwanted and redundant features such as date, name, ID, and so on are removed to improve the efficiency of the model (a minimal code sketch of these steps follows this list).
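The structured-data steps above can be sketched in a few lines with pandas and scikit-learn. This is a minimal sketch only: the file name, column names, and the choice of numeric feature are illustrative assumptions, not taken from a real ticket schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical export of structured ticket fields, keyed by ticket ID.
structured = pd.read_csv("tickets_structured.csv", index_col="ticket_id")

# Step 3: drop unwanted/redundant features (the ticket ID is kept as the index
# so the later join with the text features is still possible).
structured = structured.drop(columns=["date", "name"])

numeric_cols = ["reopen_count"]  # assumed numeric feature
categorical_cols = ["ticket_type", "department", "designation", "issue_type"]

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),            # Step 1: mean imputation
    ("scale", StandardScaler()),                           # Step 2: normalization
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),   # Step 1: mode imputation
    ("encode", OneHotEncoder(handle_unknown="ignore")),    # Step 2: categorical encoding
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])
X_structured = preprocess.fit_transform(structured)
```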
Unstructured Data
  1. Tokenization – Raw text is split into words or groups of words, called tokens.
  2. Stop Word Removal – Unwanted tokens such as "the", "is", etc. are then removed using stop word libraries.
  3. Lemmatization – Suffixes and other inflected forms of a token are then reduced to that token's root (for example jumping, jump, jumped -> jump).
  4. TF-IDF Calculation – A new input dataset is created with tickets as observations and tokens as features; each token's value indicates the normalized frequency of that token on a given ticket (a sketch of these steps follows this list).
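The text-preprocessing steps can likewise be sketched with NLTK and scikit-learn. The file and column names ("tickets_text.csv", "description") are assumptions for illustration; other NLP libraries would work just as well.

```python
import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

# Hypothetical export of ticket descriptions, keyed by ticket ID.
unstructured = pd.read_csv("tickets_text.csv", index_col="ticket_id")
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean(text: str) -> str:
    tokens = word_tokenize(text.lower())                                  # 1. Tokenization
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]   # 2. Stop word removal
    return " ".join(lemmatizer.lemmatize(t, pos="v") for t in tokens)     # 3. Lemmatization

corpus = unstructured["description"].fillna("").map(clean)

# 4. TF-IDF: one row per ticket, one column per token
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(corpus)
```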
Clustering

After preprocessing, the structured and unstructured datasets are joined on the ticket or issue ID to create the input for a clustering algorithm (such as K-Means).  Clustering algorithms group data points together by identifying the underlying patterns and hidden relationships between them.
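Assuming the two preprocessing sketches above produced X_structured and X_text with rows aligned by ticket ID, the combined matrix can be clustered as follows. The value k=3 is only a placeholder until the Elbow Method (described below) is applied.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.cluster import KMeans

# Join the structured and text features (rows aligned on ticket/issue ID).
X = hstack([csr_matrix(X_structured), X_text]).tocsr()

# Placeholder number of clusters; see the Elbow Method below.
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(X)   # one cluster label per ticket
```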

The output of K-Means consists of cluster labels for each observation (cluster #1, cluster #2, etc.).  Clusters are then converted into priority levels (low, medium, high, etc.) using business domain knowledge.
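A simple way to express that conversion is an explicit mapping from cluster index to priority label. The mapping below is purely illustrative; in practice it comes from inspecting representative tickets in each cluster with business stakeholders.

```python
# Illustrative mapping only; the real assignment is driven by domain knowledge.
cluster_to_priority = {0: "low", 1: "medium", 2: "high"}

priorities = [cluster_to_priority[c] for c in cluster_labels]
```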

           Fig. 2. Grouping Data Points Using a Clustering Algorithm   

K-Means works by grouping the data into K clusters.  K is either fixed in advance or optimized using the Elbow Method, which selects the number of clusters that maximizes the similarity between data points within each cluster without creating more clusters than required.  The total variation within clusters is graphed against K, and the "elbow" point of the curve is taken as the optimal number of clusters.
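Here is a brief sketch of the Elbow Method using scikit-learn's inertia_ attribute, which measures the total within-cluster variation; the range of candidate K values is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

k_values = range(2, 11)   # assumed candidate range for K
inertias = [
    KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_
    for k in k_values
]

# Plot total within-cluster variation against K and look for the "elbow".
plt.plot(list(k_values), inertias, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Total within-cluster variation")
plt.title("Elbow Method")
plt.show()
```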

            Fig. 3. Optimizing the Number of Clusters

K-Means thus gives us priority levels for the input data (tickets). These priorities are used as the target variable Y in building an online model from the input feature set X.  In this way, the problem is translated from an unsupervised learning problem into a multi-class classification problem.
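One way to sketch that translation: hold out part of the data and train a multi-class classifier on the remainder, with the K-Means-derived priorities as Y. The choice of logistic regression here is an assumption; any multi-class model could be substituted.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Y comes from the cluster-to-priority mapping above; X is the combined feature matrix.
X_train, X_test, y_train, y_test = train_test_split(
    X, priorities, test_size=0.2, random_state=42, stratify=priorities
)

clf = LogisticRegression(max_iter=1000)   # assumed classifier choice
clf.fit(X_train, y_train)
```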

Model Evaluation & Deployment

The trained model is evaluated against test data: ticket data that was not used in building it.  The test data is fed into the model, and priority levels are output.  These predicted priorities are compared against the test data's actual priority levels as determined by K-Means.  If the evaluation is satisfactory, the model is deployed.  The online model generalizes to unseen data by updating results in near real-time, displaying a priority level for each new ticket or issue.
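Continuing the sketch above, evaluation amounts to comparing the model's predicted priorities against the K-Means-derived priorities on the held-out test tickets:

```python
from sklearn.metrics import accuracy_score, classification_report

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```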

Fig. 4. Model Deployment Process

Challenges

The success of a priority classification system is contingent on overcoming several challenges.  Firstly, the number of desired priority levels must be decided up front, since it sets the number of clusters: each cluster is converted to a priority level, so having more priority levels than clusters won't work.

Secondly, priority classification relies heavily on the quality of the K-Means clustering results. If a cluster is poorly formed because observations that are too dissimilar are lumped together, the priority level assigned to that cluster will likely not reflect the actual level of risk for many of the observations within it.

Finally, clusters must be correctly converted to their appropriate priority levels, or else the classification model will predict the wrong priority levels. Even if the observations within each cluster are similar, assigning the wrong priority level to a cluster means every prediction drawn from that cluster will carry that wrong priority.

Conclusion

This ML approach to assigning priority and risk is scalable and generic enough to be applied to other domains with case management use cases, such as retail customer service, employee management, internal IT support, and customer-facing product support. Strong customer satisfaction is essential in today's world, and triaging customers' issues swiftly and appropriately is a big first step toward it. This can lead to downstream improvements in key metrics (such as NPS) and help provide quantifiable justification for investing in AI.

Manage your risk, the ML way!
