Fraud Detection with Splunk

For enterprise organisations, effective fraud detection and prevention is a critical aspect of daily business operations.

Fraud can manifest in many forms, including financial fraud, identity theft, and cybercrime.

The impact of fraud goes further than just monetary losses: it damages reputations, erodes trust, and can potentially incur regulatory penalties if not enough is done to prevent it.

The good news is that you can leverage your organisation's existing analytical tools, like Splunk, to detect and prevent fraud, identify anomalous activities, and mitigate potential financial losses.

This comprehensive guide focuses on equipping you with the knowledge and techniques necessary to detect and prevent fraud using Splunk.

Is Splunk an appropriate tool to use for fraud detection and prevention?

Yes! Splunk is a powerful tool for fraud detection and prevention. It can ingest, process, and analyse large volumes of diverse data, including fraud data, from various sources in real time.

That said, Splunk is not designed to handle every stage of your fraud detection pipeline. While it excels at ingesting and analysing fraud data, it does not collect that data itself and depends on external data sources to supply it.

The 3 stages of fraud detection

In general, fraud detection is carried out across three discrete stages:

Data collection and ingestion (via data sources such as a fraud detection agent). This is the stage where you will generate the data that you will use to detect fraud in Splunk. Typically, organisations don't use Splunk at this stage, since Splunk is designed to ingest data collected from data sources. Instead, organisations will typically use a fraud detection agent like Antifraud (offered by Cosive) as a data source, which collects dozens of behavioural and device fingerprinting signals about your users, then ships these to Splunk for ingestion.
Automated analysis (Splunk). This is where Splunk shines. Enriched with thousands (or millions) of data points from your fraud detection agent, Splunk can perform automated anomaly detection on ingested fraud data. We'll cover how to set up these automated analysis processes later on in this guide.
Automated and manual response (Splunk). Splunk can generate alerts for human analysts, or make API requests to connected services (as part of its "alert actions" feature) to trigger an automated fraud response, such as blocking a specific user or transaction.

Splunk’s fraud detection features

Here are some of Splunk's features that make it a great fit for handling automated analysis and alerting for fraud teams:

Data aggregation and correlation. Splunk aggregates data from numerous sources, including fraud detection agents, servers, applications, networks, and security devices. By correlating this data, it can identify patterns, anomalies, and trends that may indicate fraud.
Real-time monitoring. An effective fraud response must be timely. Splunk allows for real-time monitoring and analysis, enabling instantaneous detection of potential fraud as it occurs. It can also be used to trigger a real-time automated response.
Customisable dashboards, alerts, and automated triggers. Users can create custom dashboards in Splunk to visualise relevant data and metrics related to fraud detection, and configure alerts and automated triggers that fire when suspicious activity is detected.
ML and AI features. Splunk offers machine learning and AI capabilities to improve the accuracy of your fraud detection program by learning from historical data and identifying fraud trends.
Scalability. Splunk can scale as needed to accommodate progressively larger volumes of data.
Ad hoc queries and analysis. Fraud analysts can perform ad hoc queries in Splunk to augment your organisation’s automated response with manual review of specific incidents or areas of concern.
Integration with third-party tools. Splunk integrates with many other tools useful for fraud detection, including other security/SIEM tools, threat intelligence feeds, and databases.
Comprehensive auditing and reporting capabilities. Many industries have stringent compliance and regulatory requirements related to fraud prevention and reporting. Splunk's comprehensive auditing and reporting capabilities make it a good fit for helping you maintain compliance.

Splunk's impressive capabilities in data aggregation, real-time monitoring, customisation, machine learning, scalability, integration, compliance support, and historical data analysis make it an excellent tool for fraud detection and prevention across many different industries and organisational contexts.

Does Splunk do everything needed to detect and respond to fraud?

It’s important to note that Splunk cannot detect fraud without first ingesting data collected from elsewhere, such as a fraud detection agent.

A fraud detection agent is a script that runs in the background of user sessions (typically in the browser or embedded in an app) with the purpose of gathering and sending data relevant to fraud detection to your log analysis platform.

While there are fraud detection tools on the market combining both data collection and analysis capabilities, these are often not suitable for organisations already using (and paying for) Splunk.

You may end up paying twice for, essentially, the same functionality duplicated across two tools.

Instead, we recommend letting Splunk’s strengths shine by pairing it with a fraud detection agent designed to integrate seamlessly with Splunk, such as Antifraud.

This means your fraud detection pipeline will rely on two tools working together: 1) a fraud detection agent (or multiple agents) for data collection and 2) Splunk, for data ingestion and analysis.

Building a fraud detection and response pipeline with Splunk

Step 1: Conduct a fraud risk assessment

The purpose of a fraud risk assessment is to understand the various types of fraud that pose the greatest risk to your organisation.

For example, the types of fraud faced by a bank (such as money laundering, account takeover, and ATM skimming) will be very different to the types of fraud faced by a large eCommerce website (such as payment fraud, return fraud and gift card fraud).

A fraud risk assessment will help you direct your fraud prevention efforts and investment toward the types of fraud that present the greatest risk to your organisation.

Start by reviewing your organisation’s historical fraud data to identify patterns and common fraudulent activities.
Assess your internal processes and systems to pinpoint potential weak points.
Collaborate across multiple departments to gather insights into potential fraud risks specific to their areas of operation. This is particularly critical in enterprise organisations which may be involved in many different types of business activities, such as a bank which offers bank accounts, credit cards, home loans, and insurance.
Identify emerging fraud risks by consulting industry reports and studies, relevant news and publications, and engaging with industry groups.

Step 2: Collect data associated with your top fraud risks

The collection stage of fraud detection is typically handled by a fraud detection agent. This agent is responsible for automatically collecting relevant fraud detection signals from your website or app, and shipping them to Splunk in an easily digestible format (typically JSON).

It’s important to choose an agent that collects signals that are most relevant to your top fraud risks.

For example, Antifraud collects data related to the types of fraud typically faced by financial institutions, such as banks, insurance providers, and FinTech companies.

One of the biggest concerns for these types of companies is account takeover (ATO) attacks, in which an unauthorised person gains access to an account using methods such as phishing, credential stuffing, or social engineering.

ATO attacks can be detected by identifying anomalies in a user’s behaviour (e.g. access time, access location, or interactions with the system) or their device fingerprint (e.g. operating system, browser, or device architecture) which could suggest an unfamiliar individual is using the account.

Therefore, major banks use Antifraud to ship dozens of these behavioural and device signals to Splunk, handing this data over for processing using Splunk’s powerful anomaly detection capabilities.
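As a rough illustration of what shipping signals to Splunk can look like, here is a minimal Python sketch that wraps a set of behavioural and device signals in an HTTP Event Collector (HEC) event. It assumes the agent, or a relay in front of it, delivers over HEC. The host, token, sourcetype, index, and every signal field name below are placeholders chosen for this example and are not Antifraud's actual schema.

```python
import json
import urllib.request

# Hypothetical values: replace with your own Splunk host and HEC token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(signals: dict,
                      sourcetype: str = "antifraud:signals",
                      index: str = "fraud") -> urllib.request.Request:
    """Wrap agent signals in an HEC event envelope (does not send it)."""
    envelope = {
        "event": signals,
        "sourcetype": sourcetype,  # hypothetical sourcetype name
        "index": index,            # hypothetical index name
    }
    return urllib.request.Request(
        HEC_URL,
        data=json.dumps(envelope).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Illustrative signals only; these field names are assumptions.
example_signals = {
    "user_id": "u-1001",
    "access_time": "2024-03-03T10:02:00+11:00",
    "ip_country": "AU",
    "os": "macOS",
    "browser": "Firefox",
    "typing_speed_cpm": 210,
}

request = build_hec_request(example_signals)
# To actually send the event: urllib.request.urlopen(request)
```

Keeping the signals as a flat JSON object under "event" means each signal can be extracted as its own field at search time, which is what the anomaly detection steps later in this guide rely on.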

How to configure your new fraud detection data source in Splunk

The steps to configure your fraud detection data source will depend on the method you’re using to collect fraud data.

Navigate to data inputs in Splunk. Click on the "Settings" menu (gear icon) in the top right corner. Under "Data" or "Data inputs," select the type of data source you want to configure (e.g., Files & Directories, TCP/UDP, HTTP Event Collector, etc.).
Select your data source type. Choose the appropriate data source type as per the instructions for your fraud detection agent, such as "Files & Directories" for file-based data or "TCP/UDP" for network-based data.
Configure your data source. Each data source type will have specific configurations. For example, for Files & Directories: specify the path to the data source (file or directory) you want to monitor, then configure other settings such as sourcetype, index, input settings, and other relevant options. For TCP/UDP: specify the port and protocol for the data source, then configure sourcetype, index, source, and other relevant options.
Configure sourcetype. Define the sourcetype, which determines how Splunk processes and categorises the data. You can use a predefined sourcetype or create a custom one.
Configure the index. Assign the index where you want to store the data from this data source. You can use an existing index or create a new one.
Set source and host values. Configure the source and host values to categorise and identify the data. Typically, the source is the file or data input, and the host is the origin of the data.
Save and verify. Save your new data source configuration.
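If you manage inputs as configuration files rather than through the UI, the same settings map onto a stanza in inputs.conf. The following is a small sketch, assuming a file-based input, that renders such a stanza from the values chosen in the steps above. The path, sourcetype, index, and host shown are hypothetical examples.

```python
def render_monitor_stanza(path: str, sourcetype: str, index: str, host: str) -> str:
    """Render a file-monitor stanza in inputs.conf syntax."""
    lines = [
        f"[monitor://{path}]",
        f"sourcetype = {sourcetype}",
        f"index = {index}",
        f"host = {host}",
        "disabled = false",
    ]
    return "\n".join(lines) + "\n"


stanza = render_monitor_stanza(
    path="/var/log/antifraud/signals.json",  # hypothetical log path
    sourcetype="antifraud:signals",          # hypothetical sourcetype
    index="fraud",                           # hypothetical index
    host="web-01",                           # hypothetical host value
)
print(stanza)
```

For a monitor input, the source value normally defaults to the monitored file path, so it is left unset in this sketch.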

Step 3: Verify that Splunk is correctly ingesting data

You are likely already using Splunk for log analysis, so once you ship data from your fraud detection agent into Splunk you’ll need to verify the new data is being ingested correctly alongside your existing logs.

Check the source configuration in Splunk settings. In Splunk’s settings menu, navigate to “Data” and find the data source you configured. Verify that the configuration details are accurate, including the data source path, input settings, sourcetype, and host.
Search for your newly ingested data. Use the Splunk search bar to query for the data from the new data source. You can use a search query like: sourcetype="your_sourcetype" to filter the data from the specific sourcetype. Make sure the search results include the expected data from the new data source.
Inspect the indexed data. Run a search query and view the raw data or parsed fields to ensure that the data is correctly indexed and structured. Using the "Fields" sidebar in the search results, check whether the expected fields are being extracted from the data. Also check that timestamps in the data are correctly extracted and indexed, as this is crucial for time-based searches and analytics.
Check for parsing errors. Look for any parsing errors or warnings in Splunk. You can address any parsing errors by adjusting the sourcetype configuration or updating your parsing settings.
Perform sample comparisons. Compare a sample of the ingested data with the source data to verify its accuracy.
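The sample comparison in the last step can be scripted. Here is a minimal Python sketch that assumes you have a sample of the agent's original JSON lines and the matching events exported from a Splunk search as dictionaries. The event_id key and the other field names are hypothetical. The function reports source events that never arrived and events whose field values changed during ingestion.

```python
import json
from typing import Iterable


def compare_samples(source_lines: Iterable[str],
                    ingested: Iterable[dict],
                    key: str = "event_id") -> dict:
    """Compare source JSON lines with events exported from Splunk."""
    source = {}
    for line in source_lines:
        if line.strip():
            record = json.loads(line)
            source[record[key]] = record
    indexed = {event[key]: event for event in ingested}

    # Events present at the source but absent from Splunk.
    missing = sorted(set(source) - set(indexed))
    # Events whose source fields differ after ingestion. Extra fields that
    # Splunk adds (such as _time) are ignored.
    mismatched = sorted(
        k for k in set(source) & set(indexed)
        if any(indexed[k].get(field) != value for field, value in source[k].items())
    )
    return {"missing": missing, "mismatched": mismatched}


source_lines = [
    '{"event_id": "e1", "user_id": "u-1001", "os": "macOS"}',
    '{"event_id": "e2", "user_id": "u-1002", "os": "Windows"}',
    '{"event_id": "e3", "user_id": "u-1003", "os": "Linux"}',
]
ingested = [
    {"event_id": "e1", "user_id": "u-1001", "os": "macOS", "_time": "1709420520"},
    {"event_id": "e2", "user_id": "u-1002", "os": "Linux", "_time": "1709420580"},
]

report = compare_samples(source_lines, ingested)
print(report)
```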

Step 4: Configure your fraud analysis workflows in Splunk

Now that you’ve verified that Splunk is correctly ingesting your fraud detection data, it’s time to leverage Splunk’s fraud detection and response capabilities to set your fraud prevention program in motion.

Anomaly detection

The foundation of Splunk’s fraud detection capabilities is its anomaly detection features.

Anomaly detection in Splunk involves identifying patterns, events, or data points that deviate significantly from the expected or normal behaviour within a dataset.

The objective of this process is to automatically detect unusual activity, which may indicate potential fraud.

An anomaly detection example:

Imagine a hypothetical user who likes to review their retirement fund balance over the weekend newspaper every Sunday at approximately 10am from their home in Melbourne, Australia.

Splunk will begin to associate these features (a pattern of time and location) with the user.

One day, logs are ingested for the same user showing access at 4am from an IP address in Hong Kong.

This is likely to be picked up as an anomaly because these features don’t match the typical pattern associated with that user.

What is less clear is the cause of this anomaly. Does this indicate an account takeover, or is the user simply jetlagged after flying to Hong Kong for business?

This is where enriching Splunk’s automated detection with manual oversight from fraud analysts can prove extremely useful.

This example can also be extended to demonstrate the power of combining multiple behavioural and device signals together.

For example, if the account was not only being accessed at an unusual time (4am) and location (Hong Kong), but also using an unfamiliar browser and device, and a faster than typical typing speed, the scales may tip toward a possible ATO.

This is why it’s important to use fraud detection software that collects a myriad of device and behavioural signals to help your analysts more accurately distinguish unusual but legitimate behaviour from true fraudulent activity.
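To make the idea of combining signals concrete, here is a small Python sketch that scores a login against a user's baseline. It is not how Splunk scores anomalies internally. The weights, the 30% typing-speed tolerance, and all field names are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserBaseline:
    usual_hours: range          # hours of the day the user normally logs in
    usual_countries: set        # countries previously seen for this user
    known_devices: set          # (os, browser) pairs previously seen
    typical_typing_cpm: float   # typical typing speed, characters per minute


# Hypothetical weights: how strongly each mismatch counts toward the score.
WEIGHTS = {"time": 1.0, "location": 2.0, "device": 2.0, "typing": 1.5}


def anomaly_score(baseline: UserBaseline, event: dict) -> float:
    """Add up the weights of every signal that deviates from the baseline."""
    score = 0.0
    if event["hour"] not in baseline.usual_hours:
        score += WEIGHTS["time"]
    if event["country"] not in baseline.usual_countries:
        score += WEIGHTS["location"]
    if (event["os"], event["browser"]) not in baseline.known_devices:
        score += WEIGHTS["device"]
    # Assumed tolerance: more than 30% away from the typical typing speed.
    if abs(event["typing_cpm"] - baseline.typical_typing_cpm) > 0.3 * baseline.typical_typing_cpm:
        score += WEIGHTS["typing"]
    return score


baseline = UserBaseline(
    usual_hours=range(8, 13),
    usual_countries={"AU"},
    known_devices={("macOS", "Safari")},
    typical_typing_cpm=180.0,
)

# Same device and typing rhythm, unusual time and place: possibly a traveller.
travelling = {"hour": 4, "country": "HK", "os": "macOS", "browser": "Safari", "typing_cpm": 185}
# Every signal is off: much more likely to be an account takeover.
suspicious = {"hour": 4, "country": "HK", "os": "Windows", "browser": "Chrome", "typing_cpm": 320}

print(anomaly_score(baseline, travelling))
print(anomaly_score(baseline, suspicious))
```

The travelling user trips only the time and location checks, while the second login trips all four. That gap between the two scores is what helps an analyst separate unusual but legitimate behaviour from a likely ATO.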

How to configure anomaly detection in Splunk

Splunk have created their own app that provides functionalities to create, train, and apply anomaly detection models to your data without requiring your team to have an ML or data science skill set.

One of the benefits of this app is that it uses an anomaly detection algorithm called ADESCA which is well-suited for use with time series data (such as logs).

To get started, first, download and install the Splunk App for Anomaly Detection from Splunkbase.

Next, create a new job using the app. Add your fraud detection dataset and select the field you want to mark for anomaly detection. You can also configure the detection sensitivity level for this field. For stable fields that don’t change often (such as the user’s operating system) you may want a high sensitivity. For fields with a large amount of variance, such as time, you may want to select a lower sensitivity.

The best way to check the appropriateness of the sensitivity level you’ve selected is to click ‘Detect Anomalies’ and review the resulting data, observing how many false positives are generated.

Note that while false positives are typically much more visible than missed detections, missed detections are just as important to consider, if not more so.

Ideally, you will run a test detection on a known dataset where you've previously identified fraudulent activity. This will help you avoid both missed detections and false positives.

Finally, you can ‘Save Job’ and schedule it to run at set intervals from the Job Dashboard.
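The trade-off between false positives and missed detections is easier to judge with numbers. The sketch below assumes you have exported an anomaly score per event alongside a label from a dataset where fraud has already been confirmed. The scores, labels, and thresholds are made up, and a numeric threshold stands in for the app's sensitivity setting.

```python
def evaluate(scores, labels, threshold):
    """Return (false_positives, missed_detections) for one threshold."""
    flagged = [score >= threshold for score in scores]
    false_positives = sum(1 for f, is_fraud in zip(flagged, labels) if f and not is_fraud)
    missed = sum(1 for f, is_fraud in zip(flagged, labels) if not f and is_fraud)
    return false_positives, missed


# Made-up anomaly scores and known outcomes (True = confirmed fraud).
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.95]
labels = [False, False, True, False, True, False, True, True]

# A lower threshold corresponds to a higher sensitivity.
for threshold in (0.3, 0.5, 0.7):
    false_positives, missed = evaluate(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={false_positives}, missed={missed}")
```

In this toy data, lowering the threshold removes the missed detections at the cost of an extra false positive, which is the kind of trade-off to review before saving the job.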

Splunk UBA

For more complex anomaly detections you may want to consider Splunk’s User Behavior Analytics (UBA) product, which can stitch multiple anomalies together to accelerate the detection of common fraud profiles. This tool automates aspects of fraud detection which might otherwise require custom development using ML techniques.

Machine Learning Toolkit (MLTK)

Splunk also offers a free Machine Learning Toolkit app where you can configure your own custom machine learning pipelines and detections for fraud detection, such as outlier detection. However, using this app will require knowledge of ML techniques.

Step 5: Build an alerting system for possible fraud

It's easy to create alerts based on detected anomalies and outliers, either in real time as they come in, or on a scheduled basis (batching anomalies together). As a general rule, fraud detection and response is best done in real time where possible.

There are three main aspects to consider when configuring an alert:

Search. Configure the SPL query you’ll use as the basis for your alerting.
Trigger Conditions. Specify the condition a search result must match to trigger an alert.
Trigger Actions. Specify what happens when a trigger condition is met; from sending an email, to outputting results to a telemetry endpoint. Trigger actions enable you to send alerts to human fraud analysts for manual review. You can also use trigger actions to automate your fraud response by running a script or firing a webhook to trigger custom business logic, such as triggering a 2FA challenge for the user.
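As an illustration of the automated side, here is a minimal Python sketch of the logic that could sit behind a webhook trigger action. It assumes the webhook delivers a JSON body containing the first search result under "result", along with "search_name" and "results_link", so confirm the payload shape against your Splunk version. The user_id field and the returned action names are hypothetical, and the actual call to your 2FA service is left out.

```python
def handle_alert(payload: dict) -> dict:
    """Decide what to do with one alert payload (already parsed from JSON)."""
    result = payload.get("result") or {}
    user_id = result.get("user_id")  # hypothetical field returned by the search
    if user_id is None:
        return {"action": "ignore", "reason": "no user_id in result"}
    return {
        "action": "challenge_2fa",   # hypothetical action name
        "user_id": user_id,
        "alert": payload.get("search_name"),
        "evidence": payload.get("results_link"),
    }


# Example payload with made-up values.
example_payload = {
    "result": {"user_id": "u-1001", "amount": "2500"},
    "search_name": "Large transfer outlier",
    "results_link": "https://splunk.example.com/app/search/example",
}

decision = handle_alert(example_payload)
print(decision)
```

Keeping the decision in a plain function like this makes it straightforward to test without a running web server. In production it would be called from whatever HTTP endpoint receives the webhook.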

A fraud detection alerting use case

A common field for anomaly detection in banking is transaction amount.

That's because most of us make transactions of a similar size, at a similar cadence. For example, these might include our rent or mortgage payments, utility bills, or recurring subscriptions.

Fraudulent transactions often deviate from the user’s typical transaction pattern. In particular, they may be much larger than the user’s typical transaction amount, as fraudsters attempt to quickly move large amounts of money out of the account. This makes fraudulent transactions a good candidate for anomaly detection.

Imagine that we have set up anomaly or outlier detection on the "transaction amount" field. Next, we could create two different alert rules based on how much the outlier deviates from what we expect for the user:

Alert 1: For outliers less than two standard deviations from the user's mean transaction amount, this alert will trigger a Splunk message intended for human analyst review.

Alert 2: For outliers two or more standard deviations from the user's mean transaction amount, this alert will trigger a script that sends an SMS to the user notifying them of the transfer.
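The two rules can be sketched in Python as a simple routing function over a user's transaction history. This is an illustration of the logic rather than Splunk's own outlier detection. The lower bound for what counts as an outlier at all (one standard deviation here) is an assumption, as is treating a deviation of exactly two standard deviations as an escalation. The sample amounts are made up.

```python
from statistics import mean, stdev

OUTLIER_FLOOR = 1.0  # assumption: below this many standard deviations, no alert
ESCALATION = 2.0     # the boundary between the two alert rules


def route_transaction(history: list, amount: float) -> str:
    """Return which alert, if any, a new transaction amount should trigger."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: any different amount goes to an analyst.
        return "no_alert" if amount == mu else "analyst_review"
    deviations = abs(amount - mu) / sigma
    if deviations < OUTLIER_FLOOR:
        return "no_alert"
    if deviations < ESCALATION:
        return "analyst_review"  # Alert 1: message for human review
    return "sms_user"            # Alert 2: script sends an SMS to the user


# Made-up history of one user's recent transaction amounts.
history = [100, 120, 110, 90, 105, 95, 115, 100]

print(route_transaction(history, 108))
print(route_transaction(history, 120))
print(route_transaction(history, 2500))
```

With this history, a 108 transfer is unremarkable, a 120 transfer is routed to an analyst, and a 2500 transfer triggers the SMS.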

As you can see, the power and flexibility of Splunk alerts means they’re capable of forming the basis of both your manual and automated fraud response strategy.

Talk to us about fraud detection with Splunk
We are a full service consultancy with deep experience building fraud detection and response workflows using Splunk.

Reach out to us for a no-obligation initial chat to discuss your fraud prevention goals and get advice on the best way to leverage Splunk as part of your fraud detection program.

We can also provide you with more information on Antifraud, our fraud detection agent designed to integrate seamlessly with Splunk.