
Recommendation systems with SQL: The complete guide

Recommendation systems are crucial in today’s digital landscape, helping users discover new products, services, or content based on their preferences and behaviour. These systems leverage advanced algorithms to analyze user data and generate personalized recommendations. While there are several approaches to building recommendation systems, SQL (Structured Query Language) can be a powerful tool for implementing various aspects of the recommendation process.

This comprehensive guide will explore how SQL can be used to develop recommendation systems. We will cover the choice of algorithms, dataset composition, and implementation techniques, providing a solid understanding of leveraging SQL for building effective recommendation systems.

The Process

Several key steps must be followed to build a recommendation system using SQL. Let’s take a closer look at each of these steps:

Choice of Algorithms: The first step in building a recommendation system is to select suitable algorithms. Depending on the nature of your dataset and the specific requirements of your application, you can choose from a range of algorithms. Some common choices include:
- Popularity: Recommending popular items based on metrics such as page views or ratings.
- Context: Recommending items based on associations, such as “things bought together” or user preferences.
- Relevance: Using more advanced techniques such as matrix factorization to identify latent factors and provide personalized recommendations. These advanced topics are left to the second part of this article.
Dataset Composition: The composition of your dataset is crucial for training and evaluating your recommendation system. This involves collecting and organizing user interactions, item attributes, and other relevant data. SQL can be used to define database schemas, import data, and perform data preprocessing tasks to ensure the dataset is ready for analysis.
Implementation: The implementation phase involves translating the chosen algorithms into SQL queries. SQL allows you to retrieve and manipulate data efficiently, enabling you to calculate similarities, apply filtering techniques, and generate recommendations. You can leverage SQL’s power to join tables, aggregate data, and perform calculations to implement different recommendation strategies effectively.

Following these steps, you can develop recommendation systems using SQL that provide valuable and personalized recommendations to users, enhancing their overall experience.

In the following sections of this story, we will delve deeper into each step, providing concrete examples and best practices for implementing recommendation systems with SQL.

So, let’s dive in and explore the world of recommendation systems and SQL, unlocking the potential to deliver tailored recommendations to your users.

Simple Algorithms

Trivial solution. SQL (Popularity)

You can implement simple recommendation algorithms using basic SQL queries. These algorithms often rely on straightforward metrics or heuristics to suggest items to users. While they may lack the complexity and personalization of advanced recommendation techniques, they can still be useful in many scenarios.

One example of a trivial recommendation algorithm is recommending items based on their popularity or recent popularity metrics. For instance, you can use a query like the following:

SELECT * FROM products ORDER BY page_views DESC;

It retrieves a list of products sorted by the number of page views. This query suggests popular items to users based on the assumption that products with higher page views are more likely to be of interest. For example, if your most-viewed product is the iPhone 15, it will be the first item recommended to every user.

Similarly, you can modify the query to consider other metrics such as ratings, sales, or reviews. For instance, you could use the following:

SELECT * FROM products ORDER BY average_rating DESC

It recommends products based on their average rating, assuming that highly rated items are more likely to be preferred by users.

These trivial recommendation algorithms are easy to implement in SQL and can provide simple recommendations based on readily available metrics. However, they may not capture personalized preferences or consider the user’s context. Thus, while they can serve as a starting point or supplement to more advanced approaches, it’s important to consider their limitations when building recommendation systems.

Custom methods based on age and popularity

When building recommendation systems, it’s often beneficial to incorporate the age of products alongside popularity metrics such as page views or ratings. Considering the age of a product can help balance the recommendation by giving newer items a chance to be suggested to users, even if they may not have accumulated high popularity metrics yet. Combining popularity and age can create a more comprehensive and dynamic recommendation algorithm.

To incorporate age into your recommendation algorithm, you can modify the SQL queries to include a weighting factor that combines popularity metrics with the age of the products. Here’s an example that demonstrates how you can achieve this:

-- Recommend products based on popularity and age
-- (+ 1 avoids a division by zero for products released today)
SELECT *
FROM products
ORDER BY (page_views / POWER(DATEDIFF(NOW(), release_date) + 1, 2)) DESC;

In the above query, we divide the popularity metric (e.g., page views) by the square of the product’s age in days, counting the release day as day 1 so that the divisor is never zero. Squaring the age makes the score decay quickly, which gives more weight to recent products while still considering popularity. This way, newer products can be recommended even if they haven’t accumulated many page views yet.

You can apply a similar approach when considering average rating along with product age:

-- Recommend products based on rating and age
-- (+ 1 avoids a division by zero for products released today)
SELECT *
FROM products
ORDER BY (average_rating / POWER(DATEDIFF(NOW(), release_date) + 1, 2)) DESC;

In this case, we calculate the weighted score by dividing the average rating by the square of the product’s age in days, again counting the release day as day 1. This approach ensures that newer products can be recommended based on their rating, accounting for the age factor.

By incorporating the age of products into your recommendation algorithm, you can provide users with a more balanced and up-to-date set of recommendations. However, it’s important to adjust the weighting factor and consider the specific dynamics of your product catalogue to achieve optimal results.

Remember that the examples provided here are just one way to combine popularity metrics and age in SQL queries. To fine-tune the recommendation results, you can customize these queries based on your requirements and experiment with different weighting factors or mathematical functions.
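To experiment with different weighting factors, it can help to run the age-weighted score against a tiny in-memory table. The following is a minimal sketch using Python's built-in sqlite3 module rather than MySQL, so julianday() stands in for DATEDIFF(). The products table, its sample rows, and the rank_by_decayed_popularity helper are assumptions made for illustration only.

```python
import math
import sqlite3


def rank_by_decayed_popularity(conn, today, exponent=2.0):
    """Rank products by page_views / (age_in_days + 1) ** exponent."""
    # Not every SQLite build ships POWER(), so register one explicitly.
    conn.create_function("power", 2, math.pow)
    return conn.execute(
        """
        SELECT name,
               page_views
                   / power(julianday(:today) - julianday(release_date) + 1, :exponent) AS score
        FROM products
        ORDER BY score DESC
        """,
        {"today": today, "exponent": exponent},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, page_views INTEGER, release_date TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("classic", 10000, "2024-01-01"),  # old and very popular
        ("recent", 400, "2024-03-22"),     # a few days old
        ("new", 50, "2024-03-30"),         # released yesterday
    ],
)

# Squared age: the newest product ranks first despite few page views.
squared = rank_by_decayed_popularity(conn, today="2024-03-31", exponent=2.0)
# Linear age: the decay is gentler and the old best-seller stays on top.
linear = rank_by_decayed_popularity(conn, today="2024-03-31", exponent=1.0)
```

Comparing the two rankings shows how strongly the exponent shifts the balance between freshness and popularity, which is the knob to tune for your own catalogue.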

Context-based solutions (Bayesian probability and lift)

Bayesian probability and lift are statistical measures commonly used in recommendation systems. Computed from user behaviour and historical data, they estimate how likely a user is to be interested in an item and how much stronger that interest is than the item’s general popularity would suggest. By incorporating these measures into your recommendation algorithm, you can make more informed suggestions to users, considering both popularity and the association between items.

To incorporate Bayesian probability and lift into your recommendation algorithm, you can use SQL queries that calculate these measures and adjust the ranking accordingly. Here’s an example that demonstrates how you can achieve this:

-- Recommend products based on conditional (Bayesian) probability and lift
-- Requires CTE support (e.g., MySQL 8+)
WITH mine AS (
    SELECT DISTINCT product_id FROM purchases WHERE user_id = 'target_user_id'
), peers AS (
    SELECT DISTINCT user_id FROM purchases
    WHERE user_id != 'target_user_id' AND product_id IN (SELECT product_id FROM mine)
), stats AS (
    SELECT product_id,
           COUNT(DISTINCT CASE WHEN user_id IN (SELECT user_id FROM peers) THEN user_id END)
               / (SELECT COUNT(*) FROM peers) AS probability,
           COUNT(DISTINCT user_id) / (SELECT COUNT(DISTINCT user_id) FROM purchases) AS base_rate
    FROM purchases
    WHERE product_id NOT IN (SELECT product_id FROM mine)
    GROUP BY product_id
)
SELECT p.*, s.probability, s.probability / s.base_rate AS lift
FROM products p
JOIN stats s ON p.product_id = s.product_id
WHERE s.probability > 0
ORDER BY s.probability DESC, lift DESC;

In the above query, mine holds the products the target user has already bought, and peers holds the other users who bought at least one of those products. For each remaining product, probability is the share of peers who bought it. This conditional probability estimates how likely a user is to purchase a particular product given their historical behaviour.

Furthermore, we incorporate the lift concept by dividing that probability by base_rate, the share of all buyers who bought the product. This adjustment accounts for the item’s base popularity: a lift above 1 means the product is bought more often by peers of the target user than by users in general.

By ordering the results based on the probability and lift, the query generates recommendations considering the association between items and their relative popularity.

It’s important to note that the example provided here assumes a purchase history as the basis for the recommendation. Depending on the available data and the specific context of your recommendation system, you can adapt this approach to other interaction types, such as ratings, views, or clicks.
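For a single product, the same idea can be worked through by hand: the conditional probability P(B | A) is the share of buyers of A who also bought B, and lift is that value divided by P(B), the share of all buyers who bought B. The sketch below uses Python's sqlite3 module with a made-up purchases table; the lift_for helper and the sample data are hypothetical.

```python
import sqlite3


def lift_for(conn, product_id):
    """Return (other_product, P(other | product), lift), sorted by lift."""
    n_users = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM purchases"
    ).fetchone()[0]
    n_a = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM purchases WHERE product_id = ?",
        (product_id,),
    ).fetchone()[0]
    rows = conn.execute(
        """
        SELECT b.product_id,
               COUNT(DISTINCT b.user_id) AS bought_both,
               (SELECT COUNT(DISTINCT user_id)
                FROM purchases
                WHERE product_id = b.product_id) AS n_b
        FROM purchases a
        JOIN purchases b
          ON a.user_id = b.user_id AND a.product_id != b.product_id
        WHERE a.product_id = ?
        GROUP BY b.product_id
        """,
        (product_id,),
    ).fetchall()
    results = []
    for other, bought_both, n_b in rows:
        conditional = bought_both / n_a  # P(other | product)
        base_rate = n_b / n_users        # P(other)
        results.append((other, conditional, conditional / base_rate))
    return sorted(results, key=lambda row: row[2], reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id TEXT, product_id TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        ("u1", "phone"), ("u1", "case"),
        ("u2", "phone"), ("u2", "case"),
        ("u3", "phone"), ("u3", "charger"),
        ("u4", "charger"),
        ("u5", "case"), ("u5", "charger"),
    ],
)
phone_lift = lift_for(conn, "phone")
```

In this toy data, cases and chargers are equally popular overall (3 of 5 users each), but 2 of the 3 phone buyers bought a case while only 1 bought a charger. The case therefore gets a lift above 1 and the charger a lift below 1.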

Complex process

Recommendation systems can be implemented using SQL and other programming languages or tools. Here are a few approaches to building recommendation systems with SQL:

Collaborative Filtering:

User-Based Collaborative Filtering: Create a user-item matrix in SQL, where each row represents a user and each column represents an item. Calculate the similarity between users based on their interactions with items (e.g., ratings, purchases). Recommend items that similar users have interacted with.


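-- Calculate user-user similarity based on ratings and store it.
-- Note: the mean product of co-ratings is a crude, unnormalized score.
CREATE TABLE user_user_similarity AS
SELECT u1.user_id, u2.user_id AS other_user_id, AVG(u1.rating * u2.rating) AS similarity
FROM ratings u1
JOIN ratings u2 ON u1.item_id = u2.item_id
WHERE u1.user_id != u2.user_id
GROUP BY u1.user_id, u2.user_id;

-- Recommend items for a specific user:
-- average the ratings given by the 5 most similar users,
-- skipping items the target user has already rated
SELECT r.item_id, AVG(r.rating) AS average_rating
FROM ratings r
JOIN (
    SELECT other_user_id, similarity
    FROM user_user_similarity
    WHERE user_id = 'target_user_id'
    ORDER BY similarity DESC
    LIMIT 5
) s ON r.user_id = s.other_user_id
WHERE r.item_id NOT IN (
    SELECT item_id FROM ratings WHERE user_id = 'target_user_id'
)
GROUP BY r.item_id
ORDER BY average_rating DESC
LIMIT 10;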
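The similarity score used above is not normalized, so users who simply give high ratings look similar to everyone. A common alternative is cosine similarity over co-rated items. The sketch below swaps that measure in (it is not the query from this article) and uses Python's sqlite3 module; the ratings rows and the most_similar_users helper are illustrative assumptions.

```python
import math
import sqlite3


def most_similar_users(conn, user_id, limit=5):
    """Return (other_user, cosine_similarity) over items both users rated."""
    # Not every SQLite build ships SQRT(), so register one explicitly.
    conn.create_function("sqrt", 1, math.sqrt)
    return conn.execute(
        """
        SELECT u2.user_id,
               SUM(u1.rating * u2.rating)
                   / (sqrt(SUM(u1.rating * u1.rating))
                      * sqrt(SUM(u2.rating * u2.rating))) AS similarity
        FROM ratings u1
        JOIN ratings u2
          ON u1.item_id = u2.item_id AND u1.user_id != u2.user_id
        WHERE u1.user_id = ?
        GROUP BY u2.user_id
        ORDER BY similarity DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id TEXT, item_id TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        ("alice", "a", 5), ("alice", "b", 4), ("alice", "c", 1),
        ("bob", "a", 5), ("bob", "b", 5), ("bob", "c", 1), ("bob", "d", 5),
        ("carol", "a", 1), ("carol", "b", 1), ("carol", "c", 5), ("carol", "e", 4),
    ],
)
neighbours = most_similar_users(conn, "alice")
```

Here bob rates the shared items almost exactly like alice, while carol's tastes are nearly opposite, so bob comes out as the closest neighbour. One caveat of this variant: with only one co-rated item the cosine is always 1, so in practice a minimum overlap is usually required.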

Item-based Collaborative Filtering: Similar to user-based filtering, but instead of finding similar users, you identify similar items based on user interactions. Recommend items similar to those a user has already interacted with.

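-- Calculate item-item similarity based on ratings and store it.
-- Note: the mean product of co-ratings is a crude, unnormalized score.
CREATE TABLE item_item_similarity AS
SELECT i1.item_id, i2.item_id AS similar_item_id, AVG(i1.rating * i2.rating) AS similarity
FROM ratings i1
JOIN ratings i2 ON i1.user_id = i2.user_id
WHERE i1.item_id != i2.item_id
GROUP BY i1.item_id, i2.item_id;

-- Recommend items for a specific user:
-- take the items most similar to those the user rated,
-- skip items the user already rated, and rank by other users' ratings
SELECT r.item_id, AVG(r.rating) AS average_rating
FROM ratings r
JOIN (
    SELECT similar_item_id, similarity
    FROM item_item_similarity
    WHERE item_id IN (
        SELECT item_id FROM ratings WHERE user_id = 'target_user_id'
    )
      AND similar_item_id NOT IN (
        SELECT item_id FROM ratings WHERE user_id = 'target_user_id'
    )
    ORDER BY similarity DESC
    LIMIT 5
) s ON r.item_id = s.similar_item_id
WHERE r.user_id != 'target_user_id'
GROUP BY r.item_id
ORDER BY average_rating DESC
LIMIT 10;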

Content-Based Filtering:

Create a table that stores item attributes and features (e.g., genre, keywords, tags) and their corresponding item IDs. Use SQL queries to calculate item similarity based on these attributes. Recommend items similar to those a user has previously shown interest in.

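-- Calculate item-item similarity as the number of shared attributes and store it
CREATE TABLE item_attribute_similarity AS
SELECT i1.item_id, i2.item_id AS similar_item_id, COUNT(*) AS similarity
FROM item_attributes i1
JOIN item_attributes i2 ON i1.attribute = i2.attribute
WHERE i1.item_id != i2.item_id
GROUP BY i1.item_id, i2.item_id;

-- Recommend items for a specific user:
-- take the items sharing the most attributes with those the user rated,
-- skip items the user already rated, and rank by other users' ratings
SELECT r.item_id, AVG(r.rating) AS average_rating
FROM ratings r
JOIN (
    SELECT similar_item_id, similarity
    FROM item_attribute_similarity
    WHERE item_id IN (
        SELECT item_id FROM ratings WHERE user_id = 'target_user_id'
    )
      AND similar_item_id NOT IN (
        SELECT item_id FROM ratings WHERE user_id = 'target_user_id'
    )
    ORDER BY similarity DESC
    LIMIT 5
) s ON r.item_id = s.similar_item_id
WHERE r.user_id != 'target_user_id'
GROUP BY r.item_id
ORDER BY average_rating DESC
LIMIT 10;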
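Counting shared attributes favours items that carry many tags. One way to correct for this is the Jaccard index: shared attributes divided by the total number of distinct attributes across both items. The following sketch, written with Python's sqlite3 module, assumes an item_attributes table in which each (item_id, attribute) pair appears once; the sample rows and the similar_items helper are hypothetical.

```python
import sqlite3


def similar_items(conn, item_id):
    """Return (other_item, jaccard) for items sharing at least one attribute."""
    return conn.execute(
        """
        WITH sizes AS (
            SELECT item_id, COUNT(*) AS n
            FROM item_attributes
            GROUP BY item_id
        )
        SELECT i2.item_id,
               COUNT(*) * 1.0 / (s1.n + s2.n - COUNT(*)) AS jaccard
        FROM item_attributes i1
        JOIN item_attributes i2
          ON i1.attribute = i2.attribute AND i1.item_id != i2.item_id
        JOIN sizes s1 ON s1.item_id = i1.item_id
        JOIN sizes s2 ON s2.item_id = i2.item_id
        WHERE i1.item_id = ?
        GROUP BY i2.item_id, s1.n, s2.n
        ORDER BY jaccard DESC
        """,
        (item_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_attributes (item_id TEXT, attribute TEXT)")
conn.executemany(
    "INSERT INTO item_attributes VALUES (?, ?)",
    [
        ("m1", "action"), ("m1", "sci-fi"), ("m1", "space"),
        ("m2", "action"), ("m2", "sci-fi"), ("m2", "heist"),
        ("m3", "romance"), ("m3", "space"),
    ],
)
close_to_m1 = similar_items(conn, "m1")
```

In this toy catalogue, m1 and m2 share two of four distinct attributes (0.5), while m1 and m3 share one of four (0.25). Items with no attribute in common never appear in the join and are treated as having zero similarity.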

Hybrid Approaches:

Combine collaborative filtering and content-based filtering. Use SQL to calculate user-user and item-item similarities based on interactions and item attributes. Combine the results to generate recommendations.

-- Calculate user-item similarity from ratings and attribute overlap, and store it:
-- a candidate item scores high when it shares attributes with items the user rated highly
CREATE TABLE user_item_similarity AS
SELECT u.user_id, i.similar_item_id AS item_id, AVG(u.rating * i.attribute_similarity) AS similarity
FROM ratings u
JOIN (
    SELECT i1.item_id, i2.item_id AS similar_item_id, COUNT(*) AS attribute_similarity
    FROM item_attributes i1
    JOIN item_attributes i2 ON i1.attribute = i2.attribute
    WHERE i1.item_id != i2.item_id
    GROUP BY i1.item_id, i2.item_id
) i ON u.item_id = i.item_id
GROUP BY u.user_id, i.similar_item_id;

-- Recommend items for a specific user
SELECT r.item_id, AVG(r.rating) AS average_rating
FROM ratings r
JOIN (
    SELECT item_id, similarity
    FROM user_item_similarity
    WHERE user_id = 'target_user_id'
      AND item_id NOT IN (
        SELECT item_id FROM ratings WHERE user_id = 'target_user_id'
    )
    ORDER BY similarity DESC
    LIMIT 10
) s ON r.item_id = s.item_id
WHERE r.user_id != 'target_user_id'
GROUP BY r.item_id
ORDER BY average_rating DESC
LIMIT 10;

Association Rules:

Mine association rules from transactional data in SQL (e.g., “Customers who bought item A also bought item B”). Recommend items based on these association rules.

-- Mine association rules and store them:
-- support is the number of transactions containing both items
CREATE TABLE association_rules AS
SELECT lhs.item_id AS item_a, rhs.item_id AS item_b, COUNT(*) AS support
FROM transactions lhs
JOIN transactions rhs ON lhs.transaction_id = rhs.transaction_id
WHERE lhs.item_id != rhs.item_id
GROUP BY lhs.item_id, rhs.item_id
HAVING COUNT(*) >= 10;

-- Recommend items based on association rules:
-- items most often bought together with the target item
SELECT item_b AS item_id, support
FROM association_rules
WHERE item_a = 'target_item_id'
ORDER BY support DESC
LIMIT 10;
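Support alone tends to favour items that appear in many baskets anyway. Rules are therefore often ranked by confidence as well: the share of transactions containing item A that also contain item B. The sketch below computes both with Python's sqlite3 module. It assumes each item appears at most once per transaction, and the transactions rows and the mine_rules helper are illustrative.

```python
import sqlite3


def mine_rules(conn, min_support=2):
    """Return (item_a, item_b, support, confidence) for rules A -> B."""
    return conn.execute(
        """
        SELECT lhs.item_id AS item_a,
               rhs.item_id AS item_b,
               COUNT(*) AS support,
               COUNT(*) * 1.0 / (SELECT COUNT(*)
                                 FROM transactions t
                                 WHERE t.item_id = lhs.item_id) AS confidence
        FROM transactions lhs
        JOIN transactions rhs
          ON lhs.transaction_id = rhs.transaction_id
         AND lhs.item_id != rhs.item_id
        GROUP BY lhs.item_id, rhs.item_id
        HAVING COUNT(*) >= :min_support
        ORDER BY confidence DESC, support DESC, item_a, item_b
        """,
        {"min_support": min_support},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_id TEXT, item_id TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("t1", "bread"), ("t1", "butter"),
        ("t2", "bread"), ("t2", "butter"), ("t2", "jam"),
        ("t3", "bread"), ("t3", "jam"),
        ("t4", "butter"),
        ("t5", "bread"), ("t5", "butter"),
    ],
)
rules = mine_rules(conn, min_support=2)
```

Note that rules are directional: every basket with jam also contains bread, so jam -> bread has confidence 1.0, whereas bread -> jam only reaches 0.5.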

Matrix Factorization:

Use SQL to create a matrix factorization model by factorizing the user-item matrix. This approach involves decomposing the matrix into lower-rank matrices to find latent factors. Generate recommendations based on the factorized matrices.

Matrix factorization typically involves more complex mathematical operations and iterative algorithms, making it challenging to implement solely in SQL. Using a programming language or a machine learning framework for this method is recommended.
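To give a feel for why an iterative algorithm is needed, here is a deliberately small sketch of matrix factorization trained with stochastic gradient descent in plain Python. It is only one of several possible training schemes, and the function names, hyperparameters, and sample ratings are assumptions chosen for illustration, not a tuned implementation.

```python
import random


def factorize(ratings, n_factors=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn latent vectors p[user] and q[item] so that p . q approximates the rating."""
    rng = random.Random(seed)
    users = sorted({user for user, _, _ in ratings})
    items = sorted({item for _, item, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for user, item, rating in ratings:
            error = rating - predict(p, q, user, item)
            for k in range(n_factors):
                pu, qi = p[user][k], q[item][k]
                # Gradient step on the squared error with L2 regularization.
                p[user][k] += lr * (error * qi - reg * pu)
                q[item][k] += lr * (error * pu - reg * qi)
    return p, q


def predict(p, q, user, item):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def mean_squared_error(ratings, p, q):
    """Average squared difference between known and predicted ratings."""
    return sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


ratings = [
    ("alice", "a", 5), ("alice", "b", 4), ("alice", "c", 1),
    ("bob", "a", 5), ("bob", "b", 5), ("bob", "d", 5),
    ("carol", "a", 1), ("carol", "c", 5), ("carol", "d", 2),
]
untrained = factorize(ratings, epochs=0)  # random initial factors only
trained = factorize(ratings)

error_before = mean_squared_error(ratings, *untrained)
error_after = mean_squared_error(ratings, *trained)
# An unseen (user, item) pair can now be scored, e.g. alice and item "d".
guess = predict(*trained, "alice", "d")
```

Each pass over the data nudges the user and item vectors to reduce the prediction error, which is the kind of repeated, stateful update that is awkward to express in a single SQL statement.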

The next part of this tutorial will cover implementing those algorithms with Python, NumPy, Keras and TensorFlow for a more robust and accurate outcome.
