Movie-Recommender-System

In recent years, technology has evolved rapidly, and people are constantly faced with tasks that take a lot of time to complete accurately. Automating these everyday tasks has helped industries grow in their various fields. One such automation is the recommendation system. Powered by AI, recommendation systems are embedded in the apps and platforms we use every day, combining different kinds of data and models. For example, while browsing Instagram we see ads for items we may have searched for on Google, and YouTube and Netflix use recommendation systems to suggest videos and movies to users based on their past preferences, giving users a wide range of options tailored to their tastes.


Recommendation systems come in several types:

Content-based

Popularity-based

Collaborative filtering

Hybrid


Content-based

Content-based recommender systems were among the first types of recommender system. They predict ratings from the content of the product: items whose content or features are similar to those the user has liked are recommended. While this method performs well at recommending items that suit a user's history, it still has shortcomings. For example, new users with no historical data cause the 'cold-start' problem, sparse data weakens its predictions, and it tends to create a filter bubble by repeatedly recommending items similar to those the user has already consumed.
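
A minimal content-based sketch, assuming each movie is described only by a hypothetical set of genre tags: it scores every unseen movie by its cosine similarity to the movies the user liked. The titles, tags, and function names are illustrative, not taken from the project.

```python
import math

# Hypothetical catalogue: each movie is described by a set of content features (genres).
MOVIES = {
    "Movie A": {"action", "sci-fi", "adventure"},
    "Movie B": {"action", "sci-fi", "war"},
    "Movie C": {"romance", "drama"},
    "Movie D": {"romance", "comedy", "drama"},
}

def cosine(a, b):
    """Cosine similarity between two binary feature vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend_content_based(liked, k=2):
    """Rank unseen movies by their highest content similarity to any liked movie."""
    scores = {
        title: max(cosine(features, MOVIES[m]) for m in liked)
        for title, features in MOVIES.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend_content_based({"Movie A"}, k=1))  # ['Movie B']
```

Because only the movies' own features are compared, a user who liked nothing yet gets no useful ranking, which is the cold-start problem described above.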


Popularity-based

Here the principle of popularity is used: the system finds the most popular products or content and recommends them to the user. An example is YouTube's trending section, where trending videos are recommended to users. Its advantage over content-based recommenders is that it does not need the user's historical data. A major drawback is that the user's individual preferences are not considered, so there is a high chance that the user will not like the trending product.
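
A small popularity-based sketch on hypothetical (user, movie, rating) triples. It ranks movies by mean rating but only among movies with at least `min_votes` ratings; that threshold is an assumption added here so that one enthusiastic rating cannot make an obscure title "popular".

```python
from collections import defaultdict

# Hypothetical (user, movie, rating) triples on a 1-5 scale.
RATINGS = [
    ("u1", "Movie A", 5), ("u2", "Movie A", 4), ("u3", "Movie A", 5),
    ("u1", "Movie B", 3), ("u2", "Movie B", 4),
    ("u3", "Movie C", 5),
]

def most_popular(ratings, k=2, min_votes=2):
    """Return the k movies with the highest mean rating among those with >= min_votes ratings.

    Every user receives the same list: no personal history is used.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for _user, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    eligible = [m for m in counts if counts[m] >= min_votes]
    return sorted(eligible, key=lambda m: totals[m] / counts[m], reverse=True)[:k]

print(most_popular(RATINGS))  # ['Movie A', 'Movie B']
```

With `min_votes=1`, "Movie C" (a single 5-star rating) would jump to the top, illustrating why some minimum support is usually required.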

Collaborative filtering (CF)

Using this method, the preferences of similar users are considered. For example, if users A and B both like movie X, and user A also likes movie Y, we can recommend movie Y to user B. CF can be:

1. User-based: measures the degree of similarity between the target user and other users.

2. Item-based: measures the similarity between the items the user has rated or interacted with and other items.
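
Both CF variants can be sketched in plain Python on a hypothetical three-user table mirroring the A/B/X/Y example. Each predicts a missing rating as a similarity-weighted average; the choice of raw cosine similarity over co-rated entries is a simplifying assumption, not necessarily what the project used.

```python
import math

# Hypothetical ratings (1-5): users A and B both like movie X, A also likes Y,
# and C has the opposite taste. B has not rated Y.
RATINGS = {
    "A": {"X": 5, "Y": 4, "Z": 1},
    "B": {"X": 5, "Z": 2},
    "C": {"X": 1, "Y": 2, "Z": 5},
}

def cosine(u, v):
    """Cosine similarity of two sparse {key: rating} dicts over their shared keys."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in common))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v)

def weighted_average(pairs):
    """Similarity-weighted mean of (similarity, rating) pairs; None if total weight is 0."""
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else None

def predict_user_based(user, movie):
    """User-based CF: other users' ratings of `movie`, weighted by their similarity to `user`."""
    return weighted_average([
        (cosine(RATINGS[user], ratings), ratings[movie])
        for other, ratings in RATINGS.items()
        if other != user and movie in ratings
    ])

def item_column(movie):
    """One column of the rating matrix: {user: rating} for a single movie."""
    return {u: r[movie] for u, r in RATINGS.items() if movie in r}

def predict_item_based(user, movie):
    """Item-based CF: the user's own ratings, weighted by each item's similarity to `movie`."""
    target = item_column(movie)
    return weighted_average([
        (cosine(target, item_column(other)), rating)
        for other, rating in RATINGS[user].items()
        if other != movie
    ])

print(round(predict_user_based("B", "Y"), 2))  # 3.29
print(round(predict_item_based("B", "Y"), 2))  # 3.83
```

Both predictions lean towards A's rating of Y because B resembles A. Note that raw cosine on positive ratings stays fairly high even between users with opposite tastes (C's similarity to B is about 0.55 here); mean-centring ratings first (adjusted cosine or Pearson) is the usual remedy.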


Besides the cold-start and scalability issues, CF also suffers from sparsity, which occurs when a collection contains a large number of items and each user rates only a small part of it. This is a big issue because the system tends to favour mainstream items, which have many ratings, and neglects the rest. This led to the development of hybrid models.
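
Sparsity can be quantified as the fraction of empty cells in the user-item rating matrix, 1 - ratings / (users x items). A short sketch with hypothetical counts:

```python
def matrix_sparsity(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of the user-item rating matrix that has no rating."""
    return 1 - n_ratings / (n_users * n_items)

# Hypothetical catalogue: 1,000 users, 500 movies, 20,000 ratings.
print(f"{matrix_sparsity(20_000, 1_000, 500):.2f}")  # 0.96

# Published size of the full Netflix Prize training set, for scale.
print(f"{matrix_sparsity(100_480_507, 480_189, 17_770):.3f}")  # 0.988
```

Even with 20 ratings per user, 96% of the toy matrix is empty; at Netflix Prize scale almost 99% of possible ratings are missing.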


Hybrid

Hybrid models/systems combine two or more of the aforementioned types in one recommendation system. A common example combines the properties of the collaborative filtering and content-based approaches into a more comprehensive model.
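
One simple way to combine the two is a weighted hybrid. The sketch below assumes hypothetical CF and content scores that are already normalized to [0, 1] and blends them with a weight `alpha` (an illustrative parameter); a brand-new movie that CF knows nothing about can still be recommended through its content score.

```python
def hybrid_scores(cf_scores, content_scores, alpha=0.7):
    """Weighted hybrid: alpha * CF score + (1 - alpha) * content score.

    Both inputs are assumed to be on the same [0, 1] scale; an item missing
    from one source gets 0 from that source.
    """
    items = cf_scores.keys() | content_scores.keys()
    return {
        i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * content_scores.get(i, 0.0)
        for i in items
    }

def recommend(cf_scores, content_scores, k=2, alpha=0.7):
    """Top-k items by blended score."""
    blended = hybrid_scores(cf_scores, content_scores, alpha)
    return sorted(blended, key=blended.get, reverse=True)[:k]

# Hypothetical normalized scores for one user; "Movie N" is new, so CF has no score for it.
cf = {"Movie A": 0.9, "Movie B": 0.4}
content = {"Movie A": 0.2, "Movie B": 0.8, "Movie N": 0.95}
print(recommend(cf, content, k=3))  # ['Movie A', 'Movie B', 'Movie N']
```

Lowering `alpha` shifts weight to content: with `alpha=0.3` the order becomes Movie B, Movie N, Movie A. In practice `alpha` would be tuned on held-out ratings.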


Building a movie recommendation system


In this project, the following Python libraries were used: Pandas, NumPy, scikit-learn (sklearn), Matplotlib, FastAPI, and scikit-surprise. Two models were built: the first uses a collaborative filtering algorithm (Probabilistic Matrix Factorization from the Surprise library), and the second uses Pearson correlation computed with Pandas.
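
The project relied on Surprise's implementation; according to the Surprise documentation, its `SVD` algorithm with `biased=False` is equivalent to Probabilistic Matrix Factorization. To show what such a model learns, here is a from-scratch NumPy sketch (not Surprise's code) that fits latent user and item factors by stochastic gradient descent on hypothetical toy ratings. The function names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q so that P[u] @ Q[i] approximates rating r.

    SGD on squared error with L2 penalties on the factors, i.e. the MAP objective
    of Probabilistic Matrix Factorization (no bias terms).
    `ratings` is a list of (user_index, item_index, rating) triples.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, size=(n_users, k))  # latent user factors
    Q = rng.normal(0.0, 0.1, size=(n_items, k))  # latent item factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):  # visit ratings in random order
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()  # update both vectors from the same old values
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

def rmse(P, Q, ratings):
    """Root-mean-square error of the factorization on the given triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical toy data: 3 users x 3 movies; user 1 has not rated movie 1.
RATINGS = [(0, 0, 5), (0, 1, 4), (0, 2, 1),
           (1, 0, 4), (1, 2, 1),
           (2, 0, 1), (2, 1, 1), (2, 2, 5)]
P, Q = train_mf(RATINGS, n_users=3, n_items=3)
print("training RMSE:", round(rmse(P, Q, RATINGS), 3))
print("predicted rating, user 1 / movie 1:", round(float(P[1] @ Q[1]), 2))
```

The missing rating is filled in by the dot product of the learned vectors; on real data, RMSE should be measured on held-out ratings rather than on the training triples.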


Dataset


The datasets used for this project are available on Kaggle. The Kaggle data consists of five files: combined data [1-4].txt (four files) and movies_titles.csv.
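
The combined data files are not plain CSV. A minimal parser sketch, assuming the Netflix Prize layout that these file names suggest (a line such as `1:` opens the block for movie 1, and each following line is `CustomerID,Rating,Date`); the function name and the sample lines are made up for illustration.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple

def parse_combined_data(fh: TextIO, max_rows: Optional[int] = None
                        ) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (movie_id, customer_id, rating, date) tuples from a combined-data file.

    Assumed layout: "<movie_id>:" header lines, each followed by
    "CustomerID,Rating,Date" lines. max_rows stops early so that a sample
    can be loaded on a machine with limited memory.
    """
    movie_id = None
    emitted = 0
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            movie_id = int(line[:-1])  # header line: switch to a new movie
            continue
        if max_rows is not None and emitted >= max_rows:
            return  # sample limit reached
        customer_id, rating, date = line.split(",")
        yield movie_id, int(customer_id), int(rating), date
        emitted += 1

# Made-up lines in the assumed format; a real file would be opened with open(path).
SAMPLE = "1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n2:\n2059652,4,2005-09-05\n"
rows = list(parse_combined_data(io.StringIO(SAMPLE)))
print(rows[-1])  # (2, 2059652, 4, '2005-09-05')
```

The resulting tuples can be loaded with `pandas.DataFrame(rows, columns=["movie", "user", "rating", "date"])`, and streaming line by line avoids holding the raw text in memory.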


The recommendation models in this project are implemented in two main ways:


  • Collaborative filtering


  • Pearson's r correlation: measures the linear correlation between the rating scores of every pair of movies and, for a given movie, returns the 10 movies with the highest correlation.


Other ways of measuring the similarity between two movies include cosine similarity, and libraries such as TensorFlow Recommenders can also be used to build recommendation models.
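
A sketch of the Pearson approach with pandas, on hypothetical long-format ratings: pivot to a user x movie matrix, then `DataFrame.corrwith` correlates every movie's rating column with the chosen movie's column (Pearson is its default method, and each pair uses only users who rated both). The `min_common` threshold and the column names are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical ratings in long format: one row per (user, movie, rating).
ratings = pd.DataFrame({
    "user":   ["u1", "u1", "u1", "u2", "u2", "u2", "u3", "u3", "u3", "u4", "u4"],
    "movie":  ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z", "X", "Y"],
    "rating": [5, 4, 1, 4, 4, 2, 1, 2, 5, 3, 3],
})

def top_correlated(ratings: pd.DataFrame, movie: str, k: int = 10,
                   min_common: int = 3) -> pd.Series:
    """Return up to k movies whose ratings have the highest Pearson r with `movie`.

    Pairs rated by fewer than `min_common` common users are dropped as unreliable.
    """
    matrix = ratings.pivot_table(index="user", columns="movie", values="rating")
    target = matrix[movie]
    corr = matrix.corrwith(target)  # Pearson r of each movie column with the target
    # Number of users who rated both the target and each other movie.
    both = matrix.notna().astype(int).mul(target.notna().astype(int), axis=0).sum()
    keep = (both >= min_common) & (corr.index != movie)
    return corr[keep].dropna().sort_values(ascending=False).head(k)

print(top_correlated(ratings, "X"))
```

Here Y comes first (r of about 0.97) and Z is perfectly anti-correlated with X (r = -1), so in practice negative correlations would be filtered out before recommending.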


Limitation


The dataset is very large, and working with all of it can crash the system because of limited computational power. We believe this limitation can be overcome by using cloud services such as Amazon Web Services, Microsoft Azure, and Google Cloud.


For the same reason, the model could not be deployed on Heroku.
