
Full-scale development of a recommendation system as an ML Engineer, from scratch.

A recommendation system is one of the most important and complex software engineering systems in many respects. It is one of the reasons behind the success of giants like Netflix and YouTube.

Building a full-scale recommendation system as a deployable product for a working company requires solid knowledge of deep learning, logic programming, databases and the domain. Apart from these, you need the software engineering and development knowledge required for deployment and maintenance. [“yeah I know all these, huhhhh”]

Hi everyone, I am Aditya Raj, currently in my 2nd year at IIIT Allahabad and also working remotely as a Machine Learning Engineer at yellowbacks.com.

I have been working for 5 months now and feel immensely lucky to work at a growing startup. I got to learn an ocean of things and got the opportunity to design, build and deploy some interesting AI-based systems.

Here I will talk about my favourite project, the one that literally made me a complete software engineer. I will try to explain it from scratch, but some basic engineering and mathematical knowledge will keep you from feeling lost.

A working knowledge of computer vision is preferred here. If you don’t have it, below is a link to another blog.

Also, I will keep things a little more basic compared to the original project so that anyone can use this as a reference for their future projects.

Recommendation System :- A recommendation system is an AI-based software system that recommends products to a user on the basis of the features of the products, the features of the users, or both.

In an item-based recommendation system, we find the features of the items and then recommend on the basis of similarity between those features (content filtering).

In a user-based one, we find the features of the users and then recommend similar products to similar users (collaborative filtering).

A hybrid recommendation system uses both of them. I am working on the development of a hybrid recommendation system.

Today I will explain the content-based part; I am still working on converting it to a hybrid one. The recommendation system I made is for a fashion startup, so it is a visual product (computer vision stuff). It is basically the thing that shows you products under “YOU MAY ALSO LIKE” on Amazon.

A basic demo of my working system:-

Now compare it with a big tech company’s:-

Well, thanks, I know I am good...

Don’t cry about the design, as it is just an API yet to be embedded in a beautiful piece of software.

Now let’s learn.

So I will explain the whole system in these parts:-

High level look and design
R&E (Research and Experiments)
Deep learning and data engineering
Backend coding and API making
Deployment

High level look and design

So, this part is the most difficult for me to write. Let’s try:-

Making it 007000000128 times simpler, consider that every image on the fashion website is stored in a database server. Suppose every image is converted to a 2048-dimensional mathematical vector (or an array with 2048 elements, for coders). Now suppose a user is viewing a pair of jeans for his girlfriend and my AI system has to recommend similar jeans to him. How will it do that?

It’s simple: calculate the distance between the current jeans’ vector and the vectors of all the other images, and output the images with the smallest distances.

So, I have to design and write an algorithm that finds an embedding (vector) for every image and saves it in a database, plus another basic piece of code that finds the distance between vectors and recommends the images with the smallest distances.
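To make the idea concrete, here is a minimal sketch on toy data. The function name nearest_images, the file names and the 4-dimensional vectors are made up for illustration; the real system works on 2048-dimensional embeddings.

```python
import numpy as np

def nearest_images(query_id, embeddings, k=3):
    """Return the k image ids whose vectors are closest to the query's vector."""
    ids = [i for i in embeddings if i != query_id]    # never recommend the item itself
    matrix = np.stack([embeddings[i] for i in ids])   # shape: (n_images, dim)
    # Euclidean distance from the query vector to every other vector.
    distances = np.linalg.norm(matrix - embeddings[query_id], axis=1)
    order = np.argsort(distances)[:k]                 # positions of the k smallest distances
    return [ids[j] for j in order]

# Toy catalogue: 4-dimensional vectors stand in for the real embeddings.
catalogue = {
    "jeans_a.jpg": np.array([1.0, 0.0, 0.0, 0.0]),
    "jeans_b.jpg": np.array([0.9, 0.1, 0.0, 0.0]),
    "shirt_a.jpg": np.array([0.0, 1.0, 0.0, 0.0]),
    "skirt_a.jpg": np.array([0.5, 0.5, 0.0, 0.0]),
}
print(nearest_images("jeans_a.jpg", catalogue, k=2))  # ['jeans_b.jpg', 'skirt_a.jpg']
```

Everything that follows in this post is about producing vectors good enough that "close in distance" really means "looks similar".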

Well, it’s not that simple: finding an embedding for every image requires a lot of deep learning and machine learning knowledge, experience and, most importantly, patience.

Research and Experiments part

This part was mostly about studying and understanding existing systems and models on GitHub, Colab, Medium and other open-source platforms.

Explaining all my research and experiments would surely bore you, so I am jumping directly to the conclusions.

What I learned from research and experiments:-

I should use a transfer learning approach, with fashion image classification as the primer.
The DeepFashion dataset is the best option available.
The transfer learning architecture should be modified to do object detection (bounding boxes) in parallel with classification, so that it ignores the background and can recommend every fashion subpart separately.
The embeddings should be one-dimensional and kept within a bounded range, to make the database less expensive and the distance calculation faster.
The latency of the system should be low, and Euclidean distance is a simple, easy and efficient method here.
Object detection and cropping should also be done very efficiently. Since topwear and bottomwear recommendations were needed in my case, I used Facebook AI’s Detectron2 for cropping (“I am pretty advanced”).
With Detectron2, DeepFashion2 was used for multi-object detection.
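For reference, this is the Euclidean distance from the list above, written for two embeddings a and b of length n (the symbols are introduced here for illustration; n = 2048 in this post):

```latex
d(a, b) = \lVert a - b \rVert_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
```

A smaller d means more similar images. Since the square root does not change the ordering, ranking by the squared distance gives the same recommendations.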

Deep Learning and Data Engineering

This part covers the most complicated aspect of the project, i.e. getting the embeddings (mathematical vectors).

If you don’t know the basics of computer vision, I am sorry; I have another blog for you, which will be linked in the description.

Before calculating the vector, we need to crop the bottomwear and topwear out of the image. We will do it using Detectron2:

step 1:- Preprocessing deepfashion2:-

from PIL import Image
import numpy as np
import json

dataset = {
    "info": {},
    "licenses": [],
    "images": [],
    "annotations": [],
    "categories": []
}

lst_name = ['short_sleeved_shirt', 'long_sleeved_shirt', 'short_sleeved_outwear', 'long_sleeved_outwear',
            'vest', 'sling', 'shorts', 'trousers', 'skirt', 'short_sleeved_dress',
            'long_sleeved_dress', 'vest_dress', 'sling_dress']

for idx, e  in enumerate(lst_name):
    dataset['categories'].append({
        'id': idx + 1,
        'name': e,
        'supercategory': "clothes",
        'keypoints': ['%i' % (i) for i in range(1, 295)],
        'skeleton': []
    })

num_images = 32153 #191961 
sub_index = 0  # the index of ground truth instance
for num in range(1, num_images + 1):
    json_name = '/content/validation/annos/' + str(num).zfill(6) + '.json'
    image_name = '/content/validation/image/' + str(num).zfill(6) + '.jpg'

    if (num >= 0):
        imag = Image.open(image_name)
        width, height = imag.size
        with open(json_name, 'r') as f:
            temp = json.loads(f.read())
            pair_id = temp['pair_id']

            # One COCO "images" entry per picture.
            dataset['images'].append({
                'coco_url': '',
                'date_captured': '',
                'file_name': str(num).zfill(6) + '.jpg',
                'flickr_url': '',
                'id': num,
                'license': 0,
                'width': width,
                'height': height
            })
            # Every remaining key of the annotation file describes one clothing item.
            for i in temp:
                if i == 'source' or i == 'pair_id':
                    continue
                else:
                    points = np.zeros(294 * 3)
                    sub_index = sub_index + 1
                    # DeepFashion2 boxes are [x1, y1, x2, y2]; COCO wants [x, y, w, h].
                    box = temp[i]['bounding_box']
                    w = box[2] - box[0]
                    h = box[3] - box[1]
                    x_1 = box[0]
                    y_1 = box[1]
                    bbox = [x_1, y_1, w, h]
                    cat = temp[i]['category_id']
                    style = temp[i]['style']
                    seg = temp[i]['segmentation']
                    landmarks = temp[i]['landmarks']

                    points_x = landmarks[0::3]
                    points_y = landmarks[1::3]
                    points_v = landmarks[2::3]
                    points_x = np.array(points_x)
                    points_y = np.array(points_y)
                    points_v = np.array(points_v)
                    # Each category owns a slice of the 294 landmark slots.
                    case = [0, 25, 58, 89, 128, 143, 158, 168, 182, 190, 219, 256, 275, 294]
                    idx_i, idx_j = case[cat - 1], case[cat]

                    for n in range(idx_i, idx_j):
                        points[3 * n] = points_x[n - idx_i]
                        points[3 * n + 1] = points_y[n - idx_i]
                        points[3 * n + 2] = points_v[n - idx_i]

                    num_points = len(np.where(points_v > 0)[0])

                    dataset['annotations'].append({
                        'area': w * h,
                        'bbox': bbox,
                        'category_id': cat,
                        'id': sub_index,
                        'pair_id': pair_id,
                        'image_id': num,
                        'iscrowd': 0,
                        'style': style,
                        'num_keypoints': num_points,
                        'keypoints': points.tolist(),
                        'segmentation': seg,
                    })

json_name = '/content/deepfashion2_train.json'
with open(json_name, 'w') as f:
    json.dump(dataset, f)
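The one conversion in the script above that is easy to get wrong is the bounding box: the code reads corner coordinates [x1, y1, x2, y2] and writes COCO-style [x, y, width, height]. A tiny sketch of that step in isolation, with helper names that are made up for illustration:

```python
def corners_to_coco(box):
    """Convert [x1, y1, x2, y2] corners to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def coco_to_corners(bbox):
    """Inverse conversion, handy when cropping with array slices later on."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

box = [50, 80, 250, 480]              # made-up corner box
coco = corners_to_coco(box)
print(coco)                           # [50, 80, 200, 400]
print(coco_to_corners(coco) == box)   # True
```

The 'area' field written by the script is simply width times height of this converted box.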

step 2:- training detectron2 on DeepFashion2 :-

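import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the COCO-style file written in step 1 under the name used below.
# (Assumed: the original registration call is not shown; paths are taken from step 1.)
register_coco_instances("deepfashion_train", {}, "/content/deepfashion2_train.json", "/content/validation/image")

cfg = get_cfg()

cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("deepfashion_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")  # Let training initialize from model zoo

cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 1500
cfg.SOLVER.STEPS = (1000,1500)
cfg.SOLVER.GAMMA = 0.05
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 13  # the 13 DeepFashion2 categories from step 1

cfg.TEST.EVAL_PERIOD = 500

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()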

step 3:- saving the ready model and config in a zip

import shutil
shutil.make_archive('fashion_model', 'zip', '/content/output')
%cp "/content/fashion_model.zip" "/content/drive/My Drive/"

Next we need a crop function that cuts any image into topwear and bottomwear with the help of the trained Detectron2 model; its code appears in the backend section below.

Once we have the topwear and bottomwear images separated, we have to train the model for embeddings.

This is a proper deep learning project. Use the ResNet50 model from Keras and fine-tune the last few layers on DeepFashion. To get better results, we add 2 layers for the bounding box as well, and finally train classification and bounding box detection together on DeepFashion.

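# Imports assume the Keras bundled with TensorFlow 2.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

# ImageNet-pretrained backbone; average pooling gives a 2048-length vector.
model_resnet = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Freeze everything except the last 12 layers, which are fine-tuned.
for layer in model_resnet.layers[:-12]:
    layer.trainable = False

# Classification head with 46 output classes.
x = model_resnet.output
x = Dense(512, activation='elu', kernel_regularizer=l2(0.001))(x) # THIS IS ELU, NOT RELU
y = Dense(46, activation='softmax', name='img')(x)

# Bounding box head that regresses 4 box coordinates.
x_bbox = model_resnet.output
x_bbox = Dense(512, activation='relu', kernel_regularizer=l2(0.001))(x_bbox)
x_bbox = Dense(128, activation='relu', kernel_regularizer=l2(0.001))(x_bbox)
bbox = Dense(4, kernel_initializer='normal', name='bbox')(x_bbox)

final_model = Model(inputs=model_resnet.input,
                    outputs=[y, bbox])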

architecture before:-

Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
conv5_block3_3_bn (BatchNormali (None, None, None, 2 8192        conv5_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_add (Add)          (None, None, None, 2 0           conv5_block2_out[0][0]
                                                                 conv5_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_out (Activation)   (None, None, None, 2 0           conv5_block3_add[0][0]
__________________________________________________________________________________________________
avg_pool (GlobalAveragePooling2 (None, 2048)         0           conv5_block3_out[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 512)          1049088     avg_pool[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 512)          1049088     avg_pool[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          65664       dense_1[0][0]
__________________________________________________________________________________________________
img (Dense)                     (None, 46)           23598       dense[0][0]
__________________________________________________________________________________________________
bbox (Dense)                    (None, 4)            516         dense_2[0][0]
==================================================================================================
Total params: 25,775,666
Trainable params: 6,653,618
Non-trainable params: 19,122,048

This architecture was trained for 120 epochs on a GPU on the DeepFashion dataset.

architecture after:-

Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
conv5_block3_3_bn (BatchNormali (None, None, None, 2 8192        conv5_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_add (Add)          (None, None, None, 2 0           conv5_block2_out[0][0]
                                                                 conv5_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_out (Activation)   (None, None, None, 2 0           conv5_block3_add[0][0]
__________________________________________________________________________________________________
avg_pool (GlobalAveragePooling2 (None, 2048)         0           conv5_block3_out[0][0]

The classification and bounding box heads are cut off after training, so the model now ends at avg_pool. The 2048-length vector output by this modified model will be used as the embedding of an image.

The deep learning part is done here, and now we move to the backend coding part.

Backend coding and API development

Let’s first get the crop images function using the Detectron2 hard work we did in the object detection part:-

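import uuid

import cv2
import numpy as np
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

def crop_images(image):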
  config_file_path = "config.yaml"
  model_path = "model_final.pth"

  lst_name = ['short_sleeved_shirt', 'long_sleeved_shirt', 'short_sleeved_outwear', 'long_sleeved_outwear',
              'vest', 'sling', 'shorts', 'bottom_wear', 'skirt', 'short_sleeved_dress',
              'long_sleeved_dress', 'vest_dress', 'sling_dress']

  bottom = ['shorts', 'bottom_wear', 'skirt']
  top = ['short_sleeved_shirt', 'long_sleeved_shirt', 'short_sleeved_outwear', 'long_sleeved_outwear','vest','short_sleeved_dress',
              'long_sleeved_dress', 'vest_dress']


  cfg = get_cfg()
  cfg.merge_from_file(config_file_path)
  cfg.MODEL.WEIGHTS =  model_path
  cfg.DATASETS.TEST = ("deepfashion_val", )
  cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.55   # set the testing threshold for this model
  predictor = DefaultPredictor(cfg)

  im = cv2.imread(image)
  outputs = predictor(im)


  boxes = {}
  for coordinates in outputs["instances"].to("cpu").pred_boxes:
    coordinates_array = []
    for k in coordinates:
      coordinates_array.append(int(k))

    boxes[uuid.uuid4().hex[:].upper()] = coordinates_array

  image_batch = []
  image_details = []
  for k,v in boxes.items():
    crop_img = im[v[1]:v[3], v[0]:v[2], :]
    image_batch.append(crop_img)

  for i in range(0,len(boxes)):
      image_details.append(lst_name[outputs['instances'][i].pred_classes.item()])

  botret = []
  topret = []
  for i in np.arange(0,len(image_batch)):
    if image_details[i] in bottom:
      botret.append(image_batch[i])
    if image_details[i] in top:
      topret.append(image_batch[i])

  print("Cropping Successful")
  return topret,botret
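The cropping itself is plain array slicing, which is easy to try without a trained detector. Below is a small sketch on a blank image; split_crops, the shortened label sets and the hard-coded boxes are stand-ins invented for this example, not part of the project code.

```python
import numpy as np

TOP = {"short_sleeved_shirt", "long_sleeved_shirt", "vest"}   # shortened sets for the example
BOTTOM = {"shorts", "bottom_wear", "skirt"}

def split_crops(image, boxes, labels):
    """Crop each [x1, y1, x2, y2] box out of an H x W x C image and group the crops."""
    tops, bottoms = [], []
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        crop = image[y1:y2, x1:x2, :]      # rows are y, columns are x
        if label in TOP:
            tops.append(crop)
        elif label in BOTTOM:
            bottoms.append(crop)
    return tops, bottoms

image = np.zeros((600, 400, 3), dtype=np.uint8)        # blank 400 x 600 "photo"
boxes = [[100, 50, 300, 280], [110, 290, 290, 580]]    # pretend detector output
labels = ["long_sleeved_shirt", "bottom_wear"]
tops, bottoms = split_crops(image, boxes, labels)
print(tops[0].shape, bottoms[0].shape)                 # (230, 200, 3) (290, 180, 3)
```

Note the index order: images loaded with OpenCV are indexed as [row, column], so the y range comes first in the slice.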

After getting the cropped images, it’s embedding time.

Let’s see the code for generating the embedding database with the help of the cropped images and this model:

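import os

import cv2

import crop  # module that holds crop_images() from above

# Assumed setup (not shown in the original): `embed` is the trained model cut at
# avg_pool, and `rotten` is a collection of file names that should be skipped.
topwear_database = {}
bottomwear_database = {}
main_database = {}

for items in os.listdir('images'):
    if items in rotten:
        continue
    else:
        top,bottom = crop.crop_images('images/'+items)

        # Keep only the first detected topwear crop and add a batch dimension.
        for img in top[:1]:
            img = img.reshape(1,img.shape[0],img.shape[1],img.shape[2])
            topwear_database[items] = embed(img).numpy()[0]

        # Same for the first detected bottomwear crop.
        for img in bottom[:1]:
            img = img.reshape(1,img.shape[0],img.shape[1],img.shape[2])
            bottomwear_database[items] = embed(img).numpy()[0]

        # Embedding of the full, uncropped image.
        im = cv2.imread('images/'+items)
        img = im.reshape(1,im.shape[0],im.shape[1],im.shape[2])
        main_database[items] = embed(img).numpy()[0]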

After this long process, the database is ready.
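The post does not say how the dictionaries are stored, so the following is only one possible sketch: keep each database as a single .npz file holding the ids and a matrix of vectors. The function names and the file name are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

def save_database(database, path):
    """Write {image_id: vector} as two aligned arrays in a single .npz file."""
    ids = sorted(database)
    vectors = np.stack([database[i] for i in ids])
    np.savez(path, ids=np.array(ids), vectors=vectors)

def load_database(path):
    """Read the file back into a plain {image_id: vector} dictionary."""
    with np.load(path) as data:
        return {str(i): v for i, v in zip(data["ids"], data["vectors"])}

demo = {"a.jpg": np.ones(4), "b.jpg": np.zeros(4)}      # toy 4-dimensional embeddings
path = os.path.join(tempfile.mkdtemp(), "topwear.npz")  # throwaway location
save_database(demo, path)
restored = load_database(path)
print(sorted(restored))                                 # ['a.jpg', 'b.jpg']
```

Keeping the vectors as one matrix also makes it easy to compute all distances in a single vectorized call later.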

Now we just have to take the user’s image and recommend similar images based on the distances between embedding vectors.

Function for getting k similar recommendations:-

import numpy as np

def euclidean_distance(a, b):
  # Assumed helper (not shown in the original): L2 distance between two vectors.
  return np.linalg.norm(np.asarray(a) - np.asarray(b))

def similar_k(image_id,k):

  toprecommended = []
  bottomrecommended = []
  output1 = topwear_database[image_id]
  output2 = bottomwear_database[image_id]

  # Distance from the query's topwear embedding to every stored topwear embedding.
  dis_dict = {}
  for key in topwear_database:
    dis_dict[key] = euclidean_distance(topwear_database[key],output1)

  # Sort by distance, ascending, and keep the k closest, skipping the query itself.
  sorted_dis_dict = dict(sorted(dis_dict.items(), key=lambda item: item[1]))
  i = 0

  for key in sorted_dis_dict:
    if i == k:
      break
    if key == image_id:
      continue
    toprecommended.append(key)
    i = i+1

  # Same procedure for bottomwear.
  dis_dict = {}
  for key in bottomwear_database:
    dis_dict[key] = euclidean_distance(bottomwear_database[key],output2)

  sorted_dis_dict = dict(sorted(dis_dict.items(), key=lambda item: item[1]))
  i = 0

  for key in sorted_dis_dict:
    if i == k:
      break
    if key == image_id:
      continue
    bottomrecommended.append(key)
    i = i+1

  return toprecommended,bottomrecommended
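The topwear and bottomwear halves of the function are identical, so they could share one helper. Here is a compact sketch of that idea using only the standard library; k_nearest is a name invented for this example, and it returns an empty list when the image has no entry in a database (for instance when no bottomwear was detected for it), a case in which a direct dictionary lookup would raise a KeyError.

```python
import heapq
import math

def euclidean_distance(a, b):
    """L2 distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(image_id, database, k):
    """Return up to k keys of `database` closest to `image_id`, excluding itself."""
    if image_id not in database:
        return []
    query = database[image_id]
    candidates = (key for key in database if key != image_id)
    # nsmallest avoids sorting the whole catalogue when only k items are needed.
    return heapq.nsmallest(
        k, candidates, key=lambda key: euclidean_distance(database[key], query)
    )

topwear = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 3.0], "d": [0.2, 0.1]}
print(k_nearest("a", topwear, 2))   # ['d', 'b']
print(k_nearest("z", topwear, 2))   # []
```

With such a helper, the recommendation call reduces to running k_nearest once per database.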

All of the functions above are used in an API developed with FastAPI. The FastAPI code is very simple and need not be discussed.

Deployment:

I had to package the API with Docker so that other developers can use and integrate it.

Docker is a kind of virtual container that can be transferred between servers and systems.

First I put all my requirements into a requirements.txt file and then wrote a Dockerfile.

API can be used and integrated directly,but for database making and embedding generation we certainly need Docker.

FROM ubuntu:20.04


RUN apt update && apt install -y htop python3
RUN apt install -y python3-pip

RUN pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install detectron2==0.1.3+cpu -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/index.html

COPY requirements.txt ./requirements.txt

RUN pip install -r requirements.txt
COPY . .

RUN python3 database.py && python3 test.py

Sending build context to Docker daemon 613.5MB
Step 1/9 : FROM ubuntu:20.04
 ---> 7e0aa2d69a15
Step 2/9 : RUN apt update && apt install -y htop python3
 ---> Using cache
 ---> 75031337f4ce
Step 3/9 : RUN apt install -y python3-pip
 ---> Using cache
 ---> af01d295f9e7
Step 4/9 : RUN pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
 ---> Using cache
 ---> cb2ab85a33c0
Step 5/9 : RUN pip install detectron2==0.1.3+cpu -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/index.html
 ---> Using cache
 ---> 1059cff99453
Step 6/9 : COPY requirements.txt ./requirements.txt
 ---> Using cache
 ---> 1ef1b3763905
Step 7/9 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> 54aafe521f90
Step 8/9 : COPY . .
 ---> 4f991a474a99
Step 9/9 : RUN python3 database.py && python3 test.py
 ---> Running in 391ab8cfe128

WE ARE READY

Any doubts, remarks, suggestions or thanks will be appreciated.

My whatsapp no:- 8292098293

Email Id:- adityaraj20008@gmail.com


via WordPress https://ramseyelbasheer.io/2021/06/12/full-scale-development-of-recommendation-system-as-ml-engineer-from-scratch/