Fine-tuning and Running Custom AI Models in Simplifier

Written by Jennifer Häfner
April 9, 2025

How can I run my local AI models in Simplifier?

While using cloud-based AI services via an API call certainly has its advantages, some use cases require custom models running on the client for better performance, higher privacy and security, and also cost control.

In this article, you will learn how to

  • fine-tune a pre-trained model on your custom data,
  • export the fine-tuned model,
  • import the fine-tuned model in Simplifier,
  • run the fine-tuned model on your custom data.

The process consists of three steps: 1) Model Training (or rather fine-tuning), 2) Model Export, and 3) Model Inference.

We train/fine-tune the model in Python, since this is the most popular programming language for machine learning tasks.
We run the model in JavaScript using Transformers.js, since Simplifier app logic is implemented in JavaScript.

To make sure that our custom model can operate in both programming environments, we export it in ONNX format. Models in this format can be used in many different frameworks and can also benefit from hardware optimizations (not covered in this article).

Use Case

In this tutorial, we want to implement an app with image detection functionality. Our goal is to detect specific objects in the webcam video stream in real time and to determine their position as precisely as possible by drawing a bounding box around them.

There are already pre-trained models available that can detect a wide range of everyday objects. For example, the pre-trained model YOLO by Ultralytics can detect 80 different classes.

In many cases, using a pre-trained image detection model is a sufficient choice. However, if you want to teach the model to also detect custom objects (for example, specific components from your manufacturing process), it’s necessary to fine-tune the model to your custom dataset.

Prerequisites

For model fine-tuning:

  1. Get access to a Python development environment
    You can simply use a Jupyter Notebook service, e.g., Google Colab
  2. Generate an annotated dataset for your use case
    Collect images and annotate the objects that you want to detect using bounding boxes. More information on the annotation process and popular annotation tools can be found here: Data Collection and Annotation
    To follow this tutorial, your data should be saved in the YOLO format, meaning that all images are stored in a folder called images and all annotations are stored as separate .txt files in a folder called labels, with the same file name as the corresponding image (see the example layout after this list).
    You don’t have to provide a .yaml file or split the data into train/validation/test set just yet, as we will cover these steps in this tutorial.
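
As a rough illustration (the file names here are just placeholders), a single-class dataset in YOLO format could look like this, where each line of a label file has the form class_id x_center y_center width height, with coordinates normalized to values between 0 and 1:

yolo/
  images/
    img_001.jpg
    img_002.jpg
  labels/
    img_001.txt
    img_002.txt

Example content of img_001.txt (one line per annotated object):

0 0.512 0.430 0.210 0.185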

For model inference:

  1. Create a Simplifier app to run your custom model
  2. Create a client-side Business Object (CBO)
  3. Upload the Transformers.js library to your Simplifier instance
    To do so, add the following code in the section JS Code to include (see screenshot):

    import ("https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.2.4").then((tf) => {
        window.transformers = tf;
    });

    Then, upload an empty, zipped .js file (since Simplifier expects a ZIP file with an included .js file for each custom library)

  4. Add the Transformers.js library as a dependency to your previously created CBO

STEP 1: Model Training/Fine-tuning

Language: Python
Environment: Jupyter Notebook (I use Google Colab in this tutorial)

If you already have a custom model in ONNX format, you can skip the training/fine-tuning and the export step and jump directly to STEP 3: Model Inference.

If you want to use a pre-trained model without fine-tuning it, you can skip the training/fine-tuning step and jump directly to STEP 2: Model Export.

The training/fine-tuning step requires a GPU runtime in your Python notebook. For exporting to ONNX, a CPU runtime is sufficient.

In this tutorial, we fine-tune a pre-trained YOLO model by Ultralytics using the Ultralytics Python package. This package offers functions to train, evaluate, and export YOLO models.

Data Preprocessing

First, we want to load our data (images and labels) and split it into train/validation/test sets. Then, we generate a yaml file containing the configuration for the model training.

Install the ultralytics package:

!pip install -q -U ultralytics

Mount your custom dataset from Google Drive (or your respective data source).
In this tutorial, the data is stored in the Google Drive folder yolo, which contains the images and labels folders described above.

from google.colab import drive

drive.mount('/content/drive')

Split the data into train/eval/test sets. This is only required if the data is not split yet:

import os
import numpy as np
from sklearn.model_selection import train_test_split

images = os.listdir('/content/drive/MyDrive/yolo/images')
labels = os.listdir('/content/drive/MyDrive/yolo/labels')

images.sort()
labels.sort()
images_with_labels = list(zip(images, labels))

print(f'Loaded {len(images_with_labels)} annotated images.')

# split all into train and test
data_train, data_test = train_test_split(images_with_labels, test_size=0.2, random_state=42)

# split test into eval and test
data_eval, data_test = train_test_split(data_test, test_size=0.5, random_state=42)

print(f'Train data: {len(data_train)} samples.')
print(f'Eval data: {len(data_eval)} samples.')
print(f'Test data: {len(data_test)} samples.')

Store the splits in separate folders:

import shutil
from tqdm.auto import tqdm

train_folder_path = '/content/drive/MyDrive/yolo/train/'
val_folder_path = '/content/drive/MyDrive/yolo/val'
test_folder_path = '/content/drive/MyDrive/yolo/test'

# create folders if they do not exist
if not os.path.isdir(train_folder_path):
    os.mkdir(train_folder_path)
if not os.path.isdir(val_folder_path):
    os.mkdir(val_folder_path)
if not os.path.isdir(test_folder_path):
    os.mkdir(test_folder_path)

print('Storing train data...')
for image_with_label in tqdm(data_train):
    shutil.copy(f'/content/drive/MyDrive/yolo/images/{image_with_label[0]}', f'{train_folder_path}')
    shutil.copy(f'/content/drive/MyDrive/yolo/labels/{image_with_label[1]}', f'{train_folder_path}')

print('Storing eval data...')
for image_with_label in tqdm(data_eval):
    shutil.copy(f'/content/drive/MyDrive/yolo/images/{image_with_label[0]}', f'{val_folder_path}')
    shutil.copy(f'/content/drive/MyDrive/yolo/labels/{image_with_label[1]}', f'{val_folder_path}')

print('Storing test data...')
for image_with_label in tqdm(data_test):
    shutil.copy(f'/content/drive/MyDrive/yolo/images/{image_with_label[0]}', f'{test_folder_path}')
    shutil.copy(f'/content/drive/MyDrive/yolo/labels/{image_with_label[1]}', f'{test_folder_path}')

print('Data splitting done.')

Generate a yaml file with the training configuration and store it in the yolo folder. Adapt the classes that you want to detect in the yaml file:

import yaml

yaml_content = {
    'path': '/content/drive/MyDrive/yolo',
    'train': 'train',
    'val': 'val',
    'nc': 1, # number of classes
    'names':
        {0: 'roll'} # adapt your classes here
    }
yaml_path = '/content/drive/MyDrive/yolo/train_model.yaml'

with open(yaml_path, 'w') as yaml_file:
    yaml.dump(yaml_content, yaml_file, default_flow_style=False)

Load Model and Fine-Tune

Load a pretrained YOLO model:

from ultralytics import YOLO

model = YOLO("yolo11s.pt")

Fine-tune the model with the configuration in the generated yaml file:

model.train(data="/content/drive/MyDrive/yolo/train_model.yaml", epochs=50, imgsz=640, plots=True) # 50 epochs should be enough

To check the model performance, you can inspect the generated result files/plots and also test the model with a random image:

import os, random

test_folder_path = '/content/drive/MyDrive/yolo/test'
test_images = list(filter(lambda x: '.txt' not in x, os.listdir(test_folder_path)))

Get a random sample image:

random_index = random.randint(0, len(test_images) - 1)
print(f'random index: {random_index}')

sample_image = test_images[random_index]
sample_image_path = os.path.join(test_folder_path, sample_image)
print(f'sample image: {sample_image_path}')

Test the object detection:

result = model(sample_image_path)

# display the result on screen
result[0].show()
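
In addition to spot-checking single images, you can also compute validation metrics with the Ultralytics package. A minimal sketch (attribute names may vary slightly between package versions):

# Evaluate the fine-tuned model on the validation split defined in the yaml file
metrics = model.val(data="/content/drive/MyDrive/yolo/train_model.yaml")

print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")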

STEP 2: Model Export to ONNX

Language: Python
Environment: Jupyter Notebook (I use Google Colab in this tutorial)

ONNX Export

We now have a fine-tuned YOLO model that can detect our custom class. To run it in Simplifier, we have to export the model in ONNX format, so that it can also be used in a JavaScript environment.

The Ultralytics Python package offers a function to do that:

model.export(format="onnx", dynamic=True, simplify=True, nms=True, opset=19) # nms=True works for YOLO >= v11

Then, store the exported ONNX model on your local system.
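
If you are working in Google Colab, one simple way to get the file onto your local system is to download it directly. This sketch assumes you capture the file path returned by model.export():

from google.colab import files

# model.export() returns the path of the exported .onnx file (same export call as above)
onnx_path = model.export(format="onnx", dynamic=True, simplify=True, nms=True, opset=19)
files.download(onnx_path)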

Config File Creation

To load the ONNX model with Transformers.js later, you have to create two additional files containing information about the model and the detected classes:

config.json

Create this file by hand. It contains the classes of the fine-tuned YOLO model, for example:

{
    "id2label": {
        "0": "roll"
    },
    "label2id": {
        "roll": 0
    },
    "model_type": "yolov11"
}

preprocessor_config.json

Create this file by hand as well. It describes how input images must be preprocessed for the model, for example:

{
    "do_normalize": false,
    "do_pad": false,
    "do_rescale": true,
    "do_resize": true,
    "feature_extractor_type": "ImageFeatureExtractor",
    "resample": 2,
    "rescale_factor": 0.00392156862745098,
    "size": {
        "width": 640,
        "height": 640
    }
}

STEP 3: Model Inference (Running the Model)

Language: JavaScript
Environment: Simplifier (App and Client-Side Business Object)

Now that we have fine-tuned the pre-trained model and exported it in ONNX format, it’s time to import it to Simplifier and use it there. Just follow these steps:

Import the ONNX Model and Config Files

Open the Simplifier app in which you want to run your model. Upload the ONNX model file, the file config.json and the file preprocessor_config.json in the assets section of your app.

Load and Run the Model

Now, switch to the CBO that you have created as a prerequisite and create a new function. Use the following code to load the model and the data preprocessor using the files that you have uploaded in the app’s asset section.

const sModelId = 'data'; // the assets folder where the .onnx model and the config files are stored
let oModel, oProcessor, sModelInputName, sModelOutputName;

// init transformers.js environment variables
transformers.env.allowRemoteModels = false;
transformers.env.allowLocalModels = true;
transformers.env.localModelPath = '.';

// Load the fine-tuned model
oModel = await transformers.AutoModel.from_pretrained(sModelId, {
     model_file_name: 'YOLO_with_NMS', // adapt this to match your model name
     subfolder: '',
     dtype: 'fp32', // Transformers.js 3.x and later: 'fp32' for the full-precision model, 'int8' for a quantized model
});
sModelInputName = oModel.sessions.model.inputNames[0];
sModelOutputName = oModel.sessions.model.outputNames[0];

// Load the image processor
oProcessor = await transformers.AutoProcessor.from_pretrained(sModelId);

Then, make a snapshot of the webcam video stream, preprocess the image and run the model on the image data.

// Read the current frame from the video
const oPixelData = oContext.getImageData(0, 0, width, height).data;
const oRawImage = new transformers.RawImage(oPixelData, width, height, 4);

// Process the image
const {
    pixel_values,
    reshaped_input_sizes
} = await oProcessor(oRawImage);

// Run the model
const aModelOutputs = await oModel({
    [sModelInputName]: pixel_values
});

The model will check the image for the class(es) of objects that it has been fine-tuned on in STEP 1. Then, it returns the coordinates of bounding boxes around the detected objects.
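
The complete example below draws these boxes on the video stream. As a minimal sketch of how the output can be read (using the same output format as the complete example, where each box is given as [xmin, ymin, xmax, ymax, score, classId]), you could simply log the detections:

// Convert the output tensor to a nested array and take the boxes for the first (and only) image
const aBoxes = aModelOutputs[sModelOutputName].tolist()[0];

aBoxes
    .filter((aBox) => aBox[4] >= 0.25) // keep only detections above a confidence threshold
    .forEach(([iXmin, iYmin, iXmax, iYmax, fScore, iClassId]) => {
        console.log(`${oModel.config.id2label[iClassId]}: ${(100 * fScore).toFixed(1)} % at [${iXmin}, ${iYmin}, ${iXmax}, ${iYmax}]`);
    });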

Complete Code Example

Here is a complete CBO function that starts the webcam, loads the custom AI model and runs it on snapshots of the webcam video stream. You can use this CBO function in your app, e.g., triggered by the event onAfterRendering.

On your app screen, add the widget ui_core_HTML. For the widget’s property content, add the following HTML:

<div id="container">
    <video id="video" autoplay="autoplay" muted="" width="300" height="150"></video>
    <canvas id="canvas" width="360" height="240"></canvas>
    <div id="overlay"></div>
</div>

Here is the code for the CBO function detectObject. Note that the bounding-box and bounding-box-label CSS classes used below need matching styles in your app (e.g., absolute positioning within the overlay element):

// Reference the elements that we will need
const oContainer = document.getElementById('container');
const oOverlay = document.getElementById('overlay');
const oCanvas = document.getElementById('canvas');
const oVideo = document.getElementById('video');

// if we detect only 1 class, we technically only need 1 colour
const COLOURS = [
    "#EF4444", "#4299E1", "#059669",
    "#FBBF24", "#4B52B1", "#7B3AC2",
    "#ED507A", "#1DD1A1", "#F3873A",
    "#4B5563", "#DC2626", "#1852B4",
    "#18A35D", "#F59E0B", "#4059BE",
    "#6027A5", "#D63D60", "#00AC9B",
    "#E64A19", "#272A34"
];
let bIsProcessing = false;
const oContext = oCanvas.getContext('2d', {
    willReadFrequently: true
});
const sModelId = 'data'; // the assets folder where the .onnx model and the config files are stored
let oModel, oProcessor, sModelInputName, sModelOutputName;
const fConfidenceThreshold = 0.25; // minimum confidence score for a detection to be rendered

(async () => {
    try {
        // init transformers.js environment variables
        transformers.env.allowRemoteModels = false;
        transformers.env.allowLocalModels = true;
        transformers.env.localModelPath = '.';

        // Load the fine-tuned model
        oModel = await transformers.AutoModel.from_pretrained(sModelId, {
            model_file_name: 'YOLO_with_NMS', // adapt this to match your model name
            subfolder: '',
            dtype: 'fp32', // Transformers.js 3.x and later: 'fp32' for the full-precision model, 'int8' for a quantized model
        });
        sModelInputName = oModel.sessions.model.inputNames[0];
        sModelOutputName = oModel.sessions.model.outputNames[0];

        // Load the image processor
        oProcessor = await transformers.AutoProcessor.from_pretrained(sModelId);

        // Start the video stream
        navigator.mediaDevices.getUserMedia({
            video: {
                facingMode: 'environment'
            }
        }).then((oStream) => {
            // Set up the video and canvas elements
            oVideo.srcObject = oStream;
            oVideo.play();

            const oVideoTrack = oStream.getVideoTracks()[0];
            const {
                width,
                height
            } = oVideoTrack.getSettings();

            oCanvas.width = width;
            oCanvas.height = height;

            // Set container width and height depending on the image aspect ratio
            const fAspectRatio = width / height;
            const [iContainerWidth, iContainerHeight] = (fAspectRatio > 720 / 405) ? [720, 720 / fAspectRatio] : [405 * fAspectRatio, 405];
            oContainer.style.width = `${iContainerWidth}px`;
            oContainer.style.height = `${iContainerHeight}px`;

            // Start the animation loop
            window.requestAnimationFrame(updateCanvas);

            fnSuccess({});
        });
    } catch (e) {
        fnError(e.message);
    }
})();

// Run processor and model on pixel data every frame
function updateCanvas() {
    const {
        width,
        height
    } = oCanvas;
    oContext.drawImage(oVideo, 0, 0, width, height);

    if (!bIsProcessing) {
        bIsProcessing = true;
        (async function() {
            try {
                // Read the current frame from the video
                const oPixelData = oContext.getImageData(0, 0, width, height).data;
                const oRawImage = new transformers.RawImage(oPixelData, width, height, 4);

                // Process the image
                const {
                    pixel_values,
                    reshaped_input_sizes
                } = await oProcessor(oRawImage);

                // Run the model
                const aModelOutputs = await oModel({
                    [sModelInputName]: pixel_values
                });

                // Update UI
                oOverlay.innerHTML = '';

                // Draw boxes on video
                const aSizes = reshaped_input_sizes[0].reverse();

                // Iterate over all detected boxes and check the box data
                aModelOutputs[sModelOutputName].tolist()[0].forEach(aBox => renderBox(aBox, aSizes));

                bIsProcessing = false;
            } catch (e) {
                fnError(e.message);
            }
        })();
    }
    window.requestAnimationFrame(updateCanvas);
}

// Render a bounding box and label on the image
function renderBox([iXmin, iYmin, iXmax, iYmax, fScore, iClassId], [iImageWidth, iImageHeight]) {
    try {
        if (fScore < fConfidenceThreshold) return; // Skip boxes with low confidence

        // Pick a colour for the box based on the class id
        const sColour = COLOURS[iClassId % COLOURS.length];

        // Draw the box
        const oBoxElement = document.createElement('div');
        oBoxElement.className = 'bounding-box';
        Object.assign(oBoxElement.style, {
            borderColor: sColour,
            left: 100 * iXmin / iImageWidth + '%',
            top: 100 * iYmin / iImageHeight + '%',
            width: 100 * (iXmax - iXmin) / iImageWidth + '%',
            height: 100 * (iYmax - iYmin) / iImageHeight + '%',
        })

        // Draw label
        const oLabelElement = document.createElement('span');
        oLabelElement.textContent = `${oModel.config.id2label[iClassId]} (${(100 * fScore).toFixed(2)}%)`;
        oLabelElement.className = 'bounding-box-label';
        oLabelElement.style.backgroundColor = sColour;

        oBoxElement.appendChild(oLabelElement);
        oOverlay.appendChild(oBoxElement);
    } catch (e) {
        fnError(e.message);
    }
}

That’s it!

Now you have learned how to fine-tune AI models, export them in ONNX format and run them in a Simplifier app with Transformers.js. Of course, you can also use other AI-related JavaScript libraries, like ONNX Runtime JavaScript API, to run your ONNX models. Just upload the ONNX models to the assets section of your Simplifier app and then follow the libraries’ documentation to adapt the JavaScript code in your CBO function.
