AI · Machine Learning · Vision · NLP · Applied mathematics

Cybélia R&D

We take on the hard problems.

The ones that need mathematics.

✓ Computer vision · Image recognition · OCR

✓ Language processing · LLM · NLP · RAG

✓ Predictive models · Time series · Anomaly detection

✓ Private AI · On-premise · Integrated into your IT

We design, train and deploy artificial intelligence systems

on real use cases, with deliberate scientific rigour.

Our philosophy

We say what we do, we do what we say

and we explain why it works.

AI is not magic. It is linear algebra, probability, tensor computation, optimisation and a great deal of engineering.

At Cybelia Cloud, we work at the model level: architecture choice, loss function, regularisation, cross-validation.

We support companies with serious projects at the frontier of mathematics and computer science, looking for a partner able to hold a technical conversation.

Research areas

Four technical pillars. One single requirement: rigour.

From raw signal to production model — we cover the entire chain.

Computer vision

CNNs, convolutions, pooling, object detection (YOLO, R-CNN), semantic segmentation. From raw image to feature vector.

OCR & Recognition

Tesseract pipeline, OpenCV preprocessing, hOCR, NLP post-correction. Structured extraction from scanned documents, invoices and forms.

NLP & Voice

STT on Android with Sherpa-onnx. Acoustic model, language model, MFCC, VAD. WER as the reference metric.
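
To make the reference metric concrete: word error rate (WER) is the word-level edit distance between the reference transcript and the recogniser output, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration. The function name `word_error_rate` and the sample sentences are ours, chosen for the example; it is not part of the Sherpa-onnx API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (dynamic programming, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn on the lights"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.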

Machine Learning

Supervised and unsupervised models, feature engineering, GridSearchCV, cross-validation, F1/AUC/mAP metrics. PyTorch, TensorFlow, scikit-learn.

Application cases

OCR on administrative documents and invoices

Real problems. Solutions that ship.

1. OpenCV preprocessing

deskew, automatic binarisation (Otsu's global threshold), denoising (median filter, morphological operations).

2. Zone segmentation

text block, table and field detection via contour analysis.

3. Tesseract recognition (LSTM)

page segmentation mode 6 (--psm 6, a single uniform block of text), fine-tuning on a business corpus.

4. NLP post-correction

error detection with a domain dictionary, correction via Levenshtein distance.

5. JSON structuring

field mapping → target schema, validation via business rules.

Result: recognition rate > 92% on the test corpus, processing time < 800 ms per page.
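
Step 4 of the pipeline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our production code: the dictionary contents, the `max_distance` cut-off of 2 and the function names are hypothetical, and a real system would also use context and token frequencies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                        # deletion
                            curr[j - 1] + 1,                    # insertion
                            prev[j - 1] + (char_a != char_b)))  # substitution or match
        prev = curr
    return prev[-1]


def correct_token(token: str, dictionary: set[str], max_distance: int = 2) -> str:
    """Replace a token by its nearest dictionary word if it is close enough."""
    if not dictionary or token in dictionary:
        return token
    # Ties are broken alphabetically so the result is deterministic.
    best = min(dictionary, key=lambda word: (levenshtein(token, word), word))
    return best if levenshtein(token, best) <= max_distance else token


# Hypothetical domain dictionary and OCR output.
DOMAIN_WORDS = {"invoice", "total", "amount", "date"}
ocr_tokens = ["lnvoice", "t0tal", "Dupont"]
print([correct_token(t, DOMAIN_WORDS) for t in ocr_tokens])
```

The distance cut-off matters: without it, proper nouns and reference numbers that are absent from the dictionary would be "corrected" into unrelated words.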

Object detection and classification with CNNs

Problem: identify and locate specific items in a video stream or industrial images.

Architecture:

- CNN backbone — convolutional layers (3×3, stride 1),

batch normalisation, ReLU, max pooling.

- Transfer learning from ResNet-50 pretrained on ImageNet —

fine-tuning the last layers on the business dataset.

- Detection head — bounding-box regression +

multiclass classification (Softmax).

- Combined loss: cross-entropy for classification + L1/IoU for localisation.

Training: PyTorch, Adam (lr=1e-4),

cosine annealing scheduler, data augmentation (flip, crop, jitter).
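
For readers unfamiliar with cosine annealing, the schedule lowers the learning rate from its initial value to a floor along half a cosine curve. The sketch below writes out the closed form in plain Python. The initial rate of 1e-4 comes from the training setup above; the floor `lr_min = 0.0` and the 50-epoch horizon are assumptions made for the example.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int,
                        lr_max: float = 1e-4, lr_min: float = 0.0) -> float:
    """Learning rate at a given epoch for a single cosine decay without restarts."""
    if total_epochs <= 0 or not 0 <= epoch <= total_epochs:
        raise ValueError("expected 0 <= epoch <= total_epochs and total_epochs > 0")
    # The cosine term goes from +1 (epoch 0) to -1 (last epoch),
    # so the rate moves smoothly from lr_max down to lr_min.
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Start, midpoint and end of a hypothetical 50-epoch run.
for epoch in (0, 25, 50):
    print(epoch, cosine_annealing_lr(epoch, total_epochs=50))
```

The rate changes slowly at the start and end of training and fastest in the middle, which is the usual reason for preferring this schedule to a step decay.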

mAP@0.5: 87.3% on the validation set.
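
The "@0.5" in the metric refers to an intersection-over-union (IoU) threshold: a predicted box counts as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. The sketch below shows that matching criterion only. Full mAP also requires ranking detections by confidence and averaging precision over recall levels and classes, which is not shown. The box format `(x_min, y_min, x_max, y_max)` and the coordinates are assumptions for the example.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(prediction: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the ground truth enough to count as a hit."""
    return iou(prediction, ground_truth) >= threshold


# Hypothetical boxes: a 40x40 prediction shifted by 10 px on both axes.
predicted = (10.0, 10.0, 50.0, 50.0)
truth = (20.0, 20.0, 60.0, 60.0)
print(iou(predicted, truth), is_match(predicted, truth))
```

In this example the overlap is 900 px² out of a union of 2300 px², so the prediction would not count as a detection at the 0.5 threshold even though the boxes visibly overlap.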

Predictive model on business data

Problem: anticipate a business event

(failure, churn, anomaly) from heterogeneous historical data.

Methodology:

- Exploration and cleansing — missing values (KNN imputation),

outliers (IQR), categorical encoding (target encoding).

- Feature engineering — sliding time windows,

statistical aggregates, derived features.

- Model selection — Random Forest, XGBoost,

LightGBM compared with stratified cross-validation (k=5).

- Hyperparameter tuning — Optuna / GridSearchCV.

- Interpretability — SHAP values for business explainability.
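
As an example of the cleansing step in the methodology above, the IQR rule flags a value as an outlier when it lies more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below uses only the standard library. The multiplier `k = 1.5` is the conventional default rather than a value stated above, and the sensor readings are made up.

```python
import statistics


def iqr_outlier_mask(values: list[float], k: float = 1.5) -> list[bool]:
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between the sample minimum and maximum.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical sensor readings with one obvious spike.
readings = [10, 12, 11, 13, 12, 11, 95]
flags = iqr_outlier_mask(readings)
print([r for r, is_outlier in zip(readings, flags) if is_outlier])
```

Whether a flagged value is dropped, capped or kept depends on the business context: in failure prediction, the spike may be exactly the signal the model needs.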

Metrics: F1-score 0.89, AUC-ROC 0.94 on the test set.
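
For reference, F1 is the harmonic mean of precision and recall on the positive class, which makes it more informative than accuracy when the event to predict is rare. The sketch below computes it from raw labels. The labels are toy data chosen for the example and do not reproduce the figures quoted above.

```python
def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for the given positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # Convention: F1 is 0 when there are no true positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))
```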

Stack & tools

Open-source tools: proven, documented and maintainable

No opaque proprietary frameworks. Every brick is audited, understood and mastered.

Vision & Image

OpenCV · Pillow · scikit-image · Tesseract 5 · PyTorch · torchvision · ONNX Runtime

Audio & NLP

Vosk · Sherpa-onnx · WebRTC VAD · NLTK · spaCy · HuggingFace Transformers · Kaldi

ML & Data

scikit-learn · XGBoost · LightGBM · Optuna · SHAP · Pandas · NumPy · Matplotlib

Our method

From problem to production model, with no detours

Scientific framing

We start by understanding the real problem, not the imagined solution. Formal definition of the task, inputs/outputs and success metrics.

Data & exploration

Audit of the available data — volume, quality, bias, distribution. We promise nothing before having seen the data.

Experimentation & baseline

We set up a simple baseline model first, then iterate in controlled steps, tracking metrics at each one. Reproducibility guaranteed.

Deployment & integration

ONNX export, REST API or native integration (Android JNI, Python module). Technical documentation delivered with the model.

Who is this for?

You have a hard problem you want solved properly.

CIOs & technical leaders

You have an AI project under way or under consideration and you're looking for a rigorous outside opinion to frame, evaluate or de-risk it.

Deeptech startups

You have a strong idea but lack ML/vision/NLP resources to take a POC to product.

R&D project owners

You're working on a subject at the frontier of AI and mathematics and you need a technical partner, not a general-purpose service provider.

You have a hard problem. We love that.

Describe your project in a few lines — we'll respond with a technical analysis, not a sales quote.