Biological Tissue Stress Response Classification from Raman Spectroscopy Data
We developed an ML pipeline for non-contact detection of biological tissue stress response from Raman spectroscopy data. It classifies HSP70 protein expression into three classes using an interpretable ensemble approach.
Client: NDA
Period: 1 week
Format: ML research
About the project
The goal was to classify HSP70 protein expression into three classes: healthy tissue, endogenous stress response, and exogenous stress response. We examined two spectral ranges, 1500 cm⁻¹ and 2900 cm⁻¹; the 1500 cm⁻¹ range carried the key diagnostic value because of its association with protein markers.
The Challenge
We needed an interpretable ML system capable of determining tissue stress response from spectral data without overfitting to noise and hardware artifacts — avoiding the black-box effect by showing which spectral features and biomarkers influence the diagnosis.
Our Solution
- Built a spectral preprocessing pipeline: axis interpolation, median filtering, Savitzky-Golay smoothing, and SNV normalization
- Rejected training on raw data because of hardware artifacts and physically uninformative noise
- Built a soft-voting ensemble of XGBoost and Random Forest
- Trained on the full interpolated spectral feature vector, without forced dimensionality reduction
- Enforced strict patient-level GroupKFold splitting to prevent data leakage
- Split inference into per-spectrum prediction and patient-level aggregation by majority voting
- Added an explainable-AI module based on Gini feature importance to localize biomarkers
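The preprocessing chain from the list above (axis interpolation, median filtering, Savitzky-Golay smoothing, SNV normalization) could be sketched roughly as follows, assuming NumPy and SciPy. The function name `preprocess_spectrum`, the target grid, the kernel size, and the smoothing window are illustrative assumptions, not the values used in the project, and the input spectrum is synthetic.

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter


def preprocess_spectrum(shift, intensity, grid,
                        median_kernel=5, sg_window=11, sg_poly=3):
    """Resample one spectrum onto a shared axis, despike, smooth, and SNV-normalize.

    All default parameters are illustrative assumptions.
    """
    order = np.argsort(shift)                    # np.interp needs increasing x
    y = np.interp(grid, shift[order], intensity[order])
    y = medfilt(y, kernel_size=median_kernel)    # suppress narrow spikes
    y = savgol_filter(y, window_length=sg_window, polyorder=sg_poly)
    std = y.std()
    if std == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return (y - y.mean()) / std                  # SNV: zero mean, unit variance


# Synthetic example: one noisy band near 1500 cm^-1 plus an artificial spike.
rng = np.random.default_rng(0)
grid = np.linspace(1400.0, 1600.0, 201)          # hypothetical common axis
shift = np.sort(rng.uniform(1390.0, 1610.0, 300))
intensity = np.exp(-((shift - 1500.0) / 15.0) ** 2)
intensity += 0.05 * rng.normal(size=shift.size)
intensity[150] += 20.0                           # synthetic spike
clean = preprocess_spectrum(shift, intensity, grid)
```

Interpolating first puts every spectrum on the same feature axis, so the resulting vectors can be stacked directly into a training matrix.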
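The modelling steps from the list above (soft voting over Random Forest and XGBoost, patient-level GroupKFold, per-patient majority voting, Gini importance) might look like the sketch below. It assumes scikit-learn, integer class labels 0..2, and synthetic data. All function names and hyperparameters are hypothetical, and if `xgboost` is not installed the sketch swaps in scikit-learn's `GradientBoostingClassifier` as a stand-in. Soft voting is written out by hand as an average of class probabilities, and Gini importance is read from the Random Forest member only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Stand-in so the sketch still runs where xgboost is not installed.
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def fit_members(X, y):
    """Fit both ensemble members on the same per-spectrum training data."""
    members = [
        RandomForestClassifier(n_estimators=100, random_state=0),
        Booster(n_estimators=50, max_depth=3, random_state=0),
    ]
    return [m.fit(X, y) for m in members]


def soft_vote(members, X):
    """Soft voting: average class probabilities, then pick the most likely class."""
    return np.mean([m.predict_proba(X) for m in members], axis=0).argmax(axis=1)


def majority_vote(spectrum_preds, groups):
    """One label per patient from per-spectrum labels (ties go to the lowest class)."""
    return {g: int(np.bincount(spectrum_preds[groups == g]).argmax())
            for g in np.unique(groups)}


def patient_level_accuracy(X, y, groups, n_splits=4):
    hits = []
    for train, test in GroupKFold(n_splits=n_splits).split(X, y, groups):
        # No patient may contribute spectra to both sides of the split.
        assert not set(groups[train]) & set(groups[test])
        members = fit_members(X[train], y[train])
        votes = majority_vote(soft_vote(members, X[test]), groups[test])
        hits += [label == y[groups == g][0] for g, label in votes.items()]
    return float(np.mean(hits))


# Synthetic data: 12 patients x 10 spectra, 40 features, 4 patients per class.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(12), 10)
y = groups // 4
X = rng.normal(size=(groups.size, 40))
for c in range(3):
    X[y == c, 5 + 10 * c: 10 + 10 * c] += 2.0    # class-specific band

acc = patient_level_accuracy(X, y, groups)
rf = fit_members(X, y)[0]
importances = rf.feature_importances_            # Gini (impurity-based) importance
top3 = np.argsort(importances)[::-1][:3]         # candidate "biomarker" positions
```

Splitting by patient rather than by spectrum matters because spectra from the same patient are correlated; a random split would let the model see near-duplicates of its test samples during training.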
Results
- Delivered an ML pipeline for non-contact classification of tissue stress response
- Confirmed high model accuracy on the informative spectral range
- Demonstrated that the ensemble approach is applicable to high-noise Raman spectroscopy data
- Ensured interpretable results by identifying the significant spectral biomarkers
- Showed the project's potential for automated optical biopsy and in vitro tissue diagnostics