Automated HLA Typing Kit Designer with ML

Jason Dai
Author
I am a bioinformatics scientist, software developer, and data scientist passionate about leveraging AI and advanced computing to create innovative solutions across bioinformatics and fintech domains.

Designing a new HLA diagnostic kit is extremely time-consuming: it usually takes a team of scientists months to develop and test one. New alleles of the HLA genes are discovered constantly, and the number of known alleles keeps growing rapidly. Typing kits must be able to detect and resolve these new alleles, which requires frequent redesigns. The program I developed at Thermo Fisher Scientific is designed to replace the manual redesign process. The algorithm uses optimization and principles from information theory to generate more efficient kits than those designed by hand by experts, which saves substantially on raw materials and labor. It also incorporates a machine learning model, trained on QC data from previously designed probes, that predicts the hybridization performance of new, untested probes.
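To make the information-theory idea concrete, here is a minimal sketch of entropy-guided probe selection. It is not the production algorithm. It assumes each probe either reacts or does not react with each allele, that all alleles are equally likely, and that a greedy search is good enough. The names `partition_entropy`, `greedy_panel`, and the toy reactivity table are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def partition_entropy(alleles, probes, reactivity):
    """Shannon entropy (bits) of the allele groups induced by a probe panel.

    Alleles with the same positive/negative pattern across the panel cannot
    be told apart, so they fall into the same group. Assumes a uniform prior.
    """
    signatures = Counter(
        tuple(allele in reactivity[p] for p in probes) for allele in alleles
    )
    n = len(alleles)
    return -sum((c / n) * log2(c / n) for c in signatures.values())


def greedy_panel(alleles, reactivity):
    """Repeatedly add the probe that yields the highest panel entropy."""
    panel, current = [], 0.0
    remaining = set(reactivity)
    while remaining:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(
            sorted(remaining),
            key=lambda p: partition_entropy(alleles, panel + [p], reactivity),
        )
        gain = partition_entropy(alleles, panel + [best], reactivity) - current
        if gain <= 1e-12:  # no remaining probe adds information
            break
        panel.append(best)
        current += gain
        remaining.remove(best)
    return panel, current


# Hypothetical toy data: which alleles each candidate probe reacts with.
alleles = ["A1", "A2", "A3", "A4"]
reactivity = {
    "P1": {"A1", "A2"},
    "P2": {"A1", "A3"},
    "P3": {"A1"},
    "P4": {"A1", "A2", "A3", "A4"},
}

panel, bits = greedy_panel(alleles, reactivity)
print(panel, bits)  # ['P1', 'P2'] 2.0
```

In this toy case two probes reach 2 bits, which is the maximum for four equally likely alleles, so every allele gets a unique reaction pattern and the redundant probes P3 and P4 are left out of the kit.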

Key Skills & Technologies Demonstrated

This project has been an invaluable learning experience, allowing me to apply and strengthen a diverse set of skills:

  • Algorithm Development: Designed and implemented a novel optimization algorithm to interpret data and design a Sequence-Specific Oligonucleotide (SSO) kit.
  • Information Theory: Applied principles of entropy to measure the informational value of probes, ensuring maximum discriminatory power and diagnostic accuracy.
  • Automated Design Tool: Built user-friendly software that automates the process of designing diagnostic kits.
  • Interdisciplinary Expertise: Combined techniques from computer science and information theory to solve a challenging problem in molecular biology, showcasing strong problem-solving and cross-domain expertise.
  • Molecular Biology: Applied knowledge of DNA hybridization to inform design criteria.
  • Data-Driven Optimization: Integrated a machine learning model to create better probes based on proprietary historical QC data.
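
The data-driven step in the last bullet could look roughly like the sketch below. The real model, its features, and the QC data are proprietary and not described here, so everything in this example is an assumption: the two sequence features (GC fraction and the Wallace-rule melting temperature, a rough estimate for short oligos), the plain logistic regression, and the invented `TOY_QC` records, which carry no biological meaning.

```python
from math import exp


def gc_content(seq):
    """Fraction of G and C bases in the probe sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace-rule melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def features(seq):
    # Bias term plus two scaled features (assumed, not the real feature set).
    return [1.0, gc_content(seq), wallace_tm(seq) / 100.0]


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(weights, seq):
    """Predicted probability that a probe passes QC."""
    return sigmoid(sum(w * x for w, x in zip(weights, features(seq))))


def train(samples, epochs=200, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for seq, passed in samples:
            error = predict(weights, seq) - passed
            weights = [w - lr * error * x for w, x in zip(weights, features(seq))]
    return weights


# Invented (sequence, passed_qc) records standing in for historical QC data.
TOY_QC = [
    ("GCGCATGCCGTAGCCGTA", 1),
    ("CCGTAGGCATCGGCTAGC", 1),
    ("GGCTACCGATGCCATGGC", 1),
    ("ATATTTAATATAATTTAT", 0),
    ("TTAATATATTAAATTATA", 0),
    ("AATTTATAATATTAAATT", 0),
]

weights = train(TOY_QC)
candidate = "ACGTACGTACGTACGTAC"  # hypothetical untested probe
print(f"predicted pass probability: {predict(weights, candidate):.2f}")
```

A score like this could be used to filter or rank candidate probes before the entropy-based selection runs, so that the kit favors probes that are both informative and likely to hybridize well.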