‘My pipeline can help researchers to find new potential medicine’

  • Student project

Every year thousands of molecules are tested for their medicinal potential. However, even more molecules are never tested for these properties, although they could potentially influence diseases such as Parkinson’s. During his graduation project at the Research Centre Biobased Economy, master’s student Thijs Kok showed that data science techniques can help to identify more potential medicines more quickly.

In labs around the world, researchers work on making new molecules for many different applications. Some of these molecules might also be suitable as medicine, so it could be worthwhile to screen them for their properties. For his project, Thijs worked with the data from another student: ‘She trained a machine learning model to gather molecules that were mentioned in the PubChem database of research articles, and predicted which molecules might have a previously unknown relation to a disease. My task was to take these interesting molecules and look for other compounds with similar structures, so you have more options to test.’

For this expansion Thijs built a so-called pipeline, an algorithm that takes the molecule through multiple operations. ‘First I used the name of the molecule to obtain a so-called SDF file from PubChem, and then I turned it into SMILES, a notation that computers can also understand’, Thijs explains. ‘Then you use the RDKit package in Python to make a fingerprint of the structure, which turns the chemical information into bits.’ The last step is comparing the structure with a database of other molecules. ‘This yields a list of similar molecules that might also have the same nutritional or medicinal properties.’
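The fingerprint-and-compare steps can be illustrated with a minimal sketch. This is not Thijs's pipeline: the article does not say which fingerprint or similarity measure he used, so the sketch assumes Tanimoto similarity, a common choice for bit fingerprints. To stay free of dependencies it also replaces RDKit's structure-based fingerprints with a deliberately crude stand-in that hashes overlapping pieces of the SMILES text into bits. The function names (`toy_fingerprint`, `tanimoto`, `rank_similar`) and the three-molecule library are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import zlib


def toy_fingerprint(smiles: str, n_bits: int = 2048, width: int = 3) -> set[int]:
    """Hash each overlapping SMILES substring to a bit position.

    Assumption: a crude text-based stand-in, NOT an RDKit fingerprint.
    The returned set holds the indices of the bits that are switched on.
    """
    if len(smiles) < width:
        fragments = [smiles]
    else:
        fragments = [smiles[i:i + width] for i in range(len(smiles) - width + 1)]
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return {zlib.crc32(fragment.encode("ascii")) % n_bits for fragment in fragments}


def tanimoto(a: set[int], b: set[int]) -> float:
    """Number of shared on-bits divided by the number of distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_similar(query_smiles: str, library: dict[str, str]) -> list[tuple[str, float]]:
    """Score every library molecule against the query, most similar first."""
    query_fp = toy_fingerprint(query_smiles)
    scores = [
        (name, tanimoto(query_fp, toy_fingerprint(smiles)))
        for name, smiles in library.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical mini-library; a real run would compare against a large database.
library = {
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "levodopa": "NC(Cc1ccc(O)c(O)c1)C(=O)O",
}

# Query: dopamine, written as SMILES.
ranking = rank_similar("NCCc1ccc(O)c(O)c1", library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy run levodopa ranks above caffeine, because it shares the dihydroxyphenyl part of its SMILES string with dopamine. A real pipeline would parse the molecule and derive the bits from its actual structure, which is the role RDKit plays in the workflow Thijs describes.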

Thijs chose this research because he wanted to utilise his background in chemistry. ‘I also looked at companies that wanted to use data science to analyse their processes, but I think this project was more interesting. It might be a good stepping stone that further research can benefit from, and who knows where it will lead in the end.’

But before this can happen, the pipeline needs to be tested further and optimised. ‘Another student will continue this work, because it is not perfect yet.’ For Thijs the next step is finding a job. ‘Ideally I would like to combine chemistry and data science, but I also saw some interesting traineeships that offered the opportunity to learn more about data science. It is a field that develops fast, so this might also be interesting.’

Fields of interest

  • Exact and Information Sciences
  • Sports and Health