Authors: Umit V. Ucak, Islambek Ashyrmamatov and Juyong Lee
Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization
Tokenization is an important preprocessing step in natural language processing that may have a significant influence on prediction quality. This research showed that the traditional SMILES tokenization has a c...
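The abstract contrasts its approach with "traditional" SMILES tokenization. For readers unfamiliar with that baseline, here is a minimal Python sketch of the regex-based, atom-level tokenizer commonly used in chemical language models. It is offered as an illustration under that assumption. It is not the atom-in-SMILES scheme the paper proposes. The names `SMILES_PATTERN` and `tokenize_smiles` are hypothetical.

```python
import re

# Assumed baseline: a regex widely used for atom-level SMILES tokenization.
# Bracket atoms such as [nH] or [C@@H] are kept whole, and the two-letter
# organic-subset atoms Br and Cl are matched before their one-letter prefixes.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    tokens = SMILES_PATTERN.findall(smiles)
    # The tokens must reassemble into the input exactly. Otherwise some
    # characters were not covered by the pattern and were silently skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Alanine: the chiral bracket atom stays a single token.
    print(tokenize_smiles("C[C@@H](N)C(=O)O"))
```

Each token here carries only its own symbol. No information about the atom's surrounding environment is included, and that is the property the paper's title points to when it introduces an alternative.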
Reconstruction of lossless molecular representations from fingerprints
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the intern...