Programmatic Compilation of Chemical Data and Literature from PubChem using MATLAB

  • Vincent F. Scalfani, University of Alabama
  • Serena C. Ralph
  • Ali Al Alshaikh
  • Jason E. Bara, University of Alabama


MATLAB live scripts are useful for the reproducible, programmatic compilation of chemical data and literature. In this article, we combine the PubChem PUG REST Application Programming Interface (API), the Structured Data Query (SDQ) agent, and text extraction in MATLAB live scripts that support programmatic PubChem similarity searching, SMARTS substructure queries, literature searching, compilation of compound-based bibliometric data, and SDfile data extraction. All MATLAB live scripts are openly available and can be adapted with minimal modification to the script code. We discuss how these live scripts can increase scientific reproducibility and be integrated into chemistry and chemical engineering education.

Author Biographies

Vincent F. Scalfani, University of Alabama

Vincent F. Scalfani is an Associate Professor in the Rodgers Library for Science and Engineering at The University of Alabama. He is the information specialist for Chemistry, Chemical Engineering, and Mathematics. Before joining the University of Alabama in 2012, he studied block copolymer phase behavior and earned a PhD in Chemistry from Colorado State University (2012) and a BS in Chemistry from SUNY Oswego (2007). His research interests include chemical information and cheminformatics.

Serena C. Ralph

Serena C. Ralph is an undergraduate at The University of Alabama. Over the past two years, she has been actively engaged in cheminformatics and big data research. She is also interested in economics and market research.

Ali Al Alshaikh

Ali Al Alshaikh earned a BS in Chemical Engineering from The University of Alabama (2018) and is currently an MS student in Chemical Engineering. He is working on a variety of independent research projects, including compound bibliometrics, literature data curation, and interactive scientific visualizations.

Jason E. Bara, University of Alabama

Jason E. Bara earned a BS in Chemical Engineering from Virginia Commonwealth University (2002) and a PhD in Chemical Engineering from the University of Colorado – Boulder (2007). After working as a Senior Research Associate at CU-Boulder from 2007 to 2009, he started his academic career at The University of Alabama in 2010 and was promoted with tenure to Associate Professor in 2015. His research interests include polymer materials, green chemistry, and processes for greenhouse gas reduction.

Class and Home Problems