
Data Analytics on Small and/or Large High-Dimensional Observations in Finance and Beyond - The Credit Research Initiative at the National University of Singapore

This talk comprises two connected parts -- (1) introducing a new Data Analytics (DA) tool that handles high-dimensional data where the number of observations may be large or small, and (2) describing the Credit Research Initiative (CRI) launched in 2009 at the National University of Singapore, which serves as a template for deploying DA in building infrastructure for public-sector research undertakings in social sciences.

DA in the modern context is often understood as the processes and tools for extracting information from Big Data. The challenge in DA lies in the dimension of the data (i.e., the number of potential variables) rather than in its size (i.e., the number of observations). Considering interaction terms, for example, is natural in the social sciences, but doing so makes the dimension of the data much larger without a corresponding increase in the number of observations. Neural networks and many machine learning tools are often ill-suited to such DA problems. I will introduce and demonstrate a new variable selection tool based on a zero-norm penalty that can handle high dimensionality without being sensitive to the number of observations.
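To make the two ideas in this paragraph concrete, the following is a minimal Python sketch. It is not the tool presented in the talk. It only illustrates (a) how pairwise interaction terms inflate the number of columns while the number of observations stays fixed, and (b) what a zero-norm penalized objective looks like, here taken to be the residual sum of squares plus lambda times the number of selected variables. The function names, the toy data, the penalty weight and the exhaustive search are all assumptions made for illustration. Exhaustive search is only feasible for tiny problems, and the solver assumes no two candidate columns are collinear.

```python
from itertools import combinations
from math import comb


def expanded_dimension(p):
    """Columns after adding every pairwise interaction to p base variables."""
    return p + comb(p, 2)


def with_interactions(rows):
    """Append all pairwise products x_i * x_j (i < j) to each observation."""
    return [
        list(r) + [r[i] * r[j] for i, j in combinations(range(len(r)), 2)]
        for r in rows
    ]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def rss_for_subset(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the chosen columns."""
    if not cols:
        return sum(v * v for v in y)
    # Normal equations (X_S^T X_S) beta = X_S^T y restricted to the subset S.
    G = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    g = [sum(row[a] * yi for row, yi in zip(X, y)) for a in cols]
    beta = solve(G, g)
    return sum(
        (yi - sum(bj * row[c] for bj, c in zip(beta, cols))) ** 2
        for row, yi in zip(X, y)
    )


def best_subset_l0(X, y, lam, max_size):
    """Minimise RSS + lam * (number of selected variables) by exhaustive search."""
    p = len(X[0])
    best_score, best_cols = float("inf"), ()
    for k in range(max_size + 1):
        for cols in combinations(range(p), k):
            score = rss_for_subset(X, y, cols) + lam * k
            if score < best_score:
                best_score, best_cols = score, cols
    return best_cols


# Hypothetical toy data: 6 observations, 3 base variables.
base = [(1, 2, 0), (2, 1, 1), (0, 1, 2), (1, 0, 1), (2, 2, 1), (1, 1, 3)]
X = with_interactions(base)       # 3 base columns + 3 interaction columns
y = [2 * row[3] for row in X]     # response driven only by the x0*x1 interaction

print(expanded_dimension(3))      # → 6
print(expanded_dimension(100))    # → 5050
chosen = best_subset_l0(X, y, lam=0.5, max_size=2)
print(chosen)                     # → (3,)
```

In this toy run the penalty selects only column 3, the x0*x1 interaction, because adding any further variable costs 0.5 without reducing the residual. The count of candidate subsets grows combinatorially with the dimension, which is why practical zero-norm methods need something other than brute force.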

Launched after the 2008-09 global financial crisis, the CRI is predicated on the principle of treating credit ratings as a "public good", and DA plays a vital role in its development and implementation. The CRI has in essence created an operational platform for transforming Big Data (structured financial data and unstructured information) into Smart Data (probabilities of default for individual corporations), which are then freely distributed. The CRI operation is staffed by a team of 40+ employees and generates, on a daily basis, probabilities of default for over 67,000 exchange-listed firms in 128 economies globally, with a term structure from one month to five years. These Smart Data are used by financial institutions, central banks, supranational organizations and individual researchers for business operations, policy analyses, and scientific research.