The paper presents a study that analyzes financial information on Italian companies from 2001 to 2018, drawn from the AIDA database. The data set includes bankrupt companies that had revenues between 1 million and 40 million euros in at least one of their last 5 years before bankruptcy and a lifetime of at least 10 years, as well as active companies with similar revenues in the last 5 years. It contains more than 8959 companies described by the 15 most important financial features. The study applies feature reduction, missing value imputation, and standard scaling to the data set, and evaluates the performance of machine learning classifiers using AUC, the metric adopted in previous works. The imbalance of the data set is addressed by sampling active companies in a controlled way. Because the goal is to predict company bankruptcies, the study focuses on recall in the confusion matrix: finding all companies that are likely to declare bankruptcy matters more than precision.

Computational results

The study compared several machine learning algorithms for predicting company bankruptcy. Gradient boosting outperformed logistic regression and neural network models and achieved results similar to, but slightly better than, random forest. To cope with the highly unbalanced data, the study implemented a two-round data set procedure; the second round improved AUC by at least 7%. Using only 15 financial and operational features as independent variables, it achieved results comparable to or better than previous studies that used more than 40 features. Finally, the prediction rate remained almost constant for horizons of up to five years, whereas previous studies considered at most 18 months.
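The preprocessing and evaluation steps summarized above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the paper does not say which imputation strategy or sampling ratio it used, so median imputation and the `ratio` parameter of `undersample_active` are assumptions, and all function names are hypothetical. The classifier itself (gradient boosting in the study) is omitted; the sketch only shows how missing values, scaling, controlled sampling of active companies, AUC, and recall fit together.

```python
import random
import statistics

# Hypothetical helpers; names and choices (median imputation, integer
# active-to-bankrupt ratio) are illustrative assumptions, not from the paper.

def impute_median(rows):
    """Replace None entries column-wise with the median of observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(statistics.median(observed) if observed else 0.0)
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def standard_scale(rows):
    """Scale each column to zero mean and unit population standard deviation."""
    n_cols = len(rows[0])
    means = [statistics.fmean([r[j] for r in rows]) for j in range(n_cols)]
    # Constant columns get std 1.0 to avoid division by zero.
    stds = [statistics.pstdev([r[j] for r in rows]) or 1.0 for j in range(n_cols)]
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

def undersample_active(labels, ratio, seed=0):
    """Keep every bankrupt company (label 1) and a random sample of active
    companies (label 0), at most `ratio` active ones per bankrupt one."""
    rng = random.Random(seed)
    bankrupt = [i for i, y in enumerate(labels) if y == 1]
    active = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(active), ratio * len(bankrupt))
    return sorted(bankrupt + rng.sample(active, k))

def roc_auc(labels, scores):
    """AUC as the probability that a random bankrupt company scores higher
    than a random active one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall(labels, predictions):
    """TP / (TP + FN): share of bankrupt companies that were flagged."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return tp / (tp + fn)

if __name__ == "__main__":
    rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
    print(impute_median(rows))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
    print(recall([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```

In a full pipeline the imputation medians and scaling statistics would be computed on the training split only and then reused on the test split, so that no information from the evaluation data leaks into preprocessing.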