Research Article
Identifying facile material descriptors for Charpy impact toughness in low-alloy steel via machine learning

https://doi.org/10.1016/j.jmst.2022.05.051

Highlights

  • Facile descriptors of CIT in low-alloy steels were identified via an ML method.

  • The values predicted by the models are highly consistent with the experimental values.

  • The mathematical expression of CIT is optimized by symbolic regression.

  • The ML model can guide the optimal design of low-alloy steel with desired properties.

Abstract

High toughness is highly desired for low-alloy steel in engineering structure applications, wherein Charpy impact toughness (CIT) is a critical factor determining the toughness performance. In the current work, CIT data of low-alloy steel were collected, and CIT prediction models based on machine learning (ML) algorithms were established. Three feature construction strategies were proposed: the first is based solely on alloy composition; the second on alloy composition and heat treatment parameters; and the third on alloy composition, heat treatment parameters, and physical features. A series of ML methods were used to effectively select models and material descriptors from a large number of alternatives. Compared with the strategy based solely on alloy composition, the strategy based on alloy composition, heat treatment parameters, and physical features performs much better. Finally, a genetic programming (GP) based symbolic regression (SR) approach was developed to establish a physically meaningful formula between the selected features and the targeted CIT data.

Introduction

Steel materials, with combinations of good economic benefits, excellent mechanical performance, and outstanding service properties, have become one of the most important materials in human civilization [1]. As one of the typical low-cost and high-performance steels, low-alloy steel is defined as the steel with a total amount of the alloy elements Mn, Si, Cu, Ni, Cr, P, etc., less than 5 wt.% [2]. According to the classical defect theory [3,4], strength and toughness usually display mutual exclusion, wherein strength represents the stress of material against irreversible deformation, whereas toughness represents the ability of a material to resist fracture [5]. In engineering structures and machines, the toughness of materials can be measured as the energy required to cause a fracture and is a significant indicator to avoid catastrophic fractures [6].

Due to its simplicity and relatively small sample requirement, the Charpy impact test has been widely used to evaluate the impact toughness of steels. A metal sample of standard size with a V-shaped or U-shaped notch is tested as a simply supported beam on a Charpy impact testing machine [7]. The energy absorbed by the sample during failure is evaluated from the height difference of the pendulum [8]. Nonetheless, the failure mechanism of structural materials in Charpy impact toughness (CIT) tests is not yet fully understood, and effective strategies for improving CIT are still lacking [9,10].
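The height-difference evaluation described above can be illustrated with a minimal sketch. It assumes the textbook energy balance E = m·g·(h_release − h_rise) and neglects friction and air-resistance losses; the function name, parameter names, and numerical values are illustrative and are not taken from the paper.

```python
G = 9.80665  # standard gravity, m/s^2

def absorbed_energy(mass_kg, h_release_m, h_rise_m, g=G):
    """Energy (J) absorbed by the specimen, from the pendulum height difference.

    Assumes an ideal pendulum: no bearing friction or windage losses.
    """
    if h_rise_m > h_release_m:
        raise ValueError("rise height cannot exceed release height")
    return mass_kg * g * (h_release_m - h_rise_m)

# Illustrative numbers only (not from the paper).
energy = absorbed_energy(mass_kg=20.0, h_release_m=1.5, h_rise_m=0.9)
print(f"{energy:.1f} J")  # prints "117.7 J"
```

A tougher sample absorbs more energy, so the pendulum rises less after fracture and the computed value increases.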

The CIT performance depends on many factors, including facile material properties, the processing state of the studied sample, and external test conditions [11], and several statistical models have been proposed to quantify how these factors correlate with CIT in steels. To predict the toughness of structural steels more intuitively, Oh [12] proposed a simplified toughness model based on finite tensile data; the model was verified against tensile test data for several typical structural steels, for which the predicted and measured values show a linear correlation. To relate impact energy to chemical composition in cast duplex stainless steels, Thankachan and Sooryaprakash [13] constructed an artificial neural network (ANN) model to predict the CIT of 220 duplex stainless steels. A multilayer feed-forward ANN model with two hidden layers was found to provide the best linear correlation between chemical composition and impact energy, and it was effective for developing duplex stainless steels with the required impact toughness. However, because facile material descriptors beyond composition were not considered in previous work, the more complex nonlinear relationships remain difficult to model and predict [14]. The development of materials informatics makes it possible to capture such complex nonlinear quantitative relationships [15].

Recently, machine learning (ML) approaches have been widely utilized to predict the properties of metal materials [16–20]. For example, Xue et al. [21] predicted the transformation temperature of shape memory alloys using an ML model established by three intrinsic material descriptors. Wen et al. [22] formulated a property-orientated materials design strategy combining ML, experiment, and feedback to search for high-entropy alloys with targeted hardness. Wang et al. [23] established a design system based on ML and high-throughput optimization algorithms to improve the comprehensive mechanical properties of reduced activation ferritic/martensitic steels, and the results confirmed that the optimized ML method is reliable and effective in discovering new alloys and predicting their properties [24].

In the present work, an effective ML-based prediction model for low-alloy steel was established to build the quantitative relationship between facile material descriptors and CIT performance, and strategies for improving CIT performance were then proposed. The critical facile features for data-driven modeling of the CIT of low-alloy steel were selected and determined via feature engineering. Different ML algorithms were applied to optimize the prediction models for the CIT performance of low-alloy steel, using a range of evaluation indexes. Moreover, a physically interpretable formula between CIT and the selected features was obtained by using a symbolic regression (SR) method based on genetic programming (GP).

Section snippets

Methodology

Generally, the relationship between material features (x_i) and the target attribute Y can be derived from the surrogate model (F) via ML algorithms, that is, Y = F(x_i). If there are p ML algorithms and n material features, then there are p × (2^n − 1) alternative combinations [25]. This study aims to establish a quantitative relationship between these features and CIT via ML approaches in low-alloy steels, and to choose the most suitable combination of ML models and features. The flowchart is
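The size of this search space follows from pairing each of the p algorithms with every non-empty subset of the n features, of which there are 2^n − 1. The sketch below enumerates those pairs for a toy case; the algorithm labels and feature names are placeholders chosen for illustration, not the paper's actual candidates.

```python
from itertools import combinations

def count_combinations(p, n):
    """Number of (algorithm, non-empty feature subset) pairs."""
    return p * (2 ** n - 1)

def enumerate_combinations(algorithms, features):
    """Yield every algorithm paired with every non-empty feature subset."""
    for algo in algorithms:
        for k in range(1, len(features) + 1):
            for subset in combinations(features, k):
                yield algo, subset

# Placeholder names, for illustration only.
algorithms = ["RF", "SVR"]
features = ["C", "Mn", "temper_T"]

pairs = list(enumerate_combinations(algorithms, features))
print(len(pairs), count_combinations(len(algorithms), len(features)))  # prints "14 14"
```

Because the count grows exponentially with n, exhaustive enumeration is only practical for small feature sets, which is why feature selection is needed to narrow the candidates first.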

Results

Different ML surrogate models were trained based on the dataset containing CIT (y) and composition (C_i) for each element. The obtained surrogate model with composition, i.e., y_i = f(C_i), was applied to a material searching space, which was represented by Strategy I. In addition, to incorporate more material descriptors to improve the performance, heat treatment parameters (H_i) and physical features (P_i) were adopted to describe the alloys, which were indicated by Strategy II and Strategy III,
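The three strategies differ only in which feature groups are passed to the surrogate model. A minimal sketch of that nesting is given below; all column names and the sample record are hypothetical, since the paper's actual feature list is not given in this excerpt.

```python
# Hypothetical column names for each feature group (not from the paper).
COMPOSITION = ["C", "Mn", "Si", "Cr", "Ni"]                   # C_i, wt.%
HEAT_TREATMENT = ["austenitize_T", "temper_T"]                # H_i
PHYSICAL = ["mean_atomic_radius", "mean_electronegativity"]   # P_i

STRATEGIES = {
    "I": COMPOSITION,
    "II": COMPOSITION + HEAT_TREATMENT,
    "III": COMPOSITION + HEAT_TREATMENT + PHYSICAL,
}

def build_matrix(records, strategy):
    """Return one feature row per record, using the columns of a strategy."""
    columns = STRATEGIES[strategy]
    return [[record[col] for col in columns] for record in records]

# One made-up record, for illustration only.
records = [{
    "C": 0.2, "Mn": 1.1, "Si": 0.3, "Cr": 0.5, "Ni": 0.4,
    "austenitize_T": 900.0, "temper_T": 600.0,
    "mean_atomic_radius": 1.26, "mean_electronegativity": 1.83,
}]

for name in STRATEGIES:
    print(name, len(build_matrix(records, name)[0]))
```

Each resulting matrix would then be passed to the same set of regressors, so that any difference in prediction quality can be attributed to the added descriptors rather than to the model.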

Discussion

Considering that the developed RF model is a “black box”, the interpretable SR algorithm was utilized to identify a mathematical formula for predicting the CIT performance [15,38]. SR is an approach for finding a mathematical formula that represents the observed data, with two objectives: maximizing prediction accuracy and minimizing the complexity of the formulas produced. The optimization algorithm used in SR is distinct from traditional analytical/numerical optimization methods [39]. Instead
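The two SR objectives can be illustrated with a small scoring function. The sketch below represents candidate formulas as nested tuples and combines mean squared error with a node-count penalty; this weighted sum, the `parsimony` value, and the toy data are assumptions made for illustration, not the objective or data used in the paper. The GP search itself (mutation, crossover, selection) is not shown.

```python
# Expression trees as nested tuples:
#   ("x", i)            feature i of a data row
#   ("const", value)    numeric constant
#   (op, left, right)   binary operator, op in {"add", "sub", "mul"}

def evaluate(expr, row):
    """Evaluate an expression tree on one data row."""
    op = expr[0]
    if op == "x":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    left, right = evaluate(expr[1], row), evaluate(expr[2], row)
    if op == "add":
        return left + right
    if op == "sub":
        return left - right
    if op == "mul":
        return left * right
    raise ValueError(f"unknown operator: {op}")

def complexity(expr):
    """Complexity measured as the number of nodes in the tree."""
    if expr[0] in ("x", "const"):
        return 1
    return 1 + complexity(expr[1]) + complexity(expr[2])

def fitness(expr, X, y, parsimony=0.01):
    """Lower is better: mean squared error plus a complexity penalty."""
    mse = sum((evaluate(expr, row) - t) ** 2 for row, t in zip(X, y)) / len(y)
    return mse + parsimony * complexity(expr)

# Toy data generated from y = 2*x0 + x1 (illustrative only).
X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]]
y = [2.0 * a + b for a, b in X]

compact = ("add", ("mul", ("const", 2.0), ("x", 0)), ("x", 1))
bloated = ("add", compact, ("mul", ("const", 0.0), ("x", 0)))

print(fitness(compact, X, y) < fitness(bloated, X, y))  # prints "True"
```

Both candidates fit the toy data exactly, so the penalty term alone decides in favor of the shorter formula, which is the behavior the complexity objective is meant to produce.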

Conclusion

In summary, CIT, which represents toughness performance, directly determines the safety of low-alloy steel commonly used as structural materials. Three feature construction strategies were proposed, and different CIT prediction models based on the three strategies were established. It was found that the strategy based on both alloy composition and descriptors of the properties of low-alloy steel outperforms the strategy merely based on the alloy composition. By using ML approaches, including

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Nos. 52122408, 52071023, 52071038, 51901013). H.H. Wu also thanks the Fundamental Research Funds for the Central Universities (University of Science and Technology Beijing) (Nos. FRF-TP-2021-04C1 and 06500135) for financial support. The computing work was supported by USTB MatCom of Beijing Advanced Innovation Center for Materials Genome Engineering.

References (45)

  • D. Bacon et al., Prog. Mater. Sci. (1980)
  • S.W. Wang et al., Sci. Bull. (2021)
  • H. Kim et al., Acta Mater. (2015)
  • J. Brnic et al., Mater. Des. (2013)
  • P. Xie et al., J. Mater. Sci. Technol. (2021)
  • G. Oh, Int. J. Pres. Ves. Pip. (2022)
  • C.S. Shen et al., J. Mater. Sci. Technol. (2021)
  • Y.L. Liu et al., J. Mater. Sci. Technol. (2020)
  • J. Xiong et al., J. Mater. Sci. Technol. (2021)
  • J. Xiong et al., J. Mater. Sci. Technol. (2022)
  • Q. Lu et al., Mater. Des. (2020)
  • X.Y. Zhou et al., Acta Mater. (2022)
  • D.Z. Xue et al., Acta Mater. (2017)
  • C. Wen et al., Acta Mater. (2019)
  • C.C. Wang et al., Nucl. Eng. Technol. (2020)
  • Z.L. Wang et al., Mater. Sci. Eng. A (2019)
  • Y. Zhang et al., Acta Mater. (2020)
  • J.J. He et al., Acta Mater. (2021)
  • X. Jiang et al., Scr. Mater. (2020)
  • Y.P. Diao et al., J. Mater. Sci. Technol. (2022)
  • M. Militzer, Science (2002)
  • M. Rashid, Science (1980)