Focus on Dr. Ariga 60th Anniversary: From Nanotechnology to Nanoarchitectonics Select: 60 Materials Informatics

Machine learning strategy to improve impact strength for PP/cellulose composites via selection of biomass fillers

Article: 2351356 | Received 24 Jan 2024, Accepted 30 Apr 2024, Accepted author version posted online: 08 May 2024
Accepted author version

ABSTRACT

Lignocellulosic materials have inherent complexities and natural nanoarchitectures, such as various chemical constituents in wood cell walls, structural factors such as fillers, surface properties, and variations in production. Recently, the development of lignocellulosic filler-reinforced polymer composites has attracted increasing attention due to their potential in various industries, which are recognized for environmental sustainability and impressive mechanical properties. The growing demand for these composites comes with increased complexity regarding their specifications. Conventional trial-and-error methods to achieve desired properties are time-intensive and costly, posing challenges to efficient production. Addressing these issues, our research employs a data-driven approach to streamline the development of lignocellulosic composites. In this study, we developed a machine learning (ML)-assisted prediction model for the impact energy of the lignocellulosic filler-reinforced polypropylene (PP) composites. Firstly, we focused on the influence of natural supramolecular structures in biomass fillers, where the Fourier transform infrared spectra and the specific surface area are used, on the mechanical properties of the PP composites. Subsequently, the effectiveness of the ML model was verified by selecting and preparing promising composites. This model demonstrated sufficient accuracy for predicting the impact energy of the PP composites. In essence, this approach streamlines selecting wood species, saving valuable time.

Impact Statement

This paper introduces a data-driven method to efficiently design lignocellulosic polymer composites with high-impact energy, optimizing components and surface areas using infrared spectroscopic data.

Disclaimer

As a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.
This article is part of the following collections:
Dr. Ariga 60th Anniversary: From Nanotechnology to Nanoarchitectonics

1. Introduction

There has been growing interest in utilizing lignocellulose fillers (LFs) derived from wood biomass as thermoplastic reinforcement [Citation1,Citation2]. These natural fillers offer numerous advantages, including abundance, renewability, low specific gravity, high specific strength and stiffness, and cost-effectiveness, enabling lightweight construction designs and reducing the consumption of petroleum resources. LFs have been converted to nanofibrillated fillers, called lignocellulose nanofibers (LCNFs), via mechanical fibrillation processes such as wet disk-milling, and these have gained significant attention as superior reinforcements in olefin-based polymer nanocomposites compared to non-/less-fibrillated lignocellulosics [Citation3–5]. However, LFs and LCNFs have inherent complexities and natural nanoarchitectures: a variety of chemical constituents in wood cell walls (cellulose, hemicellulose, lignin, and others), various structural factors of the fillers (fiber width/length, specific surface area (SSA), crystallinity index (CI), etc.), surface properties (wettability, reactivity, etc.), and variation in production (wood species, provenance, production year, etc.). Comprehensive utilization of such natural fillers for polymer composites is therefore not straightforward and requires a laborious trial-and-error optimization process to achieve good dispersion in thermoplastics (polyethylene, polypropylene (PP), etc.) and thus high mechanical properties (modulus, strength, impact energy, etc.). In addition, further steps in producing polymer composites need to be optimized, such as base material selection, filler pretreatment, and composite processing [Citation6].

Several studies have addressed this challenge with the aim of improving the mechanical properties of LFs/LCNFs-reinforced polymer composites [Citation7]. Among the various mechanical properties, enhancing impact properties is a crucial issue directly linked to the durability and safety of products [Citation8]. Since composites are inherently complex because of their numerous parameters, one strategy is to understand how specific major parameters affect the impact property. For example, variations in constituent components, such as the cellulose/lignin/hemicellulose ratio in biomass fillers, affect the mechanical properties of these polymer composites, and this complexity extends to differences among wood species [Citation9–14]. A comparative review of lignocellulose composites has highlighted superior impact strength [Citation15]. The SSA, fiber shape, and filler dispersibility within the base material significantly affect the mechanical properties of the composites [Citation16]. Pavithran et al. emphasized the contribution of larger angles of spiral cellulose microfibrils in cell walls to a higher fracture toughness [Citation17]. Mueller et al. showed that higher fiber fineness in composites is linked to enhanced impact performance [Citation18]. In a broader context, the influence of external temperature and water on the impact properties of LFs/LCNFs-reinforced thermoplastics has been thoroughly reviewed [Citation19]. Despite these valuable insights into lignocellulose composites, the nanoscale implications for the impact energy remain relatively unexplored and warrant further research. This is because the impact property is related not to a single parameter but to several mutually dependent parameters. Thus, notable discrepancies exist between the theoretical predictions and the experimental results for composites.
The complexity arises from various factors, including filler size [Citation20], interactions [Citation21], chemical properties [Citation22–24], and microscopic spatial arrangement (dispersion, aggregation, ligament, percolation), making them difficult to characterize [Citation25,Citation26]. This complexity raises questions regarding the accuracy and reliability of predictive models.

Recently, machine learning (ML) has emerged as a powerful tool for addressing computational challenges, enabled by advancements in big data science [Citation27,Citation28]. ML methods have shown potential for addressing complex problems, particularly for inorganic materials with disordered microstructures, such as glasses and alloys [Citation29–31]. These techniques excel in processing high-dimensional input vectors and can effectively uncover intricate relationships, potentially helping to clarify how the impact properties of LFs-/LCNFs-reinforced polymer composites depend on several parameters. Nevertheless, the application of ML to polymer materials, particularly polymer composites, has limitations [Citation32]. A primary hindrance is the time-consuming nature of data acquisition in this domain. Moreover, even when past data are available, their utility is restricted by the substantial process variability inherent in material manufacturing. Consequently, exploring the potential of ML in the context of polymer composites is necessary to overcome these challenges and unlock new opportunities for the rapid and efficient development of polymer composites.

In this study, we aimed to develop a novel and versatile ML strategy to improve the mechanical properties, especially the impact property, of LFs-/LCNFs-reinforced PP composites, which are in high demand as alternatives to the conventional talc-/glass fiber-reinforced PP composites used in automobiles, daily goods, and other products. The study proceeds in two steps. First, in step (i), we focused on the influence of the natural supramolecular structures in each wood species, which are related to the chemical composition and fiber size of the LFs and LCNFs, on the mechanical properties of the PP composites. Fourier transform infrared (FTIR) and SSA data were used as spectral and filler information for developing the ML model, based on principal component analysis (PCA). Subsequently, in step (ii), the effectiveness of the ML model was verified by selecting and preparing promising LFs- and LCNFs-PP composites. We emphasize that the original contribution of this study lies in employing IR spectra to predict the mechanical properties of plastic/wood composites simply and successfully, which is a challenging task because various factors affect the properties of the raw materials and final products during processing. Moreover, information about differences in material manufacturing can be incorporated through the SSA, which reflects the different fibrillation processes. Although the combination of IR with ML and PCA has been utilized to identify wood species [Citation33] or to evaluate the enzymatic efficiency of saccharification [Citation34], this study demonstrates another challenging linkage, namely between raw materials and composites. This study offers a novel approach to constructing a physical property prediction model for polymer composites based on information about the woody raw material.

2. Experimental section

2.1 Basic data for prediction models

The characterization data for 14 kinds of lignocellulosic fibrillated fillers obtained by different mechanical processes, namely dry pulverization (Dry-P) and subsequent wet disk milling for four and ten passes (Wet-DM-4, Wet-DM-10) (Table S1), and the mechanical properties (tensile, flexural, and Izod impact) of the PP/lignocellulosic fibrillated filler composites (Table S2) were collected before this study. They are based on the results of the Development of Technologies for Manufacturing Processes of Chemicals Derived from Inedible Plants Project, commissioned by the New Energy and Industrial Technology Development Organization (NEDO), Japan (No. JPNP13006). Table 1 lists the 14 cellulosic raw materials used in this study. Figure S1 shows the morphology of selected typical LCNFs observed using a scanning electron microscope (SEM).

Table 1. Cellulosic raw materials used for prediction models.

2.2 FTIR spectroscopic data for prediction models

Attenuated total reflection (ATR) FTIR analyses of the 14 types of lignocellulosic fibrillated fillers after Dry-P, Wet-DM-4, and Wet-DM-10 were performed in the dry state using a Frontier instrument (PerkinElmer Inc., USA). All spectra were obtained by accumulating 32 scans in the 4000–600 cm−1 range at a resolution of 4 cm−1. All the IR spectra obtained for each sample (14 types × 3 processes = 42 samples) were transformed as follows.

Standard normal variate (SNV) transformation

SNV is used not only to eliminate the effects of uneven particle distribution, surface scattering, and varying particle sizes on the spectrum but also to remove the influence of optical path reflection on the diffuse reflectance spectra. SNV operates by subtracting the mean value of the entire spectrum to remove a constant offset term and then dividing it by the standard deviation of the entire spectrum, scaling all spectra to the same level. The SNV transformation was applied to the spectral data and is expressed as

$$X_{i,\mathrm{SNV}} = \frac{X_{i,k} - \bar{X}_i}{\sqrt{\dfrac{\sum_{k=1}^{m}\left(X_{i,k} - \bar{X}_i\right)^2}{m-1}}} \tag{1}$$

where $X_{i,\mathrm{SNV}}$ is the SNV-transformed value of the i-th data point in a specific spectrum, $X_{i,k}$ is the original intensity of the i-th data point in the k-th spectrum, $\bar{X}_i$ is the mean intensity value of the i-th data point across all m spectra, $\sum_{k=1}^{m}\left(X_{i,k} - \bar{X}_i\right)^2$ is the sum of the squared differences between each $X_{i,k}$ and the mean $\bar{X}_i$ for all m spectra, m denotes the total number of spectra, and $\sqrt{\sum_{k=1}^{m}\left(X_{i,k} - \bar{X}_i\right)^2 / (m-1)}$ is the standard deviation of the i-th data point across all m spectra.
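As an illustration only, the SNV step can be sketched in Python following the verbal description above (subtract the mean of a spectrum, then divide by its standard deviation). The function name `snv` and the toy intensity values are assumptions made for this sketch and are not taken from the authors' code; the sample standard deviation (denominator m − 1) is used, as in Equation 1.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation (n - 1 denominator)."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)
    if sigma == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - mu) / sigma for x in spectrum]


# Hypothetical absorbance values, for illustration only.
raw = [0.12, 0.15, 0.40, 0.35, 0.18]
scaled = snv(raw)
```

After the transformation each spectrum has zero mean and unit standard deviation, so a constant baseline offset or a multiplicative scaling of the raw intensities no longer affects the result.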

2.3 PCA

PCA is crucial for studying trends, groupings, and outliers in large bilinear data structures. It determines the main variations in a multidimensional dataset by creating principal components (PCs), which are new linear combinations of the original variables that capture the latent structure of the raw data. This enables the characteristics of the samples to be recognized and emphasizes the correlation with their physicochemical properties. The technique helps interpret ATR-FTIR spectra, in which the diversity and complexity of the bands depend on the source of the samples. The spectral data used in the PCA were the ATR-IR data processed with a Savitzky–Golay (SG) filter for smoothing and second-order differentiation. The PCA and SG filters were computed using Python 3.10.
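The preprocessing chain described here (SG second derivative followed by PCA) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the window size, polynomial order, function names, and the random placeholder spectra are all assumptions, since the source does not state the filter settings or the libraries used.

```python
import numpy as np


def savgol_second_derivative(y, half_window=5, polyorder=3, delta=1.0):
    """Second derivative of a 1-D signal from a Savitzky-Golay fit.

    A polynomial of degree `polyorder` is least-squares fitted to each window
    of 2 * half_window + 1 points, and the second derivative of that
    polynomial at the window centre is returned. Edge points are dropped.
    """
    offsets = np.arange(-half_window, half_window + 1)
    # Columns of the design matrix are offsets**0, offsets**1, ...
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 2 of the pseudo-inverse yields the fitted coefficient of offsets**2;
    # the second derivative at the centre is twice that coefficient.
    weights = 2.0 * np.linalg.pinv(design)[2] / delta**2
    # np.convolve flips its kernel, so reversed weights give a correlation.
    return np.convolve(y, weights[::-1], mode="valid")


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components (SVD)."""
    centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained_ratio = (S**2 / np.sum(S**2))[:n_components]
    return scores, Vt[:n_components], explained_ratio


# Placeholder data standing in for 42 SNV-transformed spectra (not real data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(42, 200))
derivs = np.array([savgol_second_derivative(s) for s in spectra])
scores, loadings, ratio = pca_scores(derivs, n_components=3)
```

Each row of `scores` holds the PC1 to PC3 values of one sample, which are the kind of quantities used later as regression inputs, and each row of `loadings` shows which spectral positions contribute to the corresponding component.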

2.4 Autocorrelation matrix

Correlation analysis is a statistical method used to assess the strength and direction of the relationship between two variables. This relationship is quantified using the correlation coefficient, which ranges from −1 to +1. A correlation coefficient of ±1 indicates a perfect relationship between the variables, with the sign “+” or “−” denoting the direction of this relationship. A value of 0 indicates the absence of any linear relationship between the variables. Pearson’s correlation coefficient, the most widely applied measure of correlation in statistical analyses, is used under the assumption that the variables under consideration are normally distributed. It is expressed as

$$r_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}} \tag{2}$$

where the Pearson correlation coefficient, denoted by $r_{xy}$, quantifies the linear relationship between two variables x and y, n represents the number of observations, and $x_i$ and $y_i$ are the values of variables x and y for the i-th observation, respectively. Comprehensive computation of these coefficients was facilitated using Python 3.10, ensuring an exhaustive pairwise correlation analysis within the dataset.
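A direct transcription of Equation 2 into Python may make the computation concrete. The function name `pearson_r` and the example values for SSA and tensile breaking strain are hypothetical and chosen only for illustration; they are not data from Tables S1 or S2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient computed term by term as in Eq. 2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return numerator / denominator


# Hypothetical values, for illustration only.
ssa = [5.0, 12.0, 20.0, 46.0, 118.0]
strain = [8.0, 15.0, 22.0, 30.0, 60.0]
r = pearson_r(ssa, strain)
```

Applying the same function to every pair of columns in a property table gives the pairwise correlation matrix described in the text.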

2.5 Construction of prediction model by ML

Python 3.10 was used to construct two ML models. The model for the Izod impact energy of the lignocellulosic fibrillated filler-reinforced PP composites was based on experimental data: the SSA in Table S1, the Izod impact energy in Table S2, and the PC1 and PC2 data from the PCA of the second-derivative IR spectra (Figure S2). The model for the tensile breaking strain was based on the SSA and CI in Table S1, the tensile breaking strain in Table S2, and the PC3 data from the PCA of the second-derivative IR spectra (Figure S2).

Multiple linear regression (MLR)

MLR is used to predict the linear relationship between a dependent variable and multiple independent variables. In MLR, the independent variables influence the dependent variable. Therefore, the independent variable can be used as a predictive factor when the relationship with the dependent variable is validated. Constant and regression coefficients were calculated for each variable to explain the relationship between independent and dependent variables. The general multiple regression equation is given by

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon \tag{3}$$

where Y is the dependent variable, β0 is the constant, β1 to βn are the regression coefficients, X1 to Xn are the independent variables, and ε is the error term. In the Izod impact energy prediction model, the SSA of the lignocellulosic fibrillated fillers listed in Table S1 and PC1 and PC2 from the IR spectra were X1, X2, and X3, respectively, and the Izod impact energy of the PP/lignocellulosic fibrillated filler composites listed in Table S2 was Y. In the tensile breaking strain prediction model, the SSA and CI of the lignocellulosic fibrillated fillers listed in Table S1 and PC3 from the IR spectra were X1, X2, and X3, respectively, and the tensile breaking strain of the PP/cellulosic fibrillated filler composites listed in Table S2 was Y. The experimentally obtained data (14 types × 3 processes = 42 samples) were randomly divided into two groups, a training set (70%; 29 samples) and a testing set (30%; 13 samples), for predicting the Izod impact energy or tensile breaking strain.

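For the two models in this study, Equation 3 takes the following forms:

$$Y_{\text{Izod impact energy}} = \beta_0 + \beta_1 X_{\mathrm{SSA}} + \beta_2 X_{\mathrm{PC1(IR)}} + \beta_3 X_{\mathrm{PC2(IR)}} + \varepsilon \tag{3-i}$$
$$Y_{\text{tensile breaking strain}} = \beta_0 + \beta_1 X_{\mathrm{SSA}} + \beta_2 X_{\mathrm{CI}} + \beta_3 X_{\mathrm{PC3(IR)}} + \varepsilon \tag{3-ii}$$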
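The fitting procedure for the Izod impact energy model can be sketched as an ordinary least-squares problem with a random 70/30 split. The sketch below is a hedged illustration: the data are synthetic placeholders generated from made-up coefficients, the helper names `fit_mlr` and `predict_mlr` are assumptions, and the source does not state which Python library the authors used.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the 42-sample dataset (NOT the values of Tables S1/S2).
n_samples = 42
X = np.column_stack([
    rng.uniform(5, 120, n_samples),  # placeholder SSA values (m2/g)
    rng.normal(0, 1, n_samples),     # placeholder PC1 scores
    rng.normal(0, 1, n_samples),     # placeholder PC2 scores
])
true_beta = np.array([1.0, 0.005, 0.3, -0.2])  # hypothetical beta0..beta3
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.05, n_samples)

# Random 70/30 split: 29 training samples and 13 testing samples.
order = rng.permutation(n_samples)
n_train = int(0.7 * n_samples)
train_idx, test_idx = order[:n_train], order[n_train:]


def fit_mlr(X, y):
    """Least-squares estimate of [beta0, beta1, ..., betan]."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Evaluate Y = beta0 + beta1*X1 + ... + betan*Xn for each row of X."""
    return beta[0] + X @ beta[1:]


beta = fit_mlr(X[train_idx], y[train_idx])
y_pred = predict_mlr(beta, X[test_idx])
```

The tensile breaking strain model has the same structure, with the three columns replaced by SSA, CI, and PC3.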

2.6 Discovering promising composites by data-driven approach

Approximately 50 wood plates (Table S3) purchased from Takada Seizaijo (Fukuoka, Japan) with dimensions of 135 × 180 × 12 mm (width × length × thickness) were used to evaluate the robustness of the Izod impact energy prediction models. These wood plates were subjected to ATR-FTIR measurements, and the IR spectra were SNV-transformed and converted to second-derivative spectra. Subsequently, the PCA data, together with the average SSA for each group (softwood or hardwood under the same fibrillation condition), were applied to the Izod impact energy prediction model (Equation 3-i). Because of the inherent difficulty in accurately measuring the SSA for empirical demonstration, the average values from the dataset were used in the calculations. These were an SSA of 5 m2/g for all wood species with Dry-P, 20 m2/g for softwoods with Wet-DM-4, 12 m2/g for hardwoods with Wet-DM-4, 118 m2/g for softwoods with Wet-DM-10, and 46 m2/g for hardwoods with Wet-DM-10. For samples S3, S24, and S43 in RUN1 and S9, S30, S33, and S34 in RUN2 (Table S3), the SSA values were determined by the BET (Brunauer–Emmett–Teller) method using a BELSORP-mini (MicrotracBEL Corp., Japan). Freeze-dried filler samples of approximately 150 mg were heated at 105 °C under vacuum for at least 3 h to remove adsorbed species. Nitrogen adsorption/desorption data were collected, and the BET equation was used to calculate the SSA.
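The substitution of group-average SSA values can be written as a simple lookup. The averages below are the ones quoted in the text; the dictionary name, the function name `ssa_for`, and the `measured` override argument are assumptions introduced for this sketch.

```python
# Average SSA values (m2/g) quoted in the text, keyed by wood class and process.
AVERAGE_SSA_M2_PER_G = {
    ("softwood", "Dry-P"): 5.0,
    ("hardwood", "Dry-P"): 5.0,
    ("softwood", "Wet-DM-4"): 20.0,
    ("hardwood", "Wet-DM-4"): 12.0,
    ("softwood", "Wet-DM-10"): 118.0,
    ("hardwood", "Wet-DM-10"): 46.0,
}


def ssa_for(wood_class, process, measured=None):
    """Return a BET-measured SSA when available, else the group average."""
    if measured is not None:
        return measured
    return AVERAGE_SSA_M2_PER_G[(wood_class, process)]


# Example: a hardwood candidate processed by Wet-DM-4 without a BET value.
x_ssa = ssa_for("hardwood", "Wet-DM-4")
```

The returned value is what would enter the prediction model as the SSA input for a candidate whose SSA has not been measured.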

In RUN1, six specimens from three tree species (two processes each), S3, S24, and S43, were selected based on the predicted values (Table S3 and Table 2) and subjected to the experiments described above. The additionally obtained data, including the Izod impact energy and SSA results, were then added to the dataset of the ML prediction model for RUN2. In RUN2, the remaining specimens (around 45, i.e., those in Table S3 excluding the three used for validation in RUN1) were subjected to PCA and multiple regression in the same manner. Four tree species (two processes each), S9, S30, S33, and S34, were then selected based on the predicted values (Table S3 and Table 2) and subjected to the experiments described above. Subsequently, the additionally obtained data, including the Izod impact energy and SSA results, were added to the dataset for a further ML prediction model.
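The two-round procedure (predict, select, measure, add to the dataset, refit) can be outlined in code. This is a sketch under stated assumptions: it takes "selected based on predicted values" to mean choosing the candidates with the highest predicted Izod impact energy, which the source does not state explicitly, and all names and numbers below are hypothetical placeholders rather than Table S3 entries or measured results.

```python
import numpy as np


def fit(X, y):
    """Least-squares fit of Y = beta0 + beta1*X1 + beta2*X2 + beta3*X3."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def top_candidates(beta, candidates, k):
    """Names of the k candidates with the highest predicted values."""
    names = list(candidates)
    features = np.array([candidates[name] for name in names], dtype=float)
    ranked = np.argsort(predict(beta, features))[::-1]
    return [names[i] for i in ranked[:k]]


def add_measurements(X, y, new_X, new_y):
    """Append newly measured samples to the training data."""
    return np.vstack([X, new_X]), np.concatenate([y, new_y])


# Hypothetical training data: columns are SSA, PC1, PC2; y is impact energy.
X_train = np.array([[5.0, -1.0, 0.2], [20.0, -0.5, 0.1], [12.0, 0.8, -0.3],
                    [46.0, 1.1, 0.0], [118.0, -0.9, 0.4]])
y_train = np.array([1.1, 1.2, 1.6, 1.8, 1.3])
beta = fit(X_train, y_train)

# Hypothetical unmeasured candidates (placeholder names and features).
candidates = {"cand_A": [12.0, 0.9, -0.1],
              "cand_B": [20.0, -0.7, 0.3],
              "cand_C": [12.0, 0.2, 0.0]}
chosen = top_candidates(beta, candidates, k=2)

# Once the chosen fillers have been compounded and tested, append and refit.
new_X = np.array([candidates[name] for name in chosen])
new_y = np.array([1.7, 1.4])  # placeholder "measured" values
X_train, y_train = add_measurements(X_train, y_train, new_X, new_y)
beta_next = fit(X_train, y_train)
```

Here `beta` plays the role of the RUN1 model and `beta_next` that of the RUN2 model, which is fitted on the enlarged dataset.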

Table 2. Results of two-round practical evaluation (RUN1 and RUN2) showing SSA, predicted and measured Izod impact energy values for the ML-selected LCNFs (Wet-DM-4)/PP composites as the experimental demonstration using multiple regression model, eq. 3-i.

2.7 Production of lignocellulosic fibrillated filler-reinforced PP composites

Isotactic PP (NOVATEC MA3, Japan Polypropylene Co., weight-average molecular weight: ca. 397,000; density: 0.900 g/cm3; melt flow rate (MFR): 11) and maleic acid-modified PP (MAPP, Kayabrid 006PP, Kayaku Akzo Chemical Co., Ltd., ca. 0.6 wt% MA; density: 0.950 g/cm3; MFR: 115) were used without purification. Fibrillation, compounding, injection molding, and Izod impact measurements were conducted similarly to the reported condition [Citation35], described in detail in the SI.

2.8 Fractography using SEM images

The morphological characteristics of the fractured surfaces of composite samples that underwent an Izod impact test were observed using a field emission SEM (FE-SEM, S-4800, Hitachi High-Technologies Co., Ltd., Tokyo, Japan). An acceleration voltage of 1.0–3.0 kV was used. The samples were coated with an approximately 3 nm thick layer of osmium using an osmium plasma coater (NEOC-AN, Meiwa Fosis Co., Ltd., Tokyo, Japan).

3. Results and discussion

3.1 Dataset

In the dataset (see 2.1 Basic data for prediction models), Japanese cedar (JC), Chinese fir (CF), Japanese larch (JL), Sakhalin fir (SF), white birch (WB), Eucalyptus nitens (EN), and moso bamboo (MB) are listed as raw materials (Table 1). The raw materials varied in provenance and age and were classified as either juvenile wood (abbreviated as “J”; part: 1–15 years) or mature wood (abbreviated as “M”; over 16 years). Each material is abbreviated using a combination of the species name, provenance, and age category. For instance, juvenile wood of the JC species sourced from Ibaraki is designated as “JC-I-J.” Table S1 lists the characterization data for the wood materials after each pretreatment. Three kinds of filler were used for the PP composites: fibrillated LFs (100-μm-pass wood flour obtained by dry fine pulverization) and LCNFs obtained by 4 and 10 passes of the wet-DM treatment. The LCNFs were manufactured from LFs by mechanical processing alone, without pulping or chemical processing. Table S2 summarizes the mechanical properties of the composites, in which the filler content is 5 wt% for the LFs- and LCNFs-reinforced PP composites. In general, continued disintegration of the fibers results in notable trends in the mechanical properties: Young’s modulus increases, and in flexural tests both the modulus of elasticity and the maximum strength show a similar upward trend. These mechanical changes are influenced by a range of factors, including the inherent properties of the filler, its reactivity with the compatibilizer MAPP, and the level of dispersion. However, the Izod impact energy displays less predictable behavior, making it challenging to develop a predictive model for material properties based solely on these data.
Consequently, implementing ML techniques is viewed as an effective solution to address these complexities. Wood comprises three major molecules (i.e., cellulose, hemicellulose, and lignin) with different properties and specific FTIR absorptions [Citation36].

FTIR spectra were compared for typical plant biomasses from softwood (JC-I-J), hardwood (EN-J), and grass (MB). The major bands of the three main components are listed in Table S4. The spectra obtained in this study were subjected to SNV transformation as part of the experimental methodology to minimize measurement errors. The compositions of the respective source materials are highly similar, as indicated by their nearly identical IR spectra. Therefore, the second derivative of the spectra was used as a band-narrowing/peak-sharpening method to identify hidden peaks. Even though the original spectra appear similar at first glance, differences become apparent in the second-derivative spectra, particularly in the regions highlighted by the orange and gray bands (for visual clarity, the odd-numbered band regions assigned in Table S4 are highlighted in orange and the even-numbered ones in gray). This finding is significant because cellulose and hemicellulose exhibit similar absorption bands, which tend to overlap as a single broad peak in the original data. The second-derivative analysis, however, allows peak shapes and asymmetries to be characterized, effectively distinguishing between these components. In the following section, we show that ML made it possible to connect the different features of these second-derivative absorption bands with physical property data.

3.2 Data mining

3.2.1 Correlation coefficient

Data-driven approaches, such as statistical analysis and ML, are based on the correlations inherent in the dataset. Each composite dataset was evaluated using Pearson correlation coefficients to identify noteworthy correlations between pairs of properties (Tables S1 and S2), and the results were visualized as a correlation matrix. These coefficients included some that were expected and others that were newly discovered. In the correlation matrix, blue patches denote positive correlations, red patches denote negative correlations, and white patches denote uncorrelated pairs. The sizes of the patches reflect the magnitude of the correlation coefficient. Collinearity analysis showed no highly correlated properties except those obtained from the same measurements. As expected, the physical property parameters calculated from the same measurement showed strong correlations, such as tensile stress–tensile breaking strain, tensile breaking strain–Young’s modulus, and flexural modulus–ultimate strength. Parameters derived from similar experiments, such as Young’s modulus–flexural modulus and tensile strength–ultimate strength, also showed predictable correlations.

The high positive correlation between the SSA and tensile breaking strain is a fascinating insight. Several papers have focused on the correlation between SSA and material properties; for example, an increase in SSA results in higher surface energy [Citation37]. While this could imply easier aggregation, surface energy also influences polymer relaxation [Citation21,Citation38]. With careful preparation, cellulosic fillers could form strong interfaces with polar sites such as those of MAPP, affecting dispersibility due to increased repulsion with PP. However, to the best of our knowledge, no studies have specifically focused on the correlation between the SSA and the strain of composites using CNFs, so the origin of this correlation remains unclear.

Furthermore, the sugar composition, expressed as the %xylose unit (%Xyl) and %mannose unit (%Man), differed between softwoods and hardwoods and correlated with mechanical properties, including impact energy. The %Xyl and impact energy show a highly positive correlation, whereas %Man is negatively correlated with Young’s modulus. These unexpected findings suggest significant differences between softwood and hardwood in the mechanical properties of the PP composites. Although some aspects of the constituent components and properties are described in the referenced paper [Citation39], unclear areas remain. Thus, the value of this analysis lies in the ability to capture such trends conveniently from the correlation coefficient heatmap.

Consequently, it is desirable to predict the material properties based on the chemical composition of the fillers. However, component analysis, such as sugar composition analysis, is time-consuming and costly. The IR method discussed in Section 3.1 offers a more convenient alternative. Moreover, it provides more information than the sugar composition analysis. Therefore, we proceeded to investigate a method for examining correlations using IR data based on PCA.

A limitation of this study is the relatively small sample size (42 specimens), which may introduce errors in the analysis. However, as stated by Tian et al., increasing the dataset size reduces errors in polymer material design, with optimal accuracy achieved with 30 to 40 samples in some cases [Citation40]. We emphasize that, by using the same facility and equipment and accumulating high-precision data, it was possible to utilize ML effectively even with a small dataset. In the future, it will be imperative to complement the datasets to achieve more accurate predictions of material properties. One approach is to use artificial intelligence to conduct virtual experiments by generating data from techniques such as IR spectroscopy and SSA measurement, thereby increasing the number of samples [Citation40].

3.2.2 PCA

PCA is a dimensionality-reduction algorithm that projects the original space of predictors onto a lower-dimensional subspace. Standard linear regression is inappropriate for high-dimensional data (millions of variables per sample) because of high variability in goodness of fit, overfitting on the training set, and poor prediction accuracy on the test set (resulting in a model usable for that material only). By sacrificing some finer details, PCA yields a new set of features that provides a more practical representation of the data. Standard linear regression is easy to implement but has significant limitations in predictive power because of the linearity assumption. A score plot for the PCA and a loading plot for the ATR-IR spectra were obtained for the 42 samples (14 wood types × 3 processes), and the summarized results of PCA for IR spectroscopy are illustrated in Figure S2. In the loading plot of the second-derivative ATR-IR spectrum, peaks correlating with the PC values were identified in the regions labeled 2, 3, 4, 7, 8, 9, 11, 12, and 14. Peaks correlating with the characteristic features of lignin were captured (regions 2, 3, and 4). Furthermore, despite the structural similarities between cellulose and hemicellulose, the segregation of softwoods and hardwoods (including bamboo) along the PC1 direction in the score plot suggests a stronger reflection of the hemicellulose components.

The results of PCA for IR spectroscopy were incorporated into the correlation heatmap to reinforce these intriguing correlations. Notably, PC1 successfully segregated the groups, with softwoods, distinguished by %Man or %Xyl, in the negative direction and hardwoods in the positive direction. When color-coded by impact energy values, the PP composites with hardwood-derived LFs and LCNFs exhibited relatively high impact energies. Furthermore, examining the information from PC1 revealed correlations with other sugar compositions, suggesting that the IR spectra contain a wealth of information. These findings imply correlations between compositional components and material properties and indicate the potential for obtaining information beyond sugar composition analysis using IR data.

3.3 Prediction model

We developed a model to predict the impact test results for lignocellulosic fibrillated filler-reinforced PP composites. We utilized three independent variables, namely the SSA and the principal component scores (PC1 and PC2) obtained from the second-derivative IR spectra. To ensure the impartiality of the test set selection, we randomly divided the entire sample of 42 specimens into two subsets. The training subset comprised 70% of the total sample, and the test subset accounted for the remaining 30% (randomly selected, as shown in Table S2). We used the training subset to calculate the coefficients βi of the regression Equation 3 so as to minimize the total squared error. We focused on developing predictive models for the Izod impact energy and tensile breaking strain, both of which are significant for composites. The multiple regression analysis for impact energy prediction yielded an R-squared (R2) value of approximately 0.70. A regression model for predicting the tensile breaking strain of the composites was developed using a similar methodology. This model exhibits a high R2 value of 0.87, indicating substantial predictive accuracy. However, this high accuracy may partially stem from the clear separation between good and poor experimental results, which makes it easier for the model to fit values lying predominantly at the extreme ends of the data range. Thus, the missing measurement data within the range of 150–450% may play a significant role in the regression. By incorporating additional data, particularly in the range of 200–400%, there is a promising opportunity to develop a truly predictive model that is applicable for practical use. Our findings provide a foundation for future research that can build on the details uncovered in this study.
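For reference, the R2 values quoted above follow the usual definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch is given below; the function name `r_squared` and the toy numbers are assumptions for illustration, not values from the study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Toy values, for illustration only.
measured = [1.0, 2.0, 3.0]
perfect = r_squared(measured, [1.0, 2.0, 3.0])    # every point reproduced
mean_only = r_squared(measured, [2.0, 2.0, 2.0])  # model predicts the mean
```

A model that reproduces every measurement gives a value of 1, whereas a model that always predicts the mean of the measurements gives 0, which puts the reported values of about 0.70 and 0.87 in context.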

3.4 Practical evaluation

The multiple regression model was then used to predict the impact energies of 45 previously untested wood species listed in Table S3. Each wood species was analyzed by ATR-FTIR, and the prediction model was applied. Some woody fillers were then chosen for practical evaluation in PP composites. For RUN1, Ginkgo (S3), Japanese chestnut (S24), and Yellow birch (S43) were selected based on the property predictions obtained from the multiple regression analysis. Among these, S24 and S43 yielded measured values close to the predicted values.

The results of the practical evaluation in RUN1 were then incorporated into the dataset to construct the model for RUN2 (Figure 5c). For RUN2, Spruce (S9), Monarch birch (S30), Paulownia (S33), and Persimmon (S34) were selected based on the new property predictions obtained from the multiple regression analysis. Monarch birch and persimmon emerged as promising candidates. Notably, S30 gave measured values of 2.43 and 2.39 for Dry-P and Wet-DM-4, respectively, against predicted values of 1.84 and 1.86. These measured values are roughly 190% of the 1.26 measured for neat PP, that is, an increase of about 90%. However, it is essential to acknowledge that the predicted values tended to be consistently higher overall. The Izod impact energy values for PP in the dataset and in the recently obtained experimental data were 1.05 and 1.26, respectively, suggesting that a different lot of PP pellets could affect the impact test. The improvement in the R2 value, mainly achieved by using IR and SSA in RUN1, supported the utility of this information. While the primary dataset focused mainly on softwoods, the dataset was considerably enriched through the practical evaluation (RUN1 and RUN2) and now comprises 12 softwood species, 9 hardwood species, and 1 grass species, totaling 56 specimens (42 specimens in Table S1 plus 14 from the practical evaluation) across different processes. Although the cause of some discrepancies between the predicted and measured impact energy values is unclear, they suggest that properties beyond those captured by IR and SSA influence the impact energy. Identifying these additional properties and incorporating them into future models remains a challenge for ongoing research.
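The comparison with neat PP for S30 can be checked with a short calculation. The sketch below uses only the values quoted in this section; the helper name `percent_of_reference` is hypothetical and introduced for illustration.

```python
def percent_of_reference(value, reference):
    """Express value as a percentage of reference (100 means equal)."""
    return 100.0 * value / reference

neat_pp = 1.26  # measured Izod impact energy of neat PP (recent lot)
measured = {"Dry-P": 2.43, "Wet-DM-4": 2.39}   # Monarch birch (S30), measured
predicted = {"Dry-P": 1.84, "Wet-DM-4": 1.86}  # Monarch birch (S30), predicted

for process, value in measured.items():
    ratio = percent_of_reference(value, neat_pp)
    residual = value - predicted[process]
    print(f"{process}: {ratio:.1f}% of neat PP "
          f"(+{ratio - 100.0:.1f}%), measured - predicted = {residual:.2f}")
```

The measured values are therefore about 193% and 190% of the neat PP value, corresponding to increases of roughly 93% and 90%, and for S30 the measurements exceed the predictions by 0.59 and 0.53.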

Fracture surfaces after the impact test were examined using SEM images, as shown in Figure 6. In the MB Wet-DM-4/PP composite, which belongs to the higher impact energy category, there was significant fiber exposure on the fracture surface, along with observable holes due to fiber pullout (Figure 6a). Moreover, the MB fibers exposed at the fracture were longer at their bases (Figure 6b). In contrast, the JC-I-M Dry-P/PP composite, which belongs to the lower impact energy category, exhibited fewer protruding fibers and holes on the fracture surface, indicating good adhesion between the fillers and PP (Figure 6c). The circled area in Figure 6c marks the tip of a fiber. It was thus inferred that fracture predominantly occurred within the matrix, particularly in areas with fewer adhesive interfaces at the fiber tips. In the Monarch birch (S30)/PP composite, fibers and holes were moderately visible on the fracture surface (Figure 6d). The holes were larger and more numerous than those in the JC-I-M Dry-P/PP composite. In the magnified image, fiber breakage or tearing is apparent in the monarch birch-derived composite (Figure 6e), a phenomenon not often seen in the MB Dry-P, MB Wet-DM-4, JC-I-M Dry-P, or JL-M Wet-DM-4 composite samples. This suggests that, in monarch birch, the fibers may have absorbed the impact through tearing: the impact causes the fibers to detach moderately, thereby dispersing energy effectively. Notably, the ML-guided selection made it possible to pick out and analyze these samples efficiently, which led to these insights.

4. Conclusions

A dataset of 42 wood sample/PP composites was used to establish the prediction models. The distinctive aspect of this study is the selection of descriptors using correlation coefficients, combining IR spectroscopy and SSA data for fibrillated lignocellulosic fillers in a regression analysis. In IR spectroscopy, second derivative spectra were employed to differentiate components such as hemicellulose and cellulose despite their similar peaks. Furthermore, we enhanced the clarity of spectral features by applying PCA for dimension reduction. The explanatory variables for ML were refined using correlation coefficients. The prediction model was a multiple regression model constructed from SSA and from PC1 and PC2 derived from the IR spectra. This model demonstrated an R2 greater than 0.7, indicating sufficient accuracy for predicting impact energy. In essence, this approach streamlines the selection of wood species, saving valuable time. While a test's applicability varies with its intended use, evaluating the physical properties of materials in controlled environments under extreme conditions benefits greatly from models encompassing a wide range of environmental conditions. Such models enable a more precise and comprehensive assessment.
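The spectral preprocessing named in this study (SNV transformation followed by a Savitzky-Golay second derivative) can be illustrated with a minimal sketch. The window length of 5 points and the quadratic polynomial are assumptions made for illustration, since the settings actually used are not stated in this section; the synthetic two-band spectrum and the function names are likewise hypothetical.

```python
import math
from statistics import mean, stdev

# Savitzky-Golay second-derivative coefficients for a 5-point window and a
# quadratic fit (assumed settings, not taken from the paper).
SG_D2_WINDOW5 = (2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7)

def snv(spectrum):
    """Standard normal variate: center each spectrum and scale to unit std."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]

def sg_second_derivative(spectrum, step=1.0):
    """Smoothed second derivative; the two points at each edge are dropped."""
    half = len(SG_D2_WINDOW5) // 2
    out = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        out.append(sum(c * v for c, v in zip(SG_D2_WINDOW5, window)) / step ** 2)
    return out

# Synthetic spectrum: two overlapping Gaussian bands on a sloping baseline.
axis = [i * 0.5 for i in range(200)]
raw = [0.002 * x
       + math.exp(-((x - 45.0) / 6.0) ** 2)
       + 0.6 * math.exp(-((x - 55.0) / 6.0) ** 2)
       for x in axis]

d2 = sg_second_derivative(snv(raw), step=0.5)
print(len(raw), len(d2))
```

Because the second derivative removes constant and linear baseline terms and narrows band shapes, overlapping bands such as those of hemicellulose and cellulose tend to appear as separate minima, which is why the derivative spectra rather than the raw spectra are a natural input for PCA.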

In this study, we confirmed a remarkable enhancement in impact energy, reaching the highest levels observed. Specifically, the monarch birch composite reached roughly 190% of the impact energy of neat PP, affirming the model's efficacy. This consistency can be attributed to the strong correlation between the data and the formed structures. However, discrepancies were noted when the predicted values of the physical properties were compared with the actual measurements. We hypothesize that this divergence arises from properties not included in the current analysis, which leaves room for improvement.

Author Contributions

The manuscript was written with contributions from all authors. All authors have approved the final version of the manuscript.

Impact statement

This paper introduces a data-driven method for efficiently designing lignocellulosic polymer composites with high impact energy, using a parameter set based on specific surface area and infrared spectroscopic data.

Figure 1. (a) Natural nanoarchitecture in LFs and LCNFs and its relationship with mechanical properties. (b) Data-driven strategy in this study.

Figure 2. Spectrum analysis from (a) original, (b) SNV-transformed, and (c) SG-filtered second derivative spectra. Annotations 1-15 in Figure 2c (bottom) refer to Table S4.

Figure 3. Heatmap of the Pearson correlation matrix for mechanical properties, specific surface area (SSA), crystallinity index (CI), and sugar composition. The color scale runs from -1 (red) to 1 (blue), and the size of each square reflects the magnitude of the coefficient.

Figure 4. PCA plots for the 42 samples analyzed in this study. a) PCA score plot depicting PC1 versus PC2, b) PCA score plot depicting PC1 versus PC3, and c) a PC loading plot based on the ATR-FTIR spectra, which underwent preprocessing involving normalization and application of a Savitzky-Golay filter for second derivative transformation and smoothing. Shading in plots a) and b) demonstrates the grouping based on impact energy and tensile breaking strain, respectively. It should be noted that the colored ellipses in these plots serve solely for illustrative purposes.

Figure 5. Relationship between the predicted and measured values from the models for a) the impact energy (kJ/m2) and b) the breaking strain. c) Relationship between the predicted and measured values for the impact energy (kJ/m2) from the model in which the results of the practical evaluation in RUN1 were incorporated into the dataset. Blue circles: test data; black line: prediction model.

Figure 6. FE-SEM images of the fractured surfaces of a, b) MB Wet-DM-4, c) JC-I-M Dry-P, and d, e) Monarch birch Wet-DM-4 reinforced PP composites. Images b) and e) are magnified views. The yellow arrows in a), c), and e) indicate holes formed by fibers detaching from the sample surface after the impact test. The yellow circle in c) indicates the tip of a fiber in the PP matrix.


Acknowledgments

The authors thank Dr. Takashi Endo (AIST, Japan) for his valuable comments in the initial stage of this study. The prediction model in this study was established using a dataset developed under Project Grant JPNP13006 from the New Energy and Industrial Technology Development Organization (NEDO), which is gratefully acknowledged. This work was partly supported by JSPS KAKENHI Grant-in-Aid for Early-Career Scientists, Grant Number 23K13785.

Disclosure Statement

The authors declare no competing financial interest.

Additional information

Funding

The work was supported by the Japan Society for the Promotion of Science [23K13785].