Multi-information Fusion Method for Traditional Chinese... : World Journal of Traditional Chinese Medicine (2025)

INTRODUCTION

The concept of constitution in traditional Chinese medicine (TCM) originates from Huangdi Neijing (Yellow Emperor’s Inner Canon). By observing characteristics such as body shape, skin color, and personality, individuals are classified into different constitution types, and the relationship between these types and health conditions is explored.[1-3] Professor Wang Qi published Theory of TCM Constitution in 1982, establishing the theoretical foundation of TCM constitution and contributing to its development. In 2009, he participated in formulating the “Classification and Judgment of TCM Constitution” standard,[4] which established nine constitution classifications and promoted the application of TCM constitution identification methods in the national public health service system, furthering the role of TCM constitution in health management and disease prevention. Constitution identification methods primarily consist of questionnaire-based assessment and expert judgment. The questionnaire-based method collects individual data through TCM constitution questionnaires, quantifies constitution characteristics through scoring and calculation, and exhibits good reliability and validity.[5,6] However, the large number of questionnaire items makes completion time-consuming, which reduces efficiency and participation.[7] The expert judgment method relies on the expertise and experience of doctors for diagnosis. However, it is subject to inter-practitioner variability and lacks standardized criteria, which limits the quantification and objectivity of diagnoses.

In recent years, deep learning, a key technology in artificial intelligence, has provided technical support for the automation and efficiency improvement of TCM constitution identification owing to its powerful learning ability and high accuracy. Research has focused on analyzing biological features, such as tongue and facial images. Tongue diagnosis, a core component of the four diagnostic methods in TCM, can reflect the condition of the organs and vital energy, making it crucial for constitution identification. Zhou et al.[8,9] fine-tuned the AlexNet network to classify three constitution types (qi deficiency, phlegm dampness, and damp heat), achieving an accuracy of 63%; further optimization with a feature fusion strategy based on error weights improved the accuracy to 77%. Hu et al.[10] constructed a comprehensive classification model using the Inception-v3 model, effectively distinguishing the nine TCM constitutions. Ma et al.[11] developed a complexity perception classification model focused on improving the accuracy of identifying the nine constitution types. Li et al.[12] explored a hybrid deep learning approach combining Faster Region-based convolutional neural network (CNN), Visual Geometry Group (VGG), and GoogLeNet models for tongue coating detection and constitution recognition; experimental results showed a top-1 error rate of 39.58% in natural environments, validating the performance and practicality of the approach. In facial diagnosis, another of the four TCM diagnostic methods, doctors assess the health of the internal organs by observing facial complexion. Huan and Wen[13,14] used pretrained VGG16 and NASNetMobile models combined with supervised learning and principal component analysis dimensionality reduction to extract and aggregate features from 21,150 facial images; through a multilevel and multiscale approach, they achieved a classification accuracy of 69.61% on the test set. In addition, they trained a DenseNet-169 model using transfer learning and ensembled it with VGG-16, InceptionV3, and DenseNet-121, achieving an accuracy of 66.79% on a facial dataset of 12,730 samples. These efforts have contributed to the precision and scientific rigor of TCM constitution classification.

TCM constitution assessment evaluates individual health by analyzing external manifestations, such as tongue images, facial appearance, and pulse condition. However, a single data type cannot fully reflect one’s health status. The integration of multiple data sources, known as multi-information fusion learning, improves prediction accuracy by extracting features from each source and leveraging their complementarity.[15] The concept of multi-information fusion has been widely applied in various fields. For example, in traffic management, multi-information fusion predicts traffic conditions by analyzing historical data and weather information.[16] In social media analysis, it reveals online trends and user behavior through multidimensional data analysis.[17] In the medical field, it effectively classifies diseases by combining clinical and imaging data, thereby supporting diagnosis.[18,19] By integrating data from different sources, multi-information fusion enhances input diversity, analysis accuracy, and prediction reliability; its promising results in domains such as traffic and social media analysis indicate its potential in medical diagnosis and its crucial role in future research and applications. This study initially conducted experiments using single data types to determine the baseline model and subsequently built a multi-information fusion classification model on this optimal baseline.

MATERIALS AND METHODS

Dataset

The data for this study were obtained from a project of the National Key R&D Program of China titled “Research on the Assessment System and Service Model for the Health of Elderly Individuals Based on TCM Constitution Identification and Multimodal Techniques” (2020YFC2003100). The project primarily conducted a nationwide, large-sample, multicenter epidemiological survey to validate and optimize the classification criteria and indicator system for the health status and TCM constitution of elderly individuals. It aimed to elucidate the TCM constitution characteristics of the elderly population and their correlation with diseases and health conditions. Data collection was performed using the DS01-A Four Diagnostic Instrument manufactured by Shanghai DAOSH Medical Technology Co., Ltd. The study included participants aged 65 years and above, from whom tongue images, facial appearances, and pulse conditions were captured. Moreover, each participant completed the TCM Constitution Questionnaire (Elderly Version), and the constitution category determined according to the criteria in the “National Basic Public Health Service Specification” (3rd edition) served as the sample’s class label.[20]

A total of 2305 cases of elderly constitution data, comprising 6763 images from the four diagnostic methods, were collected for this study. Given that this work considered only single constitutions, samples with mixed constitutions were excluded. In addition, manual screening removed low-quality, invalid, and incomplete samples that could not supply all three data types. Furthermore, the qi deficiency, damp heat, qi stagnation, blood stasis, and inherited special constitution categories were discarded because their small sample sizes (<50 cases each) made them of limited use for deep learning classification. Therefore, only samples from the peace, phlegm dampness, yang deficiency, and yin deficiency constitution categories were retained, yielding a dataset of 699 valid elderly samples with a total of 2097 images. Each sample included the constitution identification result and tongue, facial, and pulse images, forming the dataset used in this study. The dataset was randomly divided into training, validation, and testing sets in a 6:2:2 ratio. The distribution of sample quantities for each category is shown in Table 1. Figure 1 illustrates example images for each constitution category, wherein the pulse data presented were all derived from the pulse waveform of the left hand recorded continuously for 30 s.
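As a concrete illustration of the 6:2:2 split, the following minimal PyTorch sketch divides a placeholder dataset of 699 samples with a fixed seed; the dataset tensors, their shapes, and the seed are illustrative assumptions rather than the project’s actual code.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder stand-in for the 699 valid samples; real samples pair tongue,
# facial, and pulse images with a constitution label in {0, 1, 2, 3}.
images = torch.randn(699, 3, 64, 64)
labels = torch.randint(0, 4, (699,))
full_set = TensorDataset(images, labels)

n = len(full_set)
n_train, n_val = int(0.6 * n), int(0.2 * n)
n_test = n - n_train - n_val            # remainder keeps the three parts summing to n

train_set, val_set, test_set = random_split(
    full_set, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),  # fixed seed for reproducibility
)
print(len(train_set), len(val_set), len(test_set))  # 419 139 141
```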

Considering the presence of interfering background pixels in the tongue image data, this study employed DeepLabv3+[21] for uniform segmentation of the tongue image dataset. Figure 2 depicts the architecture of the DeepLabv3+ model. In this model, the input raw image is first processed by an encoder, which extracts and compresses its key features. The decoder then reconstructs a detailed segmentation map from these feature representations. The model further performs color restoration to ensure that the color of the segmented regions matches that of the original image, recovering the color information lost during segmentation. By cropping the feature regions, the model outputs accurately segmented tongue images suitable for subsequent experiments. Notably, at this stage, DeepLabv3+ did not perform a classification task; the number of segmentation classes was therefore set to one. This setting simplified the model’s output, focusing it on providing high-quality segmentation results and clear tongue images for subsequent analysis.
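To make the segmentation step concrete, here is a hedged sketch of a single-class tongue-segmentation pipeline. torchvision ships DeepLabv3 (not v3+), which serves as a stand-in below; the untrained weights, the 0.5 threshold, and the bounding-box crop are illustrative assumptions, not the paper’s trained model.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Single output class: the model only has to separate tongue from background.
model = deeplabv3_resnet50(weights=None, num_classes=1).eval()

def segment_tongue(img: torch.Tensor) -> torch.Tensor:
    """img: (1, 3, H, W) normalized RGB; returns the cropped tongue region."""
    with torch.no_grad():
        logits = model(img)["out"]             # (1, 1, H, W) per-pixel logits
    mask = torch.sigmoid(logits)[0, 0] > 0.5   # binary tongue mask
    ys, xs = torch.where(mask)
    if ys.numel() == 0:                        # no tongue detected: return input as-is
        return img
    masked = img * mask                        # keep original colors inside the mask
    return masked[:, :, ys.min():ys.max() + 1, xs.min():xs.max() + 1]

crop = segment_tongue(torch.randn(1, 3, 224, 224))
```

In practice, the model would first be trained on tongue images with annotated masks before being applied to the dataset.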

Single-type data classification model

This study selected four representative, state-of-the-art (SOTA) deep learning image classification algorithms for investigation: EfficientNetV2,[22] MobileViT,[23] Vision Transformer (ViT),[24] and Swin Transformer.[25] The EfficientNetV2 network markedly improves the utilization of computational resources while maintaining high accuracy through its efficient compound scaling strategy. The MobileViT network integrates the feature extraction capability of ViT with the lightweight design of MobileNet,[26] providing an optimized image classification solution for resource-constrained environments, such as mobile and edge computing devices. ViT extends the Transformer architecture to image classification and enhances the model’s ability to capture global dependencies in images through self-attention mechanisms. Swin Transformer effectively handles large-scale images and multiscale features through its innovative hierarchical structure, demonstrating its potential in complex visual tasks. These models share the ability to learn hierarchical representations of image features and to enhance global content understanding through attention mechanisms. Their strength lies in balancing model performance and computational efficiency, adapting to datasets of different scales and diverse task requirements while handling multiscale information. These characteristics make them widely applicable and high-performing in image classification tasks, providing a solid foundation for further research and application of deep learning in image recognition.

EfficientNetV2

In contrast to EfficientNet,[27] EfficientNetV2 introduces training-aware neural architecture search, along with new operations, such as Fused-MBConv, to optimize the model structure and achieve SOTA performance on multiple benchmark datasets. In addition, EfficientNetV2 employs an improved progressive learning strategy that gradually increases the image size and adjusts the regularization strength during training, accelerating training without sacrificing accuracy. The nonuniform scaling strategy and improved regularization methods in EfficientNetV2 increase training speed by up to 5–11 times and reduce model parameters by up to 6.8 times, remarkably enhancing the model’s practicality and efficiency. These innovations enable EfficientNetV2 to maintain high accuracy while remaining highly suitable for applications with limited computational resources. The network architecture of EfficientNetV2-S is shown in Table 2.
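The sketch below shows, under stated assumptions, how the progressive learning idea can be approximated in PyTorch: an EfficientNetV2-S with four output classes is fed progressively larger images while a regularization knob (here, the classifier dropout rate) is raised in step. The stage sizes and dropout values are illustrative, not the paper’s settings.

```python
import torch
import torch.nn.functional as F
from torchvision.models import efficientnet_v2_s

model = efficientnet_v2_s(weights=None, num_classes=4)

# (image size, dropout) pairs standing in for a progressive-learning schedule.
stages = [(128, 0.1), (192, 0.2), (300, 0.3)]
x_full = torch.randn(8, 3, 300, 300)        # dummy batch at the largest size
for size, p in stages:
    model.classifier[0].p = p               # classifier[0] is the Dropout layer
    x = F.interpolate(x_full, size=(size, size), mode="bilinear", align_corners=False)
    logits = model(x)                       # adaptive pooling handles any input size
    print(size, logits.shape)               # torch.Size([8, 4]) at every stage
```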

Vision transformer

ViT is a successful application of the Transformer[28] in computer vision. Its network architecture is illustrated in Figure 3. The input image is divided into patches through an embedding layer, transformed into tokens, and mapped to a fixed dimension through a linear layer. Because self-attention is order-agnostic, position embeddings are added to provide each patch with spatial positional information. The Transformer Encoder, the core component of ViT, processes the serialized image input; its self-attention mechanisms and feed-forward networks capture global dependencies and enhance features, while layer normalization and residual connections ensure stable and efficient training, providing powerful feature representations for downstream tasks such as classification. The multilayer perceptron (MLP) Head is the output component of ViT, transforming the features from the encoder into classification results through linear and nonlinear layers. In ViT, the class token represents global image information and is mapped through the MLP Head into a classification embedding vector, which activation functions transform into a probability distribution over the categories.
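A compact sketch of the ViT front end described above follows: patch embedding via a strided convolution, a prepended class token, and learned position embeddings. The ViT-Base dimensions (16 × 16 patches, 768-dimensional tokens) are conventional choices assumed here for illustration.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=768):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # A conv with kernel = stride = patch size splits the image into patches
        # and linearly projects each one in a single operation.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))

    def forward(self, x):                              # x: (B, 3, 224, 224)
        x = self.proj(x).flatten(2).transpose(1, 2)    # (B, 196, 768) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)                 # prepend the class token
        return x + self.pos_embed                      # add spatial position information

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))     # (2, 197, 768), ready for the encoder
```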

Swin transformer

The network architecture of Swin Transformer, shown in Figure 4, consists of several components. Patch partition and linear embedding are similar to the embedding layer in ViT. Window Multihead Self-Attention (W-MSA), a core component of the Swin Transformer block module, performs multihead self-attention locally. In contrast to traditional self-attention in Transformers, W-MSA restricts the attention mechanism to non-overlapping local windows, reducing computational complexity and maintaining locality. Although features within each window interact through self-attention, no direct information exchange occurs between windows. Shifted W-MSA (SW-MSA), another crucial component of Swin Transformer, addresses this limitation: by cyclically shifting the windows, it enables information exchange between different windows, allowing each window to access information from its neighbors. This cyclic shifting operation helps capture a broad context while maintaining computational efficiency.
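The following sketch illustrates the two operations just described: partitioning a feature map into non-overlapping windows for local attention (W-MSA) and the cyclic shift, via torch.roll, that lets the next block exchange information across window borders (SW-MSA). The Swin-T stage-1 dimensions (56 × 56 map, 96 channels, 7 × 7 windows) are assumed for illustration.

```python
import torch

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """x: (B, H, W, C) -> (num_windows*B, ws*ws, C) token groups for local attention."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

x = torch.randn(1, 56, 56, 96)          # stage-1 feature map, Swin-T dimensions
ws = 7
windows = window_partition(x, ws)       # (64, 49, 96): attention runs inside each window

# Cyclic shift by half a window before the next block, so pixels near a window
# border land in the same window as their former neighbors across the boundary.
shifted = torch.roll(x, shifts=(-(ws // 2), -(ws // 2)), dims=(1, 2))
shifted_windows = window_partition(shifted, ws)   # same shape, shifted grouping
```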

MobileViT

The network architecture of MobileViT, shown in Figure 5, integrates traditional convolutional layers, MobileNetv2 blocks (MV2), MobileViT block core modules, a global pooling layer, and fully connected layers. The MV2 module achieves efficient feature extraction while keeping the model lightweight, making it suitable for deployment on mobile and edge devices. The MobileViT block combines the efficiency of CNNs with the global perception of Transformers to enable efficient image processing on resource-constrained mobile devices. The “↓2” notation in Figure 5 denotes downsampling, which halves the spatial dimensions of the input, extracting more abstract feature representations while reducing the computational burden. The symbol “×?” indicates the number of repetitions in layer stacking, whereas “h = w = ?” represents the patch size. Through this hybrid design, MobileViT effectively combines the spatial inductive bias of CNNs with the global perception of Transformers, offering a new perspective for efficient image classification models.
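As a hedged sketch, the snippet below instantiates a MobileViT classifier for the four constitution classes through the timm library; the “mobilevit_s” variant and the use of timm are assumptions for illustration, as the paper does not state which implementation it used.

```python
import timm
import torch

model = timm.create_model("mobilevit_s", pretrained=False, num_classes=4)
x = torch.randn(2, 3, 256, 256)    # MobileViT was designed around 256x256 inputs
logits = model(x)                  # (2, 4) class scores
probs = logits.softmax(dim=-1)     # probabilities over the four constitutions
```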

Multi-information fusion classification model

Concept of multi-information fusion

The challenge in multi-information fusion lies in effectively integrating diverse data sources to improve analysis accuracy. Three main strategies, distinguished by where fusion occurs, exist: data-level, feature-level, and decision-level fusion.[29] Data-level fusion directly merges the different datasets before model training. It is easy to implement and allows existing models to be reused. The structure of the multi-information fusion classification model using data-level fusion is shown in Figure 6. However, owing to data heterogeneity, this approach may introduce information redundancy and struggle to exploit the complementary advantages of the data, which can reduce prediction accuracy and limit applicability.

The feature-level fusion strategy takes a fine-grained approach. It first independently extracts features from each type of data and then merges these features at an intermediate stage of the model. Finally, classification decisions are made on the basis of the fused features. The structure of the multi-information fusion classification model using feature-level fusion is illustrated in Figure 7. Compared to data-level fusion, feature-level fusion requires considerable adjustments to the model architecture, making it more complex to implement. However, its advantage lies in effectively preserving the unique information of each type of data, reducing redundancy, and fully leveraging the complementarity between different types of data. Feature-level fusion provides flexibility to choose fusion points within the model, making it the mainstream choice in current research.

The decision-level fusion strategy operates at the final stage of the model; its classification model structure is depicted in Figure 8. Similar to an ensemble of single-type data classification models, decision-level fusion takes place after each model has produced its independent decision or classification result. It leverages the decision logic and judgment power of the models for the different data types to improve overall decision accuracy and reliability. However, it increases computational complexity and may not fully capture interactions between the data types, which can limit the final classification performance.
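For contrast with the feature-level design adopted later, here is a minimal sketch of decision-level fusion: three independently trained single-modality classifiers vote by averaging their softmax outputs. The equal weighting and the dummy linear models are assumptions for illustration; accuracy-weighted or learned voting schemes are common alternatives.

```python
import torch
import torch.nn as nn

def decision_level_fusion(models: list[nn.Module], inputs: list[torch.Tensor]) -> torch.Tensor:
    """One model per modality (tongue, facial, pulse); returns fused class probabilities."""
    probs = [m(x).softmax(dim=-1) for m, x in zip(models, inputs)]
    return torch.stack(probs).mean(dim=0)   # average the per-modality decisions

# Dummy stand-ins for three trained single-modality classifiers.
models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4)) for _ in range(3)]
inputs = [torch.randn(2, 3, 8, 8) for _ in range(3)]
fused = decision_level_fusion(models, inputs)   # (2, 4); each row sums to 1
```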

We found that in the task of multi-information fusion for constitution identification using tongue, facial, and pulse image data, the feature-level fusion strategy showed remarkable advantages over data-level and decision-level fusion. Therefore, we adopted the feature-level fusion strategy to construct the multi-information fusion classification network.

Model construction for multi-information fusion

Given the excellent performance of MobileViT in the single-data-type classification experiments (see Section 3.2), we chose MobileViT as the baseline model for constructing the multi-information fusion network. The initial stage of the network design accounts for the uniqueness of each type of image data. First, the tongue, facial, and pulse images are input into the model independently, and each input image is processed by its own 3 × 3 convolutional layer to capture the distinctive visual features of that data type. This step is crucial because it allows the model to focus on learning key features specific to each data type in the early stages, laying a solid foundation for subsequent feature fusion. The feature tensors produced by these convolutional layers are then concatenated into a unified feature representation, achieving feature integration. This joint representation is fed into the subsequent layers of the network, where the features are further fused and refined for the final classification task.
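A minimal sketch of this feature-level design follows: one 3 × 3 convolutional stem per modality, channel-wise concatenation, and a shared trunk. The trunk here is a small CNN stand-in for the MobileViT backbone the paper actually uses, and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, n_classes=4, stem_ch=16):
        super().__init__()
        # Independent 3x3 stems learn modality-specific features early on.
        self.stems = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(3, stem_ch, 3, stride=2, padding=1), nn.SiLU())
             for _ in range(3)]                       # tongue, facial, pulse
        )
        self.trunk = nn.Sequential(                   # stand-in for the MobileViT body
            nn.Conv2d(3 * stem_ch, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, tongue, face, pulse):
        feats = [stem(x) for stem, x in zip(self.stems, (tongue, face, pulse))]
        fused = torch.cat(feats, dim=1)               # concatenate along channels
        return self.trunk(fused)

net = FusionNet()
logits = net(*[torch.randn(2, 3, 224, 224) for _ in range(3)])   # (2, 4)
```

Dropping one stem (and shrinking the trunk’s input channels accordingly) yields the two-input variants compared in the experiments.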

Through this design, the multi-information fusion network not only preserves the unique information of each data type but also effectively combines these pieces of information in subsequent layers through feature fusion strategies, thereby improving the accuracy and robustness of the model for complex tasks in TCM constitution identification. Its advantage lies in its ability to fully leverage the complementarity of multiple data sources while maintaining the flexibility and efficiency of the model. The network structures for 2- and 3-input classification can be seen in Figures 9 and 10, respectively.

RESULTS

Experimental environment

The experiments reported in this paper were conducted using the PyTorch[30] deep learning framework, with the specific configuration shown in Table 3.

Baseline model confirmation experiment

Tongue, facial, and pulse images were individually input into four models, namely, EfficientNetV2, MobileViT, ViT, and Swin Transformer, for training to evaluate and compare the performance of the different deep learning models on TCM image classification tasks. During training, key hyperparameters, such as the initial learning rate, optimizer, batch size, and number of epochs, were adjusted to optimize the models’ classification accuracy. The optimal parameter configurations, shown in Table 4, were determined through this hyperparameter tuning.
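For orientation, a skeleton of such a training run is sketched below. The specific choices (AdamW, learning rate 1e-4, batch size 32, 100 epochs, cosine schedule) and the placeholder network and data are illustrative assumptions; the tuned values actually used are those reported in Table 4.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))   # placeholder network
loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 4, (64,))),
    batch_size=32, shuffle=True,
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    for x, y in loader:                 # one pass over the training set
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                    # decay the learning rate once per epoch
```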

The experimental results are presented in Table 5. On the tongue, facial, and pulse image datasets alike, the MobileViT model consistently demonstrated higher accuracy than the other three models. This result can be attributed to the unique architecture of MobileViT, which combines a lightweight convolutional network with the Transformer’s self-attention mechanism, effectively capturing global dependencies in the images while maintaining model efficiency. Additionally, given the rich and complex details present in the four TCM diagnostic data types, MobileViT may better balance the extraction of local features with the integration of global information on these specific data than the other models. Therefore, MobileViT was chosen as the baseline model for further research.

Multi-information fusion model experiment

For the two-input classification experiment, the tongue, facial, and pulse image data were combined pairwise to create three input sets for training the two-input classification models, allowing us to investigate how different combinations of data types affect classification for TCM constitution identification. For the three-input classification experiment, the tongue, facial, and pulse image data were input into the network simultaneously. In the experiments described in this section, the initial learning rate, optimizer, batch size, and number of iterations were adjusted to optimize classification accuracy. The optimal hyperparameter settings for each experimental group, shown in Table 6, were determined on the basis of the experiments described above. Subsequently, a comprehensive comparison across groups identified the best-performing data combination for the classification task, and the results were also compared with those of the single-data-type experiments in Section 3.2 to contrast single-type and multitype models on TCM constitution identification.

Figure 11 displays the curves of accuracy and loss over iterations for different data combinations in their respective multi-information fusion models on the training set. Table 7 provides a detailed breakdown of the best accuracy values achieved by different data combinations in their respective multi-information fusion models. The results show that the combinations of tongue and pulse, tongue and facial, and facial and pulse data in the classification models achieved the highest accuracies of 67.49%, 68.03%, and 68.76%, respectively. Furthermore, the three-input classification model based on tongue, facial, and pulse data achieved the best classification performance with an optimal accuracy of 71.32%.

The results demonstrated that in the two-input classification models, the combinations of tongue and pulse, tongue and facial, and facial and pulse data all outperformed the corresponding single-type data in the original models. Furthermore, when the three data types, namely, tongue, facial, and pulse data, were combined, the performance of the classification model improved further and surpassed that of all the two-input models. This result can be attributed to the complementary information that multiple data sources carry. The fusion model can exploit redundant information across data types during processing, enhancing its robustness against noise and missing data: even if one data type is defective or absent, the other types can still support effective classification. Additionally, multi-information fusion expands the dimensionality of the feature space, enabling the model to learn more complex feature representations that help it capture and distinguish differences between categories. In multi-information fusion learning, the different data types may also support one another during feature learning, helping the model converge to a good solution more rapidly; this complementarity can accelerate learning and potentially improve generalization. In actual training, one data type may be incomplete or noisy for various reasons (such as poor image quality or limitations in data collection conditions), and by combining different data types, the multi-information fusion classification network maintains high classification performance even under such adverse conditions.

DISCUSSION

To verify whether the network model successfully learned discriminative image features, this study employed gradient-weighted class activation mapping (Grad-CAM)[31] to analyze the model’s regions of interest in tongue, facial, and pulse images. The experimental results are displayed in Figure 12, which shows the original input images and the corresponding heatmaps generated by the model. The heatmap for the facial data focuses primarily on the cardiopulmonary region. The heart governs the blood vessels, propelling blood circulation within the vasculature, while the lungs govern qi, in charge of respiration and the dissemination and descent of qi and blood. In TCM theory, the sufficiency and harmony of qi and blood are the foundation of a healthy constitution. Therefore, the normal function of the heart and lungs directly affects an individual’s qi and blood status, thereby influencing the constitution. For instance, individuals with a phlegm dampness constitution may exhibit symptoms such as body heaviness and fatigue because insufficient lung dissemination impairs the metabolism of body fluids.[32]
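For readers who wish to reproduce such heatmaps, here is a minimal Grad-CAM sketch implemented with forward and backward hooks. The tiny network and target layer are placeholders; the paper’s heatmaps come from its trained fusion model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4),
).eval()
target_layer = model[0]                    # use the last conv layer in a real network

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)
score = model(x)[0].max()                  # score of the predicted class
score.backward()                           # gradients flow back to the target layer

weights = grads["v"].mean(dim=(2, 3), keepdim=True)       # channel importance weights
cam = F.relu((weights * acts["v"]).sum(dim=1))            # weighted activation map
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap normalized to [0, 1]
```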

The heatmap of the tongue image data primarily focuses on the central part of the tongue, corresponding to the spleen and stomach regions. The spleen and stomach are regarded as the “fundamental basis of acquired constitution” and are considered the main source of the body’s production of qi and blood. The health status of the spleen and stomach directly affects an individual’s constitution and health condition. For instance, individuals with spleen and stomach deficiency may be more prone to exhibit characteristics of qi deficiency or phlegm dampness constitutions, such as easy fatigability, body heaviness, and indigestion, while those with robust spleen and stomach functions may possess a more balanced constitution, manifested by abundant energy, good appetite, and robust health.

The heatmap of the pulse image data covers almost the entire pulse wave area, with a slight bias toward the wave peaks. This may be because different phases of the pulse wave reflect different aspects of blood flow and cardiac function. The wave peaks typically represent the force of blood ejection during cardiac contraction, while the wave troughs are associated with blood reflux during cardiac relaxation. The bias of the heatmap toward the wave peaks may therefore imply that the classification model pays more attention to cardiac contractile function and the force of blood ejection. This bias might also reflect the importance of specific frequency or rhythmic characteristics within the pulse data, which warrants further investigation.[33]

In summary, the model we constructed demonstrates varying focal areas in the analysis of tongue, facial, and pulse wave imagery, thereby achieving a level of complementary learning for different types of data. Through multidimensional learning and analytical processes, the model is adept at integrating a range of physiological characteristics, which in turn enhances the precision and comprehensiveness of TCM constitution identification.

This work explores the application of multi-information fusion to TCM constitution identification in the elderly population, with the aim of enhancing the accuracy and efficiency of TCM diagnosis through modern technological means. The study first reviews and summarizes current practice in TCM constitution identification, points out the limitations of existing methods, and proposes an innovative multi-information approach to constitution identification. On the basis of research on the external signs of TCM constitution, it collects three key types of TCM diagnostic information, namely, tongue, facial, and pulse images, and introduces advanced deep learning algorithms for experimental analysis. Furthermore, a multi-information fusion classification model is constructed; by integrating multiple data types and effectively combining the different diagnostic information, this model markedly improves the accuracy of TCM constitution identification.

CONCLUSIONS

In conclusion, this study provides a scientific and systematic method for TCM constitution identification in the elderly population, offering new ideas for the modernization and intelligence of TCM diagnosis research. Through empirical research, this study validates the potential application of multi-information in TCM constitution identification, providing empirical and data support for future research in related fields. It also holds reference value for multi-information fusion analysis in other medical fields.

However, this study is limited in the data types it considers, focusing only on tongue, facial, and pulse data. While these data are crucial to TCM constitution identification, they do not encompass other types of diagnostic information, such as visual inspection, infrared imaging, sound, smell, and microscopy data. This limitation may constrain the model’s ability to capture a comprehensive picture of the patient’s overall health status. Future research should consider expanding the range of data types to construct a more comprehensive and precise model for TCM constitution identification.

Ethics approval and consent to participate

The human experiment has been approved by the Chinese Ethics Committee of Registering Clinical Trials (approval no. ChiECRCT20210494).

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

REFERENCES

1. Gao LX, Jia CH, Wang W. Recent advances in the study of ancient books on traditional Chinese medicine. World J Tradit Chin Med 2020;6:61–6.

2. Qian HN. The earliest panorama of Chinese medicine constitution categorization: unscrambling Inner Canon of Huangdi about the theory of yin and yang 25 peoples. China J Tradit Chin Med Pharm 2008;10:853–5.

3. Wang J. Study on the Health-Preserving Thoughts of Constitution in Huangdi Neijing and its Influence on Later Generations. Dissertation, Nanjing University of Chinese Medicine; 2017.

4. China Association of Chinese Medicine. Classification and determination of traditional Chinese medicine constitution (ZYYXH/T157-2009). World J Integr Tradit West Med 2009;4:303–4.

5. Zhu YB, Wang Q, Xue HS, Hideki O. Preliminary assessment on performance of constitution in Chinese medicine questionnaire. Chin J Clin Rehabil 2006;3:15–7.

6. Zhu YB, Wang Q, Hideki O. Reliability and Validity Evaluation of Traditional Chinese Medicine Constitution Questionnaire. Proceedings of the 5th National Academic Conference on Traditional Chinese Medicine Constitution. Beijing University of Chinese Medicine; Faculty of Medicine, Toyama University, Japan; 2007:7.

7. Hou M, Zhang YR, Yang WL, Wen CB, Luo Y. A Review on Identification of Traditional Chinese Medicine Constitution. Proceedings of the 5th China Conference on Traditional Chinese Medicine Information – Big Data Standardization and Intelligent Traditional Chinese Medicine. Chengdu University of Traditional Chinese Medicine; 2018:4.

8. Zhou H, Hu GQ, Zhang XF. Constitution Identification of Tongue Image Based on CNN. 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI); 2018:1–5.

9. Zhou H, Hu GQ, Zhang XF. Preliminary study on traditional Chinese medicine constitution classification method based on tongue image feature fusion. Beijing Biomed Eng 2020;39:221–6.

10. Hu JL, Ding YT, Kan HX. Tongue body constitution classification based on machine learning. J Jiamusi Univ (Nat Sci Ed) 2018;36:709–13.

11. Ma J, Wen G, Wang C, Jiang L. Complexity perception classification method for tongue constitution recognition. Artif Intell Med 2019;96:123–33.

12. Li HH, Wen GH, Zeng HB. Natural tongue physique identification using hybrid deep learning methods. Multimedia Tools Appl 2019;78:6847–68.

13. Huan EY, Wen GH. Multilevel and multiscale feature aggregation in deep networks for facial constitution classification. Comput Math Methods Med 2019;2019:1258782.

14. Huan EY, Wen GH. Transfer learning with deep convolutional neural network for constitution classification with face image. Multimedia Tools Appl 2020;79:11905–19.

15. Li XR. Prospects of multi-modal deep learning and its application in ophthalmic artificial intelligence. Med J Peking Union Med Coll Hosp 2021;12:602–7.

16. Du SD, Li TR, Gong X, Horng SJ. A hybrid method for traffic flow forecasting using multimodal deep learning. Int J Comput Intell Syst 2020;13:85–97.

17. Huang FR, Zhang XM, Li ZJ, Mei T, He YY, Zhao ZH. Learning social image embedding with deep multimodal attention networks. Proc Thematic Workshops ACM Multimedia 2017;2017:460–8.

18. Liu R, Pan D, Xu Y, Zeng H, He Z, Lin J, et al. A deep learning-machine learning fusion approach for the classification of benign, malignant, and intermediate bone tumors. Eur Radiol 2022;32:1371–83.

19. Atrey K, Singh BK, Bodhey NK, Pachori RB. Mammography and ultrasound based dual modality classification of breast cancer using a hybrid deep learning approach. Biomed Signal Process Control 2023;86:104919.

20. National Health and Family Planning Commission of China. Notice on the Issuance of the “National Basic Public Health Service Specifications of China (Third Edition)” [EB/OL]; 2017. Available from: https://www.nhc.gov.cn/jws/s3578/201703/d20c37e23e1f4c7db7b8e25f34473e1b.shtml. [Last accessed on 2025 Jan 13].

21. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the European Conference on Computer Vision (ECCV); 2018:801–18.

22. Tan M, Le Q. EfficientNetV2: Smaller models and faster training. International Conference on Machine Learning. PMLR; 2021:10096–106.

23. Zou W, Xie K, Lin J. Light-weight deep learning method for active jamming recognition based on improved MobileViT. IET Radar Sonar Navig 2023;17:1299–311.

24. Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, et al. A survey on vision transformer. IEEE Trans Pattern Anal Mach Intell 2022;45:87–110.

25. Liu Z, Lin YT, Cao Y, Hu H, Wei YX, Zhang Z, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2021:10012–22.

26. Howard AG, Zhu ML, Chen B, Kalenichenko D, Wang WJ, Weyand T, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017:1501–9.

27. Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning. PMLR; 2019:6105–14.

28. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017:6000–10.

29. Yang FW, Li ZQ, Zhang ZW, Tang Y, Zhao XS, Qu M, et al. A review and prospects of body constitution classification and identification techniques in traditional Chinese medicine. Tianjin J Tradit Chin Med 2024;41:398–402.

30. Yuan J. Performance analysis of deep learning algorithms implemented using PyTorch in image recognition. Procedia Comput Sci 2024;247:61–9.

31. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int J Comput Vis 2020;128:336–59.

32. Liu W, Ge ZH, Li B. Study on relations between characteristics of traditional Chinese medicine body constitutions and syndromes in COPD patients. Zhongguo Zhong Yao Za Zhi 2013;38:3587–90.

33. Jagwani M, Sharma G, Sharma H, Kafle T, Tewani G, Nair P. Investigation of the test-retest reliability and inter-rater agreement of traditional Chinese medicine-based pulse diagnosis among Indian traditional Chinese medicine practitioners. World J Tradit Chin Med 2023;9:415–8.

Keywords:

Deep learning; facial image; multi-information fusion; pulse image; tongue image; traditional Chinese medicine constitution

© 2025 World Journal of Traditional Chinese Medicine