Multimodal Food Learning

Weiqing Mina,b, Xingjian Honga,b, Yuxin Liua,b, Mingyu Huanga,b, Ying Jina,b, Pengfei Zhoua,b, Leyi Xua,b, Yilin Wanga,b, Shuqiang Jianga,b,*

aThe Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences

bUniversity of Chinese Academy of Sciences

@article{MMFLSurvey,
  title={Multimodal Food Learning},
  author={Min, Weiqing and Hong, Xingjian and Liu, Yuxin and Huang, Mingyu and Jin, Ying and Zhou, Pengfei and Xu, Leyi and Wang, Yilin and Jiang, Shuqiang and Rui, Yong},
  year={2025},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3715143}
}

Abstract

Food-centered study has received increasing attention in the multimedia community for its profound impact on our survival, nutrition and health, and pleasure and enjoyment. Our experience of food is typically multi-sensory: we see food, smell its odors, taste its flavors, feel its texture, and hear sounds when chewing. Multimodal food learning, which aims to relate information from multiple food modalities, is therefore vital in food-centered study: it supports various multimedia tasks such as recognition, retrieval, generation, recommendation, and interaction, enabling applications in fields like healthcare and agriculture. However, to our knowledge there is no survey on this topic. To fill this gap, this paper formalizes multimodal food learning and comprehensively surveys its typical tasks, technical achievements, existing datasets, and applications to provide a blueprint for researchers and practitioners. Based on the current state of the art, we identify both open research issues and promising research directions, such as multimodal food learning benchmark construction, multimodal food foundation model construction, and multimodal diet estimation. We also point out that closer cooperation between researchers in multimedia and food science can address existing challenges and open up new opportunities to accelerate the development of multimodal food learning. This is the first comprehensive survey on this topic, and we anticipate that the roughly 170 reviewed research articles will benefit academia and industry in this community and beyond.

Introduction

Figure: Multisensory integration in food perception, showing how vision, smell, taste, touch, and hearing shape our food experiences.

Food-centered study has received increasing attention in the multimedia community due to its profound impact on survival, nutrition and health, pleasure, and enjoyment. Our experience of food is typically multi-sensory, involving vision, smell, taste, touch, and sound.

Multimodal Food Learning (MMFL) aims to relate information from multiple food modalities to support tasks such as recognition, retrieval, generation, recommendation, and interaction, with applications in healthcare and agriculture. However, no survey has been conducted on this topic to date.

To fill this gap, this paper formalizes MMFL, comprehensively surveys its tasks, technical achievements, datasets, and applications, and identifies key research challenges and promising directions, such as benchmark construction, multimodal food foundation models, and multimodal diet estimation.

Overview of Multimodal Food Learning

Food is a highly salient category due to its importance in human survival. Food-related perception and cognition are complex, requiring multisensory integration of vision, taste, smell, touch, and hearing for tasks like food recognition and analysis.

Multimodal Food Learning (MMFL) aims to build food-oriented intelligent agents capable of understanding, reasoning, and learning from multiple food modalities. These modalities range from sensory data (images, smells, textures) to abstract representations such as language and molecular structures. Food-related multimodal data can be acquired from various sources, including websites, social media, cameras, hyperspectral devices, electronic noses, and electronic tongues.
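In practice, any given sample rarely carries all of these modalities at once. A minimal sketch of how one food record might bundle whatever modalities are available (the class and field names are purely illustrative, not from the survey):

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical container showing how one food sample can carry heterogeneous
# modalities; field names are illustrative, not defined by the survey.
@dataclass
class FoodSample:
    dish_name: str
    image: Optional[bytes] = None                 # camera / web photo
    recipe_text: Optional[str] = None             # ingredients and instructions
    hyperspectral: Optional[List[float]] = None   # per-band reflectance
    e_nose: Optional[List[float]] = None          # gas-sensor responses (smell)
    e_tongue: Optional[List[float]] = None        # electrochemical responses (taste)

    def modalities(self) -> List[str]:
        # List which modalities are actually present for this sample.
        return [name for name, value in self.__dict__.items()
                if name != "dish_name" and value is not None]

sample = FoodSample(dish_name="mapo tofu",
                    recipe_text="Stir-fry tofu with chili bean paste...",
                    e_nose=[0.12, 0.40, 0.33])
print(sample.modalities())  # ['recipe_text', 'e_nose']
```

Handling such partially observed, heterogeneous records is one reason MMFL methods must go beyond single-modality pipelines.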

Food data spans multiple scales, from macro-level information such as dishes and ingredients to micro-level molecular composition, including proteins, fats, and carbohydrates. MMFL employs representation, translation, fusion, alignment, and co-learning methods, leveraging traditional machine learning, deep learning, and foundation models. These approaches support various multimedia food computing tasks, such as recognition, retrieval, generation, recommendation, and interaction, enabling applications in catering, healthcare, and culture.
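As a concrete illustration of two of these techniques, the sketch below assumes pre-extracted image and recipe-text embeddings (random stand-ins here) and shows alignment, scoring image-recipe pairs by cosine similarity in a shared space as in cross-modal recipe retrieval, and simple late fusion by concatenation for recognition. All names, dimensions, and the untrained classifier are illustrative assumptions, not a specific surveyed method:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical pre-extracted embeddings for 4 paired samples (dim 512):
image_emb = rng.normal(size=(4, 512))  # e.g., visual-encoder features of dish photos
text_emb = rng.normal(size=(4, 512))   # e.g., recipe-text encoder features

# Alignment (cross-modal retrieval): score every image against every recipe
# by cosine similarity in the shared embedding space.
sim = l2_normalize(image_emb) @ l2_normalize(text_emb).T  # shape (4, 4)
retrieved = sim.argmax(axis=1)  # best-matching recipe index per image

# Late fusion (multimodal recognition): concatenate modality embeddings,
# then apply a (here untrained) linear classifier over food categories.
num_classes = 10
W = rng.normal(size=(1024, num_classes)) * 0.01
fused = np.concatenate([image_emb, text_emb], axis=1)  # shape (4, 1024)
pred = (fused @ W).argmax(axis=1)  # predicted category per sample
```

Real systems would learn the encoders and classifier jointly (e.g., with a contrastive loss for alignment), but the shapes and scoring logic follow this pattern.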

Figure: Overview of multimodal food learning.

Paper List

  1. Eduardo Aguilar, Beatriz Remeseiro, Marc Bolaños, and Petia Radeva. 2018. Grab, pay, and eat: Semantic food detection for smart restaurants. IEEE Trans. Multimed. 20, 12 (2018), 3266–3275. [Link]
  2. Yong-Yeol Ahn, Sebastian E Ahnert, James P Bagrow, and Albert-László Barabási. 2011. Flavor network and the principles of food pairing. Scientific Reports 1, 1 (2011), 196. [Link]
  3. Kiyoharu Aizawa and Makoto Ogawa. 2015. FoodLog: Multimedia Tool for Healthcare Applications. IEEE Multimed. 22, 2 (2015), 4–8. [Link]
  4. Dario Allegra, Marios Anthimopoulos, Joachim Dehais, Ya Lu, Filippo Stanco, Giovanni Maria Farinella, and Stavroula Mougiakakou. 2017. A multimedia database for automatic meal assessment systems. In New Trends in Image Analysis and Processing. 471–478. [Link]
  5. Nicholas Bakalar. 2012. Sensory science: partners in flavour. Nature 486, 7403 (2012), S4–S5. [Link]
  6. Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 2 (2018), 423–443. [Link]
  7. Ann-Sophie Barwich and Elisabeth A Lloyd. 2022. More than meets the AI: The possibilities and limits of machine learning in olfaction. Frontiers in Neuroscience 16 (2022), 981294. [Link]
  8. Thoranna Bender, Simon Sørensen, Alireza Kashani, Kristjan Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, and Frederik Warburg. 2024. Learning to taste: A multimodal wine dataset. Advances in Neural Information Processing Systems 36 (2024). [Link]
  9. Vinay Bettadapura, Edison Thomaz, Aman Parnami, Gregory D Abowd, and Irfan Essa. 2015. Leveraging context to support automated food recognition in restaurants. In IEEE Winter Conference on Applications of Computer Vision. 580–587. [Link]
  10. Da Cao, Zhiwang Yu, Hanling Zhang, Jiansheng Fang, Liqiang Nie, and Qi Tian. 2019. Video-Based Cross-Modal Recipe Retrieval. In Proceedings of the ACM International Conference on Multimedia. 1685–1693. [Link]
  11. Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, and Matthieu Cord. 2018. Cross-modal retrieval in the cooking context: Learning semantic text-image embeddings. In The International ACM SIGIR Conference on Research & Development in Information Retrieval. 35–44. [Link]
  12. Jingjing Chen, Chong-Wah Ngo, and Tat-Seng Chua. 2017. Cross-modal recipe retrieval with rich food attributes. In Proceedings of the ACM International Conference on Multimedia. 1771–1779. [Link]
  13. Jingjing Chen and Chong-Wah Ngo. 2016. Deep-based ingredient recognition for cooking recipe retrieval. In Proceedings of the ACM International Conference on Multimedia. 32–41. [Link]
  14. Jingjing Chen, Lei Pang, and Chong-Wah Ngo. 2017. Cross-modal recipe retrieval: How to cook this dish?. In MultiMedia Modeling. Springer, 588–600. [Link]
  15. Jingjing Chen, Bin Zhu, Chong-Wah Ngo, Tat-Seng Chua, and Yu-Gang Jiang. 2020. A study of multi-task and region-wise deep learning for food ingredient recognition. IEEE Trans. Image Process. 30 (2020), 1514–1526. [Link]
  16. Yu Chen, Ananya Subburathinam, Ching-Hua Chen, and Mohammed J Zaki. 2021. Personalized food recommendation as constrained question answering over a large-scale food knowledge graph. In Proceedings of the ACM International Conference on Web Search and Data Mining. 544–552. [Link]
  17. Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, and Filip Ilievski. 2024. Fire: Food image to recipe generation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 8184–8194. [Link]
  18. Manal Chokr and Shady Elbassuoni. 2017. Calories prediction from food images. In Proceedings of the AAAI Conference on Artificial Intelligence. 4664–4669. [Link]
  19. Pengyu Chu, Zhaojian Li, Kyle Lammers, Renfu Lu, and Xiaoming Liu. 2021. Deep learning-based apple detection using a suppression mask R-CNN. Pattern Recognition Letters 147 (2021), 206–211. [Link]
  20. M Luisa Demattè, Nicola Pojer, Isabella Endrizzi, Maria Laura Corollaro, Emanuela Betta, Eugenio Aprea, Mathilde Charles, Franco Biasioli, Massimiliano Zampini, and Flavia Gasperi. 2014. Effects of the sound of the bite on apple perceived crispness and hardness. Food Quality and Preference 38 (2014), 58–64. [Link]
  21. Jialin Deng, Yan Wang, Carlos Velasco, Ferran Altarriba Bertran, Rob Comber, Marianna Obrist, Katherine Isbister, Charles Spence, and Florian ‘Floyd’ Mueller. 2021. The future of human-food interaction. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. Article 100, 6 pages. [Link]
  22. Keisuke Doman, Cheng Ying Kuai, Tomokazu Takahashi, Ichiro Ide, and Hiroshi Murase. 2011. Video CooKing: Towards the synthesis of multimedia cooking recipes. In Advances in Multimedia Modeling: International Multimedia Modeling Conference. Springer, 135–145. [Link]
  23. Takumi Ege and Keiji Yanai. 2017. Image-based food calorie estimation using knowledge on food categories, ingredients and cooking directions. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017. 367–375. [Link]
  24. Takumi Ege and Keiji Yanai. 2019. Simultaneous estimation of dish locations and calories with multi-task learning. IEICE Trans. Inf. Syst. 102-D, 7 (2019), 1240–1246. [Link]
  25. David Elsweiler and Morgan Harvey. 2015. Towards automatic meal plan recommendations for balanced nutrition. In Proceedings of the ACM Conference on Recommender Systems. 313–316. [Link]
  26. David Elsweiler, Hanna Hauptmann, and Christoph Trattner. 2022. Food recommender systems. Recommender Systems Handbook 11 (2022), 871. [Link]
  27. David Elsweiler, Christoph Trattner, and Morgan Harvey. 2017. Exploiting food choice biases for healthier recipe recommendation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 575–584. [Link]
  28. Yulia Eskin and Alex Mihailidis. 2012. An intelligent nutritional assessment system. In AAAI Fall Symposium: Artificial Intelligence for Gerontechnology. [Link]
  29. Shaobo Fang, Zeman Shao, Deborah A. Kerr, Carol J. Boushey, and Fengqing Zhu. 2019. An end-to-end image-based automatic food energy estimation technique based on learned energy distribution images: Protocol and methodology. Nutrients 11, 4 (2019). [Link]
  30. Francesco Foroni and Raffaella I Rumiati. 2017. Food perception and categorization: From food/no-food to different types of food. In Handbook of Categorization in Cognitive Science. Elsevier, 271–287. [Link]
  31. Han Fu, Rui Wu, Chenghao Liu, and Jianling Sun. 2020. MCEN: Bridging cross-modal gap between cooking recipes and dish images with latent variable model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14570–14580. [Link]
  32. Mouzhi Ge, Francesco Ricci, and David Massimo. 2015. Health-aware food recommender system. In Proceedings of the ACM Conference on Recommender Systems. 333–334. [Link]
  33. Shivani Gowda, Yifan Hu, and Mandy Korpusik. 2023. Multi-modal food classification in a diet tracking system with spoken and visual inputs. In IEEE International Conference on Acoustics, Speech and Signal Processing. 1–5. [Link]
  34. Lingling Guo, Ting Wang, Zhonghua Wu, Jianwu Wang, Ming Wang, Zequn Cui, Shaobo Ji, Jianfei Cai, Chuanlai Xu, and Xiaodong Chen. 2020. Portable food-freshness prediction platform based on colorimetric barcode combinatorics and deep convolutional neural networks. Advanced Materials 32, 45 (2020), e2004805. [Link]
  35. Reiko Hamada, Koichi Miura, Ichiro Ide, Shin’ichi Satoh, Shuichi Sakai, and Hidehiko Tanaka. 2004. Multimedia integration for cooking video indexing. In Proceedings of the Pacific Rim Conference on Advances in Multimedia Information Processing. 657–664. [Link]
  36. Yuzhe Han, Qimin Cheng, Wenjin Wu, and Ziyang Huang. 2023. DPF-Nutrition: Food nutrition estimation via depth prediction and fusion. Foods 12, 23 (2023). [Link]
  37. Yue Han, Jiangpeng He, Mridul Gupta, Edward J Delp, and Fengqing Zhu. 2023. Diffusion model with clustering-based conditioning for food image generation. In Proceedings of the International Workshop on Multimedia Assisted Dietary Management. 61–69. [Link]
  38. Abdo Hassoun, Sandeep Jagtap, Hana Trollman, Guillermo Garcia-Garcia, Nour Alhaj Abdullah, Gulden Goksen, Farah Bader, Fatih Ozogul, Francisco J. Barba, Janna Cropotova, Paulo E.S. Munekata, and José M. Lorenzo. 2023. Food processing 4.0: Current and future developments spurred by the fourth industrial revolution. Food Control 145 (2023), 109507. [Link]
  39. Steven Haussmann, Oshani Seneviratne, Yu Chen, Yarden Ne’eman, James Codella, Ching-Hua Chen, Deborah L McGuinness, and Mohammed J Zaki. 2019. FoodKG: a semantics-driven knowledge graph for food recommendation. In Proceedings of the International Semantic Web Conference. 146–162. [Link]
  40. Luis Herranz, Shuqiang Jiang, and Ruihan Xu. 2016. Modeling restaurant context for food recognition. IEEE Trans. Multimed. 19, 2 (2016), 430–440. [Link]
  41. Luis Herranz, Weiqing Min, and Shuqiang Jiang. 2018. Food recognition and recipe analysis: integrating visual content, context and external knowledge. [Link]
  42. Hongbin Pu, Qingyi Wei, and Da-Wen Sun. 2023. Recent advances in muscle food safety evaluation: Hyperspectral imaging analyses and applications. Critical Reviews in Food Science and Nutrition 63, 10 (2023), 1297–1313. [Link]
  43. Xu Huang, Jin Liu, Zhizhong Zhang, and Yuan Xie. 2023. Improving cross-modal recipe retrieval with component-aware prompted CLIP embedding. In Proceedings of the ACM International Conference on Multimedia. 529–537. [Link]
  44. Arun Pandiyan Indiran, Humaira Fatima, Sampriti Chattopadhyay, Sureshkumar Ramadoss, and Yashwanth Radhakrishnan. 2024. UmamiPreDL: Deep learning model for umami taste prediction of peptides using BERT and CNN. Computational Biology and Chemistry 111 (2024), 108116. [Link]
  45. Celestine Iwendi, Suleman Khan, Joseph Henry Anajemba, Ali Kashif Bashir, and Fazal Noor. 2020. Realizing an efficient IoMT-assisted patient diet recommendation system through machine learning model. IEEE Access 8 (2020), 28462–28474. [Link]
  46. Gokce Iymen, Gizem Tanriver, Yusuf Ziya Hayirlioglu, and Onur Ergen. 2020. Artificial intelligence-based identification of butter variations as a model study for detecting food adulteration. Innovative Food Science & Emerging Technologies 66 (2020), 102527. [Link]
  47. Huizhuo Ji, Dandan Pu, Wenjing Yan, Qingchuan Zhang, Min Zuo, and Yuyu Zhang. 2023. Recent advances and application of machine learning in food flavor prediction and regulation. Trends in Food Science & Technology 138 (2023), 738–751. [Link]
  48. Hokuto Kagaya, Kiyoharu Aizawa, and Makoto Ogawa. 2014. Food detection and recognition using convolutional neural network. In Proceedings of the ACM International Conference on Multimedia. 1085–1088. [Link]
  49. Diclehan Karakaya, Oguzhan Ulucan, and Mehmet Turkan. 2020. Electronic nose and its applications: A survey. International Journal of Automation and Computing 17, 2 (2020), 179–209. [Link]
  50. Andreas Keller, Richard C Gerkin, Yuanfang Guan, Amit Dhurandhar, Gabor Turu, Bence Szalai, Joel D Mainland, Yusuke Ihara, Chung Wen Yu, Russ Wolfinger, et al. 2017. Predicting human olfactory perception from chemical features of odor molecules. Science 355, 6327 (2017), 820–826. [Link]
  51. Keigo Kitamura, Toshihiko Yamasaki, and Kiyoharu Aizawa. 2008. Food log by analyzing food images. In Proceedings of the ACM International Conference on Multimedia. 999–1000. [Link]
  52. Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, and Jian Xue. 2023. FoodSAM: Any food segmentation. IEEE Trans. Multimed. (2023), 1–14. [Link]
  53. Brian K. Lee, Emily J. Mayhew, Benjamin Sanchez-Lengeling, Jennifer N. Wei, Wesley W. Qian, Kelsie A. Little, Matthew Andres, Britney B. Nguyen, Theresa Moloy, Jacob Yasonik, Jane K. Parker, Richard C. Gerkin, Joel D. Mainland, and Alexander B. Wiltschko. 2023. A principal odor map unifies diverse tasks in olfactory perception. Science 381, 6661 (2023), 999–1006. [Link]
  54. Ki-Seung Lee. 2019. Joint audio-ultrasound food recognition for noisy environments. IEEE J. Biomed. Health Informat. 24, 5 (2019), 1477–1489. [Link]
  55. Tianhao Li, Wensong Wei, Shujuan Xing, Weiqing Min, Chunjiang Zhang, and Shuqiang Jiang. 2023. Deep Learning-Based Near-Infrared Hyperspectral Imaging for Food Nutrition Estimation. Foods 12, 17 (2023). [Link]
  56. Hyoyoung Lim, Xiaolei Huang, Samuel Miller, Joshua Edelmann, Timothy Euken, and Stephen Voida. 2019. Smart cook: making cooking easier with multimodal learning. In Adjunct Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the ACM International Symposium on Wearable Computers. 129–132. [Link]
  57. Hod Lipson and Salah Sukkarieh. 2023. Robots may transform the way we produce and prepare food. Nature Reviews Bioengineering 1, 11 (2023), 795–798. [Link]
  58. Qi Liu, Yue Zhang, Zhenguang Liu, Ye Yuan, Li Cheng, and Roger Zimmermann. 2018. Multi-modal multi-task learning for automatic dietary assessment. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. [Link]
  59. Tao Liu, Yanbing Chen, Dongqi Li, Tao Yang, and Jianhua Cao. 2020. Electronic tongue recognition with feature specificity enhancement. Sensors 20, 3 (2020). [Link]
  60. Yuxin Liu, Weiqing Min, Shuqiang Jiang, and Yong Rui. 2024. Convolution-enhanced bi-branch adaptive transformer with cross-task interaction for food category and ingredient recognition. IEEE Trans. Image Process. 33 (2024), 2572–2586. [Link]
  61. Zhiming Liu, Kai Niu, and Zhiqiang He. 2023. ML-CookGAN: Multi-label generative adversarial network for food image generation. ACM Trans. Multim. Comput. Commun. Appl. 19, 2s (2023), 1–21. [Link]
  62. Frank P.-W. Lo, Jianing Qiu, Zeyu Wang, Junhong Chen, Bo Xiao, Wu Yuan, Stamatia Giannarou, Gary Frost, and Benny Lo. 2024. Dietary assessment with multimodal ChatGPT: A systematic analysis. IEEE J. Biomed. Health Informat. (2024), 1–11. [Link]
  63. Ya Lu, Thomai Stathopoulou, and Stavroula Mougiakakou. 2021. Partially supervised multi-task network for single-view dietary assessment. In The International Conference on Pattern Recognition. IEEE, 8156–8163. [Link]
  64. Ya Lu, Thomai Stathopoulou, Maria F. Vasiloglou, Stergios Christodoulidis, Beat Blum, Thomas Walser, Vinzenz Meier, Zeno Stanga, and Stavroula G. Mougiakakou. 2019. An artificial intelligence-based system for nutrient intake assessment of hospitalised patients. In Conference of the IEEE Engineering in Medicine and Biology Society. 5696–5699. [Link]
  65. Ya Lu, Thomai Stathopoulou, Maria F Vasiloglou, Stergios Christodoulidis, Zeno Stanga, and Stavroula Mougiakakou. 2020. An artificial intelligence-based system to assess nutrient intake for hospitalised patients. IEEE Trans. Multimed. 23 (2020), 1136–1147. [Link]
  66. Ya Lu, Thomai Stathopoulou, Maria F Vasiloglou, Lillian F Pinault, Colleen Kiley, Elias K Spanakis, and Stavroula Mougiakakou. 2020. goFOODTM: an artificial intelligence system for dietary assessment. Sensors 20, 15 (2020), 4283. [Link]
  67. Peihua Ma, Chun Pong Lau, Ning Yu, An Li, Ping Liu, Qin Wang, and Jiping Sheng. 2021. Image-based nutrient estimation for Chinese dishes using deep learning. Food Research International 147 (2021), 110437. [Link]
  68. Vikram Maharshi, Sumit Sharma, Rahul Prajesh, Samaresh Das, Ajay Agarwal, and Bhaskar Mitra. 2022. A novel sensor for fruit ripeness estimation using lithography free approach. IEEE Sensors Journal 22, 22 (2022), 22192–22199. [Link]
  69. Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nick Johnston, Andrew Rabinovich, and Kevin Murphy. 2015. What’s Cookin’? Interpreting cooking videos using text, speech and vision. [Link]
  70. Willow Mandil, Vishnu Rajendran, Kiyanoush Nazari, and Amir Ghalamzan-Esfahani. 2023. Tactile-sensing technologies: Trends, challenges and outlook in agri-food manipulation. Sensors 23, 17 (2023), 7362. [Link]
  71. Javier Marin, Aritro Biswas, Ferda Ofli, Nicholas Hynes, Amaia Salvador, Yusuf Aytar, Ingmar Weber, and Antonio Torralba. 2019. Recipe1M+: A dataset for learning cross-modal embeddings for cooking recipes and food images. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1 (2019), 187–203. [Link]
  72. Weiqing Min, Bing-Kun Bao, Shuhuan Mei, Yaohui Zhu, Yong Rui, and Shuqiang Jiang. 2018. You are what you eat: Exploring rich recipe information for cross-region food analysis. IEEE Trans. Multimed. 20, 4 (2018), 950–964. [Link]
  73. Weiqing Min, Shuqiang Jiang, and Ramesh Jain. 2019. Food recommendation: Framework, existing solutions, and challenges. IEEE Trans. Multimed. 22, 10 (2019), 2659–2671. [Link]
  74. Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. 2019. A survey on food computing. ACM Comput. Surv. 52, 5 (2019), 1–36. [Link]
  75. Weiqing Min, Shuqiang Jiang, Jitao Sang, Huayang Wang, Xinda Liu, and Luis Herranz. 2017. Being a supercook: Joint food attributes and multimodal content modeling for recipe retrieval and exploration. IEEE Trans. Multimed. 19, 5 (2017), 1100–1113. [Link]
  76. Weiqing Min, Shuqiang Jiang, Shuhui Wang, Jitao Sang, and Shuhuan Mei. 2017. A delicious recipe analysis framework for exploring multi-modal recipes with various attributes. In Proceedings of the ACM International Conference on Multimedia. 402–410. [Link]
  77. Weiqing Min, Chunlin Liu, Leyi Xu, and Shuqiang Jiang. 2022. Applications of knowledge graphs for food science and industry. Patterns 3, 5 (2022), 100484. [Link]
  78. Weiqing Min, Linhu Liu, Zhiling Wang, Zhengdong Luo, Xiaoming Wei, Xiaolin Wei, and Shuqiang Jiang. 2020. ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network. In Proceedings of the ACM International Conference on Multimedia. 393–401. [Link]
  79. Weiqing Min, Zhiling Wang, Yuxin Liu, Mengjiang Luo, Liping Kang, Xiaoming Wei, Xiaolin Wei, and Shuqiang Jiang. 2023. Large scale visual food recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45, 8 (2023), 9932–9949. [Link]
  80. Mark Mirtchouk, Christopher Merck, and Samantha Kleinberg. 2016. Automated estimation of food type and amount consumed from body-worn audio and motion sensors. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing. 451–462. [Link]
  81. Koichi Miura, Reiko Hamada, Ichiro Ide, Shuichi Sakai, and Hidehiko Tanaka. 2003. Associating cooking video segments with preparation steps. In Proceedings of the International Conference on Image and Video Retrieval. 174–183. [Link]
  82. Tatsuya Miyazaki, Gamhewage C De Silva, and Kiyoharu Aizawa. 2011. Image-based calorie content estimation for dietary assessment. In IEEE International Symposium on Multimedia. 363–368. [Link]
  83. Niall Murray, Brian Lee, Yuansong Qiao, and Gabriel-Miro Muntean. 2016. Olfaction-Enhanced Multimedia: A Survey of Application Domains, Displays, and Research Challenges. ACM Comput. Surv. 48, 4 (2016). [Link]
  84. Austin Myers, Nick Johnston, Vivek Rathod, Anoop Korattikara, Alex Gorban, Nathan Silberman, Sergio Guadarrama, George Papandreou, Jonathan Huang, and Kevin Murphy. 2015. Im2Calories: Towards an automated mobile vision food diary. In IEEE International Conference on Computer Vision. 1233–1241. [Link]
  85. Saeejith Nair, Chi-en Amy Tai, Yuhao Chen, and Alexander Wong. 2023. NutritionVerse-Synth: An open access synthetically generated 2D food scene dataset for dietary intake estimation. [Link]
  86. Yiu-Kai Ng and Meilan Jin. 2017. Personalized recipe recommendations for toddlers based on nutrient intake and food preferences. In Proceedings of the International Conference on Management of Digital Ecosystems. 243–250. [Link]
  87. Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, and Shinsuke Mori. 2024. Recipe generation from unsegmented cooking videos. ACM Trans. Multim. Comput. Commun. Appl. (2024). [Link]
  88. Umang Nyati, Sneha Rawat, Devika Gupta, Niyati Aggrawal, and Anuja Arora. 2021. Characterize ingredient network for recipe suggestion. International Journal of Information Technology 13 (2021), 2323–2330. [Link]
  89. National Institutes of Health (NIH) Nutrition Research Task Force. 2020. 2020–2030 Strategic Plan for NIH Nutrition Research. [Link]
  90. Koichi Okamoto and Keiji Yanai. 2016. An automatic calorie estimation system of food images on a smartphone. In Proceedings of the International Workshop on Multimedia Assisted Dietary Management. 63–70. [Link]
  91. David Amat Olóndriz, Ponç Palau Puigdevall, and Adrià Salvador Palau. 2022. FooDI-ML: A large multi-language dataset of food, drinks and groceries images and descriptions. [Link]
  92. Siyuan Pan, Ling Dai, Xuhong Hou, Huating Li, and Bin Sheng. 2020. ChefGAN: Food image generation from recipes. In Proceedings of the ACM International Conference on Multimedia. 4244–4252. [Link]
  93. Dim P Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, and Antonio Torralba. 2022. Learning program representations for food images and cooking recipes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16559–16569. [Link]
  94. Dim P Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, and Antonio Torralba. 2019. How to make a pizza: Learning a compositional layer-based GAN model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8002–8011. [Link]
  95. Vasileios Papapanagiotou, Christos Diou, Janet van den Boer, Monica Mars, and Anastasios Delopoulos. 2021. Recognition of food-texture attributes using an in-ear microphone. In International Conference on Pattern Recognition. Springer, 558–570. [Link]
  96. Vasiliki Pitsilou, George Papadakis, and Dimitrios Skoutas. 2024. Using LLMs to extract food entities from cooking recipes. In IEEE International Conference on Data Engineering Workshops. 21–28. [Link]
  97. Parisa Pouladzadeh and Shervin Shirmohammadi. 2017. Mobile multi-food recognition using deep learning. ACM Trans. Multim. Comput. Commun. Appl. 13, 3s (2017), 36:1–36:21. [Link]
  98. Parisa Pouladzadeh, Abdulsalam Yassine, and Shervin Shirmohammadi. 2015. FooDD: An image-based food detection dataset for calorie measurement. In International Conference on Multimedia Assisted Dietary Management, Vol. 1. [Link]
  99. Joan Peracaula Prat. 2020. A multimodal deep learning approach for food tray recognition. [Link]
  100. Jianing Qiu, Frank P.-W. Lo, Xiao Gu, Modou L. Jobarteh, Wenyan Jia, Tom Baranowski, Matilda Steiner-Asiedu, Alex K. Anderson, Megan A. McCrory, Edward Sazonov, Mingui Sun, Gary Frost, and Benny Lo. 2024. Egocentric image captioning for privacy-preserved passive dietary intake monitoring. IEEE Trans. Cybern. 54, 2 (2024), 679–692. [Link]
  101. Viprav B Raju, Masudul H Imtiaz, and Edward Sazonov. 2023. Food image segmentation using multi-modal imaging sensors with color and thermal data. Sensors 23, 2 (2023), 560. [Link]
  102. Nimesha Ranasinghe and Ellen Yi-Luen Do. 2016. Digital lollipop: Studying electrical stimulation on the human tongue to simulate taste sensations. ACM Trans. Multim. Comput. Commun. Appl. 13, 1, Article 5 (2016), 22 pages. [Link]
  103. Nimesha Ranasinghe, Kuan-Yi Lee, Gajan Suthokumar, and Ellen Yi-Luen Do. 2014. The sensation of taste in the future of immersive media. In Proceedings of the ACM International Workshop on Immersive Media Experiences. 7–12. [Link]
  104. Nimesha Ranasinghe, Thi Ngoc Tram Nguyen, Yan Liangkun, Lien-Ya Lin, David Tolley, and Ellen Yi-Luen Do. 2017. Vocktail: A virtual cocktail for pairing digital taste, smell, and color sensations. In Proceedings of the ACM International Conference on Multimedia. 1139–1147. [Link]
  105. Madhu Raut, Keyur Prabhu, Rachita Fatehpuria, Shubham Bangar, and Sunita Sahu. 2018. A personalized diet recommendation system using fuzzy ontology. Int. J. Eng. Sci. Invention 7, 3 (2018), 51–55.
  106. Edwaldo Soares Rodrigues, Débora Maria Barroso Paiva, and Álvaro Rodrigues Pereira Júnior. 2021. Recipe analysis for knowledge discovery of gastronomic dishes. Knowledge and Information Systems 63, 8 (2021), 2075–2108. [Link]
  107. Markus Rokicki, Christoph Trattner, and Eelco Herder. 2018. The impact of recipe features, social cues and demographics on estimating the healthiness of online recipes. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 12. [Link]
  108. Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Julian Fierrez, Ruben Vera-Rodriguez, Isabel Espinosa-Salinas, Gala Freixer, Enrique Carrillo de Santa Pau, Ana Ramírez de Molina, and Javier Ortega-Garcia. 2024. Leveraging automatic personalised nutrition: food image recognition benchmark and dataset based on nutrition taxonomy. Multimedia Tools and Applications (2024), 1–22. [Link]
  109. Ali Rostami. 2024. An integrated framework for contextual personalized LLM-based food recommendation. Ph.D. Dissertation. UC Irvine. [Link]
  110. Ali Rostami, Vaibhav Pandey, Nitish Nag, Vesper Wang, and Ramesh Jain. 2020. Personal food model. In Proceedings of the ACM International Conference on Multimedia. 4416–4424. [Link]
  111. Robin Ruede, Verena Heusser, Lukas Frank, Alina Roitberg, Monica Haurilet, and Rainer Stiefelhagen. 2021. Multi-task learning for calorie prediction on a novel large-scale recipe dataset enriched with nutritional information. In International Conference on Pattern Recognition. IEEE, 4001–4008. [Link]
  112. Raffaella I Rumiati and Francesco Foroni. 2016. We are what we eat: How food is represented in our mind/brain. Psychonomic Bulletin & Review 23 (2016), 1043–1054. [Link]
  113. Sina Sajadmanesh, Sina Jafarzadeh, Seyed Ali Ossia, Hamid R Rabiee, Hamed Haddadi, Yelena Mejova, Mirco Musolesi, Emiliano De Cristofaro, and Gianluca Stringhini. 2017. Kissing cuisines: Exploring worldwide culinary habits on the web. In Proceedings of the International Conference on World Wide Web Companion. 1013–1021. [Link]
  114. Amaia Salvador, Michal Drozdzal, Xavier Giró-i-Nieto, and Adriana Romero. 2019. Inverse cooking: Recipe generation from food images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10453–10462. [Link]
  115. Amaia Salvador, Erhan Gundogdu, Loris Bazzani, and Michael Donoser. 2021. Revamping cross-modal recipe retrieval with hierarchical transformers and self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15475–15484. [Link]
  116. Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, and Antonio Torralba. 2017. Learning cross-modal embeddings for cooking recipes and food images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3020–3028. [Link]
  117. Juliane R Sempionatto, Victor Ruiz-Valdepenas Montiel, Eva Vargas, Hazhir Teymourian, and Joseph Wang. 2021. Wearable and mobile sensors for personalized nutrition. ACS Sensors 6, 5 (2021), 1745–1760. [Link]
  118. Wenjing Shao, Weiqing Min, Sujuan Hou, Mengjiang Luo, Tianhao Li, Yuanjie Zheng, and Shuqiang Jiang. 2023. Vision-based food nutrition estimation via RGB-D fusion network. Food Chemistry 424 (2023), 136309. [Link]
119. Mustafa Shukor, Guillaume Couairon, Asya Grechka, and Matthieu Cord. 2022. Transformer decoders with multimodal regularization for cross-modal food retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4567–4578. [Link]
  120. Mustafa Shukor, Nicolas Thome, and Matthieu Cord. 2024. Vision and structured-language pretraining for cross-modal food retrieval. Computer Vision and Image Understanding (2024), 104071. [Link]
121. Esha Singh, Anu Bompelli, Ruyuan Wan, Jiang Bian, Serguei Pakhomov, and Rui Zhang. 2022. A conversational agent system for dietary supplements use. BMC Medical Informatics and Decision Making 22 (2022), 153. [Link]
122. Sulfayanti F Situju, Hironori Takimoto, Suzuka Sato, Hitoshi Yamauchi, Akihiro Kanagawa, and Armin Lawi. 2019. Food constituent estimation for lifestyle disease prevention by multi-task CNN. Appl. Artif. Intell. 33, 8 (2019), 732–746. [Link]
  123. Jiajun Song, Zhuo Li, Weiqing Min, and Shuqiang Jiang. 2024. Towards food image retrieval via generalization-oriented sampling and loss function design. ACM Trans. Multim. Comput. Commun. Appl. 20, 1 (2024), 13:1–13:19. [Link]
  124. Yu Sugiyama and Keiji Yanai. 2021. Cross-modal recipe embeddings by disentangling recipe contents and dish styles. In Proceedings of the ACM International Conference on Multimedia. 2501–2509. [Link]
  125. Yusuke Tahara and Kiyoshi Toko. 2013. Electronic tongues–a review. IEEE Sensors Journal 13, 8 (2013), 3001–3011. [Link]
  126. Hongwei Tan, Yifan Zhou, Quanzheng Tao, Johanna Rosen, and Sebastiaan van Dijken. 2021. Bioinspired multisensory neural network with crossmodal integration and recognition. Nature Communications 12, 1 (2021), 1120. [Link]
  127. Yongting Tao and Jun Zhou. 2017. Automatic apple recognition based on the fusion of color and 3D feature for robotic fruit picking. Comput. Electron. Agric. 142 (2017), 388–396. [Link]
  128. Quin Thames, Arjun Karpur, Wade Norris, Fangting Xia, Liviu Panait, Tobias Weyand, and Jack Sim. 2021. Nutrition5k: Towards automatic nutritional understanding of generic food. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8903–8911. [Link]
  129. Raciel Yera Toledo, Ahmad A Alzahrani, and Luis Martinez. 2019. A food recommender system considering nutritional information and user preferences. IEEE Access 7 (2019), 96695–96711. [Link]
  130. Christoph Trattner and David Elsweiler. 2019. Food recommendations. In Collaborative recommendations: Algorithms, practical challenges and applications. World Scientific, 653–685. [Link]
  131. Justus V. Verhagen. 2007. The neurocognitive bases of human multimodal food perception: Consciousness. Brain Research Reviews 53, 2 (2007), 271–286. [Link]
132. Justus V. Verhagen and Lina Engelen. 2006. The neurocognitive bases of human multimodal food perception: Sensory integration. Neuroscience & Biobehavioral Reviews 30, 5 (2006), 613–650. [Link]
  133. Gautham Vinod, Jiangpeng He, Zeman Shao, and Fengqing Zhu. 2024. Food portion estimation via 3D object scaling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3741–3749. [Link]
  134. Binglu Wang, Tianci Bu, Zaiyi Hu, Le Yang, Yongqiang Zhao, and Xuelong Li. 2024. Coarse-to-fine nutrition prediction. IEEE Trans. Multimed. 26 (2024), 3651–3662. [Link]
135. Hao Wang, Guosheng Lin, Steven CH Hoi, and Chunyan Miao. 2020. Structure-aware generation network for recipe generation from images. In European Conference on Computer Vision. 359–374. [Link]
  136. Hao Wang, Guosheng Lin, Steven CH Hoi, and Chunyan Miao. 2021. Cycle-consistent inverse GAN for text-to-image synthesis. In Proceedings of the ACM International Conference on Multimedia. 630–638. [Link]
  137. Hao Wang, Guosheng Lin, Steven CH Hoi, and Chunyan Miao. 2022. Learning structural representations for recipe generation and food retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 45, 3 (2022), 3363–3377. [Link]
  138. Hao Wang, Doyen Sahoo, Chenghao Liu, Ee-peng Lim, and Steven CH Hoi. 2019. Learning cross-modal embeddings with adversarial networks for cooking recipes and food images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11572–11581. [Link]
  139. Hao Wang, Doyen Sahoo, Chenghao Liu, Ke Shu, Palakorn Achananuparp, Ee-peng Lim, and Steven CH Hoi. 2021. Cross-modal food retrieval: learning a joint embedding of food images and recipes with semantic consistency and attention mechanism. IEEE Trans. Multimed. 24 (2021), 2515–2525. [Link]
  140. Jing Wang, Yuanjie Zheng, Junxia Wang, Xiao Xiao, Jing Sun, and Sujuan Hou. 2024. RD-FGM: A novel model for high-quality and diverse food image generation and ingredient classification. Expert Systems with Applications (2024), 124720. [Link]
141. Lanjun Wang, Chenyu Zhang, An-An Liu, Bo Yang, Mingwang Hu, Xinran Qiao, Lei Wang, Jianlin He, and Qiang Liu. 2024. Toward chinese food understanding: a cross-modal ingredient-level benchmark. IEEE Trans. Multimed. (2024), 1–15. [Link]
  142. W. Wang, L. Duan, H. Jiang, P. Jing, X. Song, and L. Nie. 2021. Market2Dish: Health-aware Food Recommendation. ACM Trans. Multim. Comput. Commun. Appl. 17, 1 (2021), 33:1–33:19. [Link]
  143. Xin Wang, Devinder Kumar, Nicolas Thome, Matthieu Cord, and Frederic Precioso. 2015. Recipe recognition with large multimodal food dataset. In IEEE International Conference on Multimedia & Expo Workshops. IEEE, 1–6. [Link]
144. Zhiling Wang, Weiqing Min, Zhuo Li, Liping Kang, Xiaoming Wei, Xiaolin Wei, and Shuqiang Jiang. 2022. Ingredient-guided region discovery and relationship modeling for food category-ingredient prediction. IEEE Trans. Image Process. 31 (2022), 5214–5226. [Link]
145. Florian Weidner, Jana E. Maier, and Wolfgang Broll. 2023. Eating, smelling, and seeing: Investigating multisensory integration and (in)congruent stimuli while eating in VR. IEEE Trans. Visual. Comput. Graphics 29, 5 (2023), 2423–2433. [Link]
146. Min Wen, Jiajun Song, Weiqing Min, Weimin Xiao, Lin Han, and Shuqiang Jiang. 2023. Multi-state ingredient recognition via adaptive multi-centric network. IEEE Trans. Ind. Informat. 20, 4 (2023), 5692–5701. [Link]
  147. Jianlong Wu, Liangming Pan, Jingjing Chen, and Yu-Gang Jiang. 2022. Ingredient-enriched recipe generation from cooking videos. In Proceedings of the International Conference on Multimedia Retrieval. 249–257. [Link]
  148. Wen Wu and Jie Yang. 2009. Fast food recognition from videos of eating for calorie estimation. In IEEE International Conference on Multimedia and Expo. 1210–1213. [Link]
149. Xiongwei Wu, Sicheng Yu, Ee-Peng Lim, and Chong-Wah Ngo. 2024. OVFoodSeg: Elevating open-vocabulary food image segmentation via image-informed textual representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4144–4153. [Link]
  150. Jiahua Xiao, Yantao Ji, and Xing Wei. 2023. Hyperspectral Image Denoising with Spectrum Alignment. In Proceedings of the 31st ACM International Conference on Multimedia. 5495–5503. [Link]
  151. Zhongwei Xie, Ling Liu, Yanzhao Wu, Luo Zhong, and Lin Li. 2021. Learning text-image joint embedding for efficient cross-modal retrieval with deep feature engineering. ACM Trans. Inf. Syst. 40, 4 (2021), 1–27. [Link]
  152. Mengling Xu, Jie Wang, Ming Tao, Bing-Kun Bao, and Changsheng Xu. 2024. CookGALIP: Recipe controllable generative adversarial CLIPs with sequential ingredient prompts for food image generation. IEEE Trans. Multimed. (2024). [Link]
153. Ruihan Xu, Luis Herranz, Shuqiang Jiang, Shuang Song, Xinhang Song, and Ramesh Jain. 2015. Geolocalized modeling for dish recognition. IEEE Trans. Multimed. 17, 8 (2015), 1187–1199. [Link]
154. Semih Yagcioglu, Aykut Erdem, Erkut Erdem, and Nazli Ikizler-Cinbis. 2018. RecipeQA: A challenge dataset for multimodal comprehension of cooking recipes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. [Link]
  155. Yoko Yamakata, Akihisa Ishino, Akiko Sunto, Sosuke Amano, and Kiyoharu Aizawa. 2022. Recipe-oriented food logging for nutritional management. In Proceedings of the ACM International Conference on Multimedia. Association for Computing Machinery, 6898–6904. [Link]
156. Zhongqi Yang, Elahe Khatibi, Nitish Nagesh, Mahyar Abbasian, Iman Azimi, Ramesh Jain, and Amir M Rahmani. 2024. ChatDiet: Empowering personalized nutrition-oriented food recommender chatbots through an LLM-augmented framework. Smart Health 32 (2024), 100465. [Link]
  157. Mengyang Zhang, Guohui Tian, Ying Zhang, and Peng Duan. 2021. Reinforcement learning for logic recipe generation: Bridging gaps from images to plans. IEEE Trans. Multimed. 24 (2021), 352–365. [Link]
  158. Weishan Zhang, Yuanjie Zhang, Jia Zhai, Dehai Zhao, Liang Xu, Jiehan Zhou, Zhongwei Li, and Su Yang. 2018. Multi-source data fusion using deep learning for smart refrigerators. Computers in Industry 95 (2018), 15–21. [Link]
  159. Yixin Zhang, Xin Zhou, Qianwen Meng, Fanglin Zhu, Yonghui Xu, Zhiqi Shen, and Lizhen Cui. 2024. Multi-modal food recommendation using clustering and self-supervised learning. [Link]
  160. Zhen Zhang, Jun Zhou, Zhenghong Yan, Kai Wang, Jiamin Mao, and Zizhen Jiang. 2021. Hardness recognition of fruits and vegetables based on tactile array information of manipulator. Comput. Electron. Agric. 181 (2021), 105959. [Link]
161. Luowei Zhou, Chenliang Xu, and Jason Corso. 2018. Towards automatic learning of procedures from web instructional videos. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. [Link]
  162. Lei Zhou, Chu Zhang, Fei Liu, Zhengjun Qiu, and Yong He. 2019. Application of deep learning in food: a review. Comprehensive Reviews in Food Science and Food Safety 18, 6 (2019), 1793–1811. [Link]
163. Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, and Shuqiang Jiang. 2024. FoodSky: A food-oriented large language model that passes the chef and dietetic examination. [Link]
  164. Pengfei Zhou, Weiqing Min, Jiajun Song, Yang Zhang, and Shuqiang Jiang. 2024. Synthesizing knowledge-enhanced features for real-world zero-shot food detection. IEEE Trans. Image Process. 33 (2024), 1285–1298. [Link]
  165. Pengfei Zhou, Weiqing Min, Yang Zhang, Jiajun Song, Ying Jin, and Shuqiang Jiang. 2023. SeeDS: Semantic separable diffusion synthesizer for zero-shot food detection. In Proceedings of the ACM International Conference on Multimedia. 8157–8166. [Link]
  166. Xin Zhou and Zhiqi Shen. 2023. A tale of two graphs: Freezing and denoising graph structures for multimodal recommendation. In Proceedings of the ACM International Conference on Multimedia. 935–943. [Link]
  167. Bin Zhu and Chong-Wah Ngo. 2020. CookGAN: Causality based text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5519–5527. [Link]
  168. Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Yanbin Hao. 2019. R2GAN: Cross-modal recipe retrieval with generative adversarial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11477–11486. [Link]