Vision-Based Fruit Recognition via Multi-Scale Attention CNN

Weiqing Mina,b, Zhiling Wanga,b, Jiahao Yanga,b, Chunlin Liua,b, Shuqiang Jianga,b,*

aThe Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences

bUniversity of Chinese Academy of Sciences

The Multi-Scale Attention Network (MSANet) explores attention at different layers of a CNN and aggregates the attentional features from these levels into a final comprehensive representation. Extensive evaluations on four fruit benchmark datasets demonstrate that the method achieves state-of-the-art recognition performance.

Download

The following files are provided:

simplified_main_model.py       
simplified_train.py  
cbam_model.py   
shared_lib.py   
senet154-c7b49a05.pth 

To download all files as a zip package, click here.

Usage

Train and Test

To train a model, simply run:

python simplified_train.py --model MODEL --dataset DATASET_NAME --pathModelParams root/to/your/model.ckpt --mode train

where the model is selected with the --model option: MODEL denotes MSANet, MODEL_CBAM denotes SENet154(CBAM), and MODEL_BASE denotes SENet154. The dataset is selected with the --dataset option (see Config dataset below for details), and the checkpoint path is set with --pathModelParams.

To test the trained model, simply change --mode from train to test, like this:

python simplified_train.py --model MODEL --dataset DATASET_NAME --pathModelParams root/to/your/model.ckpt --mode test

If necessary, additional optional arguments can also be set; see the argument parser in simplified_train.py for the full list.
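For reference, the command-line interface described above might be parsed as follows. This is a minimal illustrative sketch using argparse; the actual flag handling inside simplified_train.py may differ, and the default values shown are assumptions.

```python
import argparse

def build_parser():
    # Flags mirror those documented above; defaults here are illustrative only.
    parser = argparse.ArgumentParser(description="Train/test MSANet variants")
    parser.add_argument("--model",
                        choices=["MODEL", "MODEL_CBAM", "MODEL_BASE"],
                        default="MODEL",
                        help="MODEL=MSANet, MODEL_CBAM=SENet154(CBAM), "
                             "MODEL_BASE=SENet154")
    parser.add_argument("--dataset", required=True,
                        help="key into dataset_metalist in shared_lib.py")
    parser.add_argument("--pathModelParams", required=True,
                        help="path to the model checkpoint (.ckpt)")
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    return parser

# Example: parse the training command shown above.
args = build_parser().parse_args(
    ["--model", "MODEL", "--dataset", "Fru92",
     "--pathModelParams", "model.ckpt", "--mode", "train"])
```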

Config dataset

Dataset metadata is stored in shared_lib.py. To add your own dataset, first organize it like this:

|-shared/path/of/images
    |---cls1
        |---img1.jpg
        |---img2.jpg
        |---...
        |---imgN.jpg
    |---cls2
        |---...
    |---cls3
        |---...
    ......
    |---clsM
        |---...

Also, format your training and test text files like this:

 cls1/img1.jpg 1
 cls1/img2.jpg 1
 ...
 cls1/imgN.jpg 1
 cls2/img1.jpg 2
 ...
 clsM/imgL.jpg M
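A small helper like the following (hypothetical, not part of the released code) can generate such list files from the directory layout above, assigning labels 1..M to the class folders in sorted order as in the example:

```python
import os

def write_list_file(image_root, out_path):
    """Write 'clsX/imgY.jpg LABEL' lines for every image under image_root,
    labelling class folders 1..M in sorted order."""
    classes = sorted(d for d in os.listdir(image_root)
                     if os.path.isdir(os.path.join(image_root, d)))
    with open(out_path, "w") as f:
        for label, cls in enumerate(classes, start=1):
            for img in sorted(os.listdir(os.path.join(image_root, cls))):
                f.write(f"{cls}/{img} {label}\n")
    return len(classes)
```

In practice you would split the images into training and test subsets before writing two separate list files.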

When everything is done, add your dataset to dataset_metalist like this:

dataset_metalist=dict(
    DATASET_NAME = dict(
        NUM_CLASSES = M,
        DIR_TRAIN_IMAGES='path/to/the/training.txt',
        DIR_TEST_IMAGES='path/to/the/test.txt',
        IMAGE_PREFIX='shared/path/of/images'
    )
)

where DIR_TRAIN_IMAGES and DIR_TEST_IMAGES point to your training and test list files, and IMAGE_PREFIX is the root directory of your dataset. You can then pass --dataset DATASET_NAME on the command line to use it.
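As a sketch of how these fields fit together (the actual loading logic lives in shared_lib.py and may differ), each relative path in a list file is joined with IMAGE_PREFIX to produce the full image path and its integer label:

```python
import os

def parse_list_file(list_path, image_prefix):
    """Return (full_image_path, label) pairs from a 'relative/path.jpg LABEL'
    list file, resolving each path against image_prefix."""
    samples = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rel, label = line.rsplit(" ", 1)
            samples.append((os.path.join(image_prefix, rel), int(label)))
    return samples
```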

We also provide mirror files of common fruit datasets, such as Fru92 and FruVeg81. You can click the corresponding hyperlinks to download them.

Load Model

To use MSANet, SENet154(CBAM), and SENet154 in your own code, import them from simplified_main_model.py and cbam_model.py:

from simplified_main_model import MODEL as MSANet
from simplified_main_model import MODEL_BASE as SENet154
from cbam_model import MODEL_CBAM as SENet154_CBAM

Citation

If you would like to refer to our paper, please cite it:

@article{min2023vision,
    title={Vision-based fruit recognition via multi-scale attention CNN},
    author={Min, Weiqing and Wang, Zhiling and Yang, Jiahao and Liu, Chunlin and Jiang, Shuqiang},
    journal={Computers and Electronics in Agriculture},
    volume={210},
    pages={107911},
    year={2023}
}