[add] Upload training benchmark by z00560161

2020-10-19 20:22:23 +08:00
parent 22b83024f5
commit 82522e2f61
1225 changed files with 345421 additions and 0 deletions
@@ -0,0 +1,141 @@
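+# YOLOv3_TensorFlow Training Guide
+
+### 1. Introduction
+This YOLOv3 implementation is based on third-party open-source TensorFlow code. It uses Darknet-53 as the backbone network and supports both single-scale and multi-scale training. Training requires a training set and a validation set; datasets such as COCO2014 and COCO2017 can be used. This document uses COCO2014 as an example to describe the YOLOv3 training procedure.
+
+### 2. Environment
+Python version: 3.7.5
+Main third-party Python packages:
+- tensorflow >= 1.15.0 (a build that supports the NPU)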
+
+- opencv-python
+
+  1. Install directly with `pip install opencv-python`.
+
+  2. If `pip install opencv-python` fails, build and install OpenCV offline from source:
+
+      (1) Extract the OpenCV source package.
+
+      (2) Enter the extracted directory: `cd opencv`
+
+      (3) `mkdir -p build`
+
+      (4) `cd build`
+
+      (5) `cmake -D BUILD_opencv_python3=yes -D BUILD_opencv_python2=no -D PYTHON3_EXECUTABLE=/usr/local/python3.7.5/bin/python3.7m -D PYTHON3_INCLUDE_DIR=/usr/local/python3.7.5/include/python3.7m -D PYTHON3_LIBRARY=/usr/local/python3.7.5/lib/libpython3.7m.so -D PYTHON3_NUMPY_INCLUDE_DIRS=/usr/local/python3.7.5/lib/python3.7/site-packages/numpy/core/include -D PYTHON3_PACKAGES_PATH=/usr/local/python3.7.5/lib/python3.7/site-packages -D PYTHON_DEFAULT_EXECUTABLE=/usr/local/python3.7.5/bin/python3.7m ..`
+
+      (6) `make -j4`
+
+      (7) `make install`
+
+   Note: adjust the `-D` arguments passed to cmake to match the current environment.
+
+- tqdm: install with `pip install tqdm`
+
+- pycocotools: install with `pip install pycocotools`
+
+  Note: pycocotools is required for evaluation.
+
+### 3. Dataset Preprocessing
+#### 3.1 Set coco_dataset_path
+In `yolov3/tensorflow/code`, set `coco_dataset_path` in both `coco_minival_anns.py` and `coco_trainval_anns.py` to the dataset path in the current environment, e.g. `/opt/dataset/coco2014`.
+
+#### 3.2 Run the scripts
+```
+python3.7 coco_minival_anns.py
+python3.7 coco_trainval_anns.py
+```
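+These scripts generate the training and validation annotation files `coco2014_trainval.txt` and `coco2014_minival.txt`. Make sure both files are placed under `yolov3/tensorflow/code/data`. Both scripts read `./data/5k.txt` (the list of the 5k minival images) and open every image with OpenCV to record its width and height.
+Example content of the generated txt files: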
+```
+0 xxx/xxx/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268
+1 xxx/xxx/b.jpg 1920 1080 1 466 403 485 422 2 793 300 809 320
+...
+```
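As a quick check of this format, the sketch below parses one line back into its fields. It is illustrative only: `parse_annotation_line` is a hypothetical helper, not part of the repo, and it assumes the space-separated layout shown above (so image paths must not contain spaces).

```python
def parse_annotation_line(line):
    """Split one annotation line into (index, path, width, height, boxes).

    Hypothetical helper: assumes the space-separated layout
    `index path width height label x_min y_min x_max y_max ...`.
    """
    fields = line.split()
    index = int(fields[0])
    path = fields[1]
    width, height = int(fields[2]), int(fields[3])
    values = [int(v) for v in fields[4:]]
    if len(values) % 5 != 0:
        raise ValueError('box fields must come in groups of 5: %r' % line)
    # Each box is a (label, x_min, y_min, x_max, y_max) tuple.
    boxes = [tuple(values[i:i + 5]) for i in range(0, len(values), 5)]
    return index, path, width, height, boxes


if __name__ == '__main__':
    line = '0 xxx/xxx/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268'
    print(parse_annotation_line(line))
    # (0, 'xxx/xxx/a.jpg', 1920, 1080, [(0, 453, 369, 473, 391), (1, 588, 245, 608, 268)])
```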
+
+### 4. Prepare the Pretrained Model
+#### 4.1 Download the pretrained model
+Download the Darknet pretrained model from https://pjreddie.com/media/files/yolov3.weights.
+
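+#### 4.2 Model conversion
+Use `convert_weight.py` under `train/atlas_benchmark-master/object_detection/yolov3/tensorflow/code` to convert the pretrained model into a TensorFlow ckpt file.
+In `convert_weight.py`, set `weight_path` to the path of the downloaded pretrained model file and `save_path` to the path where the converted TensorFlow ckpt file should be written, for example: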
+```
+weight_path = '../yolov3-tf2/data/darknet53.conv.74'
+save_path = './data/darknet_weights/darknet53.ckpt'
+```
+Then run:
+```
+python3.7 convert_weight.py
+```
+Note: if the ckpt files in `save_path` are not under `train/atlas_benchmark-master/object_detection/yolov3/tensorflow/code/data/darknet_weights/`, move them there manually.
+
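+### 5. Model Training
+#### 5.1 Training configuration
+Edit the configuration in `train/yaml/YoLoV3.yaml`. The configuration items are:
+```
+mode: YOLOv3 single-scale or multi-scale mode; value: single or multi
+data_url: dataset path
+runmode: run mode, training or evaluation; value: train or evaluate
+ckpt_path: path of the ckpt files to evaluate; used only when runmode is evaluate
+total_epoches: number of epochs to train
+save_epoch: save a ckpt file every save_epoch epochs
+device_group_1p: device_id used for 1-device training
+device_group_2p: device_ids used for 2-device training
+device_group_4p: device_ids used for 4-device training
+mpirun_ip: required only in cluster scenarios; format: ip1:device_count1,ip2:device_count2
+docker_image: Docker image name:tag
+```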
+Example configuration in YoLoV3.yaml:
+```
+mode: single
+data_url: /opt/npu/dataset
+runmode: train
+ckpt_path: /home/benchmark-master720/train/atlas_benchmark-master/object_detection/yolov3/tensorflow/result/TrainingJob-20200724115042
+total_epoches: 1
+save_epoch: 3
+device_group_1p: 0
+device_group_2p: 0 1
+device_group_4p: 0 1 2 3
+mpirun_ip: 90.90.176.152:8,90.90.176.154:8
+docker_image: mpirun3:latest
+```
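The `mpirun_ip` and `device_group_*` values are plain strings. The following sketch shows one way to read them, assuming the formats described in 5.1 (`ip1:count1,ip2:count2` and space-separated device IDs); the helper names are hypothetical and do not come from `benchmark.sh`, which may parse them differently.

```python
def parse_mpirun_ip(value):
    """Parse 'ip1:n1,ip2:n2' into a list of (ip, device_count) pairs."""
    hosts = []
    for item in value.split(','):
        ip, count = item.strip().rsplit(':', 1)
        hosts.append((ip, int(count)))
    return hosts


def parse_device_group(value):
    """Parse a device group such as '0 1 2 3'.

    str() is applied first because a YAML loader turns a single id like `0` into an int.
    """
    return [int(d) for d in str(value).split()]


if __name__ == '__main__':
    hosts = parse_mpirun_ip('90.90.176.152:8,90.90.176.154:8')
    print(hosts)                          # [('90.90.176.152', 8), ('90.90.176.154', 8)]
    print(sum(n for _, n in hosts))       # 16 devices in total
    print(parse_device_group('0 1 2 3'))  # [0, 1, 2, 3]
```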
+
+#### 5.2 Launch training
+Run the following commands from the `train` folder of the benchmark package:
+```
+bash benchmark.sh -e YoLoV3 -hw 1p              # host, 1 device
+bash benchmark.sh -e YoLoV3 -hw 8p              # host, 8 devices
+bash benchmark.sh -e YoLoV3 -hw 1p -docker      # Docker, 1 device
+bash benchmark.sh -e YoLoV3 -hw 8p -docker      # Docker, 8 devices
+bash benchmark.sh -e YoLoV3 -ct                 # host, cluster
+bash benchmark.sh -e YoLoV3 -ct -docker         # Docker, cluster
+```
+
+#### 5.3 Training logs
+Logs are in the YoLoV3 folder under `result` in the `train` directory of the benchmark package:
+```
+./result/tf_yolov3/TrainingJob-2020xxxxxxxxxx/train_${device_id}.log
+./result/TrainingJob-2020xxxxxxxxxx/train_${device_id}.log
+./result/tensorflow/yolov3t/TrainingJob-2020xxxxxxxxxx/device_id/hw_yolov3.log
+```
+
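+### 6. Model Evaluation
+In `train/yaml/YoLoV3.yaml`, set `ckpt_path` to the output directory produced by training (the `TrainingJob-*` folder that contains its logs) and set `runmode` to `evaluate`, as in the example in 5.1.
+Then run the same script as for training; see train.log for the results.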
+
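Switching between training and evaluation only changes two keys in the YAML file. The sketch below performs that edit on the file text using only the standard library, so all other lines and their order are kept. `set_yaml_values` is a hypothetical helper that assumes flat `key: value` lines as in the 5.1 example, and `/path/to/TrainingJob` is a placeholder.

```python
import re


def set_yaml_values(text, updates):
    """Return `text` with matching top-level `key: value` lines rewritten.

    Hypothetical helper for flat files such as YoLoV3.yaml; keys missing
    from `text` are not added, and all other lines are kept unchanged.
    """
    lines = []
    for line in text.splitlines():
        # re.match anchors at the line start, so indented (nested) keys are skipped.
        match = re.match(r'(\w+):', line)
        if match and match.group(1) in updates:
            line = '%s: %s' % (match.group(1), updates[match.group(1)])
        lines.append(line)
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    config = 'mode: single\nrunmode: train\nckpt_path: /old/path\n'
    print(set_yaml_values(config, {'runmode': 'evaluate',
                                   'ckpt_path': '/path/to/TrainingJob'}), end='')
```

To apply it, read `train/yaml/YoLoV3.yaml`, pass its text through `set_yaml_values`, and write the result back.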
+
+### 7. Reference Training Results
+
+| Model                 | NPUs     | mAP (%)  | FPS       |
+| :-------------------- | :------: | :------: | :------:  |
+| single_scale          | 8        |    30.0  | 740       |
+| multi_scale           | 8        |    31.0  | 340       |
+| single_scale          | 1        |    ----  | 96        |
+| multi_scale           | 1        |    ----  | 44        |
+
+
+
+-------
+
+
@@ -0,0 +1,13 @@
+
+# dirs
+.idea/
+__pycache__/
+tmp*/
+
+# files
+*.pyc
+*.log
+*.out
+
+data/darknet_weights/*.ckpt*
+
@@ -0,0 +1,140 @@
+# YOLOv3_TensorFlow
+
+### 1. Introduction
+This is an NPU implementation of [YOLOv3](https://pjreddie.com/media/files/papers/YOLOv3.pdf) in TensorFlow, modified from [YOLOv3_TensorFlow](https://github.com/wizyoung/YOLOv3_TensorFlow).
+
+### 2. Requirements
+Python version: 3.7.5
+Main Python packages:
+- tensorflow >= 1.15.0 (a build that supports the NPU)
+- opencv-python
+- tqdm
+- pycocotools (required for evaluation)
+
+### 3. Weights Conversion
+The pretrained darknet53 weights file can be downloaded [here](https://pjreddie.com/media/files/darknet53.conv.74).
+Place this weights file under the `./data/darknet_weights/` directory, set `weight_path` in `convert_weight.py` to its location, and then run:
+```shell
+python3 convert_weight.py
+```
+The converted TensorFlow checkpoint is saved to the `./data/darknet_weights/` directory.
+A converted weights file may already be included in this repo.
+
+### 4. Training
+#### 4.1 Data preparation
+0. Dataset
+As an example, and for comparison with the official implementation, we use [get_coco_dataset.sh](https://github.com/pjreddie/darknet/blob/master/scripts/get_coco_dataset.sh) to prepare the dataset.
+
+1. Annotation file
+- Tip: to match the default dataset path, you can create a symlink:
+    - `ln -s ${real_dataset_path} /opt/npu/dataset/coco`
+Use the following scripts to generate the `coco2014_trainval.txt` and `coco2014_minival.txt` files under the `./data/` directory:
+```shell
+python3 coco_trainval_anns.py
+python3 coco_minival_anns.py
+```
+Each line describes one image, in the format `image_index image_absolute_path img_width img_height box_1 box_2 ... box_n`.
+Each `box_x` has the format:
+- `label_index x_min y_min x_max y_max`. The coordinate origin is the top-left corner of the image; (x_min, y_min) is the top-left corner of the box and (x_max, y_max) is its bottom-right corner.
+- `image_index` is the zero-based line index. `label_index` is in the range [0, class_num - 1].
+
+For example:
+```
+0 xxx/xxx/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268
+1 xxx/xxx/b.jpg 1920 1080 1 466 403 485 422 2 793 300 809 320
+...
+```
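The annotation scripts derive these fields from the COCO JSON files. COCO stores boxes as `[x_min, y_min, width, height]` and uses 80 non-contiguous category IDs between 1 and 90, which `coco_trainval_anns.py` and `coco_minival_anns.py` remap to `label_index` 0-79 with an if/elif chain. The sketch below expresses the same mapping as a lookup table; `COCO_IDS`, `coco_ann_to_box`, and `format_line` are illustrative names, not part of the repo.

```python
# The 80 COCO category ids in use (ids 12, 26, 29, 30, 45, 66, 68, 69, 71, 83 are unused).
COCO_IDS = (list(range(1, 12)) + list(range(13, 26)) + [27, 28] +
            list(range(31, 45)) + list(range(46, 66)) + [67, 70] +
            list(range(72, 83)) + list(range(84, 91)))
# Contiguous label = position of the COCO id in the sorted list (same result as the if/elif chain).
COCO_ID_TO_LABEL = {coco_id: label for label, coco_id in enumerate(COCO_IDS)}


def coco_ann_to_box(ann):
    """Convert one COCO annotation dict to (label, x_min, y_min, x_max, y_max)."""
    x, y, w, h = ann['bbox']  # COCO boxes are [x_min, y_min, width, height]
    x_min, y_min = int(x), int(y)
    # Same truncation as the scripts: origin and size are converted to int separately.
    return (COCO_ID_TO_LABEL[ann['category_id']],
            x_min, y_min, x_min + int(w), y_min + int(h))


def format_line(index, path, width, height, boxes):
    """Build one annotation line in the layout described above."""
    parts = ['%d %s %d %d' % (index, path, width, height)]
    parts += ['%d %d %d %d %d' % box for box in boxes]
    return ' '.join(parts)


if __name__ == '__main__':
    box = coco_ann_to_box({'category_id': 13, 'bbox': [453.2, 369.9, 20.5, 22.1]})
    print(box)  # (11, 453, 369, 473, 391)
    print(format_line(0, 'xxx/xxx/a.jpg', 1920, 1080, [box]))
    # 0 xxx/xxx/a.jpg 1920 1080 11 453 369 473 391
```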
+
+2. Class names file:
+Generate a `data.names` file under the `./data/` directory, with one class name per line.
+For example:
+```
+bird
+person
+bike
+...
+```
+
+The COCO dataset class names file is placed at `./data/coco.names`.
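Because `label_index` is simply the line number of a class in this file, loading it amounts to enumerating its lines. The sketch below assumes one name per line and ignores empty lines; the repo's own loader is `read_class_names` in `utils/misc_utils.py`, whose implementation is not shown here and may differ in details.

```python
import io


def load_class_names(lines):
    """Map label_index (the zero-based line number) to class name.

    Accepts any iterable of lines, e.g. an open file; empty lines are skipped
    but still count towards the line number.
    """
    names = {}
    for label, line in enumerate(lines):
        name = line.strip()
        if name:
            names[label] = name
    return names


if __name__ == '__main__':
    # io.StringIO stands in for open('./data/coco.names').
    with io.StringIO('bird\nperson\nbike\n') as f:
        names = load_class_names(f)
    print(names)       # {0: 'bird', 1: 'person', 2: 'bike'}
    print(len(names))  # class_num = 3
```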
+
+3. Prior anchor file:
+
+Use the k-means algorithm to compute the prior anchors:
+
+```
+python get_kmeans.py
+```
+
+Then you will get 9 anchors and the average IoU. Save the anchors to a txt file.
+
+The COCO anchors provided by the YOLO author are in `./data/yolo_anchors.txt`; you can use those as well.
+
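`yolo_anchors.txt` stores the nine anchors as comma-separated `width,height` pairs on a single line. A minimal parser, assuming that layout (the repo's own `parse_anchors` in `utils.misc_utils` is not shown here), could look like this:

```python
def parse_anchor_text(text):
    """Turn 'w1,h1, w2,h2, ...' into a list of (width, height) tuples.

    float() ignores the surrounding spaces and the trailing newline.
    """
    values = [float(v) for v in text.split(',') if v.strip()]
    if len(values) % 2 != 0:
        raise ValueError('expected an even number of anchor values')
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


if __name__ == '__main__':
    text = '10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90,  156,198,  373,326\n'
    anchors = parse_anchor_text(text)
    print(len(anchors))  # 9
    print(anchors[6:])   # [(116.0, 90.0), (156.0, 198.0), (373.0, 326.0)]
```

In YOLOv3 the nine anchors are used three per detection scale, with the three largest assigned to the coarsest (stride 32) feature map.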
+The anchors computed by the k-means script are on the resized-image scale. The default resize method is letterbox resize, which keeps the original aspect ratio in the resized image.
+
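Letterbox resize scales the image by a single factor so that it fits inside the target size, then pads the remaining border; boxes are mapped with the same scale and offset, which is why the anchors live on the resized-image scale. The sketch below computes that geometry. It is a generic illustration under stated assumptions (centered padding, integer-truncated resized size), not the repo's `utils.data_aug.letterbox_resize`, which also performs the actual image resizing.

```python
def letterbox_params(ori_w, ori_h, new_w, new_h):
    """Return (scale, resized_w, resized_h, pad_x, pad_y) for a letterbox resize.

    The same scale is used for both axes so the aspect ratio is kept; the
    resized image is centered and the rest of new_w x new_h is padding.
    """
    scale = min(new_w / ori_w, new_h / ori_h)
    resized_w, resized_h = int(ori_w * scale), int(ori_h * scale)
    pad_x = (new_w - resized_w) // 2
    pad_y = (new_h - resized_h) // 2
    return scale, resized_w, resized_h, pad_x, pad_y


def box_to_letterbox(box, params):
    """Map an (x_min, y_min, x_max, y_max) box from the original image into the letterboxed one."""
    scale, _, _, pad_x, pad_y = params
    x_min, y_min, x_max, y_max = box
    return (x_min * scale + pad_x, y_min * scale + pad_y,
            x_max * scale + pad_x, y_max * scale + pad_y)


if __name__ == '__main__':
    params = letterbox_params(832, 416, 416, 416)
    print(params)                                         # (0.5, 416, 208, 0, 104)
    print(box_to_letterbox((100, 50, 300, 150), params))  # (50.0, 129.0, 150.0, 179.0)
```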
+#### 4.2 Training
+1. Single scale
+Use `npu_train_*p_single.sh`. The hyperparameters and their descriptions are in `args_single.py`:
+
+```shell
+bash npu_train_1p_single.sh   # 1 NPU
+# or
+bash npu_train_8p_single.sh   # 8 NPUs
+```
+
+2. Multi scale
+Use `npu_train_*p_multi.sh`. The hyperparameters and their descriptions are in `args_multi.py`:
+
+```shell
+bash npu_train_1p_multi.sh    # 1 NPU
+# or
+bash npu_train_8p_multi.sh    # 8 NPUs
+```
+
+See `args_single.py` and `args_multi.py` for more details, and set the parameters for your specific task.
+
+3. Training logs
+   1. `nohup.out`: main log of the training task
+   2. `./training/t1/D0/train_0.log`: training host log
+   3. `./training/t1/D0/training/train.log`: training performance log
+
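The learning rate in `args_single.py` and `args_multi.py` is derived from a few of these parameters: `learning_rate_base` is defined for a global batch of `learning_rate_base_batch_size = 64` images and scaled linearly with `batch_size * num_gpus` (where `num_gpus` is read from the `RANK_SIZE` environment variable), and the piecewise boundaries `pw_boundaries = [80, 90]` are converted from epochs to steps. The sketch below reproduces that arithmetic; `lr_schedule` is a hypothetical helper, and the image count in the example is made up for illustration.

```python
def lr_schedule(learning_rate_base, batch_size, num_npus, train_img_cnt,
                base_batch_size=64, pw_boundaries_epochs=(80, 90), global_step=0):
    """Mirror the learning-rate arithmetic of args_single.py / args_multi.py.

    Returns (learning_rate_init, step_boundaries, values) for the piecewise schedule.
    """
    # Linear scaling: the base rate corresponds to a global batch of base_batch_size images.
    lr_init = learning_rate_base * ((batch_size * num_npus) / base_batch_size)
    # Steps per epoch on each device, truncated to an int as in the args files.
    train_batch_num = int(float(train_img_cnt) / batch_size / num_npus)
    boundaries = [float(e) * train_batch_num + global_step for e in pw_boundaries_epochs]
    values = [lr_init, lr_init * 0.1, lr_init * 0.01]
    return lr_init, boundaries, values


if __name__ == '__main__':
    img_cnt = 100000  # hypothetical training-set size, for illustration only
    for name, base, batch in (('single', 5e-3, 32), ('multi', 75e-4, 16)):
        lr, bounds, _ = lr_schedule(base, batch, 8, img_cnt)
        print(name, '%.4f' % lr, bounds)
    # single 0.0200 [31200.0, 35100.0]
    # multi 0.0150 [62480.0, 70290.0]
```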
+### 5. Evaluation
+
+Use `eval.sh` to evaluate on the validation or test dataset:
+
+```shell
+bash eval.sh
+```
+
+See `eval.py` for details; you can set the parameters yourself.
+
+The mAP metrics are computed with the official COCO API (cocoapi).
+Use `tail -f eval_*.out` to follow the evaluation results.
+
+
+### 6. Training Results
+
+| Model                 | NPUs     | mAP (%)  | FPS       |
+| :-------------------- | :------: | :------: | :------:  |
+| single_scale          | 8        |    30.0  | 740       |
+| multi_scale           | 8        |    31.0  | 340       |
+| single_scale          | 1        |    ----  | 96        |
+| multi_scale           | 1        |    ----  | 44        |
+
+
+
+
+-------
+
+### Credits:
+
+I referred to many fantastic repos during the implementation:
+
+[YunYang1994/tensorflow-yolov3](https://github.com/YunYang1994/tensorflow-yolov3)
+
+[qqwweee/keras-yolo3](https://github.com/qqwweee/keras-yolo3)
+
+[eriklindernoren/PyTorch-YOLOv3](https://github.com/eriklindernoren/PyTorch-YOLOv3)
+
+[pjreddie/darknet](https://github.com/pjreddie/darknet)
+
+[dmlc/gluon-cv](https://github.com/dmlc/gluon-cv/tree/master/scripts/detection/yolo)
+
@@ -0,0 +1,110 @@
+# coding: utf-8
+# This file contains the parameters used in train.py
+
+from __future__ import division, print_function
+
+from utils.misc_utils import parse_anchors, read_class_names
+import math
+import os
+
+
+save_dir =          './training/'  # The directory of the weights to save.
+log_dir =           './training/logs/'  # The directory to store the tensorboard log files.
+progress_log_path = './training/train.log'  # The path to record the training progress.
+# save_dir = os.path.join(work_path, save_dir)
+# log_dir = os.path.join(work_path, log_dir)
+# progress_log_path = os.path.join(work_path, progress_log_path)
+
+if not os.path.exists(save_dir):
+    os.makedirs(save_dir)
+if not os.path.exists(log_dir):
+    os.makedirs(log_dir)
+
+
+work_path = os.path.realpath(__file__+"/..")
+### Some paths
+train_file =        os.path.realpath(os.path.join(work_path, './data/coco2014_trainval.txt'))  # The path of the training txt file.
+val_file =          os.path.realpath(os.path.join(work_path, './data/coco2014_minival.txt'))  # The path of the validation txt file.
+restore_path =      os.path.realpath(os.path.join(work_path, './data/darknet_weights/darknet53.ckpt'))  # The path of the weights to restore.
+anchor_path =       os.path.realpath(os.path.join(work_path, './data/yolo_anchors.txt'))  # The path of the anchor txt file.
+class_name_path =   os.path.realpath(os.path.join(work_path, './data/coco.names'))  # The path of the class names.
+
+### Distribution setting
+num_gpus=int(os.environ['RANK_SIZE'])
+iterations_per_loop=10
+
+### Training-related numbers
+
+batch_size = 16
+img_size = [608, 608]  # Images will be resized to `img_size` and fed to the network, size format: [width, height]
+letterbox_resize = True  # Whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
+total_epoches = 200
+train_evaluation_step = 1000  # Evaluate on the training batch after some steps.
+val_evaluation_epoch = 2  # Evaluate on the whole validation dataset after some epochs. Set to None to evaluate every epoch.
+save_epoch = 10  # Save the model after some epochs.
+batch_norm_decay = 0.99  # decay in bn ops
+weight_decay = 5e-4  # l2 weight decay
+global_step = 0  # used when resuming training
+
+### tf.data parameters
+num_threads = 8  # Number of threads for image processing used in tf.data pipeline.
+prefetech_buffer = batch_size * 4  # Prefetch buffer size used in the tf.data pipeline.
+
+### Learning rate and optimizer
+optimizer_name = 'momentum'  # Chosen from [sgd, momentum, adam, rmsprop]
+save_optimizer = True  # Whether to save the optimizer parameters into the checkpoint file.
+learning_rate_base = 75e-4
+learning_rate_base_batch_size = 64
+learning_rate_init = learning_rate_base * ((batch_size * num_gpus) / learning_rate_base_batch_size)
+lr_type = 'piecewise'  # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
+lr_decay_epoch = 5  # Epochs after which learning rate decays. Int or float. Used when chosen `exponential` and `cosine_decay_restart` lr_type.
+lr_decay_factor = 0.96  # The learning rate decay factor. Used when chosen `exponential` lr_type.
+lr_lower_bound = 1e-6  # The minimum learning rate.
+# only used in piecewise lr type
+pw_boundaries = [80, 90]  # epoch based boundaries
+pw_values = [learning_rate_init, learning_rate_init*0.1, learning_rate_init*0.01]
+
+### Load and finetune
+# Choose the parts you want to restore the weights. List form.
+# restore_include: None, restore_exclude: None  => restore the whole model
+# restore_include: None, restore_exclude: scope  => restore the whole model except `scope`
+# restore_include: scope1, restore_exclude: scope2  => if scope1 contains scope2, restore scope1 and not restore scope2 (scope1 - scope2)
+# choice 1: only restore the darknet body
+# restore_include = ['yolov3/darknet53_body']
+restore_exclude = None
+# choice 2: restore all layers except the last 3 conv2d layers in the 3 scales
+restore_include = None
+# restore_exclude = ['yolov3/yolov3_head/Conv_14', 'yolov3/yolov3_head/Conv_6', 'yolov3/yolov3_head/Conv_22']
+# restore_exclude = None
+# Choose the parts you want to finetune. List form.
+# Set to None to train the whole model.
+# update_part = ['yolov3/yolov3_head']
+update_part = None
+
+### other training strategies
+multi_scale_train = True  # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
+use_label_smooth = False # Whether to use class label smoothing strategy.
+use_focal_loss = False  # Whether to apply focal loss on the conf loss.
+use_mix_up = False  # Whether to use mix up data augmentation strategy.
+use_warm_up = True  # whether to use warm up strategy to prevent from gradient exploding.
+warm_up_epoch = min(total_epoches*0.1, 3)  # Warm up training epoches. Set to a larger value if gradient explodes.
+
+### some constants in validation
+# nms
+nms_threshold = 0.5  # iou threshold in nms operation
+score_threshold = 0.001  # threshold of the probability of the classes in nms operation, i.e. score = pred_confs * pred_probs. set lower for higher recall.
+nms_topk = 100  # keep at most nms_topk outputs after nms
+# mAP eval
+eval_threshold = 0.5  # the iou threshold applied in mAP evaluation
+use_voc_07_metric = False  # whether to use voc 2007 evaluation metric, i.e. the 11-point metric
+
+### parse some params
+anchors = parse_anchors(anchor_path)
+classes = read_class_names(class_name_path)
+class_num = len(classes)
+train_img_cnt = len(open(train_file, 'r').readlines())
+val_img_cnt = len(open(val_file, 'r').readlines())
+train_batch_num = int(float(train_img_cnt) / batch_size / num_gpus)
+
+lr_decay_freq = int(train_batch_num * lr_decay_epoch)
+pw_boundaries = [float(i) * train_batch_num + global_step for i in pw_boundaries]
@@ -0,0 +1,105 @@
+# coding: utf-8
+# This file contains the parameters used in train.py
+
+from __future__ import division, print_function
+
+from utils.misc_utils import parse_anchors, read_class_names
+import math
+import os
+
+save_dir =          './training/'  # The directory of the weights to save.
+log_dir =           './training/logs/'  # The directory to store the tensorboard log files.
+progress_log_path = './training/train.log'  # The path to record the training progress.
+
+if not os.path.exists(save_dir):
+    os.makedirs(save_dir)
+if not os.path.exists(log_dir):
+    os.makedirs(log_dir)
+
+
+work_path = os.path.realpath(__file__+"/..")
+### Some paths
+train_file =        os.path.realpath(os.path.join(work_path, './data/coco2014_trainval.txt'))  # The path of the training txt file.
+val_file =          os.path.realpath(os.path.join(work_path, './data/coco2014_minival.txt'))  # The path of the validation txt file.
+restore_path =      os.path.realpath(os.path.join(work_path, './data/darknet_weights/darknet53.ckpt'))  # The path of the weights to restore.
+anchor_path =       os.path.realpath(os.path.join(work_path, './data/yolo_anchors.txt'))  # The path of the anchor txt file.
+class_name_path =   os.path.realpath(os.path.join(work_path, './data/coco.names'))  # The path of the class names.
+
+### Distribution setting
+num_gpus=int(os.environ['RANK_SIZE'])
+iterations_per_loop=10
+
+### Training-related numbers
+
+batch_size = 32
+img_size = [416, 416]  # Images will be resized to `img_size` and fed to the network, size format: [width, height]
+letterbox_resize = True  # Whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
+total_epoches = 200
+train_evaluation_step = 1000  # Evaluate on the training batch after some steps.
+val_evaluation_epoch = 2  # Evaluate on the whole validation dataset after some epochs. Set to None to evaluate every epoch.
+save_epoch = 10  # Save the model after some epochs.
+batch_norm_decay = 0.99  # decay in bn ops
+weight_decay = 5e-4  # l2 weight decay
+global_step = 0  # used when resuming training
+
+### tf.data parameters
+num_threads = 8  # Number of threads for image processing used in tf.data pipeline.
+prefetech_buffer = batch_size * 4  # Prefetch buffer size used in the tf.data pipeline.
+
+### Learning rate and optimizer
+optimizer_name = 'momentum'  # Chosen from [sgd, momentum, adam, rmsprop]
+save_optimizer = True  # Whether to save the optimizer parameters into the checkpoint file.
+learning_rate_base = 5e-3
+learning_rate_base_batch_size = 64
+learning_rate_init = learning_rate_base * ((batch_size * num_gpus) / learning_rate_base_batch_size)
+lr_type = 'piecewise'  # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
+lr_decay_epoch = 5  # Epochs after which learning rate decays. Int or float. Used when chosen `exponential` and `cosine_decay_restart` lr_type.
+lr_decay_factor = 0.96  # The learning rate decay factor. Used when chosen `exponential` lr_type.
+lr_lower_bound = 1e-6  # The minimum learning rate.
+# only used in piecewise lr type
+pw_boundaries = [80, 90]  # epoch based boundaries
+pw_values = [learning_rate_init, learning_rate_init*0.1, learning_rate_init*0.01]
+
+### Load and finetune
+# Choose the parts you want to restore the weights. List form.
+# restore_include: None, restore_exclude: None  => restore the whole model
+# restore_include: None, restore_exclude: scope  => restore the whole model except `scope`
+# restore_include: scope1, restore_exclude: scope2  => if scope1 contains scope2, restore scope1 and not restore scope2 (scope1 - scope2)
+# choice 1: only restore the darknet body
+# restore_include = ['yolov3/darknet53_body']
+restore_exclude = None
+# choice 2: restore all layers except the last 3 conv2d layers in the 3 scales
+restore_include = None
+# restore_exclude = ['yolov3/yolov3_head/Conv_14', 'yolov3/yolov3_head/Conv_6', 'yolov3/yolov3_head/Conv_22']
+# Choose the parts you want to finetune. List form.
+# Set to None to train the whole model.
+# update_part = ['yolov3/yolov3_head']
+update_part = None
+
+### other training strategies
+multi_scale_train = False  # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
+use_label_smooth = False # Whether to use class label smoothing strategy.
+use_focal_loss = False  # Whether to apply focal loss on the conf loss.
+use_mix_up = False  # Whether to use mix up data augmentation strategy.
+use_warm_up = True  # whether to use warm up strategy to prevent from gradient exploding.
+warm_up_epoch = min(total_epoches*0.1, 3)  # Warm up training epoches. Set to a larger value if gradient explodes.
+
+### some constants in validation
+# nms
+nms_threshold = 0.5  # iou threshold in nms operation
+score_threshold = 0.001  # threshold of the probability of the classes in nms operation, i.e. score = pred_confs * pred_probs. set lower for higher recall.
+nms_topk = 100  # keep at most nms_topk outputs after nms
+# mAP eval
+eval_threshold = 0.5  # the iou threshold applied in mAP evaluation
+use_voc_07_metric = False  # whether to use voc 2007 evaluation metric, i.e. the 11-point metric
+
+### parse some params
+anchors = parse_anchors(anchor_path)
+classes = read_class_names(class_name_path)
+class_num = len(classes)
+train_img_cnt = len(open(train_file, 'r').readlines())
+val_img_cnt = len(open(val_file, 'r').readlines())
+train_batch_num = int(float(train_img_cnt) / batch_size / num_gpus)
+
+lr_decay_freq = int(train_batch_num * lr_decay_epoch)
+pw_boundaries = [float(i) * train_batch_num + global_step for i in pw_boundaries]
@@ -0,0 +1,113 @@
+import json,cv2
+from collections import defaultdict
+
+# ./data/5k.txt lists the 5k minival images, one path per line.
+ban_path = './data/5k.txt'
+with open(ban_path, 'r') as f:
+    ban_list = f.read().split('\n')[:-1]
+    ban_list = [i.split('/')[-1] for i in ban_list]
+
+name_box_id = defaultdict(list)
+id_name = dict()
+
+coco_dataset_path = '/opt/npu/dataset/coco/coco2014'
+
+# Collect the boxes of every image from both the train2014 and val2014 annotations.
+for split in ('train2014', 'val2014'):
+    with open(coco_dataset_path + '/annotations/instances_%s.json' % split,
+              encoding='utf-8') as f:
+        data = json.load(f)
+    for ant in data['annotations']:
+        image_id = ant['image_id']
+        name = coco_dataset_path + '/%s/COCO_%s_%012d.jpg' % (split, split, image_id)
+        cat = ant['category_id']
+
+        # Map the 80 non-contiguous COCO category ids (1-90) to labels 0-79.
+        if cat >= 1 and cat <= 11:
+            cat = cat - 1
+        elif cat >= 13 and cat <= 25:
+            cat = cat - 2
+        elif cat >= 27 and cat <= 28:
+            cat = cat - 3
+        elif cat >= 31 and cat <= 44:
+            cat = cat - 5
+        elif cat >= 46 and cat <= 65:
+            cat = cat - 6
+        elif cat == 67:
+            cat = cat - 7
+        elif cat == 70:
+            cat = cat - 9
+        elif cat >= 72 and cat <= 82:
+            cat = cat - 10
+        elif cat >= 84 and cat <= 90:
+            cat = cat - 11
+
+        # COCO bbox format: [x_min, y_min, width, height]
+        name_box_id[name].append([ant['bbox'], cat])
+
+f = open('data/coco2014_minival.txt', 'w')
+ii = 0
+for idx, key in enumerate(name_box_id.keys()):
+    if key.split('/')[-1] not in ban_list:
+        continue
+
+    print('5k', key.split('/')[-1])
+
+    f.write('%d '%ii)
+    ii += 1
+    f.write(key)
+
+    img = cv2.imread(key)
+    if img is None:
+        raise IOError('Cannot read image: %s' % key)
+    h, w, _ = img.shape
+
+    f.write(' %d %d'%(w,h))
+
+    box_infos = name_box_id[key]
+    for info in box_infos:
+        x_min = int(info[0][0])
+        y_min = int(info[0][1])
+        x_max = x_min + int(info[0][2])
+        y_max = y_min + int(info[0][3])
+
+        box_info = " %d %d %d %d %d" % (
+            int(info[1]), x_min, y_min, x_max, y_max
+        )
+        f.write(box_info)
+    f.write('\n')
+f.close()
@@ -0,0 +1,113 @@
+import json,cv2
+from collections import defaultdict
+
+# ./data/5k.txt lists the 5k minival images, one path per line; they are excluded from trainval.
+ban_path = './data/5k.txt'
+with open(ban_path, 'r') as f:
+    ban_list = f.read().split('\n')[:-1]
+    ban_list = [i.split('/')[-1] for i in ban_list]
+
+name_box_id = defaultdict(list)
+id_name = dict()
+
+coco_dataset_path = '/opt/npu/dataset/coco/coco2014'
+
+# Collect the boxes of every image from both the train2014 and val2014 annotations.
+for split in ('train2014', 'val2014'):
+    with open(coco_dataset_path + '/annotations/instances_%s.json' % split,
+              encoding='utf-8') as f:
+        data = json.load(f)
+    for ant in data['annotations']:
+        image_id = ant['image_id']
+        name = coco_dataset_path + '/%s/COCO_%s_%012d.jpg' % (split, split, image_id)
+        cat = ant['category_id']
+
+        # Map the 80 non-contiguous COCO category ids (1-90) to labels 0-79.
+        if cat >= 1 and cat <= 11:
+            cat = cat - 1
+        elif cat >= 13 and cat <= 25:
+            cat = cat - 2
+        elif cat >= 27 and cat <= 28:
+            cat = cat - 3
+        elif cat >= 31 and cat <= 44:
+            cat = cat - 5
+        elif cat >= 46 and cat <= 65:
+            cat = cat - 6
+        elif cat == 67:
+            cat = cat - 7
+        elif cat == 70:
+            cat = cat - 9
+        elif cat >= 72 and cat <= 82:
+            cat = cat - 10
+        elif cat >= 84 and cat <= 90:
+            cat = cat - 11
+
+        # COCO bbox format: [x_min, y_min, width, height]
+        name_box_id[name].append([ant['bbox'], cat])
+
+f = open('data/coco2014_trainval.txt', 'w')
+ii = 0
+for idx, key in enumerate(name_box_id.keys()):
+    if key.split('/')[-1] in ban_list:
+        continue
+
+    print('trainval', key.split('/')[-1])
+
+    f.write('%d '%ii)
+    ii += 1
+    f.write(key)
+
+    img = cv2.imread(key)
+    if img is None:
+        raise IOError('Cannot read image: %s' % key)
+    h, w, _ = img.shape
+
+    f.write(' %d %d'%(w,h))
+
+    box_infos = name_box_id[key]
+    for info in box_infos:
+        x_min = int(info[0][0])
+        y_min = int(info[0][1])
+        x_max = x_min + int(info[0][2])
+        y_max = y_min + int(info[0][3])
+
+        box_info = " %d %d %d %d %d" % (
+            int(info[1]), x_min, y_min, x_max, y_max
+        )
+        f.write(box_info)
+    f.write('\n')
+f.close()
@@ -0,0 +1,38 @@
+# coding: utf-8
+# for more details about the yolo darknet weights file, refer to
+# https://itnext.io/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe
+
+from __future__ import division, print_function
+
+import os
+import sys
+import tensorflow as tf
+import numpy as np
+
+from model import yolov3
+from utils.misc_utils import parse_anchors, load_weights
+
+num_class = 80
+img_size = 416
+weight_path = '../yolov3-tf2/data/darknet53.conv.74'
+save_path = './data/darknet_weights/darknet53.ckpt'
+anchors = parse_anchors('./data/yolo_anchors.txt')
+
+model = yolov3(num_class, anchors)
+with tf.Session() as sess:
+    inputs = tf.placeholder(tf.float32, [1, img_size, img_size, 3])
+
+    with tf.variable_scope('yolov3'):
+        feature_map = model.forward(inputs)
+
+    saver = tf.train.Saver(var_list=tf.global_variables(scope='yolov3'))
+
+    load_ops = load_weights(tf.global_variables(scope='yolov3'), weight_path)
+
+    sess.run(tf.global_variables_initializer())
+    sess.run(load_ops)
+    saver.save(sess, save_path=save_path)
+    print('TensorFlow model checkpoint has been saved to {}'.format(save_path))
+
+
+
@@ -0,0 +1,80 @@
+person
+bicycle
+car
+motorbike
+aeroplane
+bus
+train
+truck
+boat
+traffic light
+fire hydrant
+stop sign
+parking meter
+bench
+bird
+cat
+dog
+horse
+sheep
+cow
+elephant
+bear
+zebra
+giraffe
+backpack
+umbrella
+handbag
+tie
+suitcase
+frisbee
+skis
+snowboard
+sports ball
+kite
+baseball bat
+baseball glove
+skateboard
+surfboard
+tennis racket
+bottle
+wine glass
+cup
+fork
+knife
+spoon
+bowl
+banana
+apple
+sandwich
+orange
+broccoli
+carrot
+hot dog
+pizza
+donut
+cake
+chair
+sofa
+pottedplant
+bed
+diningtable
+toilet
+tvmonitor
+laptop
+mouse
+remote
+keyboard
+cell phone
+microwave
+oven
+toaster
+sink
+refrigerator
+book
+clock
+vase
+scissors
+teddy bear
+hair drier
+toothbrush
@@ -0,0 +1 @@
+10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90,  156,198,  373,326
@@ -0,0 +1,220 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import tensorflow as tf
+import numpy as np
+import argparse
+import cv2
+
+from utils.misc_utils import parse_anchors, read_class_names
+from utils.nms_utils import gpu_nms, cpu_nms
+from utils.plot_utils import get_color_table, plot_one_box
+from utils.data_aug import letterbox_resize
+
+from model import yolov3
+from tqdm import trange
+import json
+import os,time
+
+# npu modified
+from npu_bridge.estimator import npu_ops
+from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
+from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
+from npu_bridge.estimator.npu import util
+
+'''
+coco weight from official checked 
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.309
+ Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.555
+ Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.311
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.136
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.337
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.460
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.273
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.430
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.465
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.270
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.511
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.629
+
+'''
+
+parser = argparse.ArgumentParser(description="YOLO-V3 evaluation on an annotation txt file.")
+parser.add_argument("--annotation_txt", type=str, default='../code/data/coco2014_minival.txt',
+                    help="The path of the annotation label txt file.")
+parser.add_argument("--anchor_path", type=str, default="../code/data/yolo_anchors.txt",
+                    help="The path of the anchor txt file.")
+parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
+                    help="Resize the input image with `new_size`, size format: [width, height]")
+parser.add_argument("--max_test", type=int, default=-1,
+                    help="max step for test")
+parser.add_argument("--score_thresh", type=float, default=1e-3,
+                    help="score_threshold for test")
+parser.add_argument("--nms_thresh", type=float, default=0.5,
+                    help="iou_threshold for test")
+parser.add_argument("--max_boxes", type=int, default=100,
+                    help="max_boxes for test")
+parser.add_argument("--letterbox_resize", type=lambda x: (str(x).lower() == 'true'), default=True,
+                    help="Whether to use the letterbox resize.")
+parser.add_argument("--class_name_path", type=str, default="../code/data/coco.names",
+                    help="The path of the class names.")
+parser.add_argument("--restore_path", type=str, default="../code/data/darknet_weights/yolo3.ckpt",
+                    # parser.add_argument("--restore_path", type=str, default="./training_s2/checkpoint_dir/model.ckpt-45800",
+                    help="The path of the weights to restore.")
+parser.add_argument("--save_img", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to save the detection-result images.")
+parser.add_argument("--save_json", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to save the detection results as COCO-style json.")
+parser.add_argument("--save_json_path", type=str, default="../result/result.json",
+                    help="The path of the result.json.")
+args = parser.parse_args()
+
+args.anchors = parse_anchors(args.anchor_path)
+args.classes = read_class_names(args.class_name_path)
+args.num_class = len(args.classes)
+
+color_table = get_color_table(args.num_class)
+cat_id_to_real_id = \
+    {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16,
+     18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30,
+     35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44,
+     50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58,
+     64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72,
+     82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
+real_id_to_cat_id = {cat_id_to_real_id[i]: i for i in cat_id_to_real_id}
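The hard-coded `cat_id_to_real_id` table maps the 80 non-contiguous COCO category ids (1 to 90, with ten unused ids) onto contiguous ids 1 to 80 in sorted order. A minimal sketch of deriving the same table from a list of category ids, assuming the ids are available as a Python list (for example from pycocotools' `COCO.getCatIds()`); `build_category_maps` is a hypothetical helper, not part of this repo:

```python
def build_category_maps(coco_cat_ids):
    """Map sorted, non-contiguous COCO category ids to contiguous ids 1..N and back.

    Hypothetical helper: coco_cat_ids is any iterable of integer category ids.
    """
    cat_id_to_real_id = {cat_id: real_id
                         for real_id, cat_id in enumerate(sorted(coco_cat_ids), start=1)}
    real_id_to_cat_id = {real_id: cat_id for cat_id, real_id in cat_id_to_real_id.items()}
    return cat_id_to_real_id, real_id_to_cat_id


# The 80 COCO detection category ids: 1..90 minus the 10 ids that are not used.
unused = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
coco_ids = [i for i in range(1, 91) if i not in unused]
to_real, to_cat = build_category_maps(coco_ids)
print(to_real[13], to_cat[80])  # 12 90
```

Deriving the table this way avoids typos in a long literal; the hard-coded version has the advantage of not needing the annotation file at start-up.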
+
+
+def get_default_dict():
+    return {"image_id": -1, "category_id": -1, "bbox": [], "score": 0}
+
+
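+eval_path = args.annotation_txt
+with open(eval_path, 'r') as f:
+    # skip empty lines, including a possible trailing newline at the end of the file
+    eval_file_list = [line for line in f.read().splitlines() if line.strip()]
+    print('number of annotation lines: ', len(eval_file_list))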
+eval_file_dict = {}
+for i in eval_file_list:
+    tmp_list = i.split(' ')
+    idx = int(tmp_list[0])
+    path = tmp_list[1]
+    w = float(tmp_list[2])
+    h = float(tmp_list[3])
+    bbox_len = len(tmp_list[4:]) // 5
+    bbox = []
+    for bbox_idx in range(bbox_len):
+        label, x1, y1, x2, y2 = tmp_list[4:][bbox_idx * 5:bbox_idx * 5 + 5]
+        bbox.append([label, x1, y1, x2, y2])
+    eval_file_dict[idx] = {
+        'path': path,
+        'w': w,
+        'h': h,
+        'bbox': bbox
+    }
+
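Each annotation line has the form `<idx> <image_path> <width> <height>` followed by groups of five fields `<label> <x1> <y1> <x2> <y2>`, one group per box. A minimal sketch of parsing one such line into the same dict layout as `eval_file_dict`; `parse_annotation_line` is a hypothetical name, and unlike the loop above it rejects a trailing incomplete box group instead of silently dropping it:

```python
def parse_annotation_line(line):
    """Parse one annotation txt line into (idx, record) as used by the eval loop.

    Line format: <idx> <image_path> <width> <height> [<label> <x1> <y1> <x2> <y2>]...
    Label and coordinates are kept as strings, like the original loop does.
    """
    fields = line.split()
    idx, path = int(fields[0]), fields[1]
    w, h = float(fields[2]), float(fields[3])
    box_fields = fields[4:]
    if len(box_fields) % 5:
        raise ValueError('box fields must come in groups of 5: %r' % line)
    bbox = [box_fields[i:i + 5] for i in range(0, len(box_fields), 5)]
    return idx, {'path': path, 'w': w, 'h': h, 'bbox': bbox}


idx, rec = parse_annotation_line('0 xxx/xxx/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268')
print(idx, rec['w'], rec['bbox'])
```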
+config = tf.ConfigProto()
+custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
+custom_op.name = "NpuOptimizer"
+custom_op.parameter_map["use_off_line"].b = True  # run on Ascend chips
+config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
+
+json_out = []
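When `--restore_path` is a directory rather than a checkpoint prefix, the script reads the directory's `checkpoint` index file to find the latest checkpoint. A minimal sketch of that resolution step as a standalone, testable function; `resolve_checkpoint` is a hypothetical name, and in TensorFlow 1.x `tf.train.latest_checkpoint(checkpoint_dir)` performs a similar lookup:

```python
import os


def resolve_checkpoint(restore_path):
    """Return a checkpoint prefix, reading `<dir>/checkpoint` when given a directory.

    Mirrors the eval script: a path containing '.ckpt' or 'model-' is used as-is;
    otherwise the first line of the TensorFlow 'checkpoint' index file, e.g.
    model_checkpoint_path: "model.ckpt-45800", names the latest checkpoint.
    """
    if '.ckpt' in restore_path or 'model-' in restore_path:
        return restore_path
    with open(os.path.join(restore_path, 'checkpoint'), 'r') as f:
        first_line = f.readline()
    # split on the first ':' only, then drop the surrounding quotes
    name = first_line.split(':', 1)[1].strip().strip('"')
    # os.path.join returns `name` unchanged if it is already an absolute path
    return os.path.join(restore_path, name)
```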
+with tf.Session(config=config) as sess:
+    input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
+    yolo_model = yolov3(args.num_class, args.anchors)
+    with tf.variable_scope('yolov3'):
+        pred_feature_maps = yolo_model.forward(input_data, False)
+    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
+
+    pred_scores = pred_confs * pred_probs
+
+    saver = tf.train.Saver()
+    # if restore_path is a directory, read the latest checkpoint prefix from its 'checkpoint' index file
+    if args.restore_path.find('.ckpt') < 0 and args.restore_path.find('model-') < 0:
+        with open(os.path.join(args.restore_path, 'checkpoint'), 'r') as f:
+            tmp_checkpoint = f.readline()
+            # the first line looks like: model_checkpoint_path: "model.ckpt-45800"
+            tmp_checkpoint = tmp_checkpoint.replace('"', '').split(':', 1)[1].strip()
+            args.restore_path = os.path.join(args.restore_path, tmp_checkpoint)
+            print('tmp_checkpoint: ', tmp_checkpoint)
+
+    saver.restore(sess, args.restore_path)
+
+    if args.max_test > 0:
+        test_len = min(args.max_test, len(eval_file_dict.keys()))
+    else:
+        test_len = len(eval_file_dict.keys())
+    for test_idx in trange(test_len):
+        img_path = eval_file_dict[test_idx]['path']
+        img_ori = cv2.imread(img_path)
+        if args.letterbox_resize:
+            img, resize_ratio, dw, dh = letterbox_resize(img_ori, args.new_size[0], args.new_size[1])
+        else:
+            height_ori, width_ori = img_ori.shape[:2]
+            img = cv2.resize(img_ori, tuple(args.new_size))
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        img = np.asarray(img, np.float32)
+        img = img[np.newaxis, :] / 255.
+
+        boxes_, scores_ = sess.run([pred_boxes, pred_scores], feed_dict={input_data: img})
+        boxes_, scores_, labels_ = cpu_nms(boxes_, scores_, args.num_class, args.max_boxes, args.score_thresh, args.nms_thresh)
+
+        # rescale the coordinates to the original image
+        if args.letterbox_resize:
+            boxes_[:, [0, 2]] = (boxes_[:, [0, 2]] - dw) / resize_ratio
+            boxes_[:, [1, 3]] = (boxes_[:, [1, 3]] - dh) / resize_ratio
+        else:
+            boxes_[:, [0, 2]] *= (width_ori / float(args.new_size[0]))
+            boxes_[:, [1, 3]] *= (height_ori / float(args.new_size[1]))
+
+        if args.save_img:
+            for i in range(len(boxes_)):
+                x0, y0, x1, y1 = boxes_[i]
+                plot_one_box(img_ori, [x0, y0, x1, y1],
+                             label=args.classes[labels_[i]] + ', {:.2f}%'.format(scores_[i] * 100),
+                             color=color_table[labels_[i]])
+            os.makedirs('tmp', exist_ok=True)  # cv2.imwrite fails silently if the directory is missing
+            cv2.imwrite('tmp/%d_detection_result.jpg' % test_idx, img_ori)
+            print('%d done' % test_idx)
+
+        if args.save_json:
+            for i in range(len(boxes_)):
+                x0, y0, x1, y1 = boxes_[i]
+                bw = x1 - x0
+                bh = y1 - y0
+                s = scores_[i]
+                c = labels_[i]
+                t_dict = get_default_dict()
+                t_dict['image_id'] = int(img_path.split('/')[-1].split('.')[0].split('_')[-1])
+                t_dict['category_id'] = real_id_to_cat_id[int(c) + 1]
+                t_dict['bbox'] = [int(i) for i in [x0, y0, bw, bh]]
+                t_dict['score'] = float(s)
+                json_out.append(t_dict)
+
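When `--letterbox_resize` is on, detections come back in the padded `new_size` frame and are mapped back with `(x - dw) / resize_ratio`. A minimal sketch of the forward and inverse transform, assuming the usual letterbox convention of scaling by `min(new_w / w, new_h / h)` and centering the resized image with paddings `dw`, `dh`; the repo's own `letterbox_resize` lives in utils and is not shown here, so its exact rounding may differ:

```python
import numpy as np


def letterbox_params(width, height, new_width, new_height):
    """Scale factor and left/top padding for an aspect-preserving, centered resize."""
    resize_ratio = min(new_width / width, new_height / height)
    resize_w, resize_h = int(resize_ratio * width), int(resize_ratio * height)
    dw = (new_width - resize_w) // 2
    dh = (new_height - resize_h) // 2
    return resize_ratio, dw, dh


def to_letterbox(boxes, resize_ratio, dw, dh):
    """Map [x0, y0, x1, y1] boxes from the original image into the letterboxed frame."""
    boxes = np.asarray(boxes, dtype=np.float64).copy()
    boxes[:, [0, 2]] = boxes[:, [0, 2]] * resize_ratio + dw
    boxes[:, [1, 3]] = boxes[:, [1, 3]] * resize_ratio + dh
    return boxes


def from_letterbox(boxes, resize_ratio, dw, dh):
    """Inverse mapping, the same arithmetic as the eval loop's rescaling step."""
    boxes = np.asarray(boxes, dtype=np.float64).copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - dw) / resize_ratio
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - dh) / resize_ratio
    return boxes


ratio, dw, dh = letterbox_params(1920, 1080, 416, 416)
orig = np.array([[453., 369., 473., 391.]])
restored = from_letterbox(to_letterbox(orig, ratio, dw, dh), ratio, dw, dh)
assert np.allclose(restored, orig)
```

For a 1920x1080 image resized to 416x416, the width fills the frame, so `dw` is 0 and all padding goes to the top and bottom.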
+if args.save_json:
+    with open(args.save_json_path, 'w') as f:
+        json.dump(json_out, f)
+    print('output json saved to: ', args.save_json_path)
+    eval_coco = os.path.realpath(__file__ + "/../eval_coco.py")
+    os.system('python3.7 %s %s' % (eval_coco, args.save_json_path))
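With `--save_json True`, each detection is written as a COCO "results" entry: the image id is parsed from the file name, the 0-based model label is shifted to a contiguous id and mapped back to the COCO category id, and the box is converted from corners to `[x, y, width, height]`. A minimal sketch of that conversion, assuming COCO-style file names such as `COCO_val2014_000000000042.jpg`; `to_coco_record` is a hypothetical helper, and it keeps float coordinates where the script truncates them with `int()`:

```python
def to_coco_record(img_path, box, score, label, real_id_to_cat_id):
    """Build one COCO 'results' entry from an [x0, y0, x1, y1] box.

    Assumes the trailing number of the file name is the COCO image id and that
    model labels are 0-based while real_id_to_cat_id keys start at 1.
    """
    x0, y0, x1, y1 = [float(v) for v in box]
    image_id = int(img_path.split('/')[-1].split('.')[0].split('_')[-1])
    return {
        'image_id': image_id,
        'category_id': real_id_to_cat_id[int(label) + 1],  # model label 0 -> real id 1
        'bbox': [x0, y0, x1 - x0, y1 - y0],  # COCO uses [x, y, width, height]
        'score': float(score),
    }
```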
@@ -0,0 +1,61 @@
+#!/bin/bash
+
+
+
+# setting main path
+MAIN_PATH=$(dirname $(readlink -f $0))
+
+
+# set env
+export ASCEND_HOME=/usr/local/Ascend
+export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/
+export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/te:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/topi:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/hccl:/usr/local/Ascend/ascend-toolkit/latest/tfplugin/python/site-packages:$MAIN_PATH
+export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin
+export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
+
+export DDK_VERSION_FLAG=1.60.T49.0.B201
+export NEW_GE_FE_ID=1
+export GE_AICPU_FLAG=1
+export SOC_VERSION=Ascend910
+
+export JOB_ID=10087
+export FUSION_TENSOR_SIZE=1000000000
+#export SLOG_PRINT_TO_STDOUT=1
+#export DUMP_GE_GRAPH=2
+#export DUMP_GRAPH_LEVEL=3
+
+
+
+for((RANK_ID=0;RANK_ID<8;RANK_ID++));
+do
+
+export RANK_ID=$RANK_ID
+export RANK_SIZE=1
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+
+su HwHiAiUser -c "adc --host 0.0.0.0:22118 --log \"SetLogLevel(0)[debug]\" --device $RANK_ID"
+
+RESTORE_PATH=./training/t1/D$RANK_ID/training/
+
+nohup python3.7 eval.py \
+--save_json True \
+--score_thresh 0.0001 \
+--nms_thresh 0.55 \
+--max_boxes 100 \
+--restore_path $RESTORE_PATH \
+--max_test 10000 \
+--save_json_path eval_res_D$RANK_ID.json > eval_$RANK_ID.out &
+
+
+done
+
+
@@ -0,0 +1,57 @@
+#-*- coding:utf-8 -*-
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval
+import json
+import sys
+
+def get_img_id(file_name):
+    """Return the set of image ids that appear in a COCO-style detection json."""
+    with open(file_name, 'r') as f:
+        annos = json.load(f)
+    return {anno['image_id'] for anno in annos}
+
+
+'''
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.317
+ Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.562
+ Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.321
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.162
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.448
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.278
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.438
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.464
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.497
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.625
+'''
+
+if __name__ == '__main__':
+    annType = ['segm', 'bbox', 'keypoints']  # set iouType to 'segm', 'bbox' or 'keypoints'
+    annType = annType[1]  # specify type here
+    cocoGt_file = '/opt/npu/dataset/coco/coco2014/annotations/instances_val2014.json'  # change to the local dataset path
+    cocoGt = COCO(cocoGt_file)  # load the ground-truth COCO annotation json
+    cocoDt_file = sys.argv[1]  # detection-result json produced by the eval script
+
+    imgIds = get_img_id(cocoDt_file)
+    cocoDt = cocoGt.loadRes(cocoDt_file)  # load the detection results
+    imgIds = sorted(imgIds)  # sort the image ids
+    cocoEval = COCOeval(cocoGt, cocoDt, annType)
+    cocoEval.params.imgIds = imgIds  # only evaluate the images that appear in the detection results
+    cocoEval.evaluate()  # per-image evaluation
+    cocoEval.accumulate()  # accumulate the per-image results
+    cocoEval.summarize()  # print the AP/AR summary
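The first summary line, AP @[IoU=0.50:0.95], averages precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A simplified sketch of that averaging, assuming per-threshold AP values are already available; the threshold construction follows how pycocotools builds `Params.iouThrs` (worth checking against the installed version), and treating `-1` as "no data" mirrors how `summarize()` skips missing entries. `coco_style_ap` is a hypothetical helper, and the real `summarize()` averages over categories and recall points as well:

```python
import numpy as np

# Ten IoU thresholds: 0.50, 0.55, ..., 0.95
iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)


def coco_style_ap(ap_per_threshold):
    """AP@[.5:.95]: mean of per-threshold AP values, ignoring entries marked -1 (no data)."""
    ap = np.asarray(ap_per_threshold, dtype=np.float64)
    valid = ap[ap > -1]
    return float(valid.mean()) if valid.size else -1.0
```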
@@ -0,0 +1,155 @@
+# coding: utf-8
+# This script is modified from https://github.com/lars76/kmeans-anchor-boxes
+
+from __future__ import division, print_function
+
+import numpy as np
+
+def iou(box, clusters):
+    """
+    Calculates the Intersection over Union (IoU) between a box and k clusters.
+    param:
+        box: tuple or array, shifted to the origin (i.e. width and height)
+        clusters: numpy array of shape (k, 2) where k is the number of clusters
+    return:
+        numpy array of shape (k,) where k is the number of clusters
+    """
+    x = np.minimum(clusters[:, 0], box[0])
+    y = np.minimum(clusters[:, 1], box[1])
+    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
+        raise ValueError("Box has no area")
+
+    intersection = x * y
+    box_area = box[0] * box[1]
+    cluster_area = clusters[:, 0] * clusters[:, 1]
+
+    iou_ = np.true_divide(intersection, box_area + cluster_area - intersection + 1e-10)
+    # iou_ = intersection / (box_area + cluster_area - intersection + 1e-10)
+
+    return iou_
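Because every box and cluster is shifted to the origin, only widths and heights matter: the intersection is `min(w) * min(h)` and the union is the sum of both areas minus the intersection. A small worked example with a standalone version of the same formula (`wh_iou` is a hypothetical name; it omits the `1e-10` guard used above):

```python
import numpy as np


def wh_iou(box, clusters):
    """IoU of one (w, h) box with each (w, h) cluster, all anchored at the origin."""
    clusters = np.asarray(clusters, dtype=np.float64)
    inter = np.minimum(clusters[:, 0], box[0]) * np.minimum(clusters[:, 1], box[1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union


# A 10x20 box against an identical, a half-size and a double-size cluster
r = wh_iou((10, 20), [[10, 20], [5, 10], [20, 40]])
print(r)  # IoUs 1.0, 0.25, 0.25
```

The half-size cluster overlaps 50 of the box's 200 pixels, and the box covers 200 of the double-size cluster's 800, so both give 0.25.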
+
+
+def avg_iou(boxes, clusters):
+    """
+    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
+    param:
+        boxes: numpy array of shape (r, 2), where r is the number of rows
+        clusters: numpy array of shape (k, 2) where k is the number of clusters
+    return:
+        average IoU as a single float
+    """
+    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])
+
+
+def translate_boxes(boxes):
+    """
+    Translates all the boxes to the origin.
+    param:
+        boxes: numpy array of shape (r, 4)
+    return:
+        numpy array of shape (r, 2)
+    """
+    new_boxes = boxes.copy()
+    for row in range(new_boxes.shape[0]):
+        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
+        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
+    return np.delete(new_boxes, [0, 1], axis=1)
+
+
+def kmeans(boxes, k, dist=np.median):
+    """
+    Calculates k-means clustering with the Intersection over Union (IoU) metric.
+    param:
+        boxes: numpy array of shape (r, 2), where r is the number of rows
+        k: number of clusters
+        dist: function used to compute each cluster center from the boxes assigned to it (np.median by default)
+    return:
+        numpy array of shape (k, 2)
+    """
+    rows = boxes.shape[0]
+
+    distances = np.empty((rows, k))
+    last_clusters = np.zeros((rows,))
+
+    np.random.seed()
+
+    # the Forgy method will fail if the whole array contains the same rows
+    clusters = boxes[np.random.choice(rows, k, replace=False)]
+
+    while True:
+        for row in range(rows):
+            distances[row] = 1 - iou(boxes[row], clusters)
+
+        nearest_clusters = np.argmin(distances, axis=1)
+
+        if (last_clusters == nearest_clusters).all():
+            break
+
+        for cluster in range(k):
+            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
+
+        last_clusters = nearest_clusters
+
+    return clusters
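The loop above alternates between assigning each box to the cluster with the smallest `1 - IoU` distance and recomputing each center with `dist`. Two practical caveats: `np.random.seed()` makes every run different, and a cluster that receives no boxes gets a NaN center from `np.median` of an empty array. A minimal variant sketch that takes a seed, samples initial centers from distinct box shapes and keeps the previous center when a cluster is empty; `kmeans_anchors` and its parameters are hypothetical, not part of this repo:

```python
import numpy as np


def iou_wh(box, clusters):
    """IoU of one (w, h) box with k (w, h) clusters, all anchored at the origin."""
    inter = np.minimum(clusters[:, 0], box[0]) * np.minimum(clusters[:, 1], box[1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union


def kmeans_anchors(boxes, k, seed=0, center_fn=np.median, max_iter=300):
    """IoU-distance k-means over (w, h) boxes; returns k anchors sorted by area."""
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.RandomState(seed)
    unique = np.unique(boxes, axis=0)
    if len(unique) < k:
        raise ValueError("need at least k distinct box shapes")
    clusters = unique[rng.choice(len(unique), k, replace=False)].copy()
    last = None
    for _ in range(max_iter):
        dist = np.stack([1 - iou_wh(b, clusters) for b in boxes])  # shape (r, k)
        nearest = np.argmin(dist, axis=1)
        if last is not None and np.array_equal(nearest, last):
            break
        for c in range(k):
            members = boxes[nearest == c]
            if len(members):  # keep the old center if no box was assigned
                clusters[c] = center_fn(members, axis=0)
        last = nearest
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]
```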
+
+
+def parse_anno(annotation_path, target_size=None):
+    anno = open(annotation_path, 'r')
+    result = []
+    for line in anno:
+        s = line.strip().split(' ')
+        img_w = int(s[2])
+        img_h = int(s[3])
+        s = s[4:]
+        box_cnt = len(s) // 5
+        for i in range(box_cnt):
+            x_min, y_min, x_max, y_max = float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4])
+            width = x_max - x_min
+            height = y_max - y_min
+            assert width > 0
+            assert height > 0
+            # use letterbox resize, i.e. keep the original aspect ratio
+            # get k-means anchors on the resized target image size
+            if target_size is not None:
+                resize_ratio = min(target_size[0] / img_w, target_size[1] / img_h)
+                width *= resize_ratio
+                height *= resize_ratio
+                result.append([width, height])
+            # get k-means anchors on the original image size
+            else:
+                result.append([width, height])
+    result = np.asarray(result)
+    return result
+
+
+def get_kmeans(anno, cluster_num=9):
+
+    anchors = kmeans(anno, cluster_num)
+    ave_iou = avg_iou(anno, anchors)
+
+    anchors = anchors.astype('int').tolist()
+
+    anchors = sorted(anchors, key=lambda x: x[0] * x[1])
+
+    return anchors, ave_iou
+
+
+if __name__ == '__main__':
+    # target resize format: [width, height]
+    # if target_size is specified, the anchors are on the resized image scale
+    # if target_size is set to None, the anchors are on the original image scale
+    target_size = [416, 416]
+    annotation_path = "train.txt"
+    anno_result = parse_anno(annotation_path, target_size=target_size)
+    anchors, ave_iou = get_kmeans(anno_result, 9)
+
+    anchor_string = ''
+    for anchor in anchors:
+        anchor_string += '{},{}, '.format(anchor[0], anchor[1])
+    anchor_string = anchor_string[:-2]
+
+    print('anchors are:')
+    print(anchor_string)
+    print('the average iou is:')
+    print(ave_iou)
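The script prints the anchors as one `w,h, w,h, ...` string, which is the form expected in `yolo_anchors.txt`. A minimal round-trip sketch, assuming the anchor file is read back by splitting on commas and reshaping to `(N, 2)`; the repo's `parse_anchors()` lives in `utils.misc_utils` and is not shown here, so `parse_anchor_string` is a hypothetical stand-in:

```python
import numpy as np


def format_anchors(anchors):
    """Format [[w, h], ...] as 'w,h, w,h, ...', the same string the script prints."""
    return ', '.join('{},{}'.format(w, h) for w, h in anchors)


def parse_anchor_string(anchor_string):
    """Parse 'w,h, w,h, ...' back into an (N, 2) float32 array (hypothetical helper)."""
    values = [float(v) for v in anchor_string.split(',')]  # float() tolerates the spaces
    return np.array(values, dtype=np.float32).reshape(-1, 2)


s = format_anchors([[10, 13], [16, 30], [33, 23]])
print(s)  # 10,13, 16,30, 33,23
```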
+
@@ -0,0 +1,32 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "1",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "1",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0"
+    ],
+    "para_plane_nic_num": "1",
+    "status": "completed"
+}
@@ -0,0 +1,43 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "2",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "2",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1"
+    ],
+    "para_plane_nic_num": "2",
+    "status": "completed"
+}
@@ -0,0 +1,65 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "4",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "4",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3"
+    ],
+    "para_plane_nic_num": "4",
+    "status": "completed"
+}
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
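The 1-, 2-, 4- and 8-device rank tables differ only in the device count, the per-device entries and the NIC list. A minimal sketch that generates a single-server table of the same shape; `make_rank_table` is hypothetical, and the `device_ip` pattern `192.168.{100 + i % 4}.{101 if i < 4 else 100}` is simply inferred from the example files, so a real deployment must substitute the IPs actually configured on each device:

```python
import json


def make_rank_table(device_num):
    """Build a single-server rank table dict following the pattern of the example files.

    Assumption: device_ip follows 192.168.{100 + i % 4}.{101 if i < 4 else 100}.
    """
    instances = [{
        "devices": [{"device_id": str(i),
                     "device_ip": "192.168.{}.{}".format(100 + i % 4, 101 if i < 4 else 100)}],
        "rank_id": str(i),
        "server_id": "0.0.0.0",
    } for i in range(device_num)]
    return {
        "board_id": "0x002f",
        "chip_info": "910",
        "deploy_mode": "lab",
        "group_count": "1",
        "group_list": [{
            "device_num": str(device_num),
            "server_num": "1",
            "group_name": "",
            "instance_count": str(device_num),
            "instance_list": instances,
        }],
        "para_plane_nic_location": "device",
        "para_plane_nic_name": ["eth{}".format(i) for i in range(device_num)],
        "para_plane_nic_num": str(device_num),
        "status": "completed",
    }


print(json.dumps(make_rank_table(2), indent=4))
```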
@@ -0,0 +1,88 @@
+# coding: utf-8
+# This file contains the parameters used in train.py
+
+from __future__ import division, print_function
+
+from utils.misc_utils import parse_anchors, read_class_names
+import math
+
+### Some paths
+train_file = './data/my_data/train.txt'  # The path of the training txt file.
+val_file = './data/my_data/val.txt'  # The path of the validation txt file.
+restore_path = './data/darknet_weights/yolov3.ckpt'  # The path of the weights to restore.
+save_dir = './checkpoint/'  # The directory of the weights to save.
+log_dir = './data/logs/'  # The directory to store the tensorboard log files.
+progress_log_path = './data/progress.log'  # The path to record the training progress.
+anchor_path = './data/yolo_anchors.txt'  # The path of the anchor txt file.
+class_name_path = './data/voc.names'  # The path of the class names.
+
+### Training related numbers
+batch_size = 6
+img_size = [416, 416]  # Images will be resized to `img_size` and fed to the network, size format: [width, height]
+letterbox_resize = False  # Whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
+total_epoches = 100
+train_evaluation_step = 100  # Evaluate on the training batch after some steps.
+val_evaluation_epoch = 1  # Evaluate on the whole validation dataset after some epochs. Set to None to evaluate every epoch.
+save_epoch = 10  # Save the model after some epochs.
+batch_norm_decay = 0.99  # decay in bn ops
+weight_decay = 5e-4  # l2 weight decay
+global_step = 0  # used when resuming training
+
+### tf.data parameters
+num_threads = 10  # Number of threads for image processing used in tf.data pipeline.
+prefetech_buffer = 5  # Prefetch buffer size used in tf.data pipeline.
+
+### Learning rate and optimizer
+optimizer_name = 'momentum'  # Chosen from [sgd, momentum, adam, rmsprop]
+save_optimizer = False  # Whether to save the optimizer parameters into the checkpoint file.
+learning_rate_init = 1e-4
+lr_type = 'piecewise'  # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
+lr_decay_epoch = 5  # Epochs after which learning rate decays. Int or float. Used when lr_type is `exponential` or `cosine_decay_restart`.
+lr_decay_factor = 0.96  # The learning rate decay factor. Used when lr_type is `exponential`.
+lr_lower_bound = 1e-6  # The minimum learning rate.
+# piecewise params
+pw_boundaries = [25, 40]  # epoch based boundaries
+pw_values = [learning_rate_init, 3e-5, 1e-5]  # learning rates for the intervals split by pw_boundaries, decreasing
+
+### Load and finetune
+# Choose the parts you want to restore the weights. List form.
+# restore_include: None, restore_exclude: None  => restore the whole model
+# restore_include: None, restore_exclude: scope  => restore the whole model except `scope`
+# restore_include: scope1, restore_exclude: scope2  => if scope1 contains scope2, restore scope1 and not restore scope2 (scope1 - scope2)
+# choice 1: only restore the darknet body
+# restore_include = ['yolov3/darknet53_body']
+# restore_exclude = None
+# choice 2: restore all layers except the last 3 conv2d layers in 3 scale
+restore_include = None
+restore_exclude = ['yolov3/yolov3_head/Conv_14', 'yolov3/yolov3_head/Conv_6', 'yolov3/yolov3_head/Conv_22']
+# Choose the parts you want to finetune. List form.
+# Set to None to train the whole model.
+update_part = None
+
+### other training strategies
+multi_scale_train = True  # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
+use_label_smooth = True # Whether to use class label smoothing strategy.
+use_focal_loss = True  # Whether to apply focal loss on the conf loss.
+use_mix_up = True  # Whether to use mix up data augmentation strategy. 
+use_warm_up = True  # whether to use warm up strategy to prevent from gradient exploding.
+warm_up_epoch = 3  # Warm up training epochs. Set to a larger value if gradient explodes.
+
+### some constants in validation
+# nms
+nms_threshold = 0.45  # iou threshold in nms operation
+score_threshold = 0.01 # threshold of the probability of the classes in nms operation, i.e. score = pred_confs * pred_probs. set lower for higher recall.
+nms_topk = 150  # keep at most nms_topk outputs after nms
+# mAP eval
+eval_threshold = 0.5  # the iou threshold applied in mAP evaluation
+use_voc_07_metric = False  # whether to use voc 2007 evaluation metric, i.e. the 11-point metric
+
+### parse some params
+anchors = parse_anchors(anchor_path)
+classes = read_class_names(class_name_path)
+class_num = len(classes)
+train_img_cnt = len(open(train_file, 'r').readlines())
+val_img_cnt = len(open(val_file, 'r').readlines())
+train_batch_num = int(math.ceil(float(train_img_cnt) / batch_size))
+
+lr_decay_freq = int(train_batch_num * lr_decay_epoch)
+pw_boundaries = [float(i) * train_batch_num + global_step for i in pw_boundaries]
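`pw_boundaries` is declared in epochs and converted to global steps with `train_batch_num`; the learning rate is then held constant between boundaries. A minimal sketch of both steps in plain Python, assuming the interval convention documented for `tf.train.piecewise_constant` (value `i` applies while `boundaries[i-1] < step <= boundaries[i]`); the helper names are hypothetical:

```python
from bisect import bisect_left


def epoch_boundaries_to_steps(pw_boundaries, train_batch_num, global_step=0):
    """Convert epoch-based boundaries to global-step boundaries, as args.py does."""
    return [float(b) * train_batch_num + global_step for b in pw_boundaries]


def piecewise_lr(step, boundaries, values):
    """values[0] while step <= boundaries[0], values[1] while step <= boundaries[1], ..."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    # bisect_left counts the boundaries strictly below `step`
    return values[bisect_left(boundaries, step)]


# e.g. 100 batches per epoch, drops after epochs 25 and 40
steps = epoch_boundaries_to_steps([25, 40], 100)
lrs = [1e-4, 3e-5, 1e-5]
print(steps, piecewise_lr(3000, steps, lrs))
```

The values are normally non-increasing, so the learning rate only drops at each boundary.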
@@ -0,0 +1,140 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import tensorflow as tf
+import numpy as np
+import argparse
+from tqdm import trange
+
+from utils.data_utils import get_batch_data
+from utils.misc_utils import parse_anchors, read_class_names, AverageMeter
+from utils.eval_utils import evaluate_on_cpu, evaluate_on_gpu, get_preds_gpu, voc_eval, parse_gt_rec
+from utils.nms_utils import gpu_nms
+
+from model import yolov3
+
+#################
+# ArgumentParser
+#################
+parser = argparse.ArgumentParser(description="YOLO-V3 eval procedure.")
+# some paths
+parser.add_argument("--eval_file", type=str, default="./data/my_data/val.txt",
+                    help="The path of the validation or test txt file.")
+
+parser.add_argument("--restore_path", type=str, default="./data/checkpoint_whole_finetune_no_letterbox/best_model_Epoch_32_step_91046_mAP_0.8754_loss_2.2147_lr_3e-05",
+                    help="The path of the weights to restore.")
+
+parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt",
+                    help="The path of the anchor txt file.")
+
+parser.add_argument("--class_name_path", type=str, default="./data/voc.names",
+                    help="The path of the class names.")
+
+# some numbers
+parser.add_argument("--img_size", nargs='*', type=int, default=[416, 416],
+                    help="Resize the input image to `img_size`, size format: [width, height]")
+
+parser.add_argument("--letterbox_resize", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to use the letterbox resize.")
+
+parser.add_argument("--num_threads", type=int, default=10,
+                    help="Number of threads for image processing used in tf.data pipeline.")
+
+parser.add_argument("--prefetech_buffer", type=int, default=5,
+                    help="Prefetch buffer size used in tf.data pipeline.")
+
+parser.add_argument("--nms_threshold", type=float, default=0.45,
+                    help="IOU threshold in nms operation.")
+
+parser.add_argument("--score_threshold", type=float, default=0.01,
+                    help="Threshold of the probability of the classes in nms operation.")
+
+parser.add_argument("--nms_topk", type=int, default=150,
+                    help="Keep at most nms_topk outputs after nms.")
+
+parser.add_argument("--use_voc_07_metric", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to use the voc 2007 mAP metrics.")
+
+args = parser.parse_args()
+
+# args params
+args.anchors = parse_anchors(args.anchor_path)
+args.classes = read_class_names(args.class_name_path)
+args.class_num = len(args.classes)
+args.img_cnt = len(open(args.eval_file, 'r').readlines())
+
+# setting placeholders
+is_training = tf.placeholder(dtype=tf.bool, name="phase_train")
+handle_flag = tf.placeholder(tf.string, [], name='iterator_handle_flag')
+pred_boxes_flag = tf.placeholder(tf.float32, [1, None, None])
+pred_scores_flag = tf.placeholder(tf.float32, [1, None, None])
+gpu_nms_op = gpu_nms(pred_boxes_flag, pred_scores_flag, args.class_num, args.nms_topk, args.score_threshold, args.nms_threshold)
+
+##################
+# tf.data pipeline
+##################
+val_dataset = tf.data.TextLineDataset(args.eval_file)
+val_dataset = val_dataset.batch(1)
+val_dataset = val_dataset.map(
+    lambda x: tf.py_func(get_batch_data, [x, args.class_num, args.img_size, args.anchors, 'val', False, False, args.letterbox_resize], [tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
+    num_parallel_calls=args.num_threads
+)
+val_dataset = val_dataset.prefetch(args.prefetech_buffer)  # prefetch() returns a new dataset
+iterator = val_dataset.make_one_shot_iterator()
+
+image_ids, image, y_true_13, y_true_26, y_true_52 = iterator.get_next()
+image_ids.set_shape([None])
+y_true = [y_true_13, y_true_26, y_true_52]
+image.set_shape([None, args.img_size[1], args.img_size[0], 3])
+for y in y_true:
+    y.set_shape([None, None, None, None, None])
+
+##################
+# Model definition
+##################
+yolo_model = yolov3(args.class_num, args.anchors)
+with tf.variable_scope('yolov3'):
+    pred_feature_maps = yolo_model.forward(image, is_training=is_training)
+loss = yolo_model.compute_loss(pred_feature_maps, y_true)
+y_pred = yolo_model.predict(pred_feature_maps)
+
+saver_to_restore = tf.train.Saver()
+
+with tf.Session() as sess:
+    sess.run([tf.global_variables_initializer()])
+    saver_to_restore.restore(sess, args.restore_path)
+
+    print('\n----------- start to eval -----------\n')
+
+    val_loss_total, val_loss_xy, val_loss_wh, val_loss_conf, val_loss_class = \
+        AverageMeter(), AverageMeter(), AverageMeter(), AverageMeter(), AverageMeter()
+    val_preds = []
+
+    for j in trange(args.img_cnt):
+        __image_ids, __y_pred, __loss = sess.run([image_ids, y_pred, loss], feed_dict={is_training: False})
+        pred_content = get_preds_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, __image_ids, __y_pred)
+
+        val_preds.extend(pred_content)
+        val_loss_total.update(__loss[0])
+        val_loss_xy.update(__loss[1])
+        val_loss_wh.update(__loss[2])
+        val_loss_conf.update(__loss[3])
+        val_loss_class.update(__loss[4])
+
+    rec_total, prec_total, ap_total = AverageMeter(), AverageMeter(), AverageMeter()
+    gt_dict = parse_gt_rec(args.eval_file, args.img_size, args.letterbox_resize)
+    print('mAP eval:')
+    for ii in range(args.class_num):
+        npos, nd, rec, prec, ap = voc_eval(gt_dict, val_preds, ii, iou_thres=0.5, use_07_metric=args.use_voc_07_metric)
+        rec_total.update(rec, npos)
+        prec_total.update(prec, nd)
+        ap_total.update(ap, 1)
+        print('Class {}: Recall: {:.4f}, Precision: {:.4f}, AP: {:.4f}'.format(ii, rec, prec, ap))
+
+    mAP = ap_total.average
+    print('final mAP: {:.4f}'.format(mAP))
+    print("recall: {:.3f}, precision: {:.3f}".format(rec_total.average, prec_total.average))
+    print("total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
+        val_loss_total.average, val_loss_xy.average, val_loss_wh.average, val_loss_conf.average, val_loss_class.average
+    ))
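The eval script above passes per-class AP to `voc_eval(..., use_07_metric=args.use_voc_07_metric)`, and the body of that function is not part of this excerpt. For reference, here is a minimal pure-Python sketch of the two standard Pascal VOC AP definitions that such a flag typically switches between: the VOC2007 11-point interpolation, and the area under the monotone precision envelope used from VOC2010 onward. `voc_ap` is a hypothetical name. It is an assumption that this repo's `voc_eval` computes AP exactly this way.

```python
def voc_ap(rec, prec, use_07_metric=False):
    """Pascal VOC average precision (sketch).

    rec, prec: recall/precision after each detection of one class, with
    detections ranked by descending score (so rec is non-decreasing).
    """
    if use_07_metric:
        # VOC2007: average over 11 recall thresholds t = 0.0, 0.1, ..., 1.0 of
        # the best precision reached at recall >= t (0 if t is never reached)
        ap = 0.0
        for i in range(11):
            t = i / 10.0
            candidates = [p for r, p in zip(rec, prec) if r >= t]
            ap += (max(candidates) if candidates else 0.0) / 11.0
        return ap

    # VOC2010+: area under the precision envelope, with sentinels at both ends
    mrec = [0.0] + list(rec) + [1.0]
    mpre = [0.0] + list(prec) + [0.0]
    # envelope: precision at recall r becomes the max precision at any recall >= r
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum rectangle areas wherever recall increases
    ap = 0.0
    for i in range(1, len(mrec)):
        if mrec[i] != mrec[i - 1]:
            ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


print(voc_ap([0.5, 1.0], [1.0, 0.5]))                      # 0.75
print(voc_ap([0.5, 1.0], [1.0, 0.5], use_07_metric=True))  # about 0.7727
```

Consider a class whose two ranked detections reach recall 0.5 at precision 1.0 and then recall 1.0 at precision 0.5. That class scores 0.75 under the area metric and 8.5/11 under the 11-point metric. This gap is why results should state which metric `--use_voc_07_metric` selected.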
@@ -0,0 +1,20 @@
+aeroplane
+bicycle
+bird
+boat
+bottle
+bus
+car
+cat
+chair
+cow
+diningtable
+dog
+horse
+motorbike
+person
+pottedplant
+sheep
+sofa
+train
+tvmonitor
@@ -0,0 +1,96 @@
+# coding: utf-8
+
+import xml.etree.ElementTree as ET
+import os
+
+names_dict = {}
+cnt = 0
+f = open('./voc_names.txt', 'r').readlines()
+for line in f:
+    line = line.strip()
+    names_dict[line] = cnt
+    cnt += 1
+
+voc_07 = '/data/VOCdevkit/VOC2007'
+voc_12 = '/data/VOCdevkit/VOC2012'
+
+anno_path = [os.path.join(voc_07, 'Annotations'), os.path.join(voc_12, 'Annotations')]
+img_path = [os.path.join(voc_07, 'JPEGImages'), os.path.join(voc_12, 'JPEGImages')]
+
+trainval_path = [os.path.join(voc_07, 'ImageSets/Main/trainval.txt'),
+                 os.path.join(voc_12, 'ImageSets/Main/trainval.txt')]
+test_path = [os.path.join(voc_07, 'ImageSets/Main/test.txt')]
+
+
+def parse_xml(path):
+    tree = ET.parse(path)
+    img_name = path.split('/')[-1][:-4]
+    
+    height = tree.findtext("./size/height")
+    width = tree.findtext("./size/width")
+
+    objects = [img_name, width, height]
+
+    for obj in tree.findall('object'):
+        difficult = obj.find('difficult').text
+        if difficult == '1':
+            continue
+        name = obj.find('name').text
+        bbox = obj.find('bndbox')
+        xmin = bbox.find('xmin').text
+        ymin = bbox.find('ymin').text
+        xmax = bbox.find('xmax').text
+        ymax = bbox.find('ymax').text
+
+        name = str(names_dict[name])
+        objects.extend([name, xmin, ymin, xmax, ymax])
+    if len(objects) > 3:  # at least one non-difficult object besides [img_name, width, height]
+        return objects
+    else:
+        return None
+
+test_cnt = 0
+def gen_test_txt(txt_path):
+    global test_cnt
+    f = open(txt_path, 'w')
+
+    for i, path in enumerate(test_path):
+        img_names = open(path, 'r').readlines()
+        for img_name in img_names:
+            img_name = img_name.strip()
+            xml_path = anno_path[i] + '/' + img_name + '.xml'
+            objects = parse_xml(xml_path)
+            if objects:
+                objects[0] = img_path[i] + '/' + img_name + '.jpg'
+                if os.path.exists(objects[0]):
+                    objects.insert(0, str(test_cnt))
+                    test_cnt += 1
+                    objects = ' '.join(objects) + '\n'
+                    f.write(objects)
+    f.close()
+
+
+train_cnt = 0
+def gen_train_txt(txt_path):
+    global train_cnt
+    f = open(txt_path, 'w')
+
+    for i, path in enumerate(trainval_path):
+        img_names = open(path, 'r').readlines()
+        for img_name in img_names:
+            img_name = img_name.strip()
+            xml_path = anno_path[i] + '/' + img_name + '.xml'
+            objects = parse_xml(xml_path)
+            if objects:
+                objects[0] = img_path[i] + '/' + img_name + '.jpg'
+                if os.path.exists(objects[0]):
+                    objects.insert(0, str(train_cnt))
+                    train_cnt += 1
+                    objects = ' '.join(objects) + '\n'
+                    f.write(objects)
+    f.close()
+
+
+gen_train_txt('train.txt')
+gen_test_txt('val.txt')
+
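The conversion script turns each VOC XML file into one line of the form `index image_path width height [label xmin ymin xmax ymax]...`, and it skips objects marked `difficult`. The sketch below reproduces that mapping on an in-memory XML string, so it runs without a VOCdevkit tree. `voc_xml_to_line`, `SAMPLE_XML` and the image path are illustrative names only. The class indices follow the order of the 20-line class-name file above. The function returns `None` when every object was skipped, which is what the `len(objects)` check is meant to catch.

```python
import xml.etree.ElementTree as ET

# Class order as in the 20-line VOC names file; index = 0-based line number
VOC_NAMES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
             "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
             "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
NAMES_DICT = {name: idx for idx, name in enumerate(VOC_NAMES)}


def voc_xml_to_line(line_idx, xml_text, img_path, names_dict=NAMES_DICT):
    """One VOC annotation -> 'idx path width height [label xmin ymin xmax ymax]...'.

    Returns None when no non-difficult object remains."""
    root = ET.fromstring(xml_text)
    fields = [str(line_idx), img_path,
              root.findtext("./size/width"), root.findtext("./size/height")]
    for obj in root.findall("object"):
        if obj.findtext("difficult") == "1":  # skip difficult objects
            continue
        bbox = obj.find("bndbox")
        fields.append(str(names_dict[obj.findtext("name")]))
        fields.extend(bbox.findtext(k) for k in ("xmin", "ymin", "xmax", "ymax"))
    # only the 4 header fields left means every object was skipped
    return " ".join(fields) if len(fields) > 4 else None


SAMPLE_XML = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object><name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>person</name><difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

print(voc_xml_to_line(0, SAMPLE_XML, "JPEGImages/000001.jpg"))
# 0 JPEGImages/000001.jpg 500 375 11 48 240 195 371
```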
@@ -0,0 +1,32 @@
+# coding: utf-8
+
+# This script is used to remove the optimizer parameters in the saved checkpoint files.
+# These parameters are useless in the forward process. 
+# Removing them will shrink the checkpoint size a lot.
+
+import sys
+sys.path.append('..')
+
+import os
+import tensorflow as tf
+from model import yolov3
+
+# params
+ckpt_path = ''  # path of the checkpoint to shrink; must be set before running
+class_num = 20
+save_dir = 'shrinked_ckpt'
+if not os.path.exists(save_dir):
+    os.makedirs(save_dir)
+
+image = tf.placeholder(tf.float32, [1, 416, 416, 3])
+yolo_model = yolov3(class_num, None)
+with tf.variable_scope('yolov3'):
+    pred_feature_maps = yolo_model.forward(image)
+
+saver_to_restore = tf.train.Saver()
+saver_to_save = tf.train.Saver()
+
+with tf.Session() as sess:
+    sess.run(tf.global_variables_initializer())
+    saver_to_restore.restore(sess, ckpt_path)
+    saver_to_save.save(sess, save_dir + '/shrinked')
@@ -0,0 +1,457 @@
+# coding=utf-8
+# for better understanding about yolov3 architecture, refer to this website (in Chinese):
+# https://blog.csdn.net/leviopku/article/details/82660381
+
+from __future__ import division, print_function
+
+import tensorflow as tf
+
+slim = tf.contrib.slim
+
+from utils.layer_utils import conv2d, darknet53_body, yolo_block, upsample_layer
+
+
+class yolov3(object):
+
+    def __init__(self, class_num, anchors, use_label_smooth=False, use_focal_loss=False, batch_norm_decay=0.999,
+                 weight_decay=5e-4, use_static_shape=True,
+                 img_size=(416, 416), batch_size=None):
+
+        # self.anchors = [[10, 13], [16, 30], [33, 23],
+        #                 [30, 61], [62, 45], [59, 119],
+        #                 [116, 90], [156, 198], [373, 326]]
+        self.class_num = class_num
+        self.anchors = anchors
+        self.batch_norm_decay = batch_norm_decay
+        self.use_label_smooth = use_label_smooth
+        self.use_focal_loss = use_focal_loss
+        self.weight_decay = weight_decay
+        # inference speed optimization
+        # if `use_static_shape` is True, use tensor.get_shape(), otherwise use tf.shape(tensor)
+        # static_shape is slightly faster
+        self.use_static_shape = use_static_shape
+        self.batch_size = batch_size
+        # self.img_size = (416, 416)
+        self.img_size = img_size
+        self.featrue_map_shape_base = [32, 16, 8]
+        self.featrue_map_shape = [(self.img_size[0] // i, self.img_size[1] // i) for i in self.featrue_map_shape_base]
+
+    def forward(self, inputs, is_training=False, reuse=False):
+        # the input img_size, form: [height, width]
+        # it will be used later
+        # self.img_size = tf.shape(inputs)[1:3]
+        # self.featrue_map_shape = [(self.img_size[0]//i, self.img_size[1]//i) for i in self.featrue_map_shape_base]
+        # set batch norm params
+        batch_norm_params = {
+            'decay': self.batch_norm_decay,
+            'epsilon': 1e-05,
+            'scale': True,
+            'is_training': is_training,
+            'fused': None,  # Use fused batch norm if possible.
+        }
+
+        with slim.arg_scope([slim.conv2d, slim.batch_norm], reuse=reuse):
+            with slim.arg_scope([slim.conv2d],
+                                normalizer_fn=slim.batch_norm,
+                                normalizer_params=batch_norm_params,
+                                biases_initializer=None,
+                                activation_fn=lambda x: tf.nn.leaky_relu(x, alpha=0.1),
+                                weights_regularizer=slim.l2_regularizer(self.weight_decay)):
+                with tf.variable_scope('darknet53_body'):
+                    route_1, route_2, route_3 = darknet53_body(inputs)
+
+                with tf.variable_scope('yolov3_head'):
+                    inter1, net = yolo_block(route_3, 512)
+                    feature_map_1 = slim.conv2d(net, 3 * (5 + self.class_num), 1,
+                                                stride=1, normalizer_fn=None,
+                                                activation_fn=None, biases_initializer=tf.zeros_initializer())
+                    feature_map_1 = tf.identity(feature_map_1, name='feature_map_1')
+
+                    inter1 = conv2d(inter1, 256, 1)
+                    inter1 = upsample_layer(inter1,
+                                            route_2.get_shape().as_list() if self.use_static_shape else tf.shape(
+                                                route_2))
+                    concat1 = tf.concat([inter1, route_2], axis=3)
+
+                    inter2, net = yolo_block(concat1, 256)
+                    feature_map_2 = slim.conv2d(net, 3 * (5 + self.class_num), 1,
+                                                stride=1, normalizer_fn=None,
+                                                activation_fn=None, biases_initializer=tf.zeros_initializer())
+                    feature_map_2 = tf.identity(feature_map_2, name='feature_map_2')
+
+                    inter2 = conv2d(inter2, 128, 1)
+                    inter2 = upsample_layer(inter2,
+                                            route_1.get_shape().as_list() if self.use_static_shape else tf.shape(
+                                                route_1))
+                    concat2 = tf.concat([inter2, route_1], axis=3)
+
+                    _, feature_map_3 = yolo_block(concat2, 128)
+                    feature_map_3 = slim.conv2d(feature_map_3, 3 * (5 + self.class_num), 1,
+                                                stride=1, normalizer_fn=None,
+                                                activation_fn=None, biases_initializer=tf.zeros_initializer())
+                    feature_map_3 = tf.identity(feature_map_3, name='feature_map_3')
+
+            return feature_map_1, feature_map_2, feature_map_3
+
+    def reorg_layer(self, feature_map, anchors):
+        '''
+        feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned
+            from `forward` function
+        anchors: shape: [3, 2]
+        '''
+        # NOTE: size in [h, w] format! don't get messed up!
+        grid_size = feature_map.get_shape().as_list()[1:3] if self.use_static_shape else tf.shape(feature_map)[
+                                                                                         1:3]  # [13, 13]
+        # the downscale ratio in height and width
+        # ratio = tf.cast(self.img_size / grid_size, tf.float32)
+        ratio = tf.cast([self.img_size[0] / grid_size[0], self.img_size[1] / grid_size[1]], tf.float32)
+        # rescale the anchors to the feature_map
+        # NOTE: the anchor is in [w, h] format!
+        rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors]
+
+        feature_map = tf.reshape(feature_map, [-1, grid_size[0], grid_size[1], 3, 5 + self.class_num])
+
+        # split the feature_map along the last dimension
+        # shape info: take 416x416 input image and the 13*13 feature_map for example:
+        # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y]
+        # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height]
+        # conf_logits: [N, 13, 13, 3, 1]
+        # prob_logits: [N, 13, 13, 3, class_num]
+
+        # box_centers, box_sizes, conf_logits, prob_logits = tf.split(feature_map, [2, 2, 1, self.class_num], axis=-1)
+        box_centers = feature_map[..., :2]
+        box_sizes = feature_map[..., 2:4]
+        conf_logits = feature_map[..., 4:5]
+        prob_logits = feature_map[..., 5:]
+
+        # conf_logits = tf.expand_dims(conf_logits, -1)
+
+        box_centers = tf.nn.sigmoid(box_centers)
+
+        # use some broadcast tricks to get the mesh coordinates
+        grid_x = tf.range(grid_size[1], dtype=tf.int32)
+        grid_y = tf.range(grid_size[0], dtype=tf.int32)
+        grid_x, grid_y = tf.meshgrid(grid_x, grid_y)
+        x_offset = tf.reshape(grid_x, (-1, 1))
+        y_offset = tf.reshape(grid_y, (-1, 1))
+        x_y_offset = tf.concat([x_offset, y_offset], axis=-1)
+        # shape: [13, 13, 1, 2]
+        x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32)
+
+        # get the absolute box coordinates on the feature_map 
+        box_centers = box_centers + x_y_offset
+        # rescale to the original image scale
+        box_centers = box_centers * ratio[::-1]
+
+        # NOTE: no clipping is applied here; tf.clip_by_value (commented out below) can guard against NaN/inf
+        box_sizes = tf.exp(box_sizes) * rescaled_anchors
+        # box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 100) * rescaled_anchors
+        # rescale to the original image scale
+        box_sizes = box_sizes * ratio[::-1]
+
+        # shape: [N, 13, 13, 3, 4]
+        # last dimension: (center_x, center_y, w, h)
+        boxes = tf.concat([box_centers, box_sizes], axis=-1)
+
+        # shape:
+        # x_y_offset: [13, 13, 1, 2]
+        # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale
+        # conf_logits: [N, 13, 13, 3, 1]
+        # prob_logits: [N, 13, 13, 3, class_num]
+        return x_y_offset, boxes, conf_logits, prob_logits
+
+    def predict(self, feature_maps):
+        '''
+        Receive the returned feature_maps from `forward` function,
+        then produce the output predictions at the test stage.
+        '''
+        feature_map_1, feature_map_2, feature_map_3 = feature_maps
+
+        feature_map_anchors = [(feature_map_1, self.anchors[6:9]),
+                               (feature_map_2, self.anchors[3:6]),
+                               (feature_map_3, self.anchors[0:3])]
+        reorg_results = [self.reorg_layer(feature_map, anchors) for (feature_map, anchors) in feature_map_anchors]
+
+        def _reshape(result):
+            x_y_offset, boxes, conf_logits, prob_logits = result
+            grid_size = x_y_offset.get_shape().as_list()[:2] if self.use_static_shape else tf.shape(x_y_offset)[:2]
+            boxes = tf.reshape(boxes, [-1, grid_size[0] * grid_size[1] * 3, 4])
+            conf_logits = tf.reshape(conf_logits, [-1, grid_size[0] * grid_size[1] * 3, 1])
+            prob_logits = tf.reshape(prob_logits, [-1, grid_size[0] * grid_size[1] * 3, self.class_num])
+            # shape: (take 416*416 input image and feature_map_1 for example)
+            # boxes: [N, 13*13*3, 4]
+            # conf_logits: [N, 13*13*3, 1]
+            # prob_logits: [N, 13*13*3, class_num]
+            return boxes, conf_logits, prob_logits
+
+        boxes_list, confs_list, probs_list = [], [], []
+        for result in reorg_results:
+            boxes, conf_logits, prob_logits = _reshape(result)
+            confs = tf.sigmoid(conf_logits)
+            probs = tf.sigmoid(prob_logits)
+            boxes_list.append(boxes)
+            confs_list.append(confs)
+            probs_list.append(probs)
+
+        # collect results on three scales
+        # take 416*416 input image for example:
+        # shape: [N, (13*13+26*26+52*52)*3, 4]
+        boxes = tf.concat(boxes_list, axis=1)
+        # shape: [N, (13*13+26*26+52*52)*3, 1]
+        confs = tf.concat(confs_list, axis=1)
+        # shape: [N, (13*13+26*26+52*52)*3, class_num]
+        probs = tf.concat(probs_list, axis=1)
+
+        # center_x, center_y, width, height = tf.split(boxes, [1, 1, 1, 1], axis=-1)
+
+        # center_x = tf.expand_dims(boxes[..., 0], 2)
+        # center_y = tf.expand_dims(boxes[..., 1], 2)
+        # width = tf.expand_dims(boxes[..., 2],    2)
+        # height = tf.expand_dims(boxes[..., 3],   2)
+
+        center_x = boxes[..., 0:1]
+        center_y = boxes[..., 1:2]
+        width = boxes[..., 2:3]
+        height = boxes[..., 3:]
+
+        x_min = center_x - width / 2
+        y_min = center_y - height / 2
+        x_max = center_x + width / 2
+        y_max = center_y + height / 2
+
+        boxes = tf.concat([x_min, y_min, x_max, y_max], axis=-1)
+
+        return boxes, confs, probs
+
+    def loss_layer(self, feature_map_i, y_true, anchors, feature_map_shape_i, gt_box_i):
+        '''
+        calc loss function from a certain scale
+        input:
+            feature_map_i: feature maps of a certain scale. shape: [N, 13, 13, 3*(5 + num_class)] etc.
+            y_true: y_true from a certain scale. shape: [N, 13, 13, 3, 5 + num_class + 1] etc.
+            anchors: the 3 anchors assigned to this scale, shape [3, 2]
+        '''
+
+        # size in [h, w] format! don't get messed up!
+        # grid_size = tf.shape(feature_map_i)[1:3]
+        grid_size = tf.shape(feature_map_i)[1:3]
+        # the downscale ratio in height and width
+        ratio = tf.cast(self.img_size / grid_size, tf.float32)
+        # N: batch_size
+        N = tf.cast(tf.shape(feature_map_i)[0], tf.float32)
+
+        x_y_offset, pred_boxes, pred_conf_logits, pred_prob_logits = self.reorg_layer(feature_map_i, anchors)
+
+        ###########
+        # get mask
+        ###########
+
+        # shape: take 416x416 input image and 13*13 feature_map for example:
+        # [N, 13, 13, 3, 1]
+        object_mask = y_true[..., 4:5]
+
+        # the calculation of ignore mask is adapted from
+        # https://github.com/pjreddie/darknet/blob/master/src/yolo_layer.c#L179
+        # ignore_mask = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
+        # def loop_cond(idx, ignore_mask):
+        #     return tf.less(idx, tf.cast(N, tf.int32))
+        # def loop_body(idx, ignore_mask=None):
+        #     # shape: [13, 13, 3, 4] & [13, 13, 3]  ==>  [V, 4]
+        #     # V: num of true gt box of each image in a batch
+        #     valid_true_boxes = tf.boolean_mask(y_true[idx, ..., 0:4], tf.cast(object_mask[idx, ..., 0], 'bool'))
+        #     # shape: [13, 13, 3, 4] & [V, 4] ==> [13, 13, 3, V]
+        #     iou = self.box_iou(pred_boxes[idx], valid_true_boxes)
+        #     # shape: [13, 13, 3]
+        #     best_iou = tf.reduce_max(iou, axis=-1)
+        #     # shape: [13, 13, 3]
+        #     ignore_mask_tmp = tf.cast(best_iou < 0.5, tf.float32)
+        #     # finally will be shape: [N, 13, 13, 3]
+        #     # ignore_mask = ignore_mask.write(idx, ignore_mask_tmp)
+        #     if ignore_mask is None:
+        #         ignore_mask = tf.expand_dims(ignore_mask_tmp, 0)
+        #     else:
+        #         ignore_mask = tf.concat([ignore_mask, tf.expand_dims(ignore_mask_tmp, 0)], 0)
+        #     print(idx, ignore_mask)
+        #     return idx + 1, ignore_mask
+        # ignore_mask = None
+        # _, ignore_mask = tf.while_loop(cond=loop_cond, body=loop_body, loop_vars=[0, ignore_mask])
+        # ignore_mask = ignore_mask.stack()
+
+        iou = self.box_iou(pred_boxes, gt_box_i)  # [N, 13, 13, 3, 16]
+        best_iou = tf.reduce_max(iou, axis=-1)  # [N, 13, 13, 3]
+        ignore_mask = tf.cast(best_iou < 0.5, tf.float32)  # [N, 13, 13, 3]
+        # shape: [N, 13, 13, 3, 1]
+        ignore_mask = tf.expand_dims(ignore_mask, -1)
+        ignore_mask = tf.stop_gradient(ignore_mask)
+
+        # shape: [N, 13, 13, 3, 2]
+        pred_box_xy = pred_boxes[..., 0:2]
+        pred_box_wh = pred_boxes[..., 2:4]
+
+        # get xy coordinates in one cell from the feature_map
+        # numerical range: 0 ~ 1
+        # shape: [N, 13, 13, 3, 2]
+        print(y_true[..., 0:2], ratio[::-1], x_y_offset)
+        true_xy = y_true[..., 0:2] / ratio[::-1] - x_y_offset
+        pred_xy = pred_box_xy / ratio[::-1] - x_y_offset
+
+        # get_tw_th
+        # value: log(w / anchor_w), log(h / anchor_h); not restricted to 0 ~ 1
+        # shape: [N, 13, 13, 3, 2]
+        true_tw_th = y_true[..., 2:4] / anchors
+        pred_tw_th = pred_box_wh / anchors
+        # for numerical stability
+        true_tw_th = tf.where(condition=tf.equal(true_tw_th, 0),
+                              x=tf.ones_like(true_tw_th), y=true_tw_th)
+        pred_tw_th = tf.where(condition=tf.equal(pred_tw_th, 0),
+                              x=tf.ones_like(pred_tw_th), y=pred_tw_th)
+        true_tw_th = tf.log(tf.clip_by_value(true_tw_th, 1e-9, 1e9))
+        pred_tw_th = tf.log(tf.clip_by_value(pred_tw_th, 1e-9, 1e9))
+
+        # box size punishment: 
+        # box with smaller area has bigger weight. This is taken from the yolo darknet C source code.
+        # shape: [N, 13, 13, 3, 1]
+        box_loss_scale = 2. - (y_true[..., 2:3] / tf.cast(self.img_size[1], tf.float32)) * (
+                y_true[..., 3:4] / tf.cast(self.img_size[0], tf.float32))
+
+        ############
+        # loss_part
+        ############
+        # mix_up weight
+        # mix_w = y_true[..., self.class_num+5]
+        # [N, 13, 13, 3, 1]
+        # mix_w = y_true[..., -1:]
+        mix_w = y_true[..., (5 + self.class_num):]  # was hard-coded 85 (= 5 + 80 COCO classes)
+        # mix_w = tf.expand_dims(mix_w, -1)
+        # shape: [N, 13, 13, 3, 1]
+        xy_loss = tf.reduce_sum(tf.square(true_xy - pred_xy) * object_mask * box_loss_scale * mix_w) / N
+        wh_loss = tf.reduce_sum(tf.square(true_tw_th - pred_tw_th) * object_mask * box_loss_scale * mix_w) / N
+
+        # shape: [N, 13, 13, 3, 1]
+        conf_pos_mask = object_mask
+        conf_neg_mask = (1 - object_mask) * ignore_mask
+        conf_loss_pos = conf_pos_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask,
+                                                                                logits=pred_conf_logits)
+        conf_loss_neg = conf_neg_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask,
+                                                                                logits=pred_conf_logits)
+        # TODO: may need to balance the pos-neg by multiplying some weights
+        conf_loss = conf_loss_pos + conf_loss_neg
+        if self.use_focal_loss:
+            alpha = 1.0
+            gamma = 2.0
+            # TODO: alpha should be a mask array if needed
+            focal_mask = alpha * tf.pow(tf.abs(object_mask - tf.sigmoid(pred_conf_logits)), gamma)
+            conf_loss *= focal_mask
+        conf_loss = tf.reduce_sum(conf_loss * mix_w) / N
+
+        # shape: [N, 13, 13, 3, 1]
+        # whether to use label smooth
+        if self.use_label_smooth:
+            delta = 0.01
+            label_target = (1 - delta) * y_true[..., 5:(5 + self.class_num)] + delta * 1. / self.class_num
+        else:
+            label_target = y_true[..., 5:(5 + self.class_num)]
+        class_loss = object_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=label_target,
+                                                                           logits=pred_prob_logits) * mix_w
+        class_loss = tf.reduce_sum(class_loss) / N
+
+        return xy_loss, wh_loss, conf_loss, class_loss
+
+    def box_iou(self, pred_boxes, valid_true_boxes):
+        '''
+        param:
+            pred_boxes: [13, 13, 3, 4], (center_x, center_y, w, h)
+            valid_true_boxes: [1, 16, 4], (center_x, center_y, w, h)
+        '''
+        # valid_true_boxes = tf.expand_dims(valid_true_boxes, -2)
+
+        # [13, 13, 3, 2]
+        pred_box_xy = pred_boxes[..., 0:2]
+        pred_box_wh = pred_boxes[..., 2:4]
+
+        # shape: [13, 13, 3, 1, 2]
+        pred_box_xy = tf.expand_dims(pred_box_xy, -2)
+        pred_box_wh = tf.expand_dims(pred_box_wh, -2)
+
+        print('##################pred_box_wh', pred_box_wh)
+
+        # [V, 2]
+        # N,H,W,A,C = valid_true_boxes.shape
+        # valid_true_boxes = tf.gather(valid_true_boxes, tf.where(object_mask))
+        # print(valid_true_boxes, object_mask)
+        # print(valid_true_boxes)
+        # input()
+        # valid_true_boxes = tf.reshape(valid_true_boxes, (self.batch_size, 1, 1, 3, -1, 4))
+
+        # x = tf.reshape(valid_true_boxes[..., 0], (self.batch_size, 3, -1))
+        # y = tf.reshape(valid_true_boxes[..., 1], (self.batch_size, 3, -1))
+        # w = tf.reshape(valid_true_boxes[..., 2], (self.batch_size, 3, -1))
+        # h = tf.reshape(valid_true_boxes[..., 3], (self.batch_size, 3, -1))
+        # valid_true_boxes =  tf.stack([x,y,w,h], axis=-1)
+        valid_true_boxes = tf.expand_dims(valid_true_boxes, 1)  # [1, 1, 16, 4]
+        valid_true_boxes = tf.expand_dims(valid_true_boxes, 1)  # [1, 1, 1, 16, 4]
+
+        print('##################valid_true_boxes', valid_true_boxes)
+
+        # valid_true_boxes = tf.tile(valid_true_boxes, [1,H,W,1,1])
+        # print(valid_true_boxes)
+        # input()
+
+        true_box_xy = valid_true_boxes[..., :2]  # [1, 1, 1, 16, 2]
+        true_box_wh = valid_true_boxes[..., 2:]  # [1, 1, 1, 16, 2]
+
+        print('##################true_box_wh', true_box_wh)
+
+        # [13, 13, 3, 1, 2] & [1, 1, 1, 16, 2] ==> [13, 13, 3, 16, 2]
+        intersect_mins = tf.maximum(pred_box_xy - pred_box_wh / 2.,
+                                    true_box_xy - true_box_wh / 2.)
+        intersect_maxs = tf.minimum(pred_box_xy + pred_box_wh / 2.,
+                                    true_box_xy + true_box_wh / 2.)
+        intersect_wh = tf.maximum(intersect_maxs - intersect_mins, 0.)
+
+        print('##################intersect_mins', intersect_mins)
+        print('##################intersect_wh', intersect_wh)
+
+        # shape: [13, 13, 3, 16]
+        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
+        # shape: [13, 13, 3, 1]
+        pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1]
+        # shape: [1, 1, 1, 16]
+        true_box_area = true_box_wh[..., 0] * true_box_wh[..., 1]
+        # shape: [1, V]
+        # true_box_area = tf.expand_dims(true_box_area, -2)
+        print('##################intersect_area', intersect_area)
+        print('##################pred_box_area', pred_box_area)
+        print('##################true_box_area', true_box_area)
+        # [13, 13, 3, 16]
+        iou = intersect_area / (pred_box_area + true_box_area - intersect_area + 1e-10)
+        print('##################iou', iou)
+        # iou = tf.clip_by_value(iou, 0, 1)
+
+        # print(pred_box_xy, pred_box_wh)
+        # print(intersect_area , pred_box_area , true_box_area , intersect_area)
+        # print(iou)
+        # input()
+
+        return iou
+
+    def compute_loss(self, y_pred, y_true, gt_box):
+        '''
+        param:
+            y_pred: returned feature_map list by `forward` function: [feature_map_1, feature_map_2, feature_map_3]
+            y_true, gt_box: per-scale y_true and ground-truth boxes from the tf.data pipeline
+        '''
+        loss_xy, loss_wh, loss_conf, loss_class = 0., 0., 0., 0.
+        anchor_group = [self.anchors[6:9], self.anchors[3:6], self.anchors[0:3]]
+
+        # calc loss in 3 scales
+        for i in range(len(y_pred)):
+            print('##################level', i)
+
+            result = self.loss_layer(y_pred[i], y_true[i], anchor_group[i], self.featrue_map_shape[i], gt_box[i])
+            loss_xy += result[0]
+            loss_wh += result[1]
+            loss_conf += result[2]
+            loss_class += result[3]
+        total_loss = loss_xy + loss_wh + loss_conf + loss_class
+        return [total_loss, loss_xy, loss_wh, loss_conf, loss_class]
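The model file spreads YOLOv3 box arithmetic over several methods. `reorg_layer` decodes raw outputs into image-space boxes, and `loss_layer` converts ground truth back into in-cell offsets and log-space sizes. `box_iou` compares boxes given in center format, and `box_loss_scale` gives small boxes a larger loss weight. The sketch below restates that arithmetic for a single anchor in plain Python, as a reading aid rather than a drop-in replacement. The function names are hypothetical, and it assumes one stride for both axes, which holds for square inputs such as 416x416.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell_xy, anchor_wh, stride):
    """Raw outputs (tx, ty, tw, th) of one anchor in grid cell (col, row)
    -> image-space box (center_x, center_y, w, h), as in reorg_layer:
    center = (sigmoid(t_xy) + cell offset) * stride, size = exp(t_wh) * anchor."""
    tx, ty, tw, th = t
    cx = (sigmoid(tx) + cell_xy[0]) * stride
    cy = (sigmoid(ty) + cell_xy[1]) * stride
    return cx, cy, math.exp(tw) * anchor_wh[0], math.exp(th) * anchor_wh[1]


def encode_targets(box, cell_xy, anchor_wh, stride):
    """Image-space box -> targets compared in loss_layer: the in-cell xy
    offset (matched against sigmoid(t_xy)) and log(wh / anchor)."""
    cx, cy, w, h = box
    return (cx / stride - cell_xy[0], cy / stride - cell_xy[1],
            math.log(w / anchor_wh[0]), math.log(h / anchor_wh[1]))


def iou_cxcywh(a, b, eps=1e-10):
    """IoU of two (center_x, center_y, w, h) boxes, mirroring box_iou."""
    inter_w = max(min(a[0] + a[2] / 2, b[0] + b[2] / 2)
                  - max(a[0] - a[2] / 2, b[0] - b[2] / 2), 0.0)
    inter_h = max(min(a[1] + a[3] / 2, b[1] + b[3] / 2)
                  - max(a[1] - a[3] / 2, b[1] - b[3] / 2), 0.0)
    inter = inter_w * inter_h
    return inter / (a[2] * a[3] + b[2] * b[3] - inter + eps)


def box_loss_scale(w, h, img_w, img_h):
    """2 - relative box area: smaller boxes get a larger loss weight."""
    return 2.0 - (w / img_w) * (h / img_h)


# 416x416 input, 13x13 map (stride 32), largest anchor (116, 90), cell (col 6, row 6)
box = decode_box((0.0, 0.0, 0.0, 0.0), (6, 6), (116, 90), 32)
print(box)                                          # (208.0, 208.0, 116.0, 90.0)
print(encode_targets(box, (6, 6), (116, 90), 32))   # (0.5, 0.5, 0.0, 0.0)
```

The round trip shows why `loss_layer` compares `true_xy` with the sigmoid-activated prediction rather than with raw `tx`. The width/height terms, in contrast, are compared in log space.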
@@ -0,0 +1,58 @@
+#!/bin/bash
+scriptDir=$(cd "$(dirname "$0")"; pwd)
+currentDir=$(cd "$(dirname "$scriptDir")"; pwd)
+
+# set env
+source ${currentDir}/config/npu_set_env.sh
+
+# setting main path
+CODE_PATH=${currentDir}/code
+
+# set env
+export ASCEND_HOME=/usr/local/Ascend
+export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/
+export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/te:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/topi:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/python/site-packages/hccl:/usr/local/Ascend/ascend-toolkit/latest/tfplugin/python/site-packages:$currentDir
+export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin
+export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
+
+export DDK_VERSION_FLAG=1.60.T49.0.B201
+export NEW_GE_FE_ID=1
+export GE_AICPU_FLAG=1
+export SOC_VERSION=Ascend910
+#export DUMP_GE_GRAPH=2
+#export DUMP_GRAPH_LEVEL=3
+#export PRINT_MODEL=1
+export SLOG_PRINT_TO_STDOUT=0
+
+# dump op data
+#export DISABLE_REUSE_MEMORY=1
+#export DUMP_OP=1
+
+ulimit -c unlimited
+
+# local variable
+RANK_SIZE=$1
+RANK_TABLE_FILE=./hccl_config/${RANK_SIZE}p.json
+RANK_ID_START=0
+SAVE_PATH=training/t1
+
+# training stage
+MODE=$2
+
+for((RANK_ID=$RANK_ID_START;RANK_ID<$((RANK_SIZE+RANK_ID_START));RANK_ID++));
+do
+echo
+su HwHiAiUser -c "adc --host 0.0.0.0:22118 --log \"SetLogLevel(0)[error]\" --device "$RANK_ID
+TMP_PATH=$SAVE_PATH/D$RANK_ID
+mkdir -p $TMP_PATH
+cp run_yolov3.sh $TMP_PATH/
+cp $RANK_TABLE_FILE $TMP_PATH/rank_table.json
+cd $TMP_PATH
+nohup bash run_yolov3.sh $RANK_ID $RANK_SIZE $CODE_PATH $MODE > train_$RANK_ID.log &
+cd -
+
+done
+
+
+
+
@@ -0,0 +1 @@
+nohup bash npu_train.sh 1 multi &
@@ -0,0 +1 @@
+nohup bash npu_train.sh 1 single &
@@ -0,0 +1 @@
+nohup bash npu_train.sh 8 multi &
@@ -0,0 +1 @@
+nohup bash npu_train.sh 8 single &
@@ -0,0 +1,50 @@
+
+#clean slog
+rm -rf /var/log/npu/slog/host-0/*.log
+rm -rf /var/log/npu/slog/device-*/*.log
+
+# setting main path
+MAIN_PATH=$(dirname $(readlink -f $0))
+
+# set env
+export PYTHONPATH=/usr/local/Ascend/ops/op_impl/built-in/ai_core/tbe/:$MAIN_PATH/../../../
+export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/fwkacllib/lib64/:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/lib/x86_64-linux-gnu
+PATH=$PATH:$HOME/bin
+export PATH=$PATH:/usr/local/Ascend/fwkacllib/ccec_compiler/bin:$PATH
+export ASCEND_OPP_PATH=/usr/local/Ascend/opp
+export DDK_VERSION_FLAG=1.60.T49.0.B201
+export NEW_GE_FE_ID=1
+export GE_AICPU_FLAG=1
+export SOC_VERSION=Ascend910
+export DUMP_GE_GRAPH=1
+export DUMP_GRAPH_LEVEL=1
+export PRINT_MODEL=1
+#export SLOG_PRINT_TO_STDOUT=1
+
+ulimit -c unlimited
+
+# local variable
+RANK_SIZE=$1
+RANK_TABLE_FILE=./configs/${RANK_SIZE}p.json
+RANK_ID_START=0
+SAVE_PATH=training/t1
+
+for((RANK_ID=$RANK_ID_START;RANK_ID<$((RANK_SIZE+RANK_ID_START));RANK_ID++));
+do
+
+echo
+su HwHiAiUser -c "adc --host 0.0.0.0:22118 --log \"SetLogLevel(0)[debug]\" --device "$RANK_ID
+
+TMP_PATH=$SAVE_PATH/D$RANK_ID
+mkdir -p $TMP_PATH
+cp run_yolov3.sh $TMP_PATH/
+cp $RANK_TABLE_FILE $TMP_PATH/rank_table.json
+cd $TMP_PATH
+nohup bash run_yolov3.sh $RANK_ID $RANK_SIZE $MAIN_PATH ${2:-single} > train_$RANK_ID.log &
+cd -
+
+done
+
+
+
+
@@ -0,0 +1,29 @@
+#!/bin/bash
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$1
+
+
+
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+#sleep 5
+taskset -c  $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
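`run_yolov3.sh` pins each rank to its own block of `KERNEL_NUM` CPU cores via `taskset -c $PID_START-$PID_END`. Despite the `PID_` prefix, these values are core indices, not process IDs. The sketch below reproduces that arithmetic and checks whether all ranks fit on the host. It assumes the cores are numbered contiguously from 0, which is typical on Linux but worth verifying with `lscpu`. The helper names are hypothetical.

```python
import os

KERNEL_NUM = 20  # cores reserved per rank, same value as in run_yolov3.sh


def core_range(rank_id, kernel_num=KERNEL_NUM):
    """Return the inclusive (start, end) core range, i.e. PID_START and PID_END."""
    start = kernel_num * rank_id
    end = start + kernel_num - 1
    return start, end


def check_fits(rank_size, cpu_count=None, kernel_num=KERNEL_NUM):
    """Return True if the last rank's core range exists on this host."""
    cpu_count = cpu_count or os.cpu_count() or 1
    _, last_core = core_range(rank_size - 1, kernel_num)
    return last_core < cpu_count


if __name__ == "__main__":
    for rank in range(8):
        start, end = core_range(rank)
        print(f"rank {rank}: taskset -c {start}-{end}")
    print("8 ranks fit on this host:", check_fits(8))
```

With the script's value of 20 cores per rank, an 8-rank job needs cores 0 to 159. On a host with fewer cores, `taskset` rejects the range for the higher ranks, so `KERNEL_NUM` would need to be reduced.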
@@ -0,0 +1,57 @@
+
+#export CUDA_VISIBLE_DEVICES=''
+#export CUDA_VISIBLE_DEVICES=7
+
+
+
+# setting main path
+MAIN_PATH=$(dirname $(readlink -f $0))
+
+# set env
+export PYTHONPATH=/usr/local/Ascend/ops/op_impl/built-in/ai_core/tbe/:$MAIN_PATH/../../../
+export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/fwkacllib/lib64/:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/lib/x86_64-linux-gnu
+PATH=$PATH:$HOME/bin
+export PATH=$PATH:/usr/local/Ascend/fwkacllib/ccec_compiler/bin
+export ASCEND_OPP_PATH=/usr/local/Ascend/opp
+export DDK_VERSION_FLAG=1.60.T49.0.B201
+export NEW_GE_FE_ID=1
+export GE_AICPU_FLAG=1
+export SOC_VERSION=Ascend910
+export RANK_ID=7
+export RANK_SIZE=1
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export JOB_ID=10087
+export FUSION_TENSOR_SIZE=1000000000
+#export SLOG_PRINT_TO_STDOUT=1
+#export DUMP_GE_GRAPH=2
+#export DUMP_GRAPH_LEVEL=3
+
+su HwHiAiUser -c "adc --host 0.0.0.0:22118 --log \"SetLogLevel(0)[debug]\" --device "$RANK_ID
+
+#RESTORE_PATH=/opt/npu/wujianping/epoch200/
+RESTORE_PATH=/opt/npu/w00558981/yolov3_ok_bak_zip/training/t1/D0/training/
+#RESTORE_PATH=/opt/npu/w00558981/training_done_yolov3/training/t1/D0/training/model-epoch_200_step_182000_loss_20.7852_lr_0
+
+while :
+do
+
+#python3.7 eval.py \
+#--save_img True \
+#--score_thresh 0.2 \
+#--restore_path $RESTORE_PATH \
+#--max_test 10 \
+
+
+python3.7 eval.py \
+--save_json True \
+--score_thresh 0.001 \
+--restore_path $RESTORE_PATH \
+--max_test 10000
+
+break
+sleep 1200
+
+done
+
+
@@ -0,0 +1,86 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import tensorflow as tf
+import numpy as np
+import argparse
+import cv2
+
+from utils.misc_utils import parse_anchors, read_class_names
+from utils.nms_utils import gpu_nms
+from utils.plot_utils import get_color_table, plot_one_box
+from utils.data_aug import letterbox_resize
+
+from model import yolov3
+
+parser = argparse.ArgumentParser(description="YOLO-V3 single-image test procedure.")
+parser.add_argument("input_image", type=str,
+                    help="The path of the input image.")
+parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt",
+                    help="The path of the anchor txt file.")
+parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
+                    help="Resize the input image with `new_size`, size format: [width, height]")
+parser.add_argument("--letterbox_resize", type=lambda x: (str(x).lower() == 'true'), default=True,
+                    help="Whether to use the letterbox resize.")
+parser.add_argument("--class_name_path", type=str, default="./data/coco.names",
+                    help="The path of the class names.")
+parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt",
+                    help="The path of the weights to restore.")
+args = parser.parse_args()
+
+args.anchors = parse_anchors(args.anchor_path)
+args.classes = read_class_names(args.class_name_path)
+args.num_class = len(args.classes)
+
+color_table = get_color_table(args.num_class)
+
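+img_ori = cv2.imread(args.input_image)
+if img_ori is None:  # cv2.imread returns None instead of raising on a bad path
+    raise IOError("Cannot read image: {}".format(args.input_image))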
+if args.letterbox_resize:
+    img, resize_ratio, dw, dh = letterbox_resize(img_ori, args.new_size[0], args.new_size[1])
+else:
+    height_ori, width_ori = img_ori.shape[:2]
+    img = cv2.resize(img_ori, tuple(args.new_size))
+img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+img = np.asarray(img, np.float32)
+img = img[np.newaxis, :] / 255.
+
+with tf.Session() as sess:
+    input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
+    yolo_model = yolov3(args.num_class, args.anchors)
+    with tf.variable_scope('yolov3'):
+        pred_feature_maps = yolo_model.forward(input_data, False)
+    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
+
+    pred_scores = pred_confs * pred_probs
+
+    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, args.num_class, max_boxes=200, score_thresh=0.3, nms_thresh=0.45)
+
+    saver = tf.train.Saver()
+    saver.restore(sess, args.restore_path)
+
+    boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
+
+    # rescale the coordinates to the original image
+    if args.letterbox_resize:
+        boxes_[:, [0, 2]] = (boxes_[:, [0, 2]] - dw) / resize_ratio
+        boxes_[:, [1, 3]] = (boxes_[:, [1, 3]] - dh) / resize_ratio
+    else:
+        boxes_[:, [0, 2]] *= (width_ori/float(args.new_size[0]))
+        boxes_[:, [1, 3]] *= (height_ori/float(args.new_size[1]))
+
+    print("box coords:")
+    print(boxes_)
+    print('*' * 30)
+    print("scores:")
+    print(scores_)
+    print('*' * 30)
+    print("labels:")
+    print(labels_)
+
+    for i in range(len(boxes_)):
+        x0, y0, x1, y1 = boxes_[i]
+        plot_one_box(img_ori, [x0, y0, x1, y1], label=args.classes[labels_[i]] + ', {:.2f}%'.format(scores_[i] * 100), color=color_table[labels_[i]])
+    cv2.imshow('Detection result', img_ori)
+    cv2.imwrite('detection_result.jpg', img_ori)
+    cv2.waitKey(0)
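The rescaling step in the test script maps predicted boxes from the letterboxed network input back to the original image: subtract the padding offset (`dw`, `dh`), then divide by `resize_ratio`. The sketch below shows both directions of that mapping in NumPy so the round trip can be checked. It assumes that `letterbox_resize` scales by `min(new_w / w, new_h / h)` and centres the resized image with integer padding. Treat this as an assumption about `utils.data_aug` and not as its actual code. `letterbox_params`, `to_letterbox` and `from_letterbox` are hypothetical helpers.

```python
import numpy as np


def letterbox_params(ori_w, ori_h, new_w, new_h):
    """Hypothetical helper: scale factor and padding of a centred letterbox."""
    ratio = min(new_w / ori_w, new_h / ori_h)
    resize_w, resize_h = int(ratio * ori_w), int(ratio * ori_h)
    dw = (new_w - resize_w) // 2  # horizontal padding on each side, in pixels
    dh = (new_h - resize_h) // 2  # vertical padding on each side, in pixels
    return ratio, dw, dh


def to_letterbox(boxes, ratio, dw, dh):
    """Map [x0, y0, x1, y1] boxes from original-image to network-input coordinates."""
    out = np.asarray(boxes, dtype=np.float32).copy()
    out[:, [0, 2]] = out[:, [0, 2]] * ratio + dw
    out[:, [1, 3]] = out[:, [1, 3]] * ratio + dh
    return out


def from_letterbox(boxes, ratio, dw, dh):
    """Inverse mapping: the same arithmetic as the script's rescaling step."""
    out = np.asarray(boxes, dtype=np.float32).copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - dw) / ratio
    out[:, [1, 3]] = (out[:, [1, 3]] - dh) / ratio
    return out


if __name__ == "__main__":
    ratio, dw, dh = letterbox_params(1920, 1080, 416, 416)
    box = np.array([[453, 369, 473, 391]], dtype=np.float32)
    print(from_letterbox(to_letterbox(box, ratio, dw, dh), ratio, dw, dh))
```

For a 1920x1080 image resized to 416x416, all padding is vertical (`dw` is 0). This explains why only the y coordinates are shifted noticeably before being scaled back.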
@@ -0,0 +1,287 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import tensorflow as tf
+import numpy as np
+import logging
+from tqdm import trange
+import random
+import time
+import datetime
+from utils.data_utils import get_batch_data, color_jitter
+from utils.misc_utils import shuffle_and_overwrite, make_summary, config_learning_rate, config_optimizer, AverageMeter
+from utils.eval_utils import evaluate_on_cpu, evaluate_on_gpu, get_preds_gpu, voc_eval, parse_gt_rec
+from model import yolov3
+import os
+import sys
+# npu modified
+from npu_bridge.estimator import npu_ops
+from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
+from npu_bridge.estimator.npu.npu_loss_scale_optimizer import NPULossScaleOptimizer
+from npu_bridge.estimator.npu.npu_loss_scale_manager import FixedLossScaleManager
+from npu_bridge.estimator.npu.npu_loss_scale_manager import ExponentialUpdateLossScaleManager
+from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
+from npu_bridge.estimator.npu import util
+
+sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)),'../../../../../'))
+sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)),'../../../../utils/atlasboost'))
+from benchmark_log import hwlog
+from benchmark_log.basic_utils import get_environment_info
+from benchmark_log.basic_utils import get_model_parameter
+import argparse
+
+hwlog.ROOT_DIR = os.path.split(os.path.abspath(__file__))[0]
+cpu_info, npu_info, framework_info, os_info, benchmark_version = get_environment_info("tensorflow")
+config_info = get_model_parameter("tensorflow_config")
+initial_data={"base_lr": 0.128, "dataset": "coco1024", "optimizer": "Adam", "loss_scale": 128, "batchsize": 32}
+
+hwlog.remark_print(key=hwlog.CPU_INFO, value=cpu_info)
+hwlog.remark_print(key=hwlog.NPU_INFO, value=npu_info)
+hwlog.remark_print(key=hwlog.OS_INFO, value=os_info)
+hwlog.remark_print(key=hwlog.FRAMEWORK_INFO, value=framework_info)
+hwlog.remark_print(key=hwlog.BENCHMARK_VERSION, value=benchmark_version)
+hwlog.remark_print(key=hwlog.CONFIG_INFO, value=config_info)
+hwlog.remark_print(key=hwlog.BASE_LR, value=initial_data.get("base_lr"))
+hwlog.remark_print(key=hwlog.DATASET, value=initial_data.get("dataset"))
+hwlog.remark_print(key=hwlog.OPT_NAME, value=initial_data.get("optimizer"))
+hwlog.remark_print(key=hwlog.LOSS_SCALE, value=initial_data.get("loss_scale"))
+hwlog.remark_print(key=hwlog.INPUT_BATCH_SIZE, value=initial_data.get("batchsize"))
+
+parser = argparse.ArgumentParser(description="YOLO-V3 training settings.")
+parser.add_argument("--mode", type=str, default='single', choices=['single', 'multi'],
+                    help="Training mode: 'single' (single-scale) or 'multi' (multi-scale).")
+parser.add_argument("--resume", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to resume training from the latest checkpoint in args.save_dir.")
+args_input = parser.parse_args()
+
+if args_input.mode == 'single':
+    import args_single as args
+elif args_input.mode == 'multi':
+    import args_multi as args
+print('setting train mode %s.' %args_input.mode)
+
+# setting loggers
+logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s',
+                    datefmt='%a, %d %b %Y %H:%M:%S', filename=args.progress_log_path, filemode='w')
+
+
+##################
+# tf.data pipeline
+##################
+train_dataset = tf.data.TextLineDataset(args.train_file)
+print('RANK_ID:', os.environ['RANK_ID'])
+logging.info('shuffle seed (RANK_ID): %s', os.environ['RANK_ID'])
+
+train_dataset = train_dataset.shuffle(args.train_img_cnt, seed=int(os.environ['RANK_ID']),
+                                      reshuffle_each_iteration=True)
+print('train_img_cnt:', args.train_img_cnt)
+
+train_dataset = train_dataset.repeat()
+train_dataset = train_dataset.batch(args.batch_size, drop_remainder=True)  # npu modified
+train_dataset = train_dataset.map(
+    lambda x: tf.py_func(get_batch_data,
+                         inp=[x, args.class_num, args.img_size, args.anchors, 'train', args.multi_scale_train,
+                              args.use_mix_up, args.letterbox_resize],
+                         Tout=[tf.float32,
+                               tf.float32, tf.float32, tf.float32,
+                               tf.float32, tf.float32, tf.float32]),
+    num_parallel_calls=20
+)
+
+
+def valid_shape(*x):
+    image, y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52 = x
+    y_true = [y_true_13, y_true_26, y_true_52]
+    gt_box = [gt_box_13, gt_box_26, gt_box_52]
+
+    # npu modified
+    if args_input.mode == 'single':
+        image.set_shape([args.batch_size, args.img_size[0], args.img_size[1], 3])
+        y_true[0].set_shape([args.batch_size, 13, 13, 3, 86])
+        y_true[1].set_shape([args.batch_size, 26, 26, 3, 86])
+        y_true[2].set_shape([args.batch_size, 52, 52, 3, 86])
+    elif args_input.mode == 'multi':
+        image.set_shape([args.batch_size, args.img_size[0], args.img_size[1], 3])
+        y_true[0].set_shape([args.batch_size, 19*1, 19*1, 3, 86])
+        y_true[1].set_shape([args.batch_size, 19*2, 19*2, 3, 86])
+        y_true[2].set_shape([args.batch_size, 19*4, 19*4, 3, 86])
+
+    gt_box[0].set_shape([args.batch_size, 1, 32, 4])
+    gt_box[1].set_shape([args.batch_size, 1, 64, 4])
+    gt_box[2].set_shape([args.batch_size, 1, 128, 4])
+
+    image = color_jitter(
+        image, brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05)
+
+    return image, y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52
+
+
+train_dataset = train_dataset.map(valid_shape, num_parallel_calls=20)
+train_dataset = train_dataset.prefetch(args.prefetech_buffer)
+iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
+train_init_op = iterator.make_initializer(train_dataset)
+# get an element from the chosen dataset iterator
+image, y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52 = iterator.get_next()
+y_true = [y_true_13, y_true_26, y_true_52]
+gt_box = [gt_box_13, gt_box_26, gt_box_52]
+
+
+##################
+# Model definition
+##################
+yolo_model = yolov3(args.class_num, args.anchors, args.use_label_smooth, args.use_focal_loss, args.batch_norm_decay,
+                    args.weight_decay, use_static_shape=False,
+                    batch_size=args.batch_size, img_size=args.img_size)
+
+with tf.variable_scope('yolov3'):
+    pred_feature_maps = yolo_model.forward(image, is_training=True)
+loss = yolo_model.compute_loss(pred_feature_maps, y_true, gt_box)
+l2_loss = tf.losses.get_regularization_loss()
+
+# setting restore parts and vars to update
+saver_to_restore = tf.train.Saver(
+    var_list=tf.contrib.framework.get_variables_to_restore(include=args.restore_include, exclude=args.restore_exclude))
+update_vars = tf.contrib.framework.get_variables_to_restore(include=args.update_part)
+
+tf.summary.scalar('train_batch_statistics/total_loss', loss[0])
+tf.summary.scalar('train_batch_statistics/loss_xy', loss[1])
+tf.summary.scalar('train_batch_statistics/loss_wh', loss[2])
+tf.summary.scalar('train_batch_statistics/loss_conf', loss[3])
+tf.summary.scalar('train_batch_statistics/loss_class', loss[4])
+tf.summary.scalar('train_batch_statistics/loss_l2', l2_loss)
+tf.summary.scalar('train_batch_statistics/loss_ratio', l2_loss / loss[0])
+
+def learning_rate_fn(global_step):
+    """Builds the learning-rate schedule: linear warm-up for args.warm_up_epoch epochs, then cosine decay to 0."""
+    initial_learning_rate = args.learning_rate_init
+    batches_per_epoch = args.train_batch_num // args.iterations_per_loop * args.iterations_per_loop
+    total_steps = int(args.total_epoches * batches_per_epoch)
+    warmup_steps = int(batches_per_epoch * args.warm_up_epoch)
+    tf.compat.v1.logging.info('total_steps: %d', int(total_steps))
+    tf.compat.v1.logging.info('warmup_steps: %d', int(warmup_steps))
+    lr = tf.maximum(
+        tf.compat.v1.train.cosine_decay(
+            learning_rate=initial_learning_rate,
+            global_step=global_step - warmup_steps,
+            decay_steps=total_steps - warmup_steps,
+        ),
+        0,
+    )
+    warmup_lr = initial_learning_rate * tf.cast(global_step, tf.float32) / tf.cast(warmup_steps, tf.float32)
+    return tf.cond(pred=global_step < warmup_steps,
+                   true_fn=lambda: warmup_lr,
+                   false_fn=lambda: lr)
+
+
+global_step = tf.train.get_or_create_global_step()
+learning_rate = learning_rate_fn(global_step)
+tf.summary.scalar('learning_rate', learning_rate)
+
+if not args.save_optimizer:
+    saver_to_save = tf.train.Saver()
+    saver_best = tf.train.Saver()
+
+optimizer = config_optimizer(args.optimizer_name, learning_rate)
+optimizer = NPUDistributedOptimizer(optimizer)
+loss_scale_manager = FixedLossScaleManager(loss_scale=128)
+if args.num_gpus > 1:
+    optimizer = NPULossScaleOptimizer(optimizer, loss_scale_manager, is_distributed=True)
+else:
+    optimizer = NPULossScaleOptimizer(optimizer, loss_scale_manager, is_distributed=False)
+
+# set dependencies for BN ops
+update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
+with tf.control_dependencies(update_ops):
+    # apply gradient clip to avoid gradient exploding
+    gvs = optimizer.compute_gradients(loss[0] + l2_loss, var_list=update_vars)
+    clip_grad_var = [gv if gv[0] is None else [
+        tf.clip_by_norm(gv[0], 100.), gv[1]] for gv in gvs]
+    train_op = optimizer.apply_gradients(clip_grad_var, global_step=tf.train.get_global_step())
+
+if args.save_optimizer:
+    print(
+        'Saving optimizer parameters to checkpoint! Remember to restore the global_step in the fine-tuning afterwards.')
+    saver_to_save = tf.train.Saver()
+    saver_best = tf.train.Saver()
+
+# npu modified
+config = tf.ConfigProto()
+custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
+custom_op.name = "NpuOptimizer"
+custom_op.parameter_map["use_off_line"].b = True  # training on Ascend chips
+custom_op.parameter_map["enable_data_pre_proc"].b = True
+custom_op.parameter_map["iterations_per_loop"].i = args.iterations_per_loop
+config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
+
+with tf.Session(config=config) as sess:
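+    # YOLOv3 fine-tuning: initialize all variables (pretrained weights such as darknet53.ckpt are restored below)
+    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
+
+    # Resume from the latest checkpoint in args.save_dir if --resume is set; otherwise restore from args.restore_path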
+    if args_input.resume:
+        saver_to_restore = tf.train.Saver()
+        saver_to_restore.restore(sess, tf.train.latest_checkpoint(args.save_dir))
+    else:
+        saver_to_restore.restore(sess, args.restore_path)
+
+    merged = tf.summary.merge_all()
+    writer = tf.summary.FileWriter(args.log_dir, sess.graph)
+
+    print('\n----------- start to train -----------\n')
+     
+    #hwlog.logger.info("time_ts:%s, hardware:%s current os:%s" %(date_time,'Ascend910','Ubuntu 18.04'))
+    #hwlog.logger.info("time_ts:%s, framework is tensorflow 1.15.0 " %(date_time))
+    #remark_logger.info("ABK time_ts: %s, yolov3 %s model train begain, total train_epoches:%d, file: %s, lineno: %s" %(date_time,args_input.mode,args.total_epoches,file_name,sys._getframe().f_lineno))
+    hwlog.remark_print(key=hwlog.TOTAL_TRAIN_EPOCH, value=f"{args.total_epoches}")
+    best_mAP = -np.inf
+    train_op = util.set_iteration_per_loop(sess, train_op, args.iterations_per_loop)
+    sess.run(train_init_op)
+    for epoch in range(args.total_epoches):
+        loss_total, loss_xy, loss_wh, loss_conf, loss_class = AverageMeter(), AverageMeter(), AverageMeter(), AverageMeter(), AverageMeter()
+        for i in trange(args.train_batch_num // args.iterations_per_loop):
+            t = time.time()
+            _, summary, __y_true, __loss, __global_step, __lr = sess.run(
+                [train_op, merged, y_true, loss, global_step, learning_rate]
+            )
+            fps = 1 / (time.time() - t) * args.iterations_per_loop * args.num_gpus * args.batch_size
+
+            writer.add_summary(summary, global_step=__global_step)
+
+            loss_total.update(__loss[0], len(__y_true[0]))
+            loss_xy.update(__loss[1], len(__y_true[0]))
+            loss_wh.update(__loss[2], len(__y_true[0]))
+            loss_conf.update(__loss[3], len(__y_true[0]))
+            loss_class.update(__loss[4], len(__y_true[0]))
+
+            info = "Epoch: {}, global_step: {} fps: {:.2f} lr: {:.5f} | loss: total: {:.2f}, xy: {:.2f}, wh: {:.2f}, conf: {:.2f}, class: {:.2f} | ".format(
+                epoch, int(__global_step), fps, __lr, loss_total.average, loss_xy.average, loss_wh.average,
+                loss_conf.average,
+                loss_class.average)
+            print(info)
+            logging.info(info)
+            #remark_logger.info("ABK time_ts:%s, global_steps %d, learning rate %2f, file: %s, lineno: %s" %(date_time,int(__global_step),__lr,file_name,sys._getframe().f_lineno))
+            #remark_logger.info("ABK time_ts:%s, fps %2f, loss_total %2f, file: %s, lineno: %s" %(date_time,fps,loss_total.average,file_name,sys._getframe().f_lineno))
+
+            hwlog.remark_print(key=hwlog.FPS, value=f"{fps}")
+            hwlog.remark_print(key=hwlog.GLOBAL_STEP, value=f"{int(__global_step)}")
+
+        # NOTE: this is just demo. You can set the conditions when to save the weights.
+        temp_epoch = epoch + 1
+        if temp_epoch % args.save_epoch == 0 and epoch > 0:
+            saver_to_save.save(sess, args.save_dir + 'model-epoch_{}_step_{}_loss_{:.4f}_lr_{:.5g}'.format( \
+                temp_epoch,
+                int(__global_step),
+                loss_total.average,
+                __lr))
+
+        if __lr <= 0:
+            break
+
+    saver_to_save.save(sess, args.save_dir + 'model-final_step_{}_loss_{:.4f}_lr_{:.5g}'.format( \
+        int(__global_step),
+        loss_total.average,
+        __lr))
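The sketch below is a pure-Python mirror of `learning_rate_fn` in train.py. It is useful for checking the warm-up length and the decay curve offline before starting a long run. It assumes the default behaviour of `tf.compat.v1.train.cosine_decay` (`alpha=0`, step clipped at `decay_steps`) and requires the total number of steps to exceed the warm-up steps. `learning_rate_at` and the example numbers are hypothetical. They are not values taken from `args_single` or `args_multi`.

```python
import math


def learning_rate_at(step, lr_init, train_batch_num, iterations_per_loop,
                     total_epoches, warm_up_epoch):
    """Hypothetical pure-Python mirror of learning_rate_fn (cosine_decay, alpha=0)."""
    # Steps per epoch are rounded down to a multiple of iterations_per_loop,
    # matching the trange(train_batch_num // iterations_per_loop) training loop.
    batches_per_epoch = train_batch_num // iterations_per_loop * iterations_per_loop
    total_steps = int(total_epoches * batches_per_epoch)
    warmup_steps = int(batches_per_epoch * warm_up_epoch)
    if step < warmup_steps:
        # Linear warm-up from 0 to lr_init.
        return lr_init * step / warmup_steps
    decay_steps = total_steps - warmup_steps  # assumed > 0
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    return max(lr_init * 0.5 * (1.0 + math.cos(math.pi * progress)), 0.0)


if __name__ == "__main__":
    cfg = dict(lr_init=1e-4, train_batch_num=1000, iterations_per_loop=10,
               total_epoches=10, warm_up_epoch=1)
    for step in (0, 500, 1000, 5500, 10000):
        print(step, learning_rate_at(step, **cfg))
```

With these example settings, the rate climbs linearly to `lr_init` over the first epoch, halves at the midpoint of the decay phase and reaches 0 at the last step. The training loop's `if __lr <= 0: break` check relies on this endpoint.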
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
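The launch scripts copy `${RANK_SIZE}p.json` into each rank's working directory as `rank_table.json`. Every rank ID started by the loop therefore has to appear in that file. The sketch below runs a few consistency checks on a parsed rank table, using only the fields present in the file above (`device_num`, `instance_count`, `instance_list`, `rank_id`). These checks are an assumption about what is worth verifying. They are not an official HCCL validator, and the helper names are hypothetical.

```python
import json


def check_rank_table(table):
    """Hypothetical check: return a list of problems in a parsed rank table (empty if none)."""
    problems = []
    for group in table["group_list"]:
        instances = group["instance_list"]
        if int(group["instance_count"]) != len(instances):
            problems.append("instance_count does not match instance_list length")
        device_total = sum(len(inst["devices"]) for inst in instances)
        if int(group["device_num"]) != device_total:
            problems.append("device_num does not match the number of devices")
        rank_ids = sorted(int(inst["rank_id"]) for inst in instances)
        if rank_ids != list(range(len(instances))):
            problems.append(f"rank_id values are not 0..{len(instances) - 1}: {rank_ids}")
    return problems


def load_rank_table(path):
    """Read a rank-table JSON file such as configs/8p.json."""
    with open(path) as f:
        return json.load(f)
```

For example, `check_rank_table(load_rank_table("configs/8p.json"))` should return an empty list for the 8-device table above. A launcher whose rank IDs start at 1 would request rank 8, which this table does not define.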
@@ -0,0 +1,29 @@
+#!/bin/bash
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$1
+
+
+
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+#sleep 5
+taskset -c  $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$1
+
+
+
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+#sleep 5
+taskset -c  $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$1
+
+
+
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+#sleep 5
+taskset -c  $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir graph
+mv *.txt graph
+mv *.pbtxt graph
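Despite their names, `PID_START` and `PID_END` are CPU core indices, not process IDs. Rank `RANK_ID` gets `KERNEL_NUM` consecutive cores, starting at `KERNEL_NUM * RANK_ID`. The sketch below reproduces that arithmetic so the core lists can be checked before launch. The function names are hypothetical.

```python
import os

KERNEL_NUM = 20  # CPU cores reserved per rank, as in the launch script


def core_range(rank_id, kernel_num=KERNEL_NUM):
    """Inclusive (first, last) CPU core indices pinned for one rank."""
    start = kernel_num * rank_id          # PID_START in the script
    end = start + kernel_num - 1          # PID_END in the script
    return start, end


def taskset_cpu_list(rank_id, kernel_num=KERNEL_NUM):
    """Format the range the way the script passes it to `taskset -c`."""
    return "%d-%d" % core_range(rank_id, kernel_num)


if __name__ == "__main__":
    rank_size = 8
    for rank in range(rank_size):
        print(rank, taskset_cpu_list(rank))
    # The highest rank's last core must exist on this host
    needed = core_range(rank_size - 1)[1] + 1
    print("logical CPUs needed: %d, available: %s" % (needed, os.cpu_count()))
```

With RANK_SIZE=8 and KERNEL_NUM=20, the ranks cover cores 0 to 159. The script therefore assumes a host with at least 160 logical CPUs. On a smaller host, the higher ranks' lists point at cores that do not exist, and `taskset` is likely to fail for those ranks. In that case, lower `KERNEL_NUM` to fit the host.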
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Arguments: $1 = RANK_ID, $2 = RANK_SIZE, $3 = directory containing train.py, $4 = value for --mode
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+# Remove graph dumps and results left over from a previous run (RANK_ID must be set first)
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$RANK_ID
+
+# Pin each rank to its own block of KERNEL_NUM CPU cores (core indices, not process IDs)
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+taskset -c $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Arguments: $1 = RANK_ID, $2 = RANK_SIZE, $3 = directory containing train.py, $4 = value for --mode
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+# Remove graph dumps and results left over from a previous run (RANK_ID must be set first)
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$RANK_ID
+
+# Pin each rank to its own block of KERNEL_NUM CPU cores (core indices, not process IDs)
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+taskset -c $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Arguments: $1 = RANK_ID, $2 = RANK_SIZE, $3 = directory containing train.py, $4 = value for --mode
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+# Remove graph dumps and results left over from a previous run (RANK_ID must be set first)
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$RANK_ID
+
+# Pin each rank to its own block of KERNEL_NUM CPU cores (core indices, not process IDs)
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+taskset -c $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Arguments: $1 = RANK_ID, $2 = RANK_SIZE, $3 = directory containing train.py, $4 = value for --mode
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+# Remove graph dumps and results left over from a previous run (RANK_ID must be set first)
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$RANK_ID
+
+# Pin each rank to its own block of KERNEL_NUM CPU cores (core indices, not process IDs)
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+taskset -c $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,109 @@
+{
+    "board_id": "0x002f",
+    "chip_info": "910",
+    "deploy_mode": "lab",
+    "group_count": "1",
+    "group_list": [
+        {
+            "device_num": "8",
+            "server_num": "1",
+            "group_name": "",
+            "instance_count": "8",
+            "instance_list": [
+                {
+                    "devices": [
+                        {
+                            "device_id": "0",
+                            "device_ip": "192.168.100.101"
+                        }
+                    ],
+                    "rank_id": "0",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "1",
+                            "device_ip": "192.168.101.101"
+                        }
+                    ],
+                    "rank_id": "1",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "2",
+                            "device_ip": "192.168.102.101"
+                        }
+                    ],
+                    "rank_id": "2",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "3",
+                            "device_ip": "192.168.103.101"
+                        }
+                    ],
+                    "rank_id": "3",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "4",
+                            "device_ip": "192.168.100.100"
+                        }
+                    ],
+                    "rank_id": "4",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "5",
+                            "device_ip": "192.168.101.100"
+                        }
+                    ],
+                    "rank_id": "5",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "6",
+                            "device_ip": "192.168.102.100"
+                        }
+                    ],
+                    "rank_id": "6",
+                    "server_id": "0.0.0.0"
+                },
+                {
+                    "devices": [
+                        {
+                            "device_id": "7",
+                            "device_ip": "192.168.103.100"
+                        }
+                    ],
+                    "rank_id": "7",
+                    "server_id": "0.0.0.0"
+                }
+            ]
+        }
+    ],
+    "para_plane_nic_location": "device",
+    "para_plane_nic_name": [
+        "eth0",
+        "eth1",
+        "eth2",
+        "eth3",
+        "eth4",
+        "eth5",
+        "eth6",
+        "eth7"
+    ],
+    "para_plane_nic_num": "8",
+    "status": "completed"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Arguments: $1 = RANK_ID, $2 = RANK_SIZE, $3 = directory containing train.py, $4 = value for --mode
+export RANK_ID=$1
+export RANK_SIZE=$2
+export DEVICE_ID=$RANK_ID
+export DEVICE_INDEX=$RANK_ID
+export RANK_TABLE_FILE=rank_table.json
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+# Remove graph dumps and results left over from a previous run (RANK_ID must be set first)
+rm -rf Onnxgraph
+rm -rf Partition
+rm -rf OptimizeSubGraph
+rm -rf Aicpu_Optimized
+rm -f *txt
+rm -rf result_$RANK_ID
+
+# Pin each rank to its own block of KERNEL_NUM CPU cores (core indices, not process IDs)
+KERNEL_NUM=20
+PID_START=$((KERNEL_NUM * RANK_ID))
+PID_END=$((PID_START + KERNEL_NUM - 1))
+
+taskset -c $PID_START-$PID_END python3 $3/train.py \
+--mode $4
+
+mkdir -p graph
+mv *.txt graph
+mv *.pbtxt graph
@@ -0,0 +1,450 @@
+# coding: utf-8
+# Part of this is taken from Gluon's repo:
+# https://github.com/dmlc/gluon-cv/blob/master/gluoncv/data/transforms/presets/yolo.py
+
+from __future__ import division, print_function
+
+import random
+import numpy as np
+import cv2
+# from matplotlib.colors import rgb_to_hsv, hsv_to_rgb
+
+
+def mix_up(img1, img2, bbox1, bbox2):
+    '''
+    return:
+        mix_img: HWC format mix up image
+        mix_bbox: [N, 5] shape mix up bbox, i.e. `x_min, y_min, x_max, y_max, mixup_weight`.
+    '''
+    height = max(img1.shape[0], img2.shape[0])
+    width = max(img1.shape[1], img2.shape[1])
+
+    mix_img = np.zeros(shape=(height, width, 3), dtype='float32')
+
+    # rand_num = np.random.random()
+    rand_num = np.random.beta(1.5, 1.5)
+    rand_num = max(0, min(1, rand_num))
+    mix_img[:img1.shape[0], :img1.shape[1], :] = img1.astype('float32') * rand_num
+    mix_img[:img2.shape[0], :img2.shape[1], :] += img2.astype('float32') * (1. - rand_num)
+
+    mix_img = mix_img.astype('uint8')
+
+    # the last element of the 2nd dimension is the mix up weight
+    bbox1 = np.concatenate((bbox1, np.full(shape=(bbox1.shape[0], 1), fill_value=rand_num)), axis=-1)
+    bbox2 = np.concatenate((bbox2, np.full(shape=(bbox2.shape[0], 1), fill_value=1. - rand_num)), axis=-1)
+    mix_bbox = np.concatenate((bbox1, bbox2), axis=0)
+
+    return mix_img, mix_bbox
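`mix_up()` blends the two images with a single ratio drawn from Beta(1.5, 1.5). It then appends that ratio, or its complement, to every box as an extra column. Each image's boxes therefore carry the same weight that image's pixels were blended with. The standalone sketch below shows that weighting. `mixup_weights` and `blend` are hypothetical helpers. How the weight is consumed downstream (for example, scaling each box's loss) is an assumption and is not shown in this file.

```python
import numpy as np


def mixup_weights(n1, n2, rng=None, alpha=1.5):
    """Sample one mix-up ratio and return per-box weights for two box sets.

    Boxes of the first image get lam, boxes of the second get 1 - lam,
    with lam ~ Beta(alpha, alpha), mirroring the weighting in mix_up().
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(np.clip(rng.beta(alpha, alpha), 0.0, 1.0))
    weights = np.concatenate([np.full(n1, lam), np.full(n2, 1.0 - lam)])
    return weights, lam


def blend(img1, img2, lam):
    """Blend two HWC uint8 images on a canvas large enough to hold both."""
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    out = np.zeros((h, w, 3), dtype=np.float32)
    out[:img1.shape[0], :img1.shape[1]] = img1.astype(np.float32) * lam
    out[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1.0 - lam)
    return out.astype(np.uint8)
```

For any pair of images, the two weights always sum to 1. Areas covered by only one image keep just that image's scaled contribution. As a result, a smaller second image leaves a darkened border from the first one.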
+
+
+def bbox_crop(bbox, crop_box=None, allow_outside_center=True):
+    """Crop bounding boxes according to slice area.
+    This method is mainly used with image cropping to ensure bounding boxes fit
+    within the cropped image.
+    Parameters
+    ----------
+    bbox : numpy.ndarray
+        Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes.
+        The second axis represents attributes of the bounding box.
+        Specifically, these are :math:`(x_{min}, y_{min}, x_{max}, y_{max})`,
+        we allow additional attributes other than coordinates, which stay intact
+        during bounding box transformations.
+    crop_box : tuple
+        Tuple of length 4. :math:`(x_{min}, y_{min}, width, height)`
+    allow_outside_center : bool
+        If `False`, remove bounding boxes which have centers outside cropping area.
+    Returns
+    -------
+    numpy.ndarray
+        Cropped bounding boxes with shape (M, 4+) where M <= N.
+    """
+    bbox = bbox.copy()
+    if crop_box is None:
+        return bbox
+    if not len(crop_box) == 4:
+        raise ValueError(
+            "Invalid crop_box parameter, requires length 4, given {}".format(str(crop_box)))
+    if sum([int(c is None) for c in crop_box]) == 4:
+        return bbox
+
+    l, t, w, h = crop_box
+
+    left = l if l else 0
+    top = t if t else 0
+    right = left + (w if w else np.inf)
+    bottom = top + (h if h else np.inf)
+    crop_bbox = np.array((left, top, right, bottom))
+
+    if allow_outside_center:
+        mask = np.ones(bbox.shape[0], dtype=bool)
+    else:
+        centers = (bbox[:, :2] + bbox[:, 2:4]) / 2
+        mask = np.logical_and(crop_bbox[:2] <= centers, centers < crop_bbox[2:]).all(axis=1)
+
+    # transform borders
+    bbox[:, :2] = np.maximum(bbox[:, :2], crop_bbox[:2])
+    bbox[:, 2:4] = np.minimum(bbox[:, 2:4], crop_bbox[2:4])
+    bbox[:, :2] -= crop_bbox[:2]
+    bbox[:, 2:4] -= crop_bbox[:2]
+
+    mask = np.logical_and(mask, (bbox[:, :2] < bbox[:, 2:4]).all(axis=1))
+    bbox = bbox[mask]
+    return bbox
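Here is a worked example of the border handling in `bbox_crop()`. Take the crop `(20, 20, 60, 60)`, which covers x and y from 20 to 80. A box at `[10, 10, 50, 50]` is clipped to `[20, 20, 50, 50]`. It is then shifted into crop coordinates as `[0, 0, 30, 30]`. A box starting at the crop's bottom-right corner, `[80, 80, 120, 120]`, collapses to zero width and is dropped. The standalone sketch below covers only the `allow_outside_center=True` path. `crop_boxes` is a hypothetical name.

```python
import numpy as np


def crop_boxes(bbox, crop_box):
    """Clip boxes to crop_box=(x, y, w, h), shift them to crop coordinates, drop empty ones."""
    x, y, w, h = crop_box
    crop = np.array([x, y, x + w, y + h], dtype=np.float64)
    out = bbox.astype(np.float64).copy()
    out[:, :2] = np.maximum(out[:, :2], crop[:2]) - crop[:2]     # clip top-left, then translate
    out[:, 2:4] = np.minimum(out[:, 2:4], crop[2:4]) - crop[:2]  # clip bottom-right, then translate
    keep = (out[:, :2] < out[:, 2:4]).all(axis=1)                # drop boxes with no area left
    return out[keep]


boxes = np.array([[10, 10, 50, 50],      # overlaps the crop
                  [80, 80, 120, 120]])   # starts exactly at the crop's bottom-right corner
print(crop_boxes(boxes, (20, 20, 60, 60)))  # keeps only the first box, now [0, 0, 30, 30]
```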
+
+def bbox_iou(bbox_a, bbox_b, offset=0):
+    """Calculate Intersection-Over-Union(IOU) of two bounding boxes.
+    Parameters
+    ----------
+    bbox_a : numpy.ndarray
+        An ndarray with shape :math:`(N, 4)`.
+    bbox_b : numpy.ndarray
+        An ndarray with shape :math:`(M, 4)`.
+    offset : float or int, default is 0
+        The ``offset`` controls whether the width (or height) is computed as
+        (right - left + ``offset``).
+        Note that the offset must be 0 for normalized bboxes, whose ranges are in ``[0, 1]``.
+    Returns
+    -------
+    numpy.ndarray
+        An ndarray with shape :math:`(N, M)` holding the IOU between each pair of
+        bounding boxes in `bbox_a` and `bbox_b`.
+    """
+    if bbox_a.shape[1] < 4 or bbox_b.shape[1] < 4:
+        raise IndexError("Bounding boxes axis 1 must have at least length 4")
+
+    tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
+    br = np.minimum(bbox_a[:, None, 2:4], bbox_b[:, 2:4])
+
+    area_i = np.prod(br - tl + offset, axis=2) * (tl < br).all(axis=2)
+    area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
+    area_b = np.prod(bbox_b[:, 2:4] - bbox_b[:, :2] + offset, axis=1)
+    return area_i / (area_a[:, None] + area_b - area_i)
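Here is a worked example of the computation in `bbox_iou()`. Two 2×2 boxes offset by one unit overlap in a 1×1 square, so IoU = 1 / (4 + 4 - 1) = 1/7. A box that only touches at a corner scores 0 because of the `(tl < br)` test. The sketch below restates the same broadcasting steps standalone, so the numbers can be checked without importing this module. `pairwise_iou` is a hypothetical name. With `offset=1`, widths become `right - left + 1`, the usual convention for inclusive pixel-index boxes.

```python
import numpy as np


def pairwise_iou(a, b, offset=0):
    """IoU matrix between boxes a (N, 4) and b (M, 4) in (xmin, ymin, xmax, ymax) format."""
    tl = np.maximum(a[:, None, :2], b[:, :2])      # (N, M, 2) intersection top-left
    br = np.minimum(a[:, None, 2:4], b[:, 2:4])    # (N, M, 2) intersection bottom-right
    inter = np.prod(br - tl + offset, axis=2) * (tl < br).all(axis=2)
    area_a = np.prod(a[:, 2:4] - a[:, :2] + offset, axis=1)
    area_b = np.prod(b[:, 2:4] - b[:, :2] + offset, axis=1)
    return inter / (area_a[:, None] + area_b - inter)


a = np.array([[0, 0, 2, 2]])
b = np.array([[1, 1, 3, 3],    # 1x1 overlap: IoU = 1 / 7
              [2, 2, 4, 4]])   # touches only at a corner: IoU = 0
print(pairwise_iou(a, b))
```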
+
+
+def random_crop_with_constraints(bbox, size, min_scale=0.25, max_scale=1,
+                                 max_aspect_ratio=2, constraints=None,
+                                 max_trial=10):
+    """Crop an image randomly with bounding box constraints.
+    This data augmentation is used in training of
+    Single Shot Multibox Detector [#]_. More details can be found in
+    data augmentation section of the original paper.
+    .. [#] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy,
+       Scott Reed, Cheng-Yang Fu, Alexander C. Berg.
+       SSD: Single Shot MultiBox Detector. ECCV 2016.
+    Parameters
+    ----------
+    bbox : numpy.ndarray
+        Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes.
+        The second axis represents attributes of the bounding box.
+        Specifically, these are :math:`(x_{min}, y_{min}, x_{max}, y_{max})`,
+        we allow additional attributes other than coordinates, which stay intact
+        during bounding box transformations.
+    size : tuple
+        Tuple of length 2 of image shape as (width, height).
+    min_scale : float
+        The minimum ratio between a cropped region and the original image.
+        The default value is :obj:`0.25`.
+    max_scale : float
+        The maximum ratio between a cropped region and the original image.
+        The default value is :obj:`1`.
+    max_aspect_ratio : float
+        The maximum aspect ratio of cropped region.
+        The default value is :obj:`2`.
+    constraints : iterable of tuples
+        An iterable of constraints.
+        Each constraint should be :obj:`(min_iou, max_iou)` format.
+        Setting :obj:`min_iou` or :obj:`max_iou` to :obj:`None` means no constraint on that side.
+        If this argument defaults to :obj:`None`, :obj:`((0.3, None), (0.5, None),
+        (0.7, None), (0.9, None), (None, 1))` will be used.
+    max_trial : int, default 10
+        Maximum number of trials for each constraint before exit no matter what.
+    Returns
+    -------
+    numpy.ndarray
+        Cropped bounding boxes with shape :obj:`(M, 4+)` where M <= N.
+    tuple
+        Tuple of length 4 as (x_offset, y_offset, new_width, new_height).
+    """
+    # default constraints from the SSD paper, minus (0.1, None), which is commented out
+    if constraints is None:
+        constraints = (
+            # (0.1, None),
+            (0.3, None),
+            (0.5, None),
+            (0.7, None),
+            (0.9, None),
+            (None, 1),
+        )
+
+    w, h = size
+
+    candidates = [(0, 0, w, h)]
+    for min_iou, max_iou in constraints:
+        min_iou = -np.inf if min_iou is None else min_iou
+        max_iou = np.inf if max_iou is None else max_iou
+
+        for _ in range(max_trial):
+            scale = random.uniform(min_scale, max_scale)
+            aspect_ratio = random.uniform(
+                max(1 / max_aspect_ratio, scale * scale),
+                min(max_aspect_ratio, 1 / (scale * scale)))
+            crop_h = int(h * scale / np.sqrt(aspect_ratio))
+            crop_w = int(w * scale * np.sqrt(aspect_ratio))
+
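+            crop_t = random.randrange(h - crop_h + 1)  # +1: valid offsets are 0..h-crop_h inclusive,
+            crop_l = random.randrange(w - crop_w + 1)  # and randrange(0) would raise for a full-size crop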
+            crop_bb = np.array((crop_l, crop_t, crop_l + crop_w, crop_t + crop_h))
+
+            if len(bbox) == 0:
+                top, bottom = crop_t, crop_t + crop_h
+                left, right = crop_l, crop_l + crop_w
+                return bbox, (left, top, right-left, bottom-top)
+
+            iou = bbox_iou(bbox, crop_bb[np.newaxis])
+            if min_iou <= iou.min() and iou.max() <= max_iou:
+                top, bottom = crop_t, crop_t + crop_h
+                left, right = crop_l, crop_l + crop_w
+                candidates.append((left, top, right-left, bottom-top))
+                break
+
+    # random select one
+    while candidates:
+        crop = candidates.pop(np.random.randint(0, len(candidates)))
+        new_bbox = bbox_crop(bbox, crop, allow_outside_center=False)
+        if new_bbox.size < 1:
+            continue
+        new_crop = (crop[0], crop[1], crop[2], crop[3])
+        return new_bbox, new_crop
+    return bbox, (0, 0, w, h)
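The aspect-ratio interval in `random_crop_with_constraints()` is what keeps every sampled crop inside the image. With scale s and aspect ratio a drawn from [max(1/r, s²), min(r, 1/s²)], the bound a ≤ 1/s² gives s·√a ≤ 1, so the crop width fits. The bound a ≥ s² gives s/√a ≤ 1, so the crop height fits. The top-left offset can therefore be anywhere in 0..h - crop_h inclusive. The standalone sketch below samples one crop the same way and uses `randrange(n + 1)` for that inclusive range. `sample_crop` is a hypothetical name.

```python
import math
import random


def sample_crop(w, h, min_scale=0.25, max_scale=1.0, max_aspect_ratio=2.0, rng=random):
    """Sample one crop (left, top, width, height) inside a w x h image.

    The aspect-ratio interval keeps scale*sqrt(ar) <= 1 and scale/sqrt(ar) <= 1,
    so crop_w <= w and crop_h <= h.
    """
    scale = rng.uniform(min_scale, max_scale)
    aspect_ratio = rng.uniform(max(1 / max_aspect_ratio, scale * scale),
                               min(max_aspect_ratio, 1 / (scale * scale)))
    crop_h = int(h * scale / math.sqrt(aspect_ratio))
    crop_w = int(w * scale * math.sqrt(aspect_ratio))
    # randrange(n + 1) covers every valid offset 0..n, including n == 0 (full-size crop)
    top = rng.randrange(h - crop_h + 1)
    left = rng.randrange(w - crop_w + 1)
    return left, top, crop_w, crop_h


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_crop(640, 480, rng=rng) for _ in range(3)])
```

Fixing the scale at 1 forces the aspect ratio to 1 and the crop to the full image. That is exactly the case where the offset range is empty and a plain `randrange(h - crop_h)` would raise `ValueError`.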
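+
+
+def _rand(a=0., b=1.):
+    return np.random.rand() * (b - a) + a
+
+
+def random_color_distort(image_data, _hue=0.1, _sat=1.5, _val=1.5):
+    '''
+    Randomly shift hue and scale saturation/value of an RGB float image with values in [0, 1].
+    '''
+    # rgb_to_hsv/hsv_to_rgb are not imported at module level (that import is commented out),
+    # so import them here; matplotlib is then only required when this function is used
+    from matplotlib.colors import rgb_to_hsv, hsv_to_rgb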
+    _hue = _rand(-_hue, _hue)
+    _sat = _rand(1, _sat) if _rand() < .5 else 1 / _rand(1, _sat)
+    _val = _rand(1, _val) if _rand() < .5 else 1 / _rand(1, _val)
+    x = rgb_to_hsv(image_data)
+    x[..., 0] += _hue
+    x[..., 0][x[..., 0] > 1] -= 1
+    x[..., 0][x[..., 0] < 0] += 1
+    x[..., 1] *= _sat
+    x[..., 2] *= _val
+    x[x > 1] = 1
+    x[x < 0] = 0
+    image_data = hsv_to_rgb(x)
+    image_data = image_data.astype(np.float32)
+    return image_data
+
+
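+def random_color_distort_1(img, bgain=16, hgain=0.0138, sgain=0.678, vgain=0.36):
+    '''
+    Randomly scale hue, saturation and value of a BGR uint8 image using lookup tables.
+    Each channel gain is drawn uniformly from [1 - gain, 1 + gain).
+    bgain is currently unused because the brightness shift below is commented out.
+    '''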
+    # brightness_delta = int(np.random.uniform(-bgain, bgain))
+    # img = np.clip(img + brightness_delta , 0, 255)
+    # img = img.astype(np.uint8)
+
+    r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1  # random gains
+    hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
+    dtype = img.dtype  # uint8
+
+    x = np.arange(0, 256, dtype=np.int16)
+    lut_hue = ((x * r[0]) % 180).astype(dtype)
+    lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
+    lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
+
+    img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))).astype(dtype)
+    img = cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR)  # convert back to BGR
+
+    return img
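`random_color_distort_1()` never touches pixels directly. Instead, it builds one 256-entry table per HSV channel and lets `cv2.LUT` remap the image. Hue is taken modulo 180 because 8-bit OpenCV HSV stores hue in 0..179. Saturation and value are clipped to 0..255 instead. The sketch below builds the tables for fixed gains rather than random ones. NumPy fancy indexing stands in for `cv2.LUT`, so the sketch runs without OpenCV. The names `build_luts` and `apply_luts` are hypothetical.

```python
import numpy as np


def build_luts(gains):
    """Build hue/saturation/value lookup tables for gains = (hue_gain, sat_gain, val_gain)."""
    x = np.arange(0, 256, dtype=np.int16)
    lut_hue = ((x * gains[0]) % 180).astype(np.uint8)          # hue wraps around at 180
    lut_sat = np.clip(x * gains[1], 0, 255).astype(np.uint8)   # saturation saturates at 255
    lut_val = np.clip(x * gains[2], 0, 255).astype(np.uint8)   # value saturates at 255
    return lut_hue, lut_sat, lut_val


def apply_luts(hsv, luts):
    """Remap each channel of an HSV uint8 image through its table (what cv2.LUT does per channel)."""
    out = np.empty_like(hsv)
    for c in range(3):
        out[..., c] = luts[c][hsv[..., c]]
    return out


if __name__ == "__main__":
    luts = build_luts((1.01, 2.0, 0.5))
    pixel = np.array([[[179, 200, 200]]], dtype=np.uint8)
    print(apply_luts(pixel, luts))  # hue 179 wraps to 0, saturation clips, value halves
```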
+
+
+
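+def random_color_distort_raw(img, brightness_delta=16, hue_vari=0.01, sat_vari=0.15, val_vari=0.15, p=0.2):
+    '''
+    Randomly distort image color: adjust brightness, hue, saturation and value.
+    param:
+        img: a BGR uint8 format OpenCV image. HWC format.
+        hue_vari: maximum hue shift, in OpenCV hue units (0-179).
+        p: probability of SKIPPING each individual distortion; each one is applied with probability 1 - p.
+    '''
+
+    def random_hue(img_hsv, hue_vari, p=p):
+        if np.random.uniform(0, 1) > p:
+            # uniform instead of randint, so that a fractional hue_vari such as the default 0.01 works:
+            # np.random.randint(-0.01, 0.01) truncates both bounds to 0 and raises ValueError
+            hue_delta = np.random.uniform(-hue_vari, hue_vari)
+            img_hsv[:, :, 0] = (img_hsv[:, :, 0] + hue_delta) % 180
+        return img_hsv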
+
+    def random_saturation(img_hsv, sat_vari, p=p):
+        if np.random.uniform(0, 1) > p:
+            sat_mult = 1 + np.random.uniform(-sat_vari, sat_vari)
+            img_hsv[:, :, 1] *= sat_mult
+        return img_hsv
+
+    def random_value(img_hsv, val_vari, p=p):
+        if np.random.uniform(0, 1) > p:
+            val_mult = 1 + np.random.uniform(-val_vari, val_vari)
+            img_hsv[:, :, 2] *= val_mult
+        return img_hsv
+
+    def random_brightness(img, brightness_delta, p=p):
+        if np.random.uniform(0, 1) > p:
+            img = img.astype(np.float32)
+            brightness_delta = int(np.random.uniform(-brightness_delta, brightness_delta))
+            img = img + brightness_delta
+        return np.clip(img, 0, 255)
+
+    # brightness
+    img = random_brightness(img, brightness_delta)
+    img = img.astype(np.uint8)
+
+    # color jitter
+    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
+
+    if np.random.randint(0, 2):
+        img_hsv = random_value(img_hsv, val_vari)
+        img_hsv = random_saturation(img_hsv, sat_vari)
+        img_hsv = random_hue(img_hsv, hue_vari)
+    else:
+        img_hsv = random_saturation(img_hsv, sat_vari)
+        img_hsv = random_hue(img_hsv, hue_vari)
+        img_hsv = random_value(img_hsv, val_vari)
+
+    img_hsv = np.clip(img_hsv, 0, 255)
+    img = cv2.cvtColor(img_hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
+
+    return img
+
+
+def letterbox_resize(img, new_width, new_height, interp=0):
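+    '''
+    Letterbox resize: keep the original aspect ratio in the resized image and pad
+    the remaining area with gray (128).
+    return:
+        image_padded, resize_ratio, dw, dh (dw / dh: left / top padding in pixels)
+    '''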
+    ori_height, ori_width = img.shape[:2]
+
+    resize_ratio = min(new_width / ori_width, new_height / ori_height)
+
+    resize_w = int(resize_ratio * ori_width)
+    resize_h = int(resize_ratio * ori_height)
+
+    img = cv2.resize(img, (resize_w, resize_h), interpolation=interp)
+    image_padded = np.full((new_height, new_width, 3), 128, np.uint8)
+
+    dw = int((new_width - resize_w) / 2)
+    dh = int((new_height - resize_h) / 2)
+
+    image_padded[dh: resize_h + dh, dw: resize_w + dw, :] = img
+
+    return image_padded, resize_ratio, dw, dh
+
+
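+# Example for letterbox_resize() (illustrative sketch, not used by this module):
+# a point (x, y) of the original image lands at
+#     (x * resize_ratio + dw, y * resize_ratio + dh)
+# on the padded canvas, so boxes predicted on the letterboxed image can be mapped
+# back with the inverse transform. `img` and `boxes` (an [N, 4] float array of
+# x_min, y_min, x_max, y_max in letterboxed coordinates) are hypothetical inputs.
+#
+#     img_padded, resize_ratio, dw, dh = letterbox_resize(img, 416, 416)
+#     boxes[:, [0, 2]] = (boxes[:, [0, 2]] - dw) / resize_ratio
+#     boxes[:, [1, 3]] = (boxes[:, [1, 3]] - dh) / resize_ratio
+
+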
+def resize_with_bbox(img, bbox, new_width, new_height, interp=0, letterbox=False):
+    '''
+    Resize the image and correct the bbox accordingly.
+    '''
+
+    if letterbox:
+        image_padded, resize_ratio, dw, dh = letterbox_resize(img, new_width, new_height, interp)
+
+        # xmin, xmax
+        bbox[:, [0, 2]] = bbox[:, [0, 2]] * resize_ratio + dw
+        # ymin, ymax
+        bbox[:, [1, 3]] = bbox[:, [1, 3]] * resize_ratio + dh
+
+        return image_padded, bbox
+    else:
+        ori_height, ori_width = img.shape[:2]
+
+        img = cv2.resize(img, (new_width, new_height), interpolation=interp)
+
+        # xmin, xmax
+        bbox[:, [0, 2]] = bbox[:, [0, 2]] / ori_width * new_width
+        # ymin, ymax
+        bbox[:, [1, 3]] = bbox[:, [1, 3]] / ori_height * new_height
+
+        return img, bbox
+
+
+def random_flip(img, bbox, px=0, py=0):
+    '''
+    Randomly flip the image and correct the bbox.
+    param:
+    px:
+        the probability of horizontal flip
+    py:
+        the probability of vertical flip
+    '''
+    height, width = img.shape[:2]
+    if np.random.uniform(0, 1) < px:
+        img = cv2.flip(img, 1)
+        xmax = width - bbox[:, 0]
+        xmin = width - bbox[:, 2]
+        bbox[:, 0] = xmin
+        bbox[:, 2] = xmax
+
+    if np.random.uniform(0, 1) < py:
+        img = cv2.flip(img, 0)
+        ymax = height - bbox[:, 1]
+        ymin = height - bbox[:, 3]
+        bbox[:, 1] = ymin
+        bbox[:, 3] = ymax
+    return img, bbox
+
+
+def random_resize(img, bbox, min_ratio=0.25, max_ratio=2, jitter=0.3):
+    '''
+    Randomly rescale the image and scale the bbox accordingly.
+    param:
+    min_ratio, max_ratio :
+        Range of the random base scale factor. The base scale is additionally capped
+        at 608 / max(h, w), so the longer side is at most 608 pixels before jitter.
+    jitter :
+        Each axis is further multiplied by an independent factor drawn from
+        [1 - jitter, 1 + jitter], so the aspect ratio changes as well.
+    '''
+    h, w, c = img.shape
+    max_ratio_limited = 608 / max(h, w)
+    scale = random.uniform(min_ratio, max_ratio)
+    scale = min(max_ratio_limited, scale)
+
+    w_ratio = random.uniform(1 - jitter, 1 + jitter) * scale
+    h_ratio = random.uniform(1 - jitter, 1 + jitter) * scale
+
+    dst = cv2.resize(img, None, fx=w_ratio, fy=h_ratio)
+
+    # correct bbox
+    bbox[:, 0] *= w_ratio
+    bbox[:, 2] *= w_ratio
+    bbox[:, 1] *= h_ratio
+    bbox[:, 3] *= h_ratio
+
+    return dst, bbox
+
+
+def random_expand(img, bbox, max_ratio=2, fill=0, keep_ratio=True):
+    '''
+    Randomly expand the original image with borders; this is equivalent to placing
+    the original image at a random position on a larger canvas.
+    param:
+    max_ratio :
+        Maximum ratio of the output size to the input size in each direction (vertical and horizontal).
+    fill :
+        The value(s) for padded borders.
+    keep_ratio : bool
+        If `True`, will keep output image the same aspect ratio as input.
+    '''
+    h, w, c = img.shape
+    ratio_x = random.uniform(1, max_ratio)
+    if keep_ratio:
+        ratio_y = ratio_x
+    else:
+        ratio_y = random.uniform(1, max_ratio)
+
+    oh, ow = int(h * ratio_y), int(w * ratio_x)
+    off_y = random.randint(0, oh - h)
+    off_x = random.randint(0, ow - w)
+
+    dst = np.full(shape=(oh, ow, c), fill_value=fill, dtype=img.dtype)
+
+    dst[off_y:off_y + h, off_x:off_x + w, :] = img
+
+    # correct bbox
+    bbox[:, :2] += (off_x, off_y)
+    bbox[:, 2:4] += (off_x, off_y)
+
+    return dst, bbox
@@ -0,0 +1,294 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import numpy as np
+import cv2
+import sys
+from utils.data_aug import *
+import random
+import tensorflow as tf
+
+PY_VERSION = sys.version_info[0]
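+# Number of get_batch_data() calls so far (module-level, shared across threads).
+iter_cnt = 0
+# Multi-scale training only starts once iter_cnt reaches IterControl.
+IterControl = 50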
+
+def color_jitter(image, brightness=0, contrast=0, saturation=0, hue=0):
+    """Distorts the color of the image.
+
+    Args:
+        image: The input image tensor.
+        brightness: A float, specifying the brightness for color jitter.
+        contrast: A float, specifying the contrast for color jitter.
+        saturation: A float, specifying the saturation for color jitter.
+        hue: A float, specifying the hue for color jitter.
+
+    Returns:
+        The distorted image tensor.
+    """
+    with tf.name_scope('distort_color'):
+        if brightness > 0:
+            image = tf.image.random_brightness(image, max_delta=brightness)
+        if contrast > 0:
+            image = tf.image.random_contrast(
+                image, lower=1 - contrast, upper=1 + contrast)
+        if saturation > 0:
+            image = tf.image.random_saturation(
+                image, lower=1 - saturation, upper=1 + saturation)
+        if hue > 0:
+            image = tf.image.random_hue(image, max_delta=hue)
+        return image
+
+def parse_line(line):
+    '''
+    Given a line from the training/test txt file, return parsed info.
+    line format: line_index img_path img_width img_height [box_info_1 (5 numbers)] ...
+        where each box_info is `label x_min y_min x_max y_max`.
+    return:
+        pic_path: string.
+        boxes: shape [N, 4], N is the ground truth count, elements in the second
+            dimension are [x_min, y_min, x_max, y_max]
+        labels: shape [N]. class index.
+        img_width: int.
+        img_height: int
+    '''
+    if isinstance(line, bytes):
+        line = line.decode()
+    s = line.strip().split(' ')
+    assert len(
+        s) > 8, 'Annotation error! Please check your annotation file. Make sure there is at least one target object in each image.'
+    # line_idx = int(s[0])
+    pic_path = s[1]
+    img_width = int(s[2])
+    img_height = int(s[3])
+    s = s[4:]
+    assert len(
+        s) % 5 == 0, 'Annotation error! Please check your annotation file. Maybe partially missing some coordinates?'
+    box_cnt = len(s) // 5
+    boxes = []
+    labels = []
+    for i in range(box_cnt):
+        label, x_min, y_min, x_max, y_max = int(s[i * 5]), float(s[i * 5 + 1]), float(s[i * 5 + 2]), float(
+            s[i * 5 + 3]), float(s[i * 5 + 4])
+        boxes.append([x_min, y_min, x_max, y_max])
+        labels.append(label)
+    boxes = np.asarray(boxes, np.float32)
+    labels = np.asarray(labels, np.int32)
+    return pic_path, boxes, labels, img_width, img_height
+
+
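+# Example for parse_line() (illustrative): for the annotation line
+#     0 xxx/xxx/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268
+# parse_line() returns
+#     pic_path   = 'xxx/xxx/a.jpg'
+#     boxes      = [[453., 369., 473., 391.], [588., 245., 608., 268.]]  (float32)
+#     labels     = [0, 1]  (int32)
+#     img_width  = 1920, img_height = 1080
+# The leading line index (0 here) is skipped; callers that need it as an
+# image id must read it from the line themselves.
+
+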
+def process_box(boxes, labels, img_size, class_num, anchors):
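+    '''
+    Generate the y_true label, i.e. the ground truth feature_maps in 3 different scales.
+    params:
+        boxes: [N, 5] shape, float32 dtype. `x_min, y_min, x_max, y_max, mixup_weight`.
+        labels: [N] shape, int32 dtype.
+        img_size: [width, height] of the network input.
+        class_num: int32 num.
+        anchors: [9, 2] shape, float32 dtype. (width, height) of each anchor.
+    return:
+        y_true_13, y_true_26, y_true_52: per-scale targets of shape
+            [H // stride, W // stride, 3, 6 + class_num] for strides 32, 16 and 8.
+        gt_box_13, gt_box_26, gt_box_52: arrays of shape [1, 32, 4], [1, 64, 4] and
+            [1, 128, 4] holding (x_center, y_center, w, h) of the first boxes of each scale.
+    '''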
+    anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+
+    # convert boxes from (x_min, y_min, x_max, y_max) to center/size form
+    # shape: [N, 2]
+    # (x_center, y_center)
+    box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2
+    # (width, height)
+    box_sizes = boxes[:, 2:4] - boxes[:, 0:2]
+
+    # [13, 13, 3, 5 + num_class + 1] for a 416x416 input: `5` is 4 coords + 1 objectness,
+    # `1` is the mix up weight.
+    y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 6 + class_num), np.float32)
+    y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 6 + class_num), np.float32)
+    y_true_52 = np.zeros((img_size[1] // 8, img_size[0] // 8, 3, 6 + class_num), np.float32)
+
+    gt_box_13 = np.zeros((1, 32, 4), np.float32)
+    gt_box_26 = np.zeros((1, 64, 4), np.float32)
+    gt_box_52 = np.zeros((1, 128, 4), np.float32)
+    gt_box_list = [gt_box_13, gt_box_26, gt_box_52]
+
+    # mix up weight default to 1.
+    y_true_13[..., -1] = 1.
+    y_true_26[..., -1] = 1.
+    y_true_52[..., -1] = 1.
+
+    y_true = [y_true_13, y_true_26, y_true_52]
+
+    # [N, 1, 2]
+    box_sizes = np.expand_dims(box_sizes, 1)
+    # broadcast tricks
+    # [N, 1, 2] & [9, 2] ==> [N, 9, 2]
+    mins = np.maximum(- box_sizes / 2, - anchors / 2)
+    maxs = np.minimum(box_sizes / 2, anchors / 2)
+    # [N, 9, 2]
+    whs = maxs - mins
+
+    # [N, 9]
+    iou = (whs[:, :, 0] * whs[:, :, 1]) / (
+            box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1] - whs[:, :, 0] * whs[:, :,
+                                                                                                     1] + 1e-10)
+    # [N]
+    best_match_idx = np.argmax(iou, axis=1)
+
+    ratio_dict = {1.: 8., 2.: 16., 3.: 32.}
+    index_dict = {0: 0, 1: 0, 2: 0}
+    for i, idx in enumerate(best_match_idx):
+        # idx: 0,1,2 ==> 2; 3,4,5 ==> 1; 6,7,8 ==> 0
+        feature_map_group = 2 - idx // 3
+        # scale ratio: 0,1,2 ==> 8; 3,4,5 ==> 16; 6,7,8 ==> 32
+        ratio = ratio_dict[np.ceil((idx + 1) / 3.)]
+        x = int(np.floor(box_centers[i, 0] / ratio))
+        y = int(np.floor(box_centers[i, 1] / ratio))
+        k = anchors_mask[feature_map_group].index(idx)
+        c = labels[i]
+        # print(feature_map_group, '|', y,x,k,c)
+
+        y_true[feature_map_group][y, x, k, :2] = box_centers[i]
+        y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
+        y_true[feature_map_group][y, x, k, 4] = 1.
+        y_true[feature_map_group][y, x, k, 5 + c] = 1.
+        y_true[feature_map_group][y, x, k, -1] = boxes[i, -1]
+
+        if index_dict[feature_map_group] < gt_box_list[feature_map_group].shape[1]:
+            gt_box_list[feature_map_group][0, index_dict[feature_map_group], :2] = box_centers[i]
+            gt_box_list[feature_map_group][0, index_dict[feature_map_group], 2:4] = box_sizes[i]
+            index_dict[feature_map_group] += 1
+
+    return y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52
+
+
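+# Anchor-to-scale mapping used by process_box() (illustrative, assuming the 9
+# anchors are sorted from smallest to largest):
+#     best anchor idx   feature_map_group   stride   grid for a 416x416 input
+#     0, 1, 2           2 (y_true_52)       8        52 x 52
+#     3, 4, 5           1 (y_true_26)       16       26 x 26
+#     6, 7, 8           0 (y_true_13)       32       13 x 13
+# e.g. a box centred at (200, 120) whose best anchor is idx 4 is written to
+# y_true_26[120 // 16, 200 // 16, 1], i.e. y_true_26[7, 12, 1] (index order y, x, k).
+
+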
+def parse_data(line, class_num, img_size, anchors, mode, letterbox_resize, multi_scale):
+    '''
+    param:
+        line: a line from the training/test txt file, or a list of two lines in the mix up case.
+        class_num: total class nums.
+        img_size: the size of image to be resized to. [width, height] format.
+        anchors: anchors.
+        mode: 'train' or 'val'. When set to 'train', data_augmentation will be applied.
+        letterbox_resize: whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
+        multi_scale: whether to use multi_scale training. In 'train' mode, once iter_cnt reaches IterControl,
+            the image is placed on a 608x608 canvas and the labels are built for a 608x608 input.
+    '''
+    if not isinstance(line, list):
+        pic_path, boxes, labels, _, _ = parse_line(line)
+        img = cv2.imread(pic_path)
+        # expand the 2nd dimension, mix up weight default to 1.
+        boxes = np.concatenate((boxes, np.full(shape=(boxes.shape[0], 1), fill_value=1., dtype=np.float32)), axis=-1)
+    else:
+        # the mix up case
+        pic_path1, boxes1, labels1, _, _ = parse_line(line[0])
+        img1 = cv2.imread(pic_path1)
+        pic_path2, boxes2, labels2, _, _ = parse_line(line[1])
+        img2 = cv2.imread(pic_path2)
+
+        img, boxes = mix_up(img1, img2, boxes1, boxes2)
+        labels = np.concatenate((labels1, labels2))
+
+    if mode == 'train':
+        img, boxes = random_resize(img, boxes, min_ratio=0.25, max_ratio=2, jitter=0.3)
+
+        # random expansion with prob 0.5
+        if np.random.uniform(0, 1) > 0.5:
+            img, boxes = random_expand(img, boxes, max_ratio=3, fill=128, keep_ratio=False)
+
+        # random cropping
+        h, w, _ = img.shape
+        boxes, crop = random_crop_with_constraints(boxes, (w, h))
+        x0, y0, w, h = crop
+        img = img[y0: y0 + h, x0: x0 + w]
+
+        # resize with random interpolation
+        h, w, _ = img.shape
+        interp = np.random.randint(0, 5)
+        img, boxes = resize_with_bbox(img, boxes, img_size[0], img_size[1], interp=interp, letterbox=letterbox_resize)
+
+        # random horizontal flip
+        h, w, _ = img.shape
+        img, boxes = random_flip(img, boxes, px=0.5)
+
+    else:
+        img, boxes = resize_with_bbox(img, boxes, img_size[0], img_size[1], interp=1, letterbox=letterbox_resize)
+
+    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
+
+    # the input of yolo_v3 should be in range 0~1
+    img = img / 255.
+
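+    # Multi-scale case: paste the image at the top-left of a 608x608 canvas filled with
+    # 0.5 (gray), then generate y_true for a 608x608 input.
+    if mode == 'train' and iter_cnt >= IterControl and multi_scale: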
+        cav = np.zeros((608, 608, 3), dtype=np.float32) + 0.5
+        true_h, true_w, c = img.shape
+        cav[:true_h, :true_w, :] = img
+        img = cav.astype(np.float32)
+        img_size = [608, 608]
+
+    y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52 = process_box(boxes, labels, img_size, class_num,
+                                                                                   anchors)
+
+    return img, y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52
+
+
+def get_batch_data(batch_line, class_num, img_size, anchors, mode, multi_scale=False, mix_up=False,
+                   letterbox_resize=True, interval=10):
+    '''
+    generate a batch of imgs and labels
+    param:
+        batch_line: a batch of lines from train/val.txt files
+        class_num: num of total classes.
+        img_size: the image size to be resized to. format: [width, height].
+        anchors: anchors. shape: [9, 2].
+        mode: 'train' or 'val'. if set to 'train', data augmentation will be applied.
+        multi_scale: whether to use multi_scale training, img_size varies from [320, 320] to [608, 608] (multiples of 32) by default. Note that it will take effect only when mode is set to 'train' and iter_cnt has reached IterControl.
+        mix_up: whether to use mix up (train mode only); each line is mixed with another random line of the batch with probability 0.5.
+        letterbox_resize: whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
+        interval: change the scale of image every interval batches. Note that it's nondeterministic because of the multi threading.
+    '''
+    if isinstance(mode, bytes):
+        mode = mode.decode()
+
+    global iter_cnt
+    # multi_scale training
+    if multi_scale and mode == 'train' and iter_cnt >= IterControl:
+        random.seed(iter_cnt // interval)
+        random_img_size = [[x * 32, x * 32] for x in range(10, 20)]
+        img_size = random.sample(random_img_size, 1)[0]
+        print('multi_scale iter: %d, img_size: %d,%d' % (iter_cnt, img_size[0], img_size[1]))
+    else:
+        print('single_scale iter: %d, img_size: %d,%d' % (iter_cnt, img_size[0], img_size[1]))
+    iter_cnt += 1
+
+    img_idx_batch, img_batch, y_true_13_batch, y_true_26_batch, y_true_52_batch = [], [], [], [], []
+    gt_box_13_batch, gt_box_26_batch, gt_box_52_batch = [], [], []
+
+    # mix up strategy
+    if mix_up and mode == 'train':
+        mix_lines = []
+        batch_line = batch_line.tolist()
+        for idx, line in enumerate(batch_line):
+            if np.random.uniform(0, 1) < 0.5:
+                mix_lines.append([line, random.sample(batch_line[:idx] + batch_line[idx + 1:], 1)[0]])
+            else:
+                mix_lines.append(line)
+        batch_line = mix_lines
+
+    for line in batch_line:
+        img, y_true_13, y_true_26, y_true_52, gt_box_13, gt_box_26, gt_box_52 = parse_data(line, class_num,
+                                                                                           img_size, anchors,
+                                                                                           mode,
+                                                                                           letterbox_resize,
+                                                                                           multi_scale)
+
+        img_batch.append(img)
+        y_true_13_batch.append(y_true_13)
+        y_true_26_batch.append(y_true_26)
+        y_true_52_batch.append(y_true_52)
+        gt_box_13_batch.append(gt_box_13)
+        gt_box_26_batch.append(gt_box_26)
+        gt_box_52_batch.append(gt_box_52)
+
+    img_batch, y_true_13_batch, y_true_26_batch, y_true_52_batch = np.asarray(img_batch, np.float32), np.asarray(
+        y_true_13_batch, np.float32), np.asarray(y_true_26_batch, np.float32), np.asarray(y_true_52_batch, np.float32)
+
+    gt_box_13_batch, gt_box_26_batch, gt_box_52_batch = \
+        np.asarray(gt_box_13_batch), np.asarray(gt_box_26_batch), np.asarray(gt_box_52_batch)
+
+    return img_batch, y_true_13_batch, y_true_26_batch, y_true_52_batch, \
+           gt_box_13_batch, gt_box_26_batch, gt_box_52_batch
+
@@ -0,0 +1,423 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import numpy as np
+import cv2
+from collections import Counter
+
+from utils.nms_utils import cpu_nms, gpu_nms
+from utils.data_utils import parse_line
+
+
+def calc_iou(pred_boxes, true_boxes):
+    '''
+    Efficiently calculate the IoU matrix using numpy broadcasting.
+    shape_info: pred_boxes: [N, 4]
+                true_boxes: [V, 4]
+                (both in (x_min, y_min, x_max, y_max) format)
+    return: IoU matrix: shape: [N, V]
+    '''
+
+    # [N, 1, 4]
+    pred_boxes = np.expand_dims(pred_boxes, -2)
+    # [1, V, 4]
+    true_boxes = np.expand_dims(true_boxes, 0)
+
+    # [N, 1, 2] & [1, V, 2] ==> [N, V, 2]
+    intersect_mins = np.maximum(pred_boxes[..., :2], true_boxes[..., :2])
+    intersect_maxs = np.minimum(pred_boxes[..., 2:], true_boxes[..., 2:])
+    intersect_wh = np.maximum(intersect_maxs - intersect_mins, 0.)
+
+    # shape: [N, V]
+    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
+    # shape: [N, 1, 2]
+    pred_box_wh = pred_boxes[..., 2:] - pred_boxes[..., :2]
+    # shape: [N, 1]
+    pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1]
+    # [1, V, 2]
+    true_boxes_wh = true_boxes[..., 2:] - true_boxes[..., :2]
+    # [1, V]
+    true_boxes_area = true_boxes_wh[..., 0] * true_boxes_wh[..., 1]
+
+    # shape: [N, V]
+    iou = intersect_area / (pred_box_area + true_boxes_area - intersect_area + 1e-10)
+
+    return iou
+
+
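+# Worked example for calc_iou() (illustrative): for pred box [0, 0, 10, 10] and
+# true box [5, 5, 15, 15] (x_min, y_min, x_max, y_max), the intersection is
+# 5 x 5 = 25 and the union is 100 + 100 - 25 = 175, so
+#     calc_iou(np.array([[0., 0., 10., 10.]]), np.array([[5., 5., 15., 15.]]))
+# returns [[0.142857...]] (= 25 / 175, up to the 1e-10 epsilon).
+
+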
+def evaluate_on_cpu(y_pred, y_true, num_classes, calc_now=True, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
+    '''
+    Given y_pred and y_true of a batch of data, get the recall and precision of the current batch.
+    '''
+
+    num_images = y_true[0].shape[0]
+    true_labels_dict = {i: 0 for i in range(num_classes)}  # {class: count}
+    pred_labels_dict = {i: 0 for i in range(num_classes)}
+    true_positive_dict = {i: 0 for i in range(num_classes)}
+
+    for i in range(num_images):
+        true_labels_list, true_boxes_list = [], []
+        for j in range(3):  # three feature maps
+            # shape: [13, 13, 3, 80]
+            true_probs_temp = y_true[j][i][..., 5:-1]
+            # shape: [13, 13, 3, 4] (x_center, y_center, w, h)
+            true_boxes_temp = y_true[j][i][..., 0:4]
+
+            # [13, 13, 3]
+            object_mask = true_probs_temp.sum(axis=-1) > 0
+
+            # [V, num_classes] V: Ground truth number of the current image
+            true_probs_temp = true_probs_temp[object_mask]
+            # [V, 4]
+            true_boxes_temp = true_boxes_temp[object_mask]
+
+            # [V], labels
+            true_labels_list += np.argmax(true_probs_temp, axis=-1).tolist()
+            # [V, 4] (x_center, y_center, w, h)
+            true_boxes_list += true_boxes_temp.tolist()
+
+        if len(true_labels_list) != 0:
+            for cls, count in Counter(true_labels_list).items():
+                true_labels_dict[cls] += count
+
+        # [V, 4] (xmin, ymin, xmax, ymax)
+        true_boxes = np.array(true_boxes_list)
+        box_centers, box_sizes = true_boxes[:, 0:2], true_boxes[:, 2:4]
+        true_boxes[:, 0:2] = box_centers - box_sizes / 2.
+        true_boxes[:, 2:4] = true_boxes[:, 0:2] + box_sizes
+
+        # [1, xxx, 4]
+        pred_boxes = y_pred[0][i:i + 1]
+        pred_confs = y_pred[1][i:i + 1]
+        pred_probs = y_pred[2][i:i + 1]
+
+        # pred_boxes: [N, 4]
+        # pred_confs: [N]
+        # pred_labels: [N]
+        # N: Detected box number of the current image
+        pred_boxes, pred_confs, pred_labels = cpu_nms(pred_boxes, pred_confs * pred_probs, num_classes,
+                                                      max_boxes=max_boxes, score_thresh=score_thresh, iou_thresh=iou_thresh)
+
+        # len: N
+        pred_labels_list = [] if pred_labels is None else pred_labels.tolist()
+        if pred_labels_list == []:
+            continue
+
+        # calc iou
+        # [N, V]
+        iou_matrix = calc_iou(pred_boxes, true_boxes)
+        # [N]
+        max_iou_idx = np.argmax(iou_matrix, axis=-1)
+
+        correct_idx = []
+        correct_conf = []
+        for k in range(max_iou_idx.shape[0]):
+            pred_labels_dict[pred_labels_list[k]] += 1
+            match_idx = max_iou_idx[k]  # V level
+            if iou_matrix[k, match_idx] > iou_thresh and true_labels_list[match_idx] == pred_labels_list[k]:
+                if match_idx not in correct_idx:
+                    correct_idx.append(match_idx)
+                    correct_conf.append(pred_confs[k])
+                else:
+                    same_idx = correct_idx.index(match_idx)
+                    if pred_confs[k] > correct_conf[same_idx]:
+                        correct_idx.pop(same_idx)
+                        correct_conf.pop(same_idx)
+                        correct_idx.append(match_idx)
+                        correct_conf.append(pred_confs[k])
+
+        for t in correct_idx:
+            true_positive_dict[true_labels_list[t]] += 1
+
+    if calc_now:
+        # avoid divided by 0
+        recall = sum(true_positive_dict.values()) / (sum(true_labels_dict.values()) + 1e-6)
+        precision = sum(true_positive_dict.values()) / (sum(pred_labels_dict.values()) + 1e-6)
+
+        return recall, precision
+    else:
+        return true_positive_dict, true_labels_dict, pred_labels_dict
+
+
+def evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, y_pred, y_true, num_classes, iou_thresh=0.5, calc_now=True):
+    '''
+    Given y_pred and y_true of a batch of data, get the recall and precision of the current batch.
+    Unlike evaluate_on_cpu, NMS is performed on the GPU by running `gpu_nms_op` in `sess`.
+    '''
+
+    num_images = y_true[0].shape[0]
+    true_labels_dict = {i: 0 for i in range(num_classes)}  # {class: count}
+    pred_labels_dict = {i: 0 for i in range(num_classes)}
+    true_positive_dict = {i: 0 for i in range(num_classes)}
+
+    for i in range(num_images):
+        true_labels_list, true_boxes_list = [], []
+        for j in range(3):  # three feature maps
+            # shape: [13, 13, 3, 80]
+            true_probs_temp = y_true[j][i][..., 5:-1]
+            # shape: [13, 13, 3, 4] (x_center, y_center, w, h)
+            true_boxes_temp = y_true[j][i][..., 0:4]
+
+            # [13, 13, 3]
+            object_mask = true_probs_temp.sum(axis=-1) > 0
+
+            # [V, 80] V: Ground truth number of the current image
+            true_probs_temp = true_probs_temp[object_mask]
+            # [V, 4]
+            true_boxes_temp = true_boxes_temp[object_mask]
+
+            # [V], labels, each from 0 to 79
+            true_labels_list += np.argmax(true_probs_temp, axis=-1).tolist()
+            # [V, 4] (x_center, y_center, w, h)
+            true_boxes_list += true_boxes_temp.tolist()
+
+        if len(true_labels_list) != 0:
+            for cls, count in Counter(true_labels_list).items():
+                true_labels_dict[cls] += count
+
+        # [V, 4] (xmin, ymin, xmax, ymax)
+        true_boxes = np.array(true_boxes_list)
+        box_centers, box_sizes = true_boxes[:, 0:2], true_boxes[:, 2:4]
+        true_boxes[:, 0:2] = box_centers - box_sizes / 2.
+        true_boxes[:, 2:4] = true_boxes[:, 0:2] + box_sizes
+
+        # [1, xxx, 4]
+        pred_boxes = y_pred[0][i:i + 1]
+        pred_confs = y_pred[1][i:i + 1]
+        pred_probs = y_pred[2][i:i + 1]
+
+        # pred_boxes: [N, 4]
+        # pred_confs: [N]
+        # pred_labels: [N]
+        # N: Detected box number of the current image
+        pred_boxes, pred_confs, pred_labels = sess.run(gpu_nms_op,
+                                                       feed_dict={pred_boxes_flag: pred_boxes,
+                                                                  pred_scores_flag: pred_confs * pred_probs})
+        # len: N
+        pred_labels_list = [] if pred_labels is None else pred_labels.tolist()
+        if pred_labels_list == []:
+            continue
+
+        # calc iou
+        # [N, V]
+        iou_matrix = calc_iou(pred_boxes, true_boxes)
+        # [N]
+        max_iou_idx = np.argmax(iou_matrix, axis=-1)
+
+        correct_idx = []
+        correct_conf = []
+        for k in range(max_iou_idx.shape[0]):
+            pred_labels_dict[pred_labels_list[k]] += 1
+            match_idx = max_iou_idx[k]  # V level
+            if iou_matrix[k, match_idx] > iou_thresh and true_labels_list[match_idx] == pred_labels_list[k]:
+                if match_idx not in correct_idx:
+                    correct_idx.append(match_idx)
+                    correct_conf.append(pred_confs[k])
+                else:
+                    same_idx = correct_idx.index(match_idx)
+                    if pred_confs[k] > correct_conf[same_idx]:
+                        correct_idx.pop(same_idx)
+                        correct_conf.pop(same_idx)
+                        correct_idx.append(match_idx)
+                        correct_conf.append(pred_confs[k])
+
+        for t in correct_idx:
+            true_positive_dict[true_labels_list[t]] += 1
+
+    if calc_now:
+        # avoid divided by 0
+        recall = sum(true_positive_dict.values()) / (sum(true_labels_dict.values()) + 1e-6)
+        precision = sum(true_positive_dict.values()) / (sum(pred_labels_dict.values()) + 1e-6)
+
+        return recall, precision
+    else:
+        return true_positive_dict, true_labels_dict, pred_labels_dict
+
+
+def get_preds_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, image_ids, y_pred):
+    '''
+    Given the y_pred of an input image, get the predicted bbox and label info.
+    return:
+        pred_content: 2d list.
+    '''
+    image_id = image_ids[0]
+
+    # keep the first dimension 1
+    pred_boxes = y_pred[0][0:1]
+    pred_confs = y_pred[1][0:1]
+    pred_probs = y_pred[2][0:1]
+
+    boxes, scores, labels = sess.run(gpu_nms_op,
+                                     feed_dict={pred_boxes_flag: pred_boxes,
+                                                pred_scores_flag: pred_confs * pred_probs})
+
+    pred_content = []
+    for i in range(len(labels)):
+        x_min, y_min, x_max, y_max = boxes[i]
+        score = scores[i]
+        label = labels[i]
+        pred_content.append([image_id, x_min, y_min, x_max, y_max, score, label])
+
+    return pred_content
+
+
+gt_dict = {}  # key: img_id, value: gt object list
+def parse_gt_rec(gt_filename, target_img_size, letterbox_resize=True):
+    '''
+    parse and re-organize the gt info.
+    return:
+        gt_dict: dict. Each key is an img_id, the value is the gt bboxes in the corresponding img.
+    Note: the result is cached in the module-level gt_dict and returned as-is on later calls.
+    '''
+
+    global gt_dict
+
+    if not gt_dict:
+        new_width, new_height = target_img_size
+        with open(gt_filename, 'r') as f:
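+            for line in f:
+                # parse_line() does not return the line index, so read it here as the image id
+                img_id = int(line.strip().split(' ')[0])
+                pic_path, boxes, labels, ori_width, ori_height = parse_line(line)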
+
+                objects = []
+                for i in range(len(labels)):
+                    x_min, y_min, x_max, y_max = boxes[i]
+                    label = labels[i]
+
+                    if letterbox_resize:
+                        resize_ratio = min(new_width / ori_width, new_height / ori_height)
+
+                        resize_w = int(resize_ratio * ori_width)
+                        resize_h = int(resize_ratio * ori_height)
+
+                        dw = int((new_width - resize_w) / 2)
+                        dh = int((new_height - resize_h) / 2)
+
+                        objects.append([x_min * resize_ratio + dw,
+                                        y_min * resize_ratio + dh,
+                                        x_max * resize_ratio + dw,
+                                        y_max * resize_ratio + dh,
+                                        label])
+                    else:
+                        objects.append([x_min * new_width / ori_width,
+                                        y_min * new_height / ori_height,
+                                        x_max * new_width / ori_width,
+                                        y_max * new_height / ori_height,
+                                        label])
+                gt_dict[img_id] = objects
+    return gt_dict
+
+
+# The following two functions are modified from FAIR's Detectron repo to calculate mAP:
+# https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/voc_eval.py
+def voc_ap(rec, prec, use_07_metric=False):
+    """Compute VOC AP given precision and recall. If use_07_metric is true, uses
+    the VOC 07 11-point method (default:False).
+    """
+    if use_07_metric:
+        # 11 point metric
+        ap = 0.
+        for t in np.arange(0., 1.1, 0.1):
+            if np.sum(rec >= t) == 0:
+                p = 0
+            else:
+                p = np.max(prec[rec >= t])
+            ap = ap + p / 11.
+    else:
+        # correct AP calculation
+        # first append sentinel values at the end
+        mrec = np.concatenate(([0.], rec, [1.]))
+        mpre = np.concatenate(([0.], prec, [0.]))
+
+        # compute the precision envelope
+        for i in range(mpre.size - 1, 0, -1):
+            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
+
+        # to calculate area under PR curve, look for points
+        # where X axis (recall) changes value
+        i = np.where(mrec[1:] != mrec[:-1])[0]
+
+        # and sum (\Delta recall) * prec
+        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
+    return ap
+
+
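+# Worked example for voc_ap() (illustrative): with rec = [0.5, 1.0] and
+# prec = [1.0, 0.5], the padded arrays are mrec = [0, 0.5, 1, 1] and
+# mpre = [0, 1, 0.5, 0]; the envelope step makes mpre = [1, 1, 0.5, 0],
+# recall changes at i = 0 and 1, and
+#     AP = (0.5 - 0) * 1 + (1 - 0.5) * 0.5 = 0.75
+# i.e. voc_ap(np.array([0.5, 1.0]), np.array([1.0, 0.5])) returns 0.75.
+
+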
+def voc_eval(gt_dict, val_preds, classidx, iou_thres=0.5, use_07_metric=False):
+    '''
+    Top level function that does the PASCAL VOC evaluation.
+    '''
+    # 1.obtain gt: extract all gt objects for this class
+    class_recs = {}
+    npos = 0
+    for img_id in gt_dict:
+        R = [obj for obj in gt_dict[img_id] if obj[-1] == classidx]
+        bbox = np.array([x[:4] for x in R])
+        det = [False] * len(R)
+        npos += len(R)
+        class_recs[img_id] = {'bbox': bbox, 'det': det}
+
+    # 2. obtain pred results
+    pred = [x for x in val_preds if x[-1] == classidx]
+    img_ids = [x[0] for x in pred]
+    confidence = np.array([x[-2] for x in pred])
+    BB = np.array([[x[1], x[2], x[3], x[4]] for x in pred])
+
+    # 3. sort by confidence
+    sorted_ind = np.argsort(-confidence)
+    try:
+        BB = BB[sorted_ind, :]
+    except IndexError:
+        # no prediction for this class: BB is empty and cannot be indexed with [:, :]
+        print('no box, ignore')
+        return 1e-6, 1e-6, 0, 0, 0
+    img_ids = [img_ids[x] for x in sorted_ind]
+
+    # 4. mark TPs and FPs
+    nd = len(img_ids)
+    tp = np.zeros(nd)
+    fp = np.zeros(nd)
+
+    for d in range(nd):
+        # all the gt info in some image
+        R = class_recs[img_ids[d]]
+        bb = BB[d, :]
+        ovmax = -np.inf
+        BBGT = R['bbox']
+
+        if BBGT.size > 0:
+            # calc iou
+            # intersection
+            ixmin = np.maximum(BBGT[:, 0], bb[0])
+            iymin = np.maximum(BBGT[:, 1], bb[1])
+            ixmax = np.minimum(BBGT[:, 2], bb[2])
+            iymax = np.minimum(BBGT[:, 3], bb[3])
+            iw = np.maximum(ixmax - ixmin + 1., 0.)
+            ih = np.maximum(iymax - iymin + 1., 0.)
+            inters = iw * ih
+
+            # union
+            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
+                   (BBGT[:, 2] - BBGT[:, 0] + 1.) * (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
+
+            overlaps = inters / uni
+            ovmax = np.max(overlaps)
+            jmax = np.argmax(overlaps)
+
+        if ovmax > iou_thres:
+            # gt not matched yet
+            if not R['det'][jmax]:
+                tp[d] = 1.
+                R['det'][jmax] = 1
+            else:
+                fp[d] = 1.
+        else:
+            fp[d] = 1.
+
+    # compute precision recall
+    fp = np.cumsum(fp)
+    tp = np.cumsum(tp)
+    rec = tp / float(npos)
+    # guard against division by zero when tp + fp == 0
+    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
+    ap = voc_ap(rec, prec, use_07_metric)
+
+    # returns: number of gt objects, number of detections, recall, precision, AP
+    return npos, nd, tp[-1] / float(npos), tp[-1] / float(nd), ap
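voc_eval ranks detections by confidence, marks each one TP or FP by greedy IoU matching, and then integrates precision over recall with voc_ap (only the tail of voc_ap is visible here). The sketch below reproduces the all-point interpolated AP on toy TP/FP flags, assuming the standard VOC 2010+ formulation that the visible tail follows; `all_point_ap` and its flag-list input are hypothetical, not part of this repo.

```python
import numpy as np


def all_point_ap(tp_flags, npos):
    """All-point interpolated AP from TP/FP flags sorted by descending confidence.

    tp_flags: 1 for a true positive, 0 for a false positive (hypothetical format).
    npos: number of ground-truth objects of this class.
    """
    tp = np.cumsum(tp_flags, dtype=np.float64)
    fp = np.cumsum(1 - np.asarray(tp_flags), dtype=np.float64)
    rec = tp / float(npos)
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)

    # add sentinel values at both ends
    mrec = np.concatenate(([0.], rec, [1.]))
    mpre = np.concatenate(([0.], prec, [0.]))
    # precision envelope: make precision monotonically decreasing
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # sum (delta recall) * precision at the points where recall changes
    i = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])


print(all_point_ap([1, 0, 1], 2))  # about 0.8333 (= 5/6)
```

Here the second-ranked detection is a false positive, so the envelope lifts the precision at recall 0.5..1.0 to 2/3 and the AP comes out to 0.5 * 1 + 0.5 * 2/3 = 5/6.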
@@ -0,0 +1,89 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import numpy as np
+import tensorflow as tf
+slim = tf.contrib.slim
+
+def conv2d(inputs, filters, kernel_size, strides=1):
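+    # for stride > 1, pad explicitly (symmetrically for odd kernels) and use VALID;
+    # TF 'SAME' would instead put the odd extra padding only at the bottom/right
+    def _fixed_padding(inputs, kernel_size):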
+        pad_total = kernel_size - 1
+        pad_beg = pad_total // 2
+        pad_end = pad_total - pad_beg
+
+        padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
+                                        [pad_beg, pad_end], [0, 0]], mode='CONSTANT')
+        return padded_inputs
+    if strides > 1: 
+        inputs = _fixed_padding(inputs, kernel_size)
+    inputs = slim.conv2d(inputs, filters, kernel_size, stride=strides,
+                         padding=('SAME' if strides == 1 else 'VALID'))
+    return inputs
+
+def darknet53_body(inputs):
+    def res_block(inputs, filters):
+        shortcut = inputs
+        net = conv2d(inputs, filters * 1, 1)
+        net = conv2d(net, filters * 2, 3)
+
+        net = net + shortcut
+
+        return net
+    
+    # first two conv2d layers
+    net = conv2d(inputs, 32,  3, strides=1)
+    net = conv2d(net, 64,  3, strides=2)
+
+    # res_block * 1
+    net = res_block(net, 32)
+
+    net = conv2d(net, 128, 3, strides=2)
+
+    # res_block * 2
+    for i in range(2):
+        net = res_block(net, 64)
+
+    net = conv2d(net, 256, 3, strides=2)
+
+    # res_block * 8
+    for i in range(8):
+        net = res_block(net, 128)
+
+    route_1 = net
+    net = conv2d(net, 512, 3, strides=2)
+
+    # res_block * 8
+    for i in range(8):
+        net = res_block(net, 256)
+
+    route_2 = net
+    net = conv2d(net, 1024, 3, strides=2)
+
+    # res_block * 4
+    for i in range(4):
+        net = res_block(net, 512)
+    route_3 = net
+
+    return route_1, route_2, route_3
+
+
+def yolo_block(inputs, filters):
+    net = conv2d(inputs, filters * 1, 1)
+    net = conv2d(net, filters * 2, 3)
+    net = conv2d(net, filters * 1, 1)
+    net = conv2d(net, filters * 2, 3)
+    net = conv2d(net, filters * 1, 1)
+    route = net
+    net = conv2d(net, filters * 2, 3)
+    return route, net
+
+
+def upsample_layer(inputs, out_shape):
+    new_height, new_width = out_shape[1], out_shape[2]
+    # NOTE: resize_nearest_neighbor takes the size as (height, width)
+    # TODO: Do we need to set `align_corners` as True?
+    inputs = tf.image.resize_nearest_neighbor(inputs, (new_height, new_width), name='upsampled')
+    return inputs
+
+
@@ -0,0 +1,165 @@
+# coding: utf-8
+
+import numpy as np
+import tensorflow as tf
+import random
+
+from tensorflow.core.framework import summary_pb2
+
+
+def make_summary(name, val):
+    return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name, simple_value=val)])
+
+
+class AverageMeter(object):
+    def __init__(self):
+        self.reset()
+
+    def reset(self):
+        self.val = 0
+        self.average = 0
+        self.sum = 0
+        self.count = 0
+
+    def update(self, val, n=1):
+        self.val = val
+        self.sum += val * n
+        self.count += n
+        self.average = self.sum / float(self.count)
+
+
+def parse_anchors(anchor_path):
+    '''
+    parse anchors.
+    returned data: shape [N, 2], dtype float32
+    '''
+    with open(anchor_path, 'r') as f:
+        anchors = np.reshape(np.asarray(f.read().split(','), np.float32), [-1, 2])
+    return anchors
+
+
+def read_class_names(class_name_path):
+    names = {}
+    with open(class_name_path, 'r') as data:
+        for ID, name in enumerate(data):
+            names[ID] = name.strip('\n')
+    return names
+
+
+def shuffle_and_overwrite(file_name):
+    with open(file_name, 'r') as f:
+        content = f.readlines()
+    random.shuffle(content)
+    with open(file_name, 'w') as f:
+        for line in content:
+            f.write(line)
+
+
+def update_dict(ori_dict, new_dict):
+    if not ori_dict:
+        return new_dict
+    for key in ori_dict:
+        ori_dict[key] += new_dict[key]
+    return ori_dict
+
+
+def list_add(ori_list, new_list):
+    for i in range(len(ori_list)):
+        ori_list[i] += new_list[i]
+    return ori_list
+
+
+def load_weights(var_list, weights_file):
+    """
+    Loads and converts pre-trained weights.
+    param:
+        var_list: list of network variables.
+        weights_file: name of the binary file.
+    """
+    with open(weights_file, "rb") as fp:
+        # skip the 20-byte darknet header (major, minor, revision, images seen)
+        np.fromfile(fp, dtype=np.int32, count=5)
+        weights = np.fromfile(fp, dtype=np.float32)
+
+    ptr = 0
+    i = 0
+    assign_ops = []
+    try:
+        while i < len(var_list) - 1:
+            var1 = var_list[i]
+            var2 = var_list[i + 1]
+            # do something only if we process conv layer
+            if 'Conv' in var1.name.split('/')[-2]:
+                # check type of next layer
+                if 'BatchNorm' in var2.name.split('/')[-2]:
+                    # load batch norm params
+                    gamma, beta, mean, var = var_list[i + 1:i + 5]
+                    # darknet stores BN params in the order beta, gamma, mean, variance
+                    batch_norm_vars = [beta, gamma, mean, var]
+                    for bn_var in batch_norm_vars:
+                        shape = bn_var.shape.as_list()
+                        num_params = np.prod(shape)
+                        var_weights = weights[ptr:ptr + num_params].reshape(shape)
+                        ptr += num_params
+                        assign_ops.append(tf.assign(bn_var, var_weights, validate_shape=True))
+                    # we move the pointer by 4, because we loaded 4 variables
+                    i += 4
+                elif 'Conv' in var2.name.split('/')[-2]:
+                    # load biases
+                    bias = var2
+                    bias_shape = bias.shape.as_list()
+                    bias_params = np.prod(bias_shape)
+                    bias_weights = weights[ptr:ptr +
+                                           bias_params].reshape(bias_shape)
+                    ptr += bias_params
+                    assign_ops.append(tf.assign(bias, bias_weights, validate_shape=True))
+                    # we loaded 1 variable
+                    i += 1
+                # we can load weights of conv layer
+                shape = var1.shape.as_list()
+                num_params = np.prod(shape)
+
+                var_weights = weights[ptr:ptr + num_params].reshape(
+                    (shape[3], shape[2], shape[0], shape[1]))
+                # darknet stores kernels as [out, in, h, w]; TF expects [h, w, in, out]
+                var_weights = np.transpose(var_weights, (2, 3, 1, 0))
+                ptr += num_params
+                assign_ops.append(
+                    tf.assign(var1, var_weights, validate_shape=True))
+                i += 1
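+    except Exception as e:
+        # keep the assign ops built so far, but report why loading stopped early
+        print('load_weights stopped at variable index {}: {}'.format(i, e))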
+    return assign_ops
+
+
+def config_learning_rate(args, global_step):
+    if args.lr_type == 'exponential':
+        lr_tmp = tf.train.exponential_decay(args.learning_rate_init, global_step, args.lr_decay_freq,
+                                            args.lr_decay_factor, staircase=True, name='exponential_learning_rate')
+        return tf.maximum(lr_tmp, args.lr_lower_bound)
+    elif args.lr_type == 'cosine_decay':
+        train_steps = (args.total_epoches - float(args.use_warm_up) * args.warm_up_epoch) * args.train_batch_num
+        return args.lr_lower_bound + 0.5 * (args.learning_rate_init - args.lr_lower_bound) * \
+            (1 + tf.cos(global_step / train_steps * np.pi))
+    elif args.lr_type == 'cosine_decay_restart':
+        return tf.train.cosine_decay_restarts(args.learning_rate_init, global_step, 
+                                              args.lr_decay_freq, t_mul=2.0, m_mul=1.0, 
+                                              name='cosine_decay_learning_rate_restart')
+    elif args.lr_type == 'fixed':
+        return tf.convert_to_tensor(args.learning_rate_init, name='fixed_learning_rate')
+    elif args.lr_type == 'piecewise':
+        return tf.train.piecewise_constant(global_step, boundaries=args.pw_boundaries, values=args.pw_values,
+                                           name='piecewise_learning_rate')
+    else:
+        raise ValueError('Unsupported learning rate type!')
+
+
+def config_optimizer(optimizer_name, learning_rate, decay=0.9, momentum=0.9):
+    if optimizer_name == 'momentum':
+        return tf.train.MomentumOptimizer(learning_rate, momentum=momentum, use_nesterov=False)
+    elif optimizer_name == 'nesterov':
+        return tf.train.MomentumOptimizer(learning_rate, momentum=momentum, use_nesterov=True)
+    elif optimizer_name == 'rmsprop':
+        return tf.train.RMSPropOptimizer(learning_rate, decay=decay, momentum=momentum)
+    elif optimizer_name == 'adam':
+        return tf.train.AdamOptimizer(learning_rate)
+    elif optimizer_name == 'sgd':
+        return tf.train.GradientDescentOptimizer(learning_rate)
+    else:
+        raise ValueError('Unsupported optimizer type!')
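load_weights reads darknet's flat float32 stream and converts each conv kernel from darknet's [out, in, h, w] order to TensorFlow's [h, w, in, out]. The NumPy sketch below checks, on an arbitrary toy shape, that the reshape-then-transpose used there sends element (o, c, y, x) of the darknet kernel to index [y, x, c, o] of the TF kernel.

```python
import numpy as np

# toy TF-side kernel shape: [kernel_h, kernel_w, in_channels, out_channels]
shape = [3, 3, 2, 4]
num_params = int(np.prod(shape))

# stand-in for the weight stream: darknet writes the kernel flattened in [out, in, h, w] order
flat = np.arange(num_params, dtype=np.float32)

# the same two steps as load_weights: reshape to darknet order, then transpose to HWIO
darknet_kernel = flat.reshape((shape[3], shape[2], shape[0], shape[1]))
tf_kernel = np.transpose(darknet_kernel, (2, 3, 1, 0))

o, c, y, x = 3, 1, 2, 0
print(tf_kernel.shape)                                       # (3, 3, 2, 4)
print(tf_kernel[y, x, c, o] == darknet_kernel[o, c, y, x])   # True
```

For each layer darknet writes the biases (or, with batch norm, beta, gamma, mean and variance) before the kernel, which is why load_weights consumes those variables first and only then reads var1's weights.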
@@ -0,0 +1,123 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import numpy as np
+import tensorflow as tf
+
+def gpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, nms_thresh=0.5):
+    """
+    Perform NMS on GPU using TensorFlow.
+
+    params:
+        boxes: tensor of shape [1, 10647, 4] # 10647=(13*13+26*26+52*52)*3, for input 416*416 image
+        scores: tensor of shape [1, 10647, num_classes], score=conf*prob
+        num_classes: total number of classes
+        max_boxes: integer, maximum number of predicted boxes you'd like, default is 50
+        score_thresh: boxes whose score for a class is below this value
+                        are discarded for that class
+        nms_thresh: real value, "intersection over union" threshold used for NMS filtering
+    """
+
+    boxes_list, label_list, score_list = [], [], []
+    max_boxes = tf.constant(max_boxes, dtype='int32')
+
+    # since we do nms for single image, then reshape it
+    boxes = tf.reshape(boxes, [-1, 4]) # '-1' means we don't know the exact number of boxes
+    score = tf.reshape(scores, [-1, num_classes])
+
+    # Step 1: Create a filtering mask based on "box_class_scores" by using "threshold".
+    mask = tf.greater_equal(score, tf.constant(score_thresh))
+    # Step 2: Do non_max_suppression for each class
+    for i in range(num_classes):
+        # Step 3: Apply the mask to scores, boxes and pick them out
+        filter_boxes = tf.boolean_mask(boxes, mask[:,i])
+        filter_score = tf.boolean_mask(score[:,i], mask[:,i])
+        nms_indices = tf.image.non_max_suppression(boxes=filter_boxes,
+                                                   scores=filter_score,
+                                                   max_output_size=max_boxes,
+                                                   iou_threshold=nms_thresh, name='nms_indices')
+        label_list.append(tf.ones_like(tf.gather(filter_score, nms_indices), 'int32')*i)
+        boxes_list.append(tf.gather(filter_boxes, nms_indices))
+        score_list.append(tf.gather(filter_score, nms_indices))
+
+    boxes = tf.concat(boxes_list, axis=0)
+    score = tf.concat(score_list, axis=0)
+    label = tf.concat(label_list, axis=0)
+
+    return boxes, score, label
+
+
+def py_nms(boxes, scores, max_boxes=50, iou_thresh=0.5):
+    """
+    Pure Python NMS baseline.
+
+    Arguments: boxes: shape of [-1, 4], where '-1' means the number of boxes
+                      is not known in advance
+               scores: shape of [-1,]
+               max_boxes: maximum number of boxes to keep after NMS
+               iou_thresh: IoU threshold above which a lower-scoring box is suppressed
+    """
+    assert boxes.shape[1] == 4 and len(scores.shape) == 1
+
+    x1 = boxes[:, 0]
+    y1 = boxes[:, 1]
+    x2 = boxes[:, 2]
+    y2 = boxes[:, 3]
+
+    areas = (x2 - x1) * (y2 - y1)
+    order = scores.argsort()[::-1]
+
+    keep = []
+    while order.size > 0:
+        i = order[0]
+        keep.append(i)
+        xx1 = np.maximum(x1[i], x1[order[1:]])
+        yy1 = np.maximum(y1[i], y1[order[1:]])
+        xx2 = np.minimum(x2[i], x2[order[1:]])
+        yy2 = np.minimum(y2[i], y2[order[1:]])
+
+        # use the same (x2 - x1) convention as `areas`, so IoU stays within [0, 1]
+        w = np.maximum(0.0, xx2 - xx1)
+        h = np.maximum(0.0, yy2 - yy1)
+        inter = w * h
+        ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+        inds = np.where(ovr <= iou_thresh)[0]
+        order = order[inds + 1]
+
+    return keep[:max_boxes]
+
+
+def cpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
+    """
+    Perform NMS on CPU.
+    Arguments:
+        boxes: shape [1, 10647, 4]
+        scores: shape [1, 10647, num_classes]
+    """
+
+    boxes = boxes.reshape(-1, 4)
+    scores = scores.reshape(-1, num_classes)
+    # Picked bounding boxes
+    picked_boxes, picked_score, picked_label = [], [], []
+
+    for i in range(num_classes):
+        indices = np.where(scores[:,i] >= score_thresh)
+        filter_boxes = boxes[indices]
+        filter_scores = scores[:,i][indices]
+        if len(filter_boxes) == 0: 
+            continue
+        # do non_max_suppression on the cpu
+        indices = py_nms(filter_boxes, filter_scores,
+                         max_boxes=max_boxes, iou_thresh=iou_thresh)
+        picked_boxes.append(filter_boxes[indices])
+        picked_score.append(filter_scores[indices])
+        picked_label.append(np.ones(len(indices), dtype='int32')*i)
+    if len(picked_boxes) == 0: 
+        return None, None, None
+
+    boxes = np.concatenate(picked_boxes, axis=0)
+    score = np.concatenate(picked_score, axis=0)
+    label = np.concatenate(picked_label, axis=0)
+
+    return boxes, score, label
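cpu_nms filters boxes per class by score_thresh and then calls py_nms, which greedily keeps the highest-scoring box and drops every remaining box whose IoU with it exceeds iou_thresh. A self-contained NumPy sketch of that greedy loop on three toy boxes; `greedy_nms` is a hypothetical name, and it assumes continuous [x_min, y_min, x_max, y_max] coordinates (no +1 pixel convention).

```python
import numpy as np


def greedy_nms(boxes, scores, max_boxes=50, iou_thresh=0.5):
    """Greedy NMS over [x_min, y_min, x_max, y_max] boxes; returns kept indices."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with every remaining box
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # keep only the boxes that do not overlap the chosen one too much
        order = rest[iou <= iou_thresh]
    return keep[:max_boxes]


boxes = np.array([[0, 0, 10, 10],     # A
                  [1, 1, 11, 11],     # B: IoU with A = 81 / 119, about 0.68
                  [20, 20, 30, 30]],  # C: disjoint from A and B
                 dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(greedy_nms(boxes, scores))  # [0, 2]
```

B is suppressed by A at the default threshold of 0.5, while C survives because it does not overlap anything; raising iou_thresh to 0.7 would keep all three.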
@@ -0,0 +1,35 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import cv2
+import random
+
+
+def get_color_table(class_num, seed=2):
+    random.seed(seed)
+    color_table = {}
+    for i in range(class_num):
+        color_table[i] = [random.randint(0, 255) for _ in range(3)]
+    return color_table
+
+
+def plot_one_box(img, coord, label=None, color=None, line_thickness=None):
+    '''
+    coord: [x_min, y_min, x_max, y_max] format coordinates.
+    img: img to plot on.
+    label: str. The label name.
+    color: list of 3 ints. [B, G, R] color of the box.
+    line_thickness: int. rectangle line thickness.
+    '''
+    tl = line_thickness or max(int(round(0.002 * max(img.shape[0:2]))), 1)  # line thickness, at least 1
+    color = color or [random.randint(0, 255) for _ in range(3)]
+    c1, c2 = (int(coord[0]), int(coord[1])), (int(coord[2]), int(coord[3]))
+    cv2.rectangle(img, c1, c2, color, thickness=tl)
+    if label:
+        tf = max(tl - 1, 1)  # font thickness
+        t_size = cv2.getTextSize(label, 0, fontScale=float(tl) / 3, thickness=tf)[0]
+        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
+        cv2.rectangle(img, c1, c2, color, -1)  # filled
+        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, float(tl) / 3, [0, 0, 0], thickness=tf, lineType=cv2.LINE_AA)
+
@@ -0,0 +1,102 @@
+# coding: utf-8
+
+from __future__ import division, print_function
+
+import tensorflow as tf
+import numpy as np
+import argparse
+import cv2
+import time
+
+from utils.misc_utils import parse_anchors, read_class_names
+from utils.nms_utils import gpu_nms
+from utils.plot_utils import get_color_table, plot_one_box
+from utils.data_aug import letterbox_resize
+
+from model import yolov3
+
+parser = argparse.ArgumentParser(description="YOLO-V3 video test procedure.")
+parser.add_argument("input_video", type=str,
+                    help="The path of the input video.")
+parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt",
+                    help="The path of the anchor txt file.")
+parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
+                    help="Resize the input image with `new_size`, size format: [width, height]")
+parser.add_argument("--letterbox_resize", type=lambda x: (str(x).lower() == 'true'), default=True,
+                    help="Whether to use the letterbox resize.")
+parser.add_argument("--class_name_path", type=str, default="./data/coco.names",
+                    help="The path of the class names.")
+parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt",
+                    help="The path of the weights to restore.")
+parser.add_argument("--save_video", type=lambda x: (str(x).lower() == 'true'), default=False,
+                    help="Whether to save the video detection results.")
+args = parser.parse_args()
+
+args.anchors = parse_anchors(args.anchor_path)
+args.classes = read_class_names(args.class_name_path)
+args.num_class = len(args.classes)
+
+color_table = get_color_table(args.num_class)
+
+vid = cv2.VideoCapture(args.input_video)
+video_frame_cnt = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))
+video_width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
+video_height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
+video_fps = int(vid.get(cv2.CAP_PROP_FPS))
+
+if args.save_video:
+    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
+    videoWriter = cv2.VideoWriter('video_result.mp4', fourcc, video_fps, (video_width, video_height))
+
+with tf.Session() as sess:
+    input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
+    yolo_model = yolov3(args.num_class, args.anchors)
+    with tf.variable_scope('yolov3'):
+        pred_feature_maps = yolo_model.forward(input_data, False)
+    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
+
+    pred_scores = pred_confs * pred_probs
+
+    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, args.num_class, max_boxes=200, score_thresh=0.3, nms_thresh=0.45)
+
+    saver = tf.train.Saver()
+    saver.restore(sess, args.restore_path)
+
+    for i in range(video_frame_cnt):
+        ret, img_ori = vid.read()
+        if not ret:
+            # the reported frame count can exceed the frames that actually decode
+            break
+        if args.letterbox_resize:
+            img, resize_ratio, dw, dh = letterbox_resize(img_ori, args.new_size[0], args.new_size[1])
+        else:
+            height_ori, width_ori = img_ori.shape[:2]
+            img = cv2.resize(img_ori, tuple(args.new_size))
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        img = np.asarray(img, np.float32)
+        img = img[np.newaxis, :] / 255.
+
+        start_time = time.time()
+        boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
+        end_time = time.time()
+
+        # rescale the coordinates to the original image
+        if args.letterbox_resize:
+            boxes_[:, [0, 2]] = (boxes_[:, [0, 2]] - dw) / resize_ratio
+            boxes_[:, [1, 3]] = (boxes_[:, [1, 3]] - dh) / resize_ratio
+        else:
+            boxes_[:, [0, 2]] *= (width_ori/float(args.new_size[0]))
+            boxes_[:, [1, 3]] *= (height_ori/float(args.new_size[1]))
+
+        for j in range(len(boxes_)):
+            x0, y0, x1, y1 = boxes_[j]
+            plot_one_box(img_ori, [x0, y0, x1, y1], label=args.classes[labels_[j]] + ', {:.2f}%'.format(scores_[j] * 100), color=color_table[labels_[j]])
+        cv2.putText(img_ori, '{:.2f}ms'.format((end_time - start_time) * 1000), (40, 40), 0,
+                    fontScale=1, color=(0, 255, 0), thickness=2)
+        cv2.imshow('image', img_ori)
+        if args.save_video:
+            videoWriter.write(img_ori)
+        if cv2.waitKey(1) & 0xFF == ord('q'):
+            break
+
+    vid.release()
+    if args.save_video:
+        videoWriter.release()
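With --letterbox_resize enabled, the script maps predicted boxes back to the frame with (x - dw) / resize_ratio and (y - dh) / resize_ratio. The sketch below derives those parameters for a 16:9 frame of 1664x936 and a 416x416 network input, assuming letterbox_resize (in utils/data_aug, not shown here) scales uniformly by min(new_w / w, new_h / h) and centres the result with integer offsets; `letterbox_params` and `to_original` are hypothetical helpers.

```python
import numpy as np


def letterbox_params(ori_w, ori_h, new_w, new_h):
    """Assumed letterbox geometry: uniform scale, then centre inside the padded canvas."""
    ratio = min(new_w / ori_w, new_h / ori_h)
    resize_w, resize_h = int(ratio * ori_w), int(ratio * ori_h)
    dw, dh = (new_w - resize_w) // 2, (new_h - resize_h) // 2
    return ratio, dw, dh


def to_original(boxes, ratio, dw, dh):
    """Undo the letterbox: remove the padding offset, then divide by the scale."""
    boxes = np.array(boxes, dtype=np.float64)
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - dw) / ratio
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - dh) / ratio
    return boxes


# ratio = 0.25, the frame becomes 416 x 234, so dw = 0 and dh = (416 - 234) // 2 = 91
ratio, dw, dh = letterbox_params(1664, 936, 416, 416)
# the letterboxed image area maps back to the whole frame: [[0, 0, 1664, 936]]
full = to_original([[0, 91, 416, 325]], ratio, dw, dh)
```

Without letterboxing the script instead scales x and y independently by width_ori / new_w and height_ori / new_h, which is exact for a plain cv2.resize but distorts the aspect ratio the network sees.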
@@ -0,0 +1,9 @@
+{
+    "server_count": "1",
+    "server_list": [{
+        "device": [{devices}],
+        "server_id": "127.0.0.1"
+    }],
+    "status": "completed",
+    "version": "1.0"
+}
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+# main env
+if [ -d /usr/local/Ascend/nnae/latest ];then
+
+	export LD_LIBRARY_PATH=/usr/local/:/usr/local/lib/:/usr/lib/:/usr/local/Ascend/nnae/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/local/Ascend/driver/tools/hccn_tool/:/usr/local/mpirun4.0/lib
+	export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages:/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/nnae/latest/fwkacllib/python/site-packages/:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages
+	export PATH=$PATH:/usr/local/Ascend/nnae/latest/fwkacllib/ccec_compiler/bin:/usr/local/mpirun4.0/bin
+	export ASCEND_OPP_PATH=/usr/local/Ascend/nnae/latest/opp
+else
+	export LD_LIBRARY_PATH=/usr/local/lib/:/usr/lib/:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:/usr/local/Ascend/add-ons/:/usr/local/mpirun4.0/lib
+	export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages:/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe:/usr/local/Ascend/ascend-toolkit/latest//fwkacllib/python/site-packages/:/usr/local/Ascend/ascend-toolkit/latest/tfplugin/python/site-packages:$projectDir
+	export PATH=$PATH:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:/usr/local/mpirun4.0/bin
+	export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/
+	
+fi
+
+export NEW_GE_FE_ID=1
+export GE_AICPU_FLAG=1
+export SOC_VERSION=Ascend910
+#export DUMP_GE_GRAPH=2
+#export DUMP_GRAPH_LEVEL=3
+#export PRINT_MODEL=1
+export SLOG_PRINT_TO_STDOUT=0
+export HCCL_CONNECT_TIMEOUT=600
+
+
+# system env
+ulimit -c unlimited
@@ -0,0 +1,53 @@
+
+# setting main path
+MAIN_PATH=$(dirname $(readlink -f $0))
+echo $MAIN_PATH
+
+DEVICE_NUM=$1
+ckpt_path=$2
+
+#echo $1
+#echo $2
+# set env
+export DDK_VERSION_FLAG=1.60.T49.0.B201
+export NEW_GE_FE_ID=1
+export GE_AICPU_FLAG=1
+export SOC_VERSION=Ascend910
+
+export JOB_ID=10087
+export FUSION_TENSOR_SIZE=1000000000
+
+
+export RANK_ID=yolo
+#echo "device_num is  $DEVICE_NUM"
+for((i=0;i<${DEVICE_NUM};i++));
+do
+
+export RANK_SIZE=$DEVICE_NUM
+export DEVICE_ID=$i
+export DEVICE_INDEX=$i
+
+#su HwHiAiUser -c "adc --host 0.0.0.0:22118 --log \"SetLogLevel(0)[debug]\" --device "$RANK_ID
+cd ${MAIN_PATH}/../result
+if [ x"${ckpt_path}" == x"" ];then
+    lastresult=$(ls -t | grep -E "Train*" | head -n 1)
+    RESTORE_PATH=${lastresult}/${i}/training/
+   
+else
+    lastresult=${ckpt_path}
+    RESTORE_PATH=${ckpt_path}/${i}/training/
+   
+fi
+echo $RESTORE_PATH
+ python3.7 ${MAIN_PATH}/../code/eval.py \
+--save_json True \
+--score_thresh 0.0001 \
+--nms_thresh 0.55 \
+--max_boxes 100 \
+--restore_path $RESTORE_PATH \
+--max_test 10000 \
+--save_json_path eval_res_D$DEVICE_NUM.json > ${lastresult}/eval_$i.out 2>&1
+
+done
+
+
@@ -0,0 +1,77 @@
+#!/bin/bash
+
+rank_size=$1
+yamlPath=$2
+toolsPath=$3
+if [ -f /.dockerenv ];then
+        CLUSTER=$4
+        MPIRUN_ALL_IP="$5"
+        export CLUSTER=${CLUSTER}
+fi
+currentDir=$(cd "$(dirname "$0")/.."; pwd)
+
+# read the configuration from the yaml file
+eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "tensorflow_config")
+source ${currentDir}/config/npu_set_env.sh
+
+if [ x"$runmode" != x"evaluate" ];then
+    currtime=`date +%Y%m%d%H%M%S`
+    mkdir -p ${currentDir%train*}/train/result/tf_yolov3/training_job_${currtime}/
+    train_job_dir=${currentDir%train*}/train/result/tf_yolov3/training_job_${currtime}/
+    echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${train_job_dir} &"
+fi
+
+
+# device list; if no devices are specified, select them sequentially according to rank_size
+eval device_group=\$device_group_${rank_size}p
+if [ x"${device_group}" == x"" ] || [ ${rank_size} -ge 8 ];then
+    device_group="$(seq 0 "$(expr $rank_size - 1)")"
+fi
+
+# get the first device id in device_group; the hw log is linked from the directory named after it
+device_group_str=`echo ${device_group} | sed 's/ //g'`
+first_device_id=`echo ${device_group_str: 0:1}`
+
+argsFilePath=${currentDir}/code/args_${mode}.py
+
+#echo "argsFilePath is "${argsFilePath}
+sed -i "0,/batch_size.*$/s//batch_size\ = ${batch_size}/g" ${argsFilePath}
+sed -i "s/save_epoch.*$/save_epoch\ = ${save_epoch}/g" ${argsFilePath}
+sed -i "s/total_epoches =.*$/total_epoches\ = ${total_epoches}/g" ${argsFilePath}
+sed -i 's/\r//g' ${argsFilePath}
+
+if [ x"${CLUSTER}" == x"True" ];then
+    # ln hw log
+    ln -snf ${train_job_dir}/0/hw_yolov3.log ${train_job_dir}
+    this_ip=$(hostname -I |awk '{print $1}')
+    for ip in $MPIRUN_ALL_IP;do
+        if [ x"$ip" != x"$this_ip" ];then
+            scp $yamlPath root@$ip:$yamlPath
+            scp $argsFilePath root@$ip:$argsFilePath
+        fi
+    done
+    export PATH=$PATH:/usr/local/mpirun4.0/bin
+    mpirun -H ${mpirun_ip} \
+    --bind-to none -map-by slot\
+    --allow-run-as-root \
+    --mca btl_tcp_if_exclude lo,docker0,endvnic,virbr0,vethf40501b,docker_gwbridge,br-f42ac38052b4\
+    --prefix /usr/local/mpirun4.0/ \
+    ${currentDir}/scripts/train.sh 0 $rank_size $yamlPath $currtime ${toolsPath} ${CLUSTER}
+elif [ $runmode == "train" ];then
+    ln -snf ${train_job_dir}/${first_device_id}/hw_yolov3.log ${train_job_dir}
+    rank_id=0
+    for device_id in $device_group;do
+      #echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] start: train ${device_id} & " >> ${currentDir}/result/main.log
+      ${currentDir}/scripts/train.sh $device_id $rank_size $yamlPath $currtime ${toolsPath} $rank_id&
+      let rank_id++
+    done
+else
+    echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] ${ckpt_path} &"
+    ln -snf ${train_job_dir}/${first_device_id}/hw_yolov3.log ${train_job_dir}
+    bash ${currentDir}/scripts/eval.sh ${rank_size} ${ckpt_path}
+fi
+
+wait
+
+#echo "[`date +%Y%m%d-%H:%M:%S`] [INFO] all train exit " >> ${currentDir}/result/main.log
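When no explicit device list is configured for the requested rank size (or rank_size is 8 or more), the launcher above falls back to devices 0..rank_size-1 and starts one train.sh per device with an increasing rank_id. A small Python model of that assignment; `launch_plan` is a hypothetical name, and its optional list stands in for the device_group_<N>p value read from the yaml.

```python
def launch_plan(rank_size, device_group=None):
    """(device_id, rank_id) pairs in the order the launcher starts train.sh."""
    if not device_group or rank_size >= 8:
        # same fallback as: device_group="$(seq 0 "$(expr $rank_size - 1)")"
        device_group = list(range(rank_size))
    # rank_id starts at 0 and is incremented once per launched device
    return [(device_id, rank_id) for rank_id, device_id in enumerate(device_group)]


print(launch_plan(2, [4, 5]))  # [(4, 0), (5, 1)]
```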
+
@@ -0,0 +1,115 @@
+#!/bin/bash
+scriptDir=$(cd "$(dirname "$0")"; pwd)
+mainDir=$(cd "$(dirname "$scriptDir")"; pwd)
+
+device_id=$1
+rank_size=$2
+yamlPath=$3
+currentDir=$(cd "$(dirname "$0")/.."; pwd)
+currtime=$4
+toolsPath=$5
+export YAML_PATH=$3
+mkdir -p ${currentDir%train*}/train/result/tf_yolov3/training_job_${currtime}/
+export train_job_dir=${currentDir%train*}/train/result/tf_yolov3/training_job_${currtime}/
+
+
+
+# read the configuration from the yaml file
+eval $(${toolsPath}/get_params_for_yaml.sh ${yamlPath} "tensorflow_config")
+
+
+source ${currentDir}/config/npu_set_env.sh
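+# declare variables
+export REMARK_LOG_FILE=hw_yolov3.log  # name of the benchmark log file; must be hw_ followed by the lowercase model name
+# add the path of the benchmark logging module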
+benchmark_log_path=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils
+export PYTHONPATH=$PYTHONPATH:${benchmark_log_path}
+
+# user env
+export HCCL_CONNECT_TIMEOUT=600
+export RANK_TABLE_FILE=${currentDir}/config/${rank_size}p.json
+export RANK_SIZE=${rank_size}
+export SLOG_PRINT_TO_STDOUT=0
+export DEVICE_ID=${device_id}
+export DEVICE_INDEX=${DEVICE_INDEX}
+export DEVICE_INDEX=$RANK_ID
+export JOB_ID=123678
+export FUSION_TENSOR_SIZE=1000000000
+
+
+if [ x"${profiling_mode}" == x"True" ];
+then
+	export PROFILING_MODE=true
+else
+	export PROFILING_MODE=false
+fi
+
+if [ x"${aicpu_profiling_mode}" == x"True" ];
+then
+	export AICPU_PROFILING_MODE=true
+else
+	export AICPU_PROFILING_MODE=false
+fi
+
+export PROFILING_OPTIONS=${profiling_options}
+export FP_POINT=${fp_point}
+export BP_POINT=${bp_point}
+
+cd ${train_job_dir}
+curd_dir=${currentDir%atlas_benchmark-master*}/atlas_benchmark-master/utils/atlasboost
+export PYTHONPATH=$PYTHONPATH:${curd_dir}
+
+if [ x"$6" != x"True" ];then
+        rank_id=$6
+        export RANK_ID=$6
+else
+        device_id_mo=$(python3.7 -c "import src.tensorflow.mpi_ops as atlasboost;atlasboost.init(); \
+                device_id = atlasboost.local_rank();cluster_device_id = str(device_id); \
+                atlasboost.set_device_id(device_id);print(atlasboost.rank())")
+        device_id_mo=`echo $device_id_mo`
+        rank_id=${device_id_mo##* }
+        export RANK_ID=${rank_id}
+        device=${device_id_mo##*deviceid = }
+        device_id=${device%% phyid=*}
+        export DEVICE_ID=${device_id}
+        hccljson=${train_job_dir}/*.json
+        cp ${hccljson} ${currentDir}/config/${rank_size}p.json
+fi
+
+#mkdir exec path
+mkdir -p ${train_job_dir}/${device_id}
+cd ${train_job_dir}/${device_id}
+
+num_cpus=$(getconf _NPROCESSORS_ONLN)
+num_cpus_per_device=$((num_cpus/8))
+PID_START=$((num_cpus_per_device*device_id))
+PID_END=$((num_cpus_per_device*device_id+num_cpus_per_device-1))
+
+startTime=`date +%Y%m%d-%H:%M:%S`
+startTime_s=`date +%s`
+
+#KERNEL_NUM=20
+#PID_START=$((KERNEL_NUM * DEVICE_ID))
+#PID_END=$((PID_START + KERNEL_NUM - 1))
+
+#sleep 5
+taskset -c  $PID_START-$PID_END python3.7 $mainDir/code/train.py --mode $mode > ${train_job_dir}/train_${device_id}.log 2>&1
+
+if [ $? -eq 0 ] ;then
+    echo ":::ABK 1.0.0 yolov3 train success"
+    echo ":::ABK 1.0.0 yolov3 train success" >> ${train_job_dir}/train_${device_id}.log
+    echo ":::ABK 1.0.0 yolov3 train success" >> ${train_job_dir}/${device_id}/hw_yolov3.log
+else
+    echo ":::ABK 1.0.0 yolov3 train failed"
+    echo ":::ABK 1.0.0 yolov3 train failed" >> ${train_job_dir}/train_${device_id}.log
+    echo ":::ABK 1.0.0 yolov3 train failed" >> ${train_job_dir}/${device_id}/hw_yolov3.log
+fi
+
+endTime=`date +%Y%m%d-%H:%M:%S`
+endTime_s=`date +%s`
+sumTime=$(( endTime_s - startTime_s ))
+hour=$(( $sumTime/3600 ))
+min=$(( ($sumTime-${hour}*3600)/60 ))
+sec=$(( $sumTime-${hour}*3600-${min}*60 ))
+echo ${hour}:${min}:${sec}
+echo ":::ABK 1.0.0 yolov3 train total time ${hour}:${min}:${sec}" >> ${train_job_dir}/${device_id}/hw_yolov3.log
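train.sh pins each training process to its own block of host cores with taskset, assuming 8 devices share the host (num_cpus / 8 cores per device), and reports the run time as H:M:S. The same integer arithmetic in Python, using a hypothetical 192-core host; the helper names are illustrative only.

```python
def core_range(num_cpus, device_id, devices_per_host=8):
    """Mirror of PID_START/PID_END in train.sh: a contiguous block of cores per device."""
    per_device = num_cpus // devices_per_host
    start = per_device * device_id
    return start, start + per_device - 1


def hms(seconds):
    """Split elapsed seconds into (hours, minutes, seconds) like the tail of train.sh."""
    hour = seconds // 3600
    minute = (seconds - hour * 3600) // 60
    sec = seconds - hour * 3600 - minute * 60
    return hour, minute, sec


print(core_range(192, 3))  # (72, 95), i.e. taskset -c 72-95
print(hms(3725))           # (1, 2, 5)
```

On a host with more or fewer than 8 devices the division by 8 in train.sh would need to change, otherwise the core blocks no longer line up with the devices actually in use.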
@@ -0,0 +1 @@
+10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
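The anchor file holds nine (width, height) pairs, in pixels of the network input, which parse_anchors reshapes into an array of shape [9, 2]. A minimal sketch of that parse; the grouping at the end follows the usual YOLOv3 convention (an assumption here, since model.py is not shown): the three largest anchors go to the 13x13 output, the middle three to 26x26 and the smallest three to 52x52.

```python
import numpy as np

ANCHOR_TEXT = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"

# same idea as parse_anchors: split on commas, cast to float32, reshape to [N, 2]
anchors = np.reshape(np.asarray([float(v) for v in ANCHOR_TEXT.split(',')], np.float32), [-1, 2])

# assumed YOLOv3 grouping: the largest anchors serve the coarsest grid
anchor_groups = {13: anchors[6:9], 26: anchors[3:6], 52: anchors[0:3]}

print(anchors.shape)  # (9, 2)
# anchor_groups[13] holds the rows (116, 90), (156, 198), (373, 326)
```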