Initial commit
+139
@@ -0,0 +1,139 @@
# Cuneiform-Sign-Detection-Code

Author: Tobias Dencker - <tobias.dencker@gmail.com>

This is the code repository for the article submission on "Deep learning of cuneiform sign detection with weak supervision using transliteration alignment".

This repository contains code to execute the proposed iterative training procedure as well as code to evaluate and visualize results.
Moreover, we provide pre-trained models of the cuneiform sign detector for Neo-Assyrian script after iterative training on the [Cuneiform Sign Detection Dataset](https://compvis.github.io/cuneiform-sign-detection-dataset/).
Finally, we provide a web application for the analysis of tablet images with the help of a pre-trained cuneiform sign detector.

<img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/images_decent.jpg" alt="sign detections on tablet images: yellow boxes indicate TP and blue boxes FP detections" width="700"/>
<!--- <img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/images_difficult.jpg" alt="Web interface detection" width="500"/> -->

## Repository description

- General structure:
    - `data`: tablet images, annotations, transliterations, metadata
    - `experiments`: training, testing, evaluation and visualization
    - `lib`: project library code
    - `results`: generated detections (placed, raw and aligned), network weights, logs
    - `scripts`: scripts to run the alignment and placement step of iterative training


### Use cases

- Pre-processing of training data
    - line detection
- Iterative training
    - generate sign annotations (aligned and placed detections)
    - sign detector training
- Evaluation (on test set)
    - raw detections
    - placed detections
    - aligned detections
- Test & visualize
    - line segmentation and post-processing
    - line-level and sign-level alignments
    - TP/FP for raw, aligned and placed detections (full tablet and crop level)


### Pre-processing
Line detections are obtained for all tablet images as a pre-processing step before iterative training.
- use the Jupyter notebooks in `experiments/line_segmentation/` to train and evaluate the line segmentation network and to run line detection on all tablet images of the training set


### Training
*Iterative training* alternates between generating aligned and placed detections and training a new sign detector:
1. use the command-line scripts in `scripts/generate/` to run the alignment and placement step of iterative training
2. use the Jupyter notebooks in `experiments/sign_detector/` to run the sign detector training step of iterative training

To keep track of the sign detector and the sign annotations generated in each iteration of iterative training (stored in `results/`),
we label the sign detector with a *model version* (e.g. v002),
which is also used to label the raw, aligned and placed detections based on this detector.
Besides providing a model version, a user also selects which subsets of the training data to use for the generation of new annotations.
In particular, *subsets of SAAo collections* (e.g. saa01, saa05, saa08) are selected when running the scripts under `scripts/generate/`.
To enable evaluation on the test set, the collections `test` and `saa06` must be included.
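
The bookkeeping convention described above can be sketched in a few lines of Python. This is a minimal illustration and not code from this repository: `version_label`, `results_folder` and `iteration_plan` are hypothetical helpers, and the `results/results_ssd/<version>/` layout is an assumption for illustration.

```python
import os


def version_label(iteration):
    """Format an iteration number as a model version string, e.g. 2 -> 'v002'."""
    return 'v{:03d}'.format(iteration)


def results_folder(version, root='results'):
    """Folder holding the detections generated with a given detector version.

    The 'results_ssd' sub-folder name is an assumption made for this sketch.
    """
    return os.path.join(root, 'results_ssd', version)


def iteration_plan(num_iterations, train_collections,
                   eval_collections=('test', 'saa06')):
    """List, per iteration, the version label and the collections to process.

    The evaluation collections are appended when missing, so that the
    test set can be evaluated for every model version.
    """
    plan = []
    for i in range(1, num_iterations + 1):
        collections = list(train_collections)
        for name in eval_collections:
            if name not in collections:
                collections.append(name)
        plan.append((version_label(i), collections))
    return plan


if __name__ == '__main__':
    for version, collections in iteration_plan(2, ['saa01', 'saa05', 'saa08']):
        print(version, results_folder(version), collections)
```

Keeping the version label as the only key shared between the detector weights and the generated detections makes it easy to tell which detector produced which annotations.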


### Evaluation
Use the [*test sign detector notebook*](./experiments/sign_detector/test_sign_detector.ipynb) to measure the performance (mAP) of the trained sign detector on the test set or other subsets of the dataset.
In `experiments/alignment_evaluation/` you will find further notebooks for evaluating and visualizing line-level and sign-level alignments and TP/FP for raw, aligned and placed detections (full tablet and crop level).
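
For readers unfamiliar with the mAP metric mentioned above, the following is a generic sketch of how average precision (AP) for a single sign class can be computed from scored detections that have already been marked as TP or FP. It uses the all-point interpolation familiar from PASCAL VOC style evaluation and is not the evaluation routine of this repository; the function names and the toy inputs are assumptions. The mAP is then the mean of the per-class AP values.

```python
def precision_recall(scores, is_tp, num_gt):
    """Cumulative precision and recall after sorting detections by score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / float(num_gt))
        precisions.append(tp / float(tp + fp))
    return recalls, precisions


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation)."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # make precision monotonically non-increasing from right to left
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum rectangle areas wherever recall changes
    ap = 0.0
    for i in range(1, len(mrec)):
        if mrec[i] != mrec[i - 1]:
            ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


if __name__ == '__main__':
    # three detections of one class, two ground-truth boxes (toy values)
    rec, prec = precision_recall([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
    print('AP = {:.4f}'.format(average_precision(rec, prec)))
```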


### Pre-trained models

We provide pre-trained models in the form of [PyTorch model files](https://pytorch.org/tutorials/beginner/saving_loading_models.html) for the line segmentation network as well as the sign detector.

| Model name | Model type | Train annotations |
|----------------|-------------------|------------------------|
| [lineNet_basic_vpub.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/lineNet_basic_vpub.pth) | line segmentation | 410 lines |

For the sign detector, we provide the best weakly supervised model (fpn_net_vA) and the best semi-supervised model (fpn_net_vF).

| Model name | Model type | Weak supervision in training | Annotations in training | mAP on test_full |
|----------------|-------------------|-------------------|------------------------|------------------------|
| [fpn_net_vA.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/fpn_net_vA.pth) | sign detector | saa01, saa05, saa08, saa10, saa13, saa16 | None | 45.3 |
| [fpn_net_vF.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/fpn_net_vF.pth) | sign detector | saa01, saa05, saa08, saa10, saa13, saa16 | train_full (4663 bboxes) | 65.6 |


### Web application

We also provide a demo web application that enables a user to apply a trained cuneiform sign detector to a large collection of tablet images.
The code of the web front-end is available in the [webapp repo](https://github.com/compvis/cuneiform-sign-detection-webapp/).
The back-end code is part of this repository and is located in [lib/webapp/](./lib/webapp/).
Below is a short animation of how the sign detector is used with this web interface.

<img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/demo_cuneiform_sign_detection.gif" alt="Web interface detection" width="700"/>


For demonstration purposes, we also host an instance of the web application: [Demo Web Application](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/).
If you would like to test the web application, please contact us for user credentials to log in.
Please note that this web application is a prototype for demonstration purposes only and not a production system.
If the website is not reachable or other technical issues occur, please contact us.


### Cuneiform font

For visualization of the cuneiform characters, we recommend installing the [Unicode Cuneiform Fonts](https://www.hethport.uni-wuerzburg.de/cuneifont/) by Sylvie Vanseveren.


## Installation

#### Software
Install general dependencies:

- **OpenGM** with Python wrapper - library for discrete graphical models. http://hciweb2.iwr.uni-heidelberg.de/opengm/
This library is needed for the alignment step during training. Testing is not affected. An installation guide for Ubuntu 14.04 can be found [here](./install_opengm.md).

- Python 2.7.X

- Python packages:
    - torch 1.0
    - torchvision
    - scikit-image 0.14.0
    - pandas, scipy, sklearn, jupyter
    - pillow, tqdm, tensorboardX, nltk, Levenshtein, editdistance, easydict


Clone this repository and place the [*cuneiform-sign-detection-dataset*](https://github.com/compvis/cuneiform-sign-detection-dataset) in the [./data sub-folder](./data/).

#### Hardware

Training and evaluation can be performed on a machine with a single GPU (we used a GeForce GTX 1080).
The demo web application can run on a web server without GPU support,
since detection inference with a lightweight MobileNetV2 backbone is fast even in CPU-only mode
(less than 1s for an image with HD resolution, less than 10s for 4K resolution).

### References
This repository also includes external code. In particular, we want to mention:
> - kuangliu's *torchcv* and *pytorch-cifar* repositories from which we adapted the SSD and FPN detector code:
https://github.com/kuangliu/pytorch-cifar and
https://github.com/kuangliu/torchcv
> - Ross Girshick's *py-faster-rcnn* repository from which we adapted part of our evaluation routine:
https://github.com/rbgirshick/py-faster-rcnn
> - Rico Sennrich's *Bleualign* repository from which we adapted part of the Bleualign implementation:
https://github.com/rsennrich/Bleualign
@@ -0,0 +1 @@
theme: jekyll-theme-cayman
@@ -0,0 +1,15 @@
### Data folder

Place [*cuneiform-sign-detection-dataset*](https://github.com/to3i/cuneiform-sign-detection-dataset) folders here:
- ./data/annotations
- ./data/images
- ./data/segments
- ./data/transliterations
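
A quick way to confirm that the dataset folders listed above are in place is a small check like the following. It is a minimal sketch rather than part of the repository: `missing_data_folders` is a hypothetical helper, and `./data` as the default root assumes the script is run from the repository root.

```python
import os

# Sub-folders of the dataset expected under the data root (from the list above).
EXPECTED_SUBFOLDERS = ('annotations', 'images', 'segments', 'transliterations')


def missing_data_folders(data_root='./data'):
    """Return the expected dataset sub-folders that are absent under data_root."""
    return [name for name in EXPECTED_SUBFOLDERS
            if not os.path.isdir(os.path.join(data_root, name))]


if __name__ == '__main__':
    missing = missing_data_folders()
    if missing:
        print('Missing data folders: {}'.format(', '.join(missing)))
    else:
        print('All dataset folders are in place.')
```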

#### Meta data files:

- *cunei_mzl.csv* contains the sign code class index established by Borger's Mesopotamisches Zeichenlexikon (MZL)
- *newLabels.json* contains new labels (re-indexing) for the subset of Neo-Assyrian MZL code classes so that labels range from 0-360 instead of 0-910, which reduces the output dimension of the detector
- *unicode_sign_stats.csv* contains estimates of sign length and height for individual cuneiform sign classes. These estimates were derived from the [Unicode Cuneiform Fonts](https://www.hethport.uni-wuerzburg.de/cuneifont/) by Sylvie Vanseveren.
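
The sketch below illustrates how the re-indexing and the sign size estimates described above might be combined. It rests on several assumptions that are not confirmed by the files themselves: that the JSON list is indexed by MZL class index and holds the new training label, that a value of 0 means the class is not part of the Neo-Assyrian subset, and that the `train_lbl` column of the CSV refers to the same new labels. The helper names and the toy inputs are hypothetical.

```python
import csv
import io
import json

# Toy stand-ins for newLabels.json and unicode_sign_stats.csv (illustrative only).
NEW_LABELS_JSON = '[0, 2, 0, 191]'
SIGN_STATS_CSV = u'train_lbl,width,height\n1,0.71875,0.9375\n2,0.9296875,0.9375\n'


def load_label_map(json_text):
    """Parse the re-indexing list: position = MZL index, value = new label."""
    return json.loads(json_text)


def load_sign_stats(csv_text):
    """Parse the size estimates into {train_lbl: (width, height)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row['train_lbl']): (float(row['width']), float(row['height']))
            for row in reader}


def lookup(mzl_index, label_map, sign_stats):
    """Return (train_label, (width, height) or None), or None if unmapped."""
    if not 0 <= mzl_index < len(label_map):
        return None
    train_label = label_map[mzl_index]
    if train_label == 0:  # assumption: 0 marks classes outside the subset
        return None
    return train_label, sign_stats.get(train_label)


if __name__ == '__main__':
    label_map = load_label_map(NEW_LABELS_JSON)
    stats = load_sign_stats(SIGN_STATS_CSV)
    print(lookup(1, label_map, stats))
```

Mapping the sparse MZL indices onto a dense label range keeps the classifier output small, while the size estimates remain addressable through the same dense label.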
File diff suppressed because it is too large
@@ -0,0 +1 @@
|
||||
[0, 2, 0, 191, 0, 184, 196, 238, 239, 40, 26, 221, 240, 241, 24, 109, 73, 236, 210, 0, 205, 0, 242, 0, 58, 0, 133, 0, 0, 243, 0, 199, 244, 0, 0, 0, 0, 245, 0, 0, 0, 0, 0, 0, 246, 0, 0, 0, 0, 247, 0, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 249, 0, 0, 250, 208, 0, 0, 0, 0, 0, 193, 0, 251, 0, 252, 0, 0, 0, 162, 154, 0, 0, 0, 66, 23, 48, 0, 0, 108, 228, 129, 140, 0, 0, 0, 0, 253, 41, 97, 0, 254, 0, 0, 0, 255, 256, 0, 257, 258, 4, 8, 17, 31, 0, 202, 0, 224, 83, 213, 49, 259, 260, 0, 0, 0, 0, 20, 0, 175, 207, 261, 90, 0, 6, 0, 156, 93, 189, 152, 110, 7, 76, 64, 0, 0, 0, 0, 145, 262, 263, 194, 264, 265, 0, 0, 0, 266, 0, 0, 267, 0, 29, 0, 78, 268, 142, 231, 269, 0, 63, 0, 89, 270, 271, 272, 34, 273, 186, 0, 101, 107, 274, 275, 104, 0, 0, 0, 276, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 277, 0, 278, 0, 0, 0, 69, 0, 279, 0, 0, 280, 0, 0, 281, 0, 0, 0, 0, 0, 176, 215, 116, 0, 0, 0, 0, 0, 0, 282, 0, 283, 0, 0, 0, 284, 0, 157, 0, 0, 0, 61, 0, 0, 0, 232, 206, 5, 0, 0, 0, 80, 124, 222, 36, 0, 0, 183, 195, 84, 160, 237, 0, 0, 0, 119, 0, 0, 0, 285, 71, 0, 0, 0, 229, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 182, 0, 0, 132, 286, 0, 0, 287, 115, 52, 0, 197, 233, 209, 0, 0, 0, 0, 0, 0, 200, 0, 204, 288, 46, 0, 0, 289, 0, 0, 0, 0, 0, 0, 0, 290, 0, 50, 0, 0, 0, 0, 0, 291, 0, 0, 0, 292, 0, 293, 192, 294, 295, 0, 0, 0, 0, 0, 0, 296, 0, 25, 120, 297, 212, 123, 0, 146, 134, 21, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 298, 299, 0, 0, 0, 300, 141, 44, 62, 45, 0, 0, 0, 0, 0, 138, 0, 0, 0, 0, 301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 302, 0, 0, 118, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 303, 0, 0, 0, 304, 305, 0, 0, 306, 0, 165, 307, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 308, 0, 0, 0, 113, 309, 137, 0, 0, 28, 0, 0, 310, 0, 159, 0, 0, 0, 0, 0, 0, 0, 0, 181, 99, 158, 311, 0, 0, 0, 30, 102, 0, 74, 177, 3, 126, 312, 19, 67, 188, 130, 128, 313, 178, 0, 0, 163, 0, 314, 0, 42, 315, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 225, 72, 0, 
139, 223, 0, 0, 316, 57, 317, 318, 319, 13, 35, 320, 321, 217, 322, 179, 190, 121, 65, 150, 0, 323, 324, 148, 96, 0, 0, 226, 325, 0, 326, 327, 0, 219, 328, 98, 43, 87, 0, 0, 234, 329, 112, 0, 60, 0, 18, 198, 136, 330, 0, 0, 331, 39, 0, 155, 27, 0, 92, 0, 0, 0, 0, 0, 0, 332, 0, 0, 333, 105, 0, 334, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 218, 0, 0, 94, 135, 0, 0, 103, 174, 0, 0, 0, 75, 32, 0, 201, 187, 0, 335, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 211, 0, 0, 0, 0, 0, 0, 0, 0, 122, 0, 0, 0, 0, 0, 167, 0, 0, 22, 168, 0, 0, 230, 12, 0, 0, 336, 85, 166, 0, 227, 0, 53, 337, 0, 37, 0, 0, 185, 0, 338, 339, 171, 0, 0, 173, 0, 0, 86, 340, 0, 153, 0, 0, 0, 0, 341, 216, 342, 343, 0, 79, 180, 144, 0, 0, 125, 0, 161, 169, 9, 0, 0, 77, 100, 0, 0, 0, 344, 0, 0, 214, 131, 55, 95, 14, 0, 47, 345, 0, 81, 56, 117, 106, 0, 0, 0, 235, 33, 0, 0, 0, 0, 346, 347, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 348, 0, 349, 0, 0, 0, 0, 0, 0, 350, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 114, 38, 351, 352, 0, 82, 353, 0, 164, 354, 0, 355, 0, 0, 172, 0, 0, 356, 51, 357, 358, 88, 0, 359, 0, 360, 127, 54, 361, 147, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 362, 0, 363, 0, 111, 0, 0, 1, 0, 220, 364, 365, 366, 0, 0, 0, 367, 10, 0, 0, 0, 0, 0, 0, 0, 368, 0, 0, 0, 369, 0, 170, 203, 0, 0, 151, 0, 91, 0, 370, 371, 372, 0, 0, 0, 0, 0, 149, 59, 0, 0, 0, 0, 373, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 374, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
|
||||
@@ -0,0 +1,239 @@
|
||||
train_lbl,width,height
|
||||
1,0.71875,0.9375
|
||||
2,0.9296875,0.9375
|
||||
3,1.71875,0.9375
|
||||
4,1.546875,0.9375
|
||||
5,1.6640625,1.0234375
|
||||
6,1.8125,0.9375
|
||||
7,1.6484375,0.9375
|
||||
8,1.2109375,1.0703125
|
||||
9,2.7421875,1.0625
|
||||
10,0.4765625,1.0703125
|
||||
11,0.7109375,0.9375
|
||||
12,1.71875,1.0
|
||||
13,1.171875,0.9375
|
||||
14,0.453125,0.9375
|
||||
15,1.3125,0.9375
|
||||
16,0.4375,0.9375
|
||||
17,0.9375,1.015625
|
||||
18,0.984375,0.9375
|
||||
19,1.3046875,0.9375
|
||||
20,1.890625,1.078125
|
||||
21,1.2421875,0.984375
|
||||
22,1.296875,0.9375
|
||||
23,2.109375,1.078125
|
||||
24,1.4296875,1.0546875
|
||||
25,1.7890625,1.0078125
|
||||
26,1.171875,0.9375
|
||||
27,1.203125,0.9375
|
||||
28,1.171875,0.9375
|
||||
29,2.96875,0.96875
|
||||
30,1.84375,0.9375
|
||||
31,1.21875,0.9375
|
||||
32,1.5078125,0.9453125
|
||||
33,1.5859375,1.0390625
|
||||
34,1.5234375,0.9375
|
||||
35,1.46875,0.9375
|
||||
36,1.2578125,0.9375
|
||||
37,1.7109375,0.9375
|
||||
38,1.4453125,1.0703125
|
||||
39,0.7421875,0.9375
|
||||
40,1.171875,0.9375
|
||||
41,1.2421875,0.9375
|
||||
42,1.7265625,0.9921875
|
||||
43,0.65625,0.9375
|
||||
44,0.9453125,0.9375
|
||||
45,1.2578125,1.015625
|
||||
46,2.3046875,0.9375
|
||||
47,1.3515625,0.9375
|
||||
48,1.6015625,1.0
|
||||
49,0.9296875,0.9375
|
||||
50,2.53125,1.0625
|
||||
51,0.703125,0.9375
|
||||
52,1.5234375,0.984375
|
||||
53,1.4921875,0.9453125
|
||||
54,0.8828125,0.9375
|
||||
55,1.0,0.9375
|
||||
56,2.0703125,0.9375
|
||||
57,0.921875,0.9375
|
||||
58,2.0859375,1.0703125
|
||||
59,1.3828125,1.0
|
||||
60,2.09375,0.9375
|
||||
61,2.15625,0.9453125
|
||||
62,0.9375,0.9375
|
||||
63,1.625,0.9453125
|
||||
64,1.59375,0.9453125
|
||||
65,1.6328125,0.9375
|
||||
66,2.53125,0.984375
|
||||
67,1.5,0.9453125
|
||||
68,0.671875,0.9375
|
||||
69,2.1171875,1.0546875
|
||||
70,2.5546875,0.9921875
|
||||
71,2.3125,1.078125
|
||||
72,2.25,1.0703125
|
||||
73,1.6015625,1.0234375
|
||||
74,3.1328125,1.0546875
|
||||
75,1.296875,0.9375
|
||||
76,1.7265625,0.9375
|
||||
77,1.4453125,0.9375
|
||||
78,1.1640625,1.078125
|
||||
79,1.3671875,0.9375
|
||||
80,1.2578125,0.9375
|
||||
81,1.1484375,0.9375
|
||||
82,1.515625,1.0546875
|
||||
83,1.515625,0.9375
|
||||
84,1.7421875,0.9375
|
||||
85,1.6953125,0.953125
|
||||
86,0.9453125,0.9375
|
||||
87,1.6015625,0.9375
|
||||
88,1.390625,1.0625
|
||||
89,1.1875,0.9453125
|
||||
90,1.4140625,0.9453125
|
||||
91,1.9921875,1.0546875
|
||||
92,2.40625,0.9375
|
||||
93,2.0625,0.9453125
|
||||
94,0.671875,0.9375
|
||||
95,1.234375,0.9375
|
||||
96,1.1953125,1.046875
|
||||
97,1.171875,0.9453125
|
||||
98,0.671875,0.9375
|
||||
99,1.5078125,0.9375
|
||||
100,1.46875,1.078125
|
||||
101,1.4140625,1.0703125
|
||||
102,1.6328125,0.9375
|
||||
103,1.59375,0.9375
|
||||
104,2.1953125,0.9375
|
||||
105,0.71875,0.9375
|
||||
106,1.6328125,1.0390625
|
||||
107,1.4140625,1.0546875
|
||||
108,1.4921875,0.9375
|
||||
109,1.6171875,1.03125
|
||||
110,1.59375,0.9375
|
||||
111,0.765625,0.9453125
|
||||
112,1.875,0.9375
|
||||
113,0.9296875,0.9375
|
||||
114,1.6015625,1.0703125
|
||||
115,2.0,1.0
|
||||
116,1.2734375,0.9375
|
||||
117,1.5859375,1.03125
|
||||
118,1.78125,1.0234375
|
||||
119,2.109375,1.03125
|
||||
120,1.984375,1.0546875
|
||||
121,1.546875,0.9375
|
||||
122,1.3046875,0.9375
|
||||
123,1.765625,1.078125
|
||||
124,1.265625,0.9375
|
||||
125,2.0703125,0.9375
|
||||
126,1.5390625,0.9375
|
||||
127,2.3515625,1.078125
|
||||
128,1.6171875,0.9375
|
||||
129,1.75,1.0625
|
||||
130,0.8515625,0.9375
|
||||
131,0.890625,0.9453125
|
||||
132,1.84375,1.0546875
|
||||
133,2.2890625,1.0859375
|
||||
134,1.5703125,0.96875
|
||||
135,0.671875,0.9375
|
||||
136,0.75,0.9375
|
||||
137,2.53125,1.078125
|
||||
138,1.1875,0.9375
|
||||
139,1.2890625,0.9375
|
||||
140,0.9296875,0.9375
|
||||
141,0.8515625,0.9375
|
||||
142,2.046875,1.0703125
|
||||
143,1.625,0.9375
|
||||
144,3.0546875,0.9453125
|
||||
145,1.4296875,0.9453125
|
||||
146,2.359375,1.03125
|
||||
147,0.8515625,0.9375
|
||||
148,1.640625,0.9375
|
||||
149,1.859375,1.0078125
|
||||
150,1.1328125,0.9375
|
||||
151,2.609375,1.078125
|
||||
152,1.7890625,0.9375
|
||||
153,0.984375,0.9375
|
||||
154,1.953125,0.9765625
|
||||
155,1.46875,0.9375
|
||||
156,1.65625,0.9375
|
||||
157,1.96875,0.9375
|
||||
158,1.5078125,1.078125
|
||||
159,1.8828125,1.046875
|
||||
160,2.0625,1.0078125
|
||||
161,2.671875,1.0625
|
||||
162,2.296875,0.984375
|
||||
163,1.7109375,0.9375
|
||||
164,1.5234375,1.09375
|
||||
165,0.9296875,0.9609375
|
||||
166,2.2734375,1.03125
|
||||
167,1.25,0.9375
|
||||
168,2.1640625,0.9375
|
||||
169,2.84375,1.078125
|
||||
170,1.1796875,1.0
|
||||
171,2.484375,0.9375
|
||||
172,3.9609375,1.0078125
|
||||
173,0.8515625,0.9375
|
||||
174,1.796875,0.9375
|
||||
175,2.2578125,1.0546875
|
||||
176,1.046875,0.9375
|
||||
177,1.671875,0.9375
|
||||
178,1.828125,1.0
|
||||
179,1.4765625,0.9375
|
||||
180,2.5546875,1.0703125
|
||||
181,1.5078125,0.9453125
|
||||
182,3.0078125,1.078125
|
||||
183,1.4921875,0.9375
|
||||
184,1.3359375,0.9375
|
||||
185,1.2890625,0.9375
|
||||
186,2.578125,0.9375
|
||||
187,1.59375,0.9453125
|
||||
188,1.5234375,0.9375
|
||||
189,3.828125,0.9375
|
||||
190,1.28125,0.9375
|
||||
191,1.125,0.9375
|
||||
192,2.0625,0.9375
|
||||
193,1.640625,0.96875
|
||||
194,1.0234375,0.9375
|
||||
195,1.7421875,0.9375
|
||||
196,1.5859375,0.9375
|
||||
197,1.84375,0.9375
|
||||
198,1.6875,0.9375
|
||||
199,2.171875,1.0703125
|
||||
200,1.3359375,0.9375
|
||||
201,1.953125,0.9375
|
||||
202,1.7578125,1.0
|
||||
203,1.7734375,1.078125
|
||||
204,2.203125,0.9375
|
||||
205,1.515625,1.046875
|
||||
206,2.234375,0.9375
|
||||
207,1.34375,0.9375
|
||||
208,2.2109375,1.0625
|
||||
209,1.3671875,0.9375
|
||||
210,1.4609375,1.0234375
|
||||
211,2.5078125,1.0703125
|
||||
212,1.765625,1.0703125
|
||||
213,0.9296875,1.0625
|
||||
214,1.859375,0.9375
|
||||
215,2.234375,1.015625
|
||||
216,1.8671875,1.078125
|
||||
217,1.7890625,1.0703125
|
||||
218,0.59375,0.9375
|
||||
219,0.53125,0.9375
|
||||
220,0.8046875,0.9375
|
||||
221,1.9453125,0.9375
|
||||
223,2.328125,1.03125
|
||||
224,1.5859375,0.9375
|
||||
225,1.3046875,0.9375
|
||||
226,1.7265625,1.046875
|
||||
227,1.84375,0.9375
|
||||
228,1.5234375,0.9375
|
||||
229,2.6796875,1.046875
|
||||
230,1.53125,0.984375
|
||||
231,1.8046875,0.953125
|
||||
232,1.25,0.9375
|
||||
233,1.5859375,0.9375
|
||||
234,2.0625,0.9453125
|
||||
235,1.5859375,1.03125
|
||||
236,1.84375,1.0234375
|
||||
237,2.171875,1.0703125
|
||||
238,1.5859375,0.9375
|
||||
239,1.6875,0.9375
|
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1,6 @@
### Train & eval line segmentation network

- use `train_line_segmentation.ipynb` for training and `test_line_segmentation.ipynb` for evaluation

### Pre-processing before iterative training
- use `precompute_line_segmentations.ipynb` to obtain line detections for all tablet images in the training set as pre-processing before iterative training starts
@@ -0,0 +1,301 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pre-compute and store line segmentations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from PIL import Image\n",
|
||||
"from ast import literal_eval"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# torch\n",
|
||||
"import torch\n",
|
||||
"import torchvision\n",
|
||||
"# addons\n",
|
||||
"from tqdm import tqdm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#%pylab inline\n",
|
||||
"%matplotlib inline\n",
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# for auto-reloading external modules\n",
|
||||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"relative_path = '../../'\n",
|
||||
"# ensure that parent path is on the python path in order to have all packages available\n",
|
||||
"import sys, os\n",
|
||||
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
|
||||
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
|
||||
"sys.path.insert(0, parent_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from lib.models.trained_model_loader import get_line_net_fcn\n",
|
||||
"from lib.datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta\n",
|
||||
"from lib.transliteration.sign_labels import get_label_list\n",
|
||||
"from lib.utils.transform_utils import UnNormalize\n",
|
||||
"\n",
|
||||
"from lib.detection.run_gen_line_detection import gen_line_detections"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Config Basics"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# toggle generation\n",
|
||||
"save_line_detections = True"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# set line segmentation network\n",
|
||||
"line_model_version = 'v002'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# dataset config\n",
|
||||
"collections = ['test', 'train', 'saa01', 'saa05', 'saa06', 'saa08', 'saa09', 'saa10', 'saa13', 'saa16'] \n",
|
||||
"#collections = ['saa01', 'saa05']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data_layer_params = dict(batch_size=[128, 16],\n",
|
||||
" img_channels=1,\n",
|
||||
" gray_mean=[0.5],\n",
|
||||
" gray_std=[1.0], \n",
|
||||
" num_classes = 2\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Config Data Augmentation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"num_classes = data_layer_params['num_classes']\n",
|
||||
"num_c = data_layer_params['img_channels']\n",
|
||||
"gray_mean = data_layer_params['gray_mean']\n",
|
||||
"gray_std = data_layer_params['gray_std']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"re_transform = torchvision.transforms.Compose([\n",
|
||||
" UnNormalize(mean=gray_mean, std=gray_std),\n",
|
||||
" torchvision.transforms.ToPILImage(),\n",
|
||||
" ])\n",
|
||||
"re_transform_rgb = torchvision.transforms.Compose([\n",
|
||||
" UnNormalize(mean=gray_mean * 3, std=gray_std * 3),\n",
|
||||
" torchvision.transforms.ToPILImage(),\n",
|
||||
" ])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#use_gpu = torch.cuda.is_available()\n",
|
||||
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_fcn = get_line_net_fcn(line_model_version, device, num_classes=num_classes, num_c=num_c)\n",
|
||||
"print(model_fcn)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run experiment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for saa_version in collections:\n",
|
||||
" print('collection: <><>{}<><>'.format(saa_version))\n",
|
||||
" \n",
|
||||
" ### Get collection dataset\n",
|
||||
" dataset = CuneiformSegments(collections=[saa_version], relative_path=relative_path, \n",
|
||||
" only_annotated=False, only_assigned=True, preload_segments=False)\n",
|
||||
" \n",
|
||||
" # filter collection dataset - OPTIONAL\n",
|
||||
" didx_list = range(len(dataset))\n",
|
||||
" \n",
|
||||
" ### Generate line detections\n",
|
||||
" gen_line_detections(didx_list, dataset, saa_version, relative_path,\n",
|
||||
" line_model_version, model_fcn, re_transform, device,\n",
|
||||
" save_line_detections) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1,11 @@
### Perform sign detector training
After the sign annotations (aligned and placed detections) have been generated and stored under `results/results_ssd/` using the scripts in `scripts/generate/`,
the sign detector is trained by performing the following steps:

1) use `train_sign_classifier.ipynb` as a template to train the sign classifier
2) use `train_sign_detector.ipynb` as a template to train the sign detector (initialized with the pre-trained sign classifier from step 1)
3) in the semi-supervised case, use `finetune_sign_detector.ipynb` to fine-tune the sign detector on manual annotations

### Eval sign detector

- use `test_sign_detector.ipynb` for evaluation of the sign detector
@@ -0,0 +1,623 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fine-tune sign detector network (in semi-supervised case)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from PIL import Image\n",
|
||||
"from ast import literal_eval\n",
|
||||
"import os.path\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"import copy\n",
|
||||
"\n",
|
||||
"import torch\n",
|
||||
"import torch.optim as optim\n",
|
||||
"from torch.optim import lr_scheduler\n",
|
||||
"import torch.utils.data as data\n",
|
||||
"\n",
|
||||
"from torchvision import transforms as trafos\n",
|
||||
"import torchvision.transforms as transforms"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# for auto-reloading external modules\n",
|
||||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"relative_path = '../../'\n",
|
||||
"# ensure that parent path is on the python path in order to have all packages available\n",
|
||||
"import sys, os\n",
|
||||
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
|
||||
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
|
||||
"sys.path.insert(0, parent_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from lib.datasets.cunei_dataset_ssd import CuneiformSSD\n",
|
||||
"\n",
|
||||
"from lib.alignment.LineFragment import plot_boxes\n",
|
||||
"from lib.utils.pytorch_utils import get_tensorboard_writer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from lib.models.mobilenetv2_mod03 import MobileNetV2\n",
|
||||
"from lib.models.mobilenetv2_fpn import MobileNetV2FPN\n",
|
||||
"from lib.models.trained_model_loader import get_fpn_ssd_net\n",
|
||||
"from lib.utils.torchcv.models.net import FPNSSD\n",
|
||||
"from lib.utils.torchcv.loss.ssd_loss import SSDLoss"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
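"hh = 0.001  # delay in hours before the notebook continues (optional delayed start)\n",
"## time.sleep(60*60*hh)\n",
"# sleep in 10 s steps so tqdm can show progress; int() truncates, so hh = 0.001 gives zero steps\n",
"for i in tqdm(range(int(6*60*hh))):\n",
"    time.sleep(10)"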
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Config Basics"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_version = 'v001ft01'\n",
|
||||
"\n",
|
||||
"# config pretrained detector\n",
|
||||
"pretrained_model_version = 'v001' # 'v191' \n",
|
||||
"\n",
|
||||
"# config datasets for training and testing\n",
|
||||
"train_collections = ['train_D'] \n",
|
||||
"test_collections = ['testEXT'] # ['test_full']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# config generated data\n",
|
||||
"with_gen_data = False\n",
|
||||
"\n",
|
||||
"gen_model_version = 'v001' \n",
|
||||
"\n",
|
||||
"gen_folder = 'results_ssd/{}/'.format(gen_model_version) \n",
|
||||
"gen_file_path = None\n",
|
||||
"\n",
|
||||
"gen_collections = ['saa01', 'saa05', 'saa08', 'saa10', 'saa13', 'saa16']\n",
|
||||
"gen_collections += ['train']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# config backbone architecture\n",
|
||||
"arch_opt = 1\n",
|
||||
"arch_type = 'mobile'\n",
|
||||
"width_mult = 0.625\n",
|
||||
"\n",
|
||||
"# config detector\n",
|
||||
"with_64 = False\n",
|
||||
"create_bg_class = False\n",
|
||||
"img_size = 512\n",
|
||||
"num_classes = 240\n",
|
||||
"\n",
|
||||
"# config schedule\n",
|
||||
"num_epochs = 11 \n",
"lr_milestones = [60]  # never reached with num_epochs = 11, so the learning rate stays constant"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# set log file name\n",
"if with_gen_data:\n",
"    version_remark = '{}_fpnssd_mobilenetv2_{}_gen_{}'\n",
"    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version, gen_model_version)\n",
"else:\n",
"    version_remark = '{}_fpnssd_mobilenetv2_{}'\n",
"    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Preparing Datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"if with_gen_data:\n",
"    from lib.utils.torchcv.box_coder_retina_lm import RetinaBoxCoder\n",
"    from lib.utils.torchcv.transforms_lm.resize import resize_lm\n",
"    from lib.utils.torchcv.transforms_lm.random_crop_tile import random_crop_tile_lm\n",
"    from lib.utils.torchcv.transforms_lm.pad_gs import pad_lm\n",
"else:\n",
"    from lib.utils.torchcv.box_coder_retina import RetinaBoxCoder\n",
"    from lib.utils.torchcv.transforms.resize import resize\n",
"    from lib.utils.torchcv.transforms.random_crop_tile import random_crop_tile\n",
"    from lib.utils.torchcv.transforms.pad_gs import pad"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"box_coder = RetinaBoxCoder(create_bg_class=create_bg_class)\n",
|
||||
"print('num_anchors', len(box_coder.anchor_boxes))\n",
"print('anchor side lengths (sqrt of areas)', np.sqrt(box_coder.anchor_areas))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"if with_gen_data:\n",
"    def transform_train(img, boxes, labels, linemap):\n",
"        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
"        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0),\n",
"                                       transforms.Lambda(lambda x: x)  # identity\n",
"                                       ])(img)\n",
"        img, linemap = pad_lm(img, linemap, (600, 600))\n",
"        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
"        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
"\n",
"        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
"else:\n",
"    def transform_train(img, boxes, labels):\n",
"        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
"        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0),\n",
"                                       transforms.Lambda(lambda x: x)  # identity\n",
"                                       ])(img)\n",
"        img = pad(img, (600, 600))\n",
"        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
"        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels)\n",
"        return img, boxes, labels"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"if with_gen_data:\n",
"    trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
"                            gen_file_path=gen_file_path, gen_collections=gen_collections, gen_folder=gen_folder,\n",
"                            relative_path=relative_path, use_balanced_idx=False, use_linemaps=True,\n",
"                            remove_empty_tiles=False, min_align_ratio=0.2)\n",
"else:\n",
"    trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
"                            gen_file_path=gen_file_path, relative_path=relative_path, use_linemaps=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"if with_gen_data:\n",
"    def transform_test(img, boxes, labels, linemap):\n",
"        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
"        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
"        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
"else:\n",
"    def transform_test(img, boxes, labels):\n",
"        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
"        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels)\n",
"        return img, boxes, labels"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"if with_gen_data:\n",
"    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
"                           gen_file_path=None, relative_path=relative_path, use_linemaps=True)\n",
"else:\n",
"    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
"                           gen_file_path=None, relative_path=relative_path, use_linemaps=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"trainloader = data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=3)\n",
|
||||
"testloader = data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Building Model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load FPN model from pretrained detector model\n",
"fpnssd_net = get_fpn_ssd_net(pretrained_model_version, device, arch_type, with_64, arch_opt, width_mult,\n",
"                             relative_path, num_classes, num_c=1)\n",
|
||||
"fpnssd_net.train()\n",
|
||||
"\n",
|
||||
"# print model\n",
|
||||
"print(fpnssd_net)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### Test net\n",
|
||||
"loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, img_size, img_size).to(device))\n",
|
||||
"print(loc_preds.size(), cls_preds.size())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Optimization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"criterion = SSDLoss(num_classes=num_classes)\n",
|
||||
"#criterion = FocalLoss(num_classes=num_classes)\n",
|
||||
"optimizer = optim.SGD(fpnssd_net.parameters(), lr=0.0001, momentum=0.9, weight_decay=1e-4)\n",
|
||||
"\n",
|
||||
"# lr policy\n",
|
||||
"# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.97)\n",
|
||||
"scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=lr_milestones, gamma=0.1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# init logger\n",
"if version_remark == '':\n",
"    comment_str = '_{}'.format(model_version)\n",
"else:\n",
"    comment_str = '_{}_{}'.format(model_version, version_remark)\n",
"writer = get_tensorboard_writer(logs_folder='{}results/run_logs/detector'.format(relative_path), comment=comment_str)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Training\n",
|
||||
"best_loss = float('inf') # best test loss\n",
|
||||
"best_epoch = 0\n",
|
||||
"best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def train(epoch):\n",
"    fpnssd_net.train()\n",
"    train_loss = 0\n",
"\n",
"    scheduler.step()  # per-epoch LR update; PyTorch >= 1.1 expects this after the epoch's optimizer.step() calls\n",
"\n",
"    if with_gen_data:\n",
"        # the dataset additionally yields a line map, which is not used in the loss\n",
"        for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(trainloader):\n",
"            inputs = inputs.to(device)\n",
"            loc_targets = loc_targets.to(device)\n",
"            cls_targets = cls_targets.to(device)\n",
"\n",
"            optimizer.zero_grad()\n",
"            loc_preds, cls_preds = fpnssd_net(inputs)\n",
"            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"            train_loss += loss.item()\n",
"            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
"    else:\n",
"        for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):\n",
"            inputs = inputs.to(device)\n",
"            loc_targets = loc_targets.to(device)\n",
"            cls_targets = cls_targets.to(device)\n",
"\n",
"            optimizer.zero_grad()\n",
"            loc_preds, cls_preds = fpnssd_net(inputs)\n",
"            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"            train_loss += loss.item()\n",
"            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
"\n",
"    # write to logger\n",
"    phase = 'train'\n",
"    writer.add_scalar('data/{}/loss'.format(phase), train_loss / len(trainloader), epoch)\n",
|
||||
"\n",
|
||||
"def test(epoch):\n",
"    fpnssd_net.eval()\n",
"    test_loss = 0\n",
"    with torch.no_grad():\n",
"\n",
"        if with_gen_data:\n",
"            for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(testloader):\n",
"                inputs = inputs.to(device)\n",
"                loc_targets = loc_targets.to(device)\n",
"                cls_targets = cls_targets.to(device)\n",
"\n",
"                loc_preds, cls_preds = fpnssd_net(inputs)\n",
"                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"                test_loss += loss.item()\n",
"                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
"        else:\n",
"            for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(testloader):\n",
"                inputs = inputs.to(device)\n",
"                loc_targets = loc_targets.to(device)\n",
"                cls_targets = cls_targets.to(device)\n",
"\n",
"                loc_preds, cls_preds = fpnssd_net(inputs)\n",
"                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"                test_loss += loss.item()\n",
"                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
"\n",
"    # write to logger\n",
"    phase = 'test'\n",
"    writer.add_scalar('data/{}/loss'.format(phase), test_loss / len(testloader), epoch)\n",
"\n",
"    # save the weights whenever the average test loss improves (only after epoch 5)\n",
"    global best_loss\n",
"    global best_epoch\n",
"    test_loss /= len(testloader)\n",
"    if test_loss < best_loss and epoch > 5:\n",
"        # best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
"        weights_path = '{}results/weights/fpn_net_{}_best.pth'.format(relative_path, model_version)\n",
"        torch.save(fpnssd_net.state_dict(), weights_path)\n",
"        best_epoch = epoch\n",
"        best_loss = test_loss"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"for epoch in tqdm(range(num_epochs)):\n",
"    print('\\nEpoch: %d' % epoch)\n",
"    train(epoch)\n",
"    if epoch % 2 == 0:\n",
"        print('\\nTest')\n",
"        test(epoch)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"print('Best test loss: {:.4f} at epoch {}'.format(best_loss, best_epoch))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# choose model filename\n",
|
||||
"weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)\n",
|
||||
"# Save only the model parameters\n",
|
||||
"torch.save(fpnssd_net.state_dict(), weights_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,907 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test and visualize sign detector"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from PIL import Image\n",
|
||||
"from ast import literal_eval\n",
|
||||
"\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"\n",
|
||||
"import torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# for auto-reloading external modules\n",
|
||||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"relative_path = '../../'\n",
|
||||
"# ensure that parent path is on the python path in order to have all packages available\n",
|
||||
"import sys, os\n",
|
||||
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
|
||||
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
|
||||
"sys.path.insert(0, parent_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/tobias/.virtualenvs/pytorch/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.\n",
|
||||
" utils.DeprecatedIn23,\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from lib.datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta\n",
|
||||
"from lib.models.trained_model_loader import get_fpn_ssd_net\n",
|
||||
"from lib.detection.run_gen_ssd_detection import gen_ssd_detections"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"collections = ['test'] # e.g. train test saa06\n",
|
||||
"only_annotated = True\n",
|
||||
"only_assigned = True\n",
|
||||
"\n",
|
||||
"# store detections for re-use\n",
|
||||
"save_detections = False\n",
|
||||
"\n",
|
||||
"# show detections\n",
|
||||
"show_detections = True"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_version = 'v191ft01' \n",
|
||||
"\n",
|
||||
"arch_type = 'mobile' # resnet, mobile\n",
|
||||
"arch_opt = 1\n",
|
||||
"width_mult = 0.625 # 0.5 0.625 0.75\n",
|
||||
"\n",
|
||||
"crop_shape = [600, 600]\n",
|
||||
"tile_shape = [600, 600]\n",
|
||||
"\n",
|
||||
"num_classes = 240\n",
|
||||
"\n",
|
||||
"with_64 = False \n",
|
||||
"create_bg_class = False \n",
|
||||
"with_4_aspects = False "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a id='complconf'>Config Completeness</a>\n",
|
||||
"\n",
|
||||
"[Jump to results](#results)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"test_min_score_thresh = 0.01 # 0.01 0.05\n",
|
||||
"test_nms_thresh = 0.5 \n",
|
||||
"\n",
|
||||
"eval_ovthresh = 0.5 # 0.4"
|
||||
]
|
||||
},
|
||||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"FPNSSD(\n",
|
||||
" (fpn): MobileNetV2FPN(\n",
|
||||
" (features): Sequential(\n",
|
||||
" (0): Sequential(\n",
|
||||
" (0): Conv2d(1, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" )\n",
|
||||
" (1): MobileBlock(\n",
|
||||
" (mobile_block): Sequential(\n",
|
||||
" (0): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(20, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(20, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=20, bias=False)\n",
|
||||
" (4): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (2): MobileBlock(\n",
|
||||
" (mobile_block): Sequential(\n",
|
||||
" (0): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(10, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(60, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=60, bias=False)\n",
|
||||
" (4): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(60, 15, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (1): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(15, 90, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(90, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=90, bias=False)\n",
|
||||
" (4): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(90, 15, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (3): MobileBlock(\n",
|
||||
" (mobile_block): Sequential(\n",
|
||||
" (0): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(15, 90, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(90, 90, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=90, bias=False)\n",
|
||||
" (4): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(90, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (1): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=120, bias=False)\n",
|
||||
" (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(120, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (2): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=120, bias=False)\n",
|
||||
" (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(120, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (4): MobileBlock(\n",
|
||||
" (mobile_block): Sequential(\n",
|
||||
" (0): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=120, bias=False)\n",
|
||||
" (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (1): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
|
||||
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (2): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
|
||||
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (3): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
|
||||
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (5): MobileBlock(\n",
|
||||
" (mobile_block): Sequential(\n",
|
||||
" (0): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
|
||||
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(240, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (1): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
|
||||
" (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(360, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (2): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
|
||||
" (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(360, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (6): MobileBlock(\n",
|
||||
" (mobile_block): Sequential(\n",
|
||||
" (0): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
|
||||
" (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(360, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (1): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(100, 600, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(600, 600, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=600, bias=False)\n",
|
||||
" (4): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(600, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (2): InvertedResidual(\n",
|
||||
" (conv): Sequential(\n",
|
||||
" (0): Conv2d(100, 600, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" (3): Conv2d(600, 600, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=600, bias=False)\n",
|
||||
" (4): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (5): ReLU6(inplace)\n",
|
||||
" (6): Conv2d(600, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" (7): Sequential(\n",
|
||||
" (0): Conv2d(100, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
|
||||
" (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
|
||||
" (2): ReLU6(inplace)\n",
|
||||
" )\n",
|
||||
" (8): AvgPool2d(kernel_size=7, stride=1, padding=0)\n",
|
||||
" )\n",
|
||||
" (conv6): Conv2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
|
||||
" (toplayer): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))\n",
|
||||
" )\n",
|
||||
" (loc_head): Sequential(\n",
|
||||
" (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (1): ReLU(inplace)\n",
|
||||
" (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (3): ReLU(inplace)\n",
|
||||
" (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (5): ReLU(inplace)\n",
|
||||
" (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (7): ReLU(inplace)\n",
|
||||
" (8): Conv2d(256, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" )\n",
|
||||
" (cls_head): Sequential(\n",
|
||||
" (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (1): ReLU(inplace)\n",
|
||||
" (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (3): ReLU(inplace)\n",
|
||||
" (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (5): ReLU(inplace)\n",
|
||||
" (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" (7): ReLU(inplace)\n",
|
||||
" (8): Conv2d(256, 2880, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
|
||||
" )\n",
|
||||
")\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"fpnssd_net = get_fpn_ssd_net(model_version, device, arch_type, with_64, arch_opt, width_mult, \n",
|
||||
" relative_path, num_classes, num_c=1, rnd_init_model=False)\n",
|
||||
"\n",
|
||||
"print(fpnssd_net)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"(torch.Size([1, 15360, 4]), torch.Size([1, 15360, 240]))\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"### Test net\n",
|
||||
"loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, 1024, 1024).to(device))\n",
|
||||
"print(loc_preds.size(), cls_preds.size())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### Prepare dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Setup dataset spanning 3 collections with 4465 annotations [67 segments, 67 indices]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"dataset = CuneiformSegments(collections=collections, relative_path=relative_path, \n",
|
||||
" only_annotated=only_annotated, only_assigned=only_assigned, preload_segments=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### Predict"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"train: 5%|▌ | 1/19 [00:00<00:15, 1.20it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334921', 'Obv')\n",
|
||||
"mAP 0.7926 | global AP: 0.7473 | mAP (align): 0.8859\n",
|
||||
"total_tp: 22 | total_fp: 17 [46] | acc: 0.56\n",
|
||||
"('P334921', 'Rev')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 11%|█ | 2/19 [00:01<00:08, 1.98it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.7143 | global AP: 0.7286 | mAP (align): 1.0\n",
|
||||
"total_tp: 7 | total_fp: 1 [9] | acc: 0.88\n",
|
||||
"('P334863', 'Obv')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/tobias/Dropbox/NeuralNets/caffe_workspace/pycaffe/cuneiform-sign-detection/lib/evaluations/sign_evaluator.py:184: RuntimeWarning: invalid value encountered in divide\n",
|
||||
" return num_tp, num_fp, num_fp_global, num_tp / float(num_tp + num_fp)\n",
|
||||
"\r",
|
||||
"train: 16%|█▌ | 3/19 [00:01<00:06, 2.43it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.0 | global AP: 0.0 | mAP (align): nan\n",
|
||||
"total_tp: 0 | total_fp: 0 [2] | acc: nan\n",
|
||||
"('P334831', 'Rev')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 21%|██ | 4/19 [00:01<00:05, 2.57it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.8409 | global AP: 0.7323 | mAP (align): 0.881\n",
|
||||
"total_tp: 27 | total_fp: 38 [74] | acc: 0.42\n",
|
||||
"('P334831', 'Obv')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 26%|██▋ | 5/19 [00:01<00:05, 2.60it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.9274 | global AP: 0.8946 | mAP (align): 0.9518\n",
|
||||
"total_tp: 59 | total_fp: 55 [97] | acc: 0.52\n",
|
||||
"('P334892', 'Rev')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 32%|███▏ | 6/19 [00:02<00:04, 2.73it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.9236 | global AP: 0.8063 | mAP (align): 0.9236\n",
|
||||
"total_tp: 18 | total_fp: 10 [28] | acc: 0.64\n",
|
||||
"('P334892', 'Obv')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 37%|███▋ | 7/19 [00:02<00:04, 2.91it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.9667 | global AP: 0.9474 | mAP (align): 0.9667\n",
|
||||
"total_tp: 18 | total_fp: 8 [27] | acc: 0.69\n",
|
||||
"('P336635', 'Obv')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 42%|████▏ | 8/19 [00:02<00:03, 2.94it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.9061 | global AP: 0.7393 | mAP (align): 0.9061\n",
|
||||
"total_tp: 13 | total_fp: 8 [35] | acc: 0.62\n",
|
||||
"('P334865', 'Obv')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 47%|████▋ | 9/19 [00:03<00:03, 2.93it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.9157 | global AP: 0.887 | mAP (align): 0.9443\n",
|
||||
"total_tp: 40 | total_fp: 34 [52] | acc: 0.54\n",
|
||||
"('P334865', 'Rev')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"train: 58%|█████▊ | 11/19 [00:03<00:02, 3.14it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"mAP 0.8139 | global AP: 0.8114 | mAP (align): 0.8879\n",
|
||||
"total_tp: 49 | total_fp: 47 [71] | acc: 0.51\n",
|
||||
"('P334842', 'Obv')\n",
|
||||
"mAP 0.0 | global AP: 0.0 | mAP (align): 0.0\n",
|
||||
"total_tp: 0 | total_fp: 0 [0] | acc: 0.0\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 63%|██████▎ | 12/19 [00:03<00:02, 3.16it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334848', 'Obv')\n",
|
||||
"mAP 0.9074 | global AP: 0.9346 | mAP (align): 0.9074\n",
|
||||
"total_tp: 22 | total_fp: 14 [25] | acc: 0.61\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 68%|██████▊ | 13/19 [00:04<00:01, 3.23it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334848', 'Rev')\n",
|
||||
"mAP 0.9375 | global AP: 0.8014 | mAP (align): 1.0\n",
|
||||
"total_tp: 16 | total_fp: 4 [38] | acc: 0.8\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 74%|███████▎ | 14/19 [00:04<00:01, 3.23it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334839', 'Obv')\n",
|
||||
"mAP 0.9333 | global AP: 0.8562 | mAP (align): 0.9956\n",
|
||||
"total_tp: 22 | total_fp: 17 [52] | acc: 0.56\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 79%|███████▉ | 15/19 [00:04<00:01, 3.24it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334896', 'Obv')\n",
|
||||
"mAP 0.9375 | global AP: 0.8586 | mAP (align): 0.9375\n",
|
||||
"total_tp: 26 | total_fp: 20 [36] | acc: 0.57\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 84%|████████▍ | 16/19 [00:04<00:00, 3.21it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334836', 'Rev')\n",
|
||||
"mAP 1.0 | global AP: 0.9705 | mAP (align): 1.0\n",
|
||||
"total_tp: 45 | total_fp: 24 [65] | acc: 0.65\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 89%|████████▉ | 17/19 [00:05<00:00, 3.17it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334894', 'Rev')\n",
|
||||
"mAP 1.0 | global AP: 0.8218 | mAP (align): 1.0\n",
|
||||
"total_tp: 15 | total_fp: 9 [39] | acc: 0.62\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 95%|█████████▍| 18/19 [00:05<00:00, 3.14it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P334894', 'Obv')\n",
|
||||
"mAP 0.8727 | global AP: 0.8457 | mAP (align): 0.8727\n",
|
||||
"total_tp: 41 | total_fp: 39 [62] | acc: 0.51\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\r",
|
||||
"train: 100%|██████████| 19/19 [00:06<00:00, 3.14it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('P336178', 'Obv')\n",
|
||||
"mAP 0.9375 | global AP: 0.8319 | mAP (align): 0.9375\n",
|
||||
"total_tp: 31 | total_fp: 20 [34] | acc: 0.61\n",
|
||||
"train | v191ft01\n",
|
||||
"RESULTS ON FULL COLLECTION :\n",
|
||||
"mAP 0.7739 | global AP: 0.7816 | mAP (align): 0.7958\n",
|
||||
"total_tp: 471 | total_fp: 690 [792] | prec: 0.406\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# filter collection dataset - OPTIONAL\n",
|
||||
"didx_list = range(len(dataset))\n",
|
||||
"didx_list = didx_list[:19] #19\n",
|
||||
"\n",
|
||||
"### Generate ssd detections\n",
|
||||
"(list_seg_ap, \n",
|
||||
" list_seg_name_with_anno) = gen_ssd_detections(didx_list, dataset, collections[0], relative_path, \n",
|
||||
" model_version, fpnssd_net, with_64, create_bg_class, device,\n",
|
||||
" test_min_score_thresh, test_nms_thresh, eval_ovthresh,\n",
|
||||
" save_detections, show_detections, with_4_aspects=with_4_aspects, \n",
|
||||
" verbose_mode=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a id='results'>Results</a>\n",
|
||||
"\n",
|
||||
"[Jump to completeness config](#complconf)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"[Jump to Results](#results)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
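The head layers and the test output printed in the notebook above fit together arithmetically, and the sketch below checks this. The 12 anchors per location follow from the printed channel counts (48 = 12 × 4 for `loc_head`, 2880 = 12 × 240 for `cls_head`). The feature-map strides of 32 and 64 are an assumption, chosen because they reproduce the printed 15360 anchors for a 1024 × 1024 input; the notebook does not state them. The helper names are hypothetical.

```python
# Sanity check of the detector output shapes printed in the notebook.
# The strides (32, 64) are an assumption that reproduces the printed
# anchor count; they are not read from the repository code.

def anchors_per_location(head_channels, values_per_anchor):
    """Number of anchors encoded in the last conv layer of a head."""
    assert head_channels % values_per_anchor == 0
    return head_channels // values_per_anchor


def total_anchors(img_size, strides, num_anchors):
    """Sum of anchors over square feature maps of size img_size // stride."""
    return sum((img_size // s) ** 2 * num_anchors for s in strides)


num_classes = 240
loc_anchors = anchors_per_location(48, 4)              # loc_head ends in 48 channels
cls_anchors = anchors_per_location(2880, num_classes)  # cls_head ends in 2880 channels
n_total = total_anchors(1024, strides=(32, 64), num_anchors=loc_anchors)

print(loc_anchors, cls_anchors, n_total)
```

If the two heads reported different anchor counts, or the total did not match the second dimension of `loc_preds`, the box coder and the network would disagree about the anchor layout.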
|
||||
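The `RuntimeWarning: invalid value encountered in divide` in the log above is raised by `num_tp / float(num_tp + num_fp)` for a segment with no detections, which is why that segment prints `acc: nan`. Below is a minimal sketch of a guarded precision computation. The function name `safe_precision` and the choice of returning 0.0 for an empty segment are assumptions for illustration, not the repository's code.

```python
def safe_precision(num_tp, num_fp, empty_value=0.0):
    """Precision tp / (tp + fp); returns empty_value when there are no detections."""
    total = num_tp + num_fp
    if total == 0:
        # No detections at all: avoid 0/0 instead of producing nan.
        return empty_value
    return num_tp / float(total)


# Counts taken from the log above.
print(round(safe_precision(22, 17), 2))    # first segment in the log
print(round(safe_precision(471, 690), 3))  # full collection
print(safe_precision(0, 0))                # segment without detections
```

Whether an empty segment should count as 0.0, be skipped, or stay `nan` depends on how the per-segment values are averaged afterwards, so `empty_value` is left as a parameter.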
File diff suppressed because one or more lines are too long
@@ -0,0 +1,634 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train sign detector network"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import pandas as pd\n",
|
||||
"from PIL import Image\n",
|
||||
"from ast import literal_eval\n",
|
||||
"import os.path\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"import copy\n",
|
||||
"\n",
|
||||
"import torch\n",
|
||||
"import torch.optim as optim\n",
|
||||
"from torch.optim import lr_scheduler\n",
|
||||
"import torch.utils.data as data\n",
|
||||
"\n",
|
||||
"from torchvision import transforms as trafos\n",
|
||||
"import torchvision.transforms as transforms"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# for auto-reloading external modules\n",
|
||||
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"relative_path = '../../'\n",
|
||||
"# ensure that parent path is on the python path in order to have all packages available\n",
|
||||
"import sys, os\n",
|
||||
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
|
||||
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
|
||||
"sys.path.insert(0, parent_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from lib.datasets.cunei_dataset_ssd import CuneiformSSD\n",
|
||||
"\n",
|
||||
"from lib.alignment.LineFragment import plot_boxes\n",
|
||||
"from lib.utils.pytorch_utils import get_tensorboard_writer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from lib.models.mobilenetv2_mod03 import MobileNetV2\n",
|
||||
"from lib.models.mobilenetv2_fpn import MobileNetV2FPN\n",
|
||||
"from lib.models.trained_model_loader import get_fpn_ssd_net\n",
|
||||
"from lib.utils.torchcv.models.net import FPNSSD\n",
|
||||
"from lib.utils.torchcv.loss.ssd_loss import SSDLoss"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
|
||||
"hh = 0.001  # optional start delay in hours; the loop below sleeps in 10 s steps\n",
|
||||
"## time.sleep(60*60*hh)\n",
|
||||
"for i in tqdm(range(int(6*60*hh))):\n",
|
||||
" time.sleep(10)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Config Basics"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_version = 'v001'\n",
|
||||
"\n",
|
||||
"# config pretrained classifier\n",
|
||||
"pretrained_model_version = 'v001' #'v239' \n",
|
||||
"\n",
|
||||
"# config datasets for training and testing\n",
|
||||
"train_collections = ['train_E'] \n",
|
||||
"test_collections = ['test_full']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# config generated data\n",
|
||||
"with_gen_data = True\n",
|
||||
"\n",
|
||||
"gen_model_version = 'v001' #'v171_hp04' \n",
|
||||
"\n",
|
||||
"gen_folder = 'results_ssd/{}/'.format(gen_model_version) \n",
|
||||
"gen_file_path = None\n",
|
||||
"\n",
|
||||
"gen_collections = ['saa01', 'saa05', 'saa08', 'saa10', 'saa13', 'saa16']\n",
|
||||
"#gen_collections = ['saa01', 'saa05']\n",
|
||||
"gen_collections += ['train']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# config backbone architecture\n",
|
||||
"arch_opt = 1\n",
|
||||
"arch_type = 'mobile'\n",
|
||||
"width_mult = 0.625\n",
|
||||
"\n",
|
||||
"# config detector\n",
|
||||
"with_64 = False\n",
|
||||
"create_bg_class = False\n",
|
||||
"img_size = 512\n",
|
||||
"num_classes = 240\n",
|
||||
"\n",
|
||||
"# config schedule\n",
|
||||
"num_epochs = 51 \n",
|
||||
"lr_milestones = [60]  # beyond num_epochs = 51: the lr is never decayed if the scheduler steps once per epoch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# set log file name\n",
|
||||
"if with_gen_data:\n",
|
||||
" version_remark = '{}_fpnssd_mobilenetv2_{}_gen_{}'\n",
|
||||
" version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version, gen_model_version)\n",
|
||||
"else:\n",
|
||||
" version_remark = '{}_fpnssd_mobilenetv2_{}'\n",
|
||||
" version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Preparing Datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"if with_gen_data:\n",
|
||||
" from lib.utils.torchcv.box_coder_retina_lm import RetinaBoxCoder\n",
|
||||
" from lib.utils.torchcv.transforms_lm.resize import resize_lm\n",
|
||||
" from lib.utils.torchcv.transforms_lm.random_crop_tile import random_crop_tile_lm\n",
|
||||
" from lib.utils.torchcv.transforms_lm.pad_gs import pad_lm\n",
|
||||
"else:\n",
|
||||
" from lib.utils.torchcv.box_coder_retina import RetinaBoxCoder\n",
|
||||
" from lib.utils.torchcv.transforms.resize import resize\n",
|
||||
" from lib.utils.torchcv.transforms.random_crop_tile import random_crop_tile\n",
|
||||
" from lib.utils.torchcv.transforms.pad_gs import pad"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"box_coder = RetinaBoxCoder(create_bg_class=create_bg_class)\n",
|
||||
"print('num_anchors', len(box_coder.anchor_boxes))\n",
|
||||
"print('anchor sizes (sqrt of areas)', np.sqrt(box_coder.anchor_areas))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"if with_gen_data: \n",
|
||||
" def transform_train(img, boxes, labels, linemap):\n",
|
||||
" # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
|
||||
" img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
|
||||
" transforms.Lambda(lambda x: x) # identity\n",
|
||||
" ])(img) \n",
|
||||
" img, linemap = pad_lm(img, linemap, (600, 600))\n",
|
||||
" img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
|
||||
" img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
|
||||
" img = transforms.Compose([\n",
|
||||
" transforms.ToTensor(),\n",
|
||||
" transforms.Normalize(mean=[0.5], std=[1.0])\n",
|
||||
" ])(img)\n",
|
||||
" boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
|
||||
"\n",
|
||||
" return img, boxes, labels, transforms.ToTensor()(linemap)\n",
|
||||
"else:\n",
|
||||
" def transform_train(img, boxes, labels):\n",
|
||||
" # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
|
||||
" img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
|
||||
" transforms.Lambda(lambda x: x) # identity\n",
|
||||
" ])(img) \n",
|
||||
" img = pad(img, (600, 600))\n",
|
||||
" img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
|
||||
" img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
|
||||
" img = transforms.Compose([\n",
|
||||
" transforms.ToTensor(),\n",
|
||||
" transforms.Normalize(mean=[0.5], std=[1.0])\n",
|
||||
" ])(img)\n",
|
||||
" boxes, labels = box_coder.encode(boxes, labels)\n",
|
||||
" return img, boxes, labels"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"if with_gen_data:\n",
|
||||
" trainset = CuneiformSSD(collections=train_collections, transform=transform_train, \n",
|
||||
" gen_file_path=gen_file_path, gen_collections=gen_collections, gen_folder=gen_folder, \n",
|
||||
" relative_path=relative_path, use_balanced_idx=False, use_linemaps=True, \n",
|
||||
" remove_empty_tiles=False, min_align_ratio=0.2)\n",
|
||||
"else:\n",
|
||||
" trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
|
||||
" gen_file_path=gen_file_path, relative_path=relative_path, use_linemaps=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"if with_gen_data:\n",
|
||||
" def transform_test(img, boxes, labels, linemap):\n",
|
||||
" img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
|
||||
" img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
|
||||
" img = transforms.Compose([\n",
|
||||
" transforms.ToTensor(),\n",
|
||||
" transforms.Normalize(mean=[0.5],std=[1.0])\n",
|
||||
" ])(img)\n",
|
||||
" boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
|
||||
" return img, boxes, labels, transforms.ToTensor()(linemap)\n",
|
||||
"else:\n",
|
||||
" def transform_test(img, boxes, labels):\n",
|
||||
" img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
|
||||
" img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
|
||||
" img = transforms.Compose([\n",
|
||||
" transforms.ToTensor(),\n",
|
||||
" transforms.Normalize(mean=[0.5],std=[1.0])\n",
|
||||
" ])(img)\n",
|
||||
" boxes, labels = box_coder.encode(boxes, labels)\n",
|
||||
" return img, boxes, labels"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"if with_gen_data:\n",
|
||||
" testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
|
||||
" gen_file_path=None, relative_path=relative_path, use_linemaps=True)\n",
|
||||
"else:\n",
|
||||
" testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
|
||||
" gen_file_path=None, relative_path=relative_path, use_linemaps=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"trainloader = data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=3)\n",
|
||||
"testloader = data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Building Model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load classifier model\n",
|
||||
"basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=1, arch_opt=arch_opt)\n",
|
||||
"\n",
|
||||
"# load pretrained weights\n",
|
||||
"weights_path = '{}results/weights/cuneiNet_basic_{}.pth'.format(relative_path, pretrained_model_version)\n",
|
||||
"basic_net.load_state_dict(torch.load(weights_path)) # , strict=False\n",
|
||||
"basic_net = basic_net.to(device)\n",
|
||||
"\n",
|
||||
"# load FPN model with classifier model\n",
|
||||
"fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64).to(device)\n",
|
||||
"\n",
|
||||
"# load full detector net\n",
|
||||
"fpnssd_net = FPNSSD(fpn_net, num_classes=num_classes).to(device)\n",
|
||||
"fpnssd_net.train()\n",
|
||||
"\n",
|
||||
"# print model\n",
|
||||
"print(fpnssd_net)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### Test net\n",
|
||||
"loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, img_size, img_size).to(device))\n",
|
||||
"print(loc_preds.size(), cls_preds.size())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Optimization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"criterion = SSDLoss(num_classes=num_classes)\n",
|
||||
"#criterion = FocalLoss(num_classes=num_classes)\n",
|
||||
"optimizer = optim.SGD(fpnssd_net.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)\n",
|
||||
"\n",
|
||||
"# lr policy\n",
|
||||
"# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.97)\n",
|
||||
"scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=lr_milestones, gamma=0.1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# init logger\n",
|
||||
"if version_remark == '':\n",
|
||||
" comment_str = '_{}'.format(model_version)\n",
|
||||
"else:\n",
|
||||
" comment_str = '_{}_{}'.format(model_version, version_remark)\n",
|
||||
"writer = get_tensorboard_writer(logs_folder='{}results/run_logs/detector'.format(relative_path), comment=comment_str)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Training\n",
|
||||
"best_loss = float('inf') # best test loss\n",
|
||||
"best_epoch = 0\n",
|
||||
"best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def train(epoch):\n",
|
||||
" fpnssd_net.train()\n",
|
||||
" train_loss = 0\n",
|
||||
"\n",
|
||||
" scheduler.step()\n",
|
||||
"\n",
|
||||
" if with_gen_data:\n",
|
||||
" for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(trainloader):\n",
|
||||
" inputs = inputs.to(device)\n",
|
||||
" loc_targets = loc_targets.to(device)\n",
|
||||
" cls_targets = cls_targets.to(device)\n",
|
||||
"\n",
|
||||
" optimizer.zero_grad()\n",
|
||||
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
|
||||
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
|
||||
" loss.backward()\n",
|
||||
" optimizer.step()\n",
|
||||
"\n",
|
||||
" train_loss += loss.item()\n",
|
||||
" print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
|
||||
" % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
|
||||
" else:\n",
|
||||
" for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):\n",
|
||||
" inputs = inputs.to(device)\n",
|
||||
" loc_targets = loc_targets.to(device)\n",
|
||||
" cls_targets = cls_targets.to(device)\n",
|
||||
"\n",
|
||||
" optimizer.zero_grad()\n",
|
||||
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
|
||||
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
|
||||
" loss.backward()\n",
|
||||
" optimizer.step()\n",
|
||||
"\n",
|
||||
" train_loss += loss.item()\n",
|
||||
" print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
|
||||
" % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
|
||||
"\n",
|
||||
" # write to logger\n",
|
||||
" phase = 'train'\n",
|
||||
" writer.add_scalar('data/{}/loss'.format(phase), train_loss / len(trainloader), epoch)\n",
|
||||
"\n",
|
||||
"def test(epoch):\n",
|
||||
" fpnssd_net.eval()\n",
|
||||
" test_loss = 0\n",
|
||||
" with torch.no_grad():\n",
|
||||
"\n",
|
||||
" if with_gen_data:\n",
|
||||
" for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(testloader):\n",
|
||||
" inputs = inputs.to(device)\n",
|
||||
" loc_targets = loc_targets.to(device)\n",
|
||||
" cls_targets = cls_targets.to(device)\n",
|
||||
"\n",
|
||||
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
|
||||
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
|
||||
" test_loss += loss.item()\n",
|
||||
" print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
|
||||
" % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
|
||||
" else:\n",
|
||||
" for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(testloader):\n",
|
||||
" inputs = inputs.to(device)\n",
|
||||
" loc_targets = loc_targets.to(device)\n",
|
||||
" cls_targets = cls_targets.to(device)\n",
|
||||
"\n",
|
||||
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
|
||||
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
|
||||
" test_loss += loss.item()\n",
|
||||
" print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
|
||||
" % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
|
||||
"\n",
|
||||
" # write to logger\n",
|
||||
" phase = 'test'\n",
|
||||
" writer.add_scalar('data/{}/loss'.format(phase), test_loss / len(testloader), epoch)\n",
|
||||
"\n",
|
||||
" # deep copy the model\n",
|
||||
" global best_loss\n",
|
||||
" global best_epoch\n",
|
||||
" test_loss /= len(testloader)\n",
|
||||
" if test_loss < best_loss and epoch > 5:\n",
|
||||
" # best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
|
||||
" weights_path = '{}results/weights/fpn_net_{}_best.pth'.format(relative_path, model_version)\n",
|
||||
" torch.save(fpnssd_net.state_dict(), weights_path)\n",
|
||||
" best_epoch = epoch\n",
|
||||
" best_loss = test_loss"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for epoch in tqdm(range(num_epochs)):\n",
|
||||
" print('\\nEpoch: %d' % epoch)\n",
|
||||
" train(epoch)\n",
|
||||
" if epoch % 2 == 0:\n",
|
||||
" print('\\nTest')\n",
|
||||
" test(epoch)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('Best val Loss: {:4f} at {}'.format(best_loss, best_epoch))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# choose model filename\n",
|
||||
"weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)\n",
|
||||
"# Save only the model parameters\n",
|
||||
"torch.save(fpnssd_net.state_dict(), weights_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,76 @@
|
||||
### Compile and install OpenGM with python wrapper in virtualenv
|
||||
|
||||
#### Online references
|
||||
- conda install guide:
|
||||
- https://groups.google.com/forum/#!searchin/opengm/nose%7Csort:relevance/opengm/Nte5Zpu9RL0/YSanK09kNwAJ
|
||||
- plain ubuntu install guides:
|
||||
- http://cvlab-dresden.de/HTML/people/bogdan/teaching/slides-script/ml2-ss15/installation-readme.txt
|
||||
- https://memoryaux.wordpress.com/2014/08/15/installing-opengm-with-python-wrapper/
|
||||
|
||||
#### Instructions (tested for Ubuntu 14.04)
|
||||
|
||||
clone source using
|
||||
`git clone https://github.com/opengm/opengm.git`
|
||||
|
||||
make build dir under opengm/
|
||||
|
||||
`mkdir build/`
|
||||
|
||||
and enter build/
|
||||
|
||||
`cd build/`
|
||||
|
||||
run ccmake and configure by pressing 'c'
|
||||
|
||||
`ccmake ../`
|
||||
|
||||
run ccmake again and select options
|
||||
|
||||
`ccmake ../`
|
||||
|
||||
|
||||
build:
|
||||
- command line ?
|
||||
- converter ?
|
||||
- docs ?
|
||||
- examples ? (requires external lib like cplex)
|
||||
- python docs ? (requires pip install sphinx and produces ugly outputs)
|
||||
- python wrapper
|
||||
- testing
|
||||
- tutorials
|
||||
|
||||
with:
|
||||
|
||||
- boost
|
||||
- hdf5
|
||||
|
||||
python:
|
||||
|
||||
- python executable: /home/USER/.virtualenvs/VNAME/bin
|
||||
- include dir: /home/USER/.virtualenvs/VNAME/include
|
||||
- include dir2: /home/USER/.virtualenvs/VNAME/include/python2.7
|
||||
- library: /usr/lib/x86_64-linux-gnu/libpython2.7.so
|
||||
(alternative is /home/USER/.virtualenvs/VNAME/lib/python2.7, but no *.so file here)
|
||||
- library debug: PYTHON_LIBRARY_DEBUG-NOTFOUND (default)
|
||||
- numpy include directory: /home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/numpy/core/include
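
The values for the Python fields above can usually be read from the interpreter itself. The following is a minimal sketch, assuming it is run with the virtualenv's interpreter activated; the helper name `python_build_paths` is hypothetical and not part of OpenGM, and the reported library directory may still differ from the system-wide `libpython2.7.so` location listed above.

```python
import sys
import sysconfig


def python_build_paths():
    """Collect interpreter paths that correspond to the ccmake Python fields."""
    paths = {
        "executable": sys.executable,
        "include_dir": sysconfig.get_paths()["include"],
        # LIBDIR can be None on some platforms
        "library_dir": sysconfig.get_config_var("LIBDIR"),
    }
    try:
        import numpy
        paths["numpy_include_dir"] = numpy.get_include()
    except ImportError:
        # numpy is not installed in this environment
        paths["numpy_include_dir"] = None
    return paths


if __name__ == "__main__":
    for key, value in sorted(python_build_paths().items()):
        print("{}: {}".format(key, value))
```

The printed values can then be typed into the matching ccmake fields by hand.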
|
||||
|
||||
*for some unknown reason* opengm python site-package is installed under `/usr/local/lib/python0./`
|
||||
therefore, better to skip make install and simply copy files by hand (see below)
|
||||
|
||||
|
||||
To build run (-j only if multicore system):
|
||||
|
||||
```
|
||||
make -j4
|
||||
make -j2 test
|
||||
make install
|
||||
|
||||
```
|
||||
|
||||
|
||||
simply copy it to `/home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/`
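
The manual copy can also be scripted. This is only a sketch under stated assumptions: `copy_package` is a hypothetical helper, and both the location of the built `opengm` package directory and the target site-packages path have to be supplied by the user.

```python
import os
import shutil


def copy_package(built_pkg_dir, site_packages_dir):
    """Copy a built package directory into site-packages and return the target path.

    shutil.copytree raises an error if the target directory already exists,
    so an older copy has to be removed by hand first.
    """
    pkg_name = os.path.basename(os.path.normpath(built_pkg_dir))
    target = os.path.join(site_packages_dir, pkg_name)
    shutil.copytree(built_pkg_dir, target)
    return target
```

Calling `copy_package('<path to built opengm package>', '/home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/')` would place the package next to the other installed modules.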
|
||||
|
||||
now test in python:
|
||||
`import opengm`
|
||||
|
||||
Hopefully things work :)
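
A slightly more defensive check than a bare `import opengm` is sketched below. It assumes nothing about OpenGM beyond the module name; `module_available` is a hypothetical helper that only reports whether the import succeeds.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if module_available("opengm"):
        print("opengm import succeeded")
    else:
        print("opengm not found; check that the package was copied into site-packages")
```

If the check fails inside the virtualenv, the copy step and the Python paths chosen in ccmake are the first things to revisit.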
|
||||
@@ -0,0 +1,528 @@
|
||||
from scipy.spatial.distance import cdist, seuclidean, euclidean, squareform
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from timeit import default_timer as timer
|
||||
|
||||
import opengm
|
||||
|
||||
from ..utils.bbox_utils import bb_intersection_over_union
|
||||
|
||||
|
||||
class LineMatching1D(object):
|
||||
|
||||
def __init__(self, tl_line_rec, region_det, line_rec, line_pts, stats, scale=1.0, sign_hypos=None, param_dict=None):
|
||||
# create graphical model from fragment
|
||||
self.stats = stats
|
||||
self.scale = scale
|
||||
self.scaled_sign_height = stats.tblSignHeight * scale
|
||||
self.min_sign_dist = self.scaled_sign_height / 2. # distance between sign centers
|
||||
|
||||
self.tl_line_rec = tl_line_rec
|
||||
|
||||
# null hypothesis for signs in tl
|
||||
self.sign_hypos = sign_hypos
|
||||
|
||||
# detections contained in rectangular area around respective alignments
|
||||
# [ID, cx, cy, score, x1, y1, x2, y2, idx]
|
||||
self.region_det = region_det
|
||||
|
||||
# init
|
||||
self.num_vars = len(self.tl_line_rec)
|
||||
self.num_relevant = 0
|
||||
self.max_cost = 1e10 # 1e11 # "infinite" cost
|
||||
|
||||
# only continue, if there is a sign in line to match
|
||||
if self.num_vars > 0:
|
||||
|
||||
# compute num_lbls_per_var from detections
|
||||
ulbls, counts = np.unique(self.region_det[:, 0], return_counts=True)
|
||||
hypo_det_counts = np.array([counts[ulbls == item] if item in ulbls else 0 for item in self.tl_line_rec.lbl],
|
||||
dtype=int).squeeze()
|
||||
|
||||
self.tl_line_rec['det_count'] = hypo_det_counts
|
||||
# optional: remove vars without detections
|
||||
if False:
|
||||
# self.tl_line_rec = self.tl_line_rec[hypo_det_counts > 0]
|
||||
self.tl_line_rec = self.tl_line_rec.iloc[np.where(hypo_det_counts > 0)] # deal with scalar case of boolean indexing
|
||||
# hypo_det_counts = hypo_det_counts[hypo_det_counts > 0]
|
||||
|
||||
# only continue if there is at least a single matching detection
|
||||
self.num_relevant = np.sum(counts[np.isin(ulbls, self.tl_line_rec.lbl)])
|
||||
if self.num_relevant > 0:
|
||||
# update num_vars
|
||||
self.num_vars = len(self.tl_line_rec)
|
||||
# opengm setup
|
||||
self.num_lbls_per_var = max(counts[np.isin(ulbls, self.tl_line_rec.lbl)]) + 1 # + 1 outlier detection
|
||||
var_space = np.ones(self.num_vars) * self.num_lbls_per_var
|
||||
self.gm = opengm.gm(var_space)
|
||||
|
||||
# parameter setup
|
||||
if param_dict is not None:
|
||||
self.params = param_dict
|
||||
else:
|
||||
self.params = dict()
|
||||
|
||||
# extra settings
|
||||
self.params['outlier_cost'] = 10
|
||||
self.params['angle_long_range'] = True
|
||||
|
||||
# unary potentials
|
||||
self.params['lambda_score'] = 0.3
|
||||
self.params['sigma_score'] = 0.4
|
||||
|
||||
self.params['lambda_offset'] = 1 # currently offset used linearly without exp function
|
||||
self.params['sigma_offset'] = 1 # lambda & sigma have no influence!
|
||||
|
||||
# pairwise binary potentials
|
||||
self.params['lambda_p'] = 3 # 1
|
||||
self.params['sigma_p'] = 3
|
||||
|
||||
self.params['lambda_angle'] = 2
|
||||
self.params['sigma_angle'] = 0.6
|
||||
|
||||
self.params['lambda_iou'] = 2
|
||||
self.params['sigma_iou'] = 0.4
|
||||
|
||||
# OPTIONAL: strong penalties for long range connections
|
||||
if True:
|
||||
self.params['lr_lambda_angle'] = 0.05
|
||||
self.params['lr_sigma_angle'] = 0.1
|
||||
|
||||
self.params['lr_lambda_iou'] = 0.1
|
||||
self.params['lr_sigma_iou'] = 0.05
|
||||
else:
|
||||
self.params['lr_lambda_angle'] = self.params['lambda_angle']
|
||||
self.params['lr_sigma_angle'] = self.params['sigma_angle']
|
||||
|
||||
self.params['lr_lambda_iou'] = self.params['lambda_iou']
|
||||
self.params['lr_sigma_iou'] = self.params['sigma_iou']
|
||||
|
||||
# angle of hypothesis line
|
||||
self.b = line_pts[-1, :] - line_pts[0, :]
|
||||
# print 'hypo angle:', np.arctan2(self.b[1], self.b[0]) * (180 / np.pi), self.b
|
||||
|
||||
# offset
|
||||
self.Xb = line_pts[0, :].reshape(1, -1)
|
||||
|
||||
# define variance between line distance and sign distance - for seuclidean and mahalanobis
|
||||
self.variance_p = np.array([1, 0.2], dtype=float) # [8, 1] [1, 1]
|
||||
|
||||
if False:
|
||||
print('#syms:', len(self.tl_line_rec), 'max#dets_per_sym:', self.num_lbls_per_var - 1,
|
||||
'relevant#dets:', self.num_relevant, 'total#dets:', self.region_det.shape[0])
|
||||
|
||||
# print(np.vstack([self.fm_hypo_df.lbl, hypo_det_counts])).astype(int)
|
||||
# print self.fm_hypo_df
|
||||
|
||||
# assemble potentials
|
||||
self.add_unary()
|
||||
self.add_pairwise()
|
||||
|
||||
def add_unary(self):
|
||||
|
||||
# for monitoring costs
|
||||
self.unary_score_ct = {}
|
||||
self.unary_offset_ct = {}
|
||||
self.unary_det_ct = {}
|
||||
# compute for later usage by alignment vector
|
||||
self.det_same_label_ct_idx = []
|
||||
|
||||
# assemble unary potentials
|
||||
for vidx, (tl_sign_idx, fm_sign) in enumerate(self.tl_line_rec.iterrows()):
|
||||
lbl = int(fm_sign.lbl)
|
||||
# ctr = [float(fm_sign.ctr_l), float(fm_sign.ctr_r)]
|
||||
|
||||
# print vidx, lbl, ctr
|
||||
# get boxes of certain lbl
|
||||
det_same_label = self.region_det[self.region_det[:, 0] == lbl]
|
||||
self.det_same_label_ct_idx.append(det_same_label[:, -1])
|
||||
# detection locations
|
||||
Xa = det_same_label[:, [1, 2]]
|
||||
|
||||
# incorporate score
|
||||
unary_vec = np.ones(self.num_lbls_per_var) * self.max_cost
|
||||
|
||||
U1, U2, U3 = [], [], []
|
||||
if det_same_label.shape[0] > 0:
|
||||
# compute partial cost (vectorized)
|
||||
U1 = 1 - det_same_label[:, 3]
|
||||
# since goal of matching is to incorporate low confidence detections,
|
||||
# linear contribution of score might be enough / otherwise penalize only low confidence below 0.01
|
||||
if True:
|
||||
U1 = self.params['lambda_score'] * (np.exp(U1 / self.params['sigma_score']) - 1)
|
||||
|
||||
# incorporate distance from hypothesis line
|
||||
# cx, cy
|
||||
U2 = np.zeros(len(U1))
|
||||
if False: # disabled in favour of null hypo offset
|
||||
if self.params['lambda_offset'] != 0:
|
||||
U2 = cdist(Xa, self.Xb, lambda u, v: np.linalg.norm(np.cross(u - v, self.b))
|
||||
/ np.linalg.norm(self.b)) / self.min_sign_dist
|
||||
# compute partial cost
|
||||
U2 = self.params['lambda_offset'] * (np.exp(U2.squeeze() / self.params['sigma_offset']) - 1)
|
||||
|
||||
# incorporate null hypothesis of signs
|
||||
U3 = np.zeros(len(U1))
|
||||
if self.params['lambda_offset'] != 0 and self.sign_hypos is not None:
|
||||
# sign hypo location
|
||||
X0 = self.sign_hypos[vidx, 0:2].reshape(1, -1)
|
||||
# get sign width and set variance
|
||||
if lbl in self.stats.sign_df.index:
|
||||
sign_width = self.stats.get_sign_width(lbl)
|
||||
else:
|
||||
sign_width = 1
|
||||
var = np.array([sign_width * 1, 1], dtype=float)
|
||||
# compute pairwise distance
|
||||
U3 = cdist(X0, Xa, metric='seuclidean', V=var) / self.min_sign_dist
|
||||
# U3 = self.params['lambda_offset'] * (np.exp(U3.squeeze() / self.params['sigma_offset']) - 1)
|
||||
# U3 = np.clip(U3, 0, 1e-5 * self.max_cost)
|
||||
|
||||
# sum up cost and insert into unary vector (only replace values if there are detections)
|
||||
unary_vec[:len(U1)] = U1 + U2 + U3
|
||||
|
||||
# for outlier detection set specific unary cost
|
||||
unary_vec[-1] = self.params['outlier_cost']
|
||||
|
||||
# add function and factor
|
||||
func_id = self.gm.addFunction(unary_vec)
|
||||
self.gm.addFactor(func_id, vidx)
|
||||
|
||||
# for debugging
|
||||
# self.unary_score_ct.append(U1)
|
||||
# self.unary_offset_ct.append(U3)
|
||||
# self.unary_det_ct.append(det_same_label)
|
||||
self.unary_score_ct[vidx] = U1
|
||||
self.unary_offset_ct[vidx] = U3
|
||||
self.unary_det_ct[vidx] = det_same_label
|
||||
|
||||
def add_pairwise(self):
|
||||
# assemble pairwise potentials
|
||||
# Assumption: vars are in order of symbols in line
|
||||
# ATTENTION: ORDER of fm_hypo_lbls is important for pairwise potential generation!!!
|
||||
|
||||
self.pairwise_dist_ct = {}
|
||||
self.pairwise_angle_ct = {}
|
||||
self.pairwise_iou_ct = {}
|
||||
self.pairwise_long_range = {}
|
||||
for vidx in range(self.num_vars - 1):
|
||||
# setup basic matrix with maximum cost
|
||||
dist_mat = np.ones([self.num_lbls_per_var] * 2) * self.max_cost
|
||||
|
||||
sym_lt = self.tl_line_rec.lbl.iat[vidx]
|
||||
sym_rt = self.tl_line_rec.lbl.iat[vidx + 1]
|
||||
# get boxes according to labels
|
||||
# [ID, cx, cy, score, x1, y1, x2, y2]
|
||||
det_sym_lt = self.region_det[self.region_det[:, 0] == sym_lt]
|
||||
det_sym_rt = self.region_det[self.region_det[:, 0] == sym_rt]
|
||||
|
||||
# x2, cy
|
||||
# sym_lt_right_border = det_sym_lt[:,[6,2]]
|
||||
# x1, cy
|
||||
# sym_rt_left_border = det_sym_rt[:,[4,2]]
|
||||
|
||||
# cx, cy
|
||||
sym_lt_right_border = det_sym_lt[:, [1, 2]]
|
||||
sym_rt_left_border = det_sym_rt[:, [1, 2]]
|
||||
|
||||
# bboxes
|
||||
sym_lt_bboxes = det_sym_lt[:, 4:]
|
||||
sym_rt_bboxes = det_sym_rt[:, 4:]
|
||||
|
||||
# compute pairwise distances between detections of lt and rt sym
|
||||
|
||||
# 1) basic computation
|
||||
# X = cdist(sym_lt_right_border, sym_rt_left_border, metric='euclidean')
|
||||
X = cdist(sym_lt_right_border, sym_rt_left_border, metric='seuclidean', V=self.variance_p)
|
||||
# because vertical offset always depends on underlying rotation, mahalanobis should be used here
|
||||
# X = cdist(sym_lt_right_border, sym_rt_left_border, metric='mahalanobis', VI=self.VI)
|
||||
# normalize distances by the scaled sign height and subtract 1 (one sign height)
|
||||
X = ((X/self.scaled_sign_height) - 1)
|
||||
|
||||
inX = X.copy()
|
||||
# compute partial cost
|
||||
X = self.params['lambda_p'] * (np.exp(X / self.params['sigma_p']) - 1)
|
||||
|
||||
# 2) penalty for wrong side
|
||||
# if on wrong side, increase cost by factor 4 [is deprecated due to angle computation!!!]
|
||||
#X2 = cdist(sym_lt_right_border, sym_rt_left_border, lambda u, v: u[0] > v[0])
|
||||
#X[X2.astype(bool)] *= 5
|
||||
|
||||
# 3) penalize distance only in x-dimension
|
||||
# X8 = cdist(sym_lt_right_border, sym_rt_left_border, lambda u, v: v[0] - u[0])
|
||||
# X8 = self.params['lambda_p'] * np.exp((self.min_sign_dist - X8) / self.params['sigma_p'])
|
||||
|
||||
# incorporate angle
|
||||
# angle with x-axis: np.arctan((u[1]-v[1])/(u[0]-v[0]))
|
||||
# b=np.array([1,0])
|
||||
# angle between vectors less stable: acos(dot(v1, v2) / (norm(v1) * norm(v2)))
|
||||
# X3 = cdist(sym_lt_right_border, sym_rt_left_border,
|
||||
# lambda u,v: np.arccos(np.dot(v-u,b) / (np.linalg.norm(v-u) * np.linalg.norm(b))))/pi
|
||||
# angle between vectors more numerical stable: atan2(norm(cross(a,b)), dot(a,b))
|
||||
X3 = cdist(sym_lt_right_border, sym_rt_left_border,
|
||||
lambda u, v: np.arctan2(np.linalg.norm(np.cross(v - u, self.b)), np.dot(v - u, self.b))) / np.pi
|
||||
inX3 = X3.copy()
|
||||
# compute partial cost
|
||||
X3 = self.params['lambda_angle'] * (np.exp(X3 / self.params['sigma_angle']) - 1)
|
||||
|
||||
# incorporate IoU
|
||||
X4 = cdist(sym_lt_bboxes, sym_rt_bboxes,
|
||||
lambda u, v: bb_intersection_over_union(u, v))
|
||||
inX4 = X4.copy()
|
||||
# compute partial cost
|
||||
X4 = self.params['lambda_iou'] * (np.exp(X4 / self.params['sigma_iou']) - 1)
|
||||
|
||||
# sum up cost and insert into dist_mat
|
||||
dist_mat[:X.shape[0], :X.shape[1]] = X + X3 + X4
|
||||
|
||||
# for outlier class set pairwise cost to 0
|
||||
dist_mat[-1, :] = 0
|
||||
dist_mat[:, -1] = 0
|
||||
|
||||
# avoid identity solutions
|
||||
if sym_lt == sym_rt:
|
||||
np.fill_diagonal(dist_mat, self.max_cost)
|
||||
|
||||
# add function and factor
|
||||
func_id = self.gm.addFunction(dist_mat)
|
||||
self.gm.addFactor(func_id, [vidx, vidx + 1])
|
||||
|
||||
# for debugging
|
||||
# self.pairwise_dist_ct[(vidx, vidx + 1)] = inX
|
||||
# self.pairwise_angle_ct[(vidx, vidx + 1)] = inX3
|
||||
# self.pairwise_iou_ct[(vidx, vidx + 1)] = inX4
|
||||
self.pairwise_dist_ct[(vidx, vidx + 1)] = X
|
||||
self.pairwise_angle_ct[(vidx, vidx + 1)] = X3
|
||||
self.pairwise_iou_ct[(vidx, vidx + 1)] = X4
|
||||
|
||||
# in the case of angles add pairwise potentials for all possible combinations
|
||||
if self.params['angle_long_range']:
|
||||
# add combinations on the right of var
|
||||
# not necessary to add combinations on the left of var due to symmetry
|
||||
for vidx_rt in range(vidx + 2, self.num_vars):
|
||||
sym_rt = self.tl_line_rec.lbl.iat[vidx_rt]
|
||||
# detections
|
||||
det_sym_rt = self.region_det[self.region_det[:, 0] == sym_rt]
|
||||
# cx, cy
|
||||
sym_rt_left_border = det_sym_rt[:, [1, 2]]
|
||||
# bboxes
|
||||
sym_rt_bboxes = det_sym_rt[:, 4:]
|
||||
# incorporate angle
|
||||
# angle between vectors more numerical stable: atan2(norm(cross(a,b)), dot(a,b))
|
||||
XY3 = cdist(sym_lt_right_border, sym_rt_left_border,
|
||||
lambda u, v: np.arctan2(np.linalg.norm(np.cross(v - u, self.b)),
|
||||
np.dot(v - u, self.b))) / np.pi
|
||||
# compute partial cost
|
||||
XY3 = self.params['lr_lambda_angle'] * (np.exp(XY3 / self.params['lr_sigma_angle']) - 1)
|
||||
|
||||
# incorporate iou
|
||||
XY4 = cdist(sym_lt_bboxes, sym_rt_bboxes,
|
||||
lambda u, v: bb_intersection_over_union(u, v))
|
||||
|
||||
# compute partial cost
|
||||
XY4 = self.params['lr_lambda_iou'] * (np.exp(XY4 / self.params['lr_sigma_iou']) - 1)
|
||||
|
||||
# sum up cost and insert into dist_mat
|
||||
dist_mat[:XY3.shape[0], :XY3.shape[1]] = XY3 + XY4
|
||||
|
||||
# for outlier class set pairwise cost to 0
|
||||
dist_mat[-1, :] = 0
|
||||
dist_mat[:, -1] = 0
|
||||
|
||||
# avoid identity solutions
|
||||
if sym_lt == sym_rt:
|
||||
np.fill_diagonal(dist_mat, self.max_cost)
|
||||
|
||||
# add function and factor
|
||||
func_id = self.gm.addFunction(dist_mat)
|
||||
self.gm.addFactor(func_id, [vidx, vidx_rt])
|
||||
|
||||
# for debugging
|
||||
self.pairwise_long_range[(vidx, vidx_rt)] = XY3 + XY4 # XY3, XY4, XY3 + XY4
|
||||
|
||||
def run_inference(self):
|
||||
# only continue, if there is a sign/detection in line to match
|
||||
if len(self.tl_line_rec) > 0 and self.num_relevant > 0:
|
||||
|
||||
if False:
|
||||
# basic belief propagation (slower)
|
||||
bfprop = opengm.inference.BeliefPropagation(gm=self.gm)
|
||||
if True:
|
||||
# TRWS: https://github.com/opengm/opengm/blob/master/src/interfaces/python/opengm/inference/pyTrws.cxx
|
||||
# default params: https://github.com/opengm/opengm/blob/master/src/interfaces/python/opengm/inference/param/trws_external_param.hxx
|
||||
parameter = opengm.InfParam(steps=200)
|
||||
bfprop = opengm.inference.TrwsExternal(gm=self.gm, accumulator='minimizer', parameter=parameter)
|
||||
|
||||
#start = timer()
|
||||
bfprop.infer()
|
||||
#run_time = timer() - start
|
||||
#print('{}'.format(run_time))
|
||||
|
||||
# get and save labeling
|
||||
self.labeling = bfprop.arg()
|
||||
self.tl_line_rec['lbl_arg'] = bfprop.arg()
|
||||
|
||||
# get raw energy and check if inference failed
|
||||
self.raw_energy = self.gm.evaluate(bfprop.arg())
|
||||
self.inference_failed = self.raw_energy > self.num_vars * self.params['outlier_cost']
|
||||
|
||||
# get energy, normalize by num_vars * outlier_cost
|
||||
# worst case should be outliers only
|
||||
max_line_cost = self.num_vars * self.params['outlier_cost']
|
||||
# clip energy, because inference sometimes fails !?
|
||||
self.energy = min(self.raw_energy, max_line_cost) / float(max_line_cost)
|
||||
# attributes cost to individual assignments (selected detections) and normalize using outlier_cost
|
||||
self.tl_line_rec['nE'] = np.around(self.compute_labeling_energy() / self.params['outlier_cost'], decimals=2)
|
||||
|
||||
if self.inference_failed:
|
||||
# all outlier
|
||||
self.tl_line_rec['aligned_det_idx'] = -1
|
||||
self.tl_line_rec['region_det_idx'] = -1
|
||||
else:
|
||||
# compute actual alignments with respect to original detections indices
|
||||
self._compute_global_alignments()
|
||||
# compute alignments with respect to region detection indices
|
||||
self._compute_region_alignments()
|
||||
|
||||
def _compute_global_alignments(self):
|
||||
alignments = np.zeros((len(self.tl_line_rec), 1), dtype=int)
|
||||
for i, lbl in enumerate(self.labeling):
|
||||
if lbl != (self.num_lbls_per_var - 1) and len(self.det_same_label_ct_idx[i]) > 0:
|
||||
alignments[i] = self.det_same_label_ct_idx[i][lbl]
|
||||
else:
|
||||
# outlier
|
||||
alignments[i] = -1
|
||||
# set values in dataframe
|
||||
self.tl_line_rec['aligned_det_idx'] = alignments.astype(int)
|
||||
|
||||
def _compute_region_alignments(self):
|
||||
alignments = np.zeros((len(self.tl_line_rec), 1), dtype=int)
|
||||
for ii, global_det_idx in enumerate(self.tl_line_rec.aligned_det_idx.values):
|
||||
if global_det_idx != -1:
|
||||
# map global to region detection index
|
||||
alignments[ii] = np.where(self.region_det[:, -1] == global_det_idx)[0]
|
||||
else:
|
||||
# outlier detection
|
||||
alignments[ii] = -1
|
||||
# set values in dataframe
|
||||
self.tl_line_rec['region_det_idx'] = alignments.astype(int)
|
||||
|
||||
def get_region_alignments(self):
|
||||
# maybe I should also return the self.tl_line_rec.index
|
||||
# problem arises if self.tl_line_rec is changed inside LineMatching1D
|
||||
if 'region_det_idx' in self.tl_line_rec.columns:
|
||||
return self.tl_line_rec.region_det_idx.values
|
||||
else:
|
||||
return []
|
||||
|
||||
def visualize_matching(self, input_im, sign_hypos, ax=None):
|
||||
# only continue, if there is a sign in line to match
|
||||
if len(self.tl_line_rec) > 0:
|
||||
|
||||
# select detections using alignment index
|
||||
alignments = self.get_region_alignments()
|
||||
aligned = self.region_det[alignments[alignments >= 0], 1:3]
|
||||
|
||||
if ax is None:
|
||||
fig, ax = plt.subplots(figsize=(12, 8))
|
||||
# plot hypo
|
||||
ax.plot(sign_hypos[:, 0], sign_hypos[:, 1], '*b', markersize=10, label='null hypo')
|
||||
ax.plot(aligned[:, 0], aligned[:, 1], 'oy', markersize=8, label='gm aligned detections')
|
||||
# plot tablet
|
||||
ax.imshow(input_im, cmap=plt.cm.Greys_r)
|
||||
|
||||
# annotate
|
||||
for i, pos_idx in enumerate(self.tl_line_rec.iloc[alignments >= 0].pos_idx.values):
|
||||
ax.annotate(pos_idx, (aligned[i, 0], aligned[i, 1]), fontsize=15)
|
||||
|
||||
ax.legend(shadow=True, fancybox=True)
|
||||
ax.axis('off')
|
||||
# plt.show()
|
||||
|
||||
# energy marginal computation
|
||||
|
||||
def _get_unary_cost(self, unary_dict, vidx, didx):
|
||||
unary = unary_dict[vidx]
|
||||
if len(unary) > 0:
|
||||
return unary.flatten()[didx]
|
||||
else:
|
||||
# in cases where inference fails and labeling is out of bounds
|
||||
return self.max_cost
|
||||
|
||||
def _get_pairwise_val(self, pairwise_dict, idx0, idx1):
|
||||
outlier_lbl = self.num_lbls_per_var - 1
|
||||
pairwise = pairwise_dict[idx0, idx1]
|
||||
didx0 = self.labeling[idx0]
|
||||
didx1 = self.labeling[idx1]
|
||||
if didx0 != outlier_lbl and didx1 != outlier_lbl:
|
||||
if pairwise.size > 0:
|
||||
return pairwise[didx0, didx1]
|
||||
else:
|
||||
# in cases where inference fails and labeling is out of bounds
|
||||
return self.max_cost
|
||||
else:
|
||||
return 0
|
||||
|
||||
def _get_pairwise_cost(self, pairwise_dict, vidx):
|
||||
# deal with boundary cases
|
||||
if vidx == self.num_vars - 1:
|
||||
return self._get_pairwise_val(pairwise_dict, vidx - 1, vidx)
|
||||
elif vidx == 0:
|
||||
return self._get_pairwise_val(pairwise_dict, vidx, vidx + 1)
|
||||
else:
|
||||
return (self._get_pairwise_val(pairwise_dict, vidx - 1, vidx)
|
||||
+ self._get_pairwise_val(pairwise_dict, vidx, vidx + 1))
|
||||
|
||||
def _get_lr_pairwise_cost(self, lr_pairwise_dict, vidx):
|
||||
energy = 0
|
||||
for vidx_rt in range(vidx + 2, self.num_vars):
|
||||
energy += self._get_pairwise_val(lr_pairwise_dict, vidx, vidx_rt)
|
||||
return energy
|
||||
|
||||
def compute_unary_cost(self):
|
||||
list_unary = [self.unary_score_ct, self.unary_offset_ct]
|
||||
outlier_lbl = self.num_lbls_per_var - 1
|
||||
u_marginals = np.zeros_like(self.labeling, dtype=float)
|
||||
for vidx, didx in enumerate(self.labeling):
|
||||
if didx != outlier_lbl:
|
||||
for unary_dict in list_unary:
|
||||
u_marginals[vidx] += self._get_unary_cost(unary_dict, vidx, didx)
|
||||
else:
|
||||
u_marginals[vidx] += self.params['outlier_cost']
|
||||
|
||||
return u_marginals
|
||||
|
||||
def compute_pairwise_cost(self):
|
||||
list_pairwise = [self.pairwise_angle_ct, self.pairwise_dist_ct, self.pairwise_iou_ct]
|
||||
|
||||
p_marginals = np.zeros_like(self.labeling, dtype=float)
|
||||
if len(self.labeling) > 1: # only compute if there are any pairs
|
||||
for vidx, dvidx in enumerate(self.labeling):
|
||||
for pairwise_dict in list_pairwise:
|
||||
p_marginals[vidx] += self._get_pairwise_cost(pairwise_dict, vidx)
|
||||
return p_marginals
|
||||
|
||||
def compute_pairwise_cost_lr(self):
|
||||
lr_pairwise_dict = self.pairwise_long_range
|
||||
|
||||
p_marginals = np.zeros_like(self.labeling, dtype=float)
|
||||
if len(self.labeling) > 1: # only compute if there are any pairs
|
||||
for vidx, dvidx in enumerate(self.labeling):
|
||||
p_marginals[vidx] += self._get_lr_pairwise_cost(lr_pairwise_dict, vidx)
|
||||
return p_marginals
|
||||
|
||||
def compute_labeling_energy(self):
|
||||
# compute an energy vector that attributes cost to individual labels
|
||||
# if the output vector is summed up, this equals the un-normalized energy
|
||||
|
||||
# deal with case when inference failed
|
||||
if self.inference_failed:
|
||||
|
||||
return self.max_cost
|
||||
else:
|
||||
u_marginals = self.compute_unary_cost()
|
||||
p_marginals = self.compute_pairwise_cost()
|
||||
plr_marginals = self.compute_pairwise_cost_lr()
|
||||
|
||||
return u_marginals + (p_marginals/2. + plr_marginals)
|
||||
|
||||
@@ -0,0 +1,464 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import math
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
from operator import itemgetter
|
||||
|
||||
from scipy.stats import norm
|
||||
from scipy import ndimage as ndi
|
||||
from scipy.spatial.distance import pdist, cdist, squareform
|
||||
|
||||
from LineFragment import LineFragment
|
||||
|
||||
|
||||
# LINES - TRANSLITERATION ALIGNMENT PROBLEM
|
||||
# associate lines with transliteration lines
|
||||
|
||||
|
||||
# OPTION 0) use line_models sorted by dist as basic alignment
|
||||
|
||||
def align_lines_tl_by_sort(line_hypos, tl_df):
    # use tl_line_indices
    tl_line_indices = tl_df.line_idx.unique()

    # extend or cut if too short or long respectively
    diff_len = line_hypos.label.nunique() - len(tl_line_indices)
    if diff_len > 0:
        last_idx = tl_line_indices[-1] + 1
        tl_line_indices = np.concatenate([tl_line_indices, range(last_idx, last_idx + diff_len)])
    else:
        tl_line_indices = tl_line_indices[:line_hypos.label.nunique()]

    # print tl_line_indices, line_hypos.groupby('label').mean().sort_values('dist').index

    # find basic alignment by sorting (enumerate line models sorted according to dist)
    tl_line_assignment = pd.DataFrame({'tl_line': tl_line_indices,  # np.arange(line_hypos.label.nunique())
                                       'hypo_line_lbl': line_hypos.groupby('label').mean().sort_values('dist').index})

    # add tl_line column in line_hypos using join on line_hypos
    return line_hypos.join(tl_line_assignment.set_index('hypo_line_lbl'), on='label')
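
# A minimal sketch of the sort-based alignment above on toy data. The labels,
# distances and transliteration indices are made up for illustration; only the
# column names (`label`, `dist`, `tl_line`, `hypo_line_lbl`) come from the code.

```python
import numpy as np
import pandas as pd

# toy line hypotheses: three detected lines (labels 7, 3, 5), two rows each
line_hypos = pd.DataFrame({'label': [7, 7, 3, 3, 5, 5],
                           'dist': [50., 52., 10., 12., 30., 31.]})
# hypothetical transliteration line indices
tl_line_indices = np.array([0, 1, 2])

# order the detected lines by their mean distance and pair them,
# in that order, with the transliteration lines
order = line_hypos.groupby('label').mean().sort_values('dist').index
assignment = pd.DataFrame({'tl_line': tl_line_indices, 'hypo_line_lbl': order})

# attach the assigned transliteration line to every row of its detected line
aligned = line_hypos.join(assignment.set_index('hypo_line_lbl'), on='label')
print(aligned)
```

# In this toy case the line with the smallest mean `dist` (label 3) receives
# transliteration line 0, label 5 receives line 1 and label 7 receives line 2.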
|
||||
|
||||
|
||||
# OPTION 1) use ground truth annotations as alignment
|
||||
# use gt line annotations (implicit update tl_line column in line_hypos)
|
||||
# (unreliable, because gt line annotations and transliteration are not necessarily aligned themselves!)
|
||||
|
||||
def align_lines_tl_by_ground_truth(line_hypos, tl_df):
    # update tl_line with gt_line_idx (set nan to -1)
    #line_hypos['tl_line'] = line_hypos['gt_line_idx'].fillna(-1)
    line_hypos = line_hypos.assign(tl_line=line_hypos['gt_line_idx'].fillna(-1))
    # if there are more gt_lines than tl_lines ...
    # gt_line_idx that are not in tl are replaced with -1
    not_tl_line_idx = ~line_hypos['tl_line'].isin(tl_df.line_idx.unique())
    line_hypos.loc[not_tl_line_idx, 'tl_line'] = -1
    return line_hypos
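
# A small sketch of the ground-truth based assignment above, assuming toy
# values: missing ground-truth indices and indices that do not occur in the
# transliteration both end up as -1.

```python
import numpy as np
import pandas as pd

# toy data: four detected lines, one without ground truth (NaN)
# and one whose ground-truth index (9) is not a transliteration line
line_hypos = pd.DataFrame({'label': [0, 1, 2, 3],
                           'gt_line_idx': [0., np.nan, 2., 9.]})
tl_lines = np.array([0, 1, 2])  # hypothetical transliteration line indices

# copy gt_line_idx into tl_line, marking missing entries with -1
line_hypos = line_hypos.assign(tl_line=line_hypos['gt_line_idx'].fillna(-1))
# invalidate every index that is not part of the transliteration
line_hypos.loc[~line_hypos['tl_line'].isin(tl_lines), 'tl_line'] = -1
print(line_hypos)
```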
|
||||
|
||||
|
||||
# OPTION 2) adapted from GALE-CHURCH algorithm for sentence alignment
|
||||
# relies on line lengths only
|
||||
|
||||
norm_logsf = norm.logsf
|
||||
LOG2 = math.log(2)
|
||||
|
||||
AVERAGE_CHARACTERS = 1
|
||||
VARIANCE_CHARACTERS = 6.8
|
||||
|
||||
BEAD_COSTS = {(1, 1): 0, (2, 1): 1000, # (2, 1): 230
|
||||
(1, 2): 1000, (0, 1): 230,
|
||||
(1, 0): 230, (2, 2): 2000} # (1, 0): 450
|
||||
|
||||
# BEAD_COSTS = {(1, 1): 0, (2, 1): 230, (1, 2): 230, (0, 1): 450,
|
||||
# (1, 0): 450, (2, 2): 440}
|
||||
|
||||
|
||||
def length_cost(sx, sy, mean_xy, variance_xy):
    """
    Code from https://github.com/alvations/gachalign:
    Calculate length cost given two sentences. Lower cost = higher prob.

    The original Gale-Church (1993:pp. 81) paper considers l2/l1 = 1 hence:
        delta = (l2-l1*c)/math.sqrt(l1*s2)

    If l2/l1 != 1 then the following should be considered:
        delta = (l2-l1*c)/math.sqrt((l1+l2*c)/2 * s2)
    substituting c = 1 and c = l2/l1, gives the original cost function.
    """
    lx, ly = sum(sx), sum(sy)
    m = (lx + ly * mean_xy) / 2

    try:
        delta = (lx - ly * mean_xy) / math.sqrt(m * variance_xy)
    except ZeroDivisionError:
        return float('-inf')

    return - 100 * (LOG2 + norm_logsf(abs(delta)))
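
# A worked sketch of the length cost above using only the standard library.
# The helper names are hypothetical, the survival function is written with
# math.erfc instead of scipy's norm.logsf, and (as an assumption of this
# sketch, unlike length_cost) a degenerate input is given an infinite cost.

```python
import math

def std_normal_logsf(z):
    # log of the standard normal survival function, log(1 - Phi(z));
    # adequate for moderate z, underflows for very large z
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def gale_church_length_cost(lx, ly, mean_xy=1.0, variance_xy=6.8):
    # mean length of the pair, used to scale the variance
    m = (lx + ly * mean_xy) / 2.0
    if m * variance_xy == 0:
        return float('inf')  # assumption: empty pairs are never preferred
    # normalized length difference
    delta = (lx - ly * mean_xy) / math.sqrt(m * variance_xy)
    # two-sided tail probability turned into a cost (lower = more likely)
    return -100 * (math.log(2) + std_normal_logsf(abs(delta)))

equal_cost = gale_church_length_cost(10, 10)    # identical lengths
unequal_cost = gale_church_length_cost(10, 20)  # one line twice as long
print(equal_cost, unequal_cost)
```

# Lines of equal length get a cost of (numerically) zero, and the cost grows
# as the lengths diverge.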
|
||||
|
||||
|
||||
def _align(x, y, mean_xy, variance_xy, bead_costs):
    """
    The minimization function to choose the sentence pair with
    cheapest alignment cost.
    """
    m = {}
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i == j == 0:
                m[0, 0] = (0, 0, 0)
            else:
                # iterate over the bead costs passed in by the caller
                m[i, j] = min((m[i - di, j - dj][0] + length_cost(x[i - di:i], y[j - dj:j], mean_xy, variance_xy)
                               + bead_cost, di, dj)
                              for (di, dj), bead_cost in bead_costs.items()
                              if i - di >= 0 and j - dj >= 0)

    # backtrack from the end of both sequences to recover the aligned spans
    i, j = len(x), len(y)
    while True:
        (c, di, dj) = m[i, j]
        if di == dj == 0:
            break
        yield (i - di, i), (j - dj, j)
        i -= di
        j -= dj
|
||||
|
||||
|
||||
def align_lines_tl_by_gale_church(tl_df, line_hypos, variance_characters=3.0):
|
||||
# updates line_hypos with tl_line idx
|
||||
# actually uses line_hypos_agg
|
||||
|
||||
# get line lengths
|
||||
tl_line_len = tl_df.groupby('line_idx').mean().prior_line_len
|
||||
det_line_len = line_hypos.groupby('label').mean().sort_values('dist').accum
|
||||
# define input
|
||||
|
||||
cx = tl_line_len.values
|
||||
cy = det_line_len.values
|
||||
|
||||
# use detection line lengths to normalize (better range than tl lengths)
|
||||
max_char = int(cy.max())
|
||||
|
||||
# normalize
|
||||
cx /= cx.max()
|
||||
cx *= max_char
|
||||
#cy /= cy.max()
|
||||
#cy *= max_char
|
||||
bc = BEAD_COSTS
|
||||
|
||||
# iterate over aligned pairs
|
||||
for (i1, i2), (j1, j2) in reversed(list(_align(cx, cy, 1.0, variance_characters, bc))):
|
||||
# print (i1, i2), (j1, j2)
|
||||
# print (tl_line_len.index[i1:i2].values, det_line_len.index[j1:j2].values)
|
||||
# check if line_hypo exists
|
||||
if len(det_line_len.index[j1:j2].values) > 0:
|
||||
tl_line_idx = -1
|
||||
if len(tl_line_len.index[i1:i2].values) > 0:
|
||||
tl_line_idx = int(tl_line_len.index[i1:i2].values[0])
|
||||
# assign tl line idx to detected line
|
||||
line_hypos.loc[line_hypos.label.isin(det_line_len.index[j1:j2].values), 'tl_line'] = tl_line_idx
|
||||
# return cx, cy
|
||||
return line_hypos
|
||||
|
||||
|
||||
# OPTION 3) adapted from Bleualign algorithm for sentence alignment
|
||||
# relies on matching score between tl null hypothesis and sign detections (sign detector)
|
||||
# the problem is to align hypo_line_indices (detected lines) with tl_line_indices (transliteration lines)
|
||||
# all information required is contained in line fragment
|
||||
|
||||
# a) make sure that score_mat forms valid positive weights for edges in graph
|
||||
# b) get matching score matrix with shape=[len(hypo_line_indices), len(tl_line_indices)]
|
||||
# c) alignment consists of segments that are connected diagonally
|
||||
|
||||
|
||||
def compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag):
|
||||
# bleu score
|
||||
score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
|
||||
lambda a_idx, b_idx: line_frag.compute_bleu_score(a_idx.squeeze(), b_idx.squeeze())) # 5/5, 4/1
|
||||
# score in range [0, 1], but order needs to be reversed
|
||||
score_mat = 1 - score_mat
|
||||
return score_mat
|
||||
|
||||
|
||||
def compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag):
|
||||
# ransac score
|
||||
score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
|
||||
lambda a_idx, b_idx: line_frag.compute_ransac_score(a_idx.squeeze(), b_idx.squeeze(),
|
||||
max_dist_thresh=2, dist_weight=1)) # 5/5, 4/1
|
||||
# score in range [0, 1], but order needs to be reversed
|
||||
score_mat = 1 - score_mat
|
||||
return score_mat
|
||||
|
||||
|
||||
def compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag):
|
||||
# line matching score
|
||||
score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
|
||||
lambda a_idx, b_idx: line_frag.compute_line_matching_score(a_idx.squeeze(), b_idx.squeeze()))
|
||||
|
||||
# score in range [0, 1], but order needs to be reversed
|
||||
score_mat = 1 - score_mat
|
||||
return score_mat
|
||||
|
||||
|
||||
# use this if you want to implement your own similarity score
|
||||
def eval_sents_dummy(translist, targetlist, max_alternatives=3):
|
||||
scoredict = {}
|
||||
|
||||
for testID, testSent in enumerate(translist):
|
||||
scores = []
|
||||
|
||||
for refID, refSent in enumerate(targetlist):
|
||||
score = 100 - abs(len(testSent) - len(refSent)) # replace this with your own similarity score
|
||||
if score > 0:
|
||||
scores.append((score, refID, score))
|
||||
# sorted by first item in tuple (i.e. score)
|
||||
scoredict[testID] = sorted(scores, key=itemgetter(0), reverse=True)[:max_alternatives]
|
||||
|
||||
return scoredict
|
||||
|
||||
|
||||
# follow the backpointers in score matrix to extract best path of 1-to-1 alignments
|
||||
def extract_best_path(pointers):

    i = len(pointers)-1
    j = len(pointers[0])-1
    pointer = ''
    best_path = []

    while i >= 0 and j >= 0:
        pointer = pointers[i][j]
        if pointer == '^':
            i -= 1
        elif pointer == '<':
            j -= 1
        elif pointer == 'match':
            best_path.append((i, j))
            i -= 1
            j -= 1

    best_path.reverse()
    return best_path
|
||||
|
||||
|
||||
# dynamic programming search for best path of alignments (maximal score)
|
||||
def pathfinder(translist, targetlist, scoremat):  # scoredict

    # add an extra row/column to the matrix and start filling it from 1,1 (to avoid exceptions for first row/column)
    matrix = [[0 for column in range(len(targetlist)+1)] for row in range(len(translist)+1)]
    pointers = [['' for column in range(len(targetlist))] for row in range(len(translist))]

    for i in range(len(translist)):
        for j in range(len(targetlist)):

            best_score = matrix[i][j+1]
            best_pointer = '^'

            score = matrix[i+1][j]
            if score > best_score:
                best_score = score
                best_pointer = '<'

            #if np.abs(j - i) < 5:  # distance from diagonal
            score = scoremat[i, j] + matrix[i][j]
            if score > best_score:
                best_score = score
                best_pointer = 'match'

            matrix[i+1][j+1] = best_score
            pointers[i][j] = best_pointer

    bleualign = extract_best_path(pointers)
    return bleualign
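
# A compact, self-contained sketch of the same dynamic program on a toy score
# matrix (rows: detected lines, columns: transliteration lines). The function
# name and the scores are hypothetical; the tie-breaking order (up, then left,
# then match, each only on a strict improvement) follows pathfinder above.

```python
import numpy as np

def best_monotone_path(score):
    n, m = score.shape
    total = np.zeros((n + 1, m + 1))        # accumulated scores, padded by one
    move = [[''] * m for _ in range(n)]     # backpointers
    for i in range(n):
        for j in range(m):
            best, ptr = total[i, j + 1], '^'            # skip row i
            if total[i + 1, j] > best:
                best, ptr = total[i + 1, j], '<'        # skip column j
            if score[i, j] + total[i, j] > best:
                best, ptr = score[i, j] + total[i, j], 'match'
            total[i + 1, j + 1] = best
            move[i][j] = ptr
    # follow the backpointers from the bottom-right corner
    path, i, j = [], n - 1, m - 1
    while i >= 0 and j >= 0:
        if move[i][j] == '^':
            i -= 1
        elif move[i][j] == '<':
            j -= 1
        else:
            path.append((i, j))
            i, j = i - 1, j - 1
    return path[::-1]

toy_scores = np.array([[0.9, 0.1, 0.0],
                       [0.2, 0.1, 0.8]])
toy_path = best_monotone_path(toy_scores)
print(toy_path)
```

# For this toy matrix the first detected line is matched to column 0 and the
# second to column 2; column 1 stays unassigned.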
|
||||
|
||||
|
||||
def align_lines_tl_by_score(line_hypos, line_frag, visualize=True):
|
||||
# alignment based on longest path through score mat (topological sort)
|
||||
assert 'tl_line' in line_hypos.columns, "tl_line needs to be set (e.g. use align by sort)"
|
||||
|
||||
# get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
|
||||
hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
|
||||
# print(hypo_line_indices, tl_line_indices)
|
||||
|
||||
align_opts = [1, 0, 0, 0, 0, 0] # align + ransac, most accurate, slow [NORMAL]
|
||||
#align_opts = [0, 0, 1, 0, 0, 0] # bleu + ransac, a little less accurate, fast [use with high number of detections]
|
||||
#align_opts = [0, 0, 0, 0, 0, 1] # bleu
|
||||
#align_opts = [0, 0, 0, 0, 1, 0] # ransac
|
||||
#align_opts = [0, 0, 0, 1, 0, 0] # align
|
||||
|
||||
assert np.sum(align_opts) == 1, "exactly one alignment option must be enabled"
|
||||
|
||||
# prepare score mats
|
||||
if align_opts[0]:
|
||||
score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag),
|
||||
compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
|
||||
title_strs = ['ransac', 'gm matching']
|
||||
multi_score = True
|
||||
if align_opts[1]:
|
||||
score_mats = [compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag),
|
||||
compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
|
||||
title_strs = ['bleu', 'gm matching'] # bleu
|
||||
multi_score = True
|
||||
if align_opts[2]:
|
||||
score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag),
|
||||
compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
|
||||
title_strs = ['ransac', 'bleu']
|
||||
multi_score = True
|
||||
if align_opts[3]:
|
||||
score_mats = [compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
|
||||
title_strs = ['gm matching']
|
||||
multi_score = False
|
||||
if align_opts[4]:
|
||||
score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
|
||||
title_strs = ['ransac']
|
||||
multi_score = False
|
||||
if align_opts[5]:
|
||||
score_mats = [compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
|
||||
title_strs = ['bleu']
|
||||
multi_score = False
|
||||
|
||||
|
||||
if visualize:
|
||||
# prepare plot
|
||||
fig, axes = plt.subplots(1, 3, figsize=(15, 5)) # 15, 5
|
||||
ax = axes.ravel()
|
||||
|
||||
best_paths = []
|
||||
for i, score_mat in enumerate(score_mats):
|
||||
best_path = pathfinder(hypo_line_indices, tl_line_indices, score_mat)
|
||||
best_paths.append(best_path)
|
||||
path_pts = np.asarray(best_path)
|
||||
|
||||
if visualize:
|
||||
# plot score mats with shortest path
|
||||
if len(path_pts) > 0:
|
||||
ax[i].plot(path_pts[:, 1], path_pts[:, 0])
|
||||
ax[i].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
|
||||
ax[i].imshow(score_mat)
|
||||
ax[i].set_title(title_strs[i])
|
||||
|
||||
# compute joint
|
||||
if multi_score:
|
||||
path_pts = np.asarray(sorted(set(best_paths[0]).intersection(best_paths[1])))
|
||||
# path_pts = np.asarray(best_paths[1]) # use gm_matching only
|
||||
else:
|
||||
path_pts = np.asarray(best_paths[0])
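
# A short sketch of the joint step above, assuming two toy paths: only the
# (row, column) pairs that both score matrices agree on are kept.

```python
import numpy as np

# hypothetical best paths from two different score matrices
ransac_path = [(0, 0), (1, 1), (2, 3), (3, 4)]
matching_path = [(0, 0), (1, 2), (2, 3), (3, 4)]

# keep only the assignments found by both, sorted by row index
joint = sorted(set(ransac_path).intersection(matching_path))
joint_pts = np.asarray(joint)
print(joint_pts)
```

# Row 1 is dropped here because the two paths disagree on its column.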
|
||||
|
||||
if visualize:
|
||||
# plot score mats with shortest path
|
||||
if len(path_pts) > 0:
|
||||
ax[2].plot(path_pts[:, 1], path_pts[:, 0])
|
||||
ax[2].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
|
||||
ax[2].imshow(np.mean(score_mats, axis=0))  # also valid when only one score mat is used
|
||||
ax[2].set_title('joint')
|
||||
|
||||
if len(path_pts) > 0:
|
||||
# map path through score mat back to line_idx (because score mat idx does not necessarily equal line idx)
|
||||
# hl_indices = hypo_line_indices[path_pts[:, 0]] # already equals index to dataframe
|
||||
tl_indices = tl_line_indices[path_pts[:, 1]]
|
||||
|
||||
# print tl_line_assignment, path_pts[:, 0], tl_indices
|
||||
|
||||
# create assignment table for join
|
||||
basic_index = line_frag.line_hypos.tl_line.sort_values().unique()
|
||||
tl_line_assignment = pd.DataFrame({'hypo_tl_line': basic_index, 'tl_line_update': -np.ones_like(basic_index)})
|
||||
tl_line_assignment.loc[path_pts[:, 0], 'tl_line_update'] = tl_indices
|
||||
# join line_hypos on tl_line
|
||||
line_hypos['tl_line'] = line_hypos.join(tl_line_assignment.set_index('hypo_tl_line'), on='tl_line')[
|
||||
'tl_line_update']
|
||||
|
||||
return line_hypos, path_pts
|
||||
|
||||
|
||||
#### full pipeline to solve the line-transliteration alignment problem ####
|
||||
|
||||
def compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment, segm_labels, stats, center_im, sign_detections,
|
||||
visualize=True, align_opt=(False, False, True)):
|
||||
|
||||
path_pts = None
|
||||
|
||||
# BASIC:
|
||||
# use line_models sorted by dist as basic alignment
|
||||
line_hypos = align_lines_tl_by_sort(line_hypos, tl_df)
|
||||
|
||||
# OPTION I:
|
||||
# find basic alignment using line lengths
|
||||
# apply Gale-Church algorithm (implicit update tl_line column in line_hypos)
|
||||
if align_opt[0]: # False
|
||||
line_hypos = align_lines_tl_by_gale_church(tl_df, line_hypos, variance_characters=6.0)
|
||||
|
||||
# OPTION II:
|
||||
# use gt line annotations (implicit update tl_line column in line_hypos)
|
||||
if align_opt[1]: # False
|
||||
if len(gt_line_assignment) > 0:
|
||||
line_hypos = align_lines_tl_by_ground_truth(line_hypos, tl_df)
|
||||
|
||||
# OPTION III:
|
||||
# alignment based on longest path through score mat (topological sort)
|
||||
if align_opt[2]: # True
|
||||
# create line fragment (tl_line should be assigned before!)
|
||||
line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
|
||||
# compute lines tl alignment based on score
|
||||
(line_hypos, path_pts) = align_lines_tl_by_score(line_hypos, line_frag, visualize=visualize)
|
||||
|
||||
return line_hypos, path_pts
|
||||
|
||||
|
||||
|
||||
## GT function
|
||||
|
||||
|
||||
def gt_align_lines_tl_by_ed(line_gt, visualize=True):
|
||||
# alignment based on longest path through score mat (topological sort)
|
||||
|
||||
# get assignment space (cartesian product of tl_line_indices and gt_line_indices)
|
||||
gt_line_indices, tl_line_indices = line_gt.get_alignment_space()
|
||||
|
||||
# prepare score mats
|
||||
score_mats = [compute_bleu_score_mat(gt_line_indices, tl_line_indices, line_gt)]
|
||||
title_strs = ['edit distance']
|
||||
multi_score = False
|
||||
|
||||
if visualize:
|
||||
# prepare plot
|
||||
fig, axes = plt.subplots(1, 1, figsize=(15, 5), squeeze=False) # 1,3
|
||||
ax = axes.ravel()
|
||||
|
||||
best_paths = []
|
||||
for i, score_mat in enumerate(score_mats):
|
||||
best_path = pathfinder(gt_line_indices, tl_line_indices, score_mat)
|
||||
best_paths.append(best_path)
|
||||
path_pts = np.asarray(best_path)
|
||||
|
||||
if visualize:
|
||||
# plot score mats with shortest path
|
||||
if len(path_pts) > 0:
|
||||
ax[i].plot(path_pts[:, 1], path_pts[:, 0])
|
||||
ax[i].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
|
||||
ax[i].imshow(score_mat)
|
||||
ax[i].set_title(title_strs[i])
|
||||
|
||||
# compute joint
|
||||
if multi_score:
|
||||
path_pts = np.asarray(sorted(set(best_paths[0]).intersection(best_paths[1])))
|
||||
# path_pts = np.asarray(best_paths[1]) # use gm_matching only
|
||||
else:
|
||||
path_pts = np.asarray(best_paths[0])
|
||||
|
||||
if len(path_pts) > 0:
|
||||
# map path through score mat back to line_idx (because score mat idx does not necessarily equal line idx)
|
||||
# gt_indices = gt_line_indices[path_pts[:, 0]] # already equals index to dataframe
|
||||
tl_indices = tl_line_indices[path_pts[:, 1]]
|
||||
# print tl_line_assignment, path_pts[:, 0], tl_indices
|
||||
|
||||
lines_df = line_gt.lines_df
|
||||
# create assignment table for join
|
||||
#basic_index = lines_df.tl_line.sort_values().unique() # this is not necessary, because gt_line_idx
|
||||
basic_index = lines_df.gt_line_idx.sort_values().unique()
|
||||
tl_line_assignment = pd.DataFrame({'gt_tl_line': basic_index, 'tl_line_update': -np.ones_like(basic_index)})
|
||||
tl_line_assignment.loc[path_pts[:, 0], 'tl_line_update'] = tl_indices
|
||||
# print tl_line_assignment, path_pts
|
||||
|
||||
# join lines_df on gt_line_idx
|
||||
line_gt.lines_df['tl_line'] = lines_df.join(tl_line_assignment.set_index('gt_tl_line'), on='gt_line_idx')['tl_line_update']
|
||||
|
||||
return line_gt, path_pts
|
||||
|
||||
|
||||
@@ -0,0 +1,289 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
from tqdm import tqdm
|
||||
|
||||
from skimage.color import label2rgb
|
||||
|
||||
from ..transliteration.TransliterationSet import TransliterationSet
|
||||
from ..transliteration.SignsStats import SignsStats
|
||||
|
||||
from ..evaluations.sign_tl_evaluation import compute_accuracy
|
||||
from ..evaluations.line_tl_evaluation import eval_line_tl_alignment
|
||||
from ..evaluations.sign_evaluation_prep import get_pred_boxes_df, get_gt_boxes_df
|
||||
from ..evaluations.sign_evaluation_gt import prepare_segment_gt
|
||||
from ..evaluations.sign_evaluation import eval_detector_on_collection
|
||||
from ..evaluations.sign_evaluator import SignEvalBasic, SignEvalFast
|
||||
|
||||
from ..alignment.line_tl_alignment import compute_line_tl_alignment
|
||||
from ..alignment.LineFragment import (LineFragment, compute_line_points, compute_line_polygon, plot_boxes)
|
||||
|
||||
from ..detection.line_detection import (prepare_transliteration, preprocess_line_input, apply_detector,
|
||||
post_process_line_detections, compute_image_label_map)
|
||||
from ..detection.detection_helpers import (visualize_net_output, radius_in_image, convert_detections_to_array,
|
||||
label_map2image, vis_detections, coord_in_image)
|
||||
#from ..detection.tablet_scale_estimation import print_scale_stats
|
||||
|
||||
from ..visualizations.line_visuals import (show_hough_transform_w_lines, show_line_segms, show_line_skeleton, show_probabilistic_hough)
|
||||
from ..visualizations.line_tl_visuals import show_lines_tl_alignment, show_score_mats_with_paths
|
||||
|
||||
|
||||
def gen_alignments(didx_list, dataset, bbox_anno, lines_anno, relative_path, saa_version, re_transform,
|
||||
sign_model_version, model_fcn, device,
|
||||
generate_and_save, show_sign_alignments, collection_subfolder, train_data_ext_file, lbl_list,
|
||||
line_model_version='v007', use_precomp_lines=False, param_dict=None,
|
||||
show_line_matching=False, verbose=True):
|
||||
"""
|
||||
Generate tl-line pairs for seq model training. Store pairs in file.
|
||||
Additionally compute some useful filter criteria for generated pairs.
|
||||
"""
|
||||
|
||||
# config tl_line matching
|
||||
# 1: line length
|
||||
# 2: use gt line anno (if available)
|
||||
# 3: shortest path through score matrix
|
||||
align_opt = [False, False, True]
|
||||
visualize_tl_line_matching = show_line_matching
|
||||
|
||||
# setup evaluators
|
||||
use_new_eval = True
|
||||
num_classes = 240
|
||||
eval_ovthresh = 0.5
|
||||
eval_basic = SignEvalBasic(sign_model_version, saa_version, eval_ovthresh)
|
||||
eval_fast = SignEvalFast(sign_model_version, saa_version, tp_thresh=eval_ovthresh, num_classes=num_classes)
|
||||
|
||||
# setup transliteration set
|
||||
tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
|
||||
# setup sign statistics
|
||||
stats = SignsStats(tblSignHeight=128)
|
||||
|
||||
list_pred_boxes_df, list_gt_boxes_df = [], []
|
||||
acc_array = np.zeros(len(didx_list))
|
||||
naligned_array = np.zeros(len(didx_list))
|
||||
for didx in tqdm(didx_list, desc=saa_version):
|
||||
seg_im, seg_idx = dataset[didx]
|
||||
# access meta
|
||||
seg_rec = dataset.assigned_segments_df.loc[seg_idx]
|
||||
image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
|
||||
print(didx, image_name, view_desc)
|
||||
|
||||
# load transliteration dataframe
|
||||
tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=verbose)
|
||||
tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
|
||||
#print(float(len_min) / len_max, num_vis_lines)
|
||||
|
||||
# boxes file
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
res_path = "{}results/results_ssd/{}/{}".format(relative_path, sign_model_version, saa_version)
|
||||
boxes_file = "{}/{}_all_boxes.npy".format(res_path, res_name)
|
||||
|
||||
# load detections
|
||||
all_boxes = np.load(boxes_file)
|
||||
sign_detections = convert_detections_to_array(all_boxes)
|
||||
|
||||
# load and prepare annotations of segment
|
||||
gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
|
||||
with_star_crop=False) # depends on sign_detections!
|
||||
if verbose:
|
||||
print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
|
||||
|
||||
# make sure seg image is large enough for line detector
|
||||
if seg_im.size[0] > 224 and seg_im.size[1] > 224:
|
||||
|
||||
if use_precomp_lines:
|
||||
# to numpy
|
||||
center_im = np.asarray(seg_im)
|
||||
# lbl_ind
|
||||
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
|
||||
lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
|
||||
# lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
|
||||
lbl_ind_x = np.load(lines_file).astype(int)
|
||||
else:
|
||||
# prepare input
|
||||
inputs = preprocess_line_input(seg_im, 1, shift=0)
|
||||
center_im = re_transform(inputs[4]) # to pil image
|
||||
center_im = np.asarray(center_im) # to numpy
|
||||
# apply network
|
||||
#print(inputs.shape)
|
||||
output = apply_detector(inputs, model_fcn, device)
|
||||
# visualize_net_output(center_im, output, cunei_id=1, num_classes=2)
|
||||
# plt.show()
|
||||
|
||||
# prepare output
|
||||
outprob = np.mean(output, axis=0)
|
||||
lbl_ind = np.argmax(outprob, axis=0)
|
||||
|
||||
lbl_ind_x = lbl_ind.copy()
|
||||
lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0 # line detector dependent (VIP) # outprob.squeeze() # this fixes a bug!
|
||||
|
||||
lbl_ind_80 = lbl_ind.copy()
|
||||
lbl_ind_80[np.max(outprob, axis=0) < 0.8] = 0 # outprob.squeeze() # this fixes a bug!
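
# A toy sketch of the averaging and thresholding step above. The probabilities
# and the array shape (crops, classes, height, width) are assumptions made for
# illustration; only the 0.7 cut-off is taken from the code.

```python
import numpy as np

# hypothetical line-class probabilities for two crops of a 1x3 image
p_line = np.array([[[0.95, 0.7, 0.1]],
                   [[0.85, 0.5, 0.3]]])
output = np.stack([1 - p_line, p_line], axis=1)  # shape (2, 2, 1, 3)

outprob = np.mean(output, axis=0)     # average over crops
lbl_ind = np.argmax(outprob, axis=0)  # most likely class per pixel

# suppress pixels whose winning class is not confident enough
lbl_ind_x = lbl_ind.copy()
lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0
print(lbl_ind, lbl_ind_x)
```

# The middle pixel is labelled as line by argmax (mean probability 0.6) but is
# reset to background by the confidence threshold.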
|
||||
|
||||
# only continue if there is a positive line detection
|
||||
# (avoids unnecessary computation and an error in skimage hough_line_peaks)
|
||||
if np.any(lbl_ind_x):
|
||||
|
||||
# for line detection apply postprocessing pipeline
|
||||
(line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
|
||||
h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max, verbose=verbose)
|
||||
|
||||
if len(line_segs) > 0:
|
||||
# compute overlay
|
||||
seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
|
||||
image_label_overlay = label2rgb(seg_canvas, image=center_im)
|
||||
|
||||
# using line annotations: gt_line_idx for hypo_lines
|
||||
gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx, line_hypos.groupby('label').mean())
|
||||
|
||||
if len(gt_line_assignment) > 0:
|
||||
# clean join on line_hypos
|
||||
line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
|
||||
## clean join on line_hypos_agg
|
||||
# line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
|
||||
|
||||
if len(tl_df) > 0:
|
||||
|
||||
# abort if obvious transliteration / lines mismatch
|
||||
if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
|
||||
print("CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
|
||||
continue
|
||||
|
||||
#### line-transliteration alignment problem ####
|
||||
|
||||
line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
|
||||
segm_labels, stats, center_im, sign_detections,
|
||||
visualize=visualize_tl_line_matching,
|
||||
align_opt=align_opt)
|
||||
|
||||
# FINISH lines-tl alignment
|
||||
|
||||
# create line fragment (tl_line should be assigned before?!)
|
||||
line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
|
||||
# get assigned tl indices
|
||||
assigned_tl_indices = line_frag.get_assigned_lines_idx()
|
||||
# get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
|
||||
hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
|
||||
|
||||
# evaluate line-tl alignment using gt-line annotations; only quality indicator because unreliable
|
||||
if len(gt_line_assignment) > 0 and verbose:
|
||||
eval_line_tl_alignment(line_frag, lines_anno, seg_idx, num_vis_lines)
|
||||
|
||||
# common colormap
|
||||
# color = plt.cm.jet(np.linspace(0,1,len(angles)))
|
||||
cmap = plt.get_cmap('nipy_spectral')
|
||||
color = cmap(np.linspace(0, 1, len(line_hypos)))
|
||||
|
||||
# estimate scale
|
||||
if False:
|
||||
if len(tl_df) == 0:
|
||||
# use line detection estimates
|
||||
num_lines = line_hypos.label.nunique()
|
||||
len_max = line_hypos.groupby('label').mean().accum.max() / dist_interline_median
|
||||
|
||||
# get scales using different approaches
|
||||
# use num_lines for scale estimation (NOT num_vis_lines!)
|
||||
print_scale_stats(seg_rec, scale, lbl_ind_x, lbl_ind_80, num_lines, len_max,
|
||||
line_hypos, dist_interline_median)
|
||||
|
||||
if False:
|
||||
show_line_skeleton(lbl_ind_x, skeleton)
|
||||
plt.show()
|
||||
|
||||
if False:
|
||||
show_hough_transform_w_lines(lbl_ind_x, center_im, h, theta, d, line_hypos, color)
|
||||
|
||||
if len(line_segs) > 0:
|
||||
if False:
|
||||
show_probabilistic_hough(lbl_ind_x, center_im, line_segs, ls_labels, group2line, color)
|
||||
|
||||
if False:
|
||||
show_line_segms(image_label_overlay, segm_labels)
|
||||
|
||||
if len(tl_df) > 0:
|
||||
if False:
|
||||
show_lines_tl_alignment(lbl_ind_x, center_im, line_hypos, color)
|
||||
|
||||
if False:
|
||||
show_score_mats_with_paths(assigned_tl_indices, hypo_line_indices, tl_line_indices, line_frag)
|
||||
|
||||
if True:
|
||||
|
||||
if show_sign_alignments:
|
||||
aligned_list, tablet_tl_df = line_frag.tab_visualize_gm_alignments(refined=True) # refined=True, does not help/hurt
|
||||
else:
|
||||
refined = False
|
||||
if param_dict is not None:
|
||||
if 'refined' in param_dict:
|
||||
refined = param_dict['refined']
|
||||
aligned_list, tablet_tl_df = line_frag.tab_get_gm_alignments(refined=refined,
|
||||
param_dict=param_dict) # refined=True, does not help/hurt
|
||||
if len(gt_boxes) > 0:
|
||||
|
||||
if use_new_eval:
|
||||
if len(aligned_list) > 0:
|
||||
all_boxes = [[el] for el in aligned_list]
|
||||
if False:
|
||||
# standard mAP eval
|
||||
eval_basic.eval_segment(all_boxes, gt_boxes, gt_labels, seg_idx, verbose=verbose)
|
||||
# fast evaluation
|
||||
eval_fast.eval_segment(all_boxes, gt_boxes, gt_labels, seg_idx, verbose=verbose)
|
||||
# get segment statistics of current segment [-1]
|
||||
num_tp, num_fp, _, acc, mean_ap, global_ap = eval_fast.get_seg_summary(-1)
|
||||
# save acc to array
|
||||
acc_array[didx_list.index(didx)] = acc
|
||||
# save naligned to array
|
||||
naligned_array[didx_list.index(didx)] = num_tp + num_fp
|
||||
else:
|
||||
# prepare full collection evaluation
|
||||
list_pred_boxes_df.append(get_pred_boxes_df([[el] for el in aligned_list], seg_idx))
|
||||
list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
|
||||
|
||||
# get num aligned across all classes
|
||||
naligned = np.sum([len(el) for i, el in enumerate(aligned_list) if i > 0])
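
# A tiny sketch of the count above, assuming aligned_list holds one array of
# boxes per class and that index 0 is a background class that is skipped
# (the array contents here are placeholders).

```python
import numpy as np

# hypothetical per-class detections: 2 background boxes, then 1, 0 and 3 boxes
toy_aligned_list = [np.zeros((2, 5)), np.zeros((1, 5)), np.zeros((0, 5)), np.zeros((3, 5))]

# count aligned boxes across all classes except index 0
toy_naligned = int(np.sum([len(el) for i, el in enumerate(toy_aligned_list) if i > 0]))
print(toy_naligned)
```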
|
||||
|
||||
if verbose and len(aligned_list) > 0:
|
||||
# [METHOD B]: evaluate mAP and print stats for a single segment
|
||||
# (these results can strongly differ from collection-wise evaluation)
|
||||
acc, df_stats = compute_accuracy(gt_boxes, gt_labels, aligned_list, return_stats=True)
|
||||
# save acc to array
|
||||
acc_array[didx_list.index(didx)] = acc
|
||||
|
||||
ntfpos = df_stats.tp.sum() + df_stats.fp.sum()
|
||||
# print ntfpos, naligned
|
||||
|
||||
# save naligned to array
|
||||
naligned_array[didx_list.index(didx)] = ntfpos # naligned
|
||||
|
||||
if generate_and_save:
|
||||
line_frag.tab_generate_training_data(collection_subfolder, train_data_ext_file,
|
||||
image_name, image_path, scale, seg_idx, seg_bbox,
|
||||
tablet_tl_df, lbl_list, append=True)
|
||||
|
||||
else:
|
||||
print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
|
||||
|
||||
else:
|
||||
print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
|
||||
|
||||
# make plots appear
|
||||
plt.show()
|
||||
|
||||
# full collection eval
|
||||
acc = 0
|
||||
df_stats = []
|
||||
if use_new_eval:
|
||||
eval_fast.prepare_eval_collection()
|
||||
df_stats, global_ap = eval_fast.eval_collection(verbose=verbose)
|
||||
num_tp, num_fp, num_fp_global, acc = eval_fast.get_col_summary()
|
||||
else:
|
||||
if len(list_gt_boxes_df) > 0:
|
||||
# [METHOD C]: compute mAP across all instances of individual classes
|
||||
# (these results can strongly differ from segment-wise evaluation)
|
||||
gt_boxes_df = pd.concat(list_gt_boxes_df, ignore_index=True)
|
||||
pred_boxes_df = pd.concat(list_pred_boxes_df, ignore_index=True)
|
||||
acc, df_stats = eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=None) # set fixed!
|
||||
return acc, df_stats # acc_array, naligned_array
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,187 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
from tqdm import tqdm
|
||||
|
||||
from skimage.color import label2rgb
|
||||
|
||||
from ..transliteration.TransliterationSet import TransliterationSet
|
||||
from ..transliteration.SignsStats import SignsStats
|
||||
|
||||
from ..evaluations.sign_evaluation_gt import prepare_segment_gt
|
||||
|
||||
from ..alignment.line_tl_alignment import compute_line_tl_alignment
|
||||
from ..alignment.LineFragment import LineFragment, plot_boxes
|
||||
|
||||
from ..detection.line_detection import prepare_transliteration, post_process_line_detections, compute_image_label_map
|
||||
|
||||
from ..utils.bbox_utils import convert_bbox_global2local, box_iou
|
||||
from ..utils.nms import nms
|
||||
|
||||
|
||||
def convert_sign_rec_to_array(seg_gen_annos, relative_bboxes, scale):
    """ Maintains data frame index in last column """
    list_detections = []
    for anno_idx, anno_rec in seg_gen_annos.iterrows():
        # [ID, cx, cy, score, x1, y1, x2, y2, idx]
        temp = np.zeros(9)
        box = np.array(relative_bboxes[anno_idx]) * scale
        temp[0] = anno_rec.newLabel
        temp[1] = (box[2] + box[0]) / 2
        temp[2] = (box[3] + box[1]) / 2
        temp[3] = anno_rec.det_score
        temp[4:8] = box[0:4]
        temp[8] = anno_idx
        list_detections.append(temp)
    # stack
    detections_arr = np.vstack(list_detections)
    return detections_arr
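
# A minimal sketch of the row layout built above,
# [ID, cx, cy, score, x1, y1, x2, y2, idx], for a single box. The helper name
# and the numbers are hypothetical.

```python
import numpy as np

def box_to_detection_row(label, score, box, idx, scale=1.0):
    # scale the box and derive its center from the corner coordinates
    x1, y1, x2, y2 = np.asarray(box, dtype=float) * scale
    return np.array([label, (x1 + x2) / 2, (y1 + y2) / 2, score, x1, y1, x2, y2, idx])

row = box_to_detection_row(label=12, score=0.9, box=(10, 20, 30, 60), idx=5)
print(row)
```

# Here the center lands at (20, 40) and the annotation index 5 is kept in the
# last column, so a row can be traced back to its data frame record.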
|
||||
|
||||
|
||||
def gen_cond_hypo_alignments(didx_list, dataset, bbox_anno, lines_anno, anno_df, relative_path, saa_version,
|
||||
collection_subfolder, train_data_ext_file, lbl_list, generate_and_save,
|
||||
min_dets_inline=2, ncompl_thresh=20, smooth_y=True, max_dist_det=3,
|
||||
line_model_version='v007', visualize_hypos=False):
|
||||
|
||||
# setup transliteration set
|
||||
tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
|
||||
# setup sign statistics
|
||||
stats = SignsStats(tblSignHeight=128)
|
||||
|
||||
# for seg_im, seg_idx in dataset:
|
||||
for didx in tqdm(didx_list, desc=saa_version):
|
||||
seg_im, seg_idx = dataset[didx]
|
||||
# access meta
|
||||
seg_rec = dataset.assigned_segments_df.loc[seg_idx]
|
||||
image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
|
||||
# load transliteration dataframe
|
||||
tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=True)
|
||||
|
||||
if len(tl_df) > 0: # only continue if transliteration is available
|
||||
tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
|
||||
print(float(len_min) / len_max, num_vis_lines)
|
||||
|
||||
# boxes file
|
||||
|
||||
# select generated annos per segment
|
||||
seg_gen_annos = anno_df[anno_df.seg_idx == seg_idx]
|
||||
|
||||
if False:
|
||||
# control completeness filter (redundant - additional filter inside create conditional hypos)
|
||||
filter_nms = False
|
||||
compl_thresh = -1 # 0, 2, 3, 4, 5, 6 disable: -1
|
||||
ncompl_thresh = -1 # 10, 15, 20 disable: -1
|
||||
# filter using nms
|
||||
if filter_nms:
|
||||
seg_gen_annos = seg_gen_annos[seg_gen_annos.nms_keep]
|
||||
if compl_thresh > -1:
|
||||
# filter using compl
|
||||
seg_gen_annos = seg_gen_annos[seg_gen_annos.compl > compl_thresh]
|
||||
if ncompl_thresh > -1:
|
||||
# filter using ncompl (normalized completeness)
|
||||
seg_gen_annos = seg_gen_annos[seg_gen_annos.ncompl > ncompl_thresh]
|
||||
|
||||
if len(seg_gen_annos) > 0:
|
||||
|
||||
# convert to all boxes
|
||||
relative_bboxes = seg_gen_annos.bbox.apply(lambda x: convert_bbox_global2local(x, list(seg_bbox)))
|
||||
sign_detections = convert_sign_rec_to_array(seg_gen_annos, relative_bboxes, scale)
|
||||
|
||||
# load and prepare annotations of segment
|
||||
gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
|
||||
with_star_crop=False) # depends on sign_detections!
|
||||
print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
|
||||
|
||||
# make sure seg image is large enough for line detector
|
||||
if seg_im.size[0] > 224 and seg_im.size[1] > 224 and len(tl_df) > 0:
|
||||
|
||||
# prepare input
|
||||
# to numpy
|
||||
center_im = np.asarray(seg_im)
|
||||
# lbl_ind
|
||||
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
|
||||
lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
|
||||
# lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
|
||||
lbl_ind_x = np.load(lines_file).astype(int)
|
||||
|
||||
# only continue if there is a positive line detection
|
||||
# (avoids unnecessary computation and an error in skimage hough_line_peaks)
|
||||
if np.any(lbl_ind_x):
|
||||
|
||||
# for line detection apply postprocessing pipeline
|
||||
(line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
|
||||
h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max)
|
||||
|
||||
if len(line_segs) > 0:
|
||||
# compute overlay
|
||||
seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
|
||||
image_label_overlay = label2rgb(seg_canvas, image=center_im)
|
||||
|
||||
# using line annotations: gt_line_idx for hypo_lines
|
||||
gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx,
|
||||
line_hypos.groupby('label').mean())
|
||||
|
||||
if len(gt_line_assignment) > 0:
|
||||
# clean join on line_hypos
|
||||
line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
|
||||
## clean join on line_hypos_agg
|
||||
# line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
|
||||
|
||||
if len(tl_df) > 0:
|
||||
|
||||
# abort if obvious transliteration / lines mismatch
|
||||
if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
|
||||
print(
|
||||
"CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
|
||||
continue
|
||||
|
||||
#### line-transliteration alignment problem ####
|
||||
# for train use: align_opt=[False, True, False] (use line annos)
|
||||
line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
|
||||
segm_labels, stats, center_im, sign_detections,
|
||||
visualize=False,
|
||||
align_opt=[False, False, True]) # CHANGE HERE
|
||||
|
||||
# FINISH lines-tl alignment
|
||||
|
||||
# create line fragment (tl_line should be assigned before?!)
|
||||
line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
|
||||
# get assigned tl indices
|
||||
assigned_tl_indices = line_frag.get_assigned_lines_idx()
|
||||
# get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
|
||||
hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
|
||||
|
||||
if visualize_hypos:
|
||||
# generate conditional hypo
|
||||
(tab_t_hypos, tab_t_anno_idx,
|
||||
tab_t_meta) = line_frag.tab_create_conditional_hypo_alignments(anno_df=anno_df,
|
||||
min_dets_inline=min_dets_inline, ncompl_thresh=ncompl_thresh,
|
||||
smooth_y=smooth_y, max_dist_det=max_dist_det)
|
||||
if len(tab_t_hypos) > 0:
|
||||
if False:
|
||||
# filter using nms
|
||||
nms_th = 0.6
|
||||
keep = nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=nms_th)
|
||||
tab_t_hypos = tab_t_hypos[keep]
|
||||
# visualize
|
||||
plot_boxes(tab_t_hypos[:, 4:8])
|
||||
plt.imshow(line_frag.input_im, cmap='gray')
|
||||
|
||||
# save to test
|
||||
if generate_and_save:
|
||||
line_frag.tab_generate_cond_hypo_training_data(collection_subfolder, train_data_ext_file,
|
||||
image_name, image_path, scale, seg_idx, seg_bbox,
|
||||
lbl_list, append=True, anno_df=anno_df,
|
||||
min_dets_inline=min_dets_inline, ncompl_thresh=ncompl_thresh,
|
||||
smooth_y=smooth_y, max_dist_det=max_dist_det)
|
||||
else:
|
||||
print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
|
||||
else:
|
||||
print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
|
||||
else:
|
||||
print('No detections for {}[{}]!'.format(image_name, seg_idx))
|
||||
|
||||
plt.show()
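The disabled NMS step in this file filters hypotheses with `nms(boxes, scores, threshold=nms_th)`, using columns `4:8` as boxes and column `3` as scores. The implementation in `..utils.nms` is not shown here, so the following is only a rough illustration of greedy non-maximum suppression in NumPy. `greedy_nms` is a hypothetical stand-in, and it assumes boxes are `[x1, y1, x2, y2]` with width `x2 - x1` (no `+ 1` convention).

```python
import numpy as np

def greedy_nms(boxes, scores, threshold=0.6):
    """Return indices of kept boxes, highest score first (illustrative sketch)."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # drop everything that overlaps the kept box too much
        order = rest[iou <= threshold]
    return keep

hypos = np.array([
    # ID, cx,  cy, score, x1, y1, x2, y2
    [1, 5.0, 5, 0.9, 0, 0, 10, 10],
    [1, 5.5, 5, 0.8, 1, 0, 10, 10],   # IoU with the first box is 0.9
    [2, 25., 5, 0.7, 20, 0, 30, 10],  # no overlap with the others
])
keep = greedy_nms(hypos[:, 4:8], hypos[:, 3], threshold=0.6)
print(hypos[keep])
```

With a threshold of 0.6, the second hypothesis is suppressed by the higher-scoring first one, while the third survives because it does not overlap.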
|
||||
|
||||
@@ -0,0 +1,136 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
from tqdm import tqdm
|
||||
|
||||
from skimage.color import label2rgb
|
||||
|
||||
from ..transliteration.TransliterationSet import TransliterationSet
|
||||
from ..transliteration.SignsStats import SignsStats
|
||||
|
||||
from ..evaluations.sign_evaluation_gt import prepare_segment_gt
|
||||
|
||||
from ..alignment.line_tl_alignment import compute_line_tl_alignment
|
||||
from ..alignment.LineFragment import LineFragment, plot_boxes
|
||||
|
||||
from ..detection.line_detection import prepare_transliteration, post_process_line_detections, compute_image_label_map
|
||||
|
||||
|
||||
def gen_null_hypo_alignments(didx_list, dataset, bbox_anno, lines_anno, relative_path, saa_version,
|
||||
collection_subfolder, train_data_ext_file, lbl_list, generate_and_save,
|
||||
line_model_version='v007', visualize_hypos=False):
|
||||
|
||||
# setup transliteration set
|
||||
tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
|
||||
# setup sign statistics
|
||||
stats = SignsStats(tblSignHeight=128)
|
||||
|
||||
# for seg_im, seg_idx in dataset:
|
||||
for didx in tqdm(didx_list, desc=saa_version):
|
||||
seg_im, seg_idx = dataset[didx]
|
||||
# access meta
|
||||
seg_rec = dataset.assigned_segments_df.loc[seg_idx]
|
||||
image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
|
||||
# load transliteration dataframe
|
||||
tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=True)
|
||||
|
||||
if len(tl_df) > 0: # only continue if transliteration is available
|
||||
tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
|
||||
print(float(len_min) / len_max, num_vis_lines)
|
||||
|
||||
# load and prepare annotations of segment
|
||||
gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
|
||||
with_star_crop=False) # depends on sign_detections!
|
||||
print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
|
||||
|
||||
sign_detections = None
|
||||
|
||||
# make sure seg image is large enough for line detector
|
||||
if seg_im.size[0] > 224 and seg_im.size[1] > 224 and len(tl_df) > 0:
|
||||
|
||||
# prepare input
|
||||
# to numpy
|
||||
center_im = np.asarray(seg_im)
|
||||
# lbl_ind
|
||||
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
|
||||
lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
|
||||
# lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
|
||||
lbl_ind_x = np.load(lines_file).astype(int)
|
||||
|
||||
# only continue if there is a positive line detection
|
||||
# (avoids unnecessary computation and an error in skimage hough_line_peaks)
|
||||
if np.any(lbl_ind_x):
|
||||
|
||||
# for line detection apply postprocessing pipeline
|
||||
(line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
|
||||
h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max)
|
||||
|
||||
if len(line_segs) > 0:
|
||||
# compute overlay
|
||||
seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
|
||||
image_label_overlay = label2rgb(seg_canvas, image=center_im)
|
||||
|
||||
# using line annotations: gt_line_idx for hypo_lines
|
||||
gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx,
|
||||
line_hypos.groupby('label').mean())
|
||||
|
||||
if len(gt_line_assignment) > 0:
|
||||
# clean join on line_hypos
|
||||
line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
|
||||
## clean join on line_hypos_agg
|
||||
# line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
|
||||
|
||||
if len(tl_df) > 0:
|
||||
|
||||
# abort if obvious transliteration / lines mismatch
|
||||
if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
|
||||
print(
|
||||
"CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
|
||||
continue
|
||||
|
||||
#### line-transliteration alignment problem ####
|
||||
# for train use: align_opt=[False, True, False] (use line annos)
|
||||
line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
|
||||
segm_labels, stats, center_im, sign_detections,
|
||||
visualize=False,
|
||||
align_opt=[True, False, False]) # CHANGE HERE
|
||||
|
||||
# FINISH lines-tl alignment
|
||||
|
||||
# create line fragment (tl_line should be assigned before?!)
|
||||
line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
|
||||
# get assigned tl indices
|
||||
assigned_tl_indices = line_frag.get_assigned_lines_idx()
|
||||
# get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
|
||||
hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
|
||||
|
||||
if visualize_hypos:
|
||||
# generate null hypo
|
||||
tab_t_hypos = line_frag.tab_create_null_hypo_alignments()
|
||||
if len(tab_t_hypos) > 0:
|
||||
if False:
|
||||
# filter using nms
|
||||
nms_th = 0.6
|
||||
keep = nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=nms_th)
|
||||
tab_t_hypos = tab_t_hypos[keep]
|
||||
# visualize
|
||||
plot_boxes(tab_t_hypos[:, 4:8])
|
||||
plt.imshow(line_frag.input_im, cmap='gray')
|
||||
|
||||
# save to test
|
||||
if generate_and_save:
|
||||
line_frag.tab_generate_null_hypo_training_data(collection_subfolder,
|
||||
train_data_ext_file,
|
||||
image_name, image_path, scale, seg_idx,
|
||||
seg_bbox,
|
||||
lbl_list, append=True)
|
||||
else:
|
||||
print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
|
||||
else:
|
||||
print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
|
||||
|
||||
# print plot
|
||||
plt.show()
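Both alignment generators skip a segment when the number of transliterated lines and the number of detected line hypotheses differ by more than 10. A minimal sketch of that guard, assuming toy data frames with the `line_idx` and `label` columns used above (the helper name `lines_mismatch` is hypothetical):

```python
import pandas as pd

MAX_LINE_DIFF = 10  # same threshold as in the code above

def lines_mismatch(tl_df, line_hypos, max_diff=MAX_LINE_DIFF):
    """True if the transliteration and the line hypotheses obviously disagree."""
    n_tl = tl_df["line_idx"].nunique()
    n_hypo = line_hypos["label"].nunique()
    return abs(n_tl - n_hypo) > max_diff

tl_df = pd.DataFrame({"line_idx": [0, 0, 1, 2, 2, 3]})  # 4 transliterated lines
few = pd.DataFrame({"label": [1, 1, 2, 3]})             # 3 detected lines
many = pd.DataFrame({"label": list(range(1, 20))})      # 19 detected lines

print(lines_mismatch(tl_df, few), lines_mismatch(tl_df, many))
```

The check is deliberately coarse: it only rejects cases where an alignment is very unlikely to succeed and leaves smaller discrepancies to the alignment step itself.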
|
||||
|
||||
@@ -0,0 +1,7 @@
|
||||
### Dataset classes

- `lines_dataset.py`: used for line segmentation training
- `cunei_dataset.py`: used for sign classification training
- `cunei_dataset_ssd.py`: used for sign detector training
- `cunei_dataset_segments.py`: used for evaluation on full tablet images (image + bbox annotations)
- `segments_dataset.py`: used for evaluation on full tablet images (image only)
|
||||
@@ -0,0 +1,237 @@
|
||||
import numpy as np  # np is used below; do not rely on the wildcard import to provide it
import pandas as pd
|
||||
from future.utils import iteritems
|
||||
from tqdm import tqdm
|
||||
from ast import literal_eval
|
||||
|
||||
from PIL import Image
|
||||
import torch.utils.data as data
|
||||
|
||||
from ..utils.bbox_utils import *
|
||||
from ..utils.transform_utils import crop_pil_image, spatial_sample
|
||||
|
||||
DEBUG_MODE = False
|
||||
|
||||
|
||||
class CuneiformCollection(data.Dataset):
|
||||
|
||||
def __init__(self, params, transform=None, target_transform=None, relative_path='../', split='train', top_k=-1, top_k_pick=-1, pad_to_square=True):
|
||||
|
||||
self.gray_mean = params['gray_mean']
|
||||
self.context_pad = params['context_pad']
|
||||
if 'test' in split:
|
||||
self.context_pad = 0 # no padding needed
|
||||
self.num_classes = params['num_classes']
|
||||
self.min_align_ratio = 0.6
|
||||
if 'min_align_ratio' in params:
|
||||
self.min_align_ratio = params['min_align_ratio']
|
||||
|
||||
# transforms for data preparation
|
||||
self.transform = transform
|
||||
self.target_transform = target_transform
|
||||
self.pad_to_square = pad_to_square
|
||||
|
||||
self.compl_thresh, self.ncompl_thresh = -1, -1
|
||||
if 'compl_thresh' in params:
|
||||
self.compl_thresh = params['compl_thresh']
|
||||
if 'ncompl_thresh' in params:
|
||||
self.ncompl_thresh = params['ncompl_thresh']
|
||||
|
||||
# load annotations
|
||||
annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, split)
|
||||
meta_df = pd.read_csv(annotation_file, engine='python') # read annotation file
|
||||
|
||||
# additional annos (investigate impact of additional train data)
|
||||
if 'train' in split and 'extra_collections' in params:
|
||||
list_annos = [meta_df]
|
||||
for collection in params['extra_collections']:
|
||||
annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
|
||||
anno_df = pd.read_csv(annotation_file, engine='python') # read annotation file
|
||||
list_annos.append(anno_df)
|
||||
meta_df = pd.concat(list_annos, ignore_index=True)
|
||||
|
||||
# add missing columns to meta_df
|
||||
nd_bbox = np.array(meta_df['bbox'].apply(literal_eval).tolist()) # convert to ndarray
|
||||
meta_df['x1'] = nd_bbox[:, 0]
|
||||
meta_df['y1'] = nd_bbox[:, 1]
|
||||
meta_df['x2'] = nd_bbox[:, 2]
|
||||
meta_df['y2'] = nd_bbox[:, 3]
|
||||
meta_df['imageName'] = meta_df['tablet_CDLI'] + '.jpg'
|
||||
meta_df['image_path'] = '{}data/images/'.format(relative_path) + meta_df['collection'] \
|
||||
+ '/' + meta_df['imageName']
|
||||
|
||||
### load and prepare gen_df
|
||||
# append with gen alignments
|
||||
gen_cols = ['imageName', 'folder', 'image_path', 'label', 'train_label',
|
||||
'x1', 'y1', 'x2', 'y2', 'width', 'height', 'segm_idx',
|
||||
'line_idx', 'pos_idx', 'det_score', 'm_score', 'align_ratio', 'nms_keep', 'compl', 'ncompl']
|
||||
|
||||
# segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
|
||||
|
||||
collections_ext = [split]
|
||||
|
||||
if 'train' in split:
|
||||
# OPT I : use csv file that contains list of generated boxes
|
||||
if 'gen_file' in params:
|
||||
gen_df = pd.read_csv(params['gen_file'], engine='python', header=None, delimiter=', ', names=gen_cols) # delimiter might need to be removed?!
|
||||
# OPT II : load the collection-specific csv files and concatenate them
|
||||
elif 'gen_collections' in params:
|
||||
assert params.get('gen_folder') is not None, 'When using gen_collections, user needs to provide gen_folder!'
|
||||
df_list = []
|
||||
for gen_coll in params['gen_collections']:
|
||||
gen_file_path = "{}results/{}line_generated_bboxes_refined80_{}.csv".format(relative_path,
|
||||
params['gen_folder'], gen_coll)
|
||||
gen_df = pd.read_csv(gen_file_path, delimiter=r',\s*', engine='python', header=None, names=gen_cols)  # raw-string regex: comma plus optional whitespace
|
||||
df_list.append(gen_df)
|
||||
gen_df = pd.concat(df_list, ignore_index=True)
|
||||
# prepare gen_df
|
||||
if ('gen_file' in params) or ('gen_collections' in params):
|
||||
|
||||
# IMPORTANT: filter gen data according to align ratio
|
||||
gen_df = gen_df[gen_df.align_ratio > self.min_align_ratio]
|
||||
|
||||
# IMPORTANT: fill nan values in a way that avoids filtering
|
||||
gen_df.compl = gen_df.compl.fillna(50)
|
||||
gen_df.ncompl = gen_df.ncompl.fillna(100)
|
||||
|
||||
num_before_filter = len(gen_df)
|
||||
if self.compl_thresh > -1:
|
||||
# filter using compl
|
||||
gen_df = gen_df[gen_df.compl > self.compl_thresh] # 0, 2, 4, 5
|
||||
print('Completeness {} :: Removed {} samples. [{}]'.format(self.compl_thresh,
|
||||
num_before_filter - len(gen_df),
|
||||
len(gen_df)))
|
||||
elif self.ncompl_thresh > -1:
|
||||
# filter using ncompl (normalized completeness)
|
||||
gen_df = gen_df[gen_df.ncompl > self.ncompl_thresh] # 0, 2, 4, 5
|
||||
print('Completeness (norm.) {} :: Removed {} samples. [{}]'.format(self.ncompl_thresh,
|
||||
num_before_filter - len(gen_df),
|
||||
len(gen_df)))
|
||||
print('class sample count stats: ')
|
||||
print(gen_df.train_label.value_counts().describe())
|
||||
|
||||
# add/update additional columns
|
||||
gen_df['collection'] = gen_df.folder.str.split('/').str[0]
|
||||
gen_df['generated'] = True
|
||||
gen_df['imageName'] = gen_df['imageName'].astype(str) + '.jpg'
|
||||
|
||||
# identify all collections with generated annotations
|
||||
list_gen_collection = gen_df.collection.unique().tolist()
|
||||
collections_ext += list_gen_collection
|
||||
|
||||
# concatenate
|
||||
meta_df = pd.concat([meta_df, gen_df], ignore_index=True)
|
||||
|
||||
# drop outlier classes for now (dirty fix)
|
||||
class_outlier_select = meta_df.train_label < 240
|
||||
if np.any(~class_outlier_select):  # mask is True for inliers; only report and filter if outliers exist
|
||||
print('Drop {} outlier samples!'.format(np.sum(~class_outlier_select)))
|
||||
meta_df = meta_df[class_outlier_select]
|
||||
|
||||
# reset index
|
||||
self.meta_df = meta_df.reset_index(drop=True)
|
||||
|
||||
# make sure there is width and height
|
||||
self.meta_df['width'] = self.meta_df['x2'] - self.meta_df['x1'] + 1
|
||||
self.meta_df['height'] = self.meta_df['y2'] - self.meta_df['y1'] + 1
|
||||
|
||||
# only keep the top_k most frequent classes
|
||||
if top_k > 0:
|
||||
top_labels = self.meta_df.label.value_counts()[:top_k].index.values
|
||||
top_select = self.meta_df.label.isin(top_labels)
|
||||
self.meta_df = self.meta_df[top_select].reset_index(drop=True)
|
||||
if top_k > top_k_pick >= 0:
|
||||
print(top_labels)
|
||||
print('Only select samples from class {}'.format(top_labels[top_k_pick]))
|
||||
class_select = self.meta_df.label == top_labels[top_k_pick]
|
||||
self.meta_df = self.meta_df[class_select].reset_index(drop=True)
|
||||
|
||||
# all annotations are used
|
||||
self.osd_valid_ind = self.meta_df.index
|
||||
|
||||
# crop pre-processing
|
||||
# save longest side of each sign
|
||||
self.meta_df['square'] = self.meta_df[['width', 'height']].max(axis=1)
|
||||
# for each tablet compute median of longest side, and assign it to each sign
|
||||
median_table = self.meta_df[self.meta_df.train_label > 0].groupby('imageName')[['square']].median()
|
||||
self.meta_df = self.meta_df.join(median_table, on='imageName', rsuffix='_md')
|
||||
# self.meta_df['square_new'] = self.meta_df[['square', 'square_md']].max(axis=1)
|
||||
|
||||
# pre-load all images
|
||||
self.use_preload = True
|
||||
if self.use_preload:
|
||||
# map each unique image path to a position in image_data_list (avoid shadowing the built-in `map`)
idx2path = dict(enumerate(self.meta_df['image_path'][self.osd_valid_ind].unique()))
path2idx = {value: key for key, value in iteritems(idx2path)}
self.meta_df['mem_idx'] = self.meta_df['image_path'].map(path2idx)
self.image_data_list = []
for key, impath in tqdm(iteritems(idx2path), total=len(idx2path)):
    im_ref = None
    try:
        # convert to gray scale ('L') instead of 'RGB' due to memory constraints
        im_ref = Image.open(impath).convert('L')
    except IOError:
        print('could not read image: {}'.format(impath))
    # on failure the None placeholder is kept so that mem_idx stays aligned
    self.image_data_list.append(im_ref)
|
||||
|
||||
# setup finished
|
||||
print("Setup {} dataset spanning {} collections.".format(split, collections_ext))
|
||||
num_segs = self.meta_df['image_path'].nunique()
|
||||
print("Select {} bboxes from {} tablets.".format(len(self), num_segs))
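The filtering of generated annotations in `__init__` happens in three steps: drop boxes from poorly aligned lines, fill missing completeness values with a large constant so that they pass the threshold, then apply the optional completeness filter. A minimal sketch with toy values, reusing the column names `align_ratio` and `compl` and the fill value 50 from the code above:

```python
import numpy as np
import pandas as pd

gen_df = pd.DataFrame({
    "align_ratio": [0.9, 0.5, 0.8, 0.7],
    "compl":       [5.0, 6.0, np.nan, 1.0],
})
min_align_ratio, compl_thresh = 0.6, 2

# 1) keep only boxes from sufficiently well aligned lines
gen_df = gen_df[gen_df.align_ratio > min_align_ratio].copy()
# 2) missing completeness becomes a large value, so it survives the threshold
gen_df["compl"] = gen_df["compl"].fillna(50)
# 3) optional completeness filter (disabled when the threshold is -1)
if compl_thresh > -1:
    gen_df = gen_df[gen_df.compl > compl_thresh]

kept = gen_df.index.tolist()
print(kept)
```

Row 1 is removed by the alignment ratio, row 3 by the completeness threshold, and row 2 is kept although its completeness was missing.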
|
||||
|
||||
def __getitem__(self, index):
|
||||
|
||||
# map index to csv index
|
||||
csv_idx = self.osd_valid_ind[index]
|
||||
|
||||
impath = self.meta_df.iloc[csv_idx]['image_path']
|
||||
target = self.meta_df.iloc[csv_idx]['train_label']
|
||||
|
||||
square = self.meta_df.iloc[csv_idx]['square']
|
||||
square_md = self.meta_df.iloc[csv_idx]['square_md']
|
||||
|
||||
# load image data
|
||||
if self.use_preload:
|
||||
mem_idx = self.meta_df.iloc[csv_idx]['mem_idx']
|
||||
im_ref = self.image_data_list[mem_idx]
|
||||
else:
|
||||
im_ref = None
|
||||
try:
|
||||
im_ref = Image.open(impath).convert('L') # due to memory constraints not .convert('RGB')
|
||||
except IOError:
|
||||
print('could not read image: {}'.format(impath))
|
||||
|
||||
# bounding box meta
|
||||
bb = [self.meta_df.iloc[csv_idx]['x1'], self.meta_df.iloc[csv_idx]['y1'],
|
||||
self.meta_df.iloc[csv_idx]['x2'], self.meta_df.iloc[csv_idx]['y2']]
|
||||
|
||||
# context crop
|
||||
context_pad = self.context_pad  # alternative: int(square * self.context_pad)
|
||||
|
||||
if self.pad_to_square:
|
||||
# if background, context_pad = 0
|
||||
if target == 0:
|
||||
context_pad = 0
|
||||
# if largest side of bbox is smaller than median of tablet, add additional context pad
|
||||
elif square_md > square:
|
||||
context_pad += (square_md - square) / 2. # divide by 2, because w,h of im_pad grow by 2 * context_pad
|
||||
|
||||
# new fast
|
||||
im, bb_pad = crop_pil_image(im_ref, bb, context_pad=context_pad, pad_to_square=self.pad_to_square)
|
||||
|
||||
# apply augmentation pipeline and convert from PIL to numpy
|
||||
if self.transform is not None:
|
||||
im = self.transform(im)
|
||||
|
||||
if self.target_transform is not None:
|
||||
target = self.target_transform(target)
|
||||
|
||||
return im, target
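The padding rule in `__getitem__` can be read in isolation: background crops get no context, and signs smaller than the tablet median receive extra padding so that their padded crop matches the size of a padded median-sized sign. The sketch below mirrors that rule. The function name `effective_context_pad` is hypothetical, and it assumes, as the source comment states, that the crop grows by `2 * context_pad` per side length.

```python
def effective_context_pad(context_pad, target, square, square_md, pad_to_square=True):
    """Mirror of the padding rule above (illustrative helper, not part of the repo)."""
    if pad_to_square:
        if target == 0:         # background crop: no extra context
            return 0
        if square_md > square:  # sign is smaller than the tablet median
            return context_pad + (square_md - square) / 2.0
    return context_pad

# A sign with longest side 40 on a tablet whose median longest side is 60.
pad = effective_context_pad(context_pad=10, target=5, square=40, square_md=60)
padded_side = 40 + 2 * pad
print(pad, padded_side)
```

Here the padded side equals `60 + 2 * 10`, the side length that a median-sized sign would have with the base padding.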
|
||||
|
||||
def __len__(self):
|
||||
return len(self.osd_valid_ind)
|
||||
|
||||
|
||||
def test(params, split='train', top_k=-1, top_k_pick=-1, pad_to_square=True, relative_path='../../'):
|
||||
dataset = CuneiformCollection(params, relative_path=relative_path, split=split, top_k=top_k, top_k_pick=top_k_pick, pad_to_square=pad_to_square)
|
||||
|
||||
return dataset
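The annotation CSV stores each bounding box as a string, which `__init__` parses with `literal_eval` and expands into coordinate columns before computing inclusive widths and heights. A minimal sketch of that step on toy data (the two box strings are made up):

```python
from ast import literal_eval

import numpy as np
import pandas as pd

meta_df = pd.DataFrame({"bbox": ["[10, 20, 30, 60]", "[0, 5, 8, 9]"]})

# Parse the string representation and stack into an (N, 4) array.
nd_bbox = np.array(meta_df["bbox"].apply(literal_eval).tolist())
for i, col in enumerate(["x1", "y1", "x2", "y2"]):
    meta_df[col] = nd_bbox[:, i]

# Inclusive pixel convention, as in the dataset class: width = x2 - x1 + 1.
meta_df["width"] = meta_df["x2"] - meta_df["x1"] + 1
meta_df["height"] = meta_df["y2"] - meta_df["y1"] + 1
print(meta_df)
```

`literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for values read from a file.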
|
||||
@@ -0,0 +1,334 @@
|
||||
import torch
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from PIL import Image
|
||||
from ast import literal_eval
|
||||
import os.path
|
||||
from tqdm import tqdm
|
||||
|
||||
import torch.utils.data as data
|
||||
|
||||
from ..detection.sign_detection import crop_segment_from_tablet_im
|
||||
|
||||
# from utils.cython_bbox import bbox_overlaps
|
||||
from ..utils.bbox_utils import clip_boxes
|
||||
from ..utils.torchcv.transforms.crop_box import crop_box
|
||||
from ..utils.torchcv.transforms.resize import resize
|
||||
|
||||
|
||||
# helper functions
|
||||
|
||||
def convert_bbox_global2local(gbbox, seg_bbox):
|
||||
x, y = seg_bbox[:2]
|
||||
relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
|
||||
return relative_bbox.tolist()
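As a quick illustration of the global-to-local conversion: the top-left corner of the segment box is subtracted from both corners of the sign box. The function body is repeated so that the snippet runs on its own; the coordinates are made up.

```python
import numpy as np

def convert_bbox_global2local(gbbox, seg_bbox):
    # shift both box corners by the top-left corner of the segment
    x, y = seg_bbox[:2]
    relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
    return relative_bbox.tolist()

seg_bbox = [100, 200, 500, 800]   # segment in tablet image coordinates
sign_bbox = [150, 260, 190, 300]  # sign in tablet image coordinates
local = convert_bbox_global2local(sign_bbox, seg_bbox)
print(local)
```

Width and height of the box are unchanged by the conversion; only its origin moves.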
|
||||
|
||||
|
||||
def get_segment_meta(segment_rec):
|
||||
image_name = segment_rec.tablet_CDLI
|
||||
|
||||
# this should control which scale is used in consecutive processing
|
||||
scale = segment_rec.scale #* self.rescale
|
||||
|
||||
seg_bbox = segment_rec.bbox
|
||||
path_to_image = segment_rec.im_path
|
||||
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
|
||||
|
||||
return image_name, scale, seg_bbox, path_to_image, view_desc
|
||||
|
||||
|
||||
def bbox_ctr_overlaps(boxes1, boxes2):
    # for all combinations of boxes1 and boxes2, check if the centers of boxes2 lie inside boxes1
    overlaps_mat = np.zeros([boxes1.shape[0], boxes2.shape[0]])
    # the centers of boxes2 do not depend on the loop variable, so compute them once
    center = (boxes2[:, :2] + boxes2[:, 2:]) / 2
    for ii, box in enumerate(boxes1):
        x, y, x2, y2 = box
        # check if the center is inside the tile box, otherwise ignore the box:
        # if the center is not inside the tile box, it is not possible to get
        # IoU >= 0.5, so the box is treated as background anyway
        mask = (center[:, 0] >= x) & (center[:, 0] <= x2) \
            & (center[:, 1] >= y) & (center[:, 1] <= y2)
        overlaps_mat[ii, :] = mask
    return overlaps_mat
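The same center-in-box test can also be written without a Python loop by broadcasting an `(N, 1)` view of `boxes1` against a `(1, M)` view of the `boxes2` centers. This is an alternative sketch, not the repository's implementation; the name `bbox_ctr_overlaps_vec` and the toy boxes are assumptions.

```python
import numpy as np

def bbox_ctr_overlaps_vec(boxes1, boxes2):
    """Entry (i, j) is 1 if the center of boxes2[j] lies inside boxes1[i]."""
    centers = (boxes2[:, :2] + boxes2[:, 2:]) / 2                # (M, 2)
    cx, cy = centers[:, 0][None, :], centers[:, 1][None, :]      # (1, M) each
    x1, y1, x2, y2 = (boxes1[:, k][:, None] for k in range(4))   # (N, 1) each
    mask = (cx >= x1) & (cx <= x2) & (cy >= y1) & (cy <= y2)     # (N, M)
    return mask.astype(float)

tiles = np.array([[0, 0, 10, 10], [10, 0, 20, 10]])
signs = np.array([[2, 2, 6, 6], [8, 4, 14, 8], [30, 30, 40, 40]])
overlaps = bbox_ctr_overlaps_vec(tiles, signs)
print(overlaps)
```

The second sign straddles both tiles but is assigned only to the tile that contains its center, and the third sign lies outside both tiles.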
|
||||
|
||||
|
||||
# Cuneiform SSD dataset
|
||||
|
||||
|
||||
class CuneiformSegments(data.Dataset):
|
||||
|
||||
def __init__(self, collections=('train',), transform=None, relative_path='../',
|
||||
only_annotated=True, only_assigned=True, preload_segments=True, use_gray_scale=True):
|
||||
|
||||
# merge multiple data sources in order to provide following function:
|
||||
# f(idx) -> image, boxes, labels
|
||||
# uses gt annotations only
|
||||
# if no annotations available boxes and labels are empty lists
|
||||
|
||||
# transforms for data preparation
|
||||
self.transform = transform
|
||||
self.preload_segments = preload_segments
|
||||
self.use_gray_scale = use_gray_scale
|
||||
|
||||
### load and prepare list_sign_anno_df
|
||||
# manual annotation files may be based on multiple collections
|
||||
# for each collection
|
||||
# store in list_sign_anno_df
|
||||
|
||||
# load bbox annotations
|
||||
list_anno_collections = []
|
||||
sign_anno_df_list = []
|
||||
for collection in collections:
|
||||
# load sign annotations
|
||||
annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
|
||||
# ATTENTION: only use gt annotations if collection is provided in collections parameter
|
||||
if os.path.exists(annotation_file):
|
||||
sign_anno_df = pd.read_csv(annotation_file, engine='python') # read annotation file
|
||||
# add additional columns
|
||||
sign_anno_df['generated'] = False
|
||||
sign_anno_df['global_segm_idx'] = -1
|
||||
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(literal_eval)
|
||||
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(np.array) # convert to ndarray
|
||||
|
||||
# slice sign_anno_df if there are multiple different collections contained
|
||||
for sub_collection in sign_anno_df.collection.unique():
|
||||
# store collection name
|
||||
list_anno_collections.append(sub_collection)
|
||||
# store collection specific slice of data frame
|
||||
sub_sign_anno_df = sign_anno_df[sign_anno_df.collection == sub_collection]
|
||||
sign_anno_df_list.append(sub_sign_anno_df)
|
||||
|
||||
### extend collections
|
||||
# create list of elementary collections
|
||||
collections_ext = np.unique(list_anno_collections).tolist()
|
||||
|
||||
###################
|
||||
# II) on collection level: load annotations and meta data
|
||||
|
||||
### load segment, sign meta information
|
||||
# for each collection
|
||||
# store in segments_df_list
|
||||
|
||||
# reduced set of columns - only keep what is needed and maintained
|
||||
segments_df_columns = ['tablet_CDLI', 'view_desc', 'bbox', 'collection', 'scale', 'im_path']
|
||||
|
||||
segments_df_list = []
|
||||
#sign_anno_df_list = []
|
||||
for collection in collections_ext:
|
||||
|
||||
# load segment metadata
|
||||
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
|
||||
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
|
||||
# convert string of list to list
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
|
||||
# add image path column
|
||||
file_names = tablet_segments_df['tablet_CDLI'] + '.jpg'
|
||||
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + tablet_segments_df['collection'] + '/' + file_names
|
||||
# get assigned segment (can be edited from outside without harm)
|
||||
if only_assigned:
|
||||
assigned_segments_df = tablet_segments_df[(tablet_segments_df.assigned == True)]
|
||||
else:
|
||||
assigned_segments_df = tablet_segments_df
|
||||
|
||||
# collect data frames in lists
|
||||
segments_df_list.append(assigned_segments_df[segments_df_columns])
|
||||
|
||||
|
||||
### assemble ssd_segments_df with new index
|
||||
# search all segments with annotations
|
||||
|
||||
list_segments_df_anno = []
|
||||
for collection in collections_ext:
|
||||
coll_idx = collections_ext.index(collection)
|
||||
|
||||
list_segm_indices = []
|
||||
# get all segment indices for this collection that contain annotations
|
||||
if only_annotated:
|
||||
# if there are gt annotations
|
||||
if collection in list_anno_collections:
|
||||
anno_coll_idx = list_anno_collections.index(collection)
|
||||
# if the annotation data frame is not empty
|
||||
if len(sign_anno_df_list[anno_coll_idx]) > 0:
|
||||
# load their indices
|
||||
segm_indices_anno = sign_anno_df_list[anno_coll_idx].segm_idx.unique()
|
||||
# filter annotations without assigned segment
|
||||
segm_indices_anno = segm_indices_anno[segm_indices_anno >= 0]
|
||||
list_segm_indices.append(segm_indices_anno)
|
||||
# append only segments with anno
|
||||
if len(list_segm_indices) > 0:
|
||||
# stack to obtain list of segment indices with annotations
|
||||
segm_indices = np.unique(np.hstack(list_segm_indices))
|
||||
# append
|
||||
list_segments_df_anno.append(segments_df_list[coll_idx].loc[segm_indices])
|
||||
else:
|
||||
# append all segments from collection
|
||||
list_segments_df_anno.append(segments_df_list[coll_idx])
|
||||
|
||||
# create new datasets ssd_segment_df
|
||||
# concat dataframes and use reset_index to create column with old indices
|
||||
ssd_segments_df = pd.concat(list_segments_df_anno).reset_index()
|
||||
|
||||
# rename column to segm_idx
|
||||
ssd_segments_df = ssd_segments_df.rename(columns={ssd_segments_df.columns[0]: 'segm_idx'})
|
||||
|
||||
|
||||
###################
|
||||
# III) on segment level: load data and prepare dataset index
|
||||
|
||||
### assemble ssd_sign_anno_df and update ssd_segments_df
|
||||
# additional column for ssd_sign_anno_df: global_segm_idx
|
||||
# additional column for ssd_segments_df: with num_anno
|
||||
|
||||
sign_anno_df_cols = ['tablet_CDLI', 'mzl_label', 'train_label', 'segm_idx', 'collection',
|
||||
'generated', 'relative_bbox', 'global_segm_idx']
|
||||
# segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
|
||||
list_ssd_sign_anno_df = []
|
||||
|
||||
list_lines_annotated_per_segm = np.zeros(len(ssd_segments_df), dtype=bool)
|
||||
list_num_anno_per_segm = np.zeros(len(ssd_segments_df), dtype=int)
|
||||
|
||||
# iterate over segments
|
||||
for global_seg_idx, seg_rec in ssd_segments_df.iterrows():
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
segm_idx = seg_rec.segm_idx
|
||||
collection = seg_rec.collection
|
||||
# print(image_name, view_desc, segm_idx)
|
||||
coll_idx = collections_ext.index(collection)
|
||||
|
||||
### if annotations available for segment, append to list
|
||||
if collection in list_anno_collections:
|
||||
anno_coll_idx = list_anno_collections.index(collection)
|
||||
if len(sign_anno_df_list[anno_coll_idx]) > 0:
|
||||
sign_anno_df = sign_anno_df_list[anno_coll_idx]
|
||||
# select sign annos for segment
|
||||
segm_select = sign_anno_df.segm_idx == segm_idx
|
||||
if len(sign_anno_df[segm_select]) > 0:
|
||||
# update data frame column
|
||||
sign_anno_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
|
||||
# collect information
|
||||
sign_anno_seg = sign_anno_df[segm_select]
|
||||
list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
|
||||
list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
|
||||
|
||||
# add columns to ssd_segments_df
|
||||
ssd_segments_df['num_anno'] = np.array(list_num_anno_per_segm)
|
||||
|
||||
if len(list_ssd_sign_anno_df) > 0:
|
||||
# assemble ssd_sign_anno_df (drop old index)
|
||||
ssd_sign_anno_df = pd.concat(list_ssd_sign_anno_df, ignore_index=True)
|
||||
else:
|
||||
# create empty data frame with correct columns
|
||||
ssd_sign_anno_df = pd.DataFrame(columns=sign_anno_df_cols)
|
||||
|
||||
###################
|
||||
# IV) Preload: line detections and segment images
|
||||
|
||||
### preload segment images
|
||||
# crop segment and convert to gray scale
|
||||
# IMPORTANT: preload segment crops (without scaling, to save memory)
|
||||
|
||||
image_data_list = []
|
||||
if self.preload_segments:
|
||||
# iterate over segments
|
||||
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
|
||||
# load composite image
|
||||
pil_im = Image.open(image_path)
|
||||
# crop segment
|
||||
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
|
||||
# convert to gray scale and store in list
|
||||
if self.use_gray_scale:
|
||||
# convert to gray scale
|
||||
image_data_list.append(tablet_seg.convert('L'))
|
||||
else:
|
||||
image_data_list.append(tablet_seg)
|
||||
|
||||
|
||||
###################
|
||||
# VI) Dataset index
|
||||
|
||||
sample2tile_list = ssd_segments_df.index.values
|
||||
|
||||
###################
|
||||
# attach resulting data structures to class
|
||||
self.collections = collections
|
||||
self.collections_ext = collections_ext
|
||||
|
||||
self.ssd_segments_df = ssd_segments_df
|
||||
self.ssd_sign_anno_df = ssd_sign_anno_df
|
||||
|
||||
self.image_data_list = image_data_list
|
||||
|
||||
# self.sign_anno_df_list = sign_anno_df_list
|
||||
# self.segments_df_list = segments_df_list
|
||||
|
||||
self.sample2tile_list = sample2tile_list
|
||||
|
||||
# map from seg idx to dataset idx
|
||||
self.sidx2didx = dict(zip(ssd_segments_df.segm_idx.values, range(len(ssd_segments_df))))
|
||||
|
||||
# setup finished
|
||||
print("Setup dataset spanning {} collections with {} annotations [{} segments, {} indices]".format(
|
||||
len(collections_ext), len(ssd_sign_anno_df), len(ssd_segments_df), len(sample2tile_list)))
|
||||
|
||||
def __getitem__(self, index):
|
||||
# get segment
|
||||
global_seg_idx = self.sample2tile_list[index]
|
||||
seg_rec = self.ssd_segments_df.loc[global_seg_idx]
|
||||
|
||||
# load segment meta data
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
|
||||
# get sign annos
|
||||
select_segm = self.ssd_sign_anno_df.global_segm_idx == global_seg_idx
|
||||
segm_annos = self.ssd_sign_anno_df[select_segm]
|
||||
|
||||
# get annotated boxes and their labels
|
||||
if len(segm_annos) > 0:
|
||||
seg_boxes = np.stack(segm_annos.relative_bbox)
|
||||
labels = segm_annos.train_label.values
|
||||
# convert to torch tensors
|
||||
seg_boxes = torch.from_numpy(seg_boxes).float()
|
||||
labels = torch.from_numpy(labels)
|
||||
else:
|
||||
seg_boxes = None
|
||||
labels = None
|
||||
|
||||
# get segment image
|
||||
if self.preload_segments:
|
||||
pil_im = self.image_data_list[global_seg_idx]
|
||||
else:
|
||||
# load composite image
|
||||
pil_im = Image.open(image_path)
|
||||
# crop segment
|
||||
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
|
||||
if self.use_gray_scale:
|
||||
# convert to gray scale
|
||||
pil_im = tablet_seg.convert('L')
|
||||
else:
|
||||
pil_im = tablet_seg
|
||||
|
||||
# tensor functions adapted from kuangliu's code
|
||||
# https://github.com/kuangliu/torchcv/tree/master/torchcv/transforms
|
||||
|
||||
# scale segment
|
||||
im, boxes = resize(pil_im, seg_boxes, None, scale=scale)
|
||||
|
||||
# apply augmentation pipeline and convert from PIL to numpy
|
||||
if self.transform is not None:
|
||||
im, boxes, labels = self.transform(im, boxes, labels)
|
||||
|
||||
return im, boxes, labels
|
||||
|
||||
def get_seg_rec(self, index):
|
||||
# get segment
|
||||
global_seg_idx = self.sample2tile_list[index]
|
||||
return self.ssd_segments_df.loc[global_seg_idx]
|
||||
|
||||
def __len__(self):
|
||||
return len(self.sample2tile_list)
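
The index handling used during dataset setup (concatenate the per-collection segment tables, call `reset_index` so the old per-collection index becomes a `segm_idx` column, then build a segment-to-dataset lookup) can be sketched in isolation. This is only a minimal illustration: the two toy frames, the tablet identifiers and the collection names are invented, and only the pandas calls themselves are taken from the code above.

```python
import pandas as pd

# hypothetical per-collection segment tables; the index plays the role of segm_idx
seg_a = pd.DataFrame({"tablet_CDLI": ["P000001", "P000002"], "collection": ["a", "a"]}, index=[3, 7])
seg_b = pd.DataFrame({"tablet_CDLI": ["P000003"], "collection": ["b"]}, index=[3])

# concat keeps the old (possibly duplicated) indices; reset_index moves them into a column
merged = pd.concat([seg_a, seg_b]).reset_index()
# give the first column (the old index) an explicit name
merged = merged.rename(columns={merged.columns[0]: "segm_idx"})

# lookup from collection-local segment index to row position in the merged table
sidx2didx = dict(zip(merged.segm_idx.values, range(len(merged))))
```

Note that in this toy example both collections contain a segment with index 3, so the later entry overwrites the earlier one in the dictionary. Whether that can happen with the real data depends on how segment indices are assigned across collections.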
|
||||
|
||||
@@ -0,0 +1,696 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from PIL import Image
|
||||
from ast import literal_eval
|
||||
import os.path
|
||||
from tqdm import tqdm
|
||||
|
||||
import torch.utils.data as data
|
||||
|
||||
from ..detection.sign_detection import *
|
||||
|
||||
# from utils.cython_bbox import bbox_overlaps
|
||||
from ..utils.bbox_utils import clip_boxes
|
||||
from ..utils.transform_utils import convert2binaryPIL
|
||||
from ..utils.torchcv.transforms.crop_box import crop_box
|
||||
from ..utils.torchcv.transforms.resize import resize
|
||||
from ..utils.torchcv.transforms_lm.crop_box import crop_box_lm
|
||||
from ..utils.torchcv.transforms_lm.resize import resize_lm
|
||||
|
||||
from .lines_dataset import collect_line_coords, create_line_trafo
|
||||
|
||||
from ..detection.line_detection import compute_image_label_map
|
||||
|
||||
|
||||
# helper functions
|
||||
|
||||
|
||||
def convert_bbox_global2local(gbbox, seg_bbox):
    x, y = seg_bbox[:2]
    relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
    return relative_bbox.tolist()
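
As a small worked example of the coordinate shift performed by `convert_bbox_global2local`, the sketch below repeats the same arithmetic on made-up numbers. The segment box and sign box values are hypothetical and chosen only to make the subtraction easy to follow.

```python
import numpy as np


def convert_bbox_global2local(gbbox, seg_bbox):
    # same arithmetic as the helper: subtract the segment's upper-left corner
    x, y = seg_bbox[:2]
    return (np.array(gbbox) - np.array([x, y, x, y])).tolist()


seg_bbox = [100, 200, 900, 1200]  # hypothetical segment box in tablet coordinates
gbbox = [150, 260, 210, 330]      # hypothetical sign box in tablet coordinates
local_bbox = convert_bbox_global2local(gbbox, seg_bbox)
```

Only the upper-left corner of the segment enters the computation, so the box keeps its width and height and is merely translated into the segment's frame.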
|
||||
|
||||
|
||||
def get_segment_meta(segment_rec):
    image_name = segment_rec.tablet_CDLI

    # this should control which scale is used in subsequent processing
    scale = segment_rec.scale  # * self.rescale

    seg_bbox = segment_rec.bbox
    path_to_image = segment_rec.im_path
    view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")

    return image_name, scale, seg_bbox, path_to_image, view_desc
|
||||
|
||||
|
||||
def compute_tiles(imw, imh, scale, tile_shape=[600, 600], border_sz=100, w_step_sz=300, h_step_sz=400):
    # TODO: improve using linspace and allow overlap to vary

    # sign height should be around 130px; sign length, however, can be up to 300px
    # -> overlap along lines (300px) should be larger than between lines (200px)
    # -> this means for step sizes: w_step_sz < h_step_sz
    inv_scale = 1. / scale
    tile_shape = np.array(tile_shape) * inv_scale
    border_sz *= inv_scale
    w_step_sz *= inv_scale
    h_step_sz *= inv_scale

    # tile_shape is ordered (width, height)
    tile_ol_w = tile_shape[0] - w_step_sz
    tile_ol_h = tile_shape[1] - h_step_sz
    w_list = np.arange(border_sz, imw - border_sz - tile_ol_w, step=w_step_sz)
    h_list = np.arange(border_sz, imh - border_sz - tile_ol_h, step=h_step_sz)

    # grid pts represent upper left corner of tile box
    # tiles can be larger than image and need to be padded
    XX, YY = np.meshgrid(w_list, h_list)

    # compute bboxes
    ul_corner = np.rint(np.stack([XX.ravel(), YY.ravel()], axis=1)).astype(int)
    lr_corner = ul_corner + np.rint(tile_shape)
    bboxes = np.hstack([ul_corner, lr_corner])
    # make sure tiles are inside image boundaries
    bboxes = clip_boxes(bboxes, [imh, imw])  # [imh, imw] is correct order for this function

    return bboxes, XX, YY
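
The tiling arithmetic in `compute_tiles` can be checked on a toy image size. The sketch below is a simplified stand-in, not the repository function: it fixes the scale at 1, the name `tile_grid` is invented, and the final clamping with `np.clip` is only an assumption about what `clip_boxes` does, since that helper is not shown here.

```python
import numpy as np


def tile_grid(imw, imh, tile_w=600, tile_h=600, border=100, w_step=300, h_step=400):
    # overlap between neighbouring tiles along each axis
    ol_w = tile_w - w_step
    ol_h = tile_h - h_step
    # upper-left corners of the tiles
    xs = np.arange(border, imw - border - ol_w, step=w_step)
    ys = np.arange(border, imh - border - ol_h, step=h_step)
    XX, YY = np.meshgrid(xs, ys)
    ul = np.stack([XX.ravel(), YY.ravel()], axis=1)
    lr = ul + np.array([tile_w, tile_h])
    boxes = np.hstack([ul, lr])
    # assumed stand-in for clip_boxes: clamp coordinates to the image
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, imw - 1)
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, imh - 1)
    return boxes


boxes = tile_grid(1500, 1200)
```

For a 1500 x 1200 image this gives four tile columns (x = 100, 400, 700, 1000) and two tile rows (y = 100, 500). The smaller horizontal step produces the larger overlap along text lines that the comments in `compute_tiles` ask for.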
|
||||
|
||||
|
||||
def bbox_ctr_overlaps(boxes1, boxes2):
    # check for all combinations of boxes1 and boxes2 if ctrs of boxes2 are in boxes1
    overlaps_mat = np.zeros([boxes1.shape[0], boxes2.shape[0]])
    # centers of boxes2 do not depend on the current box of boxes1, so compute them once
    center = (boxes2[:, :2] + boxes2[:, 2:]) / 2
    for ii, box in enumerate(boxes1):
        x, y, x2, y2 = box
        # check if center is still inside tile_box, otherwise ignore box
        # if center is not inside tile box,
        # not possible to get IoU >= 0.5 --> treated as background anyways
        mask = (center[:, 0] >= x) & (center[:, 0] <= x2) \
            & (center[:, 1] >= y) & (center[:, 1] <= y2)
        overlaps_mat[ii, :] = mask
    return overlaps_mat
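
The center-in-box test of `bbox_ctr_overlaps` can also be written without the Python loop by using NumPy broadcasting. The following is a sketch under the assumption that both inputs are `(N, 4)` and `(M, 4)` arrays in `[x1, y1, x2, y2]` order. The function name `bbox_ctr_overlaps_vec` and the example boxes are hypothetical.

```python
import numpy as np


def bbox_ctr_overlaps_vec(boxes1, boxes2):
    # centers of boxes2, shape (M, 2)
    ctr = (boxes2[:, :2] + boxes2[:, 2:]) / 2
    # broadcast (N, 1) against (1, M) to test every box1/center pair at once
    inside_x = (ctr[None, :, 0] >= boxes1[:, None, 0]) & (ctr[None, :, 0] <= boxes1[:, None, 2])
    inside_y = (ctr[None, :, 1] >= boxes1[:, None, 1]) & (ctr[None, :, 1] <= boxes1[:, None, 3])
    return (inside_x & inside_y).astype(float)


tiles = np.array([[0, 0, 100, 100], [50, 50, 150, 150]], dtype=float)
signs = np.array([[10, 10, 30, 30], [90, 90, 130, 130], [200, 200, 220, 220]], dtype=float)
overlaps = bbox_ctr_overlaps_vec(tiles, signs)
# number of sign centers per tile, i.e. the "tile support"
support = overlaps.sum(axis=1).astype(int)
```

Summing each row gives the number of annotated signs whose center falls inside the tile, which is how the tile support is derived from the overlap matrix during tiling.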
|
||||
|
||||
|
||||
# Cuneiform SSD dataset
|
||||
|
||||
|
||||
class CuneiformSSD(data.Dataset):
|
||||
|
||||
def __init__(self, collections=['train'], gen_file_path=None, gen_collections=[], gen_folder=None, transform=None,
|
||||
relative_path='../', use_balanced_idx=True, tile_shape=[600, 600], use_linemaps=False,
|
||||
remove_empty_tiles=False, min_align_ratio=0.6, filter_nms=False, compl_thresh=-1, ncompl_thresh=-1,
|
||||
num_top_ncompl=0, min_ncompl_thresh=10):
|
||||
|
||||
# merge multiple data sources in order to form a single dataset that can be used for SSD style detector training
|
||||
# provides the following function:
|
||||
# f(idx) -> image, bboxes, labels
|
||||
# or more general:
|
||||
# f(idx) -> image, bboxes, labels, line_map
|
||||
|
||||
# join multiple levels of supervision: three cases for sign annotations
|
||||
# 1) tablets completely annotated (no need to load line annotations nor line detections)
|
||||
# 2) tablets partly annotated and line annotations available (no need to load line detections)
|
||||
# 3) tablets partly annotated and line detections required
|
||||
|
||||
# transforms for data preparation
|
||||
self.transform = transform
|
||||
self.line_model_version = None
|
||||
self.use_linemaps = use_linemaps
|
||||
self.min_align_ratio = min_align_ratio
|
||||
self.filter_nms = filter_nms
|
||||
self.compl_thresh = compl_thresh
|
||||
self.ncompl_thresh = ncompl_thresh
|
||||
self.num_top_ncompl = num_top_ncompl
|
||||
self.min_ncompl_thresh = min_ncompl_thresh
|
||||
|
||||
line_model_version = 'v007'
|
||||
num_classes = 240
|
||||
|
||||
###################
|
||||
# I) load generated and manual annotations
|
||||
|
||||
### load and prepare gen_df
|
||||
# generated annotations may be based on multiple collections
|
||||
|
||||
gen_cols = ['imageName', 'folder', 'image_path', 'label', 'train_label',
|
||||
'x1', 'y1', 'x2', 'y2', 'width', 'height', 'segm_idx',
|
||||
'line_idx', 'pos_idx', 'det_score', 'm_score', 'align_ratio', 'nms_keep', 'compl', 'ncompl']
|
||||
|
||||
# OPT I : use csv file that contains list of generated boxes
|
||||
if gen_file_path:
|
||||
gen_file_path = "{}results{}".format(relative_path, gen_file_path)
|
||||
gen_df = pd.read_csv(gen_file_path, engine='python', header=None, names=gen_cols)
|
||||
# OPT II : load collection-specific csv files and concatenate
|
||||
elif len(gen_collections) > 0:
|
||||
assert gen_folder is not None, 'When using gen_collections, user needs to provide gen_folder!'
|
||||
df_list = []
|
||||
for gen_coll in gen_collections:
|
||||
gen_file_path = "{}results/{}line_generated_bboxes_refined80_{}.csv".format(relative_path, gen_folder, gen_coll)
|
||||
# regex delimiter for legacy support: matches both the old ', ' and the new ',' separators
gen_df = pd.read_csv(gen_file_path, engine='python', delimiter=r',\s*', header=None, names=gen_cols)
|
||||
df_list.append(gen_df)
|
||||
gen_df = pd.concat(df_list, ignore_index=True)
|
||||
|
||||
# prepare gen_df
|
||||
list_gen_collection = []
|
||||
if gen_file_path or (len(gen_collections) > 0):
|
||||
|
||||
num_before_filter = len(gen_df)
|
||||
# IMPORTANT: filter gen data according to align ratio
|
||||
gen_df = gen_df[gen_df.align_ratio > self.min_align_ratio]
|
||||
print('Align Ratio {} :: Removed {} samples. [{}]'.format(self.min_align_ratio, num_before_filter - len(gen_df), len(gen_df)))
|
||||
num_before_filter = len(gen_df)
|
||||
# only keep inlier classes [0, num_classes) (only required when using null hypos)
|
||||
gen_df = gen_df[gen_df.train_label < num_classes]
|
||||
print('Class Range {} :: Removed {} samples. [{}]'.format(num_classes, num_before_filter - len(gen_df), len(gen_df)))
|
||||
|
||||
# IMPORTANT: fill nan values in a way that avoids filtering
|
||||
gen_df.nms_keep = gen_df.nms_keep.fillna(1).astype(bool)
|
||||
gen_df.compl = gen_df.compl.fillna(50)
|
||||
gen_df.ncompl = gen_df.ncompl.fillna(100)
|
||||
|
||||
num_before_filter = len(gen_df)
|
||||
if self.filter_nms:
|
||||
# filter using nms
|
||||
gen_df = gen_df[gen_df.nms_keep]
|
||||
print('NMS :: Removed {} samples. [{}]'.format(num_before_filter - len(gen_df), len(gen_df)))
|
||||
num_before_filter = len(gen_df)
|
||||
|
||||
select_topn = False
|
||||
if self.num_top_ncompl > 0:
|
||||
# find top num_top_ncompl samples for each class with more relaxed ncompl condition
|
||||
select_min_ncompl = (gen_df.ncompl > self.min_ncompl_thresh) # necessary condition
|
||||
index_list = gen_df[select_min_ncompl].groupby('train_label').ncompl.nlargest(self.num_top_ncompl).index.values
|
||||
select_topn = gen_df.index.isin(np.stack(index_list)[:, 1])
|
||||
|
||||
if self.compl_thresh > -1:
|
||||
# filter using compl
|
||||
gen_df = gen_df[gen_df.compl > self.compl_thresh] # 0, 2, 4, 5
|
||||
print('Completeness {} :: Removed {} samples. [{}]'.format(self.compl_thresh, num_before_filter - len(gen_df), len(gen_df)))
|
||||
elif self.ncompl_thresh > -1:
|
||||
# filter using ncompl (normalized completeness)
|
||||
gen_df = gen_df[(gen_df.ncompl > self.ncompl_thresh) | select_topn] # 0, 2, 4, 5
|
||||
print('Completeness (norm.) {} :: Removed {} samples. [{}]'.format(self.ncompl_thresh, num_before_filter - len(gen_df), len(gen_df)))
|
||||
print('class sample count stats: ')
|
||||
print(gen_df.train_label.value_counts().describe())
|
||||
|
||||
# add additional columns
|
||||
gen_df['collection'] = gen_df.folder.str.split('/').str[0]
|
||||
gen_df['generated'] = True
|
||||
gen_df['global_segm_idx'] = -1
|
||||
gen_df['relative_bbox'] = gen_df[['x1', 'y1', 'x2', 'y2']].values.tolist()
|
||||
gen_df['relative_bbox'] = gen_df['relative_bbox'].apply(np.array)
|
||||
gen_df['mzl_label'] = gen_df['label']
|
||||
gen_df['tablet_CDLI'] = gen_df['imageName']
|
||||
|
||||
# identify all collections with generated annotations
|
||||
list_gen_collection = gen_df.collection.unique().tolist()
|
||||
|
||||
|
||||
### load and prepare list_sign_anno_df
|
||||
# manual annotation files may be based on multiple collections
|
||||
# for each collection
|
||||
# store in list_sign_anno_df
|
||||
|
||||
# load bbox annotations
|
||||
list_anno_collections = []
|
||||
sign_anno_df_list = []
|
||||
for collection in collections:
|
||||
# load sign annotations
|
||||
annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
|
||||
# ATTENTION: only use gt annotations if collection is provided in collections parameter
|
||||
if os.path.exists(annotation_file):
|
||||
sign_anno_df = pd.read_csv(annotation_file, engine='python') # read annotation file
|
||||
# add additional columns
|
||||
sign_anno_df['generated'] = False
|
||||
sign_anno_df['global_segm_idx'] = -1
|
||||
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(literal_eval)
|
||||
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(np.array) # convert to ndarray
|
||||
|
||||
# only keep inlier classes [0, num_classes)
|
||||
class_outlier_select = sign_anno_df.train_label < num_classes
|
||||
if np.any(~class_outlier_select):
|
||||
print('Drop {} outlier class samples from {}!'.format(np.sum(~class_outlier_select), collection))
|
||||
sign_anno_df = sign_anno_df[class_outlier_select]
|
||||
# slice sign_anno_df if there are multiple different collections contained
|
||||
for sub_collection in sign_anno_df.collection.unique():
|
||||
# store collection name
|
||||
list_anno_collections.append(sub_collection)
|
||||
# store collection specific slice of data frame
|
||||
sub_sign_anno_df = sign_anno_df[sign_anno_df.collection == sub_collection]
|
||||
sign_anno_df_list.append(sub_sign_anno_df)
|
||||
|
||||
|
||||
### extend collections
|
||||
# create list of elementary collections
|
||||
collections_ext = np.unique(list_gen_collection + list_anno_collections).tolist()
|
||||
#collections_ext
|
||||
|
||||
###################
|
||||
# II) on collection level: load segments meta data and line annotation (optional)
|
||||
|
||||
### load segment, line
|
||||
# for each collection
|
||||
# store in segments_df_list, line_anno_df_list
|
||||
|
||||
# reduced set of columns - only keep what is needed and maintained
|
||||
# segments_df_columns = ['tablet_CDLI', 'view_desc', 'padded_bbox', 'collection', 'line_scale', 'scale',
|
||||
# 'im_path',
|
||||
# 'num_dets_hd', 'num_signs_visible']
|
||||
|
||||
segments_df_columns = ['tablet_CDLI', 'view_desc', 'bbox', 'collection', 'scale', 'im_path']
|
||||
|
||||
segments_df_list = []
|
||||
line_anno_df_list = []
|
||||
for collection in collections_ext:
|
||||
|
||||
# load segment metadata
|
||||
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
|
||||
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
|
||||
# convert string of list to list
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
|
||||
# add additional columns
|
||||
tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
|
||||
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
|
||||
tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
|
||||
# get assigned segment (can be edited from outside without harm)
|
||||
assigned_segments_df = tablet_segments_df[tablet_segments_df.assigned == True]
|
||||
|
||||
# load line annotations
|
||||
annotation_file = '{}data/annotations/line_annotations_{}.csv'.format(relative_path, collection)
|
||||
if os.path.exists(annotation_file):
|
||||
line_anno_df = pd.read_csv(annotation_file, engine='python')
|
||||
else:
|
||||
line_anno_df = []
|
||||
|
||||
# collect data frames in lists
|
||||
segments_df_list.append(assigned_segments_df[segments_df_columns])
|
||||
line_anno_df_list.append(line_anno_df)
|
||||
|
||||
|
||||
### assemble ssd_segments_df with new index
|
||||
# search all segments with annotations
|
||||
|
||||
list_segments_df_anno = []
|
||||
for collection in collections_ext:
|
||||
coll_idx = collections_ext.index(collection)
|
||||
#print(collection)
|
||||
|
||||
# get all segment indices for this collection that contain annotations
|
||||
list_segm_indices = []
|
||||
|
||||
# if there are gt annotations
|
||||
if collection in list_anno_collections:
|
||||
anno_coll_idx = list_anno_collections.index(collection)
|
||||
if len(sign_anno_df_list[anno_coll_idx]) > 0:
|
||||
# load their indices
|
||||
segm_indices_anno = sign_anno_df_list[anno_coll_idx].segm_idx.unique()
|
||||
# filter annotations without assigned segment
|
||||
segm_indices_anno = segm_indices_anno[segm_indices_anno >= 0]
|
||||
list_segm_indices.append(segm_indices_anno)
|
||||
|
||||
# if there are generated annotations
|
||||
if collection in list_gen_collection:
|
||||
# select gen annotations by collection
|
||||
col_gen_df = gen_df[gen_df.collection == collection]
|
||||
# load their indices
|
||||
segm_indices_anno = col_gen_df.segm_idx.unique()
|
||||
list_segm_indices.append(segm_indices_anno)
|
||||
|
||||
# stack to obtain list of segment indices with annotations
|
||||
segm_indices = np.unique(np.hstack(list_segm_indices))
|
||||
|
||||
# append only segments with anno
|
||||
if len(segm_indices) > 0:
|
||||
list_segments_df_anno.append(segments_df_list[coll_idx].loc[segm_indices])
|
||||
|
||||
# create new datasets ssd_segment_df
|
||||
# concat dataframes and use reset_index to create column with old indices
|
||||
ssd_segments_df = pd.concat(list_segments_df_anno).reset_index()
|
||||
# rename column to segm_idx
|
||||
ssd_segments_df = ssd_segments_df.rename(columns={ssd_segments_df.columns[0]: 'segm_idx'})
|
||||
|
||||
|
||||
###################
|
||||
# III) on segment level: load data and prepare dataset index
|
||||
|
||||
### assemble ssd_sign_anno_df and update ssd_segments_df
|
||||
# make sure all annos have relative_bbox
|
||||
# additional column for ssd_sign_anno_df: global_segm_idx
|
||||
# add two columns to ssd_segments_df: with num_anno, with_line_anno
|
||||
# type of annotation: full, partly_w_line_anno, partly_w_line_dect
|
||||
|
||||
# sign_anno_df_cols = ['imageName', 'image_path', 'label', 'train_label', 'segm_idx', 'collection',
|
||||
# 'generated', 'relative_bbox', 'global_segm_idx']
|
||||
sign_anno_df_cols = ['tablet_CDLI', 'mzl_label', 'train_label', 'segm_idx', 'collection',
|
||||
'generated', 'relative_bbox', 'global_segm_idx']
|
||||
list_ssd_sign_anno_df = []
|
||||
|
||||
list_lines_annotated_per_segm = np.zeros(len(ssd_segments_df), dtype=bool)
|
||||
list_num_anno_per_segm = np.zeros(len(ssd_segments_df), dtype=int)
|
||||
|
||||
# iterate over segments
|
||||
for global_seg_idx, seg_rec in ssd_segments_df.iterrows():
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
collection = seg_rec.collection
|
||||
segm_idx = seg_rec.segm_idx
|
||||
coll_idx = collections_ext.index(collection)
|
||||
|
||||
### if annotations available for segment, append to list
|
||||
if collection in list_anno_collections:
|
||||
anno_coll_idx = list_anno_collections.index(collection)
|
||||
if len(sign_anno_df_list[anno_coll_idx]) > 0:
|
||||
sign_anno_df = sign_anno_df_list[anno_coll_idx]
|
||||
# select sign annos for segment
|
||||
segm_select = sign_anno_df.segm_idx == segm_idx
|
||||
|
||||
if len(sign_anno_df[segm_select]) > 0:
|
||||
# update data frame column
|
||||
sign_anno_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
|
||||
# collect information
|
||||
sign_anno_seg = sign_anno_df[segm_select]
|
||||
list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
|
||||
list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
|
||||
|
||||
### if generated annotations available, append to list
|
||||
if collection in list_gen_collection:
|
||||
# select sign annos for segment AND collection
|
||||
segm_select = (gen_df.segm_idx == segm_idx) & (gen_df.collection == seg_rec.collection)
|
||||
if len(gen_df[segm_select]) > 0:
|
||||
# update data frame columns
|
||||
gen_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
|
||||
# compute relative_bbox
|
||||
relative_boxes = gen_df[segm_select].relative_bbox.apply(
|
||||
lambda x: np.rint(convert_bbox_global2local(x, list(seg_bbox))).astype(int))
|
||||
gen_df.loc[segm_select, 'relative_bbox'] = relative_boxes
|
||||
|
||||
# collect information
|
||||
sign_anno_seg = gen_df[segm_select]
|
||||
list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
|
||||
list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
|
||||
|
||||
### check for line annotations
|
||||
if len(line_anno_df_list[coll_idx]) > 0:
|
||||
line_anno_df = line_anno_df_list[coll_idx]
|
||||
# select line annos for segment
|
||||
segm_select = line_anno_df.segm_idx == segm_idx
|
||||
# if there are line annotations for segment
|
||||
if len(line_anno_df[segm_select]) > 0:
|
||||
# assume all lines are annotated and remember type of line data
|
||||
list_lines_annotated_per_segm[global_seg_idx] = True
|
||||
|
||||
# add columns to ssd_segments_df
|
||||
ssd_segments_df['num_anno'] = np.array(list_num_anno_per_segm)
|
||||
ssd_segments_df['with_line_anno'] = list_lines_annotated_per_segm
|
||||
|
||||
# assemble ssd_sign_anno_df (drop old index)
|
||||
ssd_sign_anno_df = pd.concat(list_ssd_sign_anno_df, ignore_index=True)
|
||||
|
||||
# this is deprecated, since bug fix
|
||||
#assert np.sum(ssd_sign_anno_df.groupby('global_segm_idx').collection.nunique() > 1) == 0
|
||||
|
||||
###################
|
||||
# IV) Preload: segment images and line detections
|
||||
|
||||
|
||||
### preload segment images
|
||||
# crop segment and convert to gray scale
|
||||
# IMPORTANT: preload segment crops (without scaling, to save memory)
|
||||
|
||||
image_data_list = []
|
||||
|
||||
# iterate over segments
|
||||
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
|
||||
# load composite image
|
||||
pil_im = Image.open(image_path)
|
||||
# crop segment
|
||||
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
|
||||
# convert to gray scale and store in list
|
||||
image_data_list.append(tablet_seg.convert('L'))
|
||||
|
||||
|
||||
|
||||
### preload line detections
|
||||
# could pre-compute line annotations->line map
|
||||
# this is a speed memory trade-off
|
||||
|
||||
line_detection_dict = {}
|
||||
line_map_dict = {}
|
||||
|
||||
# only required if there are any generated detections
|
||||
if self.use_linemaps:
|
||||
|
||||
# iterate over segments
|
||||
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
# get collection idx
|
||||
coll_idx = collections_ext.index(seg_rec.collection)
|
||||
# get seg image shape
|
||||
input_shape = np.array(image_data_list[global_seg_idx].size[::-1])
|
||||
|
||||
# if annotations are generated, need to create line map
|
||||
#if seg_rec.collection in list_gen_collection:
|
||||
|
||||
# if no line annotations available
|
||||
if True:  # ALWAYS use generated annotations (old condition: not seg_rec.with_line_anno, or seg_rec.collection != 'train')
|
||||
# either skeleton or lbl_ind
|
||||
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, seg_rec.collection)
|
||||
lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
|
||||
# lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
|
||||
lbl_ind_x = np.load(lines_file).astype(int)
|
||||
# store in dictionary
|
||||
line_detection_dict[global_seg_idx] = lbl_ind_x
|
||||
|
||||
# create line map from detections -> PIL binary
|
||||
lbl_im = create_line_map_from_line_det(line_detection_dict, global_seg_idx, scale, input_shape)
|
||||
|
||||
else:
|
||||
# create line map from line annotations -> PIL binary
|
||||
lbl_im = create_line_map_from_line_anno(line_anno_df_list, coll_idx, seg_rec.segm_idx, input_shape)
|
||||
|
||||
# resize to image size (do here or in next iteration)
|
||||
# lbl_im = lbl_im.resize(input_shape[::-1])
|
||||
|
||||
# store in dictionary
|
||||
line_map_dict[global_seg_idx] = lbl_im
|
||||
|
||||
|
||||
###################
|
||||
# V) Tiling
|
||||
|
||||
### compute ssd_tile_df
|
||||
list_tile_boxes = []
|
||||
list_tile_support = []
|
||||
list_tile_seg_idx = []
|
||||
|
||||
# iterate over segments
|
||||
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
|
||||
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
|
||||
## compute tiles
|
||||
# get segment shape
|
||||
imw, imh = image_data_list[global_seg_idx].size
|
||||
# compute tile boxes
|
||||
tile_boxes, _, _ = compute_tiles(imw, imh, scale, tile_shape=tile_shape)
|
||||
# append
|
||||
list_tile_boxes.append(tile_boxes)
|
||||
list_tile_seg_idx.append([global_seg_idx] * len(tile_boxes))
|
||||
|
||||
## check overlap of tile boxes and sign boxes
|
||||
# get annotations
|
||||
seg_sign_annos = ssd_sign_anno_df[ssd_sign_anno_df.global_segm_idx == global_seg_idx]
|
||||
sign_bboxes = np.stack(seg_sign_annos.relative_bbox.values)
|
||||
|
||||
# OPT I: compute IOU
|
||||
# tiles_sign_iou = bbox_overlaps(tile_boxes.astype(float), sign_bboxes.astype(float))
|
||||
# tile_support = np.sum(tiles_sign_iou > 0.005, axis=1) # 0.01 or 0.005
|
||||
|
||||
# OPT II: compute ctr overlap (strict)
|
||||
tiles_sign_ctrs = bbox_ctr_overlaps(tile_boxes.astype(float), sign_bboxes.astype(float))
|
||||
tile_support = np.sum(tiles_sign_ctrs, axis=1).astype(int)
|
||||
list_tile_support.append(tile_support)
|
||||
|
||||
# stack tile boxes
|
||||
tile_boxes_arr = np.vstack(list_tile_boxes)
|
||||
tile_global_seg_idx = np.hstack(list_tile_seg_idx).astype(int)
|
||||
tile_support_arr = np.hstack(list_tile_support)
|
||||
|
||||
# create tile_df
|
||||
tile_df = pd.DataFrame({'global_segm_idx': tile_global_seg_idx,
|
||||
'tile_bbox': tile_boxes_arr.tolist(),
|
||||
'num_anno': tile_support_arr})
|
||||
|
||||
# OPTIONAL: filter tiles with little support
|
||||
if remove_empty_tiles and not use_balanced_idx:
|
||||
tile_df = tile_df[tile_df.num_anno > 0] # 0
|
||||
tile_df = tile_df.reset_index(drop=True)
|
||||
|
||||
###################
|
||||
# VI) Dataset index
|
||||
|
||||
## Balance sampling of tiles with anno per tile
|
||||
# create a dataset index which is proportional to annotations per tile
|
||||
# attention: tiles without support will be ignored!
|
||||
use_balanced_idx = use_balanced_idx # good for debug
|
||||
|
||||
# 1) get tile factors
|
||||
tile_factors = tile_df.num_anno.values
|
||||
# 2) compute list to sample from
|
||||
if use_balanced_idx:
|
||||
sample2tile_list = []
|
||||
for ii, tile_factor in enumerate(tile_factors):
|
||||
sample2tile_list.extend([ii] * tile_factor)
|
||||
else:
|
||||
sample2tile_list = tile_df.index.values
|
||||
|
||||
###################
|
||||
# attach resulting data structures to class
|
||||
self.collections = collections
|
||||
self.collections_ext = collections_ext
|
||||
|
||||
self.ssd_segments_df = ssd_segments_df
|
||||
self.ssd_sign_anno_df = ssd_sign_anno_df
|
||||
self.tile_df = tile_df
|
||||
|
||||
self.image_data_list = image_data_list
|
||||
# self.line_detection_dict = line_detection_dict
|
||||
self.line_map_dict = line_map_dict
|
||||
|
||||
self.line_anno_df_list = line_anno_df_list
|
||||
# self.sign_anno_df_list = sign_anno_df_list
|
||||
# self.segments_df_list = segments_df_list
|
||||
|
||||
self.sample2tile_list = sample2tile_list
|
||||
|
||||
# setup finished
|
||||
print("Setup dataset spanning {} collections with {} annotations [{} segments, {} tiles, {} indices]".format(
|
||||
len(collections_ext), len(ssd_sign_anno_df), len(ssd_segments_df), len(tile_df), len(sample2tile_list)))
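
The balanced dataset index built in step VI repeats each tile index once per annotation in that tile, so tiles with many signs are sampled more often and tiles without support never appear. A compact way to express the same idea is `np.repeat`. The sketch below uses invented annotation counts and is meant as an illustration, not as a drop-in replacement.

```python
import numpy as np

# hypothetical annotation counts per tile (tile 1 has no support)
num_anno = np.array([3, 0, 1, 2])

# repeat tile index i exactly num_anno[i] times
sample2tile = np.repeat(np.arange(len(num_anno)), num_anno)

# relative sampling frequency of each tile under uniform sampling of the index
freq = np.bincount(sample2tile, minlength=len(num_anno)) / len(sample2tile)
```

Drawing uniformly from `sample2tile` then picks tile 0 half of the time, tile 3 a third of the time, and tile 1 never, which matches the behaviour of the loop-based construction.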
|
||||
|
||||
def __getitem__(self, index):
|
||||
# get tile
|
||||
tile_index = self.sample2tile_list[index]
|
||||
tile_rec = self.tile_df.loc[tile_index]
|
||||
tile_bbox = tile_rec.tile_bbox
|
||||
|
||||
# get segment
|
||||
global_seg_idx = tile_rec.global_segm_idx
|
||||
seg_rec = self.ssd_segments_df.loc[global_seg_idx]
|
||||
coll_idx = self.collections_ext.index(seg_rec.collection)
|
||||
|
||||
# load segment meta data
|
||||
image_name, scale, seg_bbox, path_to_image, view_desc = get_segment_meta(seg_rec)
|
||||
with_line_anno = seg_rec.with_line_anno
|
||||
|
||||
# get segment image
|
||||
pil_im = self.image_data_list[global_seg_idx]
|
||||
|
||||
# get sign annos
|
||||
select_segm = self.ssd_sign_anno_df.global_segm_idx == global_seg_idx
|
||||
segm_annos = self.ssd_sign_anno_df[select_segm]
|
||||
seg_boxes = np.stack(segm_annos.relative_bbox)
|
||||
labels = segm_annos.train_label.values
|
||||
are_generated = segm_annos.generated.any()
|
||||
|
||||
# OPT II: tensor functions adapted from kuangliu's code
|
||||
# https://github.com/kuangliu/torchcv/tree/master/torchcv/transforms
|
||||
|
||||
# convert to torch tensors
|
||||
seg_boxes = torch.from_numpy(seg_boxes).float()
|
||||
labels = torch.from_numpy(labels)
|
||||
|
||||
if self.use_linemaps:
|
||||
|
||||
if are_generated:
|
||||
# incomplete annotations -> use line detections to avoid false negatives
|
||||
lbl_im = self.line_map_dict[global_seg_idx]
|
||||
# resize to crop
|
||||
lbl_im = lbl_im.resize(pil_im.size)
|
||||
else:
|
||||
# assume all ground truth signs are annotated
|
||||
# provide dummy label map
|
||||
lbl_im = Image.new('1', pil_im.size, 0)
|
||||
|
||||
if False:
|
||||
from skimage.color import label2rgb
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
plt.figure(figsize=(10, 10))
|
||||
# plt.imshow(lbl_ind)
|
||||
plt.imshow(label2rgb(np.asarray(lbl_im), np.asarray(pil_im)))
|
||||
plt.show()
|
||||
|
||||
# crop tile
|
||||
# print pil_im.size, seg_boxes.shape, labels.shape, tile_bbox
|
||||
im, boxes, labels, linemap = crop_box_lm(pil_im, seg_boxes, labels, lbl_im, tile_bbox)
|
||||
# scale tile
|
||||
im, boxes, linemap = resize_lm(im, boxes, linemap, None, scale=scale)
|
||||
|
||||
# apply augmentation pipeline and convert from PIL to numpy
|
||||
if self.transform is not None:
|
||||
im, boxes, labels, linemap = self.transform(im, boxes, labels, linemap)
|
||||
|
||||
return im, boxes, labels, linemap
|
||||
|
||||
else:
|
||||
|
||||
# crop tile
|
||||
#print pil_im.size, seg_boxes.shape, labels.shape, tile_bbox
|
||||
im, boxes, labels = crop_box(pil_im, seg_boxes, labels, tile_bbox)
|
||||
# scale tile
|
||||
im, boxes = resize(im, boxes, None, scale=scale)
|
||||
|
||||
# apply augmentation pipeline and convert from PIL to numpy
|
||||
if self.transform is not None:
|
||||
im, boxes, labels = self.transform(im, boxes, labels)
|
||||
|
||||
return im, boxes, labels
|
||||
|
||||
def __len__(self):
|
||||
return len(self.sample2tile_list)
|
||||
|
||||
|
||||
# helper functions
|
||||
|
||||
def create_line_map_from_line_anno(line_anno_df_list, coll_idx, segm_idx, input_shape):
|
||||
line_height = 3
|
||||
|
||||
# select line annotations
|
||||
line_anno_df = line_anno_df_list[coll_idx]
|
||||
seg_line_df = line_anno_df[line_anno_df.segm_idx == segm_idx]
|
||||
# # collect all line coordinates
|
||||
rr, cc, lbboxes = collect_line_coords(seg_line_df, scale=1 / 16.)
|
||||
# compute line trafo
|
||||
line_trafo = create_line_trafo(rr, cc, input_shape / 16)
|
||||
# # compute masks
|
||||
line_mask = line_trafo < line_height
|
||||
# convert to binary PIL image
|
||||
lbl_im = convert2binaryPIL(line_mask)
|
||||
|
||||
return lbl_im
|
||||
|
||||
|
||||
def create_line_map_from_line_det(line_detection_dict, global_seg_idx, scale, input_shape):
|
||||
# get line detection
|
||||
lbl_ind = line_detection_dict[global_seg_idx]
|
||||
# compute line map
|
||||
lbl_ind = compute_image_label_map(lbl_ind, np.array(input_shape * scale, dtype=int), padding=5) # default:16, other padding=16 20 24
|
||||
# convert to binary PIL image
|
||||
lbl_im = convert2binaryPIL(lbl_ind)
|
||||
|
||||
return lbl_im
|
||||
|
||||
|
||||
# run test
|
||||
def test(collections=['train'], gen_collections=[], gen_folder=None, use_balanced_idx=True, use_linemaps=False,
|
||||
remove_empty_tiles=False, min_align_ratio=0.2, relative_path='../../'):
|
||||
ssd_dataset = CuneiformSSD(collections=collections, gen_file_path=None, gen_collections=gen_collections,
|
||||
gen_folder=gen_folder, relative_path=relative_path,
|
||||
use_balanced_idx=use_balanced_idx, tile_shape=[600, 600], use_linemaps=use_linemaps,
|
||||
remove_empty_tiles=remove_empty_tiles, min_align_ratio=min_align_ratio)
|
||||
return ssd_dataset
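The helper `create_line_map_from_line_anno` above builds a binary line map by thresholding a distance transform of the rasterised line pixels (`line_trafo < line_height`). The following is a minimal pure-Python sketch of that idea. It assumes a tiny grid and uses a brute-force Euclidean distance as a stand-in for `scipy.ndimage.distance_transform_edt`. The name `line_mask_from_pixels` and the toy inputs are illustrative and not part of the repository.

```python
import math


def line_mask_from_pixels(shape, line_pixels, line_height=3):
    """Return a boolean grid marking cells closer than line_height to a line pixel.

    shape: (rows, cols); line_pixels: iterable of (row, col) tuples.
    Hypothetical helper: brute-force stand-in for a Euclidean distance transform.
    """
    rows, cols = shape
    pixels = list(line_pixels)
    mask = []
    for r in range(rows):
        row_mask = []
        for c in range(cols):
            # distance to the nearest line pixel (0 on the line itself)
            dist = min(math.hypot(r - pr, c - pc) for pr, pc in pixels)
            row_mask.append(dist < line_height)
        mask.append(row_mask)
    return mask


# toy example: a horizontal line along row 4 of a 9 x 6 grid
line = [(4, c) for c in range(6)]
mask = line_mask_from_pixels((9, 6), line, line_height=3)
band_rows = [r for r in range(9) if mask[r][0]]
print(band_rows)
```

With `line_height=3` the mask forms a band of rows whose distance to the line is strictly below 3, which mirrors the strict `<` comparison in the original helper.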
|
||||
@@ -0,0 +1,286 @@
|
||||
import os
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from PIL import Image
|
||||
from ast import literal_eval
|
||||
|
||||
from scipy import ndimage as ndi
|
||||
from skimage.util import invert
|
||||
from skimage.draw import line, line_aa
|
||||
|
||||
import torch.utils.data as data
|
||||
from tqdm import tqdm
|
||||
|
||||
|
||||
from ..utils.bbox_utils import clip_boxes
|
||||
from ..utils.transform_utils import crop_pil_image
|
||||
from ..detection.sign_detection import *
|
||||
|
||||
|
||||
### helper functions
|
||||
|
||||
def collect_line_coords(seg_line_df, scale=1):
|
||||
# group according to line idx
|
||||
grouped = seg_line_df.groupby('line_idx')
|
||||
|
||||
# collect all line coordinates
|
||||
rr_list, cc_list, lbbox_list = [], [], []
|
||||
for i, line_rec in grouped:
|
||||
xx = np.rint(line_rec.x.values * scale).astype(int)
|
||||
yy = np.rint(line_rec.y.values * scale).astype(int)
|
||||
lbbox = np.array([np.min(xx), np.min(yy), np.max(xx), np.max(yy)])
|
||||
lbbox_list.append(lbbox)
|
||||
for li in range(len(xx) - 1):
|
||||
rr, cc, _ = line_aa(yy[li], xx[li], yy[li + 1], xx[li + 1])
|
||||
# rr, cc = line(yy[li], xx[li], yy[li+1], xx[li+1])
|
||||
rr_list.append(rr)
|
||||
cc_list.append(cc)
|
||||
|
||||
# stack coordinates
|
||||
rr = np.hstack(rr_list)
|
||||
cc = np.hstack(cc_list)
|
||||
lbboxes = np.stack(lbbox_list)
|
||||
return rr, cc, lbboxes
|
||||
|
||||
|
||||
def create_line_trafo(rr, cc, input_shape):
|
||||
# create mask
|
||||
line_mask = np.zeros(input_shape).astype(bool)
|
||||
line_mask[rr, cc] = 1
|
||||
# compute distance transform after inverting
|
||||
line_trafo = ndi.distance_transform_edt(invert(line_mask))
|
||||
return line_trafo
|
||||
|
||||
|
||||
def compute_sampling_freq(line_trafo, sample_mask, sample_radius, expo=2):
|
||||
sample_freq = line_trafo
|
||||
# convert to probs
|
||||
sample_freq = (-sample_freq / sample_radius + 1) ** expo
|
||||
# sample_freq = -sample_freq/sample_radius + 1
|
||||
# sample_freq = np.exp(-sample_freq/sample_radius * 2)
|
||||
|
||||
# set area that is not sampled from to 'zero'
|
||||
sample_freq[sample_mask < 1] = 0
|
||||
return sample_freq
|
||||
|
||||
|
||||
def spatial_sample(sample_freq):
|
||||
thresh = np.random.random_sample()
|
||||
ylist, xlist = np.where(sample_freq > thresh)
|
||||
select_idx = np.random.randint(len(xlist))
|
||||
return xlist[select_idx], ylist[select_idx]
|
||||
|
||||
|
||||
def spatial_sample_negative(sample_freq):
|
||||
# too slow
|
||||
if 0:
|
||||
# remove samples close to border
|
||||
border_mask = np.zeros_like(sample_freq, dtype=bool)
|
||||
bdist = 150
|
||||
border_mask[bdist:-bdist, bdist:-bdist] = True
|
||||
# apply masks
|
||||
ylist, xlist = np.where((sample_freq == 0) & (border_mask))
|
||||
select_idx = np.random.randint(len(xlist))
|
||||
return xlist[select_idx], ylist[select_idx]
|
||||
# faster
|
||||
if 1:
|
||||
# remove samples close to border
|
||||
border_mask = np.zeros_like(sample_freq, dtype=bool)
|
||||
bdist = 150
|
||||
border_mask[bdist:-bdist, bdist:-bdist] = True
|
||||
x, y = 0, 0
|
||||
# (line_map[x, y] is True) results in overlap with hard negative samples
|
||||
for i in range(100):
|
||||
# pick coordinate
|
||||
select_idx = np.random.randint(np.prod(sample_freq.shape))
|
||||
# back to matrix index
|
||||
x, y = np.unravel_index(select_idx, sample_freq.shape)
|
||||
if (sample_freq[x, y] == 0) and (border_mask[x, y] == True):
|
||||
break
|
||||
return y, x
|
||||
|
||||
|
||||
def pad_bboxes(lbboxes, context_pad):
|
||||
# works in place, so no need to return
|
||||
for bb in lbboxes:
|
||||
bb[:2] = bb[:2] - context_pad
|
||||
bb[2:4] = bb[2:4] + context_pad
|
||||
# return lbboxes
|
||||
|
||||
|
||||
def spatial_sample_line(sample_freq, lbbox):
|
||||
thresh = np.random.random_sample()
|
||||
ylist, xlist = np.where(sample_freq[lbbox[1]:lbbox[3], lbbox[0]:lbbox[2]] >= thresh)
|
||||
if len(xlist) == 0:
|
||||
print(lbbox, sample_freq.shape)
|
||||
select_idx = np.random.randint(len(xlist))
|
||||
return lbbox[0] + xlist[select_idx], lbbox[1] + ylist[select_idx]
|
||||
|
||||
|
||||
### CuneiformLines class
|
||||
|
||||
class CuneiformLines(data.Dataset):
|
||||
|
||||
def __init__(self, dataset_params, transform=None, target_transform=None, relative_path='../', split='train'):
|
||||
# annotation_path, params,
|
||||
|
||||
# set params
|
||||
self.line_height = dataset_params['line_height']
|
||||
self.sample_radius = dataset_params['sample_radius'] # self.line_height * 3
|
||||
self.expo = dataset_params['expo']
|
||||
if 'train' in split:
|
||||
self.soft_bg_frac = dataset_params['soft_bg_frac'][0]
|
||||
else:
|
||||
self.soft_bg_frac = dataset_params['soft_bg_frac'][1]
|
||||
|
||||
self.crop_size = dataset_params['crop_size']
|
||||
self.patch_size = dataset_params['patch_size']
|
||||
|
||||
# transforms for data preparation
|
||||
self.transform = transform
|
||||
self.target_transform = target_transform
|
||||
|
||||
# load line annotation
|
||||
annotation_file = '{}data/annotations/line_annotations_{}.csv'.format(relative_path, split)
|
||||
line_anno_df = pd.read_csv(annotation_file, engine='python')
|
||||
|
||||
# load segment metadata
|
||||
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, split)
|
||||
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
|
||||
# convert string of list to list
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
|
||||
# additional columns
|
||||
tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
|
||||
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
|
||||
tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
|
||||
|
||||
# select assigned
|
||||
assigned_segments_df = tablet_segments_df[tablet_segments_df.assigned == True]
|
||||
|
||||
# pre-load segments and compute line and sampling maps
|
||||
self.valid_indices = []
|
||||
self.num_lines_list = []
|
||||
self.image_data_list = []
|
||||
self.line_map_list = []
|
||||
self.sample_freq_list = []
|
||||
lbboxes_list = []
|
||||
for segment_idx, segment_rec in tqdm(assigned_segments_df.iterrows(), total=len(assigned_segments_df)):
|
||||
imageName = segment_rec.tablet_CDLI
|
||||
scale = segment_rec.scale
|
||||
seg_bbox = segment_rec.bbox
|
||||
path_to_image = segment_rec.im_path
|
||||
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
|
||||
|
||||
# select line annotations
|
||||
seg_line_df = line_anno_df[line_anno_df.segm_idx == segment_idx]
|
||||
|
||||
# check if any annotations available
|
||||
if len(seg_line_df) > 0:
|
||||
# print(split, imageName, view_desc)
|
||||
|
||||
### 1) load segment
|
||||
# prepare input tablet
|
||||
pil_im = Image.open(path_to_image)
|
||||
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
|
||||
# scale image
|
||||
input_im = rescale_segment_single(tablet_seg, scale)
|
||||
input_shape = input_im.size[::-1]
|
||||
|
||||
### 2) line map
|
||||
# compute interpolated line coordinates
|
||||
|
||||
# collect all line coordinates
|
||||
rr, cc, lbboxes = collect_line_coords(seg_line_df, scale=scale)
|
||||
# pad with sample radius
|
||||
pad_bboxes(lbboxes, self.sample_radius)
|
||||
clip_boxes(lbboxes, input_shape)
|
||||
# compute line trafo
|
||||
line_trafo = create_line_trafo(rr, cc, input_shape)
|
||||
# compute masks
|
||||
line_mask = line_trafo < self.line_height
|
||||
sample_mask = line_trafo < self.sample_radius
|
||||
# compute frequency
|
||||
sample_freq = compute_sampling_freq(line_trafo, sample_mask, self.sample_radius, self.expo)
|
||||
|
||||
### 3) save data
|
||||
# append to list
|
||||
self.valid_indices.append(segment_idx)
|
||||
self.num_lines_list.append(len(seg_line_df.line_idx.unique()))
|
||||
self.image_data_list.append(input_im)
|
||||
self.line_map_list.append(line_mask)
|
||||
self.sample_freq_list.append(sample_freq)
|
||||
lbboxes_list.append(lbboxes)
|
||||
|
||||
# stack lbboxes
|
||||
self.lbboxes = np.vstack(lbboxes_list)
|
||||
|
||||
self.line2mem_list = []
|
||||
# for valid_idx, num_lines in zip(self.valid_indices, self.num_lines_list):
|
||||
for mem_idx, num_lines in enumerate(self.num_lines_list):
|
||||
self.line2mem_list.extend([mem_idx] * num_lines)
|
||||
|
||||
# Balance sampling with line length
|
||||
# 1) get line factors by line width and normalisation
|
||||
widths = self.lbboxes[:, 2] - self.lbboxes[:, 0]
|
||||
# factor required to make the smallest normalised width at least 1
|
||||
norm_factor_int = np.ceil(float(widths.sum()) / widths.min())
|
||||
norm_widths = widths / float(widths.sum())
|
||||
line_factors = np.rint(norm_factor_int * norm_widths).astype(int)
|
||||
# 2) compute list to sample from
|
||||
self.sample2line_list = []
|
||||
for ii, line_factor in enumerate(line_factors):
|
||||
self.sample2line_list.extend([ii] * line_factor)
|
||||
|
||||
# increase test set size to obtain more stable error
|
||||
if split == 'test':
|
||||
self.sample2line_list = self.sample2line_list * 5
|
||||
|
||||
# setup finished
|
||||
print("Setup {} dataset with {} rows and {} samples".format(split, len(self.line2mem_list), len(self)))
|
||||
|
||||
def __getitem__(self, index):
|
||||
|
||||
# line_index = index
|
||||
line_index = self.sample2line_list[index]
|
||||
|
||||
lbbox = self.lbboxes[line_index]
|
||||
mem_idx = self.line2mem_list[line_index]
|
||||
# get required data
|
||||
segm_im = self.image_data_list[mem_idx]
|
||||
line_map = self.line_map_list[mem_idx]
|
||||
sample_freq = self.sample_freq_list[mem_idx]
|
||||
|
||||
if np.random.random() > self.soft_bg_frac:
|
||||
# sample spatial location
|
||||
# y, x = spatial_sample(sample_freq) # coordinates need to be inverted
|
||||
y, x = spatial_sample_line(sample_freq, lbbox)
|
||||
# compute target label
|
||||
target = int(line_map[x, y])
|
||||
else:
|
||||
y, x = spatial_sample_negative(sample_freq)
|
||||
# compute target label
|
||||
target = int(line_map[x, y]) # should be always negative
|
||||
|
||||
# crop patch at sampled location (use PIL for that)
|
||||
hw, hh = self.patch_size[0] / 2., self.patch_size[1] / 2.
|
||||
bbox = [y - hw, x - hh, y + hw, x + hh]
|
||||
|
||||
# new fast
|
||||
im, bb = crop_pil_image(segm_im, bbox, context_pad=0, pad_to_square=False)
|
||||
|
||||
# apply augmentation pipeline and convert from PIL to numpy
|
||||
if self.transform is not None:
|
||||
im = self.transform(im)
|
||||
|
||||
if self.target_transform is not None:
|
||||
target = self.target_transform(target)
|
||||
|
||||
return im, target
|
||||
|
||||
def __len__(self):
|
||||
# return total lines
|
||||
# return len(self.sample_indices)
|
||||
return len(self.sample2line_list)
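The width-based balancing in `CuneiformLines.__init__` above can be illustrated in isolation. This is a minimal sketch using plain Python lists instead of NumPy arrays. The function name `balanced_sample_index` and the toy widths are assumptions made for illustration only.

```python
import math


def balanced_sample_index(widths):
    """Map sample positions to line indices, repeating wide lines more often."""
    total = float(sum(widths))
    # factor that scales the smallest normalised width up to roughly 1
    norm_factor = math.ceil(total / min(widths))
    # rounded repetition count per line (round() rounds halves to even, like np.rint)
    factors = [int(round(norm_factor * (w / total))) for w in widths]
    sample2line = []
    for line_idx, factor in enumerate(factors):
        sample2line.extend([line_idx] * factor)
    return sample2line


widths = [10, 20, 40]  # toy line widths in pixels
sample2line = balanced_sample_index(widths)
print(sample2line)
```

Indexing a dataset through such a list makes a line twice as wide appear about twice as often per epoch, at the cost of a longer index list.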
|
||||
|
||||
|
||||
@@ -0,0 +1,149 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from PIL import Image
|
||||
from ast import literal_eval
|
||||
from tqdm import tqdm
|
||||
|
||||
import torch.utils.data as data
|
||||
|
||||
|
||||
from ..detection.sign_detection import crop_segment_from_tablet_im, rescale_segment_single
|
||||
from ..utils.torchcv.transforms.resize import resize
|
||||
|
||||
|
||||
class CuneiformSegments(data.Dataset):
|
||||
# lightweight version of cunei_dataset_segments
|
||||
# no annotations processing
|
||||
# no preloading
|
||||
|
||||
def __init__(self, transform=None, target_transform=None, collection='train', collections=[],
|
||||
relative_path='../', rescale=1.0, only_assigned=True, preload_segments=False):
|
||||
|
||||
self.rescale = rescale
|
||||
self.relative_path = relative_path
|
||||
self.collection = collection
|
||||
self.preload_segments = preload_segments
|
||||
|
||||
# transforms for data preparation
|
||||
self.transform = transform
|
||||
self.target_transform = target_transform
|
||||
|
||||
if len(collections) > 0:
|
||||
# load segment metadata for multiple collections
|
||||
df_list = []
|
||||
for collection in collections:
|
||||
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
|
||||
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
|
||||
df_list.append(tablet_segments_df)
|
||||
# concatenate to single df
|
||||
tablet_segments_df = pd.concat(df_list, ignore_index=True)
|
||||
else:
|
||||
# load segment metadata for single collection
|
||||
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
|
||||
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
|
||||
|
||||
# convert string of list to list
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
|
||||
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
|
||||
# add additional columns
|
||||
tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
|
||||
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
|
||||
tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
|
||||
|
||||
# get assigned segment (can be edited from outside without harm)
|
||||
if only_assigned:
|
||||
self.assigned_segments_df = tablet_segments_df[(tablet_segments_df.assigned == True)]
|
||||
else:
|
||||
self.assigned_segments_df = tablet_segments_df
|
||||
|
||||
# make available for outside
|
||||
self.tablet_segments_df = tablet_segments_df
|
||||
|
||||
self.image_data_list = []
|
||||
self.sample2seg_list = []
|
||||
self.sidx2didx = []
|
||||
self.setup_sample_list()
|
||||
|
||||
def setup_sample_list(self, updated_df=None):
|
||||
if updated_df is not None:
|
||||
self.assigned_segments_df = updated_df
|
||||
|
||||
### preload segment images
|
||||
# crop segment and convert to gray scale
|
||||
# IMPORTANT: preload segment crops (without scaling, because memory)
|
||||
image_data_list = []
|
||||
if self.preload_segments:
|
||||
# iterate over segments
|
||||
for seg_idx, seg_rec in tqdm(self.assigned_segments_df.iterrows(), total=len(self.assigned_segments_df)):
|
||||
# load segment meta data
|
||||
image_name, scale, seg_bbox, path_to_image, view_desc = self.get_segment_meta(seg_rec)
|
||||
# prepare input tablet
|
||||
pil_im = Image.open(path_to_image)
|
||||
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
|
||||
# store in list
|
||||
image_data_list.append(tablet_seg)
|
||||
self.image_data_list = image_data_list
|
||||
|
||||
self.sample2seg_list = self.assigned_segments_df.index.values
|
||||
|
||||
# map from seg idx to dataset idx
|
||||
self.sidx2didx = dict(zip(self.sample2seg_list, range(len(self.sample2seg_list))))
|
||||
|
||||
# setup finished
|
||||
print("Setup {} dataset with {} elements".format(self.collection, len(self)))
|
||||
|
||||
def __getitem__(self, index):
|
||||
|
||||
seg_idx = self.sample2seg_list[index]
|
||||
seg_rec = self.assigned_segments_df.loc[seg_idx]
|
||||
|
||||
# load segment meta data
|
||||
image_name, scale, seg_bbox, path_to_image, view_desc = self.get_segment_meta(seg_rec)
|
||||
|
||||
# specify target
|
||||
target = seg_idx
|
||||
|
||||
# get segment image
|
||||
if self.preload_segments:
|
||||
tablet_seg = self.image_data_list[index]
|
||||
else:
|
||||
# prepare input tablet
|
||||
pil_im = Image.open(path_to_image)
|
||||
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
|
||||
|
||||
# scale image
|
||||
if 0:
|
||||
# scale image
|
||||
im = rescale_segment_single(tablet_seg, scale)
|
||||
else:
|
||||
# convert to gray scale
|
||||
# tablet_seg = tablet_seg.convert('L')
|
||||
# scale segment
|
||||
im, _ = resize(tablet_seg, None, None, scale=scale)
|
||||
|
||||
# apply augmentation pipeline and convert from PIL to numpy
|
||||
if self.transform is not None:
|
||||
im = self.transform(im)
|
||||
|
||||
if self.target_transform is not None:
|
||||
target = self.target_transform(target)
|
||||
|
||||
return im, target
|
||||
|
||||
def __len__(self):
|
||||
# return total number of segments
|
||||
return len(self.assigned_segments_df)
|
||||
|
||||
|
||||
def get_segment_meta(self, segment_rec):
|
||||
image_name = segment_rec.tablet_CDLI
|
||||
|
||||
# this should control which scale is used in consecutive processing
|
||||
scale = segment_rec.scale * self.rescale
|
||||
|
||||
seg_bbox = segment_rec.bbox
|
||||
path_to_image = segment_rec.im_path
|
||||
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
|
||||
|
||||
return image_name, scale, seg_bbox, path_to_image, view_desc
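`CuneiformSegments.__init__` above reads segment metadata from CSV, converts the `bbox` column from a string to a list with `literal_eval`, and derives an image path from `collection` and `tablet_CDLI`. The sketch below shows the same steps with the standard `csv` module instead of pandas. The CSV excerpt, the `idx` column name, the tablet identifiers and the function name `load_segments` are made up for illustration.

```python
import csv
import io
from ast import literal_eval

# hypothetical two-row excerpt of a segments CSV; all values are made up
csv_text = (
    "idx,tablet_CDLI,collection,bbox\n"
    '0,P000001,train,"[10, 20, 110, 220]"\n'
    '1,P000002,train,"[0, 0, 50, 60]"\n'
)


def load_segments(text, image_root="data/images/"):
    """Parse rows, turning the bbox string into a list and deriving an image path."""
    segments = []
    for row in csv.DictReader(io.StringIO(text)):
        # literal_eval only accepts Python literals, so "[x1, y1, x2, y2]" is parsed safely
        row["bbox"] = literal_eval(row["bbox"])
        row["im_path"] = image_root + row["collection"] + "/" + row["tablet_CDLI"] + ".jpg"
        segments.append(row)
    return segments


segments = load_segments(csv_text)
print(segments[0]["bbox"], segments[0]["im_path"])
```

Unlike `eval`, `literal_eval` rejects arbitrary expressions, which is presumably why the original code uses it for the stringified lists.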
|
||||
|
||||
@@ -0,0 +1,725 @@
|
||||
import numpy as np
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
import matplotlib.ticker as ticker
|
||||
from matplotlib import cm
|
||||
|
||||
import PIL.Image as Image
|
||||
from ..utils.transform_utils import crop_pil_image
|
||||
|
||||
from ..evaluations.config import cfg
|
||||
from ..utils.bbox_utils import clip_boxes
|
||||
|
||||
|
||||
def visualize_net_output_single(im, predicted, cunei_id=30, num_classes=None, min_prob=0.95):
|
||||
# visualize output of single crop detections
|
||||
|
||||
if num_classes is None:
|
||||
num_classes = cfg.TEST.NUM_CLASSES
|
||||
|
||||
# cross-image products
|
||||
output = np.mean(predicted, axis=0)
|
||||
# cross-channel products
|
||||
lbl_ind = np.argmax(output, axis=0)
|
||||
|
||||
ctr_crop = predicted[0, ...]
|
||||
|
||||
plt.figure(figsize=(16, 24))
|
||||
plt.subplot(4, 2, 1)
|
||||
plt.imshow(im, cmap=cm.Greys_r)
|
||||
plt.title('input')
|
||||
plt.subplot(4, 2, 2)
|
||||
plt.imshow(ctr_crop.squeeze()[cunei_id, ...])
|
||||
plt.colorbar()
|
||||
plt.title('class #{}'.format(cunei_id))
|
||||
plt.subplot(4, 2, 3)
|
||||
cmap = plt.get_cmap('Paired')
|
||||
plt.imshow(lbl_ind, cmap=cmap, vmin=0, vmax=num_classes)
|
||||
plt.colorbar()
|
||||
plt.title('argmax class')
|
||||
plt.subplot(4, 2, 4)
|
||||
test = np.argmax(ctr_crop.squeeze(), axis=0)
|
||||
test[np.max(ctr_crop.squeeze(), axis=0) < min_prob] = 0
|
||||
plt.imshow(test, cmap=cmap, vmin=0, vmax=num_classes)
|
||||
plt.colorbar()
|
||||
plt.title('argmax class ( {} confidence)'.format(min_prob))
|
||||
|
||||
|
||||
def _refine_detections(predicted):
|
||||
# single image product from center crop
|
||||
ctr_crop = predicted[4, ...]
|
||||
# cross-image products
|
||||
output = np.mean(predicted, axis=0)
|
||||
max_output = np.max(predicted, axis=0)
|
||||
uncertainty = np.var(predicted, axis=0)
|
||||
# cross-channel products
|
||||
lbl_ind = np.argmax(output, axis=0)
|
||||
average_unc = np.mean(uncertainty, axis=0)
|
||||
min_average_unc = np.min(average_unc)
|
||||
max_average_unc = np.max(average_unc)
|
||||
max_unc = np.max(uncertainty)
|
||||
|
||||
# save products
|
||||
# sio.savemat('results/test_tablet_cuneiNet_{}_{}_scale_{}_{}.mat'.format(training_round, imageName, scale, negatives_used),
|
||||
# {#'probs':ctr_crop,
|
||||
# #'pred_labels': np.argmax(ctr_crop,axis=0),
|
||||
# #'entropy': -np.sum(ctr_crop * np.log(ctr_crop)),
|
||||
# 'predicted': predicted,
|
||||
# 'avg_probs': output,
|
||||
# 'avg_unc': average_unc,
|
||||
# 'avg_pred_labels': lbl_ind})
|
||||
|
||||
return ctr_crop, output, max_output, uncertainty, lbl_ind, average_unc, min_average_unc, max_average_unc, max_unc
|
||||
|
||||
|
||||
def visualize_net_output(im, predicted, cunei_id=30, num_classes=None):
|
||||
# visualize output of 5 star crop detections
|
||||
|
||||
if num_classes is None:
|
||||
num_classes = cfg.TEST.NUM_CLASSES
|
||||
|
||||
ctr_crop, output, max_output, uncertainty, \
|
||||
lbl_ind, average_unc, min_average_unc, max_average_unc, max_unc = _refine_detections(predicted)
|
||||
|
||||
plt.figure(figsize=(16, 24))
|
||||
plt.subplot(3, 2, 1)
|
||||
plt.imshow(im, cmap=cm.Greys_r)
|
||||
plt.title('input')
|
||||
plt.subplot(3, 2, 2)
|
||||
plt.imshow(ctr_crop.squeeze()[cunei_id, ...])
|
||||
plt.colorbar()
|
||||
plt.title('class #{}'.format(cunei_id))
|
||||
plt.subplot(3, 2, 3)
|
||||
cmap = plt.get_cmap('Paired')
|
||||
plt.imshow(np.argmax(ctr_crop.squeeze(), axis=0), cmap=cmap, vmin=0, vmax=num_classes)
|
||||
plt.colorbar()
|
||||
plt.title('argmax class')
|
||||
plt.subplot(3, 2, 4)
|
||||
test = np.argmax(ctr_crop.squeeze(), axis=0)
|
||||
test[np.max(ctr_crop.squeeze(), axis=0) < 0.95] = 0
|
||||
plt.imshow(test, cmap=cmap, vmin=0, vmax=num_classes)
|
||||
plt.colorbar()
|
||||
plt.title('argmax class (0.95 confidence)')
|
||||
#plt.subplot(4, 2, 5)
|
||||
#cmap = plt.get_cmap('Paired')
|
||||
#plt.imshow(lbl_ind, cmap=cmap, vmin=0, vmax=num_classes)
|
||||
#plt.colorbar()
|
||||
#plt.title('avg argmax class')
|
||||
plt.subplot(3, 2, 5)
|
||||
plt.imshow(average_unc, vmin=0, vmax=max_average_unc)
|
||||
plt.colorbar()
|
||||
plt.title('shift induced uncertainty')
|
||||
plt.subplot(3, 2, 6)
|
||||
# entropy
|
||||
plt.imshow(-np.sum(ctr_crop.squeeze() * np.log(ctr_crop.squeeze()), axis=0))
|
||||
plt.colorbar()
|
||||
plt.title('entropy')
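The entropy panel in `visualize_net_output` above evaluates `-sum(p * log(p))` over the class channel. A small pure-Python sketch of that quantity for a single pixel follows. Skipping zero-probability terms is an assumption added here to avoid `log(0)`, and the original NumPy expression does not do this.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms are skipped (0 * log 0 := 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = entropy([0.5, 0.5])   # maximally uncertain two-class prediction
certain = entropy([1.0, 0.0])   # fully confident prediction
print(uniform, certain)
```

High values mark pixels where the class distribution is flat, so the panel highlights regions in which the detector is undecided.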
|
||||
|
||||
|
||||
def _im_to_pyra_coords(pyra, boxes):
|
||||
# boxes is N x 4 where each row is a box in the image specified
|
||||
# by [x1 y1 x2 y2].
|
||||
#
|
||||
# Output is a cell array where cell i holds the pyramid boxes
|
||||
# coming from the image box
|
||||
boxes = boxes - 1
|
||||
pyra_boxes = []
|
||||
for level in range(pyra['num_levels']):
|
||||
level_boxes = boxes * pyra['scales'][level]
|
||||
level_boxes = np.round(level_boxes / pyra['stride'])
|
||||
level_boxes = level_boxes
|
||||
# add padding
|
||||
level_boxes[:, 0] = level_boxes[:, 0] + pyra['padx']
|
||||
level_boxes[:, 2] = level_boxes[:, 2] + pyra['padx']
|
||||
level_boxes[:, 1] = level_boxes[:, 1] + pyra['pady']
|
||||
level_boxes[:, 3] = level_boxes[:, 3] + pyra['pady']
|
||||
pyra_boxes.append(level_boxes)
|
||||
return pyra_boxes
|
||||
|
||||
|
||||
def _pyra_to_im_coords(pyra, boxes):
|
||||
# boxes is N x 5 where each row is a box in the format [x1 y1 x2 y2 pyra_level]
|
||||
# where (x1, y1) is the upper-left corner of the box in pyramid level pyra_level
|
||||
# and (x2, y2) is the lower-right corner of the box in pyramid level pyra_level
|
||||
# Assumes 1-based indexing.
|
||||
# pyramid to im scale factors for each scale
|
||||
|
||||
scales = pyra['stride'] / pyra['scales'][0]
|
||||
|
||||
# pyramid to im scale factors for each pyra level in boxes
|
||||
if len(scales.shape) > 0:
|
||||
scales = scales[boxes[:, -1].astype(int)]
|
||||
|
||||
# Remove padding from pyramid boxes
|
||||
boxes[:, 0] = boxes[:, 0] - pyra['padx']
|
||||
boxes[:, 2] = boxes[:, 2] - pyra['padx']
|
||||
boxes[:, 1] = boxes[:, 1] - pyra['pady']
|
||||
boxes[:, 3] = boxes[:, 3] - pyra['pady']
|
||||
|
||||
im_boxes = boxes[:, :4] * scales
|
||||
return im_boxes
|
||||
|
||||
|
||||
def _pyramid_patch_box(x1, y1, feat_map_sz, pyra, lvl_idx, opt='A'):
|
||||
# compute image patch box coordinates in original image
|
||||
# should also work for all features of one image at once
|
||||
# REQUIREMENTS:
|
||||
# position of feature in feature map: x1, y1
|
||||
# dimension of feature map: feat_map_sz
|
||||
# scale of input image (relative to original scale): pyra, lvl_idx
|
||||
# stride that is determined by network architecture: pyra
|
||||
|
||||
# OPTION A
|
||||
if opt == 'A':
|
||||
boxes = np.array([x1 - 0.5, y1 - 0.5, x1 + 0.5, y1 + 0.5]).transpose([1, 0])
|
||||
boxes = np.concatenate([boxes, np.tile(lvl_idx, [len(x1), 1])], axis=1)
|
||||
# OPTION B - more accurate
|
||||
elif opt == 'B':
|
||||
x_step = (feat_map_sz[1] - 1) / float(feat_map_sz[1])
|
||||
y_step = (feat_map_sz[0] - 1) / float(feat_map_sz[0])
|
||||
|
||||
boxes = np.array(
|
||||
[(x1 - 0.5) * x_step, (y1 - 0.5) * y_step, (x1 + 0.5) * x_step, (y1 + 0.5) * y_step]).transpose([1, 0])
|
||||
boxes = np.concatenate([boxes, np.tile(lvl_idx, [len(x1), 1])], axis=1)
|
||||
|
||||
im_patch_box = np.floor(_pyra_to_im_coords(pyra, boxes))
|
||||
return im_patch_box
|
||||
|
||||
|
||||
def _pyramid_rf_box(im_sz, im_patch_box, rf_size, scales, lvl_idx):
|
||||
# compute receptive field box coordinates in original image
|
||||
# (given patch_box coordinates in original image)
|
||||
# REQUIREMENTS:
|
||||
# receptive field size determined by network architecture: rf_size [H, W]
|
||||
# original image size: im_sz
|
||||
# patch box size in original image: im_patch_box
|
||||
# scale of input image relative to original image: scales, lvl_idx
|
||||
|
||||
scaled_rf_sz = rf_size / scales[lvl_idx]
|
||||
|
||||
im_rf_box = np.zeros_like(im_patch_box)
|
||||
im_rf_box[:, 0] = im_patch_box[:, 0] - scaled_rf_sz[1] / 2.
|
||||
im_rf_box[:, 1] = im_patch_box[:, 1] - scaled_rf_sz[0] / 2.
|
||||
im_rf_box[:, 2] = im_patch_box[:, 2] + scaled_rf_sz[1] / 2.
|
||||
im_rf_box[:, 3] = im_patch_box[:, 3] + scaled_rf_sz[0] / 2.
|
||||
|
||||
# should not be required!!
|
||||
# im_rf_box = clip_boxes(im_rf_box, im_sz)
|
||||
|
||||
return np.round(im_rf_box)
|
||||
|
||||
|
||||
def compute_bbox_grids(map_shape, im_shape, arch_type='alexnet'):
|
||||
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
|
||||
if arch_type == 'alexnet':
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
|
||||
else:
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
|
||||
|
||||
# pyra = {'stride': 16, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
# 'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
|
||||
|
||||
# blobs in caffe and images in opencv are H,W formatted. This results in YX format
|
||||
# since all bounding boxes use the XY convention, this must be taken into account when accessing images or blobs
|
||||
x = np.arange(0, map_shape[1])
|
||||
y = np.arange(0, map_shape[0])
|
||||
xv, yv = np.meshgrid(x, y, sparse=False, indexing='xy')
|
||||
|
||||
# print lbl_ind.shape, im.shape, xv.shape
|
||||
|
||||
# compute basic patch boxes
|
||||
# each score in the score map corresponds to a single non-overlapping box
|
||||
patch_boxes = _pyramid_patch_box(xv.flatten(), yv.flatten(), map_shape, pyra, 0, opt='A') + pyra[
|
||||
'offset']
|
||||
|
||||
# compute receptive field sized boxes (overlapping)
|
||||
# due to the way pyramid_rf_box is implemented one stride needs to be subtracted from rf_size
|
||||
rf_sz = np.array(pyra['rf_size'], dtype=int)
|
||||
rf_boxes = _pyramid_rf_box(im_shape, patch_boxes, rf_sz - pyra['stride'], pyra['scales'], 0)
|
||||
|
||||
return patch_boxes, rf_boxes
|
||||
|
||||
|
||||
def label_map2image(feat_x, feat_y, map_shape, arch_type='alexnet'):
|
||||
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
|
||||
if arch_type == 'alexnet':
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
|
||||
else:
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
|
||||
|
||||
# compute basic patch boxes
|
||||
# each score in the score map corresponds to a single non-overlapping box
|
||||
patch_boxes = _pyramid_patch_box(feat_x, feat_y, map_shape, pyra, 0, opt='A') + pyra[
|
||||
'offset']
|
||||
|
||||
return patch_boxes
|
||||
|
||||
|
||||
def radius_in_image(feat_radius, dim=0, arch_type='alexnet'):
|
||||
# dim defines along which dimension to compute (only important, if rf_size not square)
|
||||
|
||||
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
|
||||
if arch_type == 'alexnet':
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
|
||||
else:
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
|
||||
|
||||
rf_sz = np.array(pyra['rf_size'], dtype=int)  # np.int alias was removed from recent NumPy releases
|
||||
|
||||
# compute radii in image
|
||||
patch_radius = feat_radius * pyra['stride']
|
||||
rf_radius = patch_radius + (rf_sz[dim] - pyra['stride'])
|
||||
return patch_radius, rf_radius
|
||||
|
||||
|
||||
def coord_in_image(coord, add_rf=False, arch_type='alexnet'):
|
||||
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
|
||||
if arch_type == 'alexnet':  # compare by value; 'is' tests object identity
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
|
||||
else:
|
||||
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
|
||||
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
|
||||
|
||||
rf_sz = np.array(pyra['rf_size'], dtype=int)  # np.int alias was removed from recent NumPy releases
|
||||
|
||||
# compute coordinate in image
|
||||
im_coord = coord * pyra['stride'] + pyra['offset']
|
||||
if add_rf:
|
||||
im_coord += (rf_sz[0] - pyra['stride'])
|
||||
return im_coord
|
||||
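The mapping from score-map coordinates to image coordinates in `coord_in_image` is a plain affine transform. The following is a minimal sketch, assuming the AlexNet constants quoted in the source (stride 32, offset 113, receptive field 227). The helper name `feat_to_image_coord` is hypothetical and not part of the repository.

```python
import numpy as np

# Constants copied from the 'alexnet' branch above.
STRIDE, OFFSET, RF_SIZE = 32, 113, 227


def feat_to_image_coord(coord, add_rf=False):
    """Map score-map coordinate(s) to image pixels (illustrative helper)."""
    im_coord = np.asarray(coord) * STRIDE + OFFSET
    if add_rf:
        # same correction as coord_in_image: receptive field minus one stride
        im_coord = im_coord + (RF_SIZE - STRIDE)
    return im_coord


coords = feat_to_image_coord([0, 3])
coords_rf = feat_to_image_coord(3, add_rf=True)
```

With these constants, score-map index 3 lands at pixel 3 * 32 + 113 = 209, and adding the receptive-field correction of 227 - 32 = 195 gives 404.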
|
||||
|
||||
def _bbox_transform_inv(boxes, deltas):
|
||||
if boxes.shape[0] == 0:
|
||||
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
|
||||
|
||||
boxes = boxes.astype(deltas.dtype, copy=False)
|
||||
|
||||
widths = boxes[:, 2] - boxes[:, 0] + 1.0
|
||||
heights = boxes[:, 3] - boxes[:, 1] + 1.0
|
||||
ctr_x = boxes[:, 0] + 0.5 * widths
|
||||
ctr_y = boxes[:, 1] + 0.5 * heights
|
||||
|
||||
dx = deltas[:, 0::4]
|
||||
dy = deltas[:, 1::4]
|
||||
dw = deltas[:, 2::4]
|
||||
dh = deltas[:, 3::4]
|
||||
|
||||
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
|
||||
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
|
||||
pred_w = np.exp(dw) * widths[:, np.newaxis]
|
||||
pred_h = np.exp(dh) * heights[:, np.newaxis]
|
||||
|
||||
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
|
||||
# x1
|
||||
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
|
||||
# y1
|
||||
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
|
||||
# x2
|
||||
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
|
||||
# y2
|
||||
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
|
||||
|
||||
return pred_boxes
|
||||
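To make the delta parameterisation in `_bbox_transform_inv` concrete, here is a single-box sketch of the same arithmetic. It assumes one `(dx, dy, dw, dh)` delta per box; the name `decode_box` is illustrative only and does not exist in the repository.

```python
import numpy as np


def decode_box(box, delta):
    """Apply one (dx, dy, dw, dh) delta to one (x1, y1, x2, y2) box."""
    w = box[2] - box[0] + 1.0
    h = box[3] - box[1] + 1.0
    cx = box[0] + 0.5 * w
    cy = box[1] + 0.5 * h
    # shift the centre by a fraction of the box size, scale the size exponentially
    pcx, pcy = delta[0] * w + cx, delta[1] * h + cy
    pw, ph = np.exp(delta[2]) * w, np.exp(delta[3]) * h
    return np.array([pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph])


box = np.array([0.0, 0.0, 9.0, 9.0])
identity = decode_box(box, np.zeros(4))                    # zero deltas
shifted = decode_box(box, np.array([0.5, 0.0, 0.0, 0.0]))  # move right by half a width
```

Note that, because widths are computed with `+ 1.0`, zero deltas map the box `(0, 0, 9, 9)` to `(0, 0, 10, 10)` rather than reproducing it exactly.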
|
||||
|
||||
def nms(dets, thresh):
|
||||
x1 = dets[:, 0]
|
||||
y1 = dets[:, 1]
|
||||
x2 = dets[:, 2]
|
||||
y2 = dets[:, 3]
|
||||
scores = dets[:, 4]
|
||||
|
||||
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
|
||||
order = scores.argsort()[::-1]
|
||||
|
||||
keep = []
|
||||
while order.size > 0:
|
||||
i = order[0]
|
||||
keep.append(i)
|
||||
xx1 = np.maximum(x1[i], x1[order[1:]])
|
||||
yy1 = np.maximum(y1[i], y1[order[1:]])
|
||||
xx2 = np.minimum(x2[i], x2[order[1:]])
|
||||
yy2 = np.minimum(y2[i], y2[order[1:]])
|
||||
|
||||
w = np.maximum(0.0, xx2 - xx1 + 1)
|
||||
h = np.maximum(0.0, yy2 - yy1 + 1)
|
||||
inter = w * h
|
||||
ovr = inter / (areas[i] + areas[order[1:]] - inter)
|
||||
|
||||
inds = np.where(ovr <= thresh)[0]
|
||||
order = order[inds + 1]
|
||||
|
||||
return keep
|
||||
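A small worked example may help to see what the greedy suppression in `nms` does. The sketch below re-implements the same loop in compact form so that it runs on its own; `greedy_nms` and the toy detections are assumptions made for illustration.

```python
import numpy as np


def greedy_nms(dets, thresh):
    """Greedy NMS over rows of (x1, y1, x2, y2, score); returns kept indices."""
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # keep only candidates that do not overlap the selected box too much
        order = rest[iou <= thresh]
    return keep


dets = np.array([
    [0, 0, 10, 10, 0.9],    # highest score, kept
    [1, 1, 11, 11, 0.8],    # IoU with the first box is 100/142, suppressed at 0.5
    [20, 20, 30, 30, 0.7],  # disjoint, kept
], dtype=np.float32)
kept = greedy_nms(dets, thresh=0.5)
```

The returned indices refer to rows of `dets` and are ordered by descending score, which is why `post_process_detections` can index `cls_dets[keep, :]` directly.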
|
||||
|
||||
def crop_bboxes_from_im(im, bboxes, context_pad=0, is_pil=False):
|
||||
"""
|
||||
Crop each bbox from the image and return the crops as a list.
|
||||
im: image to crop from (ndarray, or PIL image if is_pil is True)
|
||||
bboxes: bounding box coordinates as xmin, ymin, xmax, ymax.
|
||||
"""
|
||||
# iterate over boxes
|
||||
im_crop_list = []
|
||||
for i in range(bboxes.shape[0]):  # range works in Python 2 and 3
|
||||
# format bbox
|
||||
bbox = np.round(bboxes[i, :]).astype(int)
|
||||
# crop bbox from image
|
||||
if context_pad <= 0:
|
||||
im_crop = im[bbox[1]:bbox[3], bbox[0]:bbox[2]]
|
||||
else:
|
||||
if is_pil:
|
||||
im_crop = np.asarray(crop_pil_image(im, bbox, context_pad=context_pad)[0])
|
||||
else:
|
||||
im_crop = np.asarray(crop_pil_image(Image.fromarray(im), bbox, context_pad=context_pad)[0]) # Image.fromarray(im, 'L')
|
||||
# append to list
|
||||
im_crop_list.append(im_crop)
|
||||
return im_crop_list
|
||||
|
||||
|
||||
def apply_bbox_regression(predicted_roi, rf_boxes, im_shape, num_classes=None, with_star_crop=True):
|
||||
if num_classes is None:
|
||||
num_classes = cfg.TEST.NUM_CLASSES
|
||||
|
||||
## if use_bbox_reg:
|
||||
# select roi deltas
|
||||
if with_star_crop:
|
||||
roi_deltas = predicted_roi[4, ...].reshape([num_classes * 4, -1]).transpose()
|
||||
else:
|
||||
roi_deltas = predicted_roi.reshape([num_classes * 4, -1]).transpose()
|
||||
# apply bounding-box regression deltas
|
||||
pred_boxes = _bbox_transform_inv(rf_boxes, roi_deltas)
|
||||
# make sure everything stays inside its limits
|
||||
pred_boxes = clip_boxes(pred_boxes, im_shape)
|
||||
return pred_boxes
|
||||
|
||||
|
||||
def _split_detections(detections, boxes, axis=1, nsplits=2, sid=1):
|
||||
assert axis in (1, 2)
|
||||
boxes_split = np.array_split(boxes, nsplits, axis=axis-1)
|
||||
dets_split = np.array_split(detections, nsplits, axis=axis)
|
||||
# reshape to original format
|
||||
det_vec = dets_split[sid].reshape([dets_split[sid].shape[0], -1]).transpose()
|
||||
box_vec = boxes_split[sid].reshape([-1, boxes_split[sid].shape[-1]])
|
||||
return det_vec, box_vec
|
||||
|
||||
|
||||
def split_detections(detections, pred_boxes, rf_boxes, lbl_map_shape,
|
||||
split_axis='h', nsplits=2, sid=1, num_classes=cfg.TEST.NUM_CLASSES):
|
||||
if split_axis == 'h':
|
||||
# horizontal
|
||||
axis = 1
|
||||
elif split_axis == 'v':
|
||||
# vertical
|
||||
axis = 2
|
||||
# split detections
|
||||
if cfg.TEST.BBOX_REG:
|
||||
det_vec, box_vec = _split_detections(detections,
|
||||
pred_boxes.reshape(list(lbl_map_shape) + [4 * num_classes]),
|
||||
axis=axis, nsplits=nsplits, sid=sid)
|
||||
else:
|
||||
det_vec, box_vec = _split_detections(detections,
|
||||
rf_boxes.reshape(list(lbl_map_shape) + [num_classes]),
|
||||
axis=axis, nsplits=nsplits, sid=sid)
|
||||
|
||||
return det_vec, box_vec
|
||||
|
||||
|
||||
def vis_detections(im, bboxes, scores=None, labels=None, thresh=0.3, max_vis=20, figs_sz=(14, 14), ax=None):
|
||||
"""
|
||||
Visualize bounding boxes on top of input image including labels / scores.
|
||||
im: input image
|
||||
bboxes: ndarray of bounding boxes
|
||||
scores: array of scores with length equal to bboxes.shape[0]
|
||||
labels: array of integer labels with length equal to bboxes.shape[0], or a single string label
|
||||
etc.
|
||||
"""
|
||||
if scores is None:
|
||||
nvis = min(max_vis, bboxes.shape[0])
|
||||
else:
|
||||
assert len(scores) == bboxes.shape[0]
|
||||
inds = np.where(scores > thresh)[0]
|
||||
nvis = min(max_vis, len(inds))
|
||||
|
||||
# return if no bboxes to visualize
|
||||
if nvis == 0:
|
||||
return
|
||||
|
||||
# plot base figure
|
||||
if ax is None:
|
||||
fig, ax = plt.subplots(1, 1, figsize=figs_sz)
|
||||
|
||||
ax.imshow(im, cmap=cm.Greys_r)
|
||||
|
||||
# iterate over bboxes and add them
|
||||
for i in range(nvis):  # range works in Python 2 and 3
|
||||
bbox = bboxes[i, :4]
|
||||
# deal with scores
|
||||
if scores is not None:
|
||||
score = scores[i]
|
||||
# only show boxes with score above threshold
|
||||
if score <= thresh:
|
||||
continue
|
||||
# deal with labels
|
||||
if isinstance(labels, str):
|
||||
# if label is string
|
||||
cls_name = labels
|
||||
title_txt = labels
|
||||
else:
|
||||
# else assume index array
|
||||
assert len(labels) == bboxes.shape[0]
|
||||
cls_name = '{:.0f}'.format(labels[i])
|
||||
title_txt = 'X'
|
||||
|
||||
# plt.cla()
|
||||
# plt.imshow(im, cmap = cm.Greys_r)
|
||||
ax.add_patch(
|
||||
plt.Rectangle((bbox[0], bbox[1]),
|
||||
bbox[2] - bbox[0],
|
||||
bbox[3] - bbox[1], fill=False,
|
||||
edgecolor='blue', alpha=0.5, linewidth=2.0)
|
||||
)
|
||||
if scores is None:
|
||||
ax.text(bbox[0], bbox[1] - 2, '{:s}'.format(cls_name),
|
||||
bbox=dict(facecolor='blue', alpha=0.4), fontsize=8, color='white')
|
||||
ax.set_title('{}'.format(cls_name))
|
||||
else:
|
||||
ax.text(bbox[0], bbox[1] - 2, '{:s} {:.2f}'.format(cls_name, score),
|
||||
bbox=dict(facecolor='blue', alpha=0.4), fontsize=8, color='white')
|
||||
ax.set_title('{} {:.3f}'.format(cls_name, score))
|
||||
ax.set_title('{} detections with p({} | box) >= {:.2f}'.format(nvis, title_txt, thresh), fontsize=14)
|
||||
ax.xaxis.set_major_locator(ticker.MultipleLocator(200))
|
||||
ax.yaxis.set_major_locator(ticker.MultipleLocator(200))
|
||||
# plt.axis('off')
|
||||
# plt.tight_layout()
|
||||
# plt.draw()
|
||||
|
||||
|
||||
def scale_detection_boxes(boxes, scale_factor):
|
||||
# scale boxes depending on scale factor
|
||||
return boxes * scale_factor
|
||||
|
||||
|
||||
def correct_for_shift(boxes, correction):
|
||||
# correct shift due to oversampling
|
||||
# in order to correct ground truth boxes, e.g. center crop -> subtract half the shift from gt boxes
|
||||
# in order to correct detection boxes, e.g. center crop -> add half the shift to detection boxes
|
||||
return boxes + correction
|
||||
|
||||
|
||||
def reverse_scaling(rf_boxes, pred_boxes, scaling=1):
|
||||
# if used, should be applied right before post-processing detections
|
||||
|
||||
# reverse scaling of detection boxes
|
||||
rf_boxes = scale_detection_boxes(rf_boxes, scaling)
|
||||
pred_boxes = scale_detection_boxes(np.array(pred_boxes), scaling)
|
||||
|
||||
return rf_boxes, pred_boxes
|
||||
|
||||
|
||||
def reverse_shift_and_scaling(rf_boxes, pred_boxes, shift=0, scaling=1):
|
||||
# if used, should be applied right before post-processing detections
|
||||
|
||||
# correct shift of detection boxes due to center crop
|
||||
rf_boxes = correct_for_shift(rf_boxes, shift)
|
||||
pred_boxes = correct_for_shift(np.array(pred_boxes), shift)
|
||||
|
||||
# reverse scaling of detection boxes
|
||||
rf_boxes = scale_detection_boxes(rf_boxes, scaling)
|
||||
pred_boxes = scale_detection_boxes(np.array(pred_boxes), scaling)
|
||||
|
||||
return rf_boxes, pred_boxes
|
||||
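The order of operations in `reverse_shift_and_scaling` matters: the shift is added first and the result is then multiplied by the scale factor. A minimal sketch with made-up numbers, where `undo_shift_and_scale` is a hypothetical stand-in for the two helper calls above:

```python
import numpy as np


def undo_shift_and_scale(boxes, shift=0, scaling=1):
    """Add the crop shift first, then rescale (same order as above)."""
    return (np.asarray(boxes, dtype=float) + shift) * scaling


boxes = np.array([[10, 10, 20, 20]])
restored = undo_shift_and_scale(boxes, shift=4, scaling=2)
```

Swapping the two steps would give `boxes * scaling + shift`, which differs whenever the scale factor is not 1, so the shift has to be expressed in the coordinates of the scaled input image.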
|
||||
|
||||
def post_process_detections(scores, pred_boxes, rf_boxes, num_classes=None, use_bbox_reg=None, nms_thresh=None):
|
||||
# apply nms and filter low confidence boxes
|
||||
# return list of good candidates
|
||||
# all detections are collected into:
|
||||
# all_boxes[cls][image] = N x 5 array of detections in
|
||||
# (x1, y1, x2, y2, score)
|
||||
if num_classes is None:
|
||||
num_classes = cfg.TEST.NUM_CLASSES
|
||||
if use_bbox_reg is None:
|
||||
use_bbox_reg = cfg.TEST.BBOX_REG
|
||||
if nms_thresh is None:
|
||||
nms_thresh = cfg.TEST.NMS
|
||||
|
||||
score_min_thresh = cfg.TEST.SCORE_MIN_THRESH
|
||||
score_bg_thresh = cfg.TEST.SCORE_BG_THRESH
|
||||
|
||||
num_images = 1
|
||||
|
||||
all_boxes = [[[] for _ in range(num_images)]
|
||||
for _ in range(num_classes)] # xrange vs range
|
||||
|
||||
for i in range(num_images): # xrange vs range
|
||||
# load image and get detections from network
|
||||
# [.....]
|
||||
# skip j = 0, because it's the background class
|
||||
for j in range(1, num_classes): # xrange vs range
|
||||
# selection of boxes before NMS
|
||||
inds = np.where((scores[:, j] > score_min_thresh) & (scores[:, 0] < score_bg_thresh))[0]
|
||||
cls_scores = scores[inds, j]
|
||||
if use_bbox_reg:
|
||||
cls_boxes = pred_boxes[inds, j * 4:(j + 1) * 4] # bbox regression
|
||||
else:
|
||||
cls_boxes = rf_boxes[inds, :] # without bbox regression
|
||||
cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
|
||||
.astype(np.float32, copy=False)
|
||||
# apply nms suppression
|
||||
keep = nms(cls_dets, nms_thresh)
|
||||
cls_dets = cls_dets[keep, :]
|
||||
all_boxes[j][i] = cls_dets
|
||||
|
||||
return all_boxes
|
||||
|
||||
|
||||
def get_all_bboxes(all_boxes):
|
||||
# take all_boxes (per-class detections after NMS)
|
||||
# return enriched list of detections including bbox, score, and max label
|
||||
num_classes = len(all_boxes)
|
||||
dets_list = [[] for _ in range(num_classes)]
|
||||
for j in range(1, num_classes):
|
||||
if len(all_boxes[j][0]) > 0:
|
||||
# get boxes
|
||||
BB = all_boxes[j][0]
|
||||
confidence = all_boxes[j][0][:, -1]
|
||||
# sort boxes by confidence
|
||||
sorted_ind = np.argsort(-confidence)
|
||||
# sorted_scores = np.sort(-confidence)
|
||||
BB = BB[sorted_ind, :]
|
||||
# append together with class label
|
||||
dets_list[j] = np.concatenate([BB, np.tile(j, reps=(BB.shape[0], 1))], axis=1)
|
||||
# return per-class lists (classes are not concatenated here)
|
||||
return dets_list
|
||||
|
||||
|
||||
def get_detection_bboxes(detections, all_boxes):
|
||||
# take detections and all_boxes
|
||||
# return enriched list of detections including bbox, score, and max label
|
||||
num_classes = len(all_boxes)
|
||||
dets_list = [[] for _ in range(num_classes)]
|
||||
for j in range(1, num_classes):
|
||||
if len(detections[j][0]) > 0:
|
||||
# get boxes
|
||||
BB = all_boxes[j][0]
|
||||
confidence = all_boxes[j][0][:, -1]
|
||||
# sort boxes by confidence
|
||||
sorted_ind = np.argsort(-confidence)
|
||||
# sorted_scores = np.sort(-confidence)
|
||||
BB = BB[sorted_ind, :]
|
||||
# select detections (indices require sorted detections - since sorted in evaluate)
|
||||
inds = detections[j][0]
|
||||
# append together with class label
|
||||
dets_list[j] = np.concatenate([BB[inds, :], np.tile(j, reps=(inds.shape[0], 1))], axis=1)
|
||||
# return per-class lists (classes are not concatenated here)
|
||||
return dets_list
|
||||
|
||||
|
||||
def collect_detection_crops(input_im, dets_list, max_vis=5, context_pad=0):
|
||||
# take tablet(input_im) and list of bboxes(dets_list)
|
||||
# return cropped patches(dets_crops)
|
||||
num_classes = len(dets_list)
|
||||
dets_crops = [[] for _ in range(num_classes)]
|
||||
for j in range(1, num_classes):
|
||||
if len(dets_list[j]) > 0:
|
||||
# get boxes
|
||||
cls_dets = dets_list[j] # select class list
|
||||
bboxes = cls_dets[:, :4] # remove any additional dims
|
||||
ncrops = min(max_vis, bboxes.shape[0])
|
||||
dets_crops[j] = crop_bboxes_from_im(input_im, bboxes[:ncrops, ...], context_pad)
|
||||
return dets_crops
|
||||
|
||||
|
||||
def plot_crop_list(dets_crops, gt_crops, scores=None, k=8, cls_label='', figs_sz=(14, 4.5), context_pad=0):
|
||||
# plot co-detections of a single class
|
||||
# can handle dets_crops and gt_crops together or both on their own
|
||||
nvis = min(len(dets_crops), k)
|
||||
ngt = len(gt_crops)
|
||||
if nvis > 0:
|
||||
# slice crops and scores
|
||||
top_list = dets_crops[:nvis]
|
||||
top_vals = scores[:nvis]
|
||||
|
||||
# prepare subplots (nvis or nvis + 1)
|
||||
fig, axes = plt.subplots(1, nvis + (ngt > 0), figsize=figs_sz, squeeze=False) # , gridspec_kw={'wspace': 1}
|
||||
axes = axes.ravel()
|
||||
|
||||
# plot idx
|
||||
pid = 0
|
||||
|
||||
# plot ground truth in front if available
|
||||
if ngt > 0:
|
||||
axes[pid].imshow(gt_crops[0], cmap=cm.Greys_r)
|
||||
axes[pid].set_yticks([])
|
||||
axes[pid].set_xticks([])
|
||||
axes[pid].set_title("gt [{}]".format(cls_label))
|
||||
pid += 1
|
||||
|
||||
# iterate over top_list
|
||||
for i, imcrop in enumerate(top_list):
|
||||
axes[pid + i].imshow(imcrop, cmap=cm.Greys_r)
|
||||
axes[pid + i].set_yticks([])
|
||||
axes[pid + i].set_xticks([])
|
||||
bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8)
|
||||
# if there is no gt, add class label to title in first plot
|
||||
if pid + i == 0 and ngt == 0:
|
||||
axes[pid + i].set_title("class [{}] #{} p(x)={:.1f}".format(cls_label, i + 1, top_vals[i]))
|
||||
else:
|
||||
axes[pid + i].set_title("#{} p(x)={:.1f}".format(i + 1, top_vals[i]))
|
||||
|
||||
if context_pad > 0:
|
||||
imh, imw = imcrop.shape[:2]  # ndarray shape is (rows, cols) = (height, width)
|
||||
bbox = [context_pad, context_pad, imw - context_pad, imh - context_pad]
|
||||
axes[pid + i].add_patch(plt.Rectangle((bbox[0], bbox[1]),
|
||||
bbox[2] - bbox[0], bbox[3] - bbox[1],
|
||||
fill=False, edgecolor='blue', linestyle='-',
|
||||
alpha=0.3, linewidth=2.0))
|
||||
|
||||
elif ngt > 0:
|
||||
nvis = 1
|
||||
# plot top k
|
||||
fig, axes = plt.subplots(1, nvis, figsize=figs_sz, squeeze=False)
|
||||
axes = axes.ravel()
|
||||
|
||||
top_list = [gt_crops[0]] * nvis
|
||||
top_vals = [1] * len(top_list)
|
||||
for pid, imcrop in enumerate(top_list):
|
||||
axes[0, pid].imshow(imcrop, cmap=cm.Greys_r)
|
||||
bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8)
|
||||
axes[0, pid].set_title("gt [{}]".format(cls_label))
|
||||
axes[0, pid].set_yticks([])
|
||||
axes[0, pid].set_xticks([])
|
||||
|
||||
|
||||
def convert_detections_to_array(all_boxes, img_idx=0, idx_column=None):
|
||||
# all_boxes[cls][image] = N x 5 array of detections in
|
||||
# (x1, y1, x2, y2, score)
|
||||
total_labels = len(all_boxes) # all_boxes.shape[0]
|
||||
temp = [0, 0, 0, 0, 0, 0, 0, 0, 0] # [ID, cx, cy, score, x1, y1, x2, y2, idx]
|
||||
detections_arr = np.zeros((0, 9))
|
||||
idx = 0
|
||||
# convert to CLS, cx, cy, score
|
||||
for i in range(total_labels):
|
||||
for box in all_boxes[i][img_idx]:
|
||||
temp[0] = i
|
||||
temp[1] = (box[2] + box[0]) / 2
|
||||
temp[2] = (box[3] + box[1]) / 2
|
||||
temp[3] = box[4]
|
||||
temp[4:8] = box[0:4]
|
||||
if idx_column is None:
|
||||
temp[8] = idx
|
||||
else:
|
||||
temp[8] = idx_column[idx]
|
||||
idx += 1
|
||||
detections_arr = np.vstack((detections_arr, temp))
|
||||
# SORT BY SCORE!?
|
||||
return detections_arr
|
||||
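The row layout produced by `convert_detections_to_array` is `[ID, cx, cy, score, x1, y1, x2, y2, idx]`. The sketch below builds the same layout for a toy `all_boxes` structure; the data are invented, and collecting rows in a list before a single conversion is a design choice of this example, not the repository's implementation.

```python
import numpy as np

# all_boxes[cls][image] holds an N x 5 array (x1, y1, x2, y2, score)
all_boxes = [
    [np.zeros((0, 5))],                         # class 0: background, no detections
    [np.array([[0.0, 0.0, 10.0, 20.0, 0.9]])],  # class 1: one detection
]

rows = []
for cls, per_image in enumerate(all_boxes):
    for box in per_image[0]:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        # running index in the last column, as when idx_column is None
        rows.append([cls, cx, cy, box[4], box[0], box[1], box[2], box[3], len(rows)])
detections_arr = np.array(rows).reshape(-1, 9)
```

Appending to a list avoids the repeated `np.vstack` calls, which copy the whole array on every detection; for a few hundred detections per tablet either approach should be acceptable.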
|
||||
@@ -0,0 +1,519 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
import torch
|
||||
from torchvision import transforms as trafos
|
||||
|
||||
from skimage import draw
|
||||
from skimage.color import label2rgb
|
||||
from skimage.transform import hough_line, hough_line_peaks, probabilistic_hough_line
|
||||
from skimage.morphology import skeletonize, skeletonize_3d, thin, medial_axis, watershed  # note: newer scikit-image releases expose watershed in skimage.segmentation
|
||||
|
||||
from scipy import ndimage as ndi
|
||||
from scipy.spatial.distance import pdist, cdist, squareform
|
||||
|
||||
from ..detection.detection_helpers import label_map2image, coord_in_image
|
||||
|
||||
|
||||
# prepare input for line detection
|
||||
|
||||
def preprocess_line_input(pil_im, scale, shift=None):
|
||||
""" produces five copies of the segment at slightly different offsets
|
||||
|
||||
:param pil_im: tablet segment that is to be processed
|
||||
:param scale: scale which should be used for resizing
|
||||
:param shift: offset shift used to produce five-fold oversampling
|
||||
:return: 4D tensor with 5xCxHxW
|
||||
"""
|
||||
if shift is None:
|
||||
shift = 0 # cfg.TEST.SHIFT
|
||||
# compute scaled size
|
||||
imw, imh = pil_im.size
|
||||
imw = int(imw * scale)
|
||||
imh = int(imh * scale)
|
||||
# determine crop size
|
||||
crop_sz = [int(imw - shift), int(imh - shift)]
|
||||
# tensor-space transforms
|
||||
ts_transform = trafos.Compose([
|
||||
trafos.ToTensor(),
|
||||
trafos.Normalize(mean=[0.5], std=[1]), # normalize
|
||||
])
|
||||
# compose transforms
|
||||
tablet_transform = trafos.Compose([
|
||||
trafos.Lambda(lambda x: x.convert('L')), # convert to gray
|
||||
trafos.Resize((imh, imw)), # resize according to scale
|
||||
trafos.FiveCrop((crop_sz[1], crop_sz[0])), # oversample
|
||||
trafos.Lambda(
|
||||
lambda crops: torch.stack([ts_transform(crop) for crop in crops])), # returns a 4D tensor
|
||||
])
|
||||
# apply transforms
|
||||
im_list = tablet_transform(pil_im)
|
||||
return im_list
|
||||
|
||||
|
||||
def apply_detector(inputs, model_fcn, device):
|
||||
|
||||
with torch.no_grad(): # faster, less memory usage
|
||||
inputs = inputs.to(device)
|
||||
# apply network
|
||||
output = model_fcn(inputs)
|
||||
# convert to numpy
|
||||
output = output.data.cpu().numpy()
|
||||
return output
|
||||
|
||||
|
||||
# prepare transliteration for line detection
|
||||
|
||||
def prepare_transliteration(tl_df, num_lines, stats):
|
||||
"""
|
||||
ATTENTION: this filters the transliteration according to status!
|
||||
"""
|
||||
|
||||
# prepare transliteration for line detection
|
||||
if num_lines > 0:
|
||||
# only visible/not broken
|
||||
tl_df = tl_df[tl_df.status > 0]
|
||||
# compute line length
|
||||
tl_df = tl_df.groupby('line_idx').apply(compute_line_length_from_tl, stats)
|
||||
# get line statistics
|
||||
num_vis_lines = tl_df.line_idx.nunique() # num visible lines (not broken lines)
|
||||
# len_lines = tl_df.groupby('line_idx').pos_idx.count()
|
||||
len_lines = tl_df.line_idx.value_counts()
|
||||
len_min, len_max = len_lines.min(), len_lines.max()
|
||||
else:
|
||||
# TODO: if no tl info available, use initial line detection results to set these parameters
|
||||
len_min, len_max = 4, 12
|
||||
num_vis_lines = 40
|
||||
|
||||
return tl_df, num_vis_lines, len_min, len_max
|
||||
|
||||
|
||||
# extract lines with hough transform
|
||||
|
||||
def compute_hough_transform(line_det_map1, line_det_map2, re_focus_angle=True):
|
||||
# focus theta for cuneiform horizontal lines
|
||||
theta_range = np.linspace(np.deg2rad(83), np.deg2rad(97), 50)
|
||||
# theta_range = np.linspace(np.deg2rad(-90) ,np.deg2rad(90), 180) # normal range
|
||||
|
||||
# Classic straight-line Hough transform (usually angles from -90 to +90)
|
||||
h, theta, d = hough_line(line_det_map1, theta=theta_range)
|
||||
|
||||
# debug
|
||||
# plt.imshow(np.log(1 + h), extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]], cmap='gray', aspect=1/1.5)
|
||||
# plt.show()
|
||||
|
||||
# focus angle and re-run
|
||||
if re_focus_angle:
|
||||
# get peaks
|
||||
accum, angles, dists = hough_line_peaks(h, theta, d, min_distance=1, min_angle=16, num_peaks=50)
|
||||
|
||||
# get median angle
|
||||
m_angle = np.median(np.rad2deg(angles))
|
||||
# modify theta
|
||||
theta_range = np.linspace(np.deg2rad(m_angle - 2), np.deg2rad(m_angle + 2), 50)
|
||||
theta_range2 = np.linspace(np.deg2rad(m_angle - 3), np.deg2rad(m_angle + 3), 50)
|
||||
# Classic straight-line Hough transform (usually angles from -90 to +90)
|
||||
h, theta, d = hough_line(line_det_map2, theta=theta_range)
|
||||
|
||||
return h, theta, d, theta_range, theta_range2
|
||||
|
||||
|
||||
# group lines together that are "close"
|
||||
|
||||
def shoelace_formula(points):
|
||||
''' compute area of polygon according to the shoelace formula
|
||||
requires ordering of point coordinates
|
||||
https://en.wikipedia.org/wiki/Shoelace_formula
|
||||
|
||||
:param points: 2xn matrix, where n is number of points (points need to be ordered!!)
|
||||
:return: area of polygon
|
||||
'''
|
||||
area = 0
|
||||
dmat = np.ones((2, 2))
|
||||
for i in range(points.shape[1]):
|
||||
dmat[:, 0] = points[:, i]
|
||||
dmat[:, 1] = points[:, (i+1) % points.shape[1]]
|
||||
area += np.linalg.det(dmat.transpose())
|
||||
return np.abs(area) / 2.
|
||||
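The shoelace formula can be checked on shapes whose area is known. The following is a vectorised sketch under the same input convention (a 2 x n array of ordered vertices); `polygon_area` is an illustrative name and uses `np.roll` in place of the explicit loop over 2 x 2 determinants.

```python
import numpy as np


def polygon_area(points):
    """Shoelace area for a 2 x n array of ordered vertices."""
    x, y = points
    # sum of cross products of consecutive vertices, wrapping around at the end
    return 0.5 * np.abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))


square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]], dtype=float)
triangle = np.array([[0, 4, 0],
                     [0, 0, 3]], dtype=float)
```

The unit square yields an area of 1 and the 3-4-5 right triangle an area of 6. If the vertices are not ordered along the boundary, the polygon self-intersects and the result is no longer the enclosed area, which is why `area_between_two_line_segments` passes the points as `[spt1, spt2, lpt2, lpt1]`.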
|
||||
|
||||
def area_between_two_line_segments(spt1, spt2, lpt1, lpt2):
|
||||
# compute area between line segments
|
||||
# assume: line segments do not intersect and
|
||||
# assume: pts should be ordered according to x-axis
|
||||
# this means a valid order would be [spt1, spt2, lpt2, lpt1]
|
||||
return shoelace_formula(np.stack([spt1, spt2, lpt2, lpt1], axis=1))
|
||||
|
||||
|
||||
def nearby_and_near_parallel_2(l1, l2, interline_distance, interval=[0, 10]):
|
||||
# compute area between line segments over interval
|
||||
angle1, rad1 = l1
|
||||
angle2, rad2 = l2
|
||||
spt1, spt2 = line_pts_from_polar_line(angle1, rad1, x0=interval[0], x1=interval[1])
|
||||
lpt1, lpt2 = line_pts_from_polar_line(angle2, rad2, x0=interval[0], x1=interval[1])
|
||||
# use shoelace method
|
||||
area = area_between_two_line_segments(spt1, spt2, lpt1, lpt2)
|
||||
# check threshold
|
||||
interval_interline_area = interline_distance * np.abs(interval[1] - interval[0])
|
||||
|
||||
# print area, interval_interline_area / 2.
|
||||
if area < interval_interline_area / 2.:
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
|
||||
def nearby_and_near_parallel(l1, l2, interline_distance):
|
||||
# simple filter
|
||||
angle1, rad1 = l1
|
||||
angle2, rad2 = l2
|
||||
if np.abs(rad1 - rad2) < interline_distance/2. and np.abs(np.rad2deg(angle1-angle2)) < 1.0:
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
|
||||
def do_intersect_in_interval(l1, l2, interval):
|
||||
# y = mx+c or in parametric form
|
||||
# \rho = x \cos \theta + y \sin \theta
|
||||
# \rho (radius) perpendicular distance from origin to the line
|
||||
# \theta is the angle formed by this perpendicular line
|
||||
|
||||
angle1, rad1 = l1
|
||||
angle2, rad2 = l2
|
||||
lower, upper = interval
|
||||
|
||||
quotient = (np.cos(angle1) - np.cos(angle2))
|
||||
|
||||
if quotient == 0: # same angles
|
||||
if np.abs(rad1 - rad2) < 3: # same radius
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
else:
|
||||
# compute intersection coordinate
|
||||
x_intersect = (rad1 - rad2) / quotient
|
||||
|
||||
# inside interval
|
||||
if (x_intersect >= lower) and (x_intersect <= upper):
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
|
||||
def compute_group_labels_from_dists(X_dist):
|
||||
# assign labels to groups
|
||||
# iterate over pairwise distances and propagate each line's label to its neighbours
|
||||
|
||||
# get squareform
|
||||
XX = squareform(X_dist)
|
||||
# set dummy labels
|
||||
labels = -np.ones(XX.shape[0])
|
||||
# label lines while checking for neighbourhood
|
||||
for ii in range(len(labels)):
|
||||
if labels[ii] == -1:
|
||||
labels[ii] = ii
|
||||
# each row of the squareform matrix indicates potential neighbors
|
||||
for idx in np.where(XX[ii, :] > 0)[0]:
|
||||
labels[idx] = labels[ii]
|
||||
return labels
|
||||
|
||||
|
||||
# associate lines with line segments
|
||||
|
||||
|
||||
def line_pts_from_polar_line(angle, dist, x0=0, x1=10):
|
||||
# computes two points defining a line from polar line representation
|
||||
x0, x1 = x0 * np.ones_like(angle), x1 * np.ones_like(angle) # x0 = np.zeros_like(angle)
|
||||
y0 = (dist - x0 * np.cos(angle)) / np.sin(angle)
|
||||
y1 = (dist - x1 * np.cos(angle)) / np.sin(angle)
|
||||
return (x0, y0), (x1, y1)
|
||||
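The polar representation used by the Hough transform is rho = x cos(theta) + y sin(theta), and `line_pts_from_polar_line` solves it for y at two x positions. A minimal sketch with two lines whose points are easy to verify by hand; `polar_line_points` is a hypothetical scalar version of the function above.

```python
import numpy as np


def polar_line_points(angle, dist, x0=0.0, x1=10.0):
    """Two points on the line rho = x*cos(theta) + y*sin(theta)."""
    y0 = (dist - x0 * np.cos(angle)) / np.sin(angle)
    y1 = (dist - x1 * np.cos(angle)) / np.sin(angle)
    return (x0, y0), (x1, y1)


# theta = 90 degrees describes a horizontal line at height rho
p0, p1 = polar_line_points(np.deg2rad(90), 5.0)
# theta = 45 degrees, rho = sqrt(2): the line x + y = 2
q0, q1 = polar_line_points(np.deg2rad(45), np.sqrt(2), x0=0.0, x1=2.0)
```

Because of the division by sin(theta), this form is ill-conditioned for near-vertical lines (theta close to 0). That is unproblematic here, since `compute_hough_transform` restricts theta to a narrow band around 90 degrees.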
|
||||
|
||||
def line_params_from_pts(lpt1, lpt2):
|
||||
# compute parameters a, b for line representation y = a * x + b
|
||||
a = (lpt2[1] - lpt1[1])/float(lpt2[0] - lpt1[0])
|
||||
b = lpt1[1] - lpt1[0] * a
|
||||
return a, b
|
||||
|
||||
|
||||
def normal_form_from_pts(p, q):
|
||||
# takes two points and computes the normal form (unit normal, distance from origin) of the line through them
|
||||
# https://de.wikipedia.org/wiki/Normalenform#Aus_der_Zweipunkteform
|
||||
|
||||
# normal
|
||||
n = np.array([-(q[1]-p[1]),
|
||||
q[0]-p[0]], dtype=float)
|
||||
# normalize
|
||||
n_0 = n / np.linalg.norm(n)
|
||||
# distance from origin
|
||||
dist = np.dot(n_0, p)
|
||||
return n_0, dist
|
||||
|
||||
|
||||
def hess_normal_form_from_pts(p, q):
|
||||
n_0, dist = normal_form_from_pts(p, q)
|
||||
# angle in rad
|
||||
rad = np.arctan(n_0[1]/n_0[0])
|
||||
return rad, dist
|
||||
|
||||
|
||||
def _offset_pt_to_normal_form_line(pt, n_0, dist):
|
||||
return np.dot(n_0, pt) - dist
|
||||
|
||||
|
||||
def _shift_pt_to_normal_form_line(pt, n_0, shift_dist):
|
||||
return pt + n_0 * shift_dist
|
||||
|
||||
|
||||
def clip_pt_using_normal_form(pt, n_0, dist, min_dist):
|
||||
# compute offset
|
||||
offset_line = _offset_pt_to_normal_form_line(pt, n_0, dist)
|
||||
# check if correction necessary
|
||||
if np.abs(offset_line) > min_dist:
|
||||
# compute correction
|
||||
if offset_line >= 0:
|
||||
correction = offset_line - min_dist
|
||||
else:
|
||||
correction = offset_line + min_dist
|
||||
# apply correction
|
||||
pt = _shift_pt_to_normal_form_line(pt, n_0, -correction)
|
||||
return pt
|
||||
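The normal-form helpers above combine into a simple clipping rule: a point farther than a given offset from the line is pulled back along the normal until it sits exactly at that offset. The sketch below illustrates this with a horizontal line; `normal_form` and `clip_to_band` are illustrative re-statements, and the parameter is named `max_offset` here because it acts as an upper bound (the source calls it `min_dist`).

```python
import numpy as np


def normal_form(p, q):
    """Unit normal and signed origin distance of the line through p and q."""
    n = np.array([-(q[1] - p[1]), q[0] - p[0]], dtype=float)
    n_0 = n / np.linalg.norm(n)
    return n_0, np.dot(n_0, p)


def clip_to_band(pt, n_0, dist, max_offset):
    """Pull pt back so that it lies at most max_offset away from the line."""
    offset = np.dot(n_0, pt) - dist
    if abs(offset) > max_offset:
        pt = pt - n_0 * (offset - np.sign(offset) * max_offset)
    return pt


n_0, dist = normal_form((0.0, 5.0), (10.0, 5.0))         # horizontal line y = 5
far = clip_to_band(np.array([3.0, 8.0]), n_0, dist, 2)   # 3 above the line
near = clip_to_band(np.array([3.0, 6.0]), n_0, dist, 2)  # already within the band
```

Only the component along the normal changes, so the x coordinate of a point is preserved for horizontal lines. `clip_bbox_using_line` applies this rule to the two corner points of a bounding box.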
|
||||
|
||||
def clip_bbox_using_line(bbox, line_pts_arr, min_dist=128/2.):
|
||||
# get normal form of line
|
||||
n_0, dist = normal_form_from_pts(line_pts_arr[0], line_pts_arr[1])
|
||||
# compute distance to line and decide if pt needs to be shifted
|
||||
pt_list = []
|
||||
# iterate over two bounding box coordinates
|
||||
for pt in [bbox[:2], bbox[2:]]:
|
||||
pt = clip_pt_using_normal_form(pt, n_0, dist, min_dist)
|
||||
pt_list.append(pt)
|
||||
|
||||
return np.concatenate(pt_list)
|
||||
|
||||
|
||||
def clip_bbox_using_line_segmentation(bbox, line_pts_arr, skeleton, min_dist=128/2.):
|
||||
# get normal form of line
|
||||
n_0, dist = normal_form_from_pts(line_pts_arr[0], line_pts_arr[1])
|
||||
|
||||
# use bbox boundaries to crop pts from line segmentation
|
||||
# seg_line_pts = np.nonzero(skeleton[:, int(bbox[0]):int(bbox[2])])[0]
|
||||
# faster but more exclusive (probably worth the speedup)
|
||||
seg_line_pts = np.nonzero(skeleton[int(bbox[1]):int(bbox[3]), int(bbox[0]):int(bbox[2])])[0] + int(bbox[1])
|
||||
|
||||
dist_delta = 0
|
||||
if len(seg_line_pts) > 3:
|
||||
# compute average y location of segmentation line pts
|
||||
seg_line_cy = np.mean(seg_line_pts)
|
||||
# determine local distance delta from linear model to skeleton
|
||||
pt = [(bbox[0] + bbox[2]) / 2., seg_line_cy]
|
||||
dist_delta = _offset_pt_to_normal_form_line(pt, n_0, dist)
|
||||
# correct normal form of line [n_0, dist] using delta, i.e. alter dist
|
||||
dist = dist + dist_delta
|
||||
#print dist_delta, dist - dist_delta, dist
|
||||
|
||||
# compute distance to line and decide if pt needs to be shifted
|
||||
pt_list = []
|
||||
# iterate over two bounding box coordinates
|
||||
for pt in [bbox[:2], bbox[2:]]:
|
||||
pt = clip_pt_using_normal_form(pt, n_0, dist, min_dist)
|
||||
pt_list.append(pt)
|
||||
|
||||
return np.concatenate(pt_list)
|
||||
|
||||
|
||||
def dist_pt_line(pt, lpt1, lpt2):
|
||||
# compute squared 'perpendicular distance'
|
||||
# pt is point
|
||||
# lpt are line points
|
||||
# returns the squared perpendicular distance from point to line
|
||||
|
||||
# assumes line representation of form y = a * x + b
|
||||
(a, b) = line_params_from_pts(lpt1, lpt2)
|
||||
# from: energy based geometric model fitting (2010)
|
||||
# https://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line#Another_formula
|
||||
return (np.abs(pt[1]-a*pt[0]-b)/np.sqrt(a**2+1))**2
|
||||
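Since `dist_pt_line` returns the squared distance, a quick numeric check helps to avoid mixing units downstream. The sketch below restates the formula in one function, assuming the slope-intercept form y = a * x + b as in the source; `squared_dist_to_line` is a hypothetical name.

```python
import numpy as np


def squared_dist_to_line(pt, lpt1, lpt2):
    """Squared perpendicular distance from pt to the line through lpt1 and lpt2."""
    a = (lpt2[1] - lpt1[1]) / float(lpt2[0] - lpt1[0])
    b = lpt1[1] - lpt1[0] * a
    return (np.abs(pt[1] - a * pt[0] - b) / np.sqrt(a ** 2 + 1)) ** 2


d_flat = squared_dist_to_line((3, 8), (0, 5), (10, 5))  # line y = 5
d_diag = squared_dist_to_line((0, 2), (0, 0), (1, 1))   # line y = x
```

The point (3, 8) is 3 units from the line y = 5, so the result is 9; the point (0, 2) is sqrt(2) from the line y = x, so the result is 2. The slope-intercept form cannot represent vertical lines (the slope computation divides by zero), which is acceptable for near-horizontal text lines.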
|
||||
|
||||
def dist_lineseg_line(spt1, spt2, lpt1, lpt2):
|
||||
# computes the distance between line (unbounded) and line segment (bounded)
|
||||
# spt are line segment points
|
||||
# lpt are line points
|
||||
# returns the minimum squared distance from the segment end points to the line
|
||||
return min(dist_pt_line(spt1, lpt1, lpt2),
|
||||
dist_pt_line(spt2, lpt1, lpt2))
|
||||
|
||||
|
||||
def assign_line_segments_to_lines(line_segs, line_hypos, x1=10):
|
||||
# get line pts from polar lines
|
||||
polar_lines = line_hypos.groupby('label').mean()[['angle', 'dist']].values
|
||||
line_pts = line_pts_from_polar_line(polar_lines[:, 0], polar_lines[:, 1], x1=x1)
|
||||
line_pts = np.transpose(np.concatenate(line_pts))
|
||||
# get line segments
|
||||
line_seg_pts = np.stack(line_segs).reshape(len(line_segs), -1)
|
||||
|
||||
# compute distance between line segments and lines
|
||||
X2_dist = cdist(line_pts, line_seg_pts,
|
||||
lambda lpts, spts: dist_lineseg_line(spts[:2], spts[2:], lpts[:2], lpts[2:]))
|
||||
# assign line segments to nearest line
|
||||
ls_labels = np.argmin(X2_dist, axis=0)
|
||||
|
||||
return ls_labels
|
||||
|
||||
|
||||
# associate line segments with segments
|
||||
|
||||
|
||||
def associate_segments_with_lines(lbl_ind, line_segs, ls_labels, group2line):
|
||||
# create markers from line segments
|
||||
im_marker = np.zeros_like(lbl_ind)
|
||||
for line, li in zip(line_segs, ls_labels):
|
||||
p0, p1 = line
|
||||
rr, cc = draw.line(p0[1], p0[0], p1[1], p1[0])
|
||||
im_marker[rr, cc] = int(group2line[li]) + 1 # avoid background class
|
||||
# plt.imshow(im_marker)
|
||||
|
||||
# use watershed to assign labels to segments
|
||||
distance = ndi.distance_transform_edt(lbl_ind)
|
||||
segm_labels = watershed(-distance, im_marker, mask=lbl_ind)
|
||||
return segm_labels, im_marker
|
||||
|
||||
|
||||
# map segment lbls to image resolution (deal with network architecture with offset)
|
||||
|
||||
def compute_image_label_map(segm_labels, image_shape, padding=0):
    # collect patch boxes and their labels
    list_patch_boxes, list_patch_labels = [], []
    for lbl_idx in np.unique(segm_labels):
        if lbl_idx > 0:
            # for index compute coordinate boxes
            vx, vy = np.where(segm_labels == lbl_idx)
            patch_boxes = label_map2image(vy, vx, segm_labels.shape[::-1]).astype(int)
            # append
            list_patch_boxes.append(patch_boxes)
            list_patch_labels.append(patch_boxes.shape[0] * [lbl_idx])
            # vis_detections(center_im, patch_boxes, max_vis=200, labels="")

    patch_boxes = np.concatenate(list_patch_boxes, axis=0)
    patch_labels = np.concatenate(list_patch_labels, axis=0)
    # vis_detections(center_im, np.concatenate(list_patch_boxes, axis=0) , max_vis=1000, labels="")

    # create segmentation map from boxes and labels
    seg_canvas = np.zeros(image_shape[:2])
    for bb, lbl in zip(patch_boxes, patch_labels):
        pad = padding
        bb[:2] = bb[:2] - pad
        bb[2:] = bb[2:] + pad
        # print patch_box, patch_lbl
        seg_canvas[bb[1]:bb[3], bb[0]:bb[2]] = lbl

    return seg_canvas
|
||||
|
||||
|
||||
def compute_line_length_from_tl(group, stats, b=128.): # 128 / (2 * 32)
|
||||
# collect widths
|
||||
widths = np.zeros(len(group))
|
||||
for ii, (sidx, sign_rec) in enumerate(group.iterrows()):
|
||||
widths[ii] = stats.get_sign_width(sign_rec.lbl, sign_width=1) * b
|
||||
# compute offsets and line length
|
||||
sign_xpos = widths.cumsum() - (widths / 2.)
|
||||
line_len = widths.sum()
|
||||
# add columns to group
|
||||
group['prior_line_len'] = np.rint(line_len)
|
||||
group.loc[group.index, 'prior_sign_xoff'] = np.rint(sign_xpos)
|
||||
group.loc[group.index, 'prior_sign_width'] = np.rint(widths)
|
||||
|
||||
return group
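# Illustrative sketch (not part of the original commit): how the prior sign
# positions above follow from the sign widths. The relative widths are made-up
# values; in the original they come from stats.get_sign_width(...).

```python
import numpy as np

# hypothetical relative sign widths, in units of the base size b
rel_widths = np.array([1.0, 0.5, 1.5])
b = 128.

# absolute widths in pixels
widths = rel_widths * b
# centre of each sign: right edge (cumulative sum) minus half its width
sign_xpos = widths.cumsum() - (widths / 2.)
# expected line length: sum of all sign widths
line_len = widths.sum()
```

# For these values the sign centres are 64, 160 and 288 and the line length
# is 384.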
|
||||
|
||||
|
||||
##### full pipeline
|
||||
|
||||
|
||||
def post_process_line_detections(lbl_ind_x, num_lines, len_min, len_max, verbose=True):
|
||||
# identify lines and merge them if too close together
|
||||
# line hypotheses are stored in the line_hypos dataframe
|
||||
# line_hypos.label indicates which lines are grouped together(merged) -> line_hypo_agg
|
||||
|
||||
# (0) perform skeletonization
|
||||
skeleton = skeletonize(lbl_ind_x)
|
||||
# skeleton = skeletonize_3d(lbl_ind)
|
||||
# skeleton = thin(lbl_ind)
|
||||
|
||||
# (1) compute hough transform
|
||||
h, theta, d, theta_range, theta_range2 = compute_hough_transform(skeleton, skeleton) # skeleton, lbl_ind_x,
|
||||
|
||||
# (I) find peaks in hough transform
|
||||
num_peaks_factor = 1.9 # 1.5 1.6 v007: 1.9 v047: 2.5 # line detector dependent (VIP)
|
||||
hl_peak_threshold = (h.max() / float(len_max)) / 2. * len_min # 2. # has impact on length of lines found
|
||||
accums, angles, dists = hough_line_peaks(h, theta, d, min_distance=1, min_angle=14,
|
||||
num_peaks=int(num_lines * num_peaks_factor),
|
||||
threshold=hl_peak_threshold)
|
||||
|
||||
# ugly patch for hough_line_peaks shortcomings
|
||||
# in rare cases len(accums) != len(angles) or len(dists)
|
||||
if len(accums) != len(angles):
|
||||
angles = accums
|
||||
dists = accums
|
||||
|
||||
# (II) check if lines intersect close to the center and group them accordingly
|
||||
interval = [lbl_ind_x.shape[1] * 1 / 8., lbl_ind_x.shape[1] * 7 / 8.]
|
||||
X_dist = pdist(np.stack([angles, dists], axis=1), lambda l1, l2: do_intersect_in_interval(l1, l2, interval))
|
||||
labels = compute_group_labels_from_dists(X_dist).astype(int)
|
||||
if verbose:
|
||||
print('detected groups: {} | num lines: {}.'.format(len(np.unique(labels)), num_lines))
|
||||
# collect lines in dataframe
|
||||
line_hypos = pd.DataFrame({'accum': accums, 'angle': angles, 'dist': dists, 'label': labels})
|
||||
line_hypos_agg = line_hypos.groupby('label').mean()
|
||||
# add group diff column
|
||||
diffs = line_hypos_agg.dist.sort_values().diff()
|
||||
# compute interline median
|
||||
dist_interline_median = diffs.median()
|
||||
|
||||
# (III) check if remaining groups are very close
|
||||
X_dist = pdist(np.stack([line_hypos_agg.angle.values, line_hypos_agg.dist.values], axis=1),
|
||||
lambda l1, l2: nearby_and_near_parallel_2(l1, l2, dist_interline_median, interval))
|
||||
updated_labels = compute_group_labels_from_dists(X_dist).astype(int)
|
||||
# update dataframe and grouping
|
||||
line_hypos.label.replace(to_replace=line_hypos_agg.index.values, value=updated_labels, inplace=True)
|
||||
|
||||
# (IV) re-group with updated labels and get line meta for later usage
|
||||
line_hypos_agg = line_hypos.groupby('label').mean()
|
||||
# group lines and remember index (needs to be here)
|
||||
group2line = line_hypos_agg.index.values
|
||||
# add column group dist diff
|
||||
diffs = line_hypos_agg.dist.sort_values().diff()
|
||||
diffs.name = 'group_diff' # set name before join
|
||||
line_hypos = line_hypos.join(diffs, on='label')
|
||||
# add column group angle diff
|
||||
angle_diff = line_hypos_agg.sort_values('dist').angle.apply(np.rad2deg).diff()
|
||||
angle_diff.name = 'group_angle_diff' # set name before join
|
||||
line_hypos = line_hypos.join(angle_diff, on='label')
|
||||
# compute interline median
|
||||
dist_interline_median = diffs.median()
|
||||
if verbose:
|
||||
print('Update: detected groups: {} | num lines: {}.'.format(len(line_hypos_agg), num_lines))
|
||||
|
||||
# (V) label line detection segments according to line hypos
|
||||
# compute probabilistic hough transform for lines
|
||||
# line_segs = probabilistic_hough_line(skeleton, threshold=6, line_length=15,
|
||||
# line_gap=6, theta=basic_theta)
|
||||
line_length = 8 # v007: 8 v047: 5 # line detector dependent (VIP)
|
||||
if len_max < line_length:
|
||||
line_length = len_max
|
||||
line_segs = probabilistic_hough_line(skeleton, threshold=6, line_length=line_length,
|
||||
line_gap=6, theta=theta_range)
|
||||
if len(line_segs) > 0:
|
||||
# assign line segments to nearest line
|
||||
ls_labels = assign_line_segments_to_lines(line_segs, line_hypos)
|
||||
|
||||
# associate segments with lines
|
||||
segm_labels, im_marker = associate_segments_with_lines(lbl_ind_x, line_segs, ls_labels, group2line)
|
||||
else:
|
||||
segm_labels, ls_labels = lbl_ind_x, None # set segm_labels so that line_frag gets a shape
|
||||
|
||||
return line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line, h, theta, d, skeleton
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,70 @@
|
||||
import numpy as np
|
||||
from tqdm import tqdm
|
||||
|
||||
from ..datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta
|
||||
from ..detection.line_detection import (prepare_transliteration, preprocess_line_input, apply_detector)
|
||||
from ..utils.path_utils import make_folder
|
||||
|
||||
from skimage.morphology import skeletonize
|
||||
|
||||
|
||||
def gen_line_detections(didx_list, dataset, saa_version, relative_path,
|
||||
line_model_version, model_fcn, re_transform, device,
|
||||
save_line_detections):
|
||||
|
||||
# for seg_im, seg_idx in dataset:
|
||||
# iterate over segments
|
||||
for didx in tqdm(didx_list, desc=saa_version):
|
||||
# print(didx)
|
||||
seg_im, gt_boxes, gt_labels = dataset[didx]
|
||||
|
||||
# access meta
|
||||
seg_rec = dataset.get_seg_rec(didx)
|
||||
image_name, scale, seg_bbox, _, view_desc = get_segment_meta(seg_rec)
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
|
||||
# make sure seg image is large enough for line detector
|
||||
if seg_im.size[0] > 224 and seg_im.size[1] > 224:
|
||||
|
||||
# prepare input
|
||||
inputs = preprocess_line_input(seg_im, 1, shift=0)
|
||||
center_im = re_transform(inputs[4]) # to pil image
|
||||
center_im = np.asarray(center_im) # to numpy
|
||||
|
||||
try:
|
||||
# apply network
|
||||
output = apply_detector(inputs, model_fcn, device)
|
||||
# visualize_net_output(center_im, output, cunei_id=1, num_classes=2)
|
||||
# plt.show()
|
||||
|
||||
# prepare output
|
||||
outprob = np.mean(output, axis=0)
|
||||
lbl_ind = np.argmax(outprob, axis=0)
|
||||
|
||||
lbl_ind_x = lbl_ind.copy()
|
||||
lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0 # 7
|
||||
|
||||
lbl_ind_80 = lbl_ind.copy()
|
||||
lbl_ind_80[np.max(outprob, axis=0) < 0.8] = 0 # remove squeeze() from outprob in order to fix a bug!
|
||||
|
||||
# save line detections
|
||||
if save_line_detections:
|
||||
# line result folder
|
||||
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
|
||||
make_folder(line_res_path)
|
||||
|
||||
# save lbl_ind_x
|
||||
outfile = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
|
||||
np.save(outfile, lbl_ind_x.astype(bool))
|
||||
|
||||
if False:
|
||||
# compute skeleton
|
||||
skeleton = skeletonize(lbl_ind_x)
|
||||
|
||||
# save skeleton
|
||||
outfile = "{}/{}_skeleton.npy".format(line_res_path, res_name)
|
||||
np.save(outfile, skeleton.astype(bool))
|
||||
except Exception as e:
|
||||
# Usually CUDA error: out of memory
|
||||
print('{} {} {}'.format(res_name, e, e.args))
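# Illustrative sketch (not part of the original commit): the post-processing
# of the network output above, on a tiny made-up array. The scores are
# averaged over the crops, the most likely class is taken per pixel, and
# pixels whose best score is below the confidence threshold are reset to
# background (0). The array values and its 2 x 2 x 1 x 3 shape
# (crops x classes x height x width) are assumptions for the example.

```python
import numpy as np

# made-up class scores: 2 crops, 2 classes, 1 x 3 pixels
output = np.array([
    [[[0.9, 0.2, 0.4]], [[0.1, 0.8, 0.6]]],   # crop 0: class 0, class 1
    [[[0.7, 0.2, 0.4]], [[0.3, 0.8, 0.6]]],   # crop 1: class 0, class 1
])

# average over crops -> classes x height x width
outprob = np.mean(output, axis=0)
# most likely class per pixel
lbl_ind = np.argmax(outprob, axis=0)

# keep only confident pixels, everything else becomes background
lbl_ind_x = lbl_ind.copy()
lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0
```

# The third pixel is predicted as class 1 with a score of only 0.6, so it is
# suppressed and the thresholded map is [[0, 1, 0]].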
|
||||
|
||||
@@ -0,0 +1,184 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
from tqdm import tqdm
|
||||
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
import torchvision.transforms as transforms
|
||||
|
||||
from ..datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta
|
||||
|
||||
from ..alignment.LineFragment import plot_boxes
|
||||
from ..utils.path_utils import make_folder
|
||||
|
||||
from ..utils.torchcv.box_coder_retina import RetinaBoxCoder
|
||||
from ..utils.torchcv.box_coder_fpnssd import FPNSSDBoxCoder
|
||||
from ..utils.torchcv.box import box_nms
|
||||
from ..utils.torchcv.evaluations.voc_eval import voc_eval
|
||||
|
||||
from ..evaluations.sign_evaluation_prep import (prepare_ssd_outputs_for_eval, prepare_ssd_gt_for_eval,
|
||||
get_pred_boxes_df, get_gt_boxes_df)
|
||||
from ..evaluations.sign_evaluation import eval_detector, eval_detector_on_collection
|
||||
from ..evaluations.sign_evaluator import SignEvalBasic, SignEvalFast
|
||||
|
||||
|
||||
def gen_ssd_detections(didx_list, dataset, saa_version, relative_path,
|
||||
model_version, fpnssd_net, with_64, create_bg_class, device,
|
||||
test_min_score_thresh, test_nms_thresh, eval_ovthresh,
|
||||
save_detections, show_detections, with_4_aspects=False, verbose_mode=True, return_eval=False):
|
||||
|
||||
list_pred_boxes_df, list_gt_boxes_df = [], []
|
||||
list_seg_ap, list_seg_name_with_anno = [], []
|
||||
|
||||
# setup evaluators
|
||||
use_new_eval = True
|
||||
num_classes = 240
|
||||
# eval_basic = SignEvalBasic(model_version, saa_version, eval_ovthresh)
|
||||
eval_fast = SignEvalFast(model_version, saa_version, tp_thresh=eval_ovthresh, num_classes=num_classes)
|
||||
|
||||
# iterate over segments
|
||||
for didx in tqdm(didx_list, desc=saa_version):
|
||||
# print(didx)
|
||||
seg_im, gt_boxes, gt_labels = dataset[didx]
|
||||
|
||||
# access meta
|
||||
seg_rec = dataset.get_seg_rec(didx)
|
||||
image_name, scale, seg_bbox, _, view_desc = get_segment_meta(seg_rec)
|
||||
|
||||
# for plots
|
||||
input_im = np.asarray(seg_im)
|
||||
|
||||
# prepare box coder
|
||||
# box_coder = RetinaBoxCoder()
|
||||
box_coder = FPNSSDBoxCoder(input_size=seg_im.size, with_64=with_64, with_4_aspects=with_4_aspects, create_bg_class=create_bg_class)
|
||||
|
||||
# prepare input
|
||||
inputs = transforms.Compose([transforms.ToTensor(),
|
||||
transforms.Normalize(mean=[0.5], std=[1.0])])(seg_im)
|
||||
inputs = inputs.unsqueeze(0)
|
||||
|
||||
with torch.no_grad():
|
||||
loc_preds, cls_preds = fpnssd_net(inputs.to(device))
|
||||
|
||||
box_preds, label_preds, score_preds = box_coder.decode(
|
||||
loc_preds.cpu().data.squeeze(),
|
||||
F.softmax(cls_preds.squeeze(), dim=1).cpu().data,
|
||||
score_thresh=test_min_score_thresh, nms_thresh=test_nms_thresh)
|
||||
|
||||
if show_detections:
|
||||
# plot prediction
|
||||
plt.figure(figsize=(10, 10))
|
||||
plot_boxes(box_preds, confidence=score_preds)
|
||||
plt.imshow(input_im, cmap='gray')
|
||||
plt.grid(True, color='w', linestyle=':')
|
||||
plt.show()
|
||||
|
||||
# vis_detections(input_im, box_preds, scores=score_preds, labels=label_preds,
|
||||
# thresh=0.01, max_vis=300, figs_sz=(15, 15)) #lbl2lbl[labels]
|
||||
# plt.show()
|
||||
|
||||
# convert detections to all boxes format
|
||||
all_boxes = prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds)
|
||||
|
||||
if save_detections:
|
||||
res_name = "{}{}".format(image_name, view_desc)
|
||||
res_path = "{}results/results_ssd/{}/{}".format(relative_path, model_version, saa_version)
|
||||
|
||||
# check folder
|
||||
make_folder(res_path)
|
||||
|
||||
if True:
|
||||
# Save detections
|
||||
# outfile = "{}/{}.npy".format(res_path, res_name)
|
||||
# np.save(outfile, scores)
|
||||
|
||||
# save all_boxes
|
||||
outfile = "{}/{}_all_boxes.npy".format(res_path, res_name)
|
||||
np.save(outfile, all_boxes)
|
||||
|
||||
if gt_boxes is not None:
|
||||
|
||||
if 0:
|
||||
if verbose_mode:
|
||||
# [METHOD A]: evaluate for a single segment (in tensor format)
|
||||
print(voc_eval([box_preds.clone()], [label_preds.clone()], [score_preds.clone()],
|
||||
[gt_boxes.clone()], [gt_labels.clone()], None,
|
||||
iou_thresh=eval_ovthresh, use_07_metric=False)['map'])
|
||||
|
||||
# convert gt to numpy format
|
||||
gt_boxes, gt_labels = prepare_ssd_gt_for_eval(gt_boxes, gt_labels)
|
||||
|
||||
if use_new_eval:
|
||||
list_seg_name_with_anno.append(image_name + view_desc)
|
||||
if verbose_mode:
|
||||
print(image_name, view_desc)
|
||||
# standard mAP eval
|
||||
# eval_basic.eval_segment(all_boxes, gt_boxes, gt_labels, seg_rec.segm_idx, verbose=verbose_mode)
|
||||
# fast evaluation
|
||||
eval_fast.eval_segment(all_boxes, gt_boxes, gt_labels, seg_rec.segm_idx, verbose=verbose_mode)
|
||||
else:
|
||||
if verbose_mode:
|
||||
# [METHOD B]: evaluate mAP and print stats for a single segment
|
||||
# (these results can strongly differ from collection-wise evaluation)
|
||||
acc, df_stats = eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=eval_ovthresh)
|
||||
# collect results
|
||||
list_seg_ap.append(df_stats['ap'].mean())
|
||||
list_seg_name_with_anno.append(image_name + view_desc)
|
||||
|
||||
# prepare full collection evaluation
|
||||
list_pred_boxes_df.append(get_pred_boxes_df(all_boxes, seg_rec.segm_idx))
|
||||
list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_rec.segm_idx))
|
||||
|
||||
# full collection eval
|
||||
if use_new_eval:
|
||||
eval_fast.prepare_eval_collection()
|
||||
df_stats, global_ap = eval_fast.eval_collection(verbose=verbose_mode)
|
||||
if return_eval:
|
||||
return global_ap, df_stats, eval_fast
|
||||
else:
|
||||
if verbose_mode:
|
||||
return eval_fast.list_seg_mean_ap, list_seg_name_with_anno
|
||||
else:
|
||||
return global_ap, df_stats
|
||||
else:
|
||||
acc = 0
|
||||
df_stats = pd.DataFrame()
|
||||
if len(list_gt_boxes_df) > 0:
|
||||
# [METHOD C]: compute mAP across all instances of individual classes
|
||||
# (these results can strongly differ from segment-wise evaluation)
|
||||
gt_boxes_df = pd.concat(list_gt_boxes_df, ignore_index=True)
|
||||
pred_boxes_df = pd.concat(list_pred_boxes_df, ignore_index=True)
|
||||
acc, df_stats = eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=eval_ovthresh)
|
||||
|
||||
if verbose_mode:
|
||||
return list_seg_ap, list_seg_name_with_anno
|
||||
else:
|
||||
return acc, df_stats
|
||||
|
||||
|
||||
def get_detections(fpnssd_net, device, seg_im, with_64, with_4_aspects, create_bg_class,
|
||||
test_nms_thresh, test_min_score_thresh):
|
||||
# prepare box coder
|
||||
# box_coder = RetinaBoxCoder()
|
||||
box_coder = FPNSSDBoxCoder(input_size=seg_im.size, with_64=with_64, with_4_aspects=with_4_aspects,
|
||||
create_bg_class=create_bg_class)
|
||||
|
||||
# prepare input
|
||||
inputs = transforms.Compose([transforms.Lambda(lambda x: x.convert('L')),
|
||||
transforms.ToTensor(),
|
||||
transforms.Normalize(mean=[0.5], std=[1.0])])(seg_im)
|
||||
inputs = inputs.unsqueeze(0)
|
||||
|
||||
with torch.no_grad():
|
||||
loc_preds, cls_preds = fpnssd_net(inputs.to(device))
|
||||
|
||||
box_preds, label_preds, score_preds = box_coder.decode(
|
||||
loc_preds.cpu().data.squeeze(),
|
||||
F.softmax(cls_preds.squeeze(), dim=1).cpu().data,
|
||||
score_thresh=test_min_score_thresh, nms_thresh=test_nms_thresh)
|
||||
|
||||
# convert detections to all boxes format
|
||||
all_boxes = prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds)
|
||||
|
||||
return all_boxes
|
||||
@@ -0,0 +1,194 @@
|
||||
import torch
|
||||
from torch.autograd import Variable
|
||||
import torchvision
|
||||
from torchvision.transforms import Resize, FiveCrop, CenterCrop
|
||||
|
||||
from ..utils.transform_utils import crop_pil_image
|
||||
|
||||
|
||||
def crop_segment_from_tablet_im(pil_im, seg_bbox, context_pad_frac=0):
    """

    :param pil_im: full tablet image to crop from
    :param seg_bbox: bbox coordinates [xmin, ymin, xmax, ymax]
    :param context_pad_frac: the fraction of the minimum side length of bbox to use as padding
    :return: cropped segment as PIL image and the bbox returned by crop_pil_image
    """
    min_side = min((seg_bbox[2] - seg_bbox[0], seg_bbox[3] - seg_bbox[1]))
    context_pad = min_side * context_pad_frac
    # crop segment
    segment_crop, new_bbox = crop_pil_image(pil_im, seg_bbox, context_pad=context_pad, pad_to_square=False)
    return segment_crop, new_bbox
|
||||
|
||||
|
||||
def rescale_segment_single(pil_im, scale):
|
||||
""" Produce PIL image of segment at selected scale
|
||||
|
||||
:param pil_im: tablet segment that is to be processed
|
||||
:param scale: scale used for resizing
|
||||
:return: PIL image
|
||||
"""
|
||||
# compute scaled size
|
||||
imw, imh = pil_im.size
|
||||
imw = int(imw * scale)
|
||||
imh = int(imh * scale)
|
||||
# compose transforms
|
||||
tablet_transform = torchvision.transforms.Compose([
|
||||
torchvision.transforms.Lambda(lambda x: x.convert('L')), # convert to gray
|
||||
Resize((imh, imw)), # resize according to scale
|
||||
])
|
||||
# apply transforms
|
||||
input_im = tablet_transform(pil_im)
|
||||
return input_im
|
||||
|
||||
|
||||
def preprocess_segment_single(pil_im, scale):
|
||||
""" produce tensor of segment at selected scale
|
||||
|
||||
:param pil_im: tablet segment that is to be processed
|
||||
:param scale: scale used for resizing
|
||||
:return: 4D tensors
|
||||
"""
|
||||
# compute scaled size
|
||||
imw, imh = pil_im.size
|
||||
imw = int(imw * scale)
|
||||
imh = int(imh * scale)
|
||||
# compose transforms
|
||||
tablet_transform = torchvision.transforms.Compose([
|
||||
torchvision.transforms.Lambda(lambda x: x.convert('L')), # convert to gray
|
||||
Resize((imh, imw)), # resize according to scale
|
||||
# tensor-space transforms
|
||||
torchvision.transforms.ToTensor(),
|
||||
torchvision.transforms.Normalize(mean=[0.5], std=[1]), # normalize
|
||||
])
|
||||
# apply transforms
|
||||
input_tensor = tablet_transform(pil_im).unsqueeze(0)
|
||||
return input_tensor
|
||||
|
||||
|
||||
def preprocess_segment_multi_scale(pil_im, scales):
|
||||
""" produces multiple copies of the segment at different scales
|
||||
|
||||
:param pil_im: tablet segment that is to be processed
|
||||
:param scales: list of scales
|
||||
:return: list of 3D tensors with different shapes (according to scales)
|
||||
"""
|
||||
# compute scaled size
|
||||
imw, imh = pil_im.size
|
||||
# tensor-space transforms
|
||||
ts_transform = torchvision.transforms.Compose([
|
||||
torchvision.transforms.ToTensor(),
|
||||
torchvision.transforms.Normalize(mean=[0.5], std=[1]), # normalize
|
||||
])
|
||||
# compose transforms
|
||||
tablet_transform = torchvision.transforms.Compose([
|
||||
torchvision.transforms.Lambda(lambda x: x.convert('L')), # convert to gray
|
||||
# resize according to scales
|
||||
torchvision.transforms.Lambda(
|
||||
lambda crop: [Resize((int(imh * scale), int(imw * scale)))(crop) for scale in scales]),
|
||||
torchvision.transforms.Lambda(
|
||||
lambda scaled_crops: [ts_transform(crop) for crop in scaled_crops]), # returns a list of 3D tensors
|
||||
])
|
||||
# apply transforms
|
||||
im_list = tablet_transform(pil_im)
|
||||
return im_list
|
||||
|
||||
|
||||
def preprocess_segment_for_eval(pil_im, scale, shift=0):
|
||||
""" produces five copies of the segment at slightly different offsets
|
||||
|
||||
:param pil_im: tablet segment that is to be processed
|
||||
:param scale: scale which should be used for resizing
|
||||
:param shift: offset shift used to produce five-fold oversampling
|
||||
:return: 4D tensor with 5xCxHxW
|
||||
"""
|
||||
# compute scaled size
|
||||
imw, imh = pil_im.size
|
||||
imw = int(imw * scale)
|
||||
imh = int(imh * scale)
|
||||
# determine crop size
|
||||
crop_sz = [int(imh - shift), int(imw - shift)]
|
||||
# tensor-space transforms
|
||||
ts_transform = torchvision.transforms.Compose([
|
||||
torchvision.transforms.ToTensor(),
|
||||
torchvision.transforms.Normalize(mean=[0.5], std=[1]), # normalize
|
||||
])
|
||||
# compose transforms
|
||||
tablet_transform = torchvision.transforms.Compose([
|
||||
torchvision.transforms.Lambda(lambda x: x.convert('L')), # convert to gray
|
||||
Resize((imh, imw)), # resize according to scale
|
||||
FiveCrop((crop_sz[0], crop_sz[1])), # oversample
|
||||
torchvision.transforms.Lambda(
|
||||
lambda crops: torch.stack([ts_transform(crop) for crop in crops])), # returns a 4D tensor
|
||||
])
|
||||
# apply transforms
|
||||
input_tensor = tablet_transform(pil_im)
|
||||
return input_tensor
|
||||
|
||||
|
||||
def predict_im_list(model, im_list, use_gpu, min_sz=227):
|
||||
""" applies model to list of 3D tensors (unlike a 4D tensor in predict())
|
||||
|
||||
:param model: network module that is used for the prediction
|
||||
:param im_list: list of 3D tensors
|
||||
:param use_gpu: boolean that indicates whether GPU is available
|
||||
:param min_sz: minimum side length of input
|
||||
:return: list of result tensors
|
||||
"""
|
||||
# apply network model
|
||||
outputs = []
|
||||
for in_im in im_list:
|
||||
if (in_im.shape[1] >= min_sz) and (in_im.shape[2] >= min_sz):
|
||||
# prepare input
|
||||
if use_gpu:
|
||||
in_var = Variable(in_im.cuda(), volatile=True) # volatile=True -> faster, less memory usage
|
||||
else:
|
||||
in_var = Variable(in_im, volatile=True)
|
||||
output = model(in_var.unsqueeze(0))
|
||||
outputs.append(output.data.cpu().numpy())
|
||||
else:
|
||||
outputs.append(None)
|
||||
|
||||
# return list of numpy arrays (None for inputs smaller than min_sz)
|
||||
return outputs
|
||||
|
||||
|
||||
def predict(model, inputs, use_gpu, use_bbox_reg=False):
|
||||
""" applies model to 4D tensor (batch of images)
|
||||
|
||||
:param model: network module that is used for the prediction
|
||||
:param inputs: 4D tensor (batch of images)
|
||||
:param use_gpu: boolean that indicates whether GPU is available
|
||||
:param use_bbox_reg: boolean that indicates whether to use bbox regression
|
||||
:return: result tensor
|
||||
"""
|
||||
# prepare input
|
||||
if use_gpu:
|
||||
inputs = Variable(inputs.cuda(), volatile=True) # volatile=True -> faster, less memory usage
|
||||
else:
|
||||
inputs = Variable(inputs, volatile=True)
|
||||
# apply network model
|
||||
# output = model(inputs) # consumes too much memory
|
||||
|
||||
if use_bbox_reg:
|
||||
scores, bboxes = [], []
|
||||
for in_im in inputs:
|
||||
o1, o2 = model(in_im.unsqueeze(0))
|
||||
scores.append(o1)
|
||||
bboxes.append(o2)
|
||||
# concat and convert to numpy
|
||||
output = torch.cat(scores, dim=0)
|
||||
predicted = output.data.cpu().numpy()
|
||||
output = torch.cat(bboxes, dim=0)
|
||||
predicted_roi = output.data.cpu().numpy()
|
||||
else:
|
||||
scores = []
|
||||
for in_im in inputs:
|
||||
scores.append(model(in_im.unsqueeze(0)))
|
||||
# concat and convert to numpy
|
||||
output = torch.cat(scores, dim=0)
|
||||
predicted = output.data.cpu().numpy()
|
||||
predicted_roi = []
|
||||
|
||||
return predicted, predicted_roi
|
||||
|
||||
@@ -0,0 +1,121 @@
|
||||
# --------------------------------------------------------
|
||||
# Adapted from Ross Girshick's Fast/er R-CNN code
|
||||
# --------------------------------------------------------
|
||||
|
||||
import os.path as osp
|
||||
import numpy as np
|
||||
# `pip install easydict` if you don't have it
|
||||
from easydict import EasyDict as edict
|
||||
|
||||
__C = edict()
|
||||
# Consumers can get config by:
|
||||
# from fast_rcnn_config import cfg
|
||||
cfg = __C
|
||||
|
||||
#
|
||||
# Detector options [legacy support]
|
||||
# These options are only used for the Basic evaluation method, if not specified in the eval scripts directly.
|
||||
#
|
||||
|
||||
__C.TEST = edict()
|
||||
|
||||
# Number classes considered during testing
|
||||
__C.TEST.NUM_CLASSES = 240
|
||||
|
||||
# Min score for any class
|
||||
# (if not any score larger than thresh, suppress box)
|
||||
__C.TEST.SCORE_MIN_THRESH = 0.05 # 0.01
|
||||
|
||||
# Score threshold for ROI to be considered background
|
||||
# (if bg score in (THRESH, 1], suppress box)
|
||||
__C.TEST.SCORE_BG_THRESH = 0.7
|
||||
|
||||
# Overlap threshold used for non-maximum suppression (suppress boxes with
|
||||
# IoU >= this threshold)
|
||||
__C.TEST.NMS = 0.3
|
||||
|
||||
# Test using bounding-box regressors (only works if a network trained for bbox_reg is evaluated)
|
||||
__C.TEST.BBOX_REG = True
|
||||
|
||||
# Shift applied to the five different crops during oversampling
|
||||
__C.TEST.SHIFT = 24
|
||||
|
||||
# Min overlap with ground truth box for positive detection (if IoU < this threshold, detection is a false positive)
|
||||
__C.TEST.TP_MIN_OVERLAP = 0.5 # 0.4
|
||||
|
||||
|
||||
# Data directory
|
||||
__C.DATA_DIR = '/home/tobias/Datasets/cuneiform/'
|
||||
|
||||
# tablet directories
|
||||
__C.DATA_TEST_DIR = __C.DATA_DIR + 'test_images/'
|
||||
|
||||
|
||||
|
||||
# FUNCTIONS for loading cfg
|
||||
#
|
||||
|
||||
|
||||
def _merge_a_into_b(a, b):
|
||||
"""Merge config dictionary a into config dictionary b, clobbering the
|
||||
options in b whenever they are also specified in a.
|
||||
"""
|
||||
if type(a) is not edict:
|
||||
return
|
||||
|
||||
for k, v in a.items():
|
||||
# a must specify keys that are in b
|
||||
if k not in b: # not b.has_key(k):
|
||||
raise KeyError('{} is not a valid config key'.format(k))
|
||||
|
||||
# the types must match, too
|
||||
old_type = type(b[k])
|
||||
if old_type is not type(v):
|
||||
if isinstance(b[k], np.ndarray):
|
||||
v = np.array(v, dtype=b[k].dtype)
|
||||
else:
|
||||
raise ValueError(('Type mismatch ({} vs. {}) '
|
||||
'for config key: {}').format(type(b[k]),
|
||||
type(v), k))
|
||||
|
||||
# recursively merge dicts
|
||||
if type(v) is edict:
|
||||
try:
|
||||
_merge_a_into_b(a[k], b[k])
|
||||
except:
|
||||
print('Error under config key: {}'.format(k))
|
||||
raise
|
||||
else:
|
||||
b[k] = v
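# Illustrative sketch (not part of the original commit): the same recursive
# merge idea on plain dicts instead of EasyDict. merge_into and the example
# keys are hypothetical names chosen for this sketch; the numpy special case
# of the original is left out.

```python
def merge_into(a, b):
    """Merge dict a into dict b, overwriting values in b that a also sets."""
    for k, v in a.items():
        # a may only set keys that already exist in b
        if k not in b:
            raise KeyError('{} is not a valid config key'.format(k))
        if isinstance(v, dict) and isinstance(b[k], dict):
            # descend into nested sections
            merge_into(v, b[k])
        else:
            # leaf values must keep their type
            if type(b[k]) is not type(v):
                raise ValueError('Type mismatch ({} vs. {}) for config key: {}'.format(
                    type(b[k]), type(v), k))
            b[k] = v


defaults = {'TEST': {'NMS': 0.3, 'SHIFT': 24}, 'DATA_DIR': '/data/'}
merge_into({'TEST': {'NMS': 0.5}}, defaults)
```

# Only TEST.NMS changes; TEST.SHIFT and DATA_DIR keep their defaults, and an
# unknown key or a value of a different type raises an error instead of
# silently extending the config.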
|
||||
|
||||
|
||||
def cfg_from_file(filename):
    """Load a config file and merge it into the default options."""
    import yaml
    with open(filename, 'r') as f:
        # safe_load avoids constructing arbitrary Python objects from the file
        yaml_cfg = edict(yaml.safe_load(f))

    _merge_a_into_b(yaml_cfg, __C)
|
||||
|
||||
|
||||
def cfg_from_list(cfg_list):
|
||||
"""Set config keys via list (e.g., from command line)."""
|
||||
from ast import literal_eval
|
||||
assert len(cfg_list) % 2 == 0
|
||||
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
|
||||
key_list = k.split('.')
|
||||
d = __C
|
||||
for subkey in key_list[:-1]:
|
||||
assert subkey in d # d.has_key(subkey)
|
||||
d = d[subkey]
|
||||
subkey = key_list[-1]
|
||||
assert subkey in d # d.has_key(subkey)
|
||||
try:
|
||||
value = literal_eval(v)
|
||||
except (ValueError, SyntaxError):
|
||||
# handle the case when v is a string literal
|
||||
value = v
|
||||
assert type(value) == type(d[subkey]), \
|
||||
'type {} does not match original type {}'.format(
|
||||
type(value), type(d[subkey]))
|
||||
d[subkey] = value
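# Illustrative sketch (not part of the original commit): overriding nested
# config values from a flat key/value list, as cfg_from_list does, but on a
# plain dict. set_from_list and the example keys and values are hypothetical.

```python
from ast import literal_eval


def set_from_list(cfg, cfg_list):
    """Set config keys from a flat list [key1, value1, key2, value2, ...]."""
    assert len(cfg_list) % 2 == 0
    for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
        # 'TEST.NMS' -> walk into cfg['TEST'], then set 'NMS'
        *parents, leaf = k.split('.')
        d = cfg
        for subkey in parents:
            assert subkey in d
            d = d[subkey]
        assert leaf in d
        try:
            # '0.45' -> 0.45, 'True' -> True, ...
            value = literal_eval(v)
        except (ValueError, SyntaxError):
            # anything that is not a Python literal stays a string
            value = v
        assert type(value) == type(d[leaf]), \
            'type {} does not match original type {}'.format(type(value), type(d[leaf]))
        d[leaf] = value


config = {'TEST': {'NMS': 0.3, 'BBOX_REG': True}, 'DATA_DIR': '/data/'}
set_from_list(config, ['TEST.NMS', '0.45', 'DATA_DIR', '/tmp/cunei/'])
```

# The string '0.45' is parsed into a float, while the path is not a valid
# literal and is therefore stored unchanged as a string.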
|
||||
@@ -0,0 +1,448 @@
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
|
||||
from scipy.spatial.distance import cdist
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
from ast import literal_eval
|
||||
|
||||
import os.path
|
||||
|
||||
from ..alignment.LineFragment import compute_line_endpoints_by_hypo_idx
|
||||
from ..detection.detection_helpers import radius_in_image
|
||||
from ..detection.line_detection import line_params_from_pts, hess_normal_form_from_pts, dist_lineseg_line
|
||||
|
||||
|
||||
class LineAnnotations(object):
|
||||
|
||||
def __init__(self, collection_name, coll_scales=None, interline_dist=128/2., relative_path='../'):
|
||||
# basic paths
|
||||
self.num_classes = 2
|
||||
self.path_to_data_products = '{}data/annotations/'.format(relative_path)
|
||||
self.coll_scales = coll_scales
|
||||
self.interline_dist = interline_dist
|
||||
|
||||
# load collection annotations
|
||||
self.anno_df = self.load_collection_annotations(collection_name)
|
||||
|
||||
if len(self.anno_df) > 0:
|
||||
print('Load line annotations for {} dataset: {} found!'.format(collection_name,
|
||||
self.anno_df.segm_idx.nunique()))
|
||||
else:
|
||||
print('No line annotations for {} dataset'.format(collection_name))
|
||||
|
||||
def load_collection_annotations(self, collection_name):
|
||||
# assemble annotation file path
|
||||
annotation_file = 'line_annotations_{}.csv'.format(collection_name)
|
||||
annotation_file_path = '{}{}'.format(self.path_to_data_products, annotation_file)
|
||||
|
||||
# check if annotation file exists
|
||||
if os.path.isfile(annotation_file_path):
|
||||
# read annotation file
|
||||
anno_df = pd.read_csv(annotation_file_path, engine='python')
|
||||
# apply scale
|
||||
if self.coll_scales is not None:
|
||||
scale_vec = self.coll_scales[anno_df.segm_idx].values
|
||||
anno_df.x = (anno_df.x * scale_vec).round().astype(int)
|
||||
anno_df.y = (anno_df.y * scale_vec).round().astype(int)
|
||||
# assemble line segs
|
||||
anno_df = anno_df.groupby('segm_idx').apply(assemble_line_segments)
|
||||
|
||||
## 0) prepare meta data columns
|
||||
# add ls_x_separate column (depends on assemble_line_segments)
|
||||
anno_df = anno_df.groupby(['segm_idx', 'line_idx']).apply(add_x_minmax)
|
||||
anno_df = anno_df.groupby('segm_idx').apply(mark_x_seperate)
|
||||
# add dist and dist_avg column
|
||||
anno_df['dist'] = anno_df.line_segs.apply(set_line_param)
|
||||
anno_df = anno_df.groupby(['segm_idx', 'line_idx']).apply(set_mean)
|
||||
##print anno_df
|
||||
# add ls_vert_nb column (depends on assemble_line_segments)
|
||||
#anno_df = anno_df.groupby('segm_idx').apply(mark_vert_nb, self.interline_dist * 0.8)
|
||||
|
||||
## 1) group lines together
|
||||
# set inline
|
||||
#anno_df['inline'] = [np.intersect1d(*el) for el in anno_df[['ls_vert_nb', 'ls_x_separate']].values]
|
||||
#anno_df['inline'] = [np.empty(0, dtype=int)] * len(anno_df)
|
||||
anno_df['inline'] = pd.Series([np.empty(0, dtype=int)] * len(anno_df), index=anno_df.index)
|
||||
|
||||
# further group line segments by order and ls_x_separate (should be respected when annotating data!)
|
||||
anno_df = anno_df.groupby('segm_idx').apply(group_ls_by_order, self.interline_dist * 5) # * 3
|
||||
|
||||
# assign actual line idx
|
||||
anno_df = anno_df.groupby('segm_idx').apply(assign_actual_line_index)
|
||||
|
||||
## 2) refine ordering
|
||||
# reset dist_avg based on gt_line_idx
|
||||
anno_df = anno_df.groupby(['segm_idx', 'gt_line_idx']).apply(set_mean)
|
||||
# assign actual line idx again
|
||||
anno_df = anno_df.groupby('segm_idx').apply(assign_actual_line_index)
|
||||
|
||||
# return data frame
|
||||
return anno_df
|
||||
else:
|
||||
# return empty list (check later with len(.) to see if file exists)
|
||||
return []
|
||||
|
||||
def select_df_by_segm_idx(self, segm_idx):
|
||||
assert len(self.anno_df) > 0, 'No annotations available!'
|
||||
# wrap pandas logic
|
||||
return self.anno_df[(self.anno_df.segm_idx == segm_idx)]
|
||||
|
||||
def visualize_line_annotations(self, segm_idx, input_im, show_line_seg_idx=False):
|
||||
# plot line annotations
|
||||
# get segment data frame
|
||||
seg_line_df = self.select_df_by_segm_idx(segm_idx)
|
||||
|
||||
# check if any anno
|
||||
if len(seg_line_df) > 0:
|
||||
# create basic plot
|
||||
fig, axes = plt.subplots(figsize=(10, 10))
|
||||
|
||||
grouped = seg_line_df.groupby('line_idx')
|
||||
|
||||
color = plt.cm.jet(np.linspace(0, 1, np.max(seg_line_df.line_idx) + 2))
|
||||
for i, line_rec in grouped:
|
||||
gt_line_idx = line_rec.gt_line_idx.values[0]
|
||||
line_idx = line_rec.line_idx.values[0]
|
||||
# print line_rec
|
||||
axes.plot(line_rec.x.values, line_rec.y.values, linewidth=5, color=color[gt_line_idx],)
|
||||
axes.text(line_rec.x.values[0], line_rec.y.values[0], '{}'.format(gt_line_idx),
|
||||
bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
|
||||
if show_line_seg_idx:
|
||||
axes.text(line_rec.x.values[1], line_rec.y.values[1], '{}'.format(line_idx),
|
||||
bbox=dict(facecolor='red', alpha=0.5), fontsize=8, color='white')
|
||||
# axes.set_yticks([])
|
||||
# axes.set_xticks([])
|
||||
|
||||
# plot the image last so that the axes get overwritten (no need to remove ticks)
|
||||
axes.imshow(input_im, cmap='gray')
|
||||
plt.show()
|
||||
|
||||
def get_hypo_line_labeling_for_segm(self, segm_idx, line_hypos_agg, verbose=False):
|
||||
|
||||
# select line segment ground truth
|
||||
seg_ls_df = self.select_df_by_segm_idx(segm_idx).copy()
|
||||
# from n points only n-1 segments -> remove empty ones
|
||||
seg_ls_df = seg_ls_df[seg_ls_df.line_segs.apply(len) > 0]
|
||||
|
||||
# check if any annotations found
|
||||
if len(seg_ls_df) > 0:
|
||||
# assign hypo lines to gt line segments
|
||||
gt_line_segs = seg_ls_df.line_segs.values.tolist()
|
||||
gt_ls_lbl, gt_ls_dist = assign_lines_to_gt_line_segments(gt_line_segs, line_hypos_agg)
|
||||
# update dataframe
|
||||
seg_ls_df['hypo_line_lbl'] = gt_ls_lbl
|
||||
seg_ls_df['hypo_line_dist'] = np.sqrt(gt_ls_dist)
|
||||
# decide hypo line labels
|
||||
seg_ls_df = seg_ls_df.groupby(['gt_line_idx']).apply(decide_hypo_line_lbl)
|
||||
else:
|
||||
if verbose:
|
||||
print('No line ground truth available for segment idx [{}]!'.format(segm_idx))
|
||||
|
||||
return seg_ls_df
|
||||
|
||||
def get_assignment_for_line_hypos(self, segm_idx, line_hypos_agg):
|
||||
# create empty dummy for cases where no annotations available
|
||||
gt_line_assignment = pd.DataFrame()
|
||||
|
||||
if len(self.anno_df) > 0:
|
||||
# get labelling
|
||||
seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
|
||||
|
||||
if len(seg_ls_df) > 0:
|
||||
# in case of multiple annotations per hypo line, pick the one with smallest distance
|
||||
gt_line_assignment = seg_ls_df.sort_values('hypo_line_dist').groupby('hypo_line_lbl').head(1)[
|
||||
['gt_line_idx', 'hypo_line_lbl']]
|
||||
gt_line_assignment = gt_line_assignment.sort_values('gt_line_idx')
|
||||
|
||||
return gt_line_assignment
|
||||
|
||||
def visualize_hypo_line_assignments(self, segm_idx, line_hypos_agg, input_im):
|
||||
# get labelling
|
||||
seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
|
||||
gt_ls_lbl = seg_ls_df.hypo_line_lbl.values
|
||||
gt_line_segs = seg_ls_df.line_segs.values
|
||||
|
||||
# visualize
|
||||
visualize_line_segments_with_labels(gt_line_segs, gt_ls_lbl, input_im)
|
||||
|
||||
def visualize_gt_lines_with_assignments(self, segm_idx, line_hypos_agg, center_im):
|
||||
|
||||
# gt assignment
|
||||
gt_line_assignment = self.get_assignment_for_line_hypos(segm_idx, line_hypos_agg)
|
||||
|
||||
# get labelling
|
||||
seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
|
||||
gt_ls_lbl = seg_ls_df.gt_line_idx.values
|
||||
gt_line_segs = seg_ls_df.line_segs.values
|
||||
|
||||
# get line hypo endpoints
|
||||
list_hypo_endpts = [np.fliplr(np.array(compute_line_endpoints_by_hypo_idx(hidx, line_hypos_agg)).
|
||||
reshape(2, 2)).ravel() for hidx in gt_line_assignment.hypo_line_lbl.values]
|
||||
|
||||
# get color map
|
||||
color = plt.cm.nipy_spectral(np.linspace(0, 1, np.max(gt_ls_lbl) + 1)) # len(np.unique(gt_ls_lbl))
|
||||
|
||||
fig, axes = plt.subplots(1, 2, figsize=(15, 7))
|
||||
ax = axes.ravel()
|
||||
|
||||
ax[0].imshow(center_im, cmap='gray')
|
||||
ax[0].set_title('Input image')
|
||||
|
||||
ax[1].imshow(center_im * 0)
|
||||
for line, li in zip(gt_line_segs, gt_ls_lbl):
|
||||
p0, p1 = line
|
||||
ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[li], linewidth=2)
|
||||
|
||||
ax[1].set_xlim((0, center_im.shape[1]))
|
||||
ax[1].set_ylim((center_im.shape[0], 0))
|
||||
ax[1].set_title('gt line segments and assigned line hypos')
|
||||
|
||||
for idx, line_pts in enumerate(list_hypo_endpts):
|
||||
ax[1].plot(line_pts[::2], line_pts[1::2], '-', color=color[int(idx)], linewidth=2)
|
||||
ax[1].text(line_pts[0], line_pts[1], '{}'.format(idx),
|
||||
bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
|
||||
|
||||
|
||||
|
||||
#### HELPERS
|
||||
|
||||
# create line segment column
|
||||
|
||||
def assemble_line_segments(group):
|
||||
# assemble line segments
|
||||
line_grouped = group.groupby('line_idx')
|
||||
line_segs = []
|
||||
# iterate over lines
|
||||
for lidx, lgroup in line_grouped:
|
||||
num_pts = len(lgroup)
|
||||
# iterate over segments
|
||||
for sidx in range(num_pts):
|
||||
# assemble segments
|
||||
if sidx == num_pts - 1:
|
||||
line_segs.append(())
|
||||
else:
|
||||
line_segs.append(((lgroup.iloc[sidx].x, lgroup.iloc[sidx].y),
|
||||
(lgroup.iloc[sidx + 1].x, lgroup.iloc[sidx + 1].y)
|
||||
))
|
||||
# assign to group
|
||||
group['line_segs'] = line_segs
|
||||
|
||||
return group
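The segment assembly above can be illustrated on a toy polyline. This is a minimal sketch, not repository code: the helper name `points_to_segments` and the sample coordinates are assumptions made for illustration. It mirrors the convention used above, where n points yield n-1 segments plus one empty placeholder for the last point.

```python
def points_to_segments(points):
    """Hypothetical helper: pair consecutive points into line segments."""
    segments = []
    for idx in range(len(points)):
        if idx == len(points) - 1:
            # the last point has no successor, so it gets an empty placeholder
            segments.append(())
        else:
            segments.append((points[idx], points[idx + 1]))
    return segments


# toy polyline with three annotated points (made-up coordinates)
polyline = [(0, 0), (10, 2), (20, 1)]
segments = points_to_segments(polyline)

# downstream code filters the placeholders by length
non_empty = [seg for seg in segments if len(seg) > 0]
```

Keeping one entry per point means the segment list can be assigned directly as a data frame column; the empty placeholders are filtered later with a length check.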
|
||||
|
||||
|
||||
# group line segments to line
|
||||
|
||||
def add_x_minmax(group):
    group['xmin'] = group.x.min()
    group['xmax'] = group.x.max()
    return group
|
||||
|
||||
|
||||
def mark_x_seperate(group):
    # iterate line segments
    list_left_or_right = []
    for i, (ls_idx, line_seg) in enumerate(group.iterrows()):
        # collect lines that lie entirely to the left of this segment
        index_left = group.line_idx[group.xmax < line_seg.xmin].unique()
        # collect lines that lie entirely to the right of this segment
        index_right = group.line_idx[group.xmin > line_seg.xmax].unique()
        # concat and append to list
        list_left_or_right.append(np.concatenate([np.array(index_left), np.array(index_right)]))
    group['ls_x_separate'] = list_left_or_right
    return group
|
||||
|
||||
|
||||
def set_line_param(line_seg):

    if len(line_seg) > 0:
        # use basic line equation
        #line_params = line_params_from_pts(line_seg[0], line_seg[1])

        # use Hesse normal form (incorporates angle)
        line_params = hess_normal_form_from_pts(line_seg[0], line_seg[1])
        return line_params[1]  # only interested in height
    else:
        return np.nan
|
||||
|
||||
|
||||
def set_mean(group):
    group['dist_avg'] = group.dist.mean()
    return group
|
||||
|
||||
|
||||
def mark_vert_nb(group, interline_thresh):
    # iterate line segments
    list_vert_nb = []
    for i, (ls_idx, line_seg) in enumerate(group.iterrows()):
        # collect lines whose average distance is within interline_thresh of this segment
        index_vert_near = group.line_idx[(group.dist >= 0) &
                                         (np.abs(group.dist_avg - line_seg.dist_avg) < interline_thresh)].unique()
        list_vert_nb.append(np.array(index_vert_near))

    group['ls_vert_nb'] = list_vert_nb
    return group
|
||||
|
||||
|
||||
# def make_inline_symmetric(group):
|
||||
# # iterate over line segments, and make symmetric reference of inline
|
||||
# for i, (sidx, line_seg) in enumerate(group.iterrows()):
|
||||
# if len(line_seg.inline) > 0:
|
||||
# select_inline = group.line_idx.isin(line_seg.inline)
|
||||
# group.loc[select_inline, 'inline'] = select_inline.sum() * [line_seg.inline]
|
||||
# # deal with type mismatch in column (did find no better way :/)
|
||||
# inline_list = []
|
||||
# for el in group.inline.astype(list).values:
|
||||
# if isinstance(el, np.ndarray):
|
||||
# inline_list.append(el)
|
||||
# else:
|
||||
# inline_list.append(np.array([el]))
|
||||
# group['inline'] = inline_list
|
||||
# # return
|
||||
# return group
|
||||
|
||||
|
||||
def group_ls_by_order(group, interline_thresh):
|
||||
last_lidx = -1
|
||||
last_xseparate = []
|
||||
# QUICK FIX: use this to deal with loc and list inserts (loc[idx] works rather than loc[idx, col]!!)
|
||||
group_inline = group.inline
|
||||
# iter line_idx aggregate
|
||||
# https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group/49148885#49148885
|
||||
ls_agg = group.sort_values('line_idx').groupby('line_idx').nth(0) #.first() is dangerous
|
||||
|
||||
for curr_lidx, ls_agg_rec in ls_agg.iterrows():
|
||||
if last_lidx != -1:
|
||||
# check if last line segment is x separate
|
||||
if np.any(np.isin(ls_agg_rec.ls_x_separate, last_lidx)):
|
||||
last_rec = ls_agg.loc[last_lidx]
|
||||
# check if last line segment on the left
|
||||
ls_left = (last_rec.xmax < ls_agg_rec.xmin)
|
||||
if ls_left:
|
||||
# check if vertical distance is small
|
||||
vert_dist_is_small = np.abs(last_rec.dist_avg - ls_agg_rec.dist_avg) < interline_thresh
|
||||
if vert_dist_is_small:
|
||||
# check if already inline
|
||||
if last_lidx not in ls_agg_rec.inline:
|
||||
# print('merge line segments {} with {}'.format(curr_lidx, last_lidx))
|
||||
# create new inlines
|
||||
# do not use ls_agg_rec.inline, since it does not get updated during loop
|
||||
#new_inline = np.concatenate([ls_agg_rec.inline, np.array([last_lidx])])
|
||||
#new_last_inline = np.concatenate([ls_agg_rec.inline, np.array([curr_lidx])])
|
||||
new_inline = np.concatenate([group_inline.loc[group.line_idx == curr_lidx].values[0], np.array([last_lidx])])
|
||||
new_last_inline = np.concatenate([group_inline.loc[group.line_idx == last_lidx].values[0], np.array([curr_lidx])])
|
||||
# add to data frame (loc[idx] works rather than loc[idx, col]!!)
|
||||
select_line_idx = (group.line_idx == curr_lidx)
|
||||
group_inline.loc[select_line_idx] = [new_inline] * select_line_idx.sum()
|
||||
select_line_idx = (group.line_idx == last_lidx)
|
||||
group_inline.loc[select_line_idx] = [new_last_inline] * select_line_idx.sum()
|
||||
|
||||
# set last values
|
||||
last_lidx = curr_lidx
|
||||
last_xseparate = ls_agg_rec.ls_x_separate
|
||||
return group
|
||||
|
||||
|
||||
# finalize assignment
|
||||
|
||||
def assign_actual_line_index(group):
|
||||
# create new column
|
||||
group['gt_line_idx'] = np.ones(len(group), dtype=int) * -1
|
||||
# iterate over line segments and assign the actual line index (segments sorted by 1) y position, 2) x position)
|
||||
new_idx = 0
|
||||
for sidx, line_seg in group.sort_values(['dist_avg', 'x']).iterrows():
|
||||
# check if index is already set
|
||||
if group.loc[sidx, 'gt_line_idx'] == -1:
|
||||
# assign index to line segment
|
||||
group.loc[group.line_idx == line_seg.line_idx, 'gt_line_idx'] = new_idx
|
||||
# assign same index to inline segments
|
||||
for lidx in line_seg.inline:
|
||||
group.loc[group.line_idx == lidx, 'gt_line_idx'] = new_idx
|
||||
# finally increment index
|
||||
new_idx += 1
|
||||
# if index is already set, extend it to all inline members
|
||||
else:
|
||||
curr_idx = group.loc[sidx, 'gt_line_idx']
|
||||
# assign same index to inline segments
|
||||
for lidx in line_seg.inline:
|
||||
group.loc[group.line_idx == lidx, 'gt_line_idx'] = curr_idx
|
||||
|
||||
return group
|
||||
|
||||
|
||||
# for eval need to assign detection lines to ground truth lines
|
||||
|
||||
def assign_lines_to_gt_line_segments(gt_line_segs, line_hypos_agg):
|
||||
# get line pts from polar lines
|
||||
line_pts = []
|
||||
for idx in range(len(line_hypos_agg)):
|
||||
# compute line endpoints
|
||||
line_pts.append(compute_line_endpoints_by_hypo_idx(idx, line_hypos_agg))
|
||||
#line_pts.append(line_frag.compute_line_endpoints(-1, hypo_idx=i))
|
||||
|
||||
line_pts = np.vstack(line_pts)
|
||||
line_pts = np.flip(line_pts.reshape((-1, 2, 2)), axis=2).reshape(-1, 4)
|
||||
|
||||
# get line segments
|
||||
line_seg_pts = np.stack(gt_line_segs).reshape(len(gt_line_segs), -1)
|
||||
|
||||
# compute distance between line segments and lines
|
||||
X2_dist = cdist(line_pts, line_seg_pts,
|
||||
lambda lpts, spts: dist_lineseg_line(spts[:2], spts[2:], lpts[:2], lpts[2:]))
|
||||
# assign line segments to nearest line
|
||||
ls_labels = np.argmin(X2_dist, axis=0)
|
||||
ls_dist = np.min(X2_dist, axis=0)
|
||||
|
||||
return ls_labels, ls_dist
|
||||
|
||||
|
||||
def decide_hypo_line_lbl(group):
    # count hypo line labels
    uv, counts = np.unique(group.hypo_line_lbl, return_counts=True)
    # select all labels that share the largest count
    largest_select = (np.max(counts) == counts)
    # check if tiebreak is required
    if largest_select.sum() > 1:
        # for each of the equally frequent labels, compute the mean hypo_line_dist
        # and pick the label with the largest mean
        tiebreak_df = group.groupby('hypo_line_lbl').hypo_line_dist.mean()
        most_freq_hypo_lbl = tiebreak_df[uv[largest_select]].idxmax()
    else:
        most_freq_hypo_lbl = uv[np.argmax(counts)]

    # assign most frequent label
    group['hypo_line_lbl'] = most_freq_hypo_lbl

    return group
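The majority vote with tie-break can be traced on a small example. This is a sketch under assumptions: the toy data frame and its values are invented for illustration and only the column names follow the code above. Labels 4 and 7 both occur twice, so the tie is resolved via the mean `hypo_line_dist`, using `idxmax` as in the function above.

```python
import numpy as np
import pandas as pd

# toy ground-truth line with five segments (made-up labels and distances)
toy = pd.DataFrame({
    'hypo_line_lbl': [4, 4, 7, 7, 9],
    'hypo_line_dist': [1.0, 3.0, 5.0, 7.0, 0.5],
})

# count how often each hypothesis label was assigned
uv, counts = np.unique(toy.hypo_line_lbl, return_counts=True)
tied = uv[counts == counts.max()]

# tie-break on the mean distance per label, mirroring the idxmax call above
mean_dist = toy.groupby('hypo_line_lbl').hypo_line_dist.mean()
winner = mean_dist.loc[tied].idxmax()
```

Note that `idxmax` selects the tied label with the largest mean distance; whether the smallest would be preferable depends on how `hypo_line_dist` is meant to be interpreted, which this sketch does not decide.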
|
||||
|
||||
|
||||
# visualize
|
||||
|
||||
def visualize_line_segments_with_labels(gt_line_segs, gt_ls_lbl, center_im, line_hypo_endpts=None):
|
||||
|
||||
color = plt.cm.nipy_spectral(np.linspace(0, 1, np.max(gt_ls_lbl) + 1))
|
||||
|
||||
fig, axes = plt.subplots(1, 2, figsize=(15, 7))
|
||||
ax = axes.ravel()
|
||||
|
||||
ax[0].imshow(center_im, cmap='gray')
|
||||
ax[0].set_title('Input image')
|
||||
|
||||
# ax[1].imshow(lbl_ind_x, cmap='gray')
|
||||
# ax[1].set_title('line det')
|
||||
|
||||
ax[1].imshow(center_im * 0)
|
||||
for line, li in zip(gt_line_segs, gt_ls_lbl):
|
||||
p0, p1 = line
|
||||
ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[li], linewidth=2)
|
||||
ax[1].text(p0[0], p0[1], '{}'.format(li),
|
||||
bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
|
||||
ax[1].set_xlim((0, center_im.shape[1]))
|
||||
ax[1].set_ylim((center_im.shape[0], 0))
|
||||
ax[1].set_title('gt line segments and assigned line hypos')
|
||||
|
||||
if line_hypo_endpts is not None:
|
||||
for idx, line_pts in enumerate(line_hypo_endpts):
|
||||
ax[1].plot(line_pts[::2], line_pts[1::2], '-', color=color[int(idx)], linewidth=2)
|
||||
|
||||
|
||||
@@ -0,0 +1,15 @@
|
||||
# evaluate line-tl alignment using gt-line annotations
# this is only a quality indicator, because transliterations are unreliable
# (transliterations often miss lines or contain invisible lines)


def eval_line_tl_alignment(line_frag, lines_anno, seg_idx, num_vis_lines):
    tl_asst_eval = line_frag.line_hypos_agg[['gt_line_idx', 'tl_line']].dropna()
    tl_asst_eval = tl_asst_eval[tl_asst_eval.tl_line >= 0]  # do not count unassigned lines
    print('LineHypos-TL assignment accuracy: {}'.format(
        (tl_asst_eval.gt_line_idx == tl_asst_eval.tl_line).mean()))
    # check if consistent gt and tl
    num_lines_tl = num_vis_lines  # line_frag.tl_df.line_idx.nunique()
    num_lines_gt = lines_anno.select_df_by_segm_idx(seg_idx).gt_line_idx.nunique()
    if num_lines_tl != num_lines_gt:
        print('line annotation - transliteration mismatch: {} vs {} lines '.format(num_lines_gt, num_lines_tl))
|
||||
@@ -0,0 +1,512 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from tqdm import tqdm
|
||||
|
||||
from .config import cfg
|
||||
|
||||
from ..detection.detection_helpers import convert_detections_to_array
|
||||
from ..utils.bbox_utils import box_iou
|
||||
|
||||
|
||||
def voc_ap(rec, prec, use_07_metric=False):
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).

    Reference: Ross Girshick's Fast/er R-CNN code
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at both ends
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
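A short worked example may help to follow the area computation. The sketch below restates the non-VOC07 branch in a stand-alone function so that it runs on its own; the name `area_under_pr` and the three toy detections are assumptions for illustration, not part of the repository.

```python
import numpy as np


def area_under_pr(rec, prec):
    """Stand-alone restatement of the non-VOC07 branch, for illustration only."""
    # sentinel values at both ends of the curve
    mrec = np.concatenate(([0.], rec, [1.]))
    mpre = np.concatenate(([0.], prec, [0.]))
    # precision envelope: make precision monotonically non-increasing
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    # sum (delta recall) * precision at the points where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])


# three detections sorted by score (TP, FP, TP) and two ground-truth boxes
tp = np.array([1., 0., 1.])
fp = 1. - tp
rec = np.cumsum(tp) / 2.
prec = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
ap = area_under_pr(rec, prec)
```

Here recall is (0.5, 0.5, 1.0) and precision is (1, 1/2, 2/3). The envelope lifts the middle precision value to 2/3, so the area is 0.5 * 1 + 0.5 * 2/3 = 5/6.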
|
||||
|
||||
|
||||
# *BASIC* AP COMPUTATION (Fast RCNN style)
|
||||
|
||||
def evaluate_on_gt(gt_boxes, gt_labels, num_images, all_boxes, ovthresh=None, num_classes=None, use_07_metric=False):
|
||||
# Reference: Ross Girshick's Fast/er R-CNN code
|
||||
|
||||
if ovthresh is None:
|
||||
ovthresh = cfg.TEST.TP_MIN_OVERLAP
|
||||
if num_classes is None:
|
||||
num_classes = cfg.TEST.NUM_CLASSES
|
||||
|
||||
# all detections are collected into:
|
||||
# all_boxes[cls][image] = N x 5 array of detections in
|
||||
# (x1, y1, x2, y2, score)
|
||||
all_tp = [[[] for _ in range(num_images)]
|
||||
for _ in range(num_classes)]
|
||||
all_fp = [[[] for _ in range(num_images)]
|
||||
for _ in range(num_classes)]
|
||||
det_stats = []
|
||||
total_num_tp = 0
|
||||
total_false_cls = np.zeros(num_classes)
|
||||
for j in range(1, num_classes): # num_classes
|
||||
# if no detections for class available
|
||||
if len(all_boxes[j][0]) == 0:
|
||||
BB = np.empty((0, 4), dtype=np.float32)
|
||||
confidence = np.empty(0, dtype=np.float32)
|
||||
else:
|
||||
BB = all_boxes[j][0][:, :4]
|
||||
confidence = all_boxes[j][0][:, -1]
|
||||
|
||||
# sort by confidence
|
||||
sorted_ind = np.argsort(-confidence)
|
||||
sorted_scores = np.sort(-confidence)
|
||||
BB = BB[sorted_ind, :]
|
||||
inds = np.where(gt_labels == j)[0]
|
||||
BBGT = gt_boxes[inds, :].astype(float)
|
||||
npos = BBGT.shape[0]
|
||||
det = [False] * npos
|
||||
|
||||
if npos > 0: # else if no gt boxes available for class, AP computation is not meaningful
|
||||
|
||||
# go down dets and mark TPs and FPs
|
||||
nd = len(sorted_ind)
|
||||
tp = np.zeros(nd)
|
||||
fp = np.zeros(nd)
|
||||
cls_tp = []
|
||||
cls_fp = []
|
||||
for d in range(nd):
|
||||
bb = BB[d, :].astype(float)
|
||||
ovmax = -np.inf
|
||||
|
||||
if BBGT.size > 0:
|
||||
# compute overlaps
|
||||
# intersection
|
||||
ixmin = np.maximum(BBGT[:, 0], bb[0])
|
||||
iymin = np.maximum(BBGT[:, 1], bb[1])
|
||||
ixmax = np.minimum(BBGT[:, 2], bb[2])
|
||||
iymax = np.minimum(BBGT[:, 3], bb[3])
|
||||
iw = np.maximum(ixmax - ixmin + 1., 0.)
|
||||
ih = np.maximum(iymax - iymin + 1., 0.)
|
||||
inters = iw * ih
|
||||
|
||||
# union
|
||||
uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
|
||||
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
|
||||
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
|
||||
|
||||
overlaps = inters / uni
|
||||
ovmax = np.max(overlaps)
|
||||
jmax = np.argmax(overlaps)
|
||||
|
||||
if ovmax > ovthresh:
|
||||
if not det[jmax]:
|
||||
tp[d] = 1.
|
||||
det[jmax] = 1
|
||||
cls_tp.append(d)
|
||||
else:
|
||||
# double detection (unlikely due to nms)
|
||||
fp[d] = 1.
|
||||
cls_fp.append(d) # comment?!
|
||||
else:
|
||||
fp[d] = 1.
|
||||
cls_fp.append(d)
|
||||
|
||||
# save tp detections
|
||||
all_tp[j][0] = np.array(cls_tp)
|
||||
# save fp detections
|
||||
all_fp[j][0] = np.array(cls_fp)
|
||||
# compute precision recall
|
||||
fp = np.cumsum(fp)
|
||||
tp = np.cumsum(tp)
|
||||
rec = tp / float(npos)
|
||||
# avoid divide by zero in case the first detection matches a difficult
|
||||
# ground truth
|
||||
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
|
||||
ap = voc_ap(rec, prec, use_07_metric)
|
||||
# print rec, prec, ap
|
||||
num_tp = np.sum(det).astype(int)
|
||||
total_num_tp += num_tp
|
||||
det_stats.append([npos, nd, num_tp, nd-num_tp, ap, j])
|
||||
else:
|
||||
if len(BB) > 0:
|
||||
total_false_cls[j] += len(BB)
|
||||
#print 'outlier class:', j, len(BB)
|
||||
select_nonzero = total_false_cls > 0
|
||||
# print(np.nonzero(select_nonzero), total_false_cls[select_nonzero])
|
||||
return all_tp, all_fp, det_stats, total_num_tp #, total_false_cls
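The overlap test inside the loop above is an intersection over union with inclusive pixel coordinates, hence the `+ 1.` terms. The following sketch isolates that computation; the helper name `iou_inclusive` and the sample boxes are assumptions for illustration.

```python
import numpy as np


def iou_inclusive(bb, bbgt):
    """Hypothetical helper: IoU of one box against many, inclusive pixel coordinates."""
    # intersection rectangle
    ixmin = np.maximum(bbgt[:, 0], bb[0])
    iymin = np.maximum(bbgt[:, 1], bb[1])
    ixmax = np.minimum(bbgt[:, 2], bb[2])
    iymax = np.minimum(bbgt[:, 3], bb[3])
    iw = np.maximum(ixmax - ixmin + 1., 0.)
    ih = np.maximum(iymax - iymin + 1., 0.)
    inters = iw * ih
    # union = area(bb) + area(bbgt) - intersection
    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
           (bbgt[:, 2] - bbgt[:, 0] + 1.) *
           (bbgt[:, 3] - bbgt[:, 1] + 1.) - inters)
    return inters / uni


# one 10x10 detection against an identical, a shifted and a disjoint gt box
det_box = np.array([0., 0., 9., 9.])
gt = np.array([[0., 0., 9., 9.],
               [5., 5., 14., 14.],
               [20., 20., 29., 29.]])
overlaps = iou_inclusive(det_box, gt)
```

The shifted box shares a 5x5 pixel region with the detection, which gives 25 / (100 + 100 - 25) = 1/7. With a threshold such as 0.5, only the identical box would count as a true positive.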
|
||||
|
||||
|
||||
def df_evaluate_on_gt(gt_boxes_df, pred_boxes_df, ovthresh=None, num_classes=None, use_07_metric=False):
|
||||
# Reference: Ross Girshick's Fast/er R-CNN code
|
||||
|
||||
if ovthresh is None:
|
||||
ovthresh = cfg.TEST.TP_MIN_OVERLAP
|
||||
if num_classes is None:
|
||||
num_classes = cfg.TEST.NUM_CLASSES
|
||||
num_images = gt_boxes_df.seg_idx.nunique()
|
||||
|
||||
# sort by confidence
|
||||
pred_boxes_df = pred_boxes_df.sort_values('conf', ascending=False)
|
||||
|
||||
det = [False] * len(gt_boxes_df)
|
||||
|
||||
det_stats = []
|
||||
total_num_tp = 0
|
||||
for j in tqdm(range(1, num_classes)): # num_classes
|
||||
cls_dets_df = pred_boxes_df[pred_boxes_df.cls == j]
|
||||
cls_gt_df = gt_boxes_df[gt_boxes_df.cls == j]
|
||||
|
||||
# get bounding box and image ids
|
||||
BB = cls_dets_df[['x1', 'y1', 'x2', 'y2']].values
|
||||
image_ids = cls_dets_df.seg_idx.values
|
||||
# confidence = cls_dets_df.conf.values
|
||||
|
||||
npos = len(cls_gt_df)
|
||||
|
||||
if npos > 0: # else if no gt boxes available for class, AP computation is not meaningful
|
||||
|
||||
# go down dets and mark TPs and FPs
|
||||
nd = len(cls_dets_df)
|
||||
tp = np.zeros(nd)
|
||||
fp = np.zeros(nd)
|
||||
|
||||
for d in range(nd):
|
||||
ovmax = -np.inf
|
||||
|
||||
# get bbox and seg_idx
|
||||
bb = BB[d, :].astype(float)
|
||||
seg_idx = image_ids[d]
|
||||
|
||||
# get gt boxes
|
||||
seg_cls_gt_df = cls_gt_df[cls_gt_df.seg_idx == seg_idx]
|
||||
BBGT = seg_cls_gt_df[['x1', 'y1', 'x2', 'y2']].values.astype(float)
|
||||
|
||||
if BBGT.size > 0:
|
||||
# compute overlaps
|
||||
# intersection
|
||||
ixmin = np.maximum(BBGT[:, 0], bb[0])
|
||||
iymin = np.maximum(BBGT[:, 1], bb[1])
|
||||
ixmax = np.minimum(BBGT[:, 2], bb[2])
|
||||
iymax = np.minimum(BBGT[:, 3], bb[3])
|
||||
iw = np.maximum(ixmax - ixmin + 1., 0.)
|
||||
ih = np.maximum(iymax - iymin + 1., 0.)
|
||||
inters = iw * ih
|
||||
|
||||
# union
|
||||
uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
|
||||
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
|
||||
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
|
||||
|
||||
overlaps = inters / uni
|
||||
ovmax = np.max(overlaps)
|
||||
jmax = np.argmax(overlaps)
|
||||
|
||||
if ovmax > ovthresh:
|
||||
# map seg_cls idx to global idx (det is a plain list, so gt_boxes_df must have a default 0..N-1 index)
|
||||
gidx = seg_cls_gt_df.index.values[jmax]
|
||||
if not det[gidx]:
|
||||
tp[d] = 1.
|
||||
det[gidx] = 1
|
||||
else:
|
||||
# double detection (unlikely due to nms)
|
||||
fp[d] = 1.
|
||||
else:
|
||||
fp[d] = 1.
|
||||
# compute num tp before cumsum (!)
|
||||
num_tp = np.sum(tp).astype(int)
|
||||
# compute precision recall
|
||||
fp = np.cumsum(fp)
|
||||
tp = np.cumsum(tp)
|
||||
rec = tp / float(npos)
|
||||
# avoid divide by zero in case the first detection matches a difficult
|
||||
# ground truth
|
||||
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
|
||||
ap = voc_ap(rec, prec, use_07_metric)
|
||||
# print rec, prec, ap
|
||||
total_num_tp += num_tp
|
||||
det_stats.append([npos, nd, num_tp, nd-num_tp, ap, j])
|
||||
# print np.sum(det), total_num_tp
|
||||
else:
|
||||
if len(cls_dets_df) > 0:
|
||||
if False: # turn on for debugging to see which classes are missing
|
||||
print('outlier class:', j, len(BB))
|
||||
|
||||
return det_stats, total_num_tp
|
||||
|
||||
|
||||
def eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=None, verbose=True):
|
||||
# evaluate
|
||||
num_imgs = 1
|
||||
all_tp, all_fp, det_stats, total_num_tp = evaluate_on_gt(gt_boxes, gt_labels, num_imgs, all_boxes,
|
||||
ovthresh=ovthresh)
|
||||
|
||||
total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
|
||||
# print stats
|
||||
pd.set_option('display.max_rows', 50)
|
||||
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
|
||||
|
||||
if verbose:
|
||||
print("total_tp", total_num_tp, "total_fp", total_num_fp,
|
||||
"mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
|
||||
"mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
|
||||
acc = total_num_tp / float(total_num_tp + total_num_fp)
|
||||
|
||||
return acc, df_stats
|
||||
|
||||
|
||||
def eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=None):
|
||||
det_stats, total_num_tp = df_evaluate_on_gt(gt_boxes_df, pred_boxes_df, ovthresh=ovthresh)
|
||||
|
||||
total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
|
||||
# print stats
|
||||
pd.set_option('display.max_rows', 50)
|
||||
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
|
||||
|
||||
print('RESULTS ON FULL COLLECTION :')
|
||||
print("total_tp", total_num_tp, "total_fp", total_num_fp,
|
||||
"acc", '{:0.3f}'.format(total_num_tp / float(total_num_tp + total_num_fp)),
|
||||
"mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
|
||||
"mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
|
||||
acc = total_num_tp / float(total_num_tp + total_num_fp)
|
||||
|
||||
return acc, df_stats
|
||||
|
||||
|
||||
# *FAST* AP COMPUTATION
|
||||
|
||||
|
||||
# prepare AP computation
|
||||
|
||||
|
||||
def add_max_det(group):
    # add column to dataframe
    group['max_det'] = False
    # select detections marked as TP
    tp_group = group[group.det_type == 3]
    # only one can be TP, others are double detections
    if len(tp_group) > 0:
        # set max entry to true (a single .loc call avoids chained assignment)
        group.loc[tp_group.score.idxmax(), 'max_det'] = True
    return group
|
||||
|
||||
|
||||
def add_det_type_column(eval_df, tp_thresh=0.5, bg_thresh=0.2):
|
||||
# based on "Diagnosing Error in Object Detectors" by Hoiem et al.
|
||||
# modifications:
|
||||
# sim and other categories are merged, since every sign is considered similar
|
||||
# bg_thresh is 0.2 instead of default 0.1
|
||||
|
||||
# determine detection types
|
||||
|
||||
type_list = []
|
||||
for didx, det_rec in eval_df.iterrows():
|
||||
overlap = det_rec.overlap
|
||||
# class matches
|
||||
if det_rec.pred == det_rec.true:
|
||||
if overlap > tp_thresh:
|
||||
type_list.append(3) # TP (3)
|
||||
elif overlap > bg_thresh:
|
||||
type_list.append(0) # FP: Loc(0) confusion
|
||||
else:
|
||||
type_list.append(2) # FP: BG(2) confusion
|
||||
else:
|
||||
if overlap > bg_thresh:
|
||||
type_list.append(1) # FP: Sim/Oth(1) confusion
|
||||
else:
|
||||
type_list.append(2) # FP: BG(2) confusion
|
||||
|
||||
# add column to dataframe
|
||||
eval_df['det_type'] = type_list
|
||||
|
||||
return eval_df
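The row-wise categorisation above can also be written without an explicit loop. This is a sketch of one possible vectorised variant using `np.select`, which picks the first matching condition per row; the function name `det_type_vectorized` and the toy data frame are assumptions for illustration, and the type codes follow the comments above (0: Loc, 1: Sim/Oth, 2: BG, 3: TP).

```python
import numpy as np
import pandas as pd


def det_type_vectorized(eval_df, tp_thresh=0.5, bg_thresh=0.2):
    """Hypothetical vectorised counterpart of the row-wise loop."""
    same_cls = (eval_df['pred'] == eval_df['true']).values
    overlap = eval_df['overlap'].values
    conditions = [
        same_cls & (overlap > tp_thresh),   # TP (3)
        same_cls & (overlap > bg_thresh),   # FP: Loc (0)
        ~same_cls & (overlap > bg_thresh),  # FP: Sim/Oth (1)
    ]
    # everything else has too little overlap and counts as background (2)
    return np.select(conditions, [3, 0, 1], default=2)


# toy detections (made-up values)
toy = pd.DataFrame({
    'overlap': [0.8, 0.3, 0.1, 0.6, 0.05],
    'pred': [1, 1, 1, 2, 2],
    'true': [1, 1, 1, 3, 3],
})
det_types = det_type_vectorized(toy)
```

For data frames with many detections this avoids the per-row overhead of `iterrows`, while the order of the conditions keeps the same precedence as the nested `if` statements.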
|
||||
|
||||
|
||||
def prepare_eval_df(all_boxes, gt_boxes, gt_labels, seg_idx, tp_thresh, bg_thresh):
|
||||
""" prepare eval_df that contains most information for average precision computation """
|
||||
# convert all_boxes to ndarray (N x 9)
|
||||
# [ID, cx, cy, score, x1, y1, x2, y2, idx] bbox = [4:8] ctr = [1:3]
|
||||
sign_detections = convert_detections_to_array(all_boxes)
|
||||
|
||||
# compute ious between detections and gt_boxes
|
||||
ious = box_iou(sign_detections[:, 4:8], gt_boxes)
|
||||
|
||||
# for each detection get best fit with gt box
|
||||
index_gt = np.argmax(ious, axis=1)
|
||||
overlap_gt = np.max(ious, axis=1)
|
||||
label_gt = gt_labels[index_gt]
|
||||
|
||||
# collect in data frame
|
||||
eval_df = pd.DataFrame(np.hstack([overlap_gt.reshape(-1, 1), label_gt.reshape(-1, 1),
|
||||
sign_detections[:, [0, 3, 8]], index_gt.reshape(-1, 1)]),
|
||||
columns=['overlap', 'true', 'pred', 'score', 'det_idx', 'gt_idx'])
|
||||
# add column with segment index
|
||||
eval_df['seg_idx'] = seg_idx
|
||||
# add det_type column (0:LOC, 1:SIM, 2:BG, 3:TP)
|
||||
eval_df = add_det_type_column(eval_df, tp_thresh, bg_thresh)
|
||||
# compute max_det (in order to find double detections)
|
||||
eval_df = eval_df.groupby('gt_idx').apply(add_max_det)
|
||||
|
||||
return eval_df
|
||||
|
||||
|
||||
# AP computation
|
||||
|
||||
|
||||
def compute_mean_ap(col_eval_df, gt_df, num_classes=240, class_list=None, verbose=True):
|
||||
""" compute mean class AP """
|
||||
|
||||
# define list of classes to evaluate over
|
||||
if class_list is None:
|
||||
class_list = np.arange(1, num_classes) # range(1, num_classes)
|
||||
col_eval_df = col_eval_df.sort_values('score', ascending=False)
|
||||
if False:
|
||||
# filter gt according to considered segments
|
||||
bbox_anno = None
|
||||
gt_df = bbox_anno.anno_df[bbox_anno.anno_df.segm_idx.isin(col_eval_df.seg_idx.unique())]
|
||||
gt_df['cls'] = gt_df.train_label
|
||||
|
||||
# compute class counts
|
||||
gt_counts = gt_df.cls.value_counts()
|
||||
|
||||
det_stats = []
|
||||
for cls_idx in class_list:
|
||||
# get class predictions
|
||||
cls_det_df = col_eval_df[col_eval_df.pred == cls_idx]
|
||||
# get gt number
|
||||
if cls_idx in gt_counts.index:
|
||||
npos = gt_counts[cls_idx]
|
||||
else:
|
||||
npos = 0
|
||||
if npos > 0:
|
||||
if 1:
|
||||
tp_vec = (cls_det_df.det_type == 3) & (cls_det_df.max_det == True)
|
||||
fp_vec = ~tp_vec
|
||||
# fp_vec = (cls_det_df.det_type < 3) | (cls_det_df.max_det == False)
|
||||
fp = np.cumsum(fp_vec.values)
|
||||
tp = np.cumsum(tp_vec.values)
|
||||
|
||||
assert np.all(tp_vec != fp_vec), np.intersect1d(tp_vec, fp_vec)
|
||||
else:
|
||||
# without considering double detections
|
||||
fp = np.cumsum(cls_det_df.det_type < 3)
|
||||
tp = np.cumsum(cls_det_df.det_type == 3)
|
||||
|
||||
rec = tp / float(npos)
|
||||
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
|
||||
ap = voc_ap(rec, prec, False)
|
||||
# sum is used to map empty list to 0
|
||||
det_stats.append([npos, len(cls_det_df), np.sum(tp[-1:]), np.sum(fp[-1:]), ap, cls_idx])
|
||||
else:
|
||||
if len(cls_det_df) > 0:
|
||||
if False: # turn on for debugging to see which classes are missing
|
||||
print('outlier class:', cls_idx, len(cls_det_df))
|
||||
# convert to ndarray
|
||||
det_stats = np.asarray(det_stats)
|
||||
mean_ap = np.mean(det_stats[:, -2])
|
||||
# return aps
|
||||
if verbose:
|
||||
print('mAP {:.4}'.format(mean_ap))
|
||||
return det_stats
|
||||
|
||||
|
||||
def compute_global_ap(col_eval_df, gt_df, num_classes=240, verbose=True):
|
||||
""" compute global AP """
|
||||
|
||||
# sort according to score
|
||||
col_eval_df = col_eval_df.sort_values('score', ascending=False)
|
||||
# not strictly necessary, because predicted classes are only in range [1, num_classes] anyway
|
||||
cls_det_df = col_eval_df[col_eval_df.pred.isin(range(1, num_classes))]
|
||||
if False:
|
||||
# filter gt according to considered segments
|
||||
bbox_anno = None
|
||||
gt_df = bbox_anno.anno_df[bbox_anno.anno_df.segm_idx.isin(col_eval_df.seg_idx.unique())]
|
||||
gt_df['cls'] = gt_df.train_label
|
||||
# filter considered classes
|
||||
gt_df = gt_df[gt_df.cls.isin(range(1, num_classes))]
|
||||
|
||||
# select number of gt positives
|
||||
npos = len(gt_df)
|
||||
# npos = len(bbox_anno.anno_df.train_label[bbox_anno.anno_df.train_label > 0])
|
||||
|
||||
ap = 0
|
||||
if npos > 0:
|
||||
if 1:
|
||||
tp_vec = (cls_det_df.det_type == 3) & (cls_det_df.max_det == True)
|
||||
fp_vec = ~tp_vec
|
||||
fp = np.cumsum(fp_vec)
|
||||
tp = np.cumsum(tp_vec)
|
||||
|
||||
assert np.all(tp_vec != fp_vec), np.intersect1d(tp_vec, fp_vec)
|
||||
else:
|
||||
# without considering double detections
|
||||
fp = np.cumsum(cls_det_df.det_type < 3)
|
||||
tp = np.cumsum(cls_det_df.det_type == 3)
|
||||
|
||||
rec = tp / float(npos)
|
||||
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
|
||||
ap = voc_ap(rec, prec, False)
|
||||
|
||||
if False:
|
||||
from sklearn.metrics import precision_recall_curve, auc
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# compute normalized PR curve
|
||||
precision, recall, _ = precision_recall_curve(tp_vec, cls_det_df.score.values)
|
||||
# plot pr curve
|
||||
plt.figure()
|
||||
plt.step(recall, precision, color='b', alpha=0.2, where='post')
|
||||
# plt.step(rec, prec, color='b', alpha=0.2) # works, but rec values not normalized to [0, 1] range
|
||||
|
||||
# compare different ways to compute VOC AP (ie. area under the precision recall curve)
|
||||
# first two methods should produce same results, but there are slight differences
|
||||
# in doubt use original VOC AP code
|
||||
# https://datascience.stackexchange.com/questions/25119/how-to-calculate-map-for-detection-task-for-the-pascal-voc-challenge
|
||||
# https://github.com/rafaelpadilla/Object-Detection-Metrics
|
||||
plt.title('voc ap: {:.3} | PR AUC: {:.3} | norm. PR AUC: {:.3}'.format(voc_ap(rec, prec, False),
|
||||
auc(rec, prec),
|
||||
auc(recall, precision)))
|
||||
plt.show()
|
||||
|
||||
# return ap
|
||||
if verbose:
|
||||
print('global AP {:.4}'.format(ap))
|
||||
return ap
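For orientation, the ranking-based AP that `compute_global_ap` derives from the cumulative TP/FP counts can be sketched in plain Python. This is a minimal sketch, not the repository's `voc_ap`: the name `average_precision_sketch` and its arguments are hypothetical, and it assumes the VOC-style area under the monotone precision envelope, which is what `voc_ap(rec, prec, False)` appears to compute.

```python
def average_precision_sketch(scores, is_tp, num_gt):
    """Area under the interpolated precision-recall curve (assumed VOC 2010+ style)."""
    if num_gt <= 0:
        return 0.0
    # rank detections by descending confidence score
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recall, precision = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recall.append(tp / float(num_gt))
        precision.append(tp / float(tp + fp))
    # add sentinel values, then make precision monotonically non-increasing
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum the rectangle areas between consecutive recall values
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
```

Here `is_tp` plays the role of `tp_vec` (true positives that are also the best detection for their ground-truth box), and every other detection counts as a false positive.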
|
||||
|
||||
|
||||
# FP categorization
|
||||
|
||||
|
||||
def get_type_val_frac(fp_type_series, type_values=(0, 1, 2, 3), num_fp_thres=(5, 10, 25, 50, 100)):
|
||||
# type_values = [0, 1, 2, 3]
|
||||
# num_fp_thres = [5, 10, 25, 50, 100]
|
||||
|
||||
type_val_frac = np.zeros((len(num_fp_thres), len(type_values)))
|
||||
for i, thres in enumerate(num_fp_thres):
|
||||
type_counts = fp_type_series[:thres].value_counts(normalize=True, sort=True)
|
||||
for j, val in enumerate(type_values):
|
||||
val_check = type_counts.index.values == val
|
||||
if np.any(val_check):
|
||||
val_idx = np.argmax(val_check)
|
||||
type_val_frac[i, j] = type_counts.iloc[val_idx]
|
||||
return type_val_frac
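The idea behind `get_type_val_frac` is that, for each cutoff, it looks at the first `thres` false positives and records which fraction falls into each FP type. A minimal pure-Python sketch of the same idea follows; the name `type_fractions_sketch` is hypothetical, and it uses a plain list instead of a pandas Series.

```python
from collections import Counter


def type_fractions_sketch(fp_types, type_values=(0, 1, 2, 3), thresholds=(5, 10, 25, 50, 100)):
    """Return one row per threshold with the fraction of each FP type among the top entries."""
    rows = []
    for thres in thresholds:
        head = fp_types[:thres]          # top-ranked false positives
        counts = Counter(head)           # missing types count as 0
        total = float(len(head))
        rows.append([counts[v] / total if total else 0.0 for v in type_values])
    return rows
```

The input is expected to be sorted by detection score already, so that slicing selects the most confident false positives.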
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,156 @@
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from ast import literal_eval
|
||||
|
||||
import os.path
|
||||
|
||||
from .config import cfg
|
||||
|
||||
from ..detection.detection_helpers import scale_detection_boxes, correct_for_shift, crop_bboxes_from_im
|
||||
|
||||
# class to wrap annotations
|
||||
|
||||
|
||||
class BBoxAnnotations(object):
|
||||
|
||||
def __init__(self, collection_name, relative_path='../'):
|
||||
# basic paths
|
||||
self.data_root = cfg.DATA_TEST_DIR
|
||||
self.num_classes = cfg.TEST.NUM_CLASSES
|
||||
self.path_to_data_products = '{}data/annotations/'.format(relative_path)
|
||||
|
||||
# load collection annotations
|
||||
self.anno_df = self.load_collection_annotations(collection_name)
|
||||
|
||||
if len(self.anno_df) > 0:
|
||||
print('Load bbox annotations for {} dataset: {} found!'.format(collection_name,
|
||||
self.anno_df.segm_idx.nunique()))
|
||||
else:
|
||||
print('No bbox annotations for {} dataset'.format(collection_name))
|
||||
|
||||
def load_collection_annotations(self, collection_name):
|
||||
# assemble annotation file path
|
||||
annotation_file = 'bbox_annotations_{}.csv'.format(collection_name)
|
||||
annotation_file_path = '{}{}'.format(self.path_to_data_products, annotation_file)
|
||||
|
||||
# check if annotation file exists
|
||||
if os.path.isfile(annotation_file_path):
|
||||
# read annotation file
|
||||
anno_df = pd.read_csv(annotation_file_path, engine='python')
|
||||
# convert string of list to list
|
||||
anno_df['relative_bbox'] = anno_df['relative_bbox'].apply(literal_eval)
|
||||
anno_df['bbox'] = anno_df['bbox'].apply(literal_eval)
|
||||
# return data frame
|
||||
return anno_df
|
||||
else:
|
||||
# return empty list (check later with len(.) to see if file exists)
|
||||
return []
|
||||
|
||||
def select_anno_df_by_segm_idx(self, segm_idx):
|
||||
# wrap pandas logic
|
||||
return self.anno_df[(self.anno_df.segm_idx == segm_idx)]
|
||||
|
||||
def select_anno_df_by_cdli_and_view(self, cdli, view):
|
||||
# wrap pandas logic
|
||||
return self.anno_df[(self.anno_df.tablet_CDLI == cdli) & (self.anno_df.view_desc == view)]
|
||||
|
||||
|
||||
# static functions
|
||||
def get_boxes_and_labels(anno_df):
|
||||
# retrieves gt_boxes and gt_labels from anno_df
|
||||
#gt_boxes = np.stack(anno_df.bbox.values)
|
||||
if len(anno_df) > 0:
|
||||
gt_boxes = np.stack(anno_df.relative_bbox.values) # use relative bbox
|
||||
gt_labels = anno_df.train_label.values
|
||||
else:
|
||||
gt_boxes, gt_labels = np.array([]), np.array([]) # just dummy
|
||||
return gt_boxes, gt_labels
|
||||
|
||||
|
||||
def get_class_gt_boxes(gt_boxes, gt_labels, cls_id):
|
||||
inds = np.where(gt_labels == cls_id)[0]
|
||||
return gt_boxes[inds, :]
|
||||
|
||||
|
||||
def apply_scaling_and_shift(gt_boxes, scaling=1, shift=0):
|
||||
# if used, should be applied before calling eval
|
||||
# apply scaling of detection boxes
|
||||
gt_boxes = scale_detection_boxes(gt_boxes, scaling)
|
||||
# apply shift of detection boxes due to center crop
|
||||
gt_boxes = correct_for_shift(gt_boxes, shift)
|
||||
return gt_boxes
|
||||
|
||||
|
||||
def apply_scaling(gt_boxes, scaling=1):
|
||||
# if used, should be applied before calling eval
|
||||
# apply scaling of detection boxes
|
||||
gt_boxes = scale_detection_boxes(gt_boxes, scaling)
|
||||
return gt_boxes
|
||||
|
||||
|
||||
def collect_gt_crops(gt_boxes, gt_labels, im, num_classes, max_vis=2):
|
||||
# takes tablet image
|
||||
# returns list of ground truth crops organized by class
|
||||
gt_crops = [[] for _ in range(num_classes)]
|
||||
for j in range(1, num_classes):
|
||||
BBGT = get_class_gt_boxes(gt_boxes, gt_labels, j).astype(float)
|
||||
npos = BBGT.shape[0]
|
||||
if npos > 0:
|
||||
# get boxes
|
||||
bboxes = BBGT[:, :4] # remove any additional dims
|
||||
ncrops = min(max_vis, bboxes.shape[0])
|
||||
gt_crops[j] = crop_bboxes_from_im(im, bboxes[:ncrops, :])
|
||||
return gt_crops
|
||||
|
||||
|
||||
def prepare_segment_gt(segm_idx, segm_scale, bbox_anno, with_star_crop=False):
|
||||
# this is how things work together
|
||||
|
||||
# create empty lists in case no annotations available
|
||||
gt_boxes, gt_labels = [], []
|
||||
|
||||
if len(bbox_anno.anno_df) > 0:
|
||||
# select annotations for specific segment
|
||||
sub_anno_df = bbox_anno.select_anno_df_by_segm_idx(segm_idx)
|
||||
# get boxes and labels
|
||||
gt_boxes, gt_labels = get_boxes_and_labels(sub_anno_df)
|
||||
# adapt gt boxes to input format
|
||||
if with_star_crop:
|
||||
gt_boxes = apply_scaling_and_shift(gt_boxes, scaling=segm_scale, shift=-cfg.TEST.SHIFT / 2.)
|
||||
else:
|
||||
gt_boxes = apply_scaling(gt_boxes, scaling=segm_scale)
|
||||
|
||||
# return selected ground truth
|
||||
return gt_boxes, gt_labels
|
||||
|
||||
|
||||
# def get_pred_boxes_df(all_boxes, seg_idx):
|
||||
# # iterate list
|
||||
# list_boxes = []
|
||||
# list_cls_idx = []
|
||||
# for cls, boxes in enumerate(all_boxes):
|
||||
# num_boxes = len(boxes[0])
|
||||
# if num_boxes > 0:
|
||||
# list_boxes.append(boxes)
|
||||
# list_cls_idx.extend([cls] * num_boxes)
|
||||
# # create df
|
||||
# pred_boxes_df = pd.DataFrame() # []
|
||||
# if len(list_boxes) > 0:
|
||||
# pred_boxes_df = pd.DataFrame(np.hstack(list_boxes).reshape(-1, 5), columns=['x1', 'y1', 'x2', 'y2', 'conf'])
|
||||
# pred_boxes_df['cls'] = list_cls_idx
|
||||
# pred_boxes_df['seg_idx'] = seg_idx
|
||||
#
|
||||
# return pred_boxes_df
|
||||
#
|
||||
#
|
||||
# def get_gt_boxes_df(gt_boxes, gt_labels, seg_idx):
|
||||
# # create df
|
||||
# gt_boxes_df = pd.DataFrame() # []
|
||||
# if len(gt_boxes) > 0:
|
||||
# gt_boxes_df = pd.DataFrame(gt_boxes, columns=['x1', 'y1', 'x2', 'y2'])
|
||||
# gt_boxes_df['cls'] = gt_labels
|
||||
# gt_boxes_df['seg_idx'] = seg_idx
|
||||
# return gt_boxes_df
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,101 @@
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
|
||||
|
||||
def get_pred_boxes_df(all_boxes, seg_idx):
|
||||
# iterate list
|
||||
list_boxes = []
|
||||
list_cls_idx = []
|
||||
for cls, boxes in enumerate(all_boxes):
|
||||
num_boxes = len(boxes[0])
|
||||
if num_boxes > 0:
|
||||
list_boxes.append(boxes)
|
||||
list_cls_idx.extend([cls] * num_boxes)
|
||||
# create df
|
||||
pred_boxes_df = pd.DataFrame() # []
|
||||
if len(list_boxes) > 0:
|
||||
pred_boxes_df = pd.DataFrame(np.hstack(list_boxes).reshape(-1, 5), columns=['x1', 'y1', 'x2', 'y2', 'conf'])
|
||||
pred_boxes_df['cls'] = list_cls_idx
|
||||
pred_boxes_df['seg_idx'] = seg_idx
|
||||
|
||||
return pred_boxes_df
|
||||
|
||||
|
||||
def get_gt_boxes_df(gt_boxes, gt_labels, seg_idx):
|
||||
# create df
|
||||
gt_boxes_df = pd.DataFrame() # []
|
||||
if len(gt_boxes) > 0:
|
||||
gt_boxes_df = pd.DataFrame(gt_boxes, columns=['x1', 'y1', 'x2', 'y2'])
|
||||
gt_boxes_df['cls'] = gt_labels
|
||||
gt_boxes_df['seg_idx'] = seg_idx
|
||||
return gt_boxes_df
|
||||
|
||||
|
||||
# SSD specific
|
||||
|
||||
|
||||
def convert_detections_for_eval(pred_boxes, pred_labels, pred_scores, total_labels=240):
|
||||
|
||||
# convert from ssd detector format to all_boxes
|
||||
all_boxes = [[] for _ in range(total_labels)]
|
||||
|
||||
for boxes, labels, scores in zip(pred_boxes, pred_labels, pred_scores):
|
||||
for bbox, lbl, score in zip(boxes, labels, scores):
|
||||
# temp: [ID, cx, cy, score, x1, y1, x2, y2, idx]
|
||||
|
||||
# copy data to _new_ all_boxes
|
||||
box = np.zeros((1, 5))
|
||||
box[0, :4] = bbox
|
||||
box[0, 4] = score
|
||||
all_boxes[int(lbl)].append(box)
|
||||
|
||||
# for each class stack list of bounding boxes together
|
||||
all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
|
||||
|
||||
return all_boxes
|
||||
|
||||
|
||||
def prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds, num_classes=240):
|
||||
|
||||
if len(box_preds) > 0:
|
||||
# Wrap VOC evaluation for PyTorch
|
||||
pred_boxes = [b.numpy() for b in [box_preds]]
|
||||
pred_labels = [label.numpy() for label in [label_preds]]
|
||||
pred_scores = [score.numpy() for score in [score_preds]]
|
||||
|
||||
# convert to all boxes and stack tiles (better would be to have single tile for whole segment)
|
||||
all_boxes = convert_detections_for_eval(pred_boxes, pred_labels, pred_scores, num_classes)
|
||||
all_boxes = [[el] for el in all_boxes]
|
||||
else:
|
||||
# handle the case where there are no detections
|
||||
all_boxes = [[] for _ in range(num_classes)]
|
||||
all_boxes = [[el] for el in all_boxes]
|
||||
|
||||
return all_boxes
|
||||
|
||||
|
||||
def prepare_ssd_gt_for_eval(gt_boxes, gt_labels):
|
||||
gt_boxes = [b.numpy() for b in [gt_boxes]]
|
||||
gt_labels = [label.numpy() for label in [gt_labels]]
|
||||
|
||||
return gt_boxes[0], gt_labels[0]
|
||||
|
||||
|
||||
# alignment specific
|
||||
|
||||
def convert_to_all_boxes(seg_gen_annos, relative_bboxes, scale, num_labels):
|
||||
all_boxes = [[] for _ in range(num_labels)]
|
||||
|
||||
for anno_idx, anno_rec in seg_gen_annos.iterrows():
|
||||
# [x1, y1, x2, y2, score]
|
||||
box = np.zeros((1, 5))
|
||||
box[0, :4] = np.array(relative_bboxes[anno_idx]) * scale
|
||||
box[0, 4] = anno_rec.det_score
|
||||
# assign to class
|
||||
all_boxes[anno_rec.newLabel].append(box)
|
||||
|
||||
# for each class stack list of bounding boxes together
|
||||
all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
|
||||
|
||||
return all_boxes
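The conversion helpers in this file all target the same per-class `all_boxes` layout: one entry per class index, each holding rows of `[x1, y1, x2, y2, score]`. A minimal sketch of that layout with plain lists follows; the name `to_all_boxes_sketch` is hypothetical, and the real helpers stack the rows into NumPy arrays instead.

```python
def to_all_boxes_sketch(boxes, labels, scores, num_classes=240):
    """Group detections by class index; each row is [x1, y1, x2, y2, score]."""
    all_boxes = [[] for _ in range(num_classes)]
    for (x1, y1, x2, y2), lbl, score in zip(boxes, labels, scores):
        all_boxes[int(lbl)].append([x1, y1, x2, y2, score])
    return all_boxes


detections = to_all_boxes_sketch(
    boxes=[(0, 0, 10, 10), (5, 5, 20, 20), (1, 1, 2, 2)],
    labels=[3, 3, 7],
    scores=[0.9, 0.4, 0.8],
    num_classes=10,
)
print(len(detections[3]), len(detections[7]), len(detections[0]))
```

Classes without detections keep an empty list, which is why downstream code checks `len(...) > 0` before stacking.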
|
||||
|
||||
@@ -0,0 +1,187 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
from .sign_evaluation_prep import (get_pred_boxes_df, get_gt_boxes_df)
|
||||
from .sign_evaluation import (eval_detector, eval_detector_on_collection, prepare_eval_df,
|
||||
compute_global_ap, compute_mean_ap)
|
||||
|
||||
|
||||
# *BASIC* EVALUATION
|
||||
|
||||
class SignEvalBasic(object):
|
||||
def __init__(self, model_version, collection_name, eval_ovthresh=0.5):
|
||||
|
||||
self.model_version = model_version
|
||||
self.coll_name = collection_name
|
||||
|
||||
self.eval_ovthresh = eval_ovthresh
|
||||
|
||||
self.list_seg_mean_ap = []
|
||||
self.list_df_stats = []
|
||||
#self.list_seg_name_with_anno = []
|
||||
|
||||
self.list_pred_boxes_df = []
|
||||
self.list_gt_boxes_df = []
|
||||
|
||||
self.gt_boxes_df = pd.DataFrame()
|
||||
self.pred_boxes_df = pd.DataFrame()
|
||||
|
||||
def eval_segment(self, all_boxes, gt_boxes, gt_labels, seg_idx, verbose=True):
|
||||
# evaluate and print stats
|
||||
acc, df_stats = eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=self.eval_ovthresh, verbose=verbose)
|
||||
|
||||
# collect results
|
||||
self.list_seg_mean_ap.append(df_stats['ap'].mean())
|
||||
self.list_df_stats.append(df_stats)
|
||||
#list_seg_name_with_anno.append(image_name + view_desc)
|
||||
|
||||
# prepare full collection evaluation
|
||||
if type(all_boxes[0]) is np.ndarray:
|
||||
self.list_pred_boxes_df.append(get_pred_boxes_df([el.tolist() for el in all_boxes], seg_idx))
|
||||
self.list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
|
||||
else:
|
||||
self.list_pred_boxes_df.append(get_pred_boxes_df(all_boxes, seg_idx))
|
||||
self.list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
|
||||
|
||||
def prepare_eval_collection(self):
|
||||
self.gt_boxes_df = pd.concat(self.list_gt_boxes_df, ignore_index=True)
|
||||
self.pred_boxes_df = pd.concat(self.list_pred_boxes_df, ignore_index=True)
|
||||
|
||||
def eval_collection(self, verbose=True):
|
||||
self.prepare_eval_collection()
|
||||
acc, df_stats = eval_detector_on_collection(self.gt_boxes_df, self.pred_boxes_df, ovthresh=self.eval_ovthresh)
|
||||
return acc, df_stats
|
||||
|
||||
|
||||
# *FAST* EVALUATION
|
||||
|
||||
class SignEvalFast(object):
|
||||
def __init__(self, model_version, collection_name, tp_thresh=0.5, bg_thresh=0.2, num_classes=240):
|
||||
|
||||
self.model_version = model_version
|
||||
self.coll_name = collection_name
|
||||
|
||||
self.tp_thresh = tp_thresh
|
||||
self.bg_thresh = bg_thresh
|
||||
self.num_classes = num_classes
|
||||
|
||||
self.list_seg_mean_ap = []
|
||||
|
||||
self.list_eval_df = []
|
||||
self.list_gt_boxes_df = []
|
||||
self.list_seg_global_ap = []
|
||||
|
||||
self.col_eval_df = pd.DataFrame()
|
||||
self.gt_boxes_df = pd.DataFrame()
|
||||
|
||||
def eval_segment(self, all_boxes, gt_boxes, gt_labels, seg_idx, verbose=True):
|
||||
# get eval_df
|
||||
eval_df = prepare_eval_df(all_boxes, gt_boxes, gt_labels, seg_idx, self.tp_thresh, self.bg_thresh)
|
||||
# get gt_df
|
||||
gt_df = get_gt_boxes_df(gt_boxes, gt_labels, seg_idx)
|
||||
|
||||
mean_ap, global_ap, mean_ap_align = 0., 0., 0.
|
||||
if len(eval_df) > 0 and len(gt_df[gt_df.cls > 0]) > 0:
|
||||
# eval
|
||||
det_stats = compute_mean_ap(eval_df, gt_df, self.num_classes, verbose=False)
|
||||
global_ap = compute_global_ap(eval_df, gt_df, self.num_classes, verbose=False)
|
||||
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
|
||||
mean_ap = np.mean(df_stats.ap)
|
||||
# mean_ap_align = np.mean(df_stats.ap[df_stats.ap.nonzero()[0]])
|
||||
mean_ap_align = np.mean(df_stats.ap[df_stats.num_det > 0]) # only consider classes with detections
|
||||
if verbose:
|
||||
print ('mAP {:.4} | global AP: {:.4} | mAP (align): {:.4}'.format(mean_ap, global_ap, mean_ap_align))
|
||||
print ("total_tp: {} | total_fp: {} [{}] | acc: {:.2}".format(*get_summary(eval_df, gt_df)))
|
||||
else:
|
||||
if verbose:
|
||||
print ('mAP {:.4} | global AP: {:.4} | mAP (align): {}'.format(mean_ap, global_ap, mean_ap_align))
|
||||
print ("total_tp: {} | total_fp: {} [{}] | acc: {:.2}".format(0, 0, 0, 0.))
|
||||
|
||||
# append
|
||||
self.list_seg_mean_ap.append(mean_ap)
|
||||
self.list_seg_global_ap.append(global_ap)
|
||||
self.list_eval_df.append(eval_df)
|
||||
self.list_gt_boxes_df.append(gt_df)
|
||||
|
||||
def prepare_eval_collection(self, verbose=False):
|
||||
if len(self.col_eval_df) == 0:
|
||||
if len(self.list_eval_df) > 0: # only concat if there is anything to concat
|
||||
# concat dataframes
|
||||
self.col_eval_df = pd.concat(self.list_eval_df)
|
||||
self.gt_boxes_df = pd.concat(self.list_gt_boxes_df, ignore_index=True)
|
||||
|
||||
if verbose:
|
||||
print(self.col_eval_df.det_type.value_counts())
|
||||
print("num det:", len(self.col_eval_df))
|
||||
print("num TP (without double detections):",
|
||||
len(self.col_eval_df[(self.col_eval_df.max_det == True)
|
||||
& (self.col_eval_df.det_type == 3)]))
|
||||
|
||||
def eval_collection(self, verbose=True):
|
||||
# concat dataframes
|
||||
self.prepare_eval_collection()
|
||||
|
||||
global_ap = 0
|
||||
df_stats = pd.DataFrame()
|
||||
if len(self.gt_boxes_df) > 0:
|
||||
# full collection eval
|
||||
det_stats = compute_mean_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes, verbose=False)
|
||||
global_ap = compute_global_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes, verbose=False)
|
||||
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
|
||||
mean_ap = np.mean(det_stats[:, -2])
|
||||
mean_ap_align = np.mean(df_stats.ap[df_stats.num_det > 0]) # only consider classes with detections
|
||||
if verbose:
|
||||
print('{} | {}'.format(self.coll_name, self.model_version))
|
||||
print('RESULTS ON FULL COLLECTION :')
|
||||
print ('mAP {:.4} | global AP: {:.4} | mAP (align): {:.4}'.format(mean_ap, global_ap, mean_ap_align))
|
||||
print ("total_tp: {} | total_fp: {} [{}] | prec: {:.3}".format(*self.get_col_summary()))
|
||||
|
||||
return df_stats, global_ap
|
||||
|
||||
def eval_collection_class_freq(self, freq_classes_list):
|
||||
# freq_classes_list: sorted list of most frequent classes (in descending order)
|
||||
|
||||
# concat dataframes
|
||||
self.prepare_eval_collection()
|
||||
|
||||
# compute mAP for different sets of topk most frequent classes
|
||||
topk_list = [2, 4, 8, 16, 32, 64, 128, 192, 256]
|
||||
topk_mAP_list = []
|
||||
for topk in topk_list:
|
||||
print("over {} most freq classes".format(topk))
|
||||
det_stats = compute_mean_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes,
|
||||
class_list=freq_classes_list[:topk])
|
||||
mean_ap = np.mean(det_stats[:, -2])
|
||||
topk_mAP_list.append(mean_ap)
|
||||
# plot
|
||||
plt.figure()
|
||||
plt.plot(topk_list, topk_mAP_list, "o-")
|
||||
plt.title('{} - {}'.format(self.coll_name, self.model_version))
|
||||
plt.ylabel('mAP')
|
||||
plt.xlabel('topk')
|
||||
# plt.xscale('log')
|
||||
|
||||
def get_seg_summary(self, didx):
|
||||
""" didx: index of segment in list of segments to evaluate """
|
||||
num_tp, num_fp, num_fp_global, acc = get_summary(self.list_eval_df[didx], self.list_gt_boxes_df[didx])
|
||||
mean_ap = self.list_seg_mean_ap[didx]
|
||||
global_ap = self.list_seg_global_ap[didx]
|
||||
return num_tp, num_fp, num_fp_global, acc, mean_ap, global_ap
|
||||
|
||||
def get_col_summary(self):
|
||||
num_tp, num_fp, num_fp_global, acc = get_summary(self.col_eval_df, self.gt_boxes_df)
|
||||
return num_tp, num_fp, num_fp_global, acc
|
||||
|
||||
|
||||
def get_summary(col_eval_df, gt_boxes_df):
|
||||
if len(gt_boxes_df) > 0 and len(col_eval_df) > 0:
|
||||
select_tp = (col_eval_df.det_type == 3) & (col_eval_df.max_det == True)
|
||||
select_fp = (~select_tp) & col_eval_df.pred.isin(gt_boxes_df.cls.unique())
|
||||
num_tp = select_tp.sum()
|
||||
num_fp = select_fp.sum()
|
||||
num_fp_global = (~select_tp).sum()
|
||||
return num_tp, num_fp, num_fp_global, num_tp / float(num_tp + num_fp)
|
||||
else:
|
||||
return 0, 0, 0, 0.
|
||||
|
||||
@@ -0,0 +1,129 @@
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
import editdistance
|
||||
import Levenshtein
|
||||
from nltk.translate.bleu_score import sentence_bleu
|
||||
from nltk.translate.bleu_score import SmoothingFunction
|
||||
|
||||
from .sign_evaluation import evaluate_on_gt
|
||||
|
||||
# deprecated (should be handled by sign_evaluator)
|
||||
def get_eval_stats(gt_boxes, gt_labels, aligned_list):
|
||||
# evaluate
|
||||
num_imgs = 1
|
||||
all_tp, all_fp, det_stats, total_num_tp = evaluate_on_gt(gt_boxes, gt_labels,
|
||||
num_imgs, [[el] for el in aligned_list])
|
||||
|
||||
total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
|
||||
# print stats
|
||||
pd.set_option('display.max_rows', 50)
|
||||
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
|
||||
|
||||
print("total_tp", total_num_tp, "total_fp", total_num_fp,
|
||||
# here precision = accuracy
|
||||
"acc", '{:0.2f}'.format(total_num_tp / float(total_num_tp + total_num_fp)),
|
||||
"mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
|
||||
"mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
|
||||
acc = total_num_tp / float(total_num_tp + total_num_fp)
|
||||
|
||||
return acc, df_stats
|
||||
|
||||
# deprecated (should be handled by sign_evaluator)
|
||||
def compute_accuracy(gt_boxes, gt_labels, aligned_list, return_stats=False):
|
||||
# only run if gt available
|
||||
if len(gt_boxes) > 0:
|
||||
acc, df_stats = get_eval_stats(gt_boxes, gt_labels, aligned_list)
|
||||
if return_stats:
|
||||
return acc, df_stats
|
||||
else:
|
||||
return acc
|
||||
else:
|
||||
return -1
|
||||
|
||||
|
||||
def convert_alignments_for_eval(detections, total_labels=240):
|
||||
# convert from RANSAC format (Nx9) to all_boxes
|
||||
all_boxes = [[] for _ in range(total_labels)]
|
||||
|
||||
for temp in detections:
|
||||
# temp: [ID, cx, cy, score, x1, y1, x2, y2, idx]
|
||||
|
||||
# copy data to _new_ all_boxes
|
||||
box = np.zeros((1, 5))
|
||||
box[0, :4] = temp[4:8]
|
||||
box[0, 4] = temp[3]
|
||||
all_boxes[int(temp[0])].append(box)
|
||||
|
||||
# for each class stack list of bounding boxes together
|
||||
all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
|
||||
|
||||
return all_boxes
|
||||
|
||||
|
||||
# SCORE FUNCTIONS
|
||||
|
||||
|
||||
def compute_bleu_score(candidate_words, reference_words):
|
||||
reference = [reference_words]
|
||||
candidate = candidate_words
|
||||
# compute score
|
||||
|
||||
# deal with issue
|
||||
# https://github.com/nltk/nltk/issues/1554
|
||||
hyp_lengths = len(reference_words)
|
||||
weights = (0.25, 0.25, 0.25, 0.25)
|
||||
if hyp_lengths < 4:
|
||||
if hyp_lengths == 0:
|
||||
weights = (0, )
|
||||
else:
|
||||
weights = (1 / float(hyp_lengths), ) * hyp_lengths
|
||||
|
||||
chencherry = SmoothingFunction()
|
||||
score = sentence_bleu(reference, candidate, weights=weights, smoothing_function=chencherry.method1)
|
||||
return score
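The weight selection in `compute_bleu_score` works around short sequences: with fewer than four tokens, 4-gram BLEU has no higher-order n-grams to match, so the n-gram order is reduced to the sequence length. A minimal sketch of just that selection step follows; the name `bleu_weights_sketch` and the `max_order` parameter are hypothetical.

```python
def bleu_weights_sketch(num_tokens, max_order=4):
    """Pick uniform n-gram weights, shrinking the order for short sequences."""
    if num_tokens >= max_order:
        return (1.0 / max_order,) * max_order
    if num_tokens == 0:
        return (0,)
    # use n-grams only up to the available sequence length
    return (1.0 / num_tokens,) * num_tokens
```

The resulting tuple would be passed as the `weights` argument of `sentence_bleu`, as the function above does.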
|
||||
|
||||
|
||||
def compute_levenshtein(candidate, reference, normalize=True):
|
||||
edist = 0
|
||||
if len(reference) > 0:
|
||||
# strict normalization in [0,1] range
|
||||
edist = editdistance.eval(reference, candidate)
|
||||
if normalize:
|
||||
edist = float(edist) / max(len(reference), len(candidate))
|
||||
return edist
|
||||
|
||||
|
||||
def compute_cer(candidate, reference):
|
||||
# character error rate (see also WER)
|
||||
# character accuracy 1 - CER
|
||||
edist = 0
|
||||
if len(reference) > 0:
|
||||
edist = editdistance.eval(reference, candidate)
|
||||
edist = float(edist) / len(reference)
|
||||
return edist
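The two normalizations used by `compute_levenshtein` and `compute_cer` differ only in the denominator: the longer of the two sequences (bounded by 1) versus the reference length (can exceed 1). A minimal sketch follows, assuming a plain dynamic-programming edit distance in place of the `editdistance` package; all function names here are hypothetical.

```python
def edit_distance_sketch(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def normalized_levenshtein_sketch(candidate, reference):
    """Edit distance divided by the longer sequence length, so the result is in [0, 1]."""
    if len(reference) == 0:
        return 0.0
    return edit_distance_sketch(reference, candidate) / float(max(len(reference), len(candidate)))


def cer_sketch(candidate, reference):
    """Error rate relative to the reference length; may exceed 1 for long candidates."""
    if len(reference) == 0:
        return 0.0
    return edit_distance_sketch(reference, candidate) / float(len(reference))
```

For sign sequences, the inputs would be lists of class labels, so the "characters" of the error rate are cuneiform sign IDs.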
|
||||
|
||||
|
||||
def compute_levenshtein_ops(candidate, reference, normalize=True):
|
||||
# https://rawgit.com/ztane/python-Levenshtein/master/docs/Levenshtein.html
|
||||
ops_dict = {'insert': 0, 'delete': 1, 'replace': 2}
|
||||
# print candidate, reference
|
||||
edist = 0
|
||||
edit_ops = np.zeros(len(ops_dict))
|
||||
if len(reference) > 0:
|
||||
# convert to string for Levenshtein function
|
||||
candidate_str = u''.join([unichr(lbl) for lbl in candidate])
|
||||
reference_str = u''.join([unichr(lbl) for lbl in reference])
|
||||
# compute ed ops
|
||||
ops_df = pd.DataFrame(Levenshtein.editops(candidate_str, reference_str), columns=['type', 'ixA', 'ixB'])
|
||||
edist = len(ops_df)
|
||||
# collect types
|
||||
for op, ii in ops_dict.items():
|
||||
edit_ops[ii] = len(ops_df[ops_df.type == op])
|
||||
if normalize:
|
||||
edist = float(edist) / max(len(reference), len(candidate))
|
||||
|
||||
return edist, edit_ops
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,7 @@
|
||||
### Network architectures
|
||||
|
||||
- `linenet.py` : a modified AlexNet used for line segmentation
|
||||
- `mobilenetv2_mod03.py` : a modified MobileNetV2 used as backbone for the sign detector
|
||||
- `mobilenetv2_fpn.py` : a FPN network wrapper for the backbone architecture
|
||||
- `trained_model_loader.py` : contains functions to load sign detector and line segmentation models;
|
||||
describes how detectors are assembled from parts.
|
||||
@@ -0,0 +1,192 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.init as init
|
||||
import torch.nn.functional as F
|
||||
|
||||
|
||||
# HELPER FUNCTIONS
|
||||
|
||||
def initialize_weights(model):
|
||||
for m in model.modules():
|
||||
if isinstance(m, nn.Conv2d):
|
||||
# n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
|
||||
# m.weight.data.normal_(0, math.sqrt(2. / n))
|
||||
# init.xavier_normal(m.weight.data)
|
||||
# init.kaiming_normal(m.weight.data)
|
||||
init.normal_(m.weight.data, std=0.01)
|
||||
# check if bias = True
|
||||
if hasattr(m.bias, 'data'):
|
||||
m.bias.data.zero_()
|
||||
elif isinstance(m, nn.Linear):
|
||||
m.weight.data.normal_(0, 0.005)
|
||||
m.bias.data.zero_()
|
||||
elif isinstance(m, nn.BatchNorm2d):
|
||||
# check if affine = True
|
||||
if hasattr(m.bias, 'data'):
|
||||
m.weight.data.fill_(1)
|
||||
m.bias.data.zero_()
|
||||
|
||||
|
||||
def copy_layer_params(target, source):
|
||||
""" Copy layer parameters from source to target; size of arrays needs to match! """
|
||||
target.weight.data.copy_(source.weight.data.view(target.weight.size()))
|
||||
target.bias.data.copy_(source.bias.data.view(target.bias.size()))
|
||||
|
||||
|
||||
# HELPER MODULES
|
||||
|
||||
|
||||
class LRN(nn.Module):
|
||||
def __init__(self, local_size=1, alpha=1.0, beta=0.75, ACROSS_CHANNELS=False):
|
||||
super(LRN, self).__init__()
|
||||
self.ACROSS_CHANNELS = ACROSS_CHANNELS
|
||||
if self.ACROSS_CHANNELS:
|
||||
# make it work with pytorch 0.2.X # hacky!!! should be ConstantPadding
|
||||
# self.average = nn.Sequential(
|
||||
# nn.ReplicationPad3d(padding=(0, 0, 0, 0, int((local_size - 1.0) / 2), int((local_size - 1.0) / 2))),
|
||||
# nn.AvgPool3d(kernel_size=(local_size, 1, 1), stride=1),
|
||||
# )
|
||||
self.average = nn.AvgPool3d(kernel_size=(local_size, 1, 1),
|
||||
stride=1,
|
||||
padding=(int((local_size - 1.0) / 2), 0, 0))
|
||||
else:
|
||||
self.average = nn.AvgPool2d(kernel_size=local_size,
|
||||
stride=1,
|
||||
padding=int((local_size - 1.0) / 2))
|
||||
self.alpha = alpha
|
||||
self.beta = beta
|
||||
|
||||
def forward(self, x):
|
||||
if self.ACROSS_CHANNELS:
|
||||
div = x.pow(2).unsqueeze(1)
|
||||
div = self.average(div).squeeze(1)
|
||||
div = div.mul(self.alpha).add(1.0).pow(self.beta)
|
||||
else:
|
||||
div = x.pow(2)
|
||||
div = self.average(div)
|
||||
div = div.mul(self.alpha).add(1.0).pow(self.beta)
|
||||
x = x.div(div)
|
||||
return x
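Read off the `forward` method above, the normalization computed by `LRN` can be written as follows. The notation is introduced here for illustration: $\overline{x^2}$ denotes the local average of squared activations, taken over `local_size` neighbouring channels when `ACROSS_CHANNELS=True` and over a `local_size` by `local_size` spatial window otherwise.

```latex
y_{c,i,j} = \frac{x_{c,i,j}}{\left(1 + \alpha \, \overline{x^2}_{c,i,j}\right)^{\beta}}
```

With `local_size=1`, as used in `LineNet`, the average covers a single element, so the expression reduces to $y = x / (1 + \alpha x^2)^{\beta}$.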
|
||||
|
||||
|
||||
class Softmax3D(nn.Module):
|
||||
def forward(self, input_):
|
||||
batch_size = input_.size()[0]
|
||||
output_ = torch.stack([F.softmax(input_[i], dim=0) for i in range(batch_size)], 0)
|
||||
return output_
|
||||
|
||||
|
||||
# MAIN MODULES
|
||||
|
||||
|
||||
class LineNet(nn.Module):
|
||||
|
||||
def __init__(self, num_classes=1000, input_channels=3):
|
||||
super(LineNet, self).__init__()
|
||||
self.features = nn.Sequential(
|
||||
nn.Conv2d(input_channels, 64, kernel_size=11, stride=4, padding=0),
|
||||
nn.ReLU(inplace=True),
|
||||
nn.MaxPool2d(kernel_size=3, stride=2),
|
||||
LRN(alpha=1e-4, beta=0.75, local_size=1),
|
||||
nn.Conv2d(64, 256, kernel_size=5, padding=2),
|
||||
nn.ReLU(inplace=True),
|
||||
nn.MaxPool2d(kernel_size=3, stride=2),
|
||||
LRN(alpha=1e-4, beta=0.75, local_size=1),
|
||||
nn.Conv2d(256, 384, kernel_size=3, padding=1),
|
||||
nn.BatchNorm2d(384, affine=False, momentum=.1),
|
||||
nn.ReLU(inplace=True),
|
||||
nn.Conv2d(384, 384, kernel_size=3, padding=1),
|
||||
nn.BatchNorm2d(384, affine=False, momentum=.1),
|
||||
nn.ReLU(inplace=True),
|
||||
nn.Conv2d(384, 256, kernel_size=3, padding=1),
|
||||
nn.BatchNorm2d(256, affine=False, momentum=.1),
|
||||
nn.ReLU(inplace=True),
|
||||
nn.MaxPool2d(kernel_size=3, stride=2),
|
||||
)
|
||||
self.fc6 = nn.Linear(256 * 6 * 6, 512)
|
||||
self.score = nn.Linear(512, 240)
|
||||
self.classifier = nn.Sequential(
|
||||
self.fc6,
|
||||
nn.ReLU(inplace=True),
|
||||
nn.Dropout(),
|
||||
self.score,
|
||||
)
|
||||
self.line_score = nn.Linear(512, num_classes)
|
||||
self.line_classifier = nn.Sequential(
|
||||
self.fc6,
|
||||
nn.ReLU(inplace=True),
|
||||
nn.Dropout(),
|
||||
self.line_score,
|
||||
)
|
||||
initialize_weights(self)
|
||||
|
||||
def legacy_forward(self, x):
|
||||
x = self.features(x)
|
||||
x = x.view(x.size(0), 256 * 6 * 6) # x = x.view(x.size(0), -1)
|
||||
x = self.classifier(x)
|
||||
|
||||
return x
|
||||
|
||||
def forward(self, x):
|
||||
x = self.features(x)
|
||||
x = x.view(x.size(0), 256 * 6 * 6) # x = x.view(x.size(0), -1)
|
||||
x = self.line_classifier(x)
|
||||
|
||||
return x
|
||||
|
||||
|
||||
class LineNetFCN(nn.Module):
|
||||
def __init__(self, original_model, num_classes=240):
|
||||
super(LineNetFCN, self).__init__()
|
||||
|
||||
# simple assign !no copy! (could use copy.deepcopy(), but assume original_model is not used anymore)
|
||||
self.features = original_model.features
|
||||
|
||||
# create new module and assign features
|
||||
# original_net = CuneiNet(input_channels=1)
|
||||
# self.features = original_net.features
|
||||
# self.features.load_state_dict(original_model.features.state_dict())
|
||||
|
||||
# softmax function
|
||||
self.softmax = nn.Softmax2d() ## Softmax3D(),
|
||||
|
||||
# create fcn head
|
||||
self.classifier = nn.Sequential(
|
||||
nn.Conv2d(256, 512, kernel_size=6, padding=0),
|
||||
nn.ReLU(inplace=True),
|
||||
# nn.Dropout(), DO NOT USE 1d dropout!!!
|
||||
nn.Dropout2d(),
|
||||
nn.Conv2d(512, num_classes, kernel_size=1, padding=0),
|
||||
# self.softmax # not here to
|
||||
)
|
||||
|
||||
# perform net surgery
|
||||
self.net_surgery(original_model)
|
||||
|
||||
def forward(self, x):
|
||||
x = self.features(x)
|
||||
x = self.classifier(x)
|
||||
x = self.softmax(x)
|
||||
# batch_size = x.size()[0]
|
||||
# x = torch.stack([F.softmax(x[i]) for i in range(batch_size)], 0)
|
||||
return x
|
||||
|
||||
def get_conv_features(self, x):
|
||||
x = self.features(x)
|
||||
return x
|
||||
|
||||
def get_fc_features(self, x):
|
||||
x = self.features(x)
|
||||
x = self.classifier(x)
|
||||
return x
|
||||
|
||||
def net_surgery(self, original_model):
|
||||
""" perform net surgery
|
||||
original.line_classifier --> fcn.classifier
|
||||
"""
|
||||
for i, l1 in enumerate(original_model.line_classifier):
|
||||
if isinstance(l1, nn.Linear):
|
||||
l2 = self.classifier[i]
|
||||
# l2.weight.data.copy_(l1.weight.data.view(l2.weight.size()))
|
||||
# l2.bias.data.copy_(l1.bias.data.view(l2.bias.size()))
|
||||
copy_layer_params(l2, l1)
|
||||
@@ -0,0 +1,93 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
import math
|
||||
|
||||
|
||||
class MobileNetV2FPN(nn.Module):
|
||||
def __init__(self, original_model, num_classes=240, width_mult=1, with_p4=False):
|
||||
super(MobileNetV2FPN, self).__init__()
|
||||
|
||||
# simple assign !no copy! (could use copy.deepcopy(), but assume original_model is not used anymore)
|
||||
self.features = original_model.features
|
||||
|
||||
self.conv6 = nn.Conv2d(512, 256, kernel_size=3, stride=2, padding=1)
|
||||
|
||||
# Top-down layers
|
||||
self.toplayer = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
|
||||
|
||||
self.with_p4 = with_p4
|
||||
if self.with_p4:
|
||||
# Lateral layers
|
||||
self.latlayer1 = nn.Conv2d(int(32*width_mult), 256, kernel_size=1, stride=1, padding=0)
|
||||
|
||||
# Smooth layers
|
||||
self.smooth1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
|
||||
|
||||
# init weights (exclude features) TODO
|
||||
self._initialize_weights(['conv6', 'toplayer', 'latlayer1', 'smooth1'])
|
||||
|
||||
def forward(self, x):
|
||||
for i in range(3):
|
||||
x = self.features[i](x)
|
||||
c4 = self.features[3](x) # 14 x 32*width_mult (expansion factor does not affect output of block)
|
||||
|
||||
x = self.features[4](c4)
|
||||
x = self.features[5](x)
|
||||
x = self.features[6](x) # 7 x 160*width_mult
|
||||
c5 = self.features[7](x) # 7 x 512
|
||||
p6 = self.conv6(c5)
|
||||
|
||||
# Top-down
|
||||
p5 = self.toplayer(c5)
|
||||
|
||||
if self.with_p4:
|
||||
p4 = self._upsample_add(p5, self.latlayer1(c4))
|
||||
p4 = self.smooth1(p4)
|
||||
return p4, p5, p6
|
||||
else:
|
||||
return p5, p6
|
||||
|
||||
    def _upsample_add(self, x, y):
        '''Upsample and add two feature maps.

        Args:
          x: (Variable) top feature map to be upsampled.
          y: (Variable) lateral feature map.

        Returns:
          (Variable) added feature map.

        Note that in PyTorch, when the input size is odd, a feature map upsampled
        with `F.interpolate(..., scale_factor=2, mode='nearest')`
        may not match the lateral feature map size.

        e.g.
        original input size: [N,_,15,15] ->
        conv2d feature map size: [N,_,8,8] ->
        upsampled feature map size: [N,_,16,16]

        So we use bilinear upsampling to an explicit target size, which supports arbitrary output sizes.
        '''
        _, _, H, W = y.size()
        # F.upsample is deprecated; F.interpolate takes the same arguments
        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False) + y
|
||||
|
||||
def _initialize_weights(self, name_list):
|
||||
for name, m in self.named_modules():
|
||||
# only init modules in name_list
|
||||
if name in name_list:
|
||||
# exclude self.features, Mobile_blocks
|
||||
if isinstance(m, nn.Conv2d):
|
||||
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
|
||||
m.weight.data.normal_(0, math.sqrt(2. / n))
|
||||
if m.bias is not None:
|
||||
m.bias.data.zero_()
|
||||
elif isinstance(m, nn.BatchNorm2d):
|
||||
m.weight.data.fill_(1)
|
||||
m.bias.data.zero_()
|
||||
elif isinstance(m, nn.Linear):
|
||||
n = m.weight.size(1)
|
||||
m.weight.data.normal_(0, 0.01)
|
||||
m.bias.data.zero_()
|
||||
|
||||
|
||||
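The docstring of `_upsample_add` above notes that a fixed scale factor of 2 can overshoot the lateral feature map when the input size is odd. The arithmetic can be checked with a small sketch. The helper `conv_out_size` is an assumption for illustration; it applies the usual convolution output-size formula with floor division.

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


lateral = 15                                                      # odd lateral map size
top = conv_out_size(lateral, kernel_size=3, stride=2, padding=1)  # strided conv output
upsampled_x2 = top * 2                                            # fixed scale_factor=2

# a fixed factor overshoots the lateral map, so element-wise addition would fail
size_mismatch = upsampled_x2 != lateral
```

Resizing to the lateral map's own `(H, W)`, as the method does, sidesteps the mismatch regardless of parity.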
@@ -0,0 +1,160 @@
|
||||
import torch.nn as nn
|
||||
import math
|
||||
|
||||
|
||||
def conv_bn(inp, oup, stride):
|
||||
return nn.Sequential(
|
||||
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
|
||||
nn.BatchNorm2d(oup),
|
||||
nn.ReLU6(inplace=True)
|
||||
)
|
||||
|
||||
|
||||
def conv_1x1_bn(inp, oup):
|
||||
return nn.Sequential(
|
||||
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
|
||||
nn.BatchNorm2d(oup),
|
||||
nn.ReLU6(inplace=True)
|
||||
)
|
||||
|
||||
|
||||
class InvertedResidual(nn.Module):
|
||||
def __init__(self, inp, oup, stride, expand_ratio):
|
||||
super(InvertedResidual, self).__init__()
|
||||
self.stride = stride
|
||||
assert stride in [1, 2]
|
||||
|
||||
self.use_res_connect = self.stride == 1 and inp == oup
|
||||
|
||||
self.conv = nn.Sequential(
|
||||
# pw
|
||||
nn.Conv2d(inp, inp * expand_ratio, 1, 1, 0, bias=False),
|
||||
nn.BatchNorm2d(inp * expand_ratio),
|
||||
nn.ReLU6(inplace=True),
|
||||
# dw
|
||||
nn.Conv2d(inp * expand_ratio, inp * expand_ratio, 3, stride, 1, groups=inp * expand_ratio, bias=False),
|
||||
nn.BatchNorm2d(inp * expand_ratio),
|
||||
nn.ReLU6(inplace=True),
|
||||
# pw-linear
|
||||
nn.Conv2d(inp * expand_ratio, oup, 1, 1, 0, bias=False),
|
||||
nn.BatchNorm2d(oup),
|
||||
)
|
||||
|
||||
def forward(self, x):
|
||||
if self.use_res_connect:
|
||||
return x + self.conv(x)
|
||||
else:
|
||||
return self.conv(x)
|
||||
|
||||
|
||||
class MobileBlock(nn.Module):
|
||||
# introduced to simplify creation of FPN
|
||||
def __init__(self, residual_setting, input_channel, output_channel):
|
||||
super(MobileBlock, self).__init__()
|
||||
t, c, n, s = residual_setting
|
||||
|
||||
block_seq = []
|
||||
for i in range(n):
|
||||
if i == 0:
|
||||
block_seq.append(InvertedResidual(input_channel, output_channel, s, t))
|
||||
else:
|
||||
block_seq.append(InvertedResidual(input_channel, output_channel, 1, t))
|
||||
input_channel = output_channel
|
||||
self.mobile_block = nn.Sequential(*block_seq)
|
||||
self.output_channel = output_channel
|
||||
|
||||
def forward(self, x):
|
||||
return self.mobile_block(x)
|
||||
|
||||
|
||||
class MobileNetV2(nn.Module):
|
||||
def __init__(self, n_class=1000, input_size=224, input_dim=1, width_mult=1., arch_opt=1):
|
||||
super(MobileNetV2, self).__init__()
|
||||
# setting of inverted residual blocks
|
||||
self.interverted_residual_setting = [
|
||||
# t, c, n, s
|
||||
[1, 16, 1, 2],
|
||||
#[1, 16, 1, 1],
|
||||
[6, 24, 2, 2],
|
||||
[6, 32, 3, 2],
|
||||
[6, 64, 4, 2],
|
||||
[6, 96, 3, 1],
|
||||
[6, 160, 3, 1],
|
||||
# [6, 320, 1, 1],
|
||||
]
|
||||
|
||||
# set arch option
|
||||
self.arch_opt = arch_opt
|
||||
|
||||
# building first layer
|
||||
assert input_size % 32 == 0
|
||||
input_channel = int(32 * width_mult)
|
||||
|
||||
# self.last_channel = int(1280 * width_mult) if width_mult > 1.0 else 1280
|
||||
if self.arch_opt == 1:
|
||||
self.last_channel = int(512 * width_mult) if width_mult > 1.0 else 512
|
||||
elif self.arch_opt == 2:
|
||||
self.last_channel = int(256 * width_mult) if width_mult > 1.0 else 256
|
||||
|
||||
self.features = [conv_bn(input_dim, input_channel, 2)]
|
||||
# building inverted residual blocks
|
||||
for ii, residual_setting in enumerate(self.interverted_residual_setting):
|
||||
t, c, n, s = residual_setting
|
||||
output_channel = int(c * width_mult)
|
||||
new_block = MobileBlock(residual_setting, input_channel, output_channel)
|
||||
self.features.append(new_block)
|
||||
input_channel = new_block.output_channel
|
||||
|
||||
# building last several layers
|
||||
self.features.append(conv_1x1_bn(input_channel, self.last_channel))
|
||||
|
||||
if self.arch_opt == 1:
|
||||
self.features.append(nn.AvgPool2d(input_size // 32, stride=1))  # integer division keeps the kernel size an int
|
||||
# need stride=1 for FCN (because default stride is kernel_sz)
|
||||
# self.features.append(nn.MaxPool2d(kernel_size=input_size / 32, stride=1))
|
||||
|
||||
# building classifier
|
||||
self.classifier = nn.Sequential(
|
||||
nn.Dropout(),
|
||||
nn.Linear(self.last_channel, n_class),
|
||||
)
|
||||
|
||||
elif self.arch_opt == 2:
|
||||
# building classifier
|
||||
self.classifier = nn.Sequential(
|
||||
nn.Linear(self.last_channel * 7 * 7, 384),
|
||||
nn.ReLU(inplace=True),
|
||||
nn.Dropout(),
|
||||
nn.Linear(384, n_class),
|
||||
)
|
||||
|
||||
# make it nn.Sequential
|
||||
self.features = nn.Sequential(*self.features)
|
||||
|
||||
self._initialize_weights()
|
||||
|
||||
def forward(self, x):
|
||||
for layer in self.features:
|
||||
x = layer(x)
|
||||
|
||||
# x = x.view(-1, self.last_channel)
|
||||
x = x.view(x.size(0), -1)
|
||||
x = self.classifier(x)
|
||||
return x
|
||||
|
||||
def _initialize_weights(self):
|
||||
for m in self.modules():
|
||||
if isinstance(m, nn.Conv2d):
|
||||
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
|
||||
m.weight.data.normal_(0, math.sqrt(2. / n))
|
||||
if m.bias is not None:
|
||||
m.bias.data.zero_()
|
||||
elif isinstance(m, nn.BatchNorm2d):
|
||||
m.weight.data.fill_(1)
|
||||
m.bias.data.zero_()
|
||||
elif isinstance(m, nn.Linear):
|
||||
n = m.weight.size(1)
|
||||
m.weight.data.normal_(0, 0.01)
|
||||
m.bias.data.zero_()
|
||||
|
||||
|
||||
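The inverted residual settings above determine the channel count and spatial size at every stage, which is what the FPN wrapper relies on when it taps `features[3]` and `features[7]`. A minimal sketch of that bookkeeping follows, assuming a 224x224 input and `width_mult=0.5` (the default used by the loader functions). The function `describe_backbone` is hypothetical and only mirrors the arithmetic of the constructor.

```python
def describe_backbone(settings, input_size=224, width_mult=0.5, first_channels=32):
    """Return (out_channels, spatial_size) after the stem and after each block group."""
    size = input_size // 2                       # the stem conv_bn uses stride 2
    stages = [(int(first_channels * width_mult), size)]
    for t, c, n, s in settings:
        size //= s                               # only the first block of a group strides
        stages.append((int(c * width_mult), size))
    return stages


# t (expansion), c (channels), n (repeats), s (stride), copied from the listing above
settings = [
    [1, 16, 1, 2],
    [6, 24, 2, 2],
    [6, 32, 3, 2],
    [6, 64, 4, 2],
    [6, 96, 3, 1],
    [6, 160, 3, 1],
]

stages = describe_backbone(settings)
```

Index 3 of `stages` corresponds to `features[3]` (14x14 with `32 * width_mult` channels), and the final 7x7 size matches both the `input_size / 32` pooling kernel and the `7 * 7` factor in the second classifier variant.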
@@ -0,0 +1,99 @@
|
||||
import torch
|
||||
import os
|
||||
|
||||
from .linenet import LineNet, LineNetFCN
|
||||
|
||||
from .mobilenetv2_mod03 import MobileNetV2
|
||||
from .mobilenetv2_fpn import MobileNetV2FPN
|
||||
|
||||
from ..utils.torchcv.models.net import FPNSSD
|
||||
from ..utils.torchcv.models.rpn_net import RPN
|
||||
|
||||
|
||||
def get_cunei_net_basic(model_version, device, arch_type, arch_opt=1, width_mult=0.5,
|
||||
relative_path='../../', num_classes=240, num_c=1):
|
||||
|
||||
# create classifier model
|
||||
basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
|
||||
arch_opt=arch_opt)
|
||||
|
||||
# load pretrained weights
|
||||
weights_path = '{}results/weights/cuneiNet_basic_{}.pth'.format(relative_path, model_version)
|
||||
basic_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
|
||||
|
||||
# deploy to device and switch to eval mode
|
||||
basic_net.to(device)
|
||||
basic_net.eval() # ATTENTION!
|
||||
|
||||
return basic_net
|
||||
|
||||
|
||||
def get_line_net_fcn(model_version, device, relative_path='../../', num_classes=2, num_c=1):
|
||||
|
||||
# choose model filename
|
||||
weights_path = '{}results/weights/lineNet_basic_{}.pth'.format(relative_path, model_version)
|
||||
assert os.path.exists(weights_path), "File '{}' not found!".format(weights_path)
|
||||
|
||||
# load model definition
|
||||
model_ft = LineNet(num_classes=num_classes, input_channels=num_c)
|
||||
|
||||
# load model weights
|
||||
model_ft.load_state_dict(torch.load(weights_path), strict=False)
|
||||
|
||||
# create fully-convolutional version (convolutionalize)
|
||||
model_fcn = LineNetFCN(model_ft, num_classes)
|
||||
|
||||
# deploy model to device
|
||||
model_fcn = model_fcn.to(device)
|
||||
|
||||
# switch model to evaluation mode
|
||||
# model_fcn.train(False)
|
||||
model_fcn.eval()
|
||||
|
||||
return model_fcn
|
||||
|
||||
|
||||
def get_fpn_ssd_net(model_version, device, arch_type, with_64, arch_opt=1, width_mult=0.5,
|
||||
relative_path='../../', num_classes=240, num_c=1, rnd_init_model=False):
|
||||
# create classifier model
|
||||
basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
|
||||
arch_opt=arch_opt)
|
||||
|
||||
# create FPN model with classifier model
|
||||
fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64)
|
||||
|
||||
# load full detector net
|
||||
fpnssd_net = FPNSSD(fpn_net, num_classes)
|
||||
if not rnd_init_model:
|
||||
# load pretrained weights
|
||||
weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)
|
||||
fpnssd_net.load_state_dict(torch.load(weights_path, map_location=device)) # , strict=False
|
||||
|
||||
# deploy to device and switch to eval mode
|
||||
fpnssd_net.to(device)
|
||||
fpnssd_net.eval()
|
||||
|
||||
return fpnssd_net
|
||||
|
||||
|
||||
def get_rpn_net(model_version, device, arch_type, with_64, arch_opt=1, width_mult=0.5,
|
||||
relative_path='../../', num_classes=240, num_c=1):
|
||||
# create classifier model
|
||||
basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
|
||||
arch_opt=arch_opt)
|
||||
|
||||
# create FPN model with classifier model
|
||||
fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64)
|
||||
|
||||
# load full detector net
|
||||
rpn_net = RPN(fpn_net, num_classes, with_64)
|
||||
# load pretrained weights
|
||||
weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)
|
||||
rpn_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
|
||||
|
||||
# deploy to device and switch to eval mode
|
||||
rpn_net.to(device)
|
||||
rpn_net.eval()
|
||||
|
||||
return rpn_net
|
||||
|
||||
@@ -0,0 +1,32 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""
|
||||
@author: tdencker
|
||||
"""
|
||||
|
||||
import pandas as pd
|
||||
|
||||
|
||||
class SignsStats(object):
|
||||
|
||||
def __init__(self, tblSignHeight=128, stats_csv_file='../../data/unicode_sign_stats.csv'):
|
||||
self.tblSignHeight = tblSignHeight
|
||||
self.sign_df = None
|
||||
# load stats file
|
||||
self.load_stats_from_file(stats_csv_file)
|
||||
|
||||
def load_stats_from_file(self, stats_csv_file):
|
||||
# Load sign stats
|
||||
sign_df = pd.read_csv(stats_csv_file)
|
||||
sign_df = sign_df.set_index('train_lbl')
|
||||
# assign
|
||||
self.sign_df = sign_df
|
||||
|
||||
def get_sign_width(self, train_lbl, sign_width=None):
|
||||
""" Return width of sign from stats """
|
||||
# check default sign width
|
||||
if sign_width is None:
|
||||
sign_width = self.tblSignHeight
|
||||
if train_lbl in self.sign_df.index:
|
||||
sign_width = self.sign_df.width.loc[train_lbl]
|
||||
return sign_width
|
||||
|
||||
@@ -0,0 +1,51 @@
|
||||
import os
import pandas as pd
|
||||
from ..utils.path_utils import *
|
||||
|
||||
|
||||
class TransliterationSet:
|
||||
|
||||
    def __init__(self, collections=None, relative_path='../../'):
        # avoid a mutable default argument shared between instances
        if collections is None:
            collections = []
        # load list of coll_tl_df
        list_coll_tl_df = []
        for collection in collections:
            coll_tl_file = '{}data/transliterations/transliterations_{}.csv'.format(relative_path, collection)
            # check if transliteration exists
            if os.path.isfile(coll_tl_file):
                print('Transliteration file {} found!'.format(coll_tl_file))
                # load transliteration
                coll_tl_df = pd.read_csv(coll_tl_file)
                # select subset of columns (copy, so the new columns are not written to a view)
                coll_tl_df = coll_tl_df[['segm_idx', 'tablet_CDLI', 'train_label', 'mzl_label', 'line_idx', 'pos_idx', 'status']].copy()
                coll_tl_df['lbl'] = coll_tl_df['train_label']
                coll_tl_df['mzl_lbl'] = coll_tl_df['mzl_label']
            else:
                print('Transliteration file {} NOT found!'.format(coll_tl_file))
                coll_tl_df = pd.DataFrame()
            # append coll_tl_df to list
            list_coll_tl_df.append(coll_tl_df)
        # make accessible
        self.collections = collections
        self.list_coll_tl_df = list_coll_tl_df
|
||||
|
||||
def get_tl_df(self, seg_rec, verbose=True):
|
||||
# init empty tl
|
||||
num_lines = 0
|
||||
tl_df = pd.DataFrame()
|
||||
# select corresponding coll_tl_df
|
||||
collection = seg_rec.collection
|
||||
coll_idx = self.collections.index(collection)
|
||||
coll_tl_df = self.list_coll_tl_df[coll_idx]
|
||||
# check if transliterations available
|
||||
if len(coll_tl_df) > 0:
|
||||
# select corresponding tl_df slice in coll_df
|
||||
tl_df = coll_tl_df[coll_tl_df.segm_idx == seg_rec.name]
|
||||
# compute number lines
|
||||
num_lines = tl_df.line_idx.nunique()
|
||||
# report if transliteration is missing
|
||||
if len(tl_df) == 0:
|
||||
if verbose:
|
||||
print('No transliteration found for {}!'.format(seg_rec.tablet_CDLI))
|
||||
return tl_df, num_lines
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,57 @@
|
||||
import pandas as pd
|
||||
|
||||
|
||||
def load_cunei_mzl_df(path_to_csv='./cunei_mzl.csv', filter=False):
|
||||
cunei_mzl_df = pd.read_csv(path_to_csv, index_col=0)
|
||||
# avoid mzl idx without codepoint
|
||||
cunei_mzl_df = cunei_mzl_df[cunei_mzl_df.num_cpts > 0]
|
||||
# deal with multiple versions
|
||||
#cunei_mzl_df = cunei_mzl_df.groupby('MesZL', sort=False, as_index=False).first()
|
||||
# create composite sign
|
||||
cunei_mzl_df['comp_script'] = cunei_mzl_df[['script_0', 'script_1', 'script_2']].fillna('').apply(
|
||||
lambda x: ''.join(x), axis=1)
|
||||
# decode to unicode (for matching with oracc utf8)
|
||||
cunei_mzl_df.comp_script = cunei_mzl_df.comp_script.apply(lambda x: x.decode('utf8'))
|
||||
|
||||
if filter:
|
||||
# avoid mzl idx without codepoint
|
||||
cunei_mzl_df = cunei_mzl_df[cunei_mzl_df.num_cpts > 0]
|
||||
# deal with multiple versions
|
||||
cunei_mzl_df = cunei_mzl_df.groupby('MesZL', sort=False, as_index=False).first()
|
||||
|
||||
return cunei_mzl_df
|
||||
|
||||
|
||||
# def get_unicode(mzl_idx, cunei_mzl_df):
|
||||
# select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
|
||||
# if select_mzl_idx.any():
|
||||
# cpt_hex = cunei_mzl_df.codepoint_0[select_mzl_idx].str[2:].values[0] # get hex
|
||||
# cpt_int = int(cpt_hex, 16) # convert to int
|
||||
# return unichr(cpt_int)
|
||||
# else:
|
||||
# return mzl_idx
|
||||
|
||||
|
||||
def get_unicode_comp(mzl_idx, cunei_mzl_df):
|
||||
# also handle composite signs by concatenation
|
||||
select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
|
||||
if select_mzl_idx.any():
|
||||
cunei_rec = cunei_mzl_df[select_mzl_idx]
|
||||
out_str = ''
|
||||
for i in range(int(cunei_rec.num_cpts.values[0])):  # num_cpts is a Series; use the count of the matched row
|
||||
cpt_hex = cunei_rec['codepoint_{}'.format(i)].str[2:].values[0] # get hex
|
||||
cpt_int = int(cpt_hex, 16) # convert to int
|
||||
out_str += unichr(cpt_int)
|
||||
return out_str
|
||||
else:
|
||||
return mzl_idx
|
||||
|
||||
|
||||
def get_sign_name(mzl_idx, cunei_mzl_df):
|
||||
select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
|
||||
if select_mzl_idx.any():
|
||||
cunei_rec = cunei_mzl_df[select_mzl_idx]
|
||||
#return cunei_rec['Sign Name'].str.decode('utf8').item()
|
||||
return cunei_rec['Sign Name'].str.decode('utf8').str.split('(').str[0].item()
|
||||
else:
|
||||
return mzl_idx
|
||||
@@ -0,0 +1,28 @@
|
||||
import json
|
||||
import numpy as np
|
||||
|
||||
|
||||
def get_label_list(path_to_lbl_file='../../data/newLabels.json'):
|
||||
# get list that maps old -> new
|
||||
|
||||
# load label list
|
||||
with open(path_to_lbl_file) as json_data:
|
||||
lbl_list = json.load(json_data)
|
||||
return lbl_list
|
||||
|
||||
|
||||
def get_lbl2lbl(path_to_lbl_file):
|
||||
# get list that maps new -> old
|
||||
# actually using lbl_list with index function works as well !
|
||||
|
||||
# load label list
|
||||
lbl_list = np.asarray(get_label_list(path_to_lbl_file))
|
||||
# print np.unique(lbl_list)
|
||||
# reverse (assume mapping is unique)
|
||||
lbl2lbl = np.zeros(len(np.unique(lbl_list)), dtype=int)  # 240 (integer dtype, since the values are label indices)
|
||||
for (i, val) in enumerate(lbl_list):
|
||||
lbl2lbl[val] = i # new -> old
|
||||
# since mapping is not unique for 0, need to set manually to background
|
||||
lbl2lbl[0] = 0
|
||||
return lbl2lbl
|
||||
|
||||
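`get_lbl2lbl` above inverts the old -> new label list into a new -> old lookup and then pins background to 0, because several old labels map to background. A small worked sketch follows; the five-entry `lbl_list` is made-up data for illustration, not the content of `newLabels.json`.

```python
import numpy as np

# hypothetical old -> new mapping: index = old label, value = new training label
lbl_list = [0, 2, 0, 1, 3]

# invert it: index = new label, value = old label
lbl2lbl = np.zeros(len(np.unique(lbl_list)), dtype=int)
for old_lbl, new_lbl in enumerate(lbl_list):
    lbl2lbl[new_lbl] = old_lbl
# several old labels collapse to background (0), so pin background explicitly
lbl2lbl[0] = 0

# for non-background labels, list.index gives the same answer (first match)
same_as_index = all(lbl2lbl[new] == lbl_list.index(new) for new in (1, 2, 3))
```

Without the final assignment, entry 0 would hold the last old label that maps to background (2 in this toy list) rather than 0.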
@@ -0,0 +1,200 @@
|
||||
# --------------------------------------------------------
|
||||
# Parts of code adapted from Ross Girshick's Fast/er R-CNN code
|
||||
# --------------------------------------------------------
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
def unique_boxes(boxes, scale=1.0):
|
||||
"""Return indices of unique boxes."""
|
||||
v = np.array([1, 1e3, 1e6, 1e9])
|
||||
hashes = np.round(boxes * scale).dot(v)
|
||||
_, index = np.unique(hashes, return_index=True)
|
||||
return np.sort(index)
|
||||
|
||||
|
||||
def xywh_to_xyxy(boxes):
|
||||
"""Convert [x y w h] box format to [x1 y1 x2 y2] format."""
|
||||
return np.hstack((boxes[:, 0:2], boxes[:, 0:2] + boxes[:, 2:4] - 1))
|
||||
|
||||
|
||||
def xyxy_to_xywh(boxes):
|
||||
"""Convert [x1 y1 x2 y2] box format to [x y w h] format."""
|
||||
return np.hstack((boxes[:, 0:2], boxes[:, 2:4] - boxes[:, 0:2] + 1))
|
||||
|
||||
|
||||
def convert_bbox_global2local(gbbox, seg_bbox):
|
||||
relative_bbox = np.array(gbbox) - np.tile(np.asarray(seg_bbox[:2]), 2)  # subtract the segment origin (x, y) from both corners
|
||||
return relative_bbox.tolist()
|
||||
|
||||
|
||||
def convert_bbox_local2global(lbbox, seg_bbox):
|
||||
global_bbox = np.array(lbbox) + np.tile(np.asarray(seg_bbox[:2]), 2)  # add the segment origin (x, y) to both corners
|
||||
return global_bbox.tolist()
|
||||
|
||||
|
||||
def validate_boxes(boxes, width=0, height=0):
|
||||
"""Check that a set of boxes are valid."""
|
||||
x1 = boxes[:, 0]
|
||||
y1 = boxes[:, 1]
|
||||
x2 = boxes[:, 2]
|
||||
y2 = boxes[:, 3]
|
||||
assert (x1 >= 0).all()
|
||||
assert (y1 >= 0).all()
|
||||
assert (x2 >= x1).all()
|
||||
assert (y2 >= y1).all()
|
||||
assert (x2 < width).all()
|
||||
assert (y2 < height).all()
|
||||
|
||||
|
||||
def filter_small_boxes(boxes, min_size):
|
||||
w = boxes[:, 2] - boxes[:, 0]
|
||||
h = boxes[:, 3] - boxes[:, 1]
|
||||
keep = np.where((w >= min_size) & (h >= min_size))[0]  # same inclusive threshold for width and height
|
||||
return keep
|
||||
|
||||
|
||||
def clip_boxes(boxes, im_shape):
|
||||
"""
|
||||
Clip boxes to image boundaries.
|
||||
usage for single: bb_new = clip_boxes(bb[np.newaxis, :], [imh, imw]).squeeze()  (im_shape is ordered as height, width)
|
||||
"""
|
||||
|
||||
# x1 >= 0
|
||||
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
|
||||
# y1 >= 0
|
||||
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
|
||||
# x2 < im_shape[1]
|
||||
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
|
||||
# y2 < im_shape[0]
|
||||
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
|
||||
return boxes
|
||||
|
||||
|
||||
# def clip_boxes(boxes, im_shape):
|
||||
# """Clip boxes to image boundaries."""
|
||||
# # x1 >= 0
|
||||
# boxes[:, 0::4] = np.maximum(boxes[:, 0::4], 0)
|
||||
# # y1 >= 0
|
||||
# boxes[:, 1::4] = np.maximum(boxes[:, 1::4], 0)
|
||||
# # x2 < im_shape[1]
|
||||
# boxes[:, 2::4] = np.minimum(boxes[:, 2::4], im_shape[1] - 1)
|
||||
# # y2 < im_shape[0]
|
||||
# boxes[:, 3::4] = np.minimum(boxes[:, 3::4], im_shape[0] - 1)
|
||||
# return boxes
|
||||
|
||||
|
||||
def intersection_over_union(Reframe, GTframe):
|
||||
# by Oemer
|
||||
|
||||
x1 = Reframe[0]
|
||||
y1 = Reframe[1]
|
||||
width1 = Reframe[2] - Reframe[0]
|
||||
height1 = Reframe[3] - Reframe[1]
|
||||
|
||||
x2 = GTframe[0]
|
||||
y2 = GTframe[1]
|
||||
width2 = GTframe[2] - GTframe[0]
|
||||
height2 = GTframe[3] - GTframe[1]
|
||||
|
||||
endx = max(x1 + width1, x2 + width2)
|
||||
startx = min(x1, x2)
|
||||
width = width1 + width2 - (endx - startx)
|
||||
|
||||
endy = max(y1 + height1, y2 + height2)
|
||||
starty = min(y1, y2)
|
||||
height = height1 + height2 - (endy - starty)
|
||||
|
||||
if width <= 0 or height <= 0:
|
||||
ratio = 0
|
||||
else:
|
||||
Area = width * height
|
||||
Area1 = width1 * height1
|
||||
Area2 = width2 * height2
|
||||
ratio = Area * 1. / (Area1 + Area2 - Area)
|
||||
# return IOU
|
||||
return ratio # Reframe,GTframe
|
||||
|
||||
|
||||
def bb_intersection_over_union(box_a, box_b):
|
||||
# adapted from https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
|
||||
|
||||
# determine the (x, y)-coordinates of the intersection rectangle
|
||||
xA = max(box_a[0], box_b[0])
|
||||
yA = max(box_a[1], box_b[1])
|
||||
xB = min(box_a[2], box_b[2])
|
||||
yB = min(box_a[3], box_b[3])
|
||||
|
||||
# compute the area of intersection rectangle
|
||||
inter_area = (xB - xA + 1) * (yB - yA + 1)
|
||||
|
||||
# compute the area of both the prediction and ground-truth
|
||||
# rectangles
|
||||
box_a_area = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
|
||||
box_b_area = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
|
||||
|
||||
# compute the intersection over union by taking the intersection
|
||||
# area and dividing it by the sum of prediction + ground-truth
|
||||
# areas - the intersection area
|
||||
if (xB - xA + 1) <= 0 or (yB - yA + 1) <= 0:
|
||||
iou = 0
|
||||
else:
|
||||
iou = inter_area / float(box_a_area + box_b_area - inter_area)
|
||||
|
||||
# return the intersection over union value
|
||||
return iou
|
||||
|
||||
|
||||
def box_iou(box1, box2):
|
||||
'''Compute the intersection over union of two sets of boxes.
|
||||
TD: modified to be legacy compatible
|
||||
The box order must be (xmin, ymin, xmax, ymax).
|
||||
Args:
|
||||
box1: (tensor) bounding boxes, sized [N,4].
|
||||
box2: (tensor) bounding boxes, sized [M,4].
|
||||
Return:
|
||||
(tensor) iou, sized [N,M].
|
||||
Reference:
|
||||
https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
|
||||
'''
|
||||
N = box1.shape[0]
|
||||
M = box2.shape[0]
|
||||
|
||||
lt = np.maximum(box1[:,None,:2], box2[:,:2]) # [N,M,2]
|
||||
rb = np.minimum(box1[:,None,2:], box2[:,2:]) # [N,M,2]
|
||||
|
||||
wh = np.clip((rb-lt+1.), 0, None) # [N,M,2]
|
||||
inter = wh[:,:,0] * wh[:,:,1] # [N,M]
|
||||
|
||||
area1 = (box1[:,2]-box1[:,0]+1.) * (box1[:,3]-box1[:,1]+1.) # [N,]
|
||||
area2 = (box2[:,2]-box2[:,0]+1.) * (box2[:,3]-box2[:,1]+1.) # [M,]
|
||||
iou = inter / (area1[:,None] + area2 - inter)
|
||||
return iou
|
||||
|
||||
|
||||
def box_iou_org(box1, box2):
|
||||
'''Compute the intersection over union of two sets of boxes.
|
||||
The box order must be (xmin, ymin, xmax, ymax).
|
||||
Args:
|
||||
box1: (tensor) bounding boxes, sized [N,4].
|
||||
box2: (tensor) bounding boxes, sized [M,4].
|
||||
Return:
|
||||
(tensor) iou, sized [N,M].
|
||||
Reference:
|
||||
https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
|
||||
'''
|
||||
N = box1.shape[0]
|
||||
M = box2.shape[0]
|
||||
|
||||
lt = np.maximum(box1[:,None,:2], box2[:,:2]) # [N,M,2]
|
||||
rb = np.minimum(box1[:,None,2:], box2[:,2:]) # [N,M,2]
|
||||
|
||||
wh = np.clip((rb-lt), 0, None) # [N,M,2]
|
||||
inter = wh[:,:,0] * wh[:,:,1] # [N,M]
|
||||
|
||||
area1 = (box1[:,2]-box1[:,0]) * (box1[:,3]-box1[:,1]) # [N,]
|
||||
area2 = (box2[:,2]-box2[:,0]) * (box2[:,3]-box2[:,1]) # [M,]
|
||||
iou = inter / (area1[:,None] + area2 - inter)
|
||||
return iou
|
||||
|
||||
|
||||
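`box_iou` and `box_iou_org` above differ only in whether box extents include the `+1` pixel offset. The sketch below folds both conventions into one function to make the difference concrete. The name `pairwise_iou` and its `inclusive` flag are assumptions for illustration and do not exist in the repository.

```python
import numpy as np


def pairwise_iou(box1, box2, inclusive=True):
    """IoU between all pairs of [N,4] and [M,4] boxes in (xmin, ymin, xmax, ymax) order.

    inclusive=True treats xmax/ymax as the last pixel inside the box (+1 on extents),
    inclusive=False treats the coordinates as continuous box edges.
    """
    off = 1.0 if inclusive else 0.0
    lt = np.maximum(box1[:, None, :2], box2[:, :2])      # [N,M,2] top-left of overlap
    rb = np.minimum(box1[:, None, 2:], box2[:, 2:])      # [N,M,2] bottom-right of overlap
    wh = np.clip(rb - lt + off, 0, None)                 # empty overlaps clip to zero
    inter = wh[:, :, 0] * wh[:, :, 1]
    area1 = (box1[:, 2] - box1[:, 0] + off) * (box1[:, 3] - box1[:, 1] + off)
    area2 = (box2[:, 2] - box2[:, 0] + off) * (box2[:, 3] - box2[:, 1] + off)
    return inter / (area1[:, None] + area2 - inter)


a = np.array([[0., 0., 9., 9.]])
b = np.array([[5., 5., 14., 14.], [20., 20., 29., 29.]])

iou_inclusive = pairwise_iou(a, b, inclusive=True)    # 25 / 175 for the first pair
iou_exclusive = pairwise_iou(a, b, inclusive=False)   # 16 / 146 for the first pair
```

The two conventions give noticeably different values for small boxes, so detections and ground truth should be scored with the same one throughout an evaluation.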
@@ -0,0 +1,38 @@
|
||||
# --------------------------------------------------------
|
||||
# Fast R-CNN
|
||||
# Copyright (c) 2015 Microsoft
|
||||
# Licensed under The MIT License [see LICENSE for details]
|
||||
# Written by Ross Girshick
|
||||
# --------------------------------------------------------
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
def nms(dets, scores, threshold=0.5):
|
||||
x1 = dets[:, 0]
|
||||
y1 = dets[:, 1]
|
||||
x2 = dets[:, 2]
|
||||
y2 = dets[:, 3]
|
||||
# scores = dets[:, 4]
|
||||
|
||||
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
|
||||
order = scores.argsort()[::-1]
|
||||
|
||||
keep = []
|
||||
while order.size > 0:
|
||||
i = order[0]
|
||||
keep.append(i)
|
||||
xx1 = np.maximum(x1[i], x1[order[1:]])
|
||||
yy1 = np.maximum(y1[i], y1[order[1:]])
|
||||
xx2 = np.minimum(x2[i], x2[order[1:]])
|
||||
yy2 = np.minimum(y2[i], y2[order[1:]])
|
||||
|
||||
w = np.maximum(0.0, xx2 - xx1 + 1)
|
||||
h = np.maximum(0.0, yy2 - yy1 + 1)
|
||||
inter = w * h
|
||||
ovr = inter / (areas[i] + areas[order[1:]] - inter)
|
||||
|
||||
inds = np.where(ovr <= threshold)[0]
|
||||
order = order[inds + 1]
|
||||
|
||||
return keep
|
||||
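The `nms` function above implements greedy non-maximum suppression: it repeatedly keeps the highest-scoring box and discards the remaining boxes whose IoU with it exceeds the threshold. A compact usage sketch follows. `nms_sketch` is a condensed restatement written for this example and the three boxes are made-up data.

```python
import numpy as np


def nms_sketch(dets, scores, threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it by more than threshold."""
    x1, y1, x2, y2 = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]           # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= threshold]       # survivors go to the next round
    return keep


dets = np.array([
    [10., 10., 50., 50.],      # box 0
    [12., 12., 52., 52.],      # box 1: heavy overlap with box 0
    [100., 100., 140., 140.],  # box 2: far away from both
])
scores = np.array([0.9, 0.8, 0.7])
kept = nms_sketch(dets, scores, threshold=0.5)
```

Box 1 overlaps box 0 with an IoU of roughly 0.83 and is suppressed, while the distant box 2 survives.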
@@ -0,0 +1,44 @@
|
||||
import os
|
||||
|
||||
|
||||
# file names, folders, paths
|
||||
|
||||
def make_folder(res_path):
|
||||
# create folder, if it does not exist
|
||||
if not os.path.exists(res_path):
|
||||
os.makedirs(res_path)
|
||||
|
||||
|
||||
def prepare_data_gen_folder(relative_path, sign_model_version, collection_name, res_folder_name='results'):
|
||||
# create path to file that stores generated training data
|
||||
res_path = '{}pytorch/{}/{}'.format(relative_path, res_folder_name, sign_model_version)
|
||||
train_data_ext_file = '{}/line_generated_bboxes_{}.csv'.format(res_path, collection_name)
|
||||
collection_subfolder = '{}/images/'.format(collection_name)
|
||||
# create folder, if necessary
|
||||
make_folder(res_path)
|
||||
# remove generated file, if it exists
|
||||
if os.path.isfile(train_data_ext_file):
|
||||
os.remove(train_data_ext_file)
|
||||
|
||||
return train_data_ext_file, collection_subfolder, res_path
|
||||
|
||||
|
||||
def prepare_data_gen_folder_slim(collection_name, res_path_base):
|
||||
# create path to file that stores generated training data
|
||||
|
||||
train_data_ext_file = '{}/line_generated_bboxes_{}.csv'.format(res_path_base, collection_name)
|
||||
collection_subfolder = '{}/images/'.format(collection_name)
|
||||
# create folder, if necessary
|
||||
make_folder(res_path_base)
|
||||
# remove generated file, if it exists
|
||||
if os.path.isfile(train_data_ext_file):
|
||||
os.remove(train_data_ext_file)
|
||||
|
||||
return train_data_ext_file, collection_subfolder
|
||||
|
||||
|
||||
def clean_cdli(cdli_str):
|
||||
# remove Vs, Rs
|
||||
out_str = cdli_str.replace("Vs", "")
|
||||
out_str = out_str.replace("Rs", "")
|
||||
return out_str
|
||||
@@ -0,0 +1,384 @@
|
||||
import time
|
||||
from collections import OrderedDict
|
||||
import copy
|
||||
|
||||
from tqdm import tqdm
|
||||
import numpy as np
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from torch.autograd import Variable
|
||||
from torch.nn.modules.module import _addindent
|
||||
|
||||
import torchvision
|
||||
from torchvision.transforms import *
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
|
||||
# HELPER FUNCTIONS
|
||||
|
||||
|
||||
def weights_init(m):
|
||||
if isinstance(m, nn.Linear):
|
||||
m.weight.data.normal_(0, 0.01) # 0.005
|
||||
m.bias.data.zero_()
|
||||
|
||||
|
||||
def torch_summarize(model, show_weights=True, show_parameters=True):
|
||||
# code found here: https://stackoverflow.com/questions/42480111/model-summary-in-pytorch
|
||||
"""Summarizes torch model by showing trainable parameters and weights."""
|
||||
tmpstr = model.__class__.__name__ + ' (\n'
|
||||
for key, module in model._modules.items():
|
||||
# if it contains layers, call it recursively to get params and weights
|
||||
if type(module) in [
|
||||
torch.nn.modules.container.Container,
|
||||
torch.nn.modules.container.Sequential
|
||||
]:
|
||||
modstr = torch_summarize(module)
|
||||
else:
|
||||
modstr = module.__repr__()
|
||||
modstr = _addindent(modstr, 2)
|
||||
|
||||
params = sum([np.prod(p.size()) for p in module.parameters()])
|
||||
weights = tuple([tuple(p.size()) for p in module.parameters()])
|
||||
|
||||
tmpstr += ' (' + key + '): ' + modstr
|
||||
if show_weights:
|
||||
tmpstr += ', weights={}'.format(weights)
|
||||
if show_parameters:
|
||||
tmpstr += ', parameters={}'.format(params)
|
||||
tmpstr += '\n'
|
||||
|
||||
tmpstr = tmpstr + ')'
|
||||
return tmpstr
|
||||
|
||||
|
||||
def summary(mymodule, input_size):
|
||||
# code from PR by isaykatsman https://github.com/pytorch/pytorch/pull/3043
|
||||
def register_hook(module):
|
||||
def hook(module, input, output):
|
||||
if module._modules: # only want base layers
|
||||
return
|
||||
class_name = str(module.__class__).split('.')[-1].split("'")[0]
|
||||
module_idx = len(summary)
|
||||
            m_key = '%s-%i' % (class_name, module_idx + 1)
            summary[m_key] = OrderedDict()
            summary[m_key]['input_shape'] = list(input[0].size())
            summary[m_key]['input_shape'][0] = None
            if output.__class__.__name__ == 'tuple':
                summary[m_key]['output_shape'] = list(output[0].size())
            else:
                summary[m_key]['output_shape'] = list(output.size())
            summary[m_key]['output_shape'][0] = None

            params = 0
            # iterate through parameters and count num params
            for name, p in module._parameters.items():
                params += torch.numel(p.data)
                summary[m_key]['trainable'] = p.requires_grad

            summary[m_key]['nb_params'] = params

        if not isinstance(module, torch.nn.Sequential) and \
           not isinstance(module, torch.nn.ModuleList) and \
           not (module == mymodule):
            hooks.append(module.register_forward_hook(hook))

    # check if there are multiple inputs to the network
    if isinstance(input_size[0], (list, tuple)):
        x = [Variable(torch.rand(1, *in_size)) for in_size in input_size]
    else:
        x = Variable(torch.randn(1, *input_size))

    # create properties
    summary = OrderedDict()
    hooks = []
    # register hook
    mymodule.apply(register_hook)
    # make a forward pass
    mymodule(x)
    # remove these hooks
    for h in hooks:
        h.remove()

    # print out neatly
    def get_names(module, name, acc):
        if not module._modules:
            acc.append(name)
        else:
            for key in module._modules.keys():
                p_name = key if name == "" else name + "." + key
                get_names(module._modules[key], p_name, acc)
    names = []
    get_names(mymodule, "", names)

    col_width = 25  # should be >= 12
    summary_width = 61

    def crop(s):
        return s[:col_width] if len(s) > col_width else s

    print('_' * summary_width)
    print('{0: <{3}} {1: <{3}} {2: <{3}}'.format(
        'Layer (type)', 'Output Shape', 'Param #', col_width))
    print('=' * summary_width)
    total_params = 0
    trainable_params = 0
    for (i, l_type), l_name in zip(enumerate(summary), names):
        d = summary[l_type]
        total_params += d['nb_params']
        if 'trainable' in d and d['trainable']:
            trainable_params += d['nb_params']
        print('{0: <{3}} {1: <{3}} {2: <{3}}'.format(
            crop(l_name + ' (' + l_type[:-2] + ')'), crop(str(d['output_shape'])),
            crop(str(d['nb_params'])), col_width))
        if i < len(summary) - 1:
            print('_' * summary_width)
    print('=' * summary_width)
    print('Total params: ' + str(total_params))
    print('Trainable params: ' + str(trainable_params))
    print('Non-trainable params: ' + str((total_params - trainable_params)))
    print('_' * summary_width)


def visualize_model(model, dataloader, re_transform, device, num_images=6):
    was_training = model.training
    images_so_far = 0
    fig = plt.figure(figsize=(10, 10))

    # switch batchnorm and dropout to eval mode
    model.eval()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloader):
            inputs = inputs.to(device)
            labels = labels.to(device)

            # compute predictions using the model
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(preds[j]))

                ax.imshow(re_transform(inputs.cpu().data[j].clone()), cmap=plt.cm.Greys_r)

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)


def prepare_embedding(model_feature, dataloader, re_transform, device):

    # switch batchnorm and dropout to eval mode
    model_feature.eval()

    f_list = []
    i_list = []
    l_list = []

    with torch.no_grad():
        # inputs, labels = next(iter(dataloaders['train']))
        for inputs, labels in dataloader:
            em_sz = inputs.shape[0]

            # append labels
            l_list.append(labels)

            # undo transform, convert to RGB, and convert back to tensor
            t_list = []
            for t in inputs:
                t_list.append(torchvision.transforms.ToTensor()(re_transform(t.clone()).convert('RGB')))

            # append images
            i_list.append(torch.stack(t_list))

            # compute feature
            inputs = inputs.to(device)

            # append features
            f_list.append(model_feature(inputs).view(em_sz, -1).data)

    return torch.cat(f_list), torch.cat(l_list).numpy(), torch.cat(i_list)


def prepare_prcurves(model, dataloader, device):
    # create softmax over the class dimension (the implicit dim is deprecated)
    softmax = nn.Softmax(dim=1)
    # loop over dataset with dataloader
    p_list = []
    l_list = []
    with torch.no_grad():
        # inputs, labels = next(iter(dataloaders['train']))
        for inputs, labels in dataloader:
            # append labels
            l_list.append(labels)
            # prepare input
            inputs = inputs.to(device)
            # apply network model
            output = model(inputs)
            # compute softmax
            predicted = softmax(output)
            # append class probabilities
            p_list.append(predicted.data.cpu())

    # concat to tensors
    return torch.cat(p_list), torch.cat(l_list)


def preprocess_tablet_im(pil_im, scale, shift=5.0):
    # compute scaled size
    imw, imh = pil_im.size
    imw = int(imw * scale)
    imh = int(imh * scale)
    # determine crop size
    crop_sz = [int(imh - shift), int(imw - shift)]
    # tensor-space transforms
    ts_transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
    ])
    # compose transforms
    tablet_transform = torchvision.transforms.Compose([
        torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
        Resize((imh, imw)),  # resize according to scale
        FiveCrop((crop_sz[0], crop_sz[1])),  # oversample
        torchvision.transforms.Lambda(
            lambda crops: torch.stack([ts_transform(crop) for crop in crops])),  # returns a 4D tensor
    ])
    # apply transforms
    im_list = tablet_transform(pil_im)
    return im_list


def predict(model, im_list, device, use_bbox_reg=False):
    inputs = im_list

    with torch.no_grad():  # faster, less memory usage
        # prepare input
        inputs = inputs.to(device)

        # apply network model
        # output = model(inputs)  # consumes too much memory
        output = []
        for in_im in inputs:
            output.append(model(in_im.unsqueeze(0)))
        output = torch.cat(output, dim=0)

        # convert to numpy
        predicted = output.data.cpu().numpy()
        # free memory?!
        # del output

        # TODO: integrate bbox regression
        result_roi = []
        # stack detections to single tensor
        predicted_roi = []
        if use_bbox_reg:
            predicted_roi = np.stack(result_roi).squeeze()

    return predicted, predicted_roi


# TRAINER HELPER


def get_tensorboard_writer(logs_folder='runs_new', comment=''):
    # init logger
    import os
    import socket
    from datetime import datetime
    from tensorboardX import SummaryWriter

    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    log_dir = os.path.join(logs_folder, current_time + '_' + socket.gethostname() + comment)
    writer = SummaryWriter(log_dir=log_dir)  # comment='_{}'.format(weights_path.split('/')[1].split('.')[0])
    return writer


# TRAINER FUNCTIONS

def train_model(model, criterion, optimizer, scheduler, writer, dataloaders, dataset_sizes, device, num_epochs=25, test_every=10):
    ''' generic trainer function '''
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    best_epoch = 0

    for epoch in tqdm(range(num_epochs)):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training phase; the validation phase runs every `test_every` epochs

        phases = ['train', 'dev']
        if epoch % test_every != 0:
            phases = ['train']

        for phase in phases:
            if phase == 'train':
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track gradient history only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                    # else:
                    #     for name, param in model.named_parameters():
                    #         writer.add_histogram(name, param.clone().cpu().data.numpy(), epoch)

                # statistics
                running_loss += loss.item()  # * inputs.size(0)  # uncomment this to fix a legacy bug XXX
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / float(dataset_sizes[phase])

            # write to logger
            writer.add_scalar('data/{}/loss'.format(phase), epoch_loss, epoch)
            writer.add_scalar('data/{}/acc'.format(phase), epoch_acc, epoch)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            print('{} Number correct: {} '.format(phase, running_corrects))

            # deep copy the model
            if phase == 'dev' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                best_epoch = epoch

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f} at {}'.format(best_acc, best_epoch))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
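
The `# statistics` step of `train_model` adds `loss.item()` once per batch and later divides the sum by the dataset size, which the inline comment flags as a legacy bug. The sketch below illustrates the numerical difference in plain Python. It assumes the criterion returns the mean loss over a batch (an assumption, not stated in the source); `epoch_loss` and its arguments are hypothetical names used only for this illustration.

```python
def epoch_loss(batch_mean_losses, batch_sizes, weight_by_batch_size):
    """Average loss over an epoch from per-batch mean losses (illustrative helper)."""
    dataset_size = sum(batch_sizes)
    running = 0.0
    for mean_loss, n in zip(batch_mean_losses, batch_sizes):
        # legacy behaviour: add the batch mean as-is
        # weighted behaviour: scale the batch mean by the number of samples
        running += mean_loss * n if weight_by_batch_size else mean_loss
    return running / dataset_size


# two batches: 4 samples with mean loss 0.5, 2 samples with mean loss 0.3
legacy = epoch_loss([0.5, 0.3], [4, 2], weight_by_batch_size=False)
weighted = epoch_loss([0.5, 0.3], [4, 2], weight_by_batch_size=True)
print(legacy, weighted)
```

With the legacy accumulation the logged value shrinks as the batch size grows, so loss curves from runs with different batch sizes are not directly comparable; the weighted variant yields the per-sample mean.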
@@ -0,0 +1,134 @@
import torch


def change_box_order(boxes, order):
    '''Change box order between (xmin,ymin,xmax,ymax) and (xcenter,ycenter,width,height).

    Args:
      boxes: (tensor) bounding boxes, sized [N,4].
      order: (str) either 'xyxy2xywh' or 'xywh2xyxy'.

    Returns:
      (tensor) converted bounding boxes, sized [N,4].
    '''
    assert order in ['xyxy2xywh','xywh2xyxy']
    a = boxes[:,:2]
    b = boxes[:,2:]
    if order == 'xyxy2xywh':
        return torch.cat([(a+b)/2,b-a], 1)
    return torch.cat([a-b/2,a+b/2], 1)

def box_clamp(boxes, xmin, ymin, xmax, ymax):
    '''Clamp boxes.

    Args:
      boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [N,4].
      xmin: (number) min value of x.
      ymin: (number) min value of y.
      xmax: (number) max value of x.
      ymax: (number) max value of y.

    Returns:
      (tensor) clamped boxes.
    '''
    boxes[:,0].clamp_(min=xmin, max=xmax)
    boxes[:,1].clamp_(min=ymin, max=ymax)
    boxes[:,2].clamp_(min=xmin, max=xmax)
    boxes[:,3].clamp_(min=ymin, max=ymax)
    return boxes

def box_select(boxes, xmin, ymin, xmax, ymax):
    '''Select boxes in range (xmin,ymin,xmax,ymax).

    Args:
      boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [N,4].
      xmin: (number) min value of x.
      ymin: (number) min value of y.
      xmax: (number) max value of x.
      ymax: (number) max value of y.

    Returns:
      (tensor) selected boxes, sized [M,4].
      (tensor) selected mask, sized [N,].
    '''
    mask = (boxes[:,0]>=xmin) & (boxes[:,1]>=ymin) \
         & (boxes[:,2]<=xmax) & (boxes[:,3]<=ymax)
    boxes = boxes[mask,:]
    return boxes, mask

def box_iou(box1, box2):
    '''Compute the intersection over union of two sets of boxes.

    The box order must be (xmin, ymin, xmax, ymax).

    Args:
      box1: (tensor) bounding boxes, sized [N,4].
      box2: (tensor) bounding boxes, sized [M,4].

    Return:
      (tensor) iou, sized [N,M].

    Reference:
      https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
    '''
    N = box1.size(0)
    M = box2.size(0)

    lt = torch.max(box1[:,None,:2], box2[:,:2])  # [N,M,2]
    rb = torch.min(box1[:,None,2:], box2[:,2:])  # [N,M,2]

    wh = (rb-lt).clamp(min=0)      # [N,M,2]
    inter = wh[:,:,0] * wh[:,:,1]  # [N,M]

    area1 = (box1[:,2]-box1[:,0]) * (box1[:,3]-box1[:,1])  # [N,]
    area2 = (box2[:,2]-box2[:,0]) * (box2[:,3]-box2[:,1])  # [M,]
    iou = inter / (area1[:,None] + area2 - inter)
    return iou

def box_nms(bboxes, scores, threshold=0.5):
    '''Non maximum suppression.

    Args:
      bboxes: (tensor) bounding boxes, sized [N,4].
      scores: (tensor) confidence scores, sized [N,].
      threshold: (float) overlap threshold.

    Returns:
      keep: (tensor) selected indices.

    Reference:
      https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/py_cpu_nms.py
    '''
    x1 = bboxes[:,0]
    y1 = bboxes[:,1]
    x2 = bboxes[:,2]
    y2 = bboxes[:,3]

    areas = (x2-x1 + 1) * (y2-y1 + 1)
    _, order = scores.sort(0, descending=True)

    keep = []
    while order.numel() > 0:
        if order.numel() == 1:
            i = order.item()
            keep.append(i)
            break

        i = order[0]
        keep.append(i)

        xx1 = x1[order[1:]].clamp(min=x1[i].item())
        yy1 = y1[order[1:]].clamp(min=y1[i].item())
        xx2 = x2[order[1:]].clamp(max=x2[i].item())
        yy2 = y2[order[1:]].clamp(max=y2[i].item())

        w = (xx2-xx1 + 1).clamp(min=0)
        h = (yy2-yy1 + 1).clamp(min=0)
        inter = w * h

        overlap = inter / (areas[i] + areas[order[1:]] - inter)
        ids = (overlap <= threshold).nonzero().squeeze()
        if ids.numel() == 0:
            break
        order = order[ids+1]
    return torch.tensor(keep, dtype=torch.long)
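
To make the interplay of `box_iou` and `box_nms` easier to follow, here is a dependency-free sketch of the same idea in plain Python: compute the IoU of two boxes, then greedily keep the highest-scoring box and drop every remaining box that overlaps it by more than the threshold. The helper names `iou` and `greedy_nms` are hypothetical. The sketch uses the continuous-coordinate area convention of `box_iou`, whereas `box_nms` above adds 1 to widths and heights (pixel convention), so values near the threshold may differ slightly.

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes, continuous coordinates."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # keep only candidates that do not overlap the selected box too much
        order = [k for k in order if iou(boxes[best], boxes[k]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, threshold=0.5)
print(kept)
```

The first two boxes overlap with an IoU of 81/119, so the lower-scoring one is suppressed, while the third box does not overlap and is kept.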
@@ -0,0 +1,246 @@
'''Encode object boxes and labels.'''
import math
import torch
import itertools
import time
import numpy as np

from .meshgrid import meshgrid
from .box import box_iou, box_nms, change_box_order


class FPNSSDBoxCoder:
    def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
        self.num_anchors = 12  # 12 # 9
        # self.anchor_areas = (32 * 32., 64 * 64., 128 * 128., 256 * 256., 341 * 341., 426 * 426., 512 * 512.)
        # self.aspect_ratios = (1 / 2., 1 / 1., 2 / 1.)
        # self.scale_ratios = (1., pow(2, 1 / 3.), pow(2, 2 / 3.))

        # compute num boxes for 500x500 patch
        # 500/16(stride) -> 32
        # 500/32(stride) -> 16
        # 500/64(stride) -> 8
        # (16^2 + 8^2) * num_anchors -> for 12: 3840
        # (32^2 + 16^2 + 8^2) * num_anchors -> for 12: 16128

        self.with_64 = with_64
        if self.with_64:
            self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
        else:
            self.anchor_areas = [128 * 128., 256 * 256.]
        if with_4_aspects:
            self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
        else:
            self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.]  # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
        if with_4_scales:
            assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
            self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
            self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
        else:
            self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]

        self.input_size = torch.tensor(input_size).float()
        self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)

        self.create_bg_class = create_bg_class

    def _get_anchor_wh(self):
        '''Compute anchor width and height for each feature map.

        Returns:
          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
        '''
        anchor_wh = []
        for s in self.anchor_areas:
            for ar in self.aspect_ratios:  # w/h = ar
                h = math.sqrt(s / ar)
                w = ar * h
                for sr in self.scale_ratios:  # scale
                    anchor_h = h * sr
                    anchor_w = w * sr
                    anchor_wh.append([anchor_w, anchor_h])
        num_fms = len(self.anchor_areas)
        return torch.tensor(anchor_wh).view(num_fms, -1, 2)

    def _get_anchor_boxes(self, input_size):
        '''Compute anchor boxes for each feature map.

        Args:
          input_size: (tensor) model input size of (w,h).

        Returns:
          anchor_boxes: (tensor) anchor boxes for each feature map. Each of size [#anchors,4],
            where #anchors = fmw * fmh * #anchors_per_cell
        '''
        num_fms = len(self.anchor_areas)
        anchor_wh = self._get_anchor_wh()
        # fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
        if self.with_64:  # num_fms == 3:
            fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)]  # p4 -> p6 feature map sizes
        else:  # num_fms == 2:
            fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)]  # p5 -> p6 feature map sizes

        boxes = []
        for i in range(num_fms):
            fm_size = fm_sizes[i]
            grid_size = input_size / fm_size
            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
            wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
            box = torch.cat([xy - wh / 2., xy + wh / 2.], 3)  # [x,y,x,y]
            boxes.append(box.view(-1, 4))
        return torch.cat(boxes, 0)

    def encode(self, boxes, labels):
        '''Encode target bounding boxes and class labels.

        SSD coding rules (no variance scaling is applied here):
          tx = (x - anchor_x) / anchor_w
          ty = (y - anchor_y) / anchor_h
          tw = log(w / anchor_w)
          th = log(h / anchor_h)

        Args:
          boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj,4].
          labels: (tensor) object class labels, sized [#obj,].

        Returns:
          loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
          cls_targets: (tensor) encoded class labels, sized [#anchors,].

        Reference:
          https://github.com/chainer/chainercv/blob/master/chainercv/links/model/ssd/multibox_coder.py
        '''

        def argmax(x):
            '''Find the max value index(row & col) of a 2D tensor.'''
            v, i = x.max(0)
            j = v.max(0)[1].item()
            return (i[j], j)

        # before_ts = time.time()

        anchor_boxes = self.anchor_boxes
        ious = box_iou(anchor_boxes, boxes)  # [#anchors, #obj]
        index = torch.empty(anchor_boxes.size(0), dtype=torch.long).fill_(-1)  # TD: for every anchorbox
        masked_ious = ious.clone()

        # TD: this whole while loop seems unnecessary... maybe performance issue?!
        while True:
            # TD: this should be run for every gt box with fitting anchor
            i, j = argmax(masked_ious)
            if masked_ious[i, j] < 1e-6:
                break
            index[i] = j
            # TD: zero row and column
            masked_ious[i, :] = 0
            masked_ious[:, j] = 0

        # TD: deal with anchor boxes that have not been assigned yet
        mask = (index < 0) & (ious.max(1)[0] >= 0.5)
        if mask.any():
            index[mask] = ious[mask].max(1)[1]  # TD: assign if iou more than 0.5
        # TD: does this clamp remove index -1 otherwise boxes[0] selected very often?!
        boxes = boxes[index.clamp(min=0)]  # negative index not supported
        boxes = change_box_order(boxes, 'xyxy2xywh')
        anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')

        loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
        loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
        loc_targets = torch.cat([loc_xy, loc_wh], 1)

        if self.create_bg_class:
            # TD: does this clamp remove index -1 otherwise labels[0] selected very often?!
            cls_targets = 1 + labels[index.clamp(min=0)]
        else:
            # if background class 0 already exists in labels
            cls_targets = labels[index.clamp(min=0)]

        # ok here index -1 targets are set to zero anyways
        cls_targets[index < 0] = 0

        # print('time spent encoding: {}'.format(time.time() - before_ts))

        return loc_targets, cls_targets

    def decode(self, loc_preds, cls_preds, score_thresh=0.6, nms_thresh=0.45):
        '''Decode predicted loc/cls back to real box locations and class labels.

        Args:
          loc_preds: (tensor) predicted loc, sized [#anchors,4].
          cls_preds: (tensor) predicted conf, sized [#anchors,#classes].
          score_thresh: (float) threshold for object confidence score.
          nms_thresh: (float) threshold for box nms.

        Returns:
          boxes: (tensor) bbox locations, sized [#obj,4].
          labels: (tensor) class labels, sized [#obj,].
          scores: (tensor) confidence scores, sized [#obj,].
        '''
        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
        xy = loc_preds[:, :2] * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
        wh = loc_preds[:, 2:].exp() * anchor_boxes[:, 2:]
        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)

        boxes = []
        labels = []
        scores = []
        num_classes = cls_preds.size(1)
        if self.create_bg_class:
            for i in range(num_classes - 1):
                score = cls_preds[:, i + 1]  # class i corresponds to (i+1) column
                mask = score > score_thresh
                if not mask.any():
                    continue
                box = box_preds[mask]
                score = score[mask]
                # print(box.size())
                # print(score.size())

                keep = box_nms(box, score, nms_thresh)
                boxes.append(box[keep])
                labels.append(torch.empty_like(keep).fill_(i))
                scores.append(score[keep])
        else:
            for i in range(1, num_classes):
                score = cls_preds[:, i]  # class i corresponds to column i (column 0 is background)
                mask = score > score_thresh
                if not mask.any():
                    continue
                box = box_preds[mask]
                score = score[mask]
                # print(box.size())
                # print(score.size())

                keep = box_nms(box, score, nms_thresh)
                boxes.append(box[keep])
                labels.append(torch.empty_like(keep).fill_(i))
                scores.append(score[keep])

        # concatenate if not empty
        if len(boxes) > 0:
            boxes = torch.cat(boxes, 0)
            labels = torch.cat(labels, 0)
            scores = torch.cat(scores, 0)
        return boxes, labels, scores

    def decode_boxes(self, loc_preds):

        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
        xy = loc_preds[:, :2] * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
        wh = loc_preds[:, 2:].exp() * anchor_boxes[:, 2:]
        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)

        boxes = box_preds
        return boxes


def test():
    box_coder = FPNSSDBoxCoder()
    print(box_coder.anchor_boxes.size())
    boxes = torch.tensor([[0, 0, 100, 100], [100, 100, 200, 200]], dtype=torch.float)
    labels = torch.tensor([0, 1], dtype=torch.long)
    loc_targets, cls_targets = box_coder.encode(boxes, labels)
    print(loc_targets.size(), cls_targets.size())

# test()
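
The coding rules used by `encode` and `decode_boxes` are inverses of each other. The following plain-Python sketch shows the round trip for a single box and a single anchor; it is an illustration under simplified assumptions (one box, no anchor matching, no tensors), and the helper names `xyxy_to_xywh`, `xywh_to_xyxy`, `encode_box` and `decode_box` are hypothetical.

```python
import math


def xyxy_to_xywh(box):
    """(xmin, ymin, xmax, ymax) -> (xcenter, ycenter, width, height)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0, xmax - xmin, ymax - ymin)


def xywh_to_xyxy(box):
    """(xcenter, ycenter, width, height) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def encode_box(box, anchor):
    """Offsets of `box` relative to `anchor` (both given as xyxy)."""
    x, y, w, h = xyxy_to_xywh(box)
    ax, ay, aw, ah = xyxy_to_xywh(anchor)
    return ((x - ax) / aw, (y - ay) / ah, math.log(w / aw), math.log(h / ah))


def decode_box(loc, anchor):
    """Invert encode_box: recover the xyxy box from offsets and anchor."""
    tx, ty, tw, th = loc
    ax, ay, aw, ah = xyxy_to_xywh(anchor)
    return xywh_to_xyxy((tx * aw + ax, ty * ah + ay, math.exp(tw) * aw, math.exp(th) * ah))


anchor = (0.0, 0.0, 128.0, 128.0)
target = (10.0, 20.0, 110.0, 100.0)
loc = encode_box(target, anchor)
restored = decode_box(loc, anchor)
print(loc, restored)
```

Centre offsets are expressed in units of the anchor size and sizes as log ratios, which keeps the regression targets on a comparable scale across anchors of different sizes.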
@@ -0,0 +1,169 @@
'''Encode object boxes and labels.'''
import math
import torch
import numpy as np

from .meshgrid import meshgrid
from .box import box_iou, box_nms, change_box_order


class RetinaBoxCoder:
    def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
        self.num_anchors = 12
        # self.anchor_areas = (32*32., 64*64., 128*128., 256*256., 512*512.)  # p3 -> p7
        # self.aspect_ratios = (1/2., 1/1., 2/1.)
        # self.scale_ratios = (1., pow(2,1/3.), pow(2,2/3.))
        self.with_64 = with_64
        if self.with_64:
            self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
        else:
            self.anchor_areas = [128 * 128., 256 * 256.]
        if with_4_aspects:
            self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
        else:
            self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.]  # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
        if with_4_scales:
            assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
            self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
            self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
        else:
            self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]

        self.input_size = torch.tensor(input_size).float()
        self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)

        self.create_bg_class = create_bg_class

    def _get_anchor_wh(self):
        '''Compute anchor width and height for each feature map.

        Returns:
          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
        '''
        anchor_wh = []
        for s in self.anchor_areas:
            for ar in self.aspect_ratios:  # w/h = ar
                h = math.sqrt(s / ar)
                w = ar * h
                for sr in self.scale_ratios:  # scale
                    anchor_h = h * sr
                    anchor_w = w * sr
                    anchor_wh.append([anchor_w, anchor_h])
        num_fms = len(self.anchor_areas)
        return torch.Tensor(anchor_wh).view(num_fms, -1, 2)

    def _get_anchor_boxes(self, input_size):
        '''Compute anchor boxes for each feature map.

        Args:
          input_size: (tensor) model input size of (w,h).

        Returns:
          boxes: (tensor) anchor boxes of all feature maps, concatenated. Each feature map contributes [#anchors,4],
            where #anchors = fmw * fmh * #anchors_per_cell
        '''
        num_fms = len(self.anchor_areas)
        anchor_wh = self._get_anchor_wh()
        # fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
        if self.with_64:  # num_fms == 3:
            fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)]  # p4 -> p6 feature map sizes
        else:  # num_fms == 2:
            fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)]  # p5 -> p6 feature map sizes

        boxes = []
        for i in range(num_fms):
            fm_size = fm_sizes[i]
            grid_size = input_size / fm_size
            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
            wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
            box = torch.cat([xy - wh / 2., xy + wh / 2.], 3)  # [x,y,x,y]
            boxes.append(box.view(-1, 4))
        return torch.cat(boxes, 0)

    def encode(self, boxes, labels):
        '''Encode target bounding boxes and class labels.

        We obey the Faster RCNN box coder:
          tx = (x - anchor_x) / anchor_w
          ty = (y - anchor_y) / anchor_h
          tw = log(w / anchor_w)
          th = log(h / anchor_h)

        Args:
          boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj, 4].
          labels: (tensor) object class labels, sized [#obj,].

        Returns:
          loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
          cls_targets: (tensor) encoded class labels, sized [#anchors,].
        '''
        anchor_boxes = self.anchor_boxes
        ious = box_iou(anchor_boxes, boxes)
        max_ious, max_ids = ious.max(1)
        boxes = boxes[max_ids]

        boxes = change_box_order(boxes, 'xyxy2xywh')
        anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')

        loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
        loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
        loc_targets = torch.cat([loc_xy, loc_wh], 1)

        if self.create_bg_class:
            cls_targets = 1 + labels[max_ids]
        else:
            # if background class 0 already exists in labels
            cls_targets = labels[max_ids]

        cls_targets[max_ious < 0.5] = 0  # WATCH OUT HERE, this is just for testing!!
        # ignore = (max_ious > 0.4) & (max_ious < 0.5)  # ignore ious between [0.4,0.5]
        # cls_targets[ignore] = -1  # mark ignored to -1
        return loc_targets, cls_targets

    def decode(self, loc_preds, cls_preds, input_size, score_thresh=0.5, nms_thresh=0.5):
        '''Decode outputs back to bounding box locations and class labels.

        Args:
          loc_preds: (tensor) predicted locations, sized [#anchors, 4].
          cls_preds: (tensor) predicted class labels, sized [#anchors, #classes].
          input_size: (tuple) model input size of (w,h).

        Returns:
          boxes: (tensor) decode box locations, sized [#obj,4].
          labels: (tensor) class labels for each box, sized [#obj,].
        '''
        CLS_THRESH = score_thresh
        NMS_THRESH = nms_thresh

        input_size = torch.Tensor(input_size)
        # anchor_boxes = self._get_anchor_boxes(input_size)  # xywh
        anchor_boxes = change_box_order(self._get_anchor_boxes(input_size), 'xyxy2xywh')

        loc_xy = loc_preds[:, :2]
        loc_wh = loc_preds[:, 2:]

        xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
        wh = loc_wh.exp() * anchor_boxes[:, 2:]
        boxes = torch.cat([xy - wh / 2, xy + wh / 2], 1)  # [#anchors,4]

        score, labels = cls_preds.sigmoid().max(1)  # [#anchors,]
        ids = score > CLS_THRESH
        ids = ids.nonzero().squeeze()  # [#obj,]
        keep = box_nms(boxes[ids], score[ids], threshold=NMS_THRESH)
        return boxes[ids][keep], labels[ids][keep]  # , score[ids][keep]

    def decode_boxes(self, loc_preds):

        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')

        loc_xy = loc_preds[:, :2]
        loc_wh = loc_preds[:, 2:]

        xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
        wh = loc_wh.exp() * anchor_boxes[:, 2:]
        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)

        boxes = box_preds
        return boxes
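
The anchor shapes produced by `_get_anchor_wh` follow from three lists: anchor areas (one per feature map), aspect ratios and scale ratios. The plain-Python sketch below reproduces that enumeration for the default configuration and shows why `num_anchors` is 12 (4 aspect ratios times 3 scale ratios per feature map). The function name `anchor_sizes` is hypothetical; the values are copied from the default branches of `__init__`.

```python
import math


def anchor_sizes(areas, aspect_ratios, scale_ratios):
    """Return, per feature map, a list of (width, height) anchor shapes."""
    sizes = []
    for area in areas:
        per_level = []
        for ar in aspect_ratios:          # ar = width / height
            h = math.sqrt(area / ar)
            w = ar * h                    # so that w * h == area
            for sr in scale_ratios:       # rescale both sides
                per_level.append((w * sr, h * sr))
        sizes.append(per_level)
    return sizes


areas = [128 * 128.0, 256 * 256.0]                      # with_64=False
aspect_ratios = [2 / 1.0, 1 / 1.0, 2 / 1.0, 3 / 1.0]    # with_4_aspects=False
scale_ratios = [1.0, 2 ** (1 / 3.0), 2 ** (2 / 3.0)]    # with_4_scales=False
sizes = anchor_sizes(areas, aspect_ratios, scale_ratios)
print(len(sizes), len(sizes[0]))
```

Note that the default aspect-ratio list contains `2 / 1.` twice, so two of the four ratio groups describe identical shapes; with `with_4_aspects=True` the first entry becomes `3 / 5.` instead.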
@@ -0,0 +1,178 @@
'''Encode object boxes and labels.'''
import math
import torch
import numpy as np

from .meshgrid import meshgrid
from .box import box_iou, box_nms, change_box_order


class RetinaBoxCoder:
    def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
        self.num_anchors = 12
        # self.anchor_areas = (32*32., 64*64., 128*128., 256*256., 512*512.)  # p3 -> p7
        # self.aspect_ratios = (1/2., 1/1., 2/1.)
        # self.scale_ratios = (1., pow(2,1/3.), pow(2,2/3.))
        self.with_64 = with_64
        if self.with_64:
            self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
        else:
            self.anchor_areas = [128 * 128., 256 * 256.]
        if with_4_aspects:
            self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
        else:
            self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.]  # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
        if with_4_scales:
            assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
            self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
            self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
        else:
            self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]

        self.input_size = torch.tensor(input_size).float()
        self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)

        self.create_bg_class = create_bg_class

    def _get_anchor_wh(self):
        '''Compute anchor width and height for each feature map.

        Returns:
          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
        '''
        anchor_wh = []
        for s in self.anchor_areas:
            for ar in self.aspect_ratios:  # w/h = ar
                h = math.sqrt(s / ar)
                w = ar * h
                for sr in self.scale_ratios:  # scale
                    anchor_h = h * sr
                    anchor_w = w * sr
                    anchor_wh.append([anchor_w, anchor_h])
        num_fms = len(self.anchor_areas)
        return torch.Tensor(anchor_wh).view(num_fms, -1, 2)

    def _get_anchor_boxes(self, input_size):
        '''Compute anchor boxes for each feature map.

        Args:
          input_size: (tensor) model input size of (w,h).

        Returns:
          boxes: (tensor) anchor boxes of all feature maps, concatenated. Each feature map contributes [#anchors,4],
            where #anchors = fmw * fmh * #anchors_per_cell
        '''
        num_fms = len(self.anchor_areas)
        anchor_wh = self._get_anchor_wh()
        # fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
        if self.with_64:  # num_fms == 3:
            fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)]  # p4 -> p6 feature map sizes
        else:  # num_fms == 2:
            fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)]  # p5 -> p6 feature map sizes

        boxes = []
        for i in range(num_fms):
            fm_size = fm_sizes[i]
            grid_size = input_size / fm_size
            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
            wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
            box = torch.cat([xy - wh / 2., xy + wh / 2.], 3)  # [x,y,x,y]
            boxes.append(box.view(-1, 4))
        return torch.cat(boxes, 0)

    def encode(self, boxes, labels, linemap):
        '''Encode target bounding boxes and class labels.

        We obey the Faster RCNN box coder:
          tx = (x - anchor_x) / anchor_w
          ty = (y - anchor_y) / anchor_h
          tw = log(w / anchor_w)
          th = log(h / anchor_h)

        Args:
          boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj, 4].
          labels: (tensor) object class labels, sized [#obj,].

        Returns:
          loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
          cls_targets: (tensor) encoded class labels, sized [#anchors,].
        '''
        anchor_boxes = self.anchor_boxes
        ious = box_iou(anchor_boxes, boxes)
        max_ious, max_ids = ious.max(1)
        boxes = boxes[max_ids]

        # need to check if anchor_box center has positive linemap
        anchor_ctrs = torch.zeros((anchor_boxes.shape[0], 2)).int()
|
||||
anchor_ctrs[:, 0] = (anchor_boxes[:, 2] + anchor_boxes[:, 0]) / 2
|
||||
anchor_ctrs[:, 1] = (anchor_boxes[:, 3] + anchor_boxes[:, 1]) / 2
|
||||
linemap_val = np.asarray(linemap)[anchor_ctrs[:, 1], anchor_ctrs[:, 0]]
|
||||
|
||||
boxes = change_box_order(boxes, 'xyxy2xywh')
|
||||
anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
|
||||
|
||||
loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
|
||||
loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
|
||||
loc_targets = torch.cat([loc_xy, loc_wh], 1)
|
||||
if self.create_bg_class:
|
||||
cls_targets = 1 + labels[max_ids]
|
||||
else:
|
||||
# if background class 0 already exists in labels
|
||||
cls_targets = labels[max_ids]
|
||||
|
||||
cls_targets[max_ious < 0.5] = 0 # WATCH OUT HERE, this is just for testing!!
|
||||
# ignore = (max_ious > 0.4) & (max_ious < 0.5) # ignore ious between [0.4,0.5]
|
||||
# cls_targets[ignore] = -1 # mark ignored to -1
|
||||
|
||||
# ignore if anchor is centered on a line detection and its max IoU is below 0.35
|
||||
ignore = torch.from_numpy(linemap_val.astype(np.uint8)) & (max_ious < 0.35) # 0.5
|
||||
cls_targets[ignore] = -1 # mark ignored to -1
|
||||
return loc_targets, cls_targets
|
||||
|
||||
def decode(self, loc_preds, cls_preds, input_size, score_thresh=0.5, nms_thresh=0.5):
|
||||
'''Decode outputs back to bounding box locations and class labels.
|
||||
|
||||
Args:
|
||||
loc_preds: (tensor) predicted locations, sized [#anchors, 4].
|
||||
cls_preds: (tensor) predicted class labels, sized [#anchors, #classes].
|
||||
input_size: (tuple) model input size of (w,h).
|
||||
|
||||
Returns:
|
||||
boxes: (tensor) decoded box locations, sized [#obj,4].
|
||||
labels: (tensor) class labels for each box, sized [#obj,].
|
||||
'''
|
||||
CLS_THRESH = score_thresh
|
||||
NMS_THRESH = nms_thresh
|
||||
|
||||
input_size = torch.Tensor(input_size)
|
||||
# anchor_boxes = self._get_anchor_boxes(input_size) # xywh
|
||||
anchor_boxes = change_box_order(self._get_anchor_boxes(input_size), 'xyxy2xywh')
|
||||
|
||||
loc_xy = loc_preds[:, :2]
|
||||
loc_wh = loc_preds[:, 2:]
|
||||
|
||||
xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
|
||||
wh = loc_wh.exp() * anchor_boxes[:, 2:]
|
||||
boxes = torch.cat([xy - wh / 2, xy + wh / 2], 1) # [#anchors,4]
|
||||
|
||||
score, labels = cls_preds.sigmoid().max(1) # [#anchors,]
|
||||
ids = score > CLS_THRESH
|
||||
ids = ids.nonzero().squeeze() # [#obj,]
|
||||
keep = box_nms(boxes[ids], score[ids], threshold=NMS_THRESH)
|
||||
return boxes[ids][keep], labels[ids][keep] # , score[ids][keep]
|
||||
|
||||
def decode_boxes(self, loc_preds):
|
||||
|
||||
anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
|
||||
|
||||
loc_xy = loc_preds[:, :2]
|
||||
loc_wh = loc_preds[:, 2:]
|
||||
|
||||
xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
|
||||
wh = loc_wh.exp() * anchor_boxes[:, 2:]
|
||||
box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
|
||||
|
||||
boxes = box_preds
|
||||
return boxes
|
||||
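The `encode` and `decode_boxes` methods above apply the Faster R-CNN box parameterisation and its inverse. The following is a minimal pure-Python sketch of that round trip for a single box and a single anchor. The names `encode_box` and `decode_box` are hypothetical and not part of the repository, and anchor matching, the linemap check and NMS are left out.

```python
import math


def _to_xywh(box):
    """Convert (xmin, ymin, xmax, ymax) to (center_x, center_y, width, height)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0,
            box[2] - box[0], box[3] - box[1])


def encode_box(box, anchor):
    """Encode a box relative to an anchor (hypothetical helper, both in xyxy order)."""
    bx, by, bw, bh = _to_xywh(box)
    ax, ay, aw, ah = _to_xywh(anchor)
    return ((bx - ax) / aw, (by - ay) / ah, math.log(bw / aw), math.log(bh / ah))


def decode_box(loc, anchor):
    """Invert encode_box: recover the xyxy box from (tx, ty, tw, th)."""
    ax, ay, aw, ah = _to_xywh(anchor)
    cx, cy = loc[0] * aw + ax, loc[1] * ah + ay
    w, h = math.exp(loc[2]) * aw, math.exp(loc[3]) * ah
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


anchor = (100.0, 100.0, 228.0, 164.0)   # a 128 x 64 anchor
box = (110.0, 96.0, 250.0, 170.0)       # a target box overlapping the anchor
loc = encode_box(box, anchor)
restored = decode_box(loc, anchor)
```

Decoding the encoded offsets with the same anchor returns the original box up to floating-point error, which is why the anchors used at inference time must match those used when the targets were encoded.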
Executable file
+358
@@ -0,0 +1,358 @@
|
||||
'''Compute PASCAL_VOC MAP.
|
||||
|
||||
Reference:
|
||||
https://github.com/chainer/chainercv/blob/master/chainercv/evaluations/eval_detection_voc.py
|
||||
'''
|
||||
from __future__ import division
|
||||
|
||||
import six
|
||||
import itertools
|
||||
import numpy as np
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
|
||||
def voc_eval(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
|
||||
gt_difficults=None, iou_thresh=0.5, use_07_metric=True):
|
||||
'''Wrap VOC evaluation for PyTorch.'''
|
||||
pred_bboxes = [xy2yx(b).numpy() for b in pred_bboxes]
|
||||
pred_labels = [label.numpy() for label in pred_labels]
|
||||
pred_scores = [score.numpy() for score in pred_scores]
|
||||
gt_bboxes = [xy2yx(b).numpy() for b in gt_bboxes]
|
||||
gt_labels = [label.numpy() for label in gt_labels]
|
||||
return eval_detection_voc(
|
||||
pred_bboxes, pred_labels, pred_scores, gt_bboxes,
|
||||
gt_labels, gt_difficults, iou_thresh, use_07_metric)
|
||||
|
||||
def xy2yx(boxes):
|
||||
'''Convert box (xmin,ymin,xmax,ymax) to (ymin,xmin,ymax,xmax).'''
|
||||
c0 = boxes[:,0].clone()
|
||||
c2 = boxes[:,2].clone()
|
||||
boxes[:,0] = boxes[:,1]
|
||||
boxes[:,1] = c0
|
||||
boxes[:,2] = boxes[:,3]
|
||||
boxes[:,3] = c2
|
||||
return boxes
|
||||
|
||||
def bbox_iou(bbox_a, bbox_b):
|
||||
'''Calculate the Intersection over Union (IoU) between bounding boxes.
|
||||
|
||||
Args:
|
||||
bbox_a (array): An array whose shape is :math:`(N, 4)`.
|
||||
:math:`N` is the number of bounding boxes.
|
||||
The dtype should be :obj:`numpy.float32`.
|
||||
bbox_b (array): An array similar to :obj:`bbox_a`,
|
||||
whose shape is :math:`(K, 4)`.
|
||||
The dtype should be :obj:`numpy.float32`.
|
||||
|
||||
Returns:
|
||||
array:
|
||||
An array whose shape is :math:`(N, K)`. \
|
||||
An element at index :math:`(n, k)` contains IoUs between \
|
||||
:math:`n` th bounding box in :obj:`bbox_a` and :math:`k` th bounding \
|
||||
box in :obj:`bbox_b`.
|
||||
'''
|
||||
# top left
|
||||
tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
|
||||
# bottom right
|
||||
br = np.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])
|
||||
|
||||
area_i = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
|
||||
area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
|
||||
area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
|
||||
return area_i / (area_a[:, None] + area_b - area_i)
|
||||
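For reference, the computation in `bbox_iou` can be written for a single pair of boxes in plain Python. This is only an illustrative sketch; the function name `iou` is hypothetical, and the coordinate order does not matter as long as both boxes use the same one.

```python
def iou(a, b):
    """IoU of two boxes given as (min0, min1, max0, max1) in the same axis order."""
    # Corners of the intersection rectangle.
    tl0, tl1 = max(a[0], b[0]), max(a[1], b[1])
    br0, br1 = min(a[2], b[2]), min(a[3], b[3])
    # Empty intersections contribute zero area.
    inter = max(0.0, br0 - tl0) * max(0.0, br1 - tl1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))   # 25 / (100 + 100 - 25)
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
```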
|
||||
def eval_detection_voc(
|
||||
pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
|
||||
gt_difficults=None,
|
||||
iou_thresh=0.5, use_07_metric=False):
|
||||
"""Calculate average precisions based on evaluation code of PASCAL VOC.
|
||||
|
||||
This function evaluates predicted bounding boxes obtained from a dataset
|
||||
which has :math:`N` images by using average precision for each class.
|
||||
The code is based on the evaluation code used in PASCAL VOC Challenge.
|
||||
|
||||
Args:
|
||||
pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`
|
||||
sets of bounding boxes.
|
||||
Its index corresponds to an index for the base dataset.
|
||||
Each element of :obj:`pred_bboxes` is a set of coordinates
|
||||
of bounding boxes. This is an array whose shape is :math:`(R, 4)`,
|
||||
where :math:`R` corresponds
|
||||
to the number of bounding boxes, which may vary among images.
|
||||
The second axis corresponds to
|
||||
:math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.
|
||||
pred_labels (iterable of numpy.ndarray): An iterable of labels.
|
||||
Similar to :obj:`pred_bboxes`, its index corresponds to an
|
||||
index for the base dataset. Its length is :math:`N`.
|
||||
pred_scores (iterable of numpy.ndarray): An iterable of confidence
|
||||
scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,
|
||||
its index corresponds to an index for the base dataset.
|
||||
Its length is :math:`N`.
|
||||
gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth
|
||||
bounding boxes
|
||||
whose length is :math:`N`. An element of :obj:`gt_bboxes` is a
|
||||
bounding box whose shape is :math:`(R, 4)`. Note that the number of
|
||||
bounding boxes in each image does not need to be same as the number
|
||||
of corresponding predicted boxes.
|
||||
gt_labels (iterable of numpy.ndarray): An iterable of ground truth
|
||||
labels which are organized similarly to :obj:`gt_bboxes`.
|
||||
gt_difficults (iterable of numpy.ndarray): An iterable of boolean
|
||||
arrays which is organized similarly to :obj:`gt_bboxes`.
|
||||
This tells whether the
|
||||
corresponding ground truth bounding box is difficult or not.
|
||||
By default, this is :obj:`None`. In that case, this function
|
||||
considers all bounding boxes to be not difficult.
|
||||
iou_thresh (float): A prediction is correct if its Intersection over
|
||||
Union with the ground truth is above this value.
|
||||
use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric
|
||||
for calculating average precision. The default value is
|
||||
:obj:`False`.
|
||||
|
||||
Returns:
|
||||
dict:
|
||||
|
||||
The keys, value-types and the description of the values are listed
|
||||
below.
|
||||
|
||||
* **ap** (*numpy.ndarray*): An array of average precisions. \
|
||||
The :math:`l`-th value corresponds to the average precision \
|
||||
for class :math:`l`. If class :math:`l` does not exist in \
|
||||
either :obj:`pred_labels` or :obj:`gt_labels`, the corresponding \
|
||||
value is set to :obj:`numpy.nan`.
|
||||
* **map** (*float*): The average of Average Precisions over classes.
|
||||
|
||||
"""
|
||||
|
||||
prec, rec = calc_detection_voc_prec_rec(
|
||||
pred_bboxes, pred_labels, pred_scores,
|
||||
gt_bboxes, gt_labels, gt_difficults,
|
||||
iou_thresh=iou_thresh)
|
||||
|
||||
ap = calc_detection_voc_ap(prec, rec, use_07_metric=use_07_metric)
|
||||
|
||||
return {'ap': ap, 'map': np.nanmean(ap)}
|
||||
|
||||
|
||||
def calc_detection_voc_prec_rec(
|
||||
pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
|
||||
gt_difficults=None,
|
||||
iou_thresh=0.5):
|
||||
"""Calculate precision and recall based on evaluation code of PASCAL VOC.
|
||||
|
||||
This function calculates precision and recall of
|
||||
predicted bounding boxes obtained from a dataset which has :math:`N`
|
||||
images.
|
||||
The code is based on the evaluation code used in PASCAL VOC Challenge.
|
||||
|
||||
Args:
|
||||
pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`
|
||||
sets of bounding boxes.
|
||||
Its index corresponds to an index for the base dataset.
|
||||
Each element of :obj:`pred_bboxes` is a set of coordinates
|
||||
of bounding boxes. This is an array whose shape is :math:`(R, 4)`,
|
||||
where :math:`R` corresponds
|
||||
to the number of bounding boxes, which may vary among images.
|
||||
The second axis corresponds to
|
||||
:math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.
|
||||
pred_labels (iterable of numpy.ndarray): An iterable of labels.
|
||||
Similar to :obj:`pred_bboxes`, its index corresponds to an
|
||||
index for the base dataset. Its length is :math:`N`.
|
||||
pred_scores (iterable of numpy.ndarray): An iterable of confidence
|
||||
scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,
|
||||
its index corresponds to an index for the base dataset.
|
||||
Its length is :math:`N`.
|
||||
gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth
|
||||
bounding boxes
|
||||
whose length is :math:`N`. An element of :obj:`gt_bboxes` is a
|
||||
bounding box whose shape is :math:`(R, 4)`. Note that the number of
|
||||
bounding boxes in each image does not need to be same as the number
|
||||
of corresponding predicted boxes.
|
||||
gt_labels (iterable of numpy.ndarray): An iterable of ground truth
|
||||
labels which are organized similarly to :obj:`gt_bboxes`.
|
||||
gt_difficults (iterable of numpy.ndarray): An iterable of boolean
|
||||
arrays which is organized similarly to :obj:`gt_bboxes`.
|
||||
This tells whether the
|
||||
corresponding ground truth bounding box is difficult or not.
|
||||
By default, this is :obj:`None`. In that case, this function
|
||||
considers all bounding boxes to be not difficult.
|
||||
iou_thresh (float): A prediction is correct if its Intersection over
|
||||
Union with the ground truth is above this value.
|
||||
|
||||
Returns:
|
||||
tuple of two lists:
|
||||
This function returns two lists: :obj:`prec` and :obj:`rec`.
|
||||
|
||||
* :obj:`prec`: A list of arrays. :obj:`prec[l]` is precision \
|
||||
for class :math:`l`. If class :math:`l` does not exist in \
|
||||
either :obj:`pred_labels` or :obj:`gt_labels`, :obj:`prec[l]` is \
|
||||
set to :obj:`None`.
|
||||
* :obj:`rec`: A list of arrays. :obj:`rec[l]` is recall \
|
||||
for class :math:`l`. If class :math:`l` that is not marked as \
|
||||
difficult does not exist in \
|
||||
:obj:`gt_labels`, :obj:`rec[l]` is \
|
||||
set to :obj:`None`.
|
||||
|
||||
"""
|
||||
|
||||
pred_bboxes = iter(pred_bboxes)
|
||||
pred_labels = iter(pred_labels)
|
||||
pred_scores = iter(pred_scores)
|
||||
gt_bboxes = iter(gt_bboxes)
|
||||
gt_labels = iter(gt_labels)
|
||||
if gt_difficults is None:
|
||||
gt_difficults = itertools.repeat(None)
|
||||
else:
|
||||
gt_difficults = iter(gt_difficults)
|
||||
|
||||
n_pos = defaultdict(int)
|
||||
score = defaultdict(list)
|
||||
match = defaultdict(list)
|
||||
|
||||
for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \
|
||||
six.moves.zip(
|
||||
pred_bboxes, pred_labels, pred_scores,
|
||||
gt_bboxes, gt_labels, gt_difficults):
|
||||
|
||||
if gt_difficult is None:
|
||||
gt_difficult = np.zeros(gt_bbox.shape[0], dtype=bool)
|
||||
|
||||
for l in np.unique(np.concatenate((pred_label, gt_label)).astype(int)):
|
||||
pred_mask_l = pred_label == l
|
||||
pred_bbox_l = pred_bbox[pred_mask_l]
|
||||
pred_score_l = pred_score[pred_mask_l]
|
||||
# sort by score
|
||||
order = pred_score_l.argsort()[::-1]
|
||||
pred_bbox_l = pred_bbox_l[order]
|
||||
pred_score_l = pred_score_l[order]
|
||||
|
||||
gt_mask_l = gt_label == l
|
||||
gt_bbox_l = gt_bbox[gt_mask_l]
|
||||
gt_difficult_l = gt_difficult[gt_mask_l]
|
||||
|
||||
n_pos[l] += np.logical_not(gt_difficult_l).sum()
|
||||
score[l].extend(pred_score_l)
|
||||
|
||||
if len(pred_bbox_l) == 0:
|
||||
continue
|
||||
if len(gt_bbox_l) == 0:
|
||||
match[l].extend((0,) * pred_bbox_l.shape[0])
|
||||
continue
|
||||
|
||||
# VOC evaluation follows integer typed bounding boxes.
|
||||
pred_bbox_l = pred_bbox_l.copy()
|
||||
pred_bbox_l[:, 2:] += 1
|
||||
gt_bbox_l = gt_bbox_l.copy()
|
||||
gt_bbox_l[:, 2:] += 1
|
||||
|
||||
iou = bbox_iou(pred_bbox_l, gt_bbox_l)
|
||||
gt_index = iou.argmax(axis=1)
|
||||
# set -1 if there is no matching ground truth
|
||||
gt_index[iou.max(axis=1) < iou_thresh] = -1
|
||||
del iou
|
||||
|
||||
selec = np.zeros(gt_bbox_l.shape[0], dtype=bool)
|
||||
for gt_idx in gt_index:
|
||||
if gt_idx >= 0:
|
||||
if gt_difficult_l[gt_idx]:
|
||||
match[l].append(-1)
|
||||
else:
|
||||
if not selec[gt_idx]:
|
||||
match[l].append(1)
|
||||
else:
|
||||
match[l].append(0)
|
||||
selec[gt_idx] = True
|
||||
else:
|
||||
match[l].append(0)
|
||||
|
||||
for iter_ in (
|
||||
pred_bboxes, pred_labels, pred_scores,
|
||||
gt_bboxes, gt_labels, gt_difficults):
|
||||
if next(iter_, None) is not None:
|
||||
raise ValueError('Lengths of input iterables need to be the same.')
|
||||
|
||||
n_fg_class = max(n_pos.keys()) + 1
|
||||
prec = [None] * n_fg_class
|
||||
rec = [None] * n_fg_class
|
||||
|
||||
for l in n_pos.keys():
|
||||
score_l = np.array(score[l])
|
||||
match_l = np.array(match[l], dtype=np.int8)
|
||||
|
||||
order = score_l.argsort()[::-1]
|
||||
match_l = match_l[order]
|
||||
|
||||
tp = np.cumsum(match_l == 1)
|
||||
fp = np.cumsum(match_l == 0)
|
||||
|
||||
# If an element of fp + tp is 0,
|
||||
# the corresponding element of prec[l] is nan.
|
||||
prec[l] = tp / (fp + tp)
|
||||
# If n_pos[l] is 0, rec[l] is None.
|
||||
if n_pos[l] > 0:
|
||||
rec[l] = tp / n_pos[l]
|
||||
|
||||
return prec, rec
|
||||
|
||||
|
||||
def calc_detection_voc_ap(prec, rec, use_07_metric=False):
|
||||
"""Calculate average precisions based on evaluation code of PASCAL VOC.
|
||||
|
||||
This function calculates average precisions
|
||||
from given precisions and recalls.
|
||||
The code is based on the evaluation code used in PASCAL VOC Challenge.
|
||||
|
||||
Args:
|
||||
prec (list of numpy.array): A list of arrays.
|
||||
:obj:`prec[l]` indicates precision for class :math:`l`.
|
||||
If :obj:`prec[l]` is :obj:`None`, this function returns
|
||||
:obj:`numpy.nan` for class :math:`l`.
|
||||
rec (list of numpy.array): A list of arrays.
|
||||
:obj:`rec[l]` indicates recall for class :math:`l`.
|
||||
If :obj:`rec[l]` is :obj:`None`, this function returns
|
||||
:obj:`numpy.nan` for class :math:`l`.
|
||||
use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric
|
||||
for calculating average precision. The default value is
|
||||
:obj:`False`.
|
||||
|
||||
Returns:
|
||||
~numpy.ndarray:
|
||||
This function returns an array of average precisions.
|
||||
The :math:`l`-th value corresponds to the average precision
|
||||
for class :math:`l`. If :obj:`prec[l]` or :obj:`rec[l]` is
|
||||
:obj:`None`, the corresponding value is set to :obj:`numpy.nan`.
|
||||
|
||||
"""
|
||||
|
||||
n_fg_class = len(prec)
|
||||
ap = np.empty(n_fg_class)
|
||||
for l in six.moves.range(n_fg_class):
|
||||
if prec[l] is None or rec[l] is None:
|
||||
ap[l] = np.nan
|
||||
continue
|
||||
|
||||
if use_07_metric:
|
||||
# 11 point metric
|
||||
ap[l] = 0
|
||||
for t in np.arange(0., 1.1, 0.1):
|
||||
if np.sum(rec[l] >= t) == 0:
|
||||
p = 0
|
||||
else:
|
||||
p = np.max(np.nan_to_num(prec[l])[rec[l] >= t])
|
||||
ap[l] += p / 11
|
||||
else:
|
||||
# correct AP calculation
|
||||
# first append sentinel values at the end
|
||||
mpre = np.concatenate(([0], np.nan_to_num(prec[l]), [0]))
|
||||
mrec = np.concatenate(([0], rec[l], [1]))
|
||||
|
||||
mpre = np.maximum.accumulate(mpre[::-1])[::-1]
|
||||
|
||||
# to calculate area under PR curve, look for points
|
||||
# where X axis (recall) changes value
|
||||
i = np.where(mrec[1:] != mrec[:-1])[0]
|
||||
|
||||
# and sum (\Delta recall) * prec
|
||||
ap[l] = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
|
||||
|
||||
return ap
|
||||
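The two average-precision variants above can be illustrated on a tiny hand-made example. The sketch below is pure Python and assumes a single class with at least one non-difficult ground-truth box. The helper names are hypothetical, and the 11-point thresholds are generated as `k / 10` rather than with `np.arange`, so results may differ from the NumPy code in rare floating-point edge cases.

```python
def precision_recall(match, n_pos):
    """Cumulative precision/recall from score-sorted flags (1 = TP, 0 = FP, -1 = ignored)."""
    prec, rec = [], []
    tp = fp = 0
    for m in match:
        tp += (m == 1)
        fp += (m == 0)
        prec.append(tp / (tp + fp) if tp + fp else float("nan"))
        rec.append(tp / n_pos)
    return prec, rec


def ap_11_point(prec, rec):
    """PASCAL VOC 2007 metric: mean of the max precision at recall >= 0.0, 0.1, ..., 1.0."""
    ap = 0.0
    for k in range(11):
        candidates = [p for p, r in zip(prec, rec) if r >= k / 10.0]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap


def ap_area(prec, rec):
    """Area under the precision envelope of the precision-recall curve."""
    mpre = [0.0] + list(prec) + [0.0]
    mrec = [0.0] + list(rec) + [1.0]
    # Make precision non-increasing in recall by sweeping from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# Three detections sorted by score: hit, miss, hit; two ground-truth boxes.
prec, rec = precision_recall([1, 0, 1], n_pos=2)
ap07 = ap_11_point(prec, rec)
ap_full = ap_area(prec, rec)
```

On this toy input the 11-point value is slightly higher than the area-based one, which shows that the two metrics are not interchangeable when comparing reported numbers.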
@@ -0,0 +1 @@
|
||||
|
||||
@@ -0,0 +1,84 @@
|
||||
from __future__ import print_function
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
from torch.autograd import Variable
|
||||
from ..one_hot_embedding import one_hot_embedding
|
||||
|
||||
|
||||
class FocalLoss(nn.Module):
|
||||
def __init__(self, num_classes):
|
||||
super(FocalLoss, self).__init__()
|
||||
self.num_classes = num_classes
|
||||
|
||||
def _focal_loss(self, x, y):
|
||||
'''Focal loss.
|
||||
|
||||
This is described in the original paper.
|
||||
With BCELoss, the background should not be counted in num_classes.
|
||||
|
||||
Args:
|
||||
x: (tensor) predictions, sized [N,D].
|
||||
y: (tensor) targets, sized [N,].
|
||||
|
||||
Return:
|
||||
(tensor) focal loss.
|
||||
'''
|
||||
alpha = 0.25 # balance param
|
||||
gamma = 2 # focus param
|
||||
size_average = False
|
||||
|
||||
t = one_hot_embedding(y, self.num_classes) # y-1
|
||||
p = x.sigmoid()
|
||||
pt = torch.where(t > 0, p, 1 - p) # pt = p if t > 0 else 1-p
|
||||
w = (1 - pt).pow(gamma)
|
||||
w = torch.where(t > 0, alpha * w, (1 - alpha) * w)
|
||||
loss = F.binary_cross_entropy_with_logits(x, t, w, size_average=size_average)
|
||||
|
||||
# according to https://github.com/c0nn3r/RetinaNet/blob/master/focal_loss.py
|
||||
# logpt = - F.cross_entropy(x, y)
|
||||
# pt = torch.exp(logpt)
|
||||
# focal_loss = -((1 - pt) ** gamma) * logpt
|
||||
# loss = alpha * focal_loss
|
||||
# averaging (or not) loss
|
||||
# if size_average:
|
||||
# loss = loss.mean()
|
||||
# else:
|
||||
# loss = loss.sum()
|
||||
return loss
|
||||
|
||||
def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
|
||||
'''Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
|
||||
|
||||
Args:
|
||||
loc_preds: (tensor) predicted locations, sized [batch_size, #anchors, 4].
|
||||
loc_targets: (tensor) encoded target locations, sized [batch_size, #anchors, 4].
|
||||
cls_preds: (tensor) predicted class confidences, sized [batch_size, #anchors, #classes].
|
||||
cls_targets: (tensor) encoded target labels, sized [batch_size, #anchors].
|
||||
|
||||
loss:
|
||||
(tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + FocalLoss(cls_preds, cls_targets).
|
||||
'''
|
||||
batch_size, num_boxes = cls_targets.size()
|
||||
pos = cls_targets > 0 # [N,#anchors]
|
||||
num_pos = pos.sum().item()
|
||||
|
||||
# ===============================================================
|
||||
# loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
|
||||
# ===============================================================
|
||||
mask = pos.unsqueeze(2).expand_as(loc_preds) # [N,#anchors,4]
|
||||
loc_loss = F.smooth_l1_loss(loc_preds[mask], loc_targets[mask], size_average=False)
|
||||
|
||||
# ===============================================================
|
||||
# cls_loss = FocalLoss(cls_preds, cls_targets)
|
||||
# ===============================================================
|
||||
pos_neg = cls_targets > -1 # exclude ignored anchors
|
||||
mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
|
||||
masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)
|
||||
cls_loss = self._focal_loss(masked_cls_preds, cls_targets[pos_neg])
|
||||
|
||||
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item() / num_pos, cls_loss.item() / num_pos), end=' | ')
|
||||
loss = (loc_loss + cls_loss) / num_pos
|
||||
return loss
|
||||
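The weighting used in `_focal_loss` can be checked by hand for a single logit. The sketch below is a scalar, pure-Python restatement under the same `alpha` and `gamma` defaults; `focal_loss_term` is a hypothetical name and the function handles one logit and one binary target at a time, not a batch of one-hot rows.

```python
import math


def focal_loss_term(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one logit and one binary target (1 = positive, 0 = negative)."""
    p = 1.0 / (1.0 + math.exp(-logit))      # sigmoid
    pt = p if target > 0 else 1.0 - p       # probability assigned to the true outcome
    weight = (alpha if target > 0 else 1.0 - alpha) * (1.0 - pt) ** gamma
    return -weight * math.log(pt)           # weighted binary cross-entropy


uncertain_positive = focal_loss_term(0.0, 1)    # p = 0.5
easy_negative = focal_loss_term(-4.0, 0)        # confidently rejected background
hard_negative = focal_loss_term(4.0, 0)         # confidently wrong
```

The `(1 - pt) ** gamma` factor shrinks the contribution of well-classified entries, so the easy negative contributes far less than the hard one.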
@@ -0,0 +1,71 @@
|
||||
from __future__ import print_function
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
|
||||
class SSDLoss(nn.Module):
|
||||
def __init__(self, num_classes):
|
||||
super(SSDLoss, self).__init__()
|
||||
self.num_classes = num_classes
|
||||
|
||||
def _hard_negative_mining(self, cls_loss, pos):
|
||||
'''Return a mask selecting 3x as many negatives as there are positives.
|
||||
|
||||
Args:
|
||||
cls_loss: (tensor) cross entropy loss between cls_preds and cls_targets, sized [N,#anchors].
|
||||
pos: (tensor) positive class mask, sized [N,#anchors].
|
||||
|
||||
Return:
|
||||
(tensor) negative indices, sized [N,#anchors].
|
||||
'''
|
||||
cls_loss = cls_loss * (pos.float() - 1)
|
||||
|
||||
_, idx = cls_loss.sort(1) # sort by negative losses
|
||||
_, rank = idx.sort(1) # [N,#anchors]
|
||||
|
||||
num_neg = 3*pos.sum(1) # [N,]
|
||||
neg = rank < num_neg[:,None] # [N,#anchors]
|
||||
return neg
|
||||
|
||||
def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
|
||||
'''Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
|
||||
|
||||
Args:
|
||||
loc_preds: (tensor) predicted locations, sized [N, #anchors, 4].
|
||||
loc_targets: (tensor) encoded target locations, sized [N, #anchors, 4].
|
||||
cls_preds: (tensor) predicted class confidences, sized [N, #anchors, #classes].
|
||||
cls_targets: (tensor) encoded target labels, sized [N, #anchors].
|
||||
|
||||
loss:
|
||||
(tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + CrossEntropyLoss(cls_preds, cls_targets).
|
||||
'''
|
||||
pos = cls_targets > 0 # [N,#anchors]
|
||||
batch_size = pos.size(0)
|
||||
num_pos = pos.sum().item()
|
||||
|
||||
#===============================================================
|
||||
# loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
|
||||
#===============================================================
|
||||
mask = pos.unsqueeze(2).expand_as(loc_preds) # [N,#anchors,4]
|
||||
loc_loss = F.smooth_l1_loss(loc_preds[mask], loc_targets[mask], size_average=False)
|
||||
|
||||
#===============================================================
|
||||
# cls_loss = CrossEntropyLoss(cls_preds, cls_targets)
|
||||
#===============================================================
|
||||
# TD: added clamp, because cross entropy does not handle negative indices well
|
||||
cls_loss = F.cross_entropy(cls_preds.view(-1,self.num_classes), \
|
||||
cls_targets.clamp(min=0).view(-1), reduce=False) # [N*#anchors,]
|
||||
cls_loss = cls_loss.view(batch_size, -1)
|
||||
cls_loss[cls_targets < 0] = 0 # set ignored loss to 0
|
||||
|
||||
neg = self._hard_negative_mining(cls_loss, pos) # [N,#anchors]
|
||||
cls_loss = cls_loss[pos|neg].sum()
|
||||
if num_pos > 0: # TD mod to prevent div by zero
|
||||
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
|
||||
loss = (loc_loss+cls_loss)/num_pos
|
||||
else:
|
||||
print('num_pos zero exception')
|
||||
loss = (loc_loss+cls_loss)/1.
|
||||
return loss
|
||||
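The double sort in `_hard_negative_mining` turns per-anchor losses into ranks, so that the negatives with the largest loss receive the smallest ranks. A pure-Python sketch for a single image follows; `hard_negative_mask` is a hypothetical name and the input values are made up for illustration.

```python
def hard_negative_mask(cls_loss, pos, ratio=3):
    """Select the `ratio` x (#positives) negatives with the highest loss."""
    # Positives become 0 and negatives become -loss, so an ascending sort
    # places the hardest negatives first.
    keyed = [loss * (float(p) - 1.0) for loss, p in zip(cls_loss, pos)]
    order = sorted(range(len(keyed)), key=lambda i: keyed[i])
    rank = [0] * len(keyed)
    for r, i in enumerate(order):
        rank[i] = r                      # rank of anchor i in the sorted order
    num_neg = ratio * sum(pos)
    return [r < num_neg for r in rank]


losses = [0.2, 2.0, 0.1, 1.5, 0.05, 0.9, 0.3]
positives = [True, False, False, False, False, False, False]
neg_mask = hard_negative_mask(losses, positives)
```

With one positive anchor, the three negatives with losses 2.0, 1.5 and 0.9 are kept, and the remaining negatives are excluded from the classification loss.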
@@ -0,0 +1,38 @@
|
||||
import torch
|
||||
|
||||
|
||||
def meshgrid(x, y, row_major=True):
|
||||
'''Return meshgrid in range x & y.
|
||||
|
||||
Args:
|
||||
x: (int) first dim range.
|
||||
y: (int) second dim range.
|
||||
row_major: (bool) row major or column major.
|
||||
|
||||
Returns:
|
||||
(tensor) meshgrid, sized [x*y,2]
|
||||
|
||||
Example:
|
||||
>> meshgrid(3,2)
|
||||
0 0
|
||||
1 0
|
||||
2 0
|
||||
0 1
|
||||
1 1
|
||||
2 1
|
||||
[torch.FloatTensor of size 6x2]
|
||||
|
||||
>> meshgrid(3,2,row_major=False)
|
||||
0 0
|
||||
0 1
|
||||
0 2
|
||||
1 0
|
||||
1 1
|
||||
1 2
|
||||
[torch.FloatTensor of size 6x2]
|
||||
'''
|
||||
a = torch.arange(0, x, dtype=torch.float) # TD: make it float (0.4.1)
|
||||
b = torch.arange(0, y, dtype=torch.float) # TD: make it float (0.4.1)
|
||||
xx = a.repeat(y).view(-1, 1)
|
||||
yy = b.view(-1, 1).repeat(1, x).view(-1, 1)
|
||||
return torch.cat([xx, yy], 1) if row_major else torch.cat([yy, xx], 1)
|
||||
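As a cross-check of the docstring examples, the same grid can be built without PyTorch. The sketch below uses a hypothetical helper `meshgrid_list` that returns a list of tuples instead of a tensor. The stride of 16 in the usage line is an arbitrary illustrative value for turning cell indices into cell centres, in the way the box coder adds 0.5 and multiplies by the grid size.

```python
def meshgrid_list(x, y, row_major=True):
    """Return all (x, y) index pairs of an x-by-y grid, x varying fastest."""
    pairs = [(float(i), float(j)) for j in range(y) for i in range(x)]
    return pairs if row_major else [(b, a) for a, b in pairs]


grid = meshgrid_list(3, 2)
grid_swapped = meshgrid_list(3, 2, row_major=False)
# Cell centres in input pixels for an assumed stride of 16.
centers = [((cx + 0.5) * 16, (cy + 0.5) * 16) for cx, cy in grid]
```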
@@ -0,0 +1,53 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
|
||||
# this should import a specific architecture for cuneiform sign detection
|
||||
|
||||
|
||||
class FPNSSD(nn.Module):
|
||||
num_anchors = 12
|
||||
|
||||
def __init__(self, fpn_model, num_classes):
|
||||
super(FPNSSD, self).__init__()
|
||||
self.fpn = fpn_model
|
||||
self.num_classes = num_classes
|
||||
self.loc_head = self._make_head(self.num_anchors * 4)
|
||||
self.cls_head = self._make_head(self.num_anchors * self.num_classes)
|
||||
|
||||
def forward(self, x):
|
||||
loc_preds = []
|
||||
cls_preds = []
|
||||
fms = self.fpn(x)
|
||||
for fm in fms:
|
||||
loc_pred = self.loc_head(fm)
|
||||
cls_pred = self.cls_head(fm)
|
||||
loc_pred = loc_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, 4)  # [N,A*4,H,W] -> [N,H,W,A*4] -> [N,H*W*A,4], A = num_anchors
cls_pred = cls_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, self.num_classes)  # [N,A*NC,H,W] -> [N,H,W,A*NC] -> [N,H*W*A,NC]
|
||||
loc_preds.append(loc_pred)
|
||||
cls_preds.append(cls_pred)
|
||||
return torch.cat(loc_preds, 1), torch.cat(cls_preds, 1)
|
||||
|
||||
def _make_head(self, out_planes):
|
||||
layers = []
|
||||
for _ in range(4):
|
||||
layers.append(nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1))
|
||||
layers.append(nn.ReLU(True))
|
||||
layers.append(nn.Conv2d(256, out_planes, kernel_size=3, stride=1, padding=1))
|
||||
return nn.Sequential(*layers)
|
||||
|
||||
def freeze_bn(self):
|
||||
'''Freeze BatchNorm layers.'''
|
||||
for layer in self.modules():
|
||||
if isinstance(layer, nn.BatchNorm2d):
|
||||
layer.eval()
|
||||
|
||||
|
||||
# def test():
|
||||
# net = FPNSSD(21)
|
||||
# loc_preds, cls_preds = net(torch.randn(1, 3, 512, 512))
|
||||
# print(loc_preds.size(), cls_preds.size())
|
||||
|
||||
# test()
|
||||
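The number of rows returned by `forward` has to equal the number of anchors generated by the box coder. The sketch below counts them for a given input size, assuming the FPN returns one feature map per level with strides 2^level and that feature-map sizes are rounded up, as in the box coder in this commit. `count_anchors` is a hypothetical helper.

```python
import math


def count_anchors(input_size, first_level, num_levels, anchors_per_cell):
    """Total anchors over `num_levels` pyramid levels starting at `first_level`."""
    w, h = input_size
    total = 0
    for i in range(num_levels):
        stride = 2 ** (first_level + i)
        fm_w, fm_h = math.ceil(w / stride), math.ceil(h / stride)
        total += fm_w * fm_h * anchors_per_cell
    return total


# p4 -> p6 (three levels) and p5 -> p6 (two levels) with 12 anchors per cell.
anchors_p4_p6 = count_anchors((512, 512), first_level=4, num_levels=3, anchors_per_cell=12)
anchors_p5_p6 = count_anchors((512, 512), first_level=5, num_levels=2, anchors_per_cell=12)
```

For a 512 x 512 input this gives (32*32 + 16*16 + 8*8) * 12 anchors in the three-level case.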
@@ -0,0 +1,54 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
|
||||
# this should import a specific architecture for cuneiform sign detection
|
||||
|
||||
|
||||
class RPN(nn.Module):
|
||||
num_anchors = 12
|
||||
|
||||
def __init__(self, fpn_model, num_classes, with_64):
|
||||
super(RPN, self).__init__()
|
||||
self.fpn = fpn_model
|
||||
self.num_classes = num_classes
|
||||
self.with_p4 = int(with_64)
|
||||
self.loc_head = self._make_head(self.num_anchors * 4)
|
||||
self.cls_head = self._make_head(self.num_anchors * self.num_classes)
|
||||
|
||||
def forward(self, x):
|
||||
loc_preds = []
|
||||
cls_preds = []
|
||||
fms = self.fpn(x)
|
||||
for fm in fms:
|
||||
loc_pred = self.loc_head(fm)
|
||||
cls_pred = self.cls_head(fm)
|
||||
loc_pred = loc_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, 4)  # [N,A*4,H,W] -> [N,H,W,A*4] -> [N,H*W*A,4], A = num_anchors
cls_pred = cls_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, self.num_classes)  # [N,A*NC,H,W] -> [N,H,W,A*NC] -> [N,H*W*A,NC]
|
||||
loc_preds.append(loc_pred)
|
||||
cls_preds.append(cls_pred)
|
||||
return torch.cat(loc_preds, 1), torch.cat(cls_preds, 1), fms[self.with_p4]
|
||||
|
||||
def _make_head(self, out_planes):
|
||||
layers = []
|
||||
for _ in range(4):
|
||||
layers.append(nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1))
|
||||
layers.append(nn.ReLU(True))
|
||||
layers.append(nn.Conv2d(256, out_planes, kernel_size=3, stride=1, padding=1))
|
||||
return nn.Sequential(*layers)
|
||||
|
||||
def freeze_bn(self):
|
||||
'''Freeze BatchNorm layers.'''
|
||||
for layer in self.modules():
|
||||
if isinstance(layer, nn.BatchNorm2d):
|
||||
layer.eval()
|
||||
|
||||
|
||||
# def test():
|
||||
# net = FPNSSD(21)
|
||||
# loc_preds, cls_preds = net(torch.randn(1, 3, 512, 512))
|
||||
# print(loc_preds.size(), cls_preds.size())
|
||||
|
||||
# test()
|
||||
@@ -0,0 +1,15 @@
|
||||
import torch
|
||||
|
||||
|
||||
def one_hot_embedding(labels, num_classes):
|
||||
'''Embedding labels to one-hot.
|
||||
|
||||
Args:
|
||||
labels: (LongTensor) class labels, sized [N,].
|
||||
num_classes: (int) number of classes.
|
||||
|
||||
Returns:
|
||||
(tensor) encoded labels, sized [N,#classes].
|
||||
'''
|
||||
y = torch.eye(num_classes, device=labels.device) # [D,D]
|
||||
return y[labels] # [N,D]
|
||||
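Indexing an identity matrix with the label vector, as done above, selects one row per label. A list-based sketch of the same result, with the hypothetical name `one_hot` and non-negative labels assumed:

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    return [[1.0 if c == label else 0.0 for c in range(num_classes)]
            for label in labels]


encoded = one_hot([0, 2], num_classes=3)
```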
@@ -0,0 +1,22 @@
|
||||
import torch
|
||||
|
||||
|
||||
def center_crop(img, boxes, size):
|
||||
'''Crops the given PIL Image at the center.
|
||||
Args:
|
||||
img: (PIL.Image) image to be cropped.
|
||||
boxes: (tensor) object boxes, sized [#obj,4].
|
||||
size (tuple): desired output size of (w,h).
|
||||
Returns:
|
||||
img: (PIL.Image) center cropped image.
|
||||
boxes: (tensor) center cropped boxes.
|
||||
'''
|
||||
w, h = img.size
|
||||
ow, oh = size
|
||||
i = int(round((h - oh) / 2.))
|
||||
j = int(round((w - ow) / 2.))
|
||||
img = img.crop((j, i, j + ow, i + oh))
|
||||
boxes -= torch.Tensor([j, i, j, i])
|
||||
boxes[:, 0::2].clamp_(min=0, max=ow - 1)  # in-place; clamp() alone would discard the result
boxes[:, 1::2].clamp_(min=0, max=oh - 1)
|
||||
return img, boxes
|
||||
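The box arithmetic behind `center_crop` can be illustrated with plain Python numbers: compute the crop offset, shift every box by it, then clamp to the crop. The sketch below assumes boxes are `[xmin, ymin, xmax, ymax]` lists; `center_crop_boxes` is a hypothetical helper, not the repository's implementation.

```python
def center_crop_boxes(boxes, img_size, crop_size):
    """Shift and clamp [xmin, ymin, xmax, ymax] boxes for a centered crop."""
    w, h = img_size
    ow, oh = crop_size
    # Offset of the crop window inside the original image.
    top = int(round((h - oh) / 2.))
    left = int(round((w - ow) / 2.))
    cropped = []
    for xmin, ymin, xmax, ymax in boxes:
        cropped.append([
            min(max(xmin - left, 0), ow - 1),  # x coordinates clamp to [0, ow - 1]
            min(max(ymin - top, 0), oh - 1),   # y coordinates clamp to [0, oh - 1]
            min(max(xmax - left, 0), ow - 1),
            min(max(ymax - top, 0), oh - 1),
        ])
    return cropped


result = center_crop_boxes([[30, 30, 60, 50], [0, 0, 90, 70]], (100, 80), (50, 40))
print(result)
```

The second box is larger than the crop, so after shifting it is clamped to the full crop extent.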
@@ -0,0 +1,25 @@
import math
import torch
import random

from PIL import Image
from ..box import box_iou, box_clamp


def crop_box(img, boxes, labels, box):
    '''Crop img to the tile `box` (x, y, x2, y2) and keep the boxes centered inside it.'''
    x, y, x2, y2 = box
    w = x2 - x
    h = y2 - y
    img = img.crop((x, y, x2, y2))
    # keep a box only if its center is still inside the tile box
    # (if the center is outside the tile, IoU >= 0.5 is impossible --> treated as background anyway)
    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
    if mask.any():
        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
        boxes = box_clamp(boxes, 0, 0, w, h)
        labels = labels[mask]
    else:
        # no box survives: return a single dummy box with background label 0
        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
        labels = torch.tensor([0], dtype=torch.long)
    return img, boxes, labels
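The center test in `crop_box` can be sketched with plain lists: a box is kept only when its center falls inside the tile, and kept boxes are shifted into tile coordinates and clamped. The helper name `crop_boxes_to_tile` is hypothetical, and the clamping to `[0, w]` and `[0, h]` is an assumption about what `box_clamp(boxes, 0, 0, w, h)` does, since that function is not shown here.

```python
def crop_boxes_to_tile(boxes, tile):
    """Keep boxes whose center lies in `tile`, expressed in tile coordinates."""
    x, y, x2, y2 = tile
    w, h = x2 - x, y2 - y
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        if x <= cx <= x2 and y <= cy <= y2:
            # Shift to the tile origin, then clamp (assumed box_clamp behavior).
            kept.append([
                min(max(xmin - x, 0), w),
                min(max(ymin - y, 0), h),
                min(max(xmax - x, 0), w),
                min(max(ymax - y, 0), h),
            ])
    return kept


tile = (10, 10, 50, 50)
boxes = [[0, 0, 30, 30], [60, 60, 80, 80], [20, 20, 45, 60]]
kept = crop_boxes_to_tile(boxes, tile)
print(kept)
```

The middle box is dropped because its center lies outside the tile; the other two are cut at the tile border.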
@@ -0,0 +1,23 @@
import torch
import random

from PIL import Image


def pad(img, target_size):
    '''Pad image with zeros to the specified size.

    Args:
      img: (PIL.Image) image to be padded.
      target_size: (tuple) target size of (ow,oh).

    Returns:
      img: (PIL.Image) padded image.

    Reference:
      `tf.image.pad_to_bounding_box`
    '''
    w, h = img.size
    canvas = Image.new('RGB', target_size)
    canvas.paste(img, (0, 0))  # paste at the top-left corner
    return canvas
@@ -0,0 +1,23 @@
import torch
import random

from PIL import Image


def pad(img, target_size):
    '''Pad image with zeros to the specified size.

    Args:
      img: (PIL.Image) image to be padded.
      target_size: (tuple) target size of (ow,oh).

    Returns:
      img: (PIL.Image) padded image.

    Reference:
      `tf.image.pad_to_bounding_box`
    '''
    w, h = img.size
    canvas = Image.new('L', target_size)
    canvas.paste(img, (0, 0))  # paste at the top-left corner
    return canvas
@@ -0,0 +1,64 @@
'''This random crop strategy is described in paper:
   [1] SSD: Single Shot MultiBox Detector
'''
import math
import torch
import random

from PIL import Image
# from torchcv.utils.box import box_iou, box_clamp
from ..box import box_iou, box_clamp


def random_crop(
        img, boxes, labels,
        min_scale=0.3,
        max_aspect_ratio=2.):
    '''Randomly crop a PIL image.

    Args:
      img: (PIL.Image) image.
      boxes: (tensor) bounding boxes, sized [#obj, 4].
      labels: (tensor) bounding box labels, sized [#obj,].
      min_scale: (float) minimal image width/height scale.
      max_aspect_ratio: (float) maximum width/height aspect ratio.

    Returns:
      img: (PIL.Image) cropped image.
      boxes: (tensor) object boxes.
      labels: (tensor) object labels.
    '''
    imw, imh = img.size
    # candidate crop ROIs as (x, y, w, h); the full image is always a candidate
    params = [(0, 0, imw, imh)]
    for min_iou in (0, 0.1, 0.3, 0.5, 0.7, 0.9):
        for _ in range(100):
            scale = random.uniform(min_scale, 1)
            aspect_ratio = random.uniform(
                max(1 / max_aspect_ratio, scale * scale),
                min(max_aspect_ratio, 1 / (scale * scale)))
            w = int(imw * scale * math.sqrt(aspect_ratio))
            h = int(imh * scale / math.sqrt(aspect_ratio))

            # +1 keeps the range non-empty when the crop spans the full width or height
            x = random.randrange(imw - w + 1)
            y = random.randrange(imh - h + 1)

            roi = torch.tensor([[x, y, x + w, y + h]], dtype=torch.float)
            ious = box_iou(boxes, roi)
            if ious.min() >= min_iou:
                params.append((x, y, w, h))
                break

    x, y, w, h = random.choice(params)
    img = img.crop((x, y, x + w, y + h))

    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
        & (center[:, 1] >= y) & (center[:, 1] <= y + h)
    if mask.any():
        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
        boxes = box_clamp(boxes, 0, 0, w, h)
        labels = labels[mask]
    else:
        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
        labels = torch.tensor([0], dtype=torch.long)
    return img, boxes, labels
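How the crop size follows from `scale` and `aspect_ratio` in `random_crop` can be checked with a small deterministic sketch. The two helper names below are hypothetical; they only restate the formulas from the function above with fixed inputs instead of random draws.

```python
import math


def aspect_ratio_bounds(scale, max_aspect_ratio=2.0):
    """Interval from which the aspect ratio is sampled for a given scale."""
    low = max(1 / max_aspect_ratio, scale * scale)
    high = min(max_aspect_ratio, 1 / (scale * scale))
    return low, high


def crop_size(imw, imh, scale, aspect_ratio):
    """Crop width and height for an image of size (imw, imh)."""
    w = int(imw * scale * math.sqrt(aspect_ratio))
    h = int(imh * scale / math.sqrt(aspect_ratio))
    return w, h


print(aspect_ratio_bounds(0.5))        # bounds for a half-scale crop
print(crop_size(100, 80, 0.5, 1.0))    # square-ish crop at half scale
print(crop_size(100, 80, 1.0, 1.0))    # scale 1 reproduces the full image
```

Because the aspect ratio is bounded by `scale * scale` and `1 / (scale * scale)`, neither side of the crop can exceed the image. At `scale == 1` the interval collapses to a single value and the crop equals the whole image, so the only valid offset is 0.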
@@ -0,0 +1,55 @@
'''This random crop strategy is described in paper:
   [1] SSD: Single Shot MultiBox Detector
'''
import math
import torch
import random

from PIL import Image
# from torchcv.utils.box import box_iou, box_clamp
from ..box import box_iou, box_clamp


def random_crop_tile(
        img, boxes, labels,
        scale_range=(0.8, 1),
        max_aspect_ratio=2.):
    '''Randomly crop a PIL image.

    Args:
      img: (PIL.Image) image.
      boxes: (tensor) bounding boxes, sized [#obj, 4].
      labels: (tensor) bounding box labels, sized [#obj,].
      scale_range: (float, float) minimum and maximum image width/height scale.
      max_aspect_ratio: (float) maximum width/height aspect ratio.

    Returns:
      img: (PIL.Image) cropped image.
      boxes: (tensor) object boxes.
      labels: (tensor) object labels.
    '''
    imw, imh = img.size

    scale = random.uniform(scale_range[0], scale_range[1])
    aspect_ratio = random.uniform(
        max(1 / max_aspect_ratio, scale * scale),
        min(max_aspect_ratio, 1 / (scale * scale)))
    w = int(imw * scale * math.sqrt(aspect_ratio))
    h = int(imh * scale / math.sqrt(aspect_ratio))

    # +1 keeps the range non-empty when the crop spans the full width or height
    x = random.randrange(imw - w + 1)
    y = random.randrange(imh - h + 1)

    img = img.crop((x, y, x + w, y + h))

    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
        & (center[:, 1] >= y) & (center[:, 1] <= y + h)
    if mask.any():
        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
        boxes = box_clamp(boxes, 0, 0, w, h)
        labels = labels[mask]
    else:
        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
        labels = torch.tensor([0], dtype=torch.long)
    return img, boxes, labels
@@ -0,0 +1,56 @@
import torch
import random
import torchvision.transforms as transforms

from PIL import Image


def random_distort(
        img,
        brightness_delta=32 / 255.,
        contrast_delta=0.5,
        saturation_delta=0.5,
        hue_delta=0.1):
    '''A color related data augmentation used in SSD.

    Args:
      img: (PIL.Image) image to be color augmented.
      brightness_delta: (float) shift of brightness, range from [1-delta,1+delta].
      contrast_delta: (float) shift of contrast, range from [1-delta,1+delta].
      saturation_delta: (float) shift of saturation, range from [1-delta,1+delta].
      hue_delta: (float) shift of hue, range from [-delta,delta].

    Returns:
      img: (PIL.Image) color augmented image.
    '''

    def brightness(img, delta):
        if random.random() < 0.5:
            img = transforms.ColorJitter(brightness=delta)(img)
        return img

    def contrast(img, delta):
        if random.random() < 0.5:
            img = transforms.ColorJitter(contrast=delta)(img)
        return img

    def saturation(img, delta):
        if random.random() < 0.5:
            img = transforms.ColorJitter(saturation=delta)(img)
        return img

    def hue(img, delta):
        if random.random() < 0.5:
            img = transforms.ColorJitter(hue=delta)(img)
        return img

    img = brightness(img, brightness_delta)
    if random.random() < 0.5:
        img = contrast(img, contrast_delta)
        img = saturation(img, saturation_delta)
        img = hue(img, hue_delta)
    else:
        img = saturation(img, saturation_delta)
        img = hue(img, hue_delta)
        img = contrast(img, contrast_delta)
    return img
@@ -0,0 +1,28 @@
import torch
import random

from PIL import Image


def random_flip(img, boxes):
    '''Randomly flip PIL image.

    If boxes is not None, flip boxes accordingly.

    Args:
      img: (PIL.Image) image to be flipped.
      boxes: (tensor) object boxes, sized [#obj,4].

    Returns:
      img: (PIL.Image) randomly flipped image.
      boxes: (tensor) randomly flipped boxes.
    '''
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
        w = img.width
        if boxes is not None:
            xmin = w - boxes[:, 2]
            xmax = w - boxes[:, 0]
            boxes[:, 0] = xmin
            boxes[:, 2] = xmax
    return img, boxes
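The box update in `random_flip` mirrors the x coordinates around the image width and swaps the roles of `xmin` and `xmax`. A minimal list-based sketch, with the hypothetical helper `flip_boxes_horizontally` standing in for the tensor code:

```python
def flip_boxes_horizontally(boxes, width):
    """Mirror [xmin, ymin, xmax, ymax] boxes for a left-right image flip."""
    # The old right edge becomes the new left edge, and vice versa.
    return [[width - xmax, ymin, width - xmin, ymax]
            for xmin, ymin, xmax, ymax in boxes]


boxes = [[10, 20, 30, 40]]
flipped = flip_boxes_horizontally(boxes, width=100)
print(flipped)
# Flipping twice restores the original boxes.
print(flip_boxes_horizontally(flipped, width=100))
```

The y coordinates are untouched, and the box keeps its width because both edges are mirrored around the same axis.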
@@ -0,0 +1,33 @@
import torch
import random

from PIL import Image


def random_paste(img, boxes, max_ratio=4, fill=0):
    '''Randomly paste the input image on a larger canvas.

    If boxes is not None, adjust boxes accordingly.

    Args:
      img: (PIL.Image) image to be pasted.
      boxes: (tensor) object boxes, sized [#obj,4].
      max_ratio: (int) maximum ratio of expansion.
      fill: (tuple) the RGB value to fill the canvas.

    Returns:
      canvas: (PIL.Image) canvas with image pasted.
      boxes: (tensor) adjusted object boxes.
    '''
    w, h = img.size
    ratio = random.uniform(1, max_ratio)
    ow, oh = int(w * ratio), int(h * ratio)
    canvas = Image.new('RGB', (ow, oh), fill)

    x = random.randint(0, ow - w)
    y = random.randint(0, oh - h)
    canvas.paste(img, (x, y))

    if boxes is not None:
        boxes = boxes + torch.tensor([x, y, x, y], dtype=torch.float)
    return canvas, boxes
@@ -0,0 +1,60 @@
import torch
import random

from PIL import Image


def resize(img, boxes, size, max_size=1000, scale=None, random_interpolation=False):
    '''Resize the input PIL image to given size.

    If boxes is not None, resize boxes accordingly.

    Args:
      img: (PIL.Image) image to be resized.
      boxes: (tensor) object boxes, sized [#obj,4].
      size: (tuple or int)
        - if is tuple, resize image to the size.
        - if is int, resize the shorter side to the size while maintaining the aspect ratio.
      max_size: (int) when size is int, limit the image longer size to max_size.
                This is essential to limit the usage of GPU memory.
      scale: (float) if given, size is ignored and both sides are scaled by this factor.
      random_interpolation: (bool) randomly choose a resize interpolation method.

    Returns:
      img: (PIL.Image) resized image.
      boxes: (tensor) resized boxes.

    Example:
    >> img, boxes = resize(img, boxes, 600)  # resize shorter side to 600
    >> img, boxes = resize(img, boxes, (500,600))  # resize image size to (500,600)
    >> img, _ = resize(img, None, (500,600))  # resize image only
    '''
    w, h = img.size
    if scale is None:
        if isinstance(size, int):
            size_min = min(w, h)
            size_max = max(w, h)
            sw = sh = float(size) / size_min
            if sw * size_max > max_size:
                sw = sh = float(max_size) / size_max
            ow = int(w * sw + 0.5)
            oh = int(h * sh + 0.5)
        else:
            ow, oh = size
            sw = float(ow) / w
            sh = float(oh) / h
    else:
        ow = int(w * scale)
        oh = int(h * scale)
        sw, sh = scale, scale

    method = random.choice([
        Image.BOX,
        Image.NEAREST,
        Image.HAMMING,
        Image.BICUBIC,
        Image.LANCZOS,
        Image.BILINEAR]) if random_interpolation else Image.BILINEAR
    img = img.resize((ow, oh), method)
    if boxes is not None:
        boxes = boxes * torch.tensor([sw, sh, sw, sh])
    return img, boxes
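The integer-`size` branch of `resize` scales the shorter side to `size` unless that would push the longer side past `max_size`. The following sketch isolates that arithmetic; `shorter_side_scale` is a hypothetical helper name used only for illustration.

```python
def shorter_side_scale(w, h, size, max_size=1000):
    """Return (scale, out_w, out_h) for resizing the shorter side to `size`."""
    size_min, size_max = min(w, h), max(w, h)
    s = float(size) / size_min
    # Fall back to fitting the longer side if it would exceed max_size.
    if s * size_max > max_size:
        s = float(max_size) / size_max
    # Adding 0.5 before truncation rounds to the nearest integer.
    return s, int(w * s + 0.5), int(h * s + 0.5)


print(shorter_side_scale(300, 200, 400))   # shorter side reaches 400
print(shorter_side_scale(400, 200, 600))   # capped by max_size=1000
```

In the second call the requested scale of 3.0 would give a 1200-pixel longer side, so the scale is reduced to 2.5 and the shorter side ends up at 500 instead of 600.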
@@ -0,0 +1,36 @@
import torch
import random

from PIL import Image


def scale_jitter(img, boxes, sizes, max_size=1400):
    '''Randomly scale image shorter side to one of the sizes.

    If boxes is not None, resize boxes accordingly.

    Args:
      img: (PIL.Image) image to be resized.
      boxes: (tensor) object boxes, sized [#obj,4].
      sizes: (tuple) scale sizes.
      max_size: (int) limit the image longer size to max_size.

    Returns:
      img: (PIL.Image) resized image.
      boxes: (tensor) resized boxes.
    '''
    w, h = img.size
    size_min = min(w, h)
    size_max = max(w, h)
    size = random.choice(sizes)
    sw = sh = float(size) / size_min
    if sw * size_max > max_size:
        sw = sh = float(max_size) / size_max

    ow = int(w * sw + 0.5)
    oh = int(h * sh + 0.5)
    img = img.resize((ow, oh), Image.BILINEAR)

    if boxes is not None:
        boxes = boxes * torch.tensor([sw, sh, sw, sh])
    return img, boxes
@@ -0,0 +1,27 @@
import math
import torch
import random

from PIL import Image
from ..box import box_iou, box_clamp


def crop_box_lm(img, boxes, labels, linemap, box):
    '''Like crop_box, but also crops the line map to the same tile `box` (x, y, x2, y2).'''
    x, y, x2, y2 = box
    w = x2 - x
    h = y2 - y
    img = img.crop((x, y, x2, y2))
    linemap = linemap.crop((x, y, x2, y2))

    # keep a box only if its center is still inside the tile box
    # (if the center is outside the tile, IoU >= 0.5 is impossible --> treated as background anyway)
    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
    if mask.any():
        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
        boxes = box_clamp(boxes, 0, 0, w, h)
        labels = labels[mask]
    else:
        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
        labels = torch.tensor([0], dtype=torch.long)
    return img, boxes, labels, linemap
@@ -0,0 +1,26 @@
import torch
import random

from PIL import Image


def pad_lm(img, linemap, target_size):
    '''Pad image and line map with zeros to the specified size.

    Args:
      img: (PIL.Image) image to be padded.
      linemap: (PIL.Image) line map to be padded alongside the image.
      target_size: (tuple) target size of (ow,oh).

    Returns:
      canvas: (PIL.Image) padded image (mode 'L').
      canvas_line: (PIL.Image) padded line map (mode '1').

    Reference:
      `tf.image.pad_to_bounding_box`
    '''
    w, h = img.size
    canvas = Image.new('L', target_size)
    canvas.paste(img, (0, 0))  # paste at the top-left corner

    canvas_line = Image.new('1', target_size)
    canvas_line.paste(linemap, (0, 0))  # paste at the top-left corner
    return canvas, canvas_line
@@ -0,0 +1,56 @@
'''This random crop strategy is described in paper:
   [1] SSD: Single Shot MultiBox Detector
'''
import math
import torch
import random

from PIL import Image
# from torchcv.utils.box import box_iou, box_clamp
from ..box import box_iou, box_clamp


def random_crop_tile_lm(
        img, boxes, labels, linemap,
        scale_range=(0.8, 1),
        max_aspect_ratio=2.):
    '''Randomly crop a PIL image together with its line map.

    Args:
      img: (PIL.Image) image.
      boxes: (tensor) bounding boxes, sized [#obj, 4].
      labels: (tensor) bounding box labels, sized [#obj,].
      linemap: (PIL.Image) line map, cropped with the same window as img.
      scale_range: (float, float) minimum and maximum image width/height scale.
      max_aspect_ratio: (float) maximum width/height aspect ratio.

    Returns:
      img: (PIL.Image) cropped image.
      boxes: (tensor) object boxes.
      labels: (tensor) object labels.
      linemap: (PIL.Image) cropped line map.
    '''
    imw, imh = img.size

    scale = random.uniform(scale_range[0], scale_range[1])
    aspect_ratio = random.uniform(
        max(1 / max_aspect_ratio, scale * scale),
        min(max_aspect_ratio, 1 / (scale * scale)))
    w = int(imw * scale * math.sqrt(aspect_ratio))
    h = int(imh * scale / math.sqrt(aspect_ratio))

    # +1 keeps the range non-empty when the crop spans the full width or height
    x = random.randrange(imw - w + 1)
    y = random.randrange(imh - h + 1)

    img = img.crop((x, y, x + w, y + h))
    linemap = linemap.crop((x, y, x + w, y + h))

    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
        & (center[:, 1] >= y) & (center[:, 1] <= y + h)
    if mask.any():
        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
        boxes = box_clamp(boxes, 0, 0, w, h)
        labels = labels[mask]
    else:
        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
        labels = torch.tensor([0], dtype=torch.long)
    return img, boxes, labels, linemap
@@ -0,0 +1,62 @@
import torch
import random

from PIL import Image


def resize_lm(img, boxes, linemap, size, max_size=1000, scale=None, random_interpolation=False):
    '''Resize the input PIL image and its line map to given size.

    If boxes is not None, resize boxes accordingly.

    Args:
      img: (PIL.Image) image to be resized.
      boxes: (tensor) object boxes, sized [#obj,4].
      linemap: (PIL.Image) line map, always resized with nearest-neighbor interpolation.
      size: (tuple or int)
        - if is tuple, resize image to the size.
        - if is int, resize the shorter side to the size while maintaining the aspect ratio.
      max_size: (int) when size is int, limit the image longer size to max_size.
                This is essential to limit the usage of GPU memory.
      scale: (float) if given, size is ignored and both sides are scaled by this factor.
      random_interpolation: (bool) randomly choose a resize interpolation method.

    Returns:
      img: (PIL.Image) resized image.
      boxes: (tensor) resized boxes.
      linemap: (PIL.Image) resized line map.

    Example:
    >> img, boxes, linemap = resize_lm(img, boxes, linemap, 600)  # resize shorter side to 600
    >> img, boxes, linemap = resize_lm(img, boxes, linemap, (500,600))  # resize image size to (500,600)
    >> img, _, linemap = resize_lm(img, None, linemap, (500,600))  # resize image and line map only
    '''
    w, h = img.size
    if scale is None:
        if isinstance(size, int):
            size_min = min(w, h)
            size_max = max(w, h)
            sw = sh = float(size) / size_min
            if sw * size_max > max_size:
                sw = sh = float(max_size) / size_max
            ow = int(w * sw + 0.5)
            oh = int(h * sh + 0.5)
        else:
            ow, oh = size
            sw = float(ow) / w
            sh = float(oh) / h
    else:
        ow = int(w * scale)
        oh = int(h * scale)
        sw, sh = scale, scale

    method = random.choice([
        Image.BOX,
        Image.NEAREST,
        Image.HAMMING,
        Image.BICUBIC,
        Image.LANCZOS,
        Image.BILINEAR]) if random_interpolation else Image.BILINEAR
    img = img.resize((ow, oh), method)
    linemap = linemap.resize((ow, oh), Image.NEAREST)

    if boxes is not None:
        boxes = boxes * torch.tensor([sw, sh, sw, sh])
    return img, boxes, linemap
@@ -0,0 +1,340 @@
import numbers
import numpy as np
from PIL import Image
from PIL import ImageOps
import random
from random import randint

# F.resize and F.rotate used below live in torchvision's functional transforms
import torchvision.transforms.functional as F
# from skimage.transform import warp, AffineTransform

from bbox_utils import intersection_over_union


# own stuff

def convert2binaryPIL(lbl_ind):
    # convert to PIL binary '1' without dither
    lbl_im = Image.fromarray(np.uint8(lbl_ind))
    fn = (lambda x: 255 if x > 0 else 0)
    lbl_im = lbl_im.convert('L').point(fn, mode='1')
    return lbl_im

def pad2square(bb, context_pad_ratio=0, context_pad=0, take_long_side=True):
    # -- extract square patches using ground truth bounding boxes
    # assert (context_pad >= 0 and context_pad_ratio == 0) or (context_pad_ratio >= 0 and context_pad == 0)
    width = bb[2] - bb[0]
    height = bb[3] - bb[1]
    diff = width - height
    width_is_smaller = 0 > diff
    height_is_smaller = 0 < diff
    if take_long_side:
        # take long side
        if context_pad == 0:
            if width_is_smaller:
                context_pad = np.round(context_pad_ratio * height)
            else:
                context_pad = np.round(context_pad_ratio * width)
        bb[0] = bb[0] - context_pad - (width_is_smaller * np.ceil(0.5 * (height - width)))
        bb[2] = bb[2] + context_pad + (width_is_smaller * np.floor(0.5 * (height - width)))
        bb[1] = bb[1] - context_pad - (height_is_smaller * np.ceil(0.5 * (width - height)))
        bb[3] = bb[3] + context_pad + (height_is_smaller * np.floor(0.5 * (width - height)))
    else:
        # take small side
        if context_pad == 0:
            if width_is_smaller:
                context_pad = np.round(context_pad_ratio * width)
            else:
                context_pad = np.round(context_pad_ratio * height)
        bb[0] = bb[0] - context_pad - (height_is_smaller * np.ceil(0.5 * (height - width)))
        bb[2] = bb[2] + context_pad + (height_is_smaller * np.floor(0.5 * (height - width)))
        bb[1] = bb[1] - context_pad - (width_is_smaller * np.ceil(0.5 * (width - height)))
        bb[3] = bb[3] + context_pad + (width_is_smaller * np.floor(0.5 * (width - height)))

    return bb


# BBOX sampling / cropping functions

def crop_image(im, bb, context_pad=0, pad_to_square=False, mean_values=(0, 0, 0)):
    """
    Crop a window from the image for detection. Include surrounding context
    according to the `context_pad` configuration. Creates square crop which
    respects the aspect ratio.

    bb: bounding box coordinates as xmin, ymin, xmax, ymax.
    """

    # copy list and use as ndarray
    bb = np.array(bb, dtype=int)  # list(bb)

    # numpy images are indexed (row, column), so shape is (height, width)
    imh, imw = im.shape[:2]

    # pad to square while preserving aspect ratio
    if pad_to_square:
        bb = pad2square(bb, context_pad=context_pad)

        # -- check whether bbox inside image
        # pad: [x_min, y_min, x_max, y_max]

        pad = [0, 0, 0, 0]
        if bb[0] < 0:
            pad[0] = abs(bb[0])
            bb[0] = 0
        if bb[1] < 0:
            pad[1] = abs(bb[1])
            bb[1] = 0
        if bb[2] > imw:
            pad[2] = bb[2] - imw
            bb[2] = imw
        if bb[3] > imh:
            pad[3] = bb[3] - imh
            bb[3] = imh

        # -- crop, then pad with the channel mean where the box left the image
        im = im[bb[1]:bb[3], bb[0]:bb[2], :]

        # pad blocks are sized from the current crop so that shapes match on concatenation
        channel_mean = np.reshape(mean_values, (1, 1, 3)).astype(np.uint8)
        if pad[0] > 0:
            pad_left = np.tile(channel_mean, (im.shape[0], pad[0], 1))
            im = np.concatenate((pad_left, im), axis=1)
        if pad[1] > 0:
            pad_up = np.tile(channel_mean, (pad[1], im.shape[1], 1))
            im = np.concatenate((pad_up, im), axis=0)
        if pad[2] > 0:
            pad_right = np.tile(channel_mean, (im.shape[0], pad[2], 1))
            im = np.concatenate((im, pad_right), axis=1)
        if pad[3] > 0:
            pad_down = np.tile(channel_mean, (pad[3], im.shape[1], 1))
            im = np.concatenate((im, pad_down), axis=0)

        return im, bb.tolist()
    else:
        if context_pad > 0:
            # better use crop_pil_image
            return NotImplemented
        # return simple crop
        return im[bb[1]:bb[3], bb[0]:bb[2], :]


def crop_pil_image(im, bb, context_pad=0, pad_to_square=False, fill_values=None):
    """
    Crop a window from the image for detection. Include surrounding context
    according to the `context_pad` configuration. Creates square crop which
    respects the aspect ratio.

    bb: bounding box coordinates as xmin, ymin, xmax, ymax.
    """

    # copy list and use as ndarray
    bb = np.array(bb, dtype=int)  # list(bb)

    imw, imh = im.size

    # pad to square while preserving aspect ratio
    if pad_to_square:
        bb = pad2square(bb, context_pad=context_pad)

        if fill_values is None:

            # if cropped out of image range, pillow pads with zeros automatically
            im = im.crop((bb[0], bb[1], bb[2], bb[3]))

        else:
            # check whether bbox inside image
            # pad: [x_min, y_min, x_max, y_max]
            pad = [0, 0, 0, 0]
            if bb[0] < 0:
                pad[0] = abs(bb[0])
                bb[0] = 0
            if bb[1] < 0:
                pad[1] = abs(bb[1])
                bb[1] = 0
            # x_max is bounded by the image width, y_max by the image height
            if bb[2] > imw:
                pad[2] = bb[2] - imw
                bb[2] = imw
            if bb[3] > imh:
                pad[3] = bb[3] - imh
                bb[3] = imh

            # crop box
            im = im.crop((bb[0], bb[1], bb[2], bb[3]))
            # pad with fill_values where the box left the image
            im = ImageOps.expand(im, border=(pad[0], pad[1], pad[2], pad[3]), fill=tuple(fill_values))

        return im, bb.tolist()
    else:
        if context_pad > 0:
            bb[0] = max(bb[0] - context_pad, 0)
            bb[2] = min(bb[2] + context_pad, imw)
            bb[1] = max(bb[1] - context_pad, 0)
            bb[3] = min(bb[3] + context_pad, imh)
        # return simple crop
        return im.crop((bb[0], bb[1], bb[2], bb[3])), bb.tolist()


def spatial_sample(im_pad, bb, spatial_sample_rng, rnd_scale_ratio=0.05):
    im = im_pad
    imh, imw = im.shape[:2]
    im_bb = [0, 0, imw, imh]

    # make ground truth box square, and use its dimensions
    bb_gt = list(bb)
    w = bb[2] - bb[0]
    h = bb[3] - bb[1]
    if w > h:
        bb_gt[1] = int(bb_gt[1] - np.ceil(0.5 * (w - h)))
        bb_gt[3] = int(bb_gt[3] + np.floor(0.5 * (w - h)))
        h = w
    else:
        bb_gt[0] = int(bb_gt[0] - np.ceil(0.5 * (h - w)))
        bb_gt[2] = int(bb_gt[2] + np.floor(0.5 * (h - w)))
        w = h
    # add random scaling to test bbox
    # by treating dimension differently the aspect ratio will fluctuate a little (due to resizing afterwards!)
    wrange = round(rnd_scale_ratio * w)
    hrange = round(rnd_scale_ratio * h)
    w = min(w + random.randint(-wrange, 2*wrange), imw - 1)  # ensure size is in im_pad
    h = w  # min(h + random.randint(hrange, 2*hrange), imh - 1)  # ensure size is in im_pad

    # set ranges according to provided label
    min_IoU = spatial_sample_rng[0]
    max_IoU = spatial_sample_rng[1]

    max_iter = 500
    curr_iter = 0
    ratio = 0.0
    while curr_iter < max_iter and (ratio >= max_IoU or ratio <= min_IoU):
        curr_iter += 1

        # bbox sampling
        jxy = [randint(0, im_bb[2] - w), randint(0, im_bb[3] - h)]
        bb_test = list([jxy[0], jxy[1], w + jxy[0], h + jxy[1]])

        # check if new box fits criteria
        if min(bb_test) >= 0 and bb_test[2] <= im.shape[1] and bb_test[3] <= im.shape[0]:
            ratio = intersection_over_union(bb_test, bb_gt)

    if max_IoU >= ratio >= min_IoU:
        im = im[bb_test[1]:bb_test[3], bb_test[0]:bb_test[2], :]
        # new_bb_gt = [bb_gt[0] - bb_test[0], bb_gt[1] - bb_test[1], bb_gt[2] - bb_test[0], bb_gt[3] - bb_test[1]]
        new_bb_gt = bb_gt
    else:
        im = im
        new_bb_gt = bb_gt

        # DEBUG_MODE = False
        # if DEBUG_MODE:
        #     print "tricky box", w, h, imw, imh

    return im, new_bb_gt, bb_test


# TRANSFORMS


class MyRandomZoom(object):
    def __init__(self, scale_range, interpolation=Image.BILINEAR):
        self.scale_range = scale_range
        self.interpolation = interpolation

    def __call__(self, img):
        scale = np.random.uniform(*self.scale_range)
        new_size = (int(img.height * scale), int(img.width * scale))
        return F.resize(img, new_size, self.interpolation)


class MyFuzzyZoom(object):
    """
    :param target_size: (2-tuple) height, width
    :param scale_range: (2-tuple) range from which target_size may deviate
    :param interpolation: ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional)
    """
    def __init__(self, target_size, scale_range, interpolation=Image.BILINEAR):

        self.target_size = target_size
        self.scale_range = scale_range
        self.interpolation = interpolation

    @staticmethod
    def get_params(scale_range):
        return np.random.uniform(*scale_range)

    def __call__(self, img):
        scale = self.get_params(self.scale_range)
        new_size = (int(self.target_size[0] * scale), int(self.target_size[1] * scale))
        return F.resize(img, new_size, self.interpolation)


class MyRandomChoiceZoom(object):
    def __init__(self, scales, p=None, interpolation=Image.BILINEAR):
        self.scales = scales
        self.interpolation = interpolation
        self.p = p

    def __call__(self, img):
        scale = np.random.choice(self.scales, replace=True, p=self.p)
        new_size = (int(img.height * scale), int(img.width * scale))
        return F.resize(img, new_size, self.interpolation)


class MyRandomCenteredRotation(object):
    """
    Args:
        degrees (sequence or float or int): Range of degrees to select from.
            If degrees is a number instead of sequence like (min, max), the range of degrees
            will be (-degrees, +degrees).
        translation_range (2-tuple): Range of pixels to select from.
            The center of rotation is shifted according to a number sampled from this range.
        resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional):
            An optional resampling filter.
            See http://pillow.readthedocs.io/en/3.4.x/handbook/concepts.html#filters
            If omitted, or if the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
    """
    def __init__(self, degrees, translation_range=(-3, 3), resample=Image.BILINEAR):
        if isinstance(degrees, numbers.Number):
            if degrees < 0:
                raise ValueError("If degrees is a single number, it must be positive.")
            self.degrees = (-degrees, degrees)
        else:
            if len(degrees) != 2:
                raise ValueError("If degrees is a sequence, it must be of len 2.")
            self.degrees = degrees
        self.translation_range = translation_range
        self.resample = resample

    def __call__(self, img):

        angle = np.random.uniform(*self.degrees)
        translated_center = None
        if self.translation_range:
            # the rotation center is given as (x, y), i.e. in (width, height) order
            translated_center = (
                np.random.uniform(*self.translation_range) + int(img.width/2),
                np.random.uniform(*self.translation_range) + int(img.height/2)
            )
        return F.rotate(img, angle, resample=self.resample, expand=False, center=translated_center)


class UnNormalize(object):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, tensor):
        """
        Args:
            tensor (Tensor): Normalized tensor image of size (C, H, W), modified in place.
        Returns:
            Tensor: Un-normalized image.
        """
        for t, m, s in zip(tensor, self.mean, self.std):
            t.mul_(s).add_(m)
            # The normalize code -> t.sub_(m).div_(s)
        return tensor

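`UnNormalize` inverts the usual per-channel normalization `(x - mean) / std` by computing `x * std + mean`. The round trip can be checked with a small list-based sketch; the function names and the nested-list layout (one list per channel) are assumptions made for illustration only.

```python
def normalize(channels, mean, std):
    """Per-channel (x - mean) / std on a list of channels."""
    return [[(v - m) / s for v in ch] for ch, m, s in zip(channels, mean, std)]


def unnormalize(channels, mean, std):
    """Per-channel x * std + mean, the inverse of normalize."""
    return [[v * s + m for v in ch] for ch, m, s in zip(channels, mean, std)]


image = [[0.5, 1.0], [0.25, 0.75]]      # two channels with two pixels each
mean, std = [0.5, 0.25], [0.5, 0.25]
restored = unnormalize(normalize(image, mean, std), mean, std)
print(restored)
```

Unlike this sketch, the class above works in place on the tensor it receives, so callers that still need the normalized tensor should pass a copy.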
@@ -0,0 +1,62 @@
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, cdist, squareform


def show_lines_tl_alignment(lbl_ind_x, center_im, line_hypos, color):
    # Generating figure 1
    fig, axes = plt.subplots(1, 2, figsize=(15, 6),  # (25, 10)
                             subplot_kw={'adjustable': 'box-forced'})
    ax = axes.ravel()

    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')
    ax[0].set_axis_off()

    ax[1].imshow(lbl_ind_x, cmap='gray')

    for idx, line_rec in line_hypos.groupby('label').mean().iterrows():
        angle = line_rec.angle
        dist = line_rec.dist
        y0 = (dist - 0 * np.cos(angle)) / np.sin(angle)
        y1 = (dist - lbl_ind_x.shape[1] * np.cos(angle)) / np.sin(angle)
        ax[1].plot((0, lbl_ind_x.shape[1]), (y0, y1), '-', color=color[int(idx)], linewidth=2)
        ax[1].text(0, y0, '{}'.format(int(line_rec.tl_line)),
                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')

    ax[1].set_xlim((0, lbl_ind_x.shape[1]))
    ax[1].set_ylim((lbl_ind_x.shape[0], 0))
    ax[1].set_axis_off()
    ax[1].set_title('Detected lines / Assigned tl line idx')


def show_score_mats_with_paths(assigned_tl_indices, hypo_line_indices, tl_line_indices, line_frag):
    """Show the weak, ransac and line matching score matrices side by side and print their diagonals."""
    # Generating figure 1
    # 'box-forced' was removed from Matplotlib; 'box' is its replacement.
    fig, axes = plt.subplots(1, 3, figsize=(15, 6),
                             subplot_kw={'adjustable': 'box'})
    ax = axes.ravel()

    # weak score
    X_dist = cdist(assigned_tl_indices.reshape(-1, 1), assigned_tl_indices.reshape(-1, 1),
                   lambda a_idx, b_idx: line_frag.compute_weak_score(a_idx.squeeze(), b_idx.squeeze()))
    ax[0].imshow(X_dist, cmap='gray')
    ax[0].set_title('weak score')
    print(np.diag(X_dist))

    # ransac score
    # X_dist = cdist(assigned_tl_indices.reshape(-1, 1), assigned_tl_indices.reshape(-1, 1),
    X_dist = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
                   lambda a_idx, b_idx: line_frag.compute_ransac_score(a_idx.squeeze(), b_idx.squeeze(),
                                                                       max_dist_thresh=2, dist_weight=1))  # 5/5, 4/1
    ax[1].imshow(X_dist, cmap='gray_r')
    ax[1].set_title('ransac score')
    print(np.diag(X_dist))

    # line matching score
    X_dist = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
                   lambda a_idx, b_idx: line_frag.compute_line_matching_score(a_idx.squeeze(), b_idx.squeeze()))
    ax[2].imshow(X_dist, cmap='gray_r')  # vmin=0, vmax=1
    ax[2].set_title('line matching score')
    print(np.diag(X_dist))


@@ -0,0 +1,100 @@
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage as ndi


def show_line_skeleton(lbl_ind_x, skeleton):
    """Show the original map, its skeleton, and the skeleton's connected components."""
    # display results
    # 'box-forced' was removed from Matplotlib; 'box' is its replacement.
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4), sharex=True, sharey=True,
                             subplot_kw={'adjustable': 'box'})
    ax = axes.ravel()

    ax[0].imshow(lbl_ind_x, cmap=plt.cm.gray)
    ax[0].axis('off')
    ax[0].set_title('original', fontsize=20)

    ax[1].imshow(skeleton, cmap=plt.cm.gray)
    ax[1].axis('off')
    ax[1].set_title('skeleton', fontsize=20)

    # Label 8-connected skeleton components; plt.cm.spectral was renamed to nipy_spectral.
    ax[2].imshow(ndi.label(skeleton, structure=np.ones((3, 3)))[0], cmap=plt.cm.nipy_spectral)
    ax[2].axis('off')
    ax[2].set_title('skeleton components', fontsize=20)

    fig.tight_layout()


def show_hough_transform_w_lines(lbl_ind_x, center_im, h, theta, d, line_hypos, color):
    """Show the input image, the Hough accumulator, and the detected lines."""
    # Generating figure 1
    # 'box-forced' was removed from Matplotlib; 'box' is its replacement.
    fig, axes = plt.subplots(1, 3, figsize=(15, 6),
                             subplot_kw={'adjustable': 'box'})  # (25, 15)
    ax = axes.ravel()

    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')
    ax[0].set_axis_off()

    # Log-scale the accumulator so weaker peaks remain visible.
    ax[1].imshow(np.log(1 + h),
                 extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]],
                 cmap='gray', aspect=1 / 1.5)
    ax[1].set_title('Hough transform')
    ax[1].set_xlabel('Angles (degrees)')
    ax[1].set_ylabel('Distance (pixels)')
    ax[1].axis('image')

    ax[2].imshow(lbl_ind_x, cmap='gray')

    # Draw one line per label, averaging the hypotheses that share a label.
    for idx, line_rec in line_hypos.groupby('label').mean().iterrows():
        angle = line_rec.angle
        dist = line_rec.dist
        y0 = (dist - 0 * np.cos(angle)) / np.sin(angle)
        y1 = (dist - lbl_ind_x.shape[1] * np.cos(angle)) / np.sin(angle)
        ax[2].plot((0, lbl_ind_x.shape[1]), (y0, y1), '-', color=color[int(idx)], linewidth=2)

    ax[2].set_xlim((0, lbl_ind_x.shape[1]))
    ax[2].set_ylim((lbl_ind_x.shape[0], 0))
    ax[2].set_axis_off()
    ax[2].set_title('Detected lines')

    # ax[2].imshow(lbl_ind, cmap='gray')
    # ax[2].set_title('Input image')
    # ax[2].set_axis_off()


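The plotting loop above evaluates the line `x*cos(angle) + y*sin(angle) = dist` at `x = 0` and at `x = image width`, which divides by `sin(angle)` and therefore breaks down for vertical lines. The sketch below isolates that computation with a guard for this case. The helper name `line_endpoints` and the `eps` threshold are assumptions made for illustration; they are not part of the repository.

```python
import math


def line_endpoints(angle, dist, width, eps=1e-9):
    """Return ((x0, y0), (x1, y1)) for the line x*cos(angle) + y*sin(angle) = dist.

    The endpoints are evaluated at x = 0 and x = width. Returns None for
    (near-)vertical lines, where sin(angle) is close to 0 and y is undefined.
    """
    sin_a = math.sin(angle)
    if abs(sin_a) < eps:
        return None
    y0 = dist / sin_a
    y1 = (dist - width * math.cos(angle)) / sin_a
    return (0, y0), (width, y1)


# A horizontal line 10 px below the top edge: angle = pi/2, dist = 10.
endpoints = line_endpoints(math.pi / 2, 10.0, width=100)
```

Text lines on a tablet are close to horizontal, so the guard should rarely trigger in practice, but it keeps a stray vertical hypothesis from raising a division warning.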
def show_probabilistic_hough(lbl_ind_x, center_im, line_segs, ls_labels, group2line, color):
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    ax = axes.ravel()

    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')

    # ax[1].imshow(lbl_ind_x, cmap='gray')
    # ax[1].set_title('line det')

    ax[1].imshow(lbl_ind_x * 0)
    for line, li in zip(line_segs, ls_labels):
        p0, p1 = line
        ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[int(group2line[li])], linewidth=2)
        ax[1].text(p0[0], p0[1], '{}'.format(group2line[li]),
                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
    ax[1].set_xlim((0, lbl_ind_x.shape[1]))
    ax[1].set_ylim((lbl_ind_x.shape[0], 0))
    ax[1].set_title('Probabilistic Hough')


def show_line_segms(image_label_overlay, segm_labels):
    fig, axes = plt.subplots(1, 2, figsize=(15, 9))  # 25, 15
    ax = axes.ravel()

    ax[0].imshow(image_label_overlay, cmap='gray')
    ax[0].set_title('Input image')

    ax[1].imshow(segm_labels)
    ax[1].set_title('Line segments')

Some files were not shown because too many files were changed in this diff.