Initial commit

to3i
2020-11-19 12:18:53 +01:00
commit d7e2349555
119 files changed with 23621 additions and 0 deletions
@@ -0,0 +1,139 @@
# Cuneiform-Sign-Detection-Code
Author: Tobias Dencker - <tobias.dencker@gmail.com>
This is the code repository for the article submission on "Deep learning of cuneiform sign detection with weak supervision using transliteration alignment".
This repository contains code to execute the proposed iterative training procedure as well as code to evaluate and visualize results.
Moreover, we provide pre-trained models of the cuneiform sign detector for Neo-Assyrian script after iterative training on the [Cuneiform Sign Detection Dataset](https://compvis.github.io/cuneiform-sign-detection-dataset/).
Finally, we provide a web application for the analysis of tablet images with the help of a pre-trained cuneiform sign detector.
<img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/images_decent.jpg" alt="sign detections on tablet images: yellow boxes indicate TP and blue boxes FP detections" width="700"/>
<!--- <img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/images_difficult.jpg" alt="Web interface detection" width="500"/> -->
## Repository description
- General structure:
  - `data`: tablet images, annotations, transliterations, metadata
  - `experiments`: training, testing, evaluation and visualization
  - `lib`: project library code
  - `results`: generated detections (placed, raw and aligned), network weights, logs
  - `scripts`: scripts to run the alignment and placement step of iterative training
### Use cases
- Pre-processing of training data
  - line detection
- Iterative training
  - generate sign annotations (aligned and placed detections)
  - sign detector training
- Evaluation (on test set)
  - raw detections
  - placed detections
  - aligned detections
- Test & visualize
  - line segmentation and post-processing
  - line-level and sign-level alignments
  - TP/FP for raw, aligned and placed detections (full tablet and crop level)
### Pre-processing
As a pre-processing step, line detections are obtained for all tablet images of the training data before iterative training starts.
- use the jupyter notebooks (`experiments/line_segmentation/`) to train and evaluate the line segmentation network and to perform line detection on all tablet images of the train set
### Training
*Iterative training* alternates between generating aligned and placed detections and training a new sign detector:
1. use the command-line scripts (`scripts/generate/`) to run the alignment and placement steps of iterative training
2. use the jupyter notebooks (`experiments/sign_detector/`) for the sign detector training step of iterative training
To keep track of the sign detector and the generated sign annotations of each iteration of iterative training (stored in `results/`),
we label each sign detector with a *model version* (e.g. v002),
which is also used to label the raw, aligned and placed detections produced by this detector.
Besides a model version, the user also selects which subsets of the training data are used to generate new annotations.
In particular, *subsets of SAAo collections* (e.g. saa01, saa05, saa08) are selected when running the scripts under `scripts/generate/`.
To enable evaluation on the test set, the collections test and saa06 must also be included.
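The alternation between annotation generation and detector training, together with the version-labeling convention, can be sketched in Python as follows. This is a hypothetical outline, not code from this repository: `generate_annotations` and `train_detector` are placeholder callables standing in for the scripts in `scripts/generate/` and the notebooks in `experiments/sign_detector/`, the zero-padded numbering (v001, v002, ...) is assumed from the example `v002`, and the sketch assumes that an initial detector already exists.

```python
# Hypothetical sketch of the iterative training loop; not the repository's actual code.
TEST_COLLECTIONS = ["test", "saa06"]  # must be included to evaluate on the test set


def version_label(number):
    """Format a model number as a version string, e.g. 2 -> 'v002' (assumed scheme)."""
    return "v{:03d}".format(number)


def iterative_training(start, num_iterations, train_collections,
                       generate_annotations, train_detector):
    """Alternate annotation generation and detector training.

    Assumes that a detector labeled version_label(start) already exists.
    generate_annotations(version, collections) stands in for the alignment and
    placement scripts and returns the folder with the generated detections;
    train_detector(version, gen_folder) stands in for the training notebooks.
    Returns the version labels of the newly trained detectors.
    """
    # append the test collections (if missing) so that every version can be evaluated
    collections = list(train_collections) + [
        c for c in TEST_COLLECTIONS if c not in train_collections]
    trained = []
    for k in range(start, start + num_iterations):
        current, following = version_label(k), version_label(k + 1)
        # step 1: aligned and placed detections, labeled with the current detector's version
        gen_folder = generate_annotations(current, collections)
        # step 2: train the next detector on the generated annotations
        train_detector(following, gen_folder)
        trained.append(following)
    return trained
```

Passing the two steps in as callables keeps the control flow separate from the actual alignment, placement and training code, which in this repository is run via command-line scripts and notebooks.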
### Evaluation
Use the [*test sign detector notebook*](./experiments/sign_detector/test_sign_detector.ipynb) to evaluate the performance (mAP) of the trained sign detector on the test set or on other subsets of the dataset.
In `experiments/alignment_evaluation/` you will find further notebooks for the evaluation and visualization of line-level and sign-level alignments and of TP/FP for raw, aligned and placed detections (full tablet and crop level).
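For reference, mAP is the mean over sign classes of the per-class average precision (AP). Since the evaluation routine is partly adapted from py-faster-rcnn, the sketch below shows the standard all-point interpolated AP computed on a ranked list of detections that have already been marked as TP or FP. It is a simplified illustration rather than the repository's evaluation code: IoU-based matching of detections to ground truth is assumed to have been done beforehand, and the function names are hypothetical.

```python
def average_precision(is_tp, num_gt):
    """All-point interpolated average precision (AP) for one sign class.

    is_tp:  TP (True) / FP (False) flags of the detections of this class,
            sorted by descending confidence score.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for flag in is_tp:
        if flag:
            tp += 1
        else:
            fp += 1
        recalls.append(float(tp) / num_gt)
        precisions.append(float(tp) / (tp + fp))
    # add sentinel values at both ends of the precision-recall curve
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # precision envelope: make precision monotonically non-increasing in recall
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # area under the envelope; steps with unchanged recall contribute zero
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(class_aps):
    """mAP: mean of the per-class AP values."""
    return sum(class_aps) / float(len(class_aps)) if class_aps else 0.0
```

For example, two correct detections followed by nothing else give an AP of 1.0, while the ranking TP, FP, TP with two ground-truth boxes gives 0.5 + 0.5 * 2/3 = 5/6.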
### Pre-trained models
We provide pre-trained models in the form of [PyTorch model files](https://pytorch.org/tutorials/beginner/saving_loading_models.html) for the line segmentation network as well as the sign detector.
| Model name | Model type | Train annotations |
|----------------|-------------------|------------------------|
| [lineNet_basic_vpub.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/lineNet_basic_vpub.pth) | line segmentation | 410 lines |
For the sign detector, we provide the best weakly supervised model (fpn_net_vA) and the best semi-supervised model (fpn_net_vF).
| Model name | Model type | Weak supervision in training | Annotations in training | mAP on test_full |
|----------------|-------------------|-------------------|------------------------|------------------------|
| [fpn_net_vA.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/fpn_net_vA.pth) | sign detector | saa01, saa05, saa08, saa10, saa13, saa16 | None | 45.3 |
| [fpn_net_vF.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/fpn_net_vF.pth) | sign detector | saa01, saa05, saa08, saa10, saa13, saa16 | train_full (4663 bboxes) | 65.6 |
### Web application
We also provide a demo web application that enables a user to apply a trained cuneiform sign detector to a large collection of tablet images.
The code of the web front-end is available in the [webapp repo](https://github.com/compvis/cuneiform-sign-detection-webapp/).
The back-end code is part of this repository and is located in [lib/webapp/](./lib/webapp/).
Below you will find a short animation of how the sign detector is used with this web interface.
<img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/demo_cuneiform_sign_detection.gif" alt="Web interface detection" width="700"/>
For demonstration purposes, we also host an instance of the web application: [Demo Web Application](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/).
If you would like to test the web application, please contact us for user credentials to log in.
Please note that this web application is a prototype for demonstration purposes only and not a production system.
In case the website is not reachable, or other technical issues occur, please contact us.
### Cuneiform font
For visualization of the cuneiform characters, we recommend installing the [Unicode Cuneiform Fonts](https://www.hethport.uni-wuerzburg.de/cuneifont/) by Sylvie Vanseveren.
## Installation
#### Software
Install general dependencies:
- **OpenGM** with Python wrapper: library for discrete graphical models (http://hciweb2.iwr.uni-heidelberg.de/opengm/).
  This library is needed for the alignment step during training; testing is not affected. An installation guide for Ubuntu 14.04 can be found [here](./install_opengm.md).
- Python 2.7.X
- Python packages:
  - torch 1.0
  - torchvision
  - scikit-image 0.14.0
  - pandas, scipy, sklearn, jupyter
  - pillow, tqdm, tensorboardX, nltk, Levenshtein, editdistance, easydict
Clone this repository and place the [*cuneiform-sign-detection-dataset*](https://github.com/compvis/cuneiform-sign-detection-dataset) in the [./data sub-folder](./data/).
#### Hardware
Training and evaluation can be performed on a machine with a single GPU (we used a GeForce GTX 1080).
The demo web application can run on a web server without GPU support,
since detection inference with a lightweight MobileNetV2 backbone is fast even in CPU-only mode
(less than 1 s for an image with HD resolution, less than 10 s for 4K resolution).
### References
This repository also includes external code. In particular, we want to mention:
> - kuangliu's *torchcv* and *pytorch-cifar* repositories, from which we adapted the SSD and FPN detector code:
>   https://github.com/kuangliu/pytorch-cifar and
>   https://github.com/kuangliu/torchcv
> - Ross Girshick's *py-faster-rcnn* repository, from which we adapted part of our evaluation routine:
>   https://github.com/rbgirshick/py-faster-rcnn
> - Rico Sennrich's *Bleualign* repository, from which we adapted part of the Bleualign implementation:
>   https://github.com/rsennrich/Bleualign
@@ -0,0 +1 @@
theme: jekyll-theme-cayman
@@ -0,0 +1,15 @@
### Data folder
Place [*cuneiform-sign-detection-dataset*](https://github.com/to3i/cuneiform-sign-detection-dataset) folders here:
- ./data/annotations
- ./data/images
- ./data/segments
- ./data/transliterations
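A quick way to check that the dataset has been placed correctly is to test for the four sub-folders listed above. The helper below is a hypothetical convenience function, not part of the repository; it assumes it is called with the path of the `data` folder (e.g. `missing_data_dirs('data')` from the repository root).

```python
import os

# dataset sub-folders expected under ./data (as listed in this README)
REQUIRED_DATA_DIRS = ("annotations", "images", "segments", "transliterations")


def missing_data_dirs(data_root="data"):
    """Return the expected dataset sub-folders that do not exist under data_root."""
    return [d for d in REQUIRED_DATA_DIRS
            if not os.path.isdir(os.path.join(data_root, d))]
```

An empty result means all four folders are present; it does not check their contents.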
#### Metadata files:
- *cunei_mzl.csv* contains the sign code class index established by Borger's Mesopotamisches Zeichenlexikon (MZL)
- *newLabels.json* contains new labels (a re-indexing) for the subset of Neo-Assyrian MZL code classes, so that labels range from 0 to 360 instead of 0 to 910, which reduces the output dimension of the detector
- *unicode_sign_stats.csv* contains estimates of the width and height of individual cuneiform sign classes. These estimates were derived from the [Unicode Cuneiform Fonts](https://www.hethport.uni-wuerzburg.de/cuneifont/) by Sylvie Vanseveren.
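The following sketch shows one way to read the last two files with the Python standard library. It assumes, based on the file contents, that `newLabels.json` is a flat list indexed by MZL code whose entries are the new training labels, with 0 marking codes outside the Neo-Assyrian subset, and that `unicode_sign_stats.csv` has the columns `train_lbl,width,height`. The meaning of 0 and the unit of width and height are assumptions, and the function names are hypothetical.

```python
import csv
import json


def load_label_map(json_text):
    """Parse newLabels.json: list index = MZL code, value = new training label.

    Entries equal to 0 are assumed to mark codes outside the Neo-Assyrian
    subset and are skipped (assumption based on the file contents).
    """
    new_labels = json.loads(json_text)
    return {mzl: lbl for mzl, lbl in enumerate(new_labels) if lbl != 0}


def load_sign_stats(csv_text):
    """Parse unicode_sign_stats.csv into {train_lbl: (width, height)}."""
    rows = csv.DictReader(csv_text.splitlines())
    return {int(r["train_lbl"]): (float(r["width"]), float(r["height"])) for r in rows}


def aspect_ratio(stats, train_lbl):
    """Estimated width/height ratio of a sign class (independent of the unit)."""
    width, height = stats[train_lbl]
    return width / height
```

To apply this to the actual files, read their contents first, e.g. `load_label_map(open('data/newLabels.json').read())`; the file location is an assumption based on this README.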
File diff suppressed because it is too large.
@@ -0,0 +1 @@
[0, 2, 0, 191, 0, 184, 196, 238, 239, 40, 26, 221, 240, 241, 24, 109, 73, 236, 210, 0, 205, 0, 242, 0, 58, 0, 133, 0, 0, 243, 0, 199, 244, 0, 0, 0, 0, 245, 0, 0, 0, 0, 0, 0, 246, 0, 0, 0, 0, 247, 0, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 249, 0, 0, 250, 208, 0, 0, 0, 0, 0, 193, 0, 251, 0, 252, 0, 0, 0, 162, 154, 0, 0, 0, 66, 23, 48, 0, 0, 108, 228, 129, 140, 0, 0, 0, 0, 253, 41, 97, 0, 254, 0, 0, 0, 255, 256, 0, 257, 258, 4, 8, 17, 31, 0, 202, 0, 224, 83, 213, 49, 259, 260, 0, 0, 0, 0, 20, 0, 175, 207, 261, 90, 0, 6, 0, 156, 93, 189, 152, 110, 7, 76, 64, 0, 0, 0, 0, 145, 262, 263, 194, 264, 265, 0, 0, 0, 266, 0, 0, 267, 0, 29, 0, 78, 268, 142, 231, 269, 0, 63, 0, 89, 270, 271, 272, 34, 273, 186, 0, 101, 107, 274, 275, 104, 0, 0, 0, 276, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 277, 0, 278, 0, 0, 0, 69, 0, 279, 0, 0, 280, 0, 0, 281, 0, 0, 0, 0, 0, 176, 215, 116, 0, 0, 0, 0, 0, 0, 282, 0, 283, 0, 0, 0, 284, 0, 157, 0, 0, 0, 61, 0, 0, 0, 232, 206, 5, 0, 0, 0, 80, 124, 222, 36, 0, 0, 183, 195, 84, 160, 237, 0, 0, 0, 119, 0, 0, 0, 285, 71, 0, 0, 0, 229, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 182, 0, 0, 132, 286, 0, 0, 287, 115, 52, 0, 197, 233, 209, 0, 0, 0, 0, 0, 0, 200, 0, 204, 288, 46, 0, 0, 289, 0, 0, 0, 0, 0, 0, 0, 290, 0, 50, 0, 0, 0, 0, 0, 291, 0, 0, 0, 292, 0, 293, 192, 294, 295, 0, 0, 0, 0, 0, 0, 296, 0, 25, 120, 297, 212, 123, 0, 146, 134, 21, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 298, 299, 0, 0, 0, 300, 141, 44, 62, 45, 0, 0, 0, 0, 0, 138, 0, 0, 0, 0, 301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 302, 0, 0, 118, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 303, 0, 0, 0, 304, 305, 0, 0, 306, 0, 165, 307, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 308, 0, 0, 0, 113, 309, 137, 0, 0, 28, 0, 0, 310, 0, 159, 0, 0, 0, 0, 0, 0, 0, 0, 181, 99, 158, 311, 0, 0, 0, 30, 102, 0, 74, 177, 3, 126, 312, 19, 67, 188, 130, 128, 313, 178, 0, 0, 163, 0, 314, 0, 42, 315, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 225, 72, 0, 
139, 223, 0, 0, 316, 57, 317, 318, 319, 13, 35, 320, 321, 217, 322, 179, 190, 121, 65, 150, 0, 323, 324, 148, 96, 0, 0, 226, 325, 0, 326, 327, 0, 219, 328, 98, 43, 87, 0, 0, 234, 329, 112, 0, 60, 0, 18, 198, 136, 330, 0, 0, 331, 39, 0, 155, 27, 0, 92, 0, 0, 0, 0, 0, 0, 332, 0, 0, 333, 105, 0, 334, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 218, 0, 0, 94, 135, 0, 0, 103, 174, 0, 0, 0, 75, 32, 0, 201, 187, 0, 335, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 211, 0, 0, 0, 0, 0, 0, 0, 0, 122, 0, 0, 0, 0, 0, 167, 0, 0, 22, 168, 0, 0, 230, 12, 0, 0, 336, 85, 166, 0, 227, 0, 53, 337, 0, 37, 0, 0, 185, 0, 338, 339, 171, 0, 0, 173, 0, 0, 86, 340, 0, 153, 0, 0, 0, 0, 341, 216, 342, 343, 0, 79, 180, 144, 0, 0, 125, 0, 161, 169, 9, 0, 0, 77, 100, 0, 0, 0, 344, 0, 0, 214, 131, 55, 95, 14, 0, 47, 345, 0, 81, 56, 117, 106, 0, 0, 0, 235, 33, 0, 0, 0, 0, 346, 347, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 348, 0, 349, 0, 0, 0, 0, 0, 0, 350, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 114, 38, 351, 352, 0, 82, 353, 0, 164, 354, 0, 355, 0, 0, 172, 0, 0, 356, 51, 357, 358, 88, 0, 359, 0, 360, 127, 54, 361, 147, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 362, 0, 363, 0, 111, 0, 0, 1, 0, 220, 364, 365, 366, 0, 0, 0, 367, 10, 0, 0, 0, 0, 0, 0, 0, 368, 0, 0, 0, 369, 0, 170, 203, 0, 0, 151, 0, 91, 0, 370, 371, 372, 0, 0, 0, 0, 0, 149, 59, 0, 0, 0, 0, 373, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 374, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
@@ -0,0 +1,239 @@
train_lbl,width,height
1,0.71875,0.9375
2,0.9296875,0.9375
3,1.71875,0.9375
4,1.546875,0.9375
5,1.6640625,1.0234375
6,1.8125,0.9375
7,1.6484375,0.9375
8,1.2109375,1.0703125
9,2.7421875,1.0625
10,0.4765625,1.0703125
11,0.7109375,0.9375
12,1.71875,1.0
13,1.171875,0.9375
14,0.453125,0.9375
15,1.3125,0.9375
16,0.4375,0.9375
17,0.9375,1.015625
18,0.984375,0.9375
19,1.3046875,0.9375
20,1.890625,1.078125
21,1.2421875,0.984375
22,1.296875,0.9375
23,2.109375,1.078125
24,1.4296875,1.0546875
25,1.7890625,1.0078125
26,1.171875,0.9375
27,1.203125,0.9375
28,1.171875,0.9375
29,2.96875,0.96875
30,1.84375,0.9375
31,1.21875,0.9375
32,1.5078125,0.9453125
33,1.5859375,1.0390625
34,1.5234375,0.9375
35,1.46875,0.9375
36,1.2578125,0.9375
37,1.7109375,0.9375
38,1.4453125,1.0703125
39,0.7421875,0.9375
40,1.171875,0.9375
41,1.2421875,0.9375
42,1.7265625,0.9921875
43,0.65625,0.9375
44,0.9453125,0.9375
45,1.2578125,1.015625
46,2.3046875,0.9375
47,1.3515625,0.9375
48,1.6015625,1.0
49,0.9296875,0.9375
50,2.53125,1.0625
51,0.703125,0.9375
52,1.5234375,0.984375
53,1.4921875,0.9453125
54,0.8828125,0.9375
55,1.0,0.9375
56,2.0703125,0.9375
57,0.921875,0.9375
58,2.0859375,1.0703125
59,1.3828125,1.0
60,2.09375,0.9375
61,2.15625,0.9453125
62,0.9375,0.9375
63,1.625,0.9453125
64,1.59375,0.9453125
65,1.6328125,0.9375
66,2.53125,0.984375
67,1.5,0.9453125
68,0.671875,0.9375
69,2.1171875,1.0546875
70,2.5546875,0.9921875
71,2.3125,1.078125
72,2.25,1.0703125
73,1.6015625,1.0234375
74,3.1328125,1.0546875
75,1.296875,0.9375
76,1.7265625,0.9375
77,1.4453125,0.9375
78,1.1640625,1.078125
79,1.3671875,0.9375
80,1.2578125,0.9375
81,1.1484375,0.9375
82,1.515625,1.0546875
83,1.515625,0.9375
84,1.7421875,0.9375
85,1.6953125,0.953125
86,0.9453125,0.9375
87,1.6015625,0.9375
88,1.390625,1.0625
89,1.1875,0.9453125
90,1.4140625,0.9453125
91,1.9921875,1.0546875
92,2.40625,0.9375
93,2.0625,0.9453125
94,0.671875,0.9375
95,1.234375,0.9375
96,1.1953125,1.046875
97,1.171875,0.9453125
98,0.671875,0.9375
99,1.5078125,0.9375
100,1.46875,1.078125
101,1.4140625,1.0703125
102,1.6328125,0.9375
103,1.59375,0.9375
104,2.1953125,0.9375
105,0.71875,0.9375
106,1.6328125,1.0390625
107,1.4140625,1.0546875
108,1.4921875,0.9375
109,1.6171875,1.03125
110,1.59375,0.9375
111,0.765625,0.9453125
112,1.875,0.9375
113,0.9296875,0.9375
114,1.6015625,1.0703125
115,2.0,1.0
116,1.2734375,0.9375
117,1.5859375,1.03125
118,1.78125,1.0234375
119,2.109375,1.03125
120,1.984375,1.0546875
121,1.546875,0.9375
122,1.3046875,0.9375
123,1.765625,1.078125
124,1.265625,0.9375
125,2.0703125,0.9375
126,1.5390625,0.9375
127,2.3515625,1.078125
128,1.6171875,0.9375
129,1.75,1.0625
130,0.8515625,0.9375
131,0.890625,0.9453125
132,1.84375,1.0546875
133,2.2890625,1.0859375
134,1.5703125,0.96875
135,0.671875,0.9375
136,0.75,0.9375
137,2.53125,1.078125
138,1.1875,0.9375
139,1.2890625,0.9375
140,0.9296875,0.9375
141,0.8515625,0.9375
142,2.046875,1.0703125
143,1.625,0.9375
144,3.0546875,0.9453125
145,1.4296875,0.9453125
146,2.359375,1.03125
147,0.8515625,0.9375
148,1.640625,0.9375
149,1.859375,1.0078125
150,1.1328125,0.9375
151,2.609375,1.078125
152,1.7890625,0.9375
153,0.984375,0.9375
154,1.953125,0.9765625
155,1.46875,0.9375
156,1.65625,0.9375
157,1.96875,0.9375
158,1.5078125,1.078125
159,1.8828125,1.046875
160,2.0625,1.0078125
161,2.671875,1.0625
162,2.296875,0.984375
163,1.7109375,0.9375
164,1.5234375,1.09375
165,0.9296875,0.9609375
166,2.2734375,1.03125
167,1.25,0.9375
168,2.1640625,0.9375
169,2.84375,1.078125
170,1.1796875,1.0
171,2.484375,0.9375
172,3.9609375,1.0078125
173,0.8515625,0.9375
174,1.796875,0.9375
175,2.2578125,1.0546875
176,1.046875,0.9375
177,1.671875,0.9375
178,1.828125,1.0
179,1.4765625,0.9375
180,2.5546875,1.0703125
181,1.5078125,0.9453125
182,3.0078125,1.078125
183,1.4921875,0.9375
184,1.3359375,0.9375
185,1.2890625,0.9375
186,2.578125,0.9375
187,1.59375,0.9453125
188,1.5234375,0.9375
189,3.828125,0.9375
190,1.28125,0.9375
191,1.125,0.9375
192,2.0625,0.9375
193,1.640625,0.96875
194,1.0234375,0.9375
195,1.7421875,0.9375
196,1.5859375,0.9375
197,1.84375,0.9375
198,1.6875,0.9375
199,2.171875,1.0703125
200,1.3359375,0.9375
201,1.953125,0.9375
202,1.7578125,1.0
203,1.7734375,1.078125
204,2.203125,0.9375
205,1.515625,1.046875
206,2.234375,0.9375
207,1.34375,0.9375
208,2.2109375,1.0625
209,1.3671875,0.9375
210,1.4609375,1.0234375
211,2.5078125,1.0703125
212,1.765625,1.0703125
213,0.9296875,1.0625
214,1.859375,0.9375
215,2.234375,1.015625
216,1.8671875,1.078125
217,1.7890625,1.0703125
218,0.59375,0.9375
219,0.53125,0.9375
220,0.8046875,0.9375
221,1.9453125,0.9375
223,2.328125,1.03125
224,1.5859375,0.9375
225,1.3046875,0.9375
226,1.7265625,1.046875
227,1.84375,0.9375
228,1.5234375,0.9375
229,2.6796875,1.046875
230,1.53125,0.984375
231,1.8046875,0.953125
232,1.25,0.9375
233,1.5859375,0.9375
234,2.0625,0.9453125
235,1.5859375,1.03125
236,1.84375,1.0234375
237,2.171875,1.0703125
238,1.5859375,0.9375
239,1.6875,0.9375
File diff suppressed because one or more lines are too long.
File diff suppressed because one or more lines are too long.
@@ -0,0 +1,6 @@
### Train & eval line segmentation network
- use `train_line_segmentation.ipynb` for training and `test_line_segmentation.ipynb` for evaluation
### Pre-processing before iterative training
- use `precompute_line_segmentations.ipynb` to obtain line detections for all tablet images in the training set before iterative training starts
@@ -0,0 +1,301 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pre-compute and store line segmentations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"from PIL import Image\n",
"from ast import literal_eval"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# torch\n",
"import torch\n",
"import torchvision\n",
"# addons\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#%pylab inline\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"relative_path = '../../'\n",
"# ensure that parent path is on the python path in order to have all packages available\n",
"import sys, os\n",
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
"sys.path.insert(0, parent_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from lib.models.trained_model_loader import get_line_net_fcn\n",
"from lib.datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta\n",
"from lib.transliteration.sign_labels import get_label_list\n",
"from lib.utils.transform_utils import UnNormalize\n",
"\n",
"from lib.detection.run_gen_line_detection import gen_line_detections"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Config Basics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# toggle generation\n",
"save_line_detections = True"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# set line segmentation network\n",
"line_model_version = 'v002'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# dataset config\n",
"collections = ['test', 'train', 'saa01', 'saa05', 'saa06', 'saa08', 'saa09', 'saa10', 'saa13', 'saa16'] \n",
"#collections = ['saa01', 'saa05']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"data_layer_params = dict(batch_size=[128, 16],\n",
" img_channels=1,\n",
" gray_mean=[0.5],\n",
" gray_std=[1.0], \n",
" num_classes = 2\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Config Data Augmentation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_classes = data_layer_params['num_classes']\n",
"num_c = data_layer_params['img_channels']\n",
"gray_mean = data_layer_params['gray_mean']\n",
"gray_std = data_layer_params['gray_std']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"re_transform = torchvision.transforms.Compose([\n",
" UnNormalize(mean=gray_mean, std=gray_std),\n",
" torchvision.transforms.ToPILImage(),\n",
" ])\n",
"re_transform_rgb = torchvision.transforms.Compose([\n",
" UnNormalize(mean=gray_mean * 3, std=gray_std * 3),\n",
" torchvision.transforms.ToPILImage(),\n",
" ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#use_gpu = torch.cuda.is_available()\n",
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model_fcn = get_line_net_fcn(line_model_version, device, num_classes=num_classes, num_c=num_c)\n",
"print(model_fcn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": false
},
"outputs": [],
"source": [
"for saa_version in collections:\n",
" print('collection: <><>{}<><>'.format(saa_version))\n",
" \n",
" ### Get collection dataset\n",
" dataset = CuneiformSegments(collections=[saa_version], relative_path=relative_path, \n",
" only_annotated=False, only_assigned=True, preload_segments=False)\n",
" \n",
" # filter collection dataset - OPTIONAL\n",
" didx_list = range(len(dataset))\n",
" \n",
" ### Generate line detections\n",
" gen_line_detections(didx_list, dataset, saa_version, relative_path,\n",
" line_model_version, model_fcn, re_transform, device,\n",
" save_line_detections) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
File diff suppressed because one or more lines are too long.
File diff suppressed because one or more lines are too long.
@@ -0,0 +1,11 @@
### Perform sign detector training
After the sign annotations (aligned and placed detections) have been generated with the scripts in `scripts/generate/` and stored under `results/results_ssd/`,
the sign detector is trained in the following steps:
1) use `train_sign_classifier.ipynb` as a template to train the sign classifier
2) use `train_sign_detector.ipynb` as a template to train the sign detector (initialized with the pre-trained sign classifier from step 1)
3) in the semi-supervised case, use `finetune_sign_detector.ipynb` to fine-tune the sign detector on manual annotations
### Eval sign detector
- use `test_sign_detector.ipynb` for evaluation of the sign detector
@@ -0,0 +1,623 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tune sign detector network (in semi-supervised case)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from PIL import Image\n",
"from ast import literal_eval\n",
"import os.path\n",
"from tqdm import tqdm\n",
"import copy\n",
"\n",
"import torch\n",
"import torch.optim as optim\n",
"from torch.optim import lr_scheduler\n",
"import torch.utils.data as data\n",
"\n",
"from torchvision import transforms as trafos\n",
"import torchvision.transforms as transforms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"relative_path = '../../'\n",
"# ensure that parent path is on the python path in order to have all packages available\n",
"import sys, os\n",
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
"sys.path.insert(0, parent_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from lib.datasets.cunei_dataset_ssd import CuneiformSSD\n",
"\n",
"from lib.alignment.LineFragment import plot_boxes\n",
"from lib.utils.pytorch_utils import get_tensorboard_writer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from lib.models.mobilenetv2_mod03 import MobileNetV2\n",
"from lib.models.mobilenetv2_fpn import MobileNetV2FPN\n",
"from lib.models.trained_model_loader import get_fpn_ssd_net\n",
"from lib.utils.torchcv.models.net import FPNSSD\n",
"from lib.utils.torchcv.loss.ssd_loss import SSDLoss"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
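"# optional delay before training: sleeps about hh hours in 10 s steps (hh = 0.001 rounds down to no delay)\n",
"import time\n",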
"hh = 0.001\n",
"## time.sleep(60*60*hh)\n",
"for i in tqdm(range(int(6*60*hh))):\n",
" time.sleep(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Config Basics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model_version = 'v001ft01'\n",
"\n",
"# config pretrained detector\n",
"pretrained_model_version = 'v001' # 'v191' \n",
"\n",
"# config datasets for training and testing\n",
"train_collections = ['train_D'] \n",
"test_collections = ['testEXT'] # ['test_full']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# config generated data\n",
"with_gen_data = False\n",
"\n",
"gen_model_version = 'v001' \n",
"\n",
"gen_folder = 'results_ssd/{}/'.format(gen_model_version) \n",
"gen_file_path = None\n",
"\n",
"gen_collections = ['saa01', 'saa05', 'saa08', 'saa10', 'saa13', 'saa16']\n",
"gen_collections += ['train']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# config backbone architecture\n",
"arch_opt = 1\n",
"arch_type = 'mobile'\n",
"width_mult = 0.625\n",
"\n",
"# config detector\n",
"with_64 = False\n",
"create_bg_class = False\n",
"img_size = 512\n",
"num_classes = 240\n",
"\n",
"# config schedule\n",
"num_epochs = 11\n",
"# note: milestone 60 lies beyond num_epochs, so the learning rate is never decayed in this run\n",
"lr_milestones = [60]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# set log file name\n",
"if with_gen_data:\n",
" version_remark = '{}_fpnssd_mobilenetv2_{}_gen_{}'\n",
" version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version, gen_model_version)\n",
"else:\n",
" version_remark = '{}_fpnssd_mobilenetv2_{}'\n",
" version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing Datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
" from lib.utils.torchcv.box_coder_retina_lm import RetinaBoxCoder\n",
" from lib.utils.torchcv.transforms_lm.resize import resize_lm\n",
" from lib.utils.torchcv.transforms_lm.random_crop_tile import random_crop_tile_lm\n",
" from lib.utils.torchcv.transforms_lm.pad_gs import pad_lm\n",
"else:\n",
" from lib.utils.torchcv.box_coder_retina import RetinaBoxCoder\n",
" from lib.utils.torchcv.transforms.resize import resize\n",
" from lib.utils.torchcv.transforms.random_crop_tile import random_crop_tile\n",
" from lib.utils.torchcv.transforms.pad_gs import pad"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"box_coder = RetinaBoxCoder(create_bg_class=create_bg_class)\n",
"print('num_anchors', len(box_coder.anchor_boxes))\n",
"print('anchor areas', np.sqrt(box_coder.anchor_areas))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data: \n",
" def transform_train(img, boxes, labels, linemap):\n",
" # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
" img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
" transforms.Lambda(lambda x: x) # identity\n",
" ])(img) \n",
" img, linemap = pad_lm(img, linemap, (600, 600))\n",
" img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
" img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
" img = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize(mean=[0.5], std=[1.0])\n",
" ])(img)\n",
" boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
"\n",
" return img, boxes, labels, transforms.ToTensor()(linemap)\n",
"else:\n",
" def transform_train(img, boxes, labels):\n",
" # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
" img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
" transforms.Lambda(lambda x: x) # identity\n",
" ])(img) \n",
" img = pad(img, (600, 600))\n",
" img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
" img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
" img = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize(mean=[0.5], std=[1.0])\n",
" ])(img)\n",
" boxes, labels = box_coder.encode(boxes, labels)\n",
" return img, boxes, labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
" trainset = CuneiformSSD(collections=train_collections, transform=transform_train, \n",
" gen_file_path=gen_file_path, gen_collections=gen_collections, gen_folder=gen_folder, \n",
" relative_path=relative_path, use_balanced_idx=False, use_linemaps=True, \n",
" remove_empty_tiles=False, min_align_ratio=0.2)\n",
"else:\n",
" trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
" gen_file_path=gen_file_path, relative_path=relative_path, use_linemaps=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
" def transform_test(img, boxes, labels, linemap):\n",
" img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
" img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
" img = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize(mean=[0.5],std=[1.0])\n",
" ])(img)\n",
" boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
" return img, boxes, labels, transforms.ToTensor()(linemap)\n",
"else:\n",
" def transform_test(img, boxes, labels):\n",
" img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
" img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
" img = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize(mean=[0.5],std=[1.0])\n",
" ])(img)\n",
" boxes, labels = box_coder.encode(boxes, labels)\n",
" return img, boxes, labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
" testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
" gen_file_path=None, relative_path=relative_path, use_linemaps=True)\n",
"else:\n",
" testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
" gen_file_path=None, relative_path=relative_path, use_linemaps=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"trainloader = data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=3)\n",
"testloader = data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# load FPN model from pretrained detector model\n",
"fpnssd_net = get_fpn_ssd_net(pretrained_model_version, device, arch_type, with_64, arch_opt, width_mult, \n",
" relative_path, num_classes, num_c=1)\n",
"fpnssd_net.train()\n",
"\n",
"# print model\n",
"print(fpnssd_net)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"### Test net\n",
"loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, img_size, img_size).to(device))\n",
"print(loc_preds.size(), cls_preds.size())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optimization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"criterion = SSDLoss(num_classes=num_classes)\n",
"#criterion = FocalLoss(num_classes=num_classes)\n",
"optimizer = optim.SGD(fpnssd_net.parameters(), lr=0.0001, momentum=0.9, weight_decay=1e-4)\n",
"\n",
"# lr policy\n",
"# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.97)\n",
"scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=lr_milestones, gamma=0.1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# init logger\n",
"if version_remark == '':\n",
" comment_str = '_{}'.format(model_version)\n",
"else:\n",
" comment_str = '_{}_{}'.format(model_version, version_remark)\n",
"writer = get_tensorboard_writer(logs_folder='{}results/run_logs/detector'.format(relative_path), comment=comment_str)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Training\n",
"best_loss = float('inf') # best test loss\n",
"best_epoch = 0\n",
"best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
"\n",
"\n",
"def train(epoch):\n",
" fpnssd_net.train()\n",
" train_loss = 0\n",
"\n",
" scheduler.step()\n",
"\n",
" if with_gen_data:\n",
" for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(trainloader):\n",
" inputs = inputs.to(device)\n",
" loc_targets = loc_targets.to(device)\n",
" cls_targets = cls_targets.to(device)\n",
"\n",
" optimizer.zero_grad()\n",
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" train_loss += loss.item()\n",
" print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
" % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
" else:\n",
" for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):\n",
" inputs = inputs.to(device)\n",
" loc_targets = loc_targets.to(device)\n",
" cls_targets = cls_targets.to(device)\n",
"\n",
" optimizer.zero_grad()\n",
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" train_loss += loss.item()\n",
" print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
" % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
"\n",
" # write to logger\n",
" phase = 'train'\n",
" writer.add_scalar('data/{}/loss'.format(phase), train_loss / len(trainloader), epoch)\n",
"\n",
"def test(epoch):\n",
" fpnssd_net.eval()\n",
" test_loss = 0\n",
" with torch.no_grad():\n",
"\n",
" if with_gen_data:\n",
" for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(testloader):\n",
" inputs = inputs.to(device)\n",
" loc_targets = loc_targets.to(device)\n",
" cls_targets = cls_targets.to(device)\n",
"\n",
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
" test_loss += loss.item()\n",
" print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
" % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
" else:\n",
" for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(testloader):\n",
" inputs = inputs.to(device)\n",
" loc_targets = loc_targets.to(device)\n",
" cls_targets = cls_targets.to(device)\n",
"\n",
" loc_preds, cls_preds = fpnssd_net(inputs)\n",
" loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
" test_loss += loss.item()\n",
" print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
" % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
"\n",
" # write to logger\n",
" phase = 'test'\n",
" writer.add_scalar('data/{}/loss'.format(phase), test_loss / len(testloader), epoch)\n",
"\n",
" # deep copy the model\n",
" global best_loss\n",
" global best_epoch\n",
" test_loss /= len(testloader)\n",
" if test_loss < best_loss and epoch > 5:\n",
" # best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
" weights_path = '{}results/weights/fpn_net_{}_best.pth'.format(relative_path, model_version)\n",
" torch.save(fpnssd_net.state_dict(), weights_path)\n",
" best_epoch = epoch\n",
" best_loss = test_loss"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"for epoch in tqdm(range(num_epochs)):\n",
" print('\\nEpoch: %d' % epoch)\n",
" train(epoch)\n",
" if epoch % 2 == 0:\n",
" print('\\nTest')\n",
" test(epoch)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"print('Best val loss: {:.4f} at epoch {}'.format(best_loss, best_epoch))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# choose model filename\n",
"weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)\n",
"# Save only the model parameters\n",
"torch.save(fpnssd_net.state_dict(), weights_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,907 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test and visualize sign detector"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from PIL import Image\n",
"from ast import literal_eval\n",
"\n",
"from tqdm import tqdm\n",
"\n",
"import torch"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"relative_path = '../../'\n",
"# ensure that parent path is on the python path in order to have all packages available\n",
"import sys, os\n",
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
"sys.path.insert(0, parent_path)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from lib.datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta\n",
"from lib.models.trained_model_loader import get_fpn_ssd_net\n",
"from lib.detection.run_gen_ssd_detection import gen_ssd_detections"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"collections = ['test'] # e.g. train test saa06\n",
"only_annotated = True\n",
"only_assigned = True\n",
"\n",
"# store detections for re-use\n",
"save_detections = False\n",
"\n",
"# show detections\n",
"show_detections = True"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model_version = 'v191ft01' \n",
"\n",
"arch_type = 'mobile' # resnet, mobile\n",
"arch_opt = 1\n",
"width_mult = 0.625 # 0.5 0.625 0.75\n",
"\n",
"crop_shape = [600, 600]\n",
"tile_shape = [600, 600]\n",
"\n",
"num_classes = 240\n",
"\n",
"with_64 = False \n",
"create_bg_class = False \n",
"with_4_aspects = False "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='complconf'>Config Completeness</a>\n",
"\n",
"[Jump to results](#results)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"test_min_score_thresh = 0.01 # 0.01 0.05\n",
"test_nms_thresh = 0.5 \n",
"\n",
"eval_ovthresh = 0.5 # 0.4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Model"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FPNSSD(\n",
" (fpn): MobileNetV2FPN(\n",
" (features): Sequential(\n",
" (0): Sequential(\n",
" (0): Conv2d(1, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (1): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" )\n",
" (1): MobileBlock(\n",
" (mobile_block): Sequential(\n",
" (0): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(20, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(20, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=20, bias=False)\n",
" (4): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (2): MobileBlock(\n",
" (mobile_block): Sequential(\n",
" (0): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(10, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(60, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=60, bias=False)\n",
" (4): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(60, 15, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(15, 90, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(90, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=90, bias=False)\n",
" (4): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(90, 15, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (3): MobileBlock(\n",
" (mobile_block): Sequential(\n",
" (0): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(15, 90, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(90, 90, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=90, bias=False)\n",
" (4): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(90, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=120, bias=False)\n",
" (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(120, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (2): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=120, bias=False)\n",
" (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(120, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (4): MobileBlock(\n",
" (mobile_block): Sequential(\n",
" (0): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=120, bias=False)\n",
" (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (2): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (3): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (5): MobileBlock(\n",
" (mobile_block): Sequential(\n",
" (0): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
" (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(240, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
" (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(360, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (2): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
" (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(360, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (6): MobileBlock(\n",
" (mobile_block): Sequential(\n",
" (0): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
" (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(360, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(100, 600, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(600, 600, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=600, bias=False)\n",
" (4): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(600, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (2): InvertedResidual(\n",
" (conv): Sequential(\n",
" (0): Conv2d(100, 600, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" (3): Conv2d(600, 600, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=600, bias=False)\n",
" (4): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): ReLU6(inplace)\n",
" (6): Conv2d(600, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (7): Sequential(\n",
" (0): Conv2d(100, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (2): ReLU6(inplace)\n",
" )\n",
" (8): AvgPool2d(kernel_size=7, stride=1, padding=0)\n",
" )\n",
" (conv6): Conv2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
" (toplayer): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))\n",
" )\n",
" (loc_head): Sequential(\n",
" (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU(inplace)\n",
" (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (3): ReLU(inplace)\n",
" (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (5): ReLU(inplace)\n",
" (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (7): ReLU(inplace)\n",
" (8): Conv2d(256, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" )\n",
" (cls_head): Sequential(\n",
" (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU(inplace)\n",
" (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (3): ReLU(inplace)\n",
" (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (5): ReLU(inplace)\n",
" (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (7): ReLU(inplace)\n",
" (8): Conv2d(256, 2880, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" )\n",
")\n"
]
}
],
"source": [
"fpnssd_net = get_fpn_ssd_net(model_version, device, arch_type, with_64, arch_opt, width_mult, \n",
" relative_path, num_classes, num_c=1, rnd_init_model=False)\n",
"\n",
"print(fpnssd_net)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(torch.Size([1, 15360, 4]), torch.Size([1, 15360, 240]))\n"
]
}
],
"source": [
"### Test net\n",
"loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, 1024, 1024).to(device))\n",
"print(loc_preds.size(), cls_preds.size())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare dataset"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setup dataset spanning 3 collections with 4465 annotations [67 segments, 67 indices]\n"
]
}
],
"source": [
"dataset = CuneiformSegments(collections=collections, relative_path=relative_path, \n",
" only_annotated=only_annotated, only_assigned=only_assigned, preload_segments=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"train: 5%|▌ | 1/19 [00:00<00:15, 1.20it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334921', 'Obv')\n",
"mAP 0.7926 | global AP: 0.7473 | mAP (align): 0.8859\n",
"total_tp: 22 | total_fp: 17 [46] | acc: 0.56\n",
"('P334921', 'Rev')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 11%|█ | 2/19 [00:01<00:08, 1.98it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.7143 | global AP: 0.7286 | mAP (align): 1.0\n",
"total_tp: 7 | total_fp: 1 [9] | acc: 0.88\n",
"('P334863', 'Obv')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/tobias/Dropbox/NeuralNets/caffe_workspace/pycaffe/cuneiform-sign-detection/lib/evaluations/sign_evaluator.py:184: RuntimeWarning: invalid value encountered in divide\n",
" return num_tp, num_fp, num_fp_global, num_tp / float(num_tp + num_fp)\n",
"\r",
"train: 16%|█▌ | 3/19 [00:01<00:06, 2.43it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.0 | global AP: 0.0 | mAP (align): nan\n",
"total_tp: 0 | total_fp: 0 [2] | acc: nan\n",
"('P334831', 'Rev')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 21%|██ | 4/19 [00:01<00:05, 2.57it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.8409 | global AP: 0.7323 | mAP (align): 0.881\n",
"total_tp: 27 | total_fp: 38 [74] | acc: 0.42\n",
"('P334831', 'Obv')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 26%|██▋ | 5/19 [00:01<00:05, 2.60it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.9274 | global AP: 0.8946 | mAP (align): 0.9518\n",
"total_tp: 59 | total_fp: 55 [97] | acc: 0.52\n",
"('P334892', 'Rev')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 32%|███▏ | 6/19 [00:02<00:04, 2.73it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.9236 | global AP: 0.8063 | mAP (align): 0.9236\n",
"total_tp: 18 | total_fp: 10 [28] | acc: 0.64\n",
"('P334892', 'Obv')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 37%|███▋ | 7/19 [00:02<00:04, 2.91it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.9667 | global AP: 0.9474 | mAP (align): 0.9667\n",
"total_tp: 18 | total_fp: 8 [27] | acc: 0.69\n",
"('P336635', 'Obv')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 42%|████▏ | 8/19 [00:02<00:03, 2.94it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.9061 | global AP: 0.7393 | mAP (align): 0.9061\n",
"total_tp: 13 | total_fp: 8 [35] | acc: 0.62\n",
"('P334865', 'Obv')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 47%|████▋ | 9/19 [00:03<00:03, 2.93it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.9157 | global AP: 0.887 | mAP (align): 0.9443\n",
"total_tp: 40 | total_fp: 34 [52] | acc: 0.54\n",
"('P334865', 'Rev')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"train: 58%|█████▊ | 11/19 [00:03<00:02, 3.14it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"mAP 0.8139 | global AP: 0.8114 | mAP (align): 0.8879\n",
"total_tp: 49 | total_fp: 47 [71] | acc: 0.51\n",
"('P334842', 'Obv')\n",
"mAP 0.0 | global AP: 0.0 | mAP (align): 0.0\n",
"total_tp: 0 | total_fp: 0 [0] | acc: 0.0\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 63%|██████▎ | 12/19 [00:03<00:02, 3.16it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334848', 'Obv')\n",
"mAP 0.9074 | global AP: 0.9346 | mAP (align): 0.9074\n",
"total_tp: 22 | total_fp: 14 [25] | acc: 0.61\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 68%|██████▊ | 13/19 [00:04<00:01, 3.23it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334848', 'Rev')\n",
"mAP 0.9375 | global AP: 0.8014 | mAP (align): 1.0\n",
"total_tp: 16 | total_fp: 4 [38] | acc: 0.8\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 74%|███████▎ | 14/19 [00:04<00:01, 3.23it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334839', 'Obv')\n",
"mAP 0.9333 | global AP: 0.8562 | mAP (align): 0.9956\n",
"total_tp: 22 | total_fp: 17 [52] | acc: 0.56\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 79%|███████▉ | 15/19 [00:04<00:01, 3.24it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334896', 'Obv')\n",
"mAP 0.9375 | global AP: 0.8586 | mAP (align): 0.9375\n",
"total_tp: 26 | total_fp: 20 [36] | acc: 0.57\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 84%|████████▍ | 16/19 [00:04<00:00, 3.21it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334836', 'Rev')\n",
"mAP 1.0 | global AP: 0.9705 | mAP (align): 1.0\n",
"total_tp: 45 | total_fp: 24 [65] | acc: 0.65\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 89%|████████▉ | 17/19 [00:05<00:00, 3.17it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334894', 'Rev')\n",
"mAP 1.0 | global AP: 0.8218 | mAP (align): 1.0\n",
"total_tp: 15 | total_fp: 9 [39] | acc: 0.62\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 95%|█████████▍| 18/19 [00:05<00:00, 3.14it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P334894', 'Obv')\n",
"mAP 0.8727 | global AP: 0.8457 | mAP (align): 0.8727\n",
"total_tp: 41 | total_fp: 39 [62] | acc: 0.51\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"train: 100%|██████████| 19/19 [00:06<00:00, 3.14it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"('P336178', 'Obv')\n",
"mAP 0.9375 | global AP: 0.8319 | mAP (align): 0.9375\n",
"total_tp: 31 | total_fp: 20 [34] | acc: 0.61\n",
"train | v191ft01\n",
"RESULTS ON FULL COLLECTION :\n",
"mAP 0.7739 | global AP: 0.7816 | mAP (align): 0.7958\n",
"total_tp: 471 | total_fp: 690 [792] | prec: 0.406\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"# filter collection dataset - OPTIONAL\n",
"didx_list = range(len(dataset))\n",
"didx_list = didx_list[:19] #19\n",
"\n",
"### Generate ssd detections\n",
"(list_seg_ap, \n",
" list_seg_name_with_anno) = gen_ssd_detections(didx_list, dataset, collections[0], relative_path, \n",
" model_version, fpnssd_net, with_64, create_bg_class, device,\n",
" test_min_score_thresh, test_nms_thresh, eval_ovthresh,\n",
" save_detections, show_detections, with_4_aspects=with_4_aspects, \n",
" verbose_mode=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='results'>Results</a>\n",
"\n",
"[Jump to completeness config](#complconf)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Jump to Results](#results)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
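In the per-segment log of the notebook above, the printed `acc` values match the precision tp / (tp + fp) returned by `sign_evaluator.py`. For example, `('P334921', 'Obv')` gives 22 / (22 + 17) ≈ 0.56. The collection summary reports the same quantity as `prec`: 471 / (471 + 690) ≈ 0.406. The `RuntimeWarning: invalid value encountered in divide` comes from a segment with no true or false positives, where the ratio is 0 / 0.

The sketch below is a guarded helper. It assumes NaN is still the desired value for empty segments. The function name is hypothetical and is not part of the repository.

```python
import math


def precision(num_tp, num_fp):
    """Return tp / (tp + fp); return NaN (without a NumPy warning) if there are no detections."""
    total = num_tp + num_fp
    if total == 0:
        return float("nan")
    return num_tp / float(total)


print(round(precision(22, 17), 2))    # 0.56, segment ('P334921', 'Obv')
print(round(precision(471, 690), 3))  # 0.406, full collection
print(math.isnan(precision(0, 0)))    # True, empty segment
```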
File diff suppressed because one or more lines are too long
@@ -0,0 +1,634 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train sign detector network"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from PIL import Image\n",
"from ast import literal_eval\n",
"import os.path\n",
"from tqdm import tqdm\n",
"import copy\n",
"\n",
"import torch\n",
"import torch.optim as optim\n",
"from torch.optim import lr_scheduler\n",
"import torch.utils.data as data\n",
"\n",
"from torchvision import transforms as trafos\n",
"import torchvision.transforms as transforms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# for auto-reloading external modules\n",
"# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"relative_path = '../../'\n",
"# ensure that parent path is on the python path in order to have all packages available\n",
"import sys, os\n",
"parent_path = os.path.join(os.getcwd(), relative_path)\n",
"parent_path = os.path.realpath(parent_path) # os.path.abspath(...)\n",
"sys.path.insert(0, parent_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from lib.datasets.cunei_dataset_ssd import CuneiformSSD\n",
"\n",
"from lib.alignment.LineFragment import plot_boxes\n",
"from lib.utils.pytorch_utils import get_tensorboard_writer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from lib.models.mobilenetv2_mod03 import MobileNetV2\n",
"from lib.models.mobilenetv2_fpn import MobileNetV2FPN\n",
"from lib.models.trained_model_loader import get_fpn_ssd_net\n",
"from lib.utils.torchcv.models.net import FPNSSD\n",
"from lib.utils.torchcv.loss.ssd_loss import SSDLoss"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
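"import time\n",
"# optional: delay the start of training by hh hours, sleeping in int(6*60*hh) steps of 10 s\n",
"# (with hh = 0.001 the loop body never runs, i.e. there is no delay)\n",
"hh = 0.001\n",
"## time.sleep(60*60*hh)\n",
"for i in tqdm(range(int(6*60*hh))):\n",
"    time.sleep(10)"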
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Config Basics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model_version = 'v001'\n",
"\n",
"# config pretrained classifier\n",
"pretrained_model_version = 'v001' #'v239' \n",
"\n",
"# config datasets for training and testing\n",
"train_collections = ['train_E'] \n",
"test_collections = ['test_full']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# config generated data\n",
"with_gen_data = True\n",
"\n",
"gen_model_version = 'v001' #'v171_hp04' \n",
"\n",
"gen_folder = 'results_ssd/{}/'.format(gen_model_version) \n",
"gen_file_path = None\n",
"\n",
"gen_collections = ['saa01', 'saa05', 'saa08', 'saa10', 'saa13', 'saa16']\n",
"#gen_collections = ['saa01', 'saa05']\n",
"gen_collections += ['train']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# config backbone architecture\n",
"arch_opt = 1\n",
"arch_type = 'mobile'\n",
"width_mult = 0.625\n",
"\n",
"# config detector\n",
"with_64 = False\n",
"create_bg_class = False\n",
"img_size = 512\n",
"num_classes = 240\n",
"\n",
"# config schedule\n",
"num_epochs = 51 \n",
"lr_milestones = [60]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# set log file name\n",
"if with_gen_data:\n",
"    version_remark = '{}_fpnssd_mobilenetv2_{}_gen_{}'\n",
"    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version, gen_model_version)\n",
"else:\n",
"    version_remark = '{}_fpnssd_mobilenetv2_{}'\n",
"    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing Datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
"    from lib.utils.torchcv.box_coder_retina_lm import RetinaBoxCoder\n",
"    from lib.utils.torchcv.transforms_lm.resize import resize_lm\n",
"    from lib.utils.torchcv.transforms_lm.random_crop_tile import random_crop_tile_lm\n",
"    from lib.utils.torchcv.transforms_lm.pad_gs import pad_lm\n",
"else:\n",
"    from lib.utils.torchcv.box_coder_retina import RetinaBoxCoder\n",
"    from lib.utils.torchcv.transforms.resize import resize\n",
"    from lib.utils.torchcv.transforms.random_crop_tile import random_crop_tile\n",
"    from lib.utils.torchcv.transforms.pad_gs import pad"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"box_coder = RetinaBoxCoder(create_bg_class=create_bg_class)\n",
"print('num_anchors', len(box_coder.anchor_boxes))\n",
"print('anchor areas', np.sqrt(box_coder.anchor_areas))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
"    def transform_train(img, boxes, labels, linemap):\n",
"        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
"        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0),\n",
"                                       transforms.Lambda(lambda x: x)  # identity\n",
"                                       ])(img)\n",
"        img, linemap = pad_lm(img, linemap, (600, 600))\n",
"        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
"        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
"\n",
"        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
"else:\n",
"    def transform_train(img, boxes, labels):\n",
"        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
"        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0),\n",
"                                       transforms.Lambda(lambda x: x)  # identity\n",
"                                       ])(img)\n",
"        img = pad(img, (600, 600))\n",
"        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
"        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels)\n",
"        return img, boxes, labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
"    trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
"                            gen_file_path=gen_file_path, gen_collections=gen_collections, gen_folder=gen_folder,\n",
"                            relative_path=relative_path, use_balanced_idx=False, use_linemaps=True,\n",
"                            remove_empty_tiles=False, min_align_ratio=0.2)\n",
"else:\n",
"    trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
"                            gen_file_path=gen_file_path, relative_path=relative_path, use_linemaps=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
"    def transform_test(img, boxes, labels, linemap):\n",
"        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
"        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
"        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
"else:\n",
"    def transform_test(img, boxes, labels):\n",
"        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
"        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
"        img = transforms.Compose([\n",
"            transforms.ToTensor(),\n",
"            transforms.Normalize(mean=[0.5], std=[1.0])\n",
"        ])(img)\n",
"        boxes, labels = box_coder.encode(boxes, labels)\n",
"        return img, boxes, labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"if with_gen_data:\n",
"    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
"                           gen_file_path=None, relative_path=relative_path, use_linemaps=True)\n",
"else:\n",
"    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
"                           gen_file_path=None, relative_path=relative_path, use_linemaps=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"trainloader = data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=3)\n",
"testloader = data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# load classifier model\n",
"basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=1, arch_opt=arch_opt)\n",
"\n",
"# load pretrained weights\n",
"weights_path = '{}results/weights/cuneiNet_basic_{}.pth'.format(relative_path, pretrained_model_version)\n",
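"# map_location='cpu' lets GPU-saved weights load on CPU-only machines; the model is moved to `device` below\n",
"basic_net.load_state_dict(torch.load(weights_path, map_location='cpu'))  # , strict=False\n",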
"basic_net = basic_net.to(device)\n",
"\n",
"# load FPN model with classifier model\n",
"fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64).to(device)\n",
"\n",
"# load full detector net\n",
"fpnssd_net = FPNSSD(fpn_net, num_classes=num_classes).to(device)\n",
"fpnssd_net.train()\n",
"\n",
"# print model\n",
"print(fpnssd_net)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"### Test net\n",
"loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, img_size, img_size).to(device))\n",
"print(loc_preds.size(), cls_preds.size())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optimization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"criterion = SSDLoss(num_classes=num_classes)\n",
"#criterion = FocalLoss(num_classes=num_classes)\n",
"optimizer = optim.SGD(fpnssd_net.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)\n",
"\n",
"# lr policy\n",
"# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.97)\n",
"scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=lr_milestones, gamma=0.1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# init logger\n",
"if version_remark == '':\n",
"    comment_str = '_{}'.format(model_version)\n",
"else:\n",
"    comment_str = '_{}_{}'.format(model_version, version_remark)\n",
"writer = get_tensorboard_writer(logs_folder='{}results/run_logs/detector'.format(relative_path), comment=comment_str)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Training\n",
"best_loss = float('inf')  # best test loss\n",
"best_epoch = 0\n",
"best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
"\n",
"\n",
"def train(epoch):\n",
"    fpnssd_net.train()\n",
"    train_loss = 0\n",
"\n",
"    scheduler.step()\n",
"\n",
"    if with_gen_data:\n",
"        for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(trainloader):\n",
"            inputs = inputs.to(device)\n",
"            loc_targets = loc_targets.to(device)\n",
"            cls_targets = cls_targets.to(device)\n",
"\n",
"            optimizer.zero_grad()\n",
"            loc_preds, cls_preds = fpnssd_net(inputs)\n",
"            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"            train_loss += loss.item()\n",
"            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
"    else:\n",
"        for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):\n",
"            inputs = inputs.to(device)\n",
"            loc_targets = loc_targets.to(device)\n",
"            cls_targets = cls_targets.to(device)\n",
"\n",
"            optimizer.zero_grad()\n",
"            loc_preds, cls_preds = fpnssd_net(inputs)\n",
"            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"            train_loss += loss.item()\n",
"            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
"\n",
"    # write to logger\n",
"    phase = 'train'\n",
"    writer.add_scalar('data/{}/loss'.format(phase), train_loss / len(trainloader), epoch)\n",
"\n",
"\n",
"def test(epoch):\n",
"    fpnssd_net.eval()\n",
"    test_loss = 0\n",
"    with torch.no_grad():\n",
"\n",
"        if with_gen_data:\n",
"            for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(testloader):\n",
"                inputs = inputs.to(device)\n",
"                loc_targets = loc_targets.to(device)\n",
"                cls_targets = cls_targets.to(device)\n",
"\n",
"                loc_preds, cls_preds = fpnssd_net(inputs)\n",
"                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"                test_loss += loss.item()\n",
"                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
"        else:\n",
"            for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(testloader):\n",
"                inputs = inputs.to(device)\n",
"                loc_targets = loc_targets.to(device)\n",
"                cls_targets = cls_targets.to(device)\n",
"\n",
"                loc_preds, cls_preds = fpnssd_net(inputs)\n",
"                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
"                test_loss += loss.item()\n",
"                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
"                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
"\n",
"    # write to logger\n",
"    phase = 'test'\n",
"    writer.add_scalar('data/{}/loss'.format(phase), test_loss / len(testloader), epoch)\n",
"\n",
"    # save the weights of the best model so far (lowest average test loss, only after epoch 5)\n",
"    global best_loss\n",
"    global best_epoch\n",
"    test_loss /= len(testloader)\n",
"    if test_loss < best_loss and epoch > 5:\n",
"        # best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
"        weights_path = '{}results/weights/fpn_net_{}_best.pth'.format(relative_path, model_version)\n",
"        torch.save(fpnssd_net.state_dict(), weights_path)\n",
"        best_epoch = epoch\n",
"        best_loss = test_loss"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"for epoch in tqdm(range(num_epochs)):\n",
"    print('\\nEpoch: %d' % epoch)\n",
"    train(epoch)\n",
"    if epoch % 2 == 0:\n",
"        print('\\nTest')\n",
"        test(epoch)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"print('Best val loss: {:.4f} at epoch {}'.format(best_loss, best_epoch))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# choose model filename\n",
"weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)\n",
"# Save only the model parameters\n",
"torch.save(fpnssd_net.state_dict(), weights_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
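With `num_epochs = 51` and `lr_milestones = [60]`, the `MultiStepLR` schedule configured in the training notebook above never reaches its milestone. Training therefore runs at the constant base rate of 0.001. PyTorch's `MultiStepLR` multiplies the base rate by `gamma` once for every milestone already reached.

The following is a dependency-free sketch of that rule, useful for checking a planned schedule before a long run. The helper name is hypothetical; it is not a PyTorch API.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate after `epoch` scheduler steps under a MultiStepLR-style rule:
    base_lr * gamma ** (number of milestones <= epoch)."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# schedule used in the notebook: lr=0.001, milestones=[60], gamma=0.1, 51 epochs
rates = [multistep_lr(0.001, [60], 0.1, e) for e in range(51)]
constant = all(r == 0.001 for r in rates)  # True: epoch 60 is never reached
```

To decay the rate during training, the milestones have to lie below `num_epochs`.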
+76
@@ -0,0 +1,76 @@
### Compile and install OpenGM with the Python wrapper in a virtualenv
#### Online references
- conda install guide:
  - https://groups.google.com/forum/#!searchin/opengm/nose%7Csort:relevance/opengm/Nte5Zpu9RL0/YSanK09kNwAJ
- plain Ubuntu install guides:
  - http://cvlab-dresden.de/HTML/people/bogdan/teaching/slides-script/ml2-ss15/installation-readme.txt
  - https://memoryaux.wordpress.com/2014/08/15/installing-opengm-with-python-wrapper/
#### Instructions (tested on Ubuntu 14.04)
Clone the source:
`git clone https://github.com/opengm/opengm.git`
Create a build directory under `opengm/`:
`mkdir build/`
and enter it:
`cd build/`
Run ccmake and configure with 'c':
`ccmake ../`
Run ccmake again and select the following options:
`ccmake ../`
build:
- command line ?
- converter ?
- docs ?
- examples ? (requires an external library such as CPLEX)
- python docs ? (requires `pip install sphinx` and produces ugly output)
- python wrapper
- testing
- tutorials
with:
- boost
- hdf5
python:
- python executable: /home/USER/.virtualenvs/VNAME/bin
- include dir: /home/USER/.virtualenvs/VNAME/include
- include dir2: /home/USER/.virtualenvs/VNAME/include/python2.7
- library: /usr/lib/x86_64-linux-gnu/libpython2.7.so
(the alternative, /home/USER/.virtualenvs/VNAME/lib/python2.7, contains no `*.so` file)
- library debug: PYTHON_LIBRARY_DEBUG-NOTFOUND (default)
- numpy include directory: /home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/numpy/core/include
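The Python paths that ccmake asks for can be looked up from inside the activated virtualenv. The sketch below uses only the standard `sys`/`sysconfig` modules and `numpy.get_include()`.

The mapping of `sysconfig` values onto the ccmake fields is our assumption, so compare the printed values with the list above. The library path in particular is only a best guess. On Debian/Ubuntu the shared `libpython` usually sits in a multiarch subdirectory such as `x86_64-linux-gnu`, and on some builds `LDLIBRARY` names a static library.

```python
import os
import sys
import sysconfig


def cmake_python_paths():
    """Collect candidate values for the ccmake Python fields (mapping is an assumption)."""
    paths = {
        "python executable": sys.executable,
        # e.g. .../include/python2.7 for a Python 2.7 interpreter
        "include dir": sysconfig.get_paths()["include"],
    }
    # best guess for the shared libpython; may be None or need a multiarch subdir
    libdir = sysconfig.get_config_var("LIBDIR")
    ldlibrary = sysconfig.get_config_var("LDLIBRARY")
    paths["library"] = os.path.join(libdir, ldlibrary) if libdir and ldlibrary else None
    try:
        import numpy
        paths["numpy include directory"] = numpy.get_include()
    except ImportError:  # numpy not installed in this environment
        paths["numpy include directory"] = None
    return paths


if __name__ == "__main__":
    for key, value in sorted(cmake_python_paths().items()):
        print("{}: {}".format(key, value))
```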
*For some unknown reason*, the opengm Python site-package is installed under `/usr/local/lib/python0./`;
therefore, it is better to skip `make install` and copy the files by hand (see below).
To build, run (use `-j` only on a multicore system):
```
make -j4
make -j2 test
make install
```
Then copy the opengm Python package by hand to `/home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/`.
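The manual copy step can be scripted as a small helper. It locates the site-packages directory of the running (virtualenv) interpreter via `sysconfig` and replaces any previous copy.

This is a sketch under one assumption: you know where the build placed the `opengm` package. This guide does not state that location, so `built_pkg_dir` is a hypothetical parameter, not a known path.

```python
import os
import shutil
import sysconfig


def site_packages_dir():
    """purelib site-packages directory of the running interpreter
    (inside an activated virtualenv this lies inside the virtualenv)."""
    return sysconfig.get_paths()["purelib"]


def copy_package(built_pkg_dir, dest_dir=None):
    """Copy a built package directory (hypothetical path to the opengm package
    in the build tree) into dest_dir, defaulting to site-packages.
    An existing copy with the same name is removed first."""
    dest_dir = dest_dir or site_packages_dir()
    name = os.path.basename(os.path.normpath(built_pkg_dir))
    target = os.path.join(dest_dir, name)
    if os.path.isdir(target):
        shutil.rmtree(target)
    shutil.copytree(built_pkg_dir, target)
    return target
```

Run it with the virtualenv's own `python`, so that `site_packages_dir()` resolves to `/home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/`.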
Now test in Python:
`import opengm`
Hopefully things work :)
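In this repository opengm builds chain-structured matching models. The `LineMatching1D` class creates one variable per transliterated sign in a line. Its labels are the detections of that sign plus one outlier label. It adds unary costs per label and pairwise costs between consecutive signs.

A chain of this kind can be minimised exactly with a min-sum dynamic program (Viterbi). On small models this gives an independent check of an installation or of inference results. The sketch below is our own NumPy implementation: it is not an opengm API, and it is not the inference call used by the repository, which is not shown here.

```python
import numpy as np


def chain_min_sum(unaries, pairwises):
    """Exact minimum-energy labelling of a chain model by dynamic programming.

    unaries:   n arrays; unaries[i][a] = cost of label a for variable i
    pairwises: n-1 matrices; pairwises[i][a, b] = cost of (x_i = a, x_{i+1} = b)
    Returns (labels, energy).
    """
    cost = np.asarray(unaries[0], dtype=float)
    back = []  # back[i-1][b] = best label of variable i-1 given label b at i
    for i in range(1, len(unaries)):
        total = cost[:, None] + np.asarray(pairwises[i - 1], dtype=float)
        back.append(np.argmin(total, axis=0))
        cost = total.min(axis=0) + np.asarray(unaries[i], dtype=float)
    labels = [int(np.argmin(cost))]
    for bp in reversed(back):  # backtrack from the last variable
        labels.append(int(bp[labels[-1]]))
    labels.reverse()
    return labels, float(cost.min())


# toy chain: 3 variables, 2 labels; the pairwise term penalises label changes
unaries = [[0, 5], [5, 0], [0, 5]]
pairwises = [[[0, 10], [10, 0]]] * 2
print(chain_min_sum(unaries, pairwises))  # ([0, 0, 0], 5.0)
```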
File diff suppressed because it is too large
+528
@@ -0,0 +1,528 @@
from scipy.spatial.distance import cdist, seuclidean, euclidean, squareform
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from timeit import default_timer as timer
import opengm
from ..utils.bbox_utils import bb_intersection_over_union
class LineMatching1D(object):
def __init__(self, tl_line_rec, region_det, line_rec, line_pts, stats, scale=1.0, sign_hypos=None, param_dict=None):
# create graphical model from fragement
self.stats = stats
self.scale = scale
self.scaled_sign_height = stats.tblSignHeight * scale
self.min_sign_dist = self.scaled_sign_height / 2. # distance between sign centers
self.tl_line_rec = tl_line_rec
# null hypothesis for signs in tl
self.sign_hypos = sign_hypos
# detections contained in rectangluar area around respective alignments
# [ID, cx, cy, score, x1, y1, x2, y2, idx]
self.region_det = region_det
# init
self.num_vars = len(self.tl_line_rec)
self.num_relevant = 0
self.max_cost = 1e10 # 1e11 # "inifinite" cost
# only continue, if there is a sign in line to match
if self.num_vars > 0:
# compute num_lbls_per_var from detections
ulbls, counts = np.unique(self.region_det[:, 0], return_counts=True)
hypo_det_counts = np.array([counts[ulbls == item] if item in ulbls else 0 for item in self.tl_line_rec.lbl],
dtype=int).squeeze()
self.tl_line_rec['det_count'] = hypo_det_counts
# optional: remove vars without detections
if False:
# self.tl_line_rec = self.tl_line_rec[hypo_det_counts > 0]
self.tl_line_rec = self.tl_line_rec.iloc[np.where(hypo_det_counts > 0)] # deal with scalar case of boolean indexing
# hypo_det_counts = hypo_det_counts[hypo_det_counts > 0]
# only continue, at least a single matching detection
self.num_relevant = np.sum(counts[np.isin(ulbls, self.tl_line_rec.lbl)])
if self.num_relevant > 0:
# update num_vars
self.num_vars = len(self.tl_line_rec)
# opengm setup
self.num_lbls_per_var = max(counts[np.isin(ulbls, self.tl_line_rec.lbl)]) + 1 # + 1 outlier detection
var_space = np.ones(self.num_vars) * self.num_lbls_per_var
self.gm = opengm.gm(var_space)
# parameter setup
if param_dict is not None:
self.params = param_dict
else:
self.params = dict()
# extra settings
self.params['outlier_cost'] = 10
self.params['angle_long_range'] = True
# unary potentials
self.params['lambda_score'] = 0.3
self.params['sigma_score'] = 0.4
self.params['lambda_offset'] = 1 # currently offset used linearly without exp function
self.params['sigma_offset'] = 1 # lambda & sigma have no influence!
# pairwise binary potentials
self.params['lambda_p'] = 3 # 1
self.params['sigma_p'] = 3
self.params['lambda_angle'] = 2
self.params['sigma_angle'] = 0.6
self.params['lambda_iou'] = 2
self.params['sigma_iou'] = 0.4
# OPTIONAL: strong penalties for long range connections
if True:
self.params['lr_lambda_angle'] = 0.05
self.params['lr_sigma_angle'] = 0.1
self.params['lr_lambda_iou'] = 0.1
self.params['lr_sigma_iou'] = 0.05
else:
self.params['lr_lambda_angle'] = self.params['lambda_angle']
self.params['lr_sigma_angle'] = self.params['sigma_angle']
self.params['lr_lambda_iou'] = self.params['lambda_iou']
self.params['lr_sigma_iou'] = self.params['sigma_iou']
# angle of hypothesis line
self.b = line_pts[-1, :] - line_pts[0, :]
# print 'hypo angle:', np.arctan2(self.b[1], self.b[0]) * (180 / np.pi), self.b
# offset
self.Xb = line_pts[0, :].reshape(1, -1)
# define variance between line distance and sign distance - for seculidean and mahalanobis
self.variance_p = np.array([1, 0.2], dtype=np.float) # [8, 1] [1, 1]
if False:
print('#syms:', len(self.tl_line_rec), 'max#dets_per_sym:', self.num_lbls_per_var - 1,
'relevant#dets:', self.num_relevant, 'total#dets:', self.region_det.shape[0])
# print(np.vstack([self.fm_hypo_df.lbl, hypo_det_counts])).astype(int)
# print self.fm_hypo_df
# assemble potentials
self.add_unary()
self.add_pairwise()
def add_unary(self):
# for monitoring costs
self.unary_score_ct = {}
self.unary_offset_ct = {}
self.unary_det_ct = {}
# compute for later usage by alignment vector
self.det_same_label_ct_idx = []
# assemble unary potentials
for vidx, (tl_sign_idx, fm_sign) in enumerate(self.tl_line_rec.iterrows()):
lbl = int(fm_sign.lbl)
# ctr = [float(fm_sign.ctr_l), float(fm_sign.ctr_r)]
# print vidx, lbl, ctr
# get boxes of certain lbl
det_same_label = self.region_det[self.region_det[:, 0] == lbl]
self.det_same_label_ct_idx.append(det_same_label[:, -1])
# detection locations
Xa = det_same_label[:, [1, 2]]
# incorporate score
unary_vec = np.ones(self.num_lbls_per_var) * self.max_cost
U1, U2, U3 = [], [], []
if det_same_label.shape[0] > 0:
# compute partial cost (vectorized)
U1 = 1 - det_same_label[:, 3]
# since goal of matching is to incorporate low confidence detections,
# linear contribution of score might be enough / otherwise penalize only low confidence below 0.01
if True:
U1 = self.params['lambda_score'] * (np.exp(U1 / self.params['sigma_score']) - 1)
# incorporate distance from hypothesis line
# cx, cy
U2 = np.zeros(len(U1))
if False: # disabled in favour of null hypo offset
if self.params['lambda_offset'] != 0:
U2 = cdist(Xa, self.Xb, lambda u, v: np.linalg.norm(np.cross(u - v, self.b))
/ np.linalg.norm(self.b)) / self.min_sign_dist
# compute partial cost
U2 = self.params['lambda_offset'] * (np.exp(U2.squeeze() / self.params['sigma_offset']) - 1)
# incorporate null hypothesis of signs
U3 = np.zeros(len(U1))
if self.params['lambda_offset'] != 0 and self.sign_hypos is not None:
# sign hypo location
X0 = self.sign_hypos[vidx, 0:2].reshape(1, -1)
# get sign width and set variance
if lbl in self.stats.sign_df.index:
sign_width = self.stats.get_sign_width(lbl)
else:
sign_width = 1
var = np.array([sign_width * 1, 1], dtype=np.float)
# compute pairwise distance
U3 = cdist(X0, Xa, metric='seuclidean', V=var) / self.min_sign_dist
# U3 = self.params['lambda_offset'] * (np.exp(U3.squeeze() / self.params['sigma_offset']) - 1)
# U3 = np.clip(U3, 0, 1e-5 * self.max_cost)
# sum up cost and insert into unary vector (only replace values if there a detections
unary_vec[:len(U1)] = U1 + U2 + U3
# for outlier detection set specific unary cost
unary_vec[-1] = self.params['outlier_cost']
# add function and factor
func_id = self.gm.addFunction(unary_vec)
self.gm.addFactor(func_id, vidx)
# for debugging
# self.unary_score_ct.append(U1)
# self.unary_offset_ct.append(U3)
# self.unary_det_ct.append(det_same_label)
self.unary_score_ct[vidx] = U1
self.unary_offset_ct[vidx] = U3
self.unary_det_ct[vidx] = det_same_label
def add_pairwise(self):
# assemble pairwise potentials
# Assumption: vars are in order of symbols in line
# ATTENTION: ORDER of fm_hypo_lbls is important for pairwise potential generation!!!
self.pairwise_dist_ct = {}
self.pairwise_angle_ct = {}
self.pairwise_iou_ct = {}
self.pairwise_long_range = {}
for vidx in range(self.num_vars - 1):
# setup basic matrix with maximum cost
dist_mat = np.ones([self.num_lbls_per_var] * 2) * self.max_cost
sym_lt = self.tl_line_rec.lbl.iat[vidx]
sym_rt = self.tl_line_rec.lbl.iat[vidx + 1]
# get boxes according to labels
# [ID, cx, cy, score, x1, y1, x2, y2]
det_sym_lt = self.region_det[self.region_det[:, 0] == sym_lt]
det_sym_rt = self.region_det[self.region_det[:, 0] == sym_rt]
# x2, cy
# sym_lt_right_border = det_sym_lt[:,[6,2]]
# x1, cy
# sym_rt_left_border = det_sym_rt[:,[4,2]]
# cx, cy
sym_lt_right_border = det_sym_lt[:, [1, 2]]
sym_rt_left_border = det_sym_rt[:, [1, 2]]
# bboxes
sym_lt_bboxes = det_sym_lt[:, 4:]
sym_rt_bboxes = det_sym_rt[:, 4:]
# compute pairwise distances between detections of lt and rt sym
# 1) basic computation
# X = cdist(sym_lt_right_border, sym_rt_left_border, metric='euclidean')
X = cdist(sym_lt_right_border, sym_rt_left_border, metric='seuclidean', V=self.variance_p)
# because vertical offset always depends on underlying rotation, mahalanobis should be used here
# X = cdist(sym_lt_right_border, sym_rt_left_border, metric='mahalanobis', VI=self.VI)
# reduce distances to normal scale and normalize with 10 * times sign_height
X = ((X/self.scaled_sign_height) - 1)
inX = X.copy()
# compute partial cost
X = self.params['lambda_p'] * (np.exp(X / self.params['sigma_p']) - 1)
# 2) penalty for wrong side
# if on wrong side, increase cost by factor 5 [deprecated: superseded by the angle term]
#X2 = cdist(sym_lt_right_border, sym_rt_left_border, lambda u, v: u[0] > v[0])
#X[X2.astype(bool)] *= 5
# 3) penalize distance only in x-dimension
# X8 = cdist(sym_lt_right_border, sym_rt_left_border, lambda u, v: v[0] - u[0])
# X8 = self.params['lambda_p'] * np.exp((self.min_sign_dist - X8) / self.params['sigma_p'])
# incorporate angle
# angle with x-axis: np.arctan((u[1]-v[1])/(u[0]-v[0]))
# b=np.array([1,0])
# angle between vectors less stable: acos(dot(v1, v2) / (norm(v1) * norm(v2)))
# X3 = cdist(sym_lt_right_border, sym_rt_left_border,
# lambda u,v: np.arccos(np.dot(v-u,b) / (np.linalg.norm(v-u) * np.linalg.norm(b))))/pi
# angle between vectors, numerically more stable: atan2(norm(cross(a,b)), dot(a,b))
X3 = cdist(sym_lt_right_border, sym_rt_left_border,
lambda u, v: np.arctan2(np.linalg.norm(np.cross(v - u, self.b)), np.dot(v - u, self.b))) / np.pi
inX3 = X3.copy()
# compute partial cost
X3 = self.params['lambda_angle'] * (np.exp(X3 / self.params['sigma_angle']) - 1)
# incorporate IoU
X4 = cdist(sym_lt_bboxes, sym_rt_bboxes,
lambda u, v: bb_intersection_over_union(u, v))
inX4 = X4.copy()
# compute partial cost
X4 = self.params['lambda_iou'] * (np.exp(X4 / self.params['sigma_iou']) - 1)
# sum up cost and insert into dist_mat
dist_mat[:X.shape[0], :X.shape[1]] = X + X3 + X4
# for outlier class set pairwise cost to 0
dist_mat[-1, :] = 0
dist_mat[:, -1] = 0
# avoid identity solutions
if sym_lt == sym_rt:
np.fill_diagonal(dist_mat, self.max_cost)
# add function and factor
func_id = self.gm.addFunction(dist_mat)
self.gm.addFactor(func_id, [vidx, vidx + 1])
# for debugging
# self.pairwise_dist_ct[(vidx, vidx + 1)] = inX
# self.pairwise_angle_ct[(vidx, vidx + 1)] = inX3
# self.pairwise_iou_ct[(vidx, vidx + 1)] = inX4
self.pairwise_dist_ct[(vidx, vidx + 1)] = X
self.pairwise_angle_ct[(vidx, vidx + 1)] = X3
self.pairwise_iou_ct[(vidx, vidx + 1)] = X4
# in the case of angles add pairwise potentials for all possible combinations
if self.params['angle_long_range']:
# add combinations on the right of var
# not necessary to add combinations on the left of var due to symmetry
for vidx_rt in range(vidx + 2, self.num_vars):
sym_rt = self.tl_line_rec.lbl.iat[vidx_rt]
# detections
det_sym_rt = self.region_det[self.region_det[:, 0] == sym_rt]
# cx, cy
sym_rt_left_border = det_sym_rt[:, [1, 2]]
# bboxes
sym_rt_bboxes = det_sym_rt[:, 4:]
# incorporate angle
# angle between vectors, numerically more stable: atan2(norm(cross(a,b)), dot(a,b))
XY3 = cdist(sym_lt_right_border, sym_rt_left_border,
lambda u, v: np.arctan2(np.linalg.norm(np.cross(v - u, self.b)),
np.dot(v - u, self.b))) / np.pi
# compute partial cost
XY3 = self.params['lr_lambda_angle'] * (np.exp(XY3 / self.params['lr_sigma_angle']) - 1)
# incorporate iou
XY4 = cdist(sym_lt_bboxes, sym_rt_bboxes,
lambda u, v: bb_intersection_over_union(u, v))
# compute partial cost
XY4 = self.params['lr_lambda_iou'] * (np.exp(XY4 / self.params['lr_sigma_iou']) - 1)
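# reset to maximum cost so no entries of the previous factor are carried over
dist_mat = np.ones([self.num_lbls_per_var] * 2) * self.max_cost
# sum up cost and insert into dist_mat
dist_mat[:XY3.shape[0], :XY3.shape[1]] = XY3 + XY4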
# for outlier class set pairwise cost to 0
dist_mat[-1, :] = 0
dist_mat[:, -1] = 0
# avoid identity solutions
if sym_lt == sym_rt:
np.fill_diagonal(dist_mat, self.max_cost)
# add function and factor
func_id = self.gm.addFunction(dist_mat)
self.gm.addFactor(func_id, [vidx, vidx_rt])
# for debugging
self.pairwise_long_range[(vidx, vidx_rt)] = XY3 + XY4 # XY3, XY4, XY3 + XY4
def run_inference(self):
# only continue, if there is a sign/detection in line to match
if len(self.tl_line_rec) > 0 and self.num_relevant > 0:
if False:
# basic belief propagation (slower)
bfprop = opengm.inference.BeliefPropagation(gm=self.gm)
if True:
# TRWS: https://github.com/opengm/opengm/blob/master/src/interfaces/python/opengm/inference/pyTrws.cxx
# default params: https://github.com/opengm/opengm/blob/master/src/interfaces/python/opengm/inference/param/trws_external_param.hxx
parameter = opengm.InfParam(steps=200)
bfprop = opengm.inference.TrwsExternal(gm=self.gm, accumulator='minimizer', parameter=parameter)
#start = timer()
bfprop.infer()
#run_time = timer() - start
#print('{}'.format(run_time))
# get and save labeling
self.labeling = bfprop.arg()
self.tl_line_rec['lbl_arg'] = bfprop.arg()
# get raw energy and check if inference failed
self.raw_energy = self.gm.evaluate(bfprop.arg())
self.inference_failed = self.raw_energy > self.num_vars * self.params['outlier_cost']
# get energy, normalize by num_vars * outlier_cost
# worst case should be outliers only
max_line_cost = self.num_vars * self.params['outlier_cost']
# clip energy, because inference sometimes fails !?
self.energy = min(self.raw_energy, max_line_cost) / float(max_line_cost)
# attributes cost to individual assignments (selected detections) and normalize using outlier_cost
self.tl_line_rec['nE'] = np.around(self.compute_labeling_energy() / self.params['outlier_cost'], decimals=2)
if self.inference_failed:
# all outlier
self.tl_line_rec['aligned_det_idx'] = -1
self.tl_line_rec['region_det_idx'] = -1
else:
# compute actual alignments with respect to original detections indices
self._compute_global_alignments()
# compute alignments with respect to region detection indices
self._compute_region_alignments()
def _compute_global_alignments(self):
alignments = np.zeros((len(self.tl_line_rec), 1), dtype=int)
for i, lbl in enumerate(self.labeling):
if lbl != (self.num_lbls_per_var - 1) and len(self.det_same_label_ct_idx[i]) > 0:
alignments[i] = self.det_same_label_ct_idx[i][lbl]
else:
# outlier
alignments[i] = -1
# set values in dataframe
self.tl_line_rec['aligned_det_idx'] = alignments.astype(int)
def _compute_region_alignments(self):
alignments = np.zeros((len(self.tl_line_rec), 1), dtype=int)
for ii, global_det_idx in enumerate(self.tl_line_rec.aligned_det_idx.values):
if global_det_idx != -1:
# map global to region detection index
alignments[ii] = np.where(self.region_det[:, -1] == global_det_idx)[0]
else:
# outlier detection
alignments[ii] = -1
# set values in dataframe
self.tl_line_rec['region_det_idx'] = alignments.astype(int)
def get_region_alignments(self):
# maybe I should also return the self.tl_line_rec.index
# problem arises if self.tl_line_rec is changed inside LineMatching1D
if 'region_det_idx' in self.tl_line_rec.columns:
return self.tl_line_rec.region_det_idx.values
else:
return []
def visualize_matching(self, input_im, sign_hypos, ax=None):
# only continue, if there is a sign in line to match
if len(self.tl_line_rec) > 0:
# select detections using alignment index
alignments = self.get_region_alignments()
aligned = self.region_det[alignments[alignments >= 0], 1:3]
if ax is None:
fig, ax = plt.subplots(figsize=(12, 8))
# plot hypo
ax.plot(sign_hypos[:, 0], sign_hypos[:, 1], '*b', markersize=10, label='null hypo')
ax.plot(aligned[:, 0], aligned[:, 1], 'oy', markersize=8, label='gm aligned detections')
# plot tablet
ax.imshow(input_im, cmap=plt.cm.Greys_r)
# annotate
for i, pos_idx in enumerate(self.tl_line_rec.iloc[alignments >= 0].pos_idx.values):
ax.annotate(pos_idx, (aligned[i, 0], aligned[i, 1]), fontsize=15)
ax.legend(shadow=True, fancybox=True)
ax.axis('off')
# plt.show()
# energy marginal computation
def _get_unary_cost(self, unary_dict, vidx, didx):
unary = unary_dict[vidx]
if len(unary) > 0:
return unary.flatten()[didx]
else:
# in case inference fails and the labeling is out of bounds
return self.max_cost
def _get_pairwise_val(self, pairwise_dict, idx0, idx1):
outlier_lbl = self.num_lbls_per_var - 1
pairwise = pairwise_dict[idx0, idx1]
didx0 = self.labeling[idx0]
didx1 = self.labeling[idx1]
if didx0 != outlier_lbl and didx1 != outlier_lbl:
if pairwise.size > 0:
return pairwise[didx0, didx1]
else:
# in case inference fails and the labeling is out of bounds
return self.max_cost
else:
return 0
def _get_pairwise_cost(self, pairwise_dict, vidx):
# deal with boundary cases
if vidx == self.num_vars - 1:
return self._get_pairwise_val(pairwise_dict, vidx - 1, vidx)
elif vidx == 0:
return self._get_pairwise_val(pairwise_dict, vidx, vidx + 1)
else:
return (self._get_pairwise_val(pairwise_dict, vidx - 1, vidx)
+ self._get_pairwise_val(pairwise_dict, vidx, vidx + 1))
def _get_lr_pairwise_cost(self, lr_pairwise_dict, vidx):
energy = 0
for vidx_rt in range(vidx + 2, self.num_vars):
energy += self._get_pairwise_val(lr_pairwise_dict, vidx, vidx_rt)
return energy
def compute_unary_cost(self):
list_unary = [self.unary_score_ct, self.unary_offset_ct]
outlier_lbl = self.num_lbls_per_var - 1
u_marginals = np.zeros_like(self.labeling, dtype=float)
for vidx, didx in enumerate(self.labeling):
if didx != outlier_lbl:
for unary_dict in list_unary:
u_marginals[vidx] += self._get_unary_cost(unary_dict, vidx, didx)
else:
u_marginals[vidx] += self.params['outlier_cost']
return u_marginals
def compute_pairwise_cost(self):
list_pairwise = [self.pairwise_angle_ct, self.pairwise_dist_ct, self.pairwise_iou_ct]
p_marginals = np.zeros_like(self.labeling, dtype=float)
if len(self.labeling) > 1: # only compute if there are any pairs
for vidx, dvidx in enumerate(self.labeling):
for pairwise_dict in list_pairwise:
p_marginals[vidx] += self._get_pairwise_cost(pairwise_dict, vidx)
return p_marginals
def compute_pairwise_cost_lr(self):
lr_pairwise_dict = self.pairwise_long_range
p_marginals = np.zeros_like(self.labeling, dtype=float)
if len(self.labeling) > 1: # only compute if there are any pairs
for vidx, dvidx in enumerate(self.labeling):
p_marginals[vidx] += self._get_lr_pairwise_cost(lr_pairwise_dict, vidx)
return p_marginals
def compute_labeling_energy(self):
# compute an energy vector that attributes cost to individual labels
# summed up, the output vector equals the un-normalized energy
# (each neighbour pairwise term is attributed to both of its variables, hence p_marginals / 2)
# deal with case when inference failed
if self.inference_failed:
return self.max_cost
else:
u_marginals = self.compute_unary_cost()
p_marginals = self.compute_pairwise_cost()
plr_marginals = self.compute_pairwise_cost_lr()
return u_marginals + (p_marginals/2. + plr_marginals)
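The energy assembled above is a sum of unary terms (`unary_vec`, one per transliterated sign, with the last label reserved for the outlier) and pairwise terms between neighbouring signs (`dist_mat`, zero on the outlier row and column). The code minimizes it with OpenGM's TRWS, which also copes with the loopy graph created by the optional `angle_long_range` factors. Without those factors the model is a chain, and its minimum can be computed exactly by min-sum dynamic programming. The sketch below assumes plain NumPy cost arrays in that layout; `chain_map` and `chain_energy` are hypothetical helpers, not part of the repository.

```python
import itertools

import numpy as np


def chain_energy(labels, unaries, pairwise):
    """Energy of a labeling: sum of unary costs plus neighbour pairwise costs."""
    e = sum(unaries[i][k] for i, k in enumerate(labels))
    e += sum(pairwise[i][labels[i], labels[i + 1]] for i in range(len(labels) - 1))
    return float(e)


def chain_map(unaries, pairwise):
    """Exact minimum-energy labeling of a chain model via min-sum dynamic programming.

    unaries[i][k]     : cost of assigning label k to variable i
    pairwise[i][k, l] : cost of labels (k, l) on variables (i, i + 1)
    """
    cost = np.asarray(unaries[0], dtype=float)  # best prefix cost ending in each label
    back = []  # back[i - 1][l]: best label of variable i - 1 given label l at variable i
    for i in range(1, len(unaries)):
        total = (cost[:, None] + np.asarray(pairwise[i - 1], dtype=float)
                 + np.asarray(unaries[i], dtype=float)[None, :])
        back.append(np.argmin(total, axis=0))
        cost = np.min(total, axis=0)
    labels = [int(np.argmin(cost))]
    for ptr in reversed(back):
        labels.append(int(ptr[labels[-1]]))
    labels.reverse()
    return labels, float(np.min(cost))


# toy line: 4 signs, 2 candidate detections each plus the outlier label (last index)
rng = np.random.default_rng(0)
n_vars, n_lbls, outlier_cost = 4, 3, 5.0
unaries = [np.append(rng.uniform(0, 10, n_lbls - 1), outlier_cost) for _ in range(n_vars)]
pairwise = []
for _ in range(n_vars - 1):
    p = rng.uniform(0, 10, (n_lbls, n_lbls))
    p[-1, :] = 0.0  # outliers carry no pairwise cost
    p[:, -1] = 0.0
    pairwise.append(p)

labels, energy = chain_map(unaries, pairwise)
brute = min(itertools.product(range(n_lbls), repeat=n_vars),
            key=lambda lab: chain_energy(lab, unaries, pairwise))
print(labels, round(energy, 3))
```

On a chain this DP and TRWS both reach the exact minimum; once long-range factors are added, TRWS remains applicable but is approximate in general.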
@@ -0,0 +1,464 @@
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
from operator import itemgetter
from scipy.stats import norm
from scipy import ndimage as ndi
from scipy.spatial.distance import pdist, cdist, squareform
from LineFragment import LineFragment
# LINES - TRANSLITERATION ALIGNMENT PROBLEM
# associate lines with transliteration lines
# OPTION 0) use line_models sorted by dist as basic alignment
def align_lines_tl_by_sort(line_hypos, tl_df):
# use tl_line_indices
tl_line_indices = tl_df.line_idx.unique()
# extend or cut if too short or long respectively
diff_len = line_hypos.label.nunique() - len(tl_line_indices)
if diff_len > 0:
last_idx = tl_line_indices[-1] + 1
tl_line_indices = np.concatenate([tl_line_indices, range(last_idx, last_idx + diff_len)])
else:
tl_line_indices = tl_line_indices[:line_hypos.label.nunique()]
# print tl_line_indices, line_hypos.groupby('label').mean().sort_values('dist').index
# find basic alignment by sorting (enumerate line models sorted according to dist)
tl_line_assignment = pd.DataFrame({'tl_line': tl_line_indices, # np.arange(line_hypos.label.nunique())
'hypo_line_lbl': line_hypos.groupby('label').mean().sort_values('dist').index})
# add tl_line column in line_hypos using join on line_hypos
return line_hypos.join(tl_line_assignment.set_index('hypo_line_lbl'), on='label')
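`align_lines_tl_by_sort` pads or truncates the transliteration line indices to the number of detected lines and then pairs them with the detected lines in order of increasing `dist`. The NumPy-only sketch below shows the same bookkeeping without pandas; `align_by_sort` and its inputs (per-line labels and mean distances) are hypothetical stand-ins for the grouped `line_hypos` dataframe, and a non-empty transliteration is assumed.

```python
import numpy as np


def align_by_sort(hypo_labels, hypo_dist, tl_line_indices):
    """Map each detected line label to a transliteration line index, ordered by distance."""
    n_hypo = len(hypo_labels)
    tl = np.asarray(tl_line_indices)
    if n_hypo > len(tl):
        # more detected lines than transliteration lines: append consecutive indices
        start = tl[-1] + 1
        tl = np.concatenate([tl, np.arange(start, start + n_hypo - len(tl))])
    else:
        # fewer (or equal) detected lines: drop surplus transliteration lines
        tl = tl[:n_hypo]
    order = np.argsort(hypo_dist, kind='stable')  # detected lines sorted by distance
    return {int(hypo_labels[k]): int(t) for k, t in zip(order, tl)}


print(align_by_sort([7, 3, 5], [30.0, 10.0, 20.0], [0, 1]))  # {3: 0, 5: 1, 7: 2}
```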
# OPTION 1) use ground truth annotations as alignment
# use gt line annotations (implicit update tl_line column in line_hypos)
# (unreliable, because gt line annotations and transliteration are not necessarily aligned themselves!)
def align_lines_tl_by_ground_truth(line_hypos, tl_df):
# update tl_line with gt_line_idx (set nan to -1)
#line_hypos['tl_line'] = line_hypos['gt_line_idx'].fillna(-1)
line_hypos = line_hypos.assign(tl_line=line_hypos['gt_line_idx'].fillna(-1))
# if there are more gt_lines than tl_lines ...
# gt_line_idx that are not in tl are replaced with -1
not_tl_line_idx = ~line_hypos['tl_line'].isin(tl_df.line_idx.unique())
line_hypos.loc[not_tl_line_idx, 'tl_line'] = -1
return line_hypos
# OPTION 2) adopted from GALE-CHURCH algorithm for sentence alignment
# relies on line lengths only
norm_logsf = norm.logsf
LOG2 = math.log(2)
AVERAGE_CHARACTERS = 1
VARIANCE_CHARACTERS = 6.8
BEAD_COSTS = {(1, 1): 0, (2, 1): 1000, # (2, 1): 230
(1, 2): 1000, (0, 1): 230,
(1, 0): 230, (2, 2): 2000} # (1, 0): 450
# BEAD_COSTS = {(1, 1): 0, (2, 1): 230, (1, 2): 230, (0, 1): 450,
# (1, 0): 450, (2, 2): 440}
def length_cost(sx, sy, mean_xy, variance_xy):
"""
Code from https://github.com/alvations/gachalign:
Calculate length cost given 2 sentences. Lower cost = higher prob.
The original Gale-Church (1993:pp. 81) paper considers l2/l1 = 1 hence:
delta = (l2-l1*c)/math.sqrt(l1*s2)
If l2/l1 != 1 then the following should be considered:
delta = (l2-l1*c)/math.sqrt((l1+l2*c)/2 * s2)
substituting c = 1 and c = l2/l1, gives the original cost function.
"""
lx, ly = sum(sx), sum(sy)
m = (lx + ly * mean_xy) / 2
try:
delta = (lx - ly * mean_xy) / math.sqrt(m * variance_xy)
except ZeroDivisionError:
return float('-inf')
return - 100 * (LOG2 + norm_logsf(abs(delta)))
def _align(x, y, mean_xy, variance_xy, bead_costs):
"""
The minimization function to choose the sentence pair with
cheapest alignment cost.
"""
m = {}
for i in range(len(x) + 1):
for j in range(len(y) + 1):
if i == j == 0:
m[0, 0] = (0, 0, 0)
else:
m[i, j] = min((m[i - di, j - dj][0] + length_cost(x[i - di:i], y[j - dj:j], mean_xy, variance_xy)
+ bead_cost, di, dj)
for (di, dj), bead_cost in bead_costs.items()
if i - di >= 0 and j - dj >= 0)
i, j = len(x), len(y)
while True:
(c, di, dj) = m[i, j]
if di == dj == 0:
break
yield (i - di, i), (j - dj, j)
i -= di
j -= dj
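`length_cost` and `_align` implement the Gale-Church recursion: each cell `m[i, j]` holds the cheapest way to align the first `i` transliteration lines with the first `j` detected lines, built from "beads" such as 1-1, 1-0 or 2-1 that add a fixed bead cost plus a length-mismatch cost. The standard-library sketch below restates it under three assumptions that differ from the repository code: the Gaussian tail is computed as `math.erfc(z / sqrt(2))`, which equals `2 * P(Z > z)`, instead of `scipy.stats.norm.logsf`; that tail is clamped to avoid `log(0)`; and a pair of empty spans costs 0 rather than `-inf`. `gale_church` and `BEADS` are illustrative names.

```python
import math

# bead costs copied from BEAD_COSTS above (same units as the length cost)
BEADS = {(1, 1): 0, (2, 1): 1000, (1, 2): 1000, (0, 1): 230, (1, 0): 230, (2, 2): 2000}


def length_cost(lx, ly, mean_xy=1.0, variance_xy=6.8):
    """-100 * log(2 * P(Z > |delta|)) for span lengths lx and ly."""
    m = (lx + ly * mean_xy) / 2
    if m == 0:
        return 0.0  # assumption: two empty spans cost nothing
    delta = (lx - ly * mean_xy) / math.sqrt(m * variance_xy)
    tail = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)  # 2 * P(Z > |delta|), clamped
    return -100 * math.log(tail)


def gale_church(x, y, mean_xy=1.0, variance_xy=6.8, bead_costs=BEADS):
    """Align length sequences x and y; return [((i0, i1), (j0, j1)), ...] left to right."""
    best = {(0, 0): (0.0, 0, 0)}  # (i, j) -> (cost, di, dj) of the best last bead
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i == j == 0:
                continue
            best[i, j] = min(
                (best[i - di, j - dj][0] + bead
                 + length_cost(sum(x[i - di:i]), sum(y[j - dj:j]), mean_xy, variance_xy), di, dj)
                for (di, dj), bead in bead_costs.items()
                if i - di >= 0 and j - dj >= 0)
    path, i, j = [], len(x), len(y)
    while (i, j) != (0, 0):
        _, di, dj = best[i, j]
        path.append(((i - di, i), (j - dj, j)))
        i, j = i - di, j - dj
    return path[::-1]


print(gale_church([10, 12, 8], [10, 12, 8]))
# [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((2, 3), (2, 3))]
```

Because every bead other than 1-1 carries a fixed penalty of at least 230 and length costs are never negative, equal-length sequences align one-to-one, as in the usage line above.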
def align_lines_tl_by_gale_church(tl_df, line_hypos, variance_characters=3.0):
# updates line_hypos with tl_line idx
# actually uses line_hypos_agg
# get line lengths
tl_line_len = tl_df.groupby('line_idx').mean().prior_line_len
det_line_len = line_hypos.groupby('label').mean().sort_values('dist').accum
# define input
cx = tl_line_len.values
cy = det_line_len.values
# use detection line lengths to normalize (better range than tl lengths)
max_char = int(cy.max())
# normalize
cx /= cx.max()
cx *= max_char
#cy /= cy.max()
#cy *= max_char
bc = BEAD_COSTS
# iterate over aligned pairs
for (i1, i2), (j1, j2) in reversed(list(_align(cx, cy, 1.0, variance_characters, bc))):
# print (i1, i2), (j1, j2)
# print (tl_line_len.index[i1:i2].values, det_line_len.index[j1:j2].values)
# check if line_hypo exists
if len(det_line_len.index[j1:j2].values) > 0:
tl_line_idx = -1
if len(tl_line_len.index[i1:i2].values) > 0:
tl_line_idx = int(tl_line_len.index[i1:i2].values[0])
# assign tl line idx to detected line
line_hypos.loc[line_hypos.label.isin(det_line_len.index[j1:j2].values), 'tl_line'] = tl_line_idx
# return cx, cy
return line_hypos
# OPTION 3) adopted from Bleualign algorithm for sentence alignment
# relies on matching score between tl null hypothesis and sign detections (sign detector)
# the problem is to align hypo_line_indices (detected lines) with tl_line_indices (transliteration lines)
# all information required is contained in line fragment
# a) make sure that score_mat forms valid positive weights for edges in graph
# b) get matching score matrix with shape=[len(hypo_line_indices), len(tl_line_indices)]
# c) alignment consists of segments that are connected diagonally
def compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag):
# bleu score
score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
lambda a_idx, b_idx: line_frag.compute_bleu_score(a_idx.squeeze(), b_idx.squeeze())) # 5/5, 4/1
# score in range [0, 1], but order needs to be reversed
score_mat = 1 - score_mat
return score_mat
def compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag):
# ransac score
score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
lambda a_idx, b_idx: line_frag.compute_ransac_score(a_idx.squeeze(), b_idx.squeeze(),
max_dist_thresh=2, dist_weight=1)) # 5/5, 4/1
# score in range [0, 1], but order needs to be reversed
score_mat = 1 - score_mat
return score_mat
def compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag):
# line matching score
score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
lambda a_idx, b_idx: line_frag.compute_line_matching_score(a_idx.squeeze(), b_idx.squeeze()))
# score in range [0, 1], but order needs to be reversed
score_mat = 1 - score_mat
return score_mat
# use this if you want to implement your own similarity score
def eval_sents_dummy(translist, targetlist, max_alternatives=3):
scoredict = {}
for testID, testSent in enumerate(translist):
scores = []
for refID, refSent in enumerate(targetlist):
score = 100 - abs(len(testSent) - len(refSent)) # replace this with your own similarity score
if score > 0:
scores.append((score, refID, score))
# sorted by first item in tuple (i.e. score)
scoredict[testID] = sorted(scores, key=itemgetter(0), reverse=True)[:max_alternatives]
return scoredict
# follow the backpointers in score matrix to extract best path of 1-to-1 alignments
def extract_best_path(pointers):
i = len(pointers)-1
j = len(pointers[0])-1
pointer = ''
best_path = []
while i >= 0 and j >= 0:
pointer = pointers[i][j]
if pointer == '^':
i -= 1
elif pointer == '<':
j -= 1
elif pointer == 'match':
best_path.append((i, j))
i -= 1
j -= 1
best_path.reverse()
return best_path
# dynamic programming search for best path of alignments (maximal score)
def pathfinder(translist, targetlist, scoremat): # scoredict
# add an extra row/column to the matrix and start filling it from 1,1 (to avoid exceptions for first row/column)
matrix = [[0 for column in range(len(targetlist)+1)] for row in range(len(translist)+1)]
pointers = [['' for column in range(len(targetlist))] for row in range(len(translist))]
for i in range(len(translist)):
for j in range(len(targetlist)):
best_score = matrix[i][j+1]
best_pointer = '^'
score = matrix[i+1][j]
if score > best_score:
best_score = score
best_pointer = '<'
#if np.abs(j - i) < 5: # distance from diagonal
score = scoremat[i, j] + matrix[i][j]
if score > best_score:
best_score = score
best_pointer = 'match'
matrix[i+1][j+1] = best_score
pointers[i][j] = best_pointer
bleualign = extract_best_path(pointers)
return bleualign
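`pathfinder` fills a table in which each cell either skips a detected line (`'^'`), skips a transliteration line (`'<'`), or matches the pair and adds its score (`'match'`); `extract_best_path` then follows the pointers back from the bottom-right cell. `align_lines_tl_by_score` runs this once per score matrix and, with two matrices, keeps only the matches both paths agree on. The NumPy sketch below mirrors that recursion, including its tie-breaking (a move replaces the current best only if it scores strictly higher); `monotone_path` and `consensus` are hypothetical names.

```python
import numpy as np


def monotone_path(score):
    """Best monotone one-to-one alignment maximizing the summed scores of matched cells."""
    n, m = score.shape
    acc = np.zeros((n + 1, m + 1))  # acc[i + 1, j + 1]: best total using rows <= i, cols <= j
    ptr = [[''] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best, move = acc[i, j + 1], '^'  # skip row i
            if acc[i + 1, j] > best:
                best, move = acc[i + 1, j], '<'  # skip column j
            if acc[i, j] + score[i, j] > best:
                best, move = acc[i, j] + score[i, j], 'match'
            acc[i + 1, j + 1], ptr[i][j] = best, move
    path, i, j = [], n - 1, m - 1
    while i >= 0 and j >= 0:
        if ptr[i][j] == 'match':
            path.append((i, j))
            i, j = i - 1, j - 1
        elif ptr[i][j] == '^':
            i -= 1
        else:
            j -= 1
    return path[::-1]


def consensus(path_a, path_b):
    """Matches shared by two alignment paths, in order."""
    return sorted(set(path_a) & set(path_b))


s1 = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.3, 0.7]])
s2 = np.array([[0.9, 0.0, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 0.2]])
p1, p2 = monotone_path(s1), monotone_path(s2)
print(p1, p2, consensus(p1, p2))
# [(0, 0), (1, 1), (2, 2)] [(0, 0), (1, 2)] [(0, 0)]
```

Intersecting the two paths trades recall for precision: a detected line is only assigned a transliteration line when both scores support the match, and all other lines keep the `-1` initialised in the assignment table.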
def align_lines_tl_by_score(line_hypos, line_frag, visualize=True):
# alignment based on longest path through score mat (topological sort)
assert 'tl_line' in line_hypos.columns, "tl_line needs to be set (e.g. use align by sort)"
# get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
# print(hypo_line_indices, tl_line_indices)
align_opts = [1, 0, 0, 0, 0, 0] # align + ransac, most accurate, slow [NORMAL]
#align_opts = [0, 0, 1, 0, 0, 0] # bleu + ransac, a little less accurate, fast [use with high number of detections]
#align_opts = [0, 0, 0, 0, 0, 1] # bleu
#align_opts = [0, 0, 0, 0, 1, 0] # ransac
#align_opts = [0, 0, 0, 1, 0, 0] # align
assert np.sum(align_opts) == 1, "select exactly one alignment option"
# prepare score mats
if align_opts[0]:
score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag),
compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
title_strs = ['ransac', 'gm matching']
multi_score = True
if align_opts[1]:
score_mats = [compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag),
compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
title_strs = ['bleu', 'gm matching'] # bleu
multi_score = True
if align_opts[2]:
score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag),
compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
title_strs = ['ransac', 'bleu']
multi_score = True
if align_opts[3]:
score_mats = [compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
title_strs = ['gm matching']
multi_score = False
if align_opts[4]:
score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
title_strs = ['ransac']
multi_score = False
if align_opts[5]:
score_mats = [compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
title_strs = ['bleu']
multi_score = False
if visualize:
# prepare plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5)) # 15, 5
ax = axes.ravel()
best_paths = []
for i, score_mat in enumerate(score_mats):
best_path = pathfinder(hypo_line_indices, tl_line_indices, score_mat)
best_paths.append(best_path)
path_pts = np.asarray(best_path)
if visualize:
# plot score mats with best path
if len(path_pts) > 0:
ax[i].plot(path_pts[:, 1], path_pts[:, 0])
ax[i].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
ax[i].imshow(score_mat)
ax[i].set_title(title_strs[i])
# compute joint
if multi_score:
path_pts = np.asarray(sorted(set(best_paths[0]).intersection(best_paths[1])))
# path_pts = np.asarray(best_paths[1]) # use gm_matching only
else:
path_pts = np.asarray(best_paths[0])
if visualize:
# plot score mats with best path
if len(path_pts) > 0:
ax[2].plot(path_pts[:, 1], path_pts[:, 0])
ax[2].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
ax[2].imshow(np.mean(score_mats, axis=0))
ax[2].set_title('joint')
if len(path_pts) > 0:
# map path through score mat back to line_idx (score mat indices are not necessarily equal to line indices)
# hl_indices = hypo_line_indices[path_pts[:, 0]] # already equals index to dataframe
tl_indices = tl_line_indices[path_pts[:, 1]]
# print tl_line_assignment, path_pts[:, 0], tl_indices
# create assignment table for join
basic_index = line_frag.line_hypos.tl_line.sort_values().unique()
tl_line_assignment = pd.DataFrame({'hypo_tl_line': basic_index, 'tl_line_update': -np.ones_like(basic_index)})
tl_line_assignment.loc[path_pts[:, 0], 'tl_line_update'] = tl_indices
# join line_hypos on tl_line
line_hypos['tl_line'] = line_hypos.join(tl_line_assignment.set_index('hypo_tl_line'), on='tl_line')[
'tl_line_update']
return line_hypos, path_pts
#### full pipeline to solve the line-transliteration alignment problem ####
def compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment, segm_labels, stats, center_im, sign_detections,
visualize=True, align_opt=[False, False, True]):
path_pts = None
# BASIC:
# use line_models sorted by dist as basic alignment
line_hypos = align_lines_tl_by_sort(line_hypos, tl_df)
# OPTION I:
# find basic alignment using line lengths
# apply Gale-Church algorithm (implicit update tl_line column in line_hypos)
if align_opt[0]: # False
line_hypos = align_lines_tl_by_gale_church(tl_df, line_hypos, variance_characters=6.0)
# OPTION II:
# use gt line annotations (implicit update tl_line column in line_hypos)
if align_opt[1]: # False
if len(gt_line_assignment) > 0:
line_hypos = align_lines_tl_by_ground_truth(line_hypos, tl_df)
# OPTION III:
# alignment based on longest path through score mat (topological sort)
if align_opt[2]: # True
# create line fragment (tl_line should be assigned before!)
line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
# compute lines tl alignment based on score
(line_hypos, path_pts) = align_lines_tl_by_score(line_hypos, line_frag, visualize=visualize)
return line_hypos, path_pts
## GT function
def gt_align_lines_tl_by_ed(line_gt, visualize=True):
# alignment based on longest path through score mat (topological sort)
# get assignment space (cartesian product of tl_line_indices and gt_line_indices)
gt_line_indices, tl_line_indices = line_gt.get_alignment_space()
# prepare score mats
score_mats = [compute_bleu_score_mat(gt_line_indices, tl_line_indices, line_gt)]
title_strs = ['edit distance']
multi_score = False
if visualize:
# prepare plot
fig, axes = plt.subplots(1, 1, figsize=(15, 5), squeeze=False) # 1,3
ax = axes.ravel()
best_paths = []
for i, score_mat in enumerate(score_mats):
best_path = pathfinder(gt_line_indices, tl_line_indices, score_mat)
best_paths.append(best_path)
path_pts = np.asarray(best_path)
if visualize:
# plot score mats with best path
if len(path_pts) > 0:
ax[i].plot(path_pts[:, 1], path_pts[:, 0])
ax[i].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
ax[i].imshow(score_mat)
ax[i].set_title(title_strs[i])
# compute joint
if multi_score:
path_pts = np.asarray(sorted(set(best_paths[0]).intersection(best_paths[1])))
# path_pts = np.asarray(best_paths[1]) # use gm_matching only
else:
path_pts = np.asarray(best_paths[0])
if len(path_pts) > 0:
# map path through score mat back to line_idx (score mat indices are not necessarily equal to line indices)
# gt_indices = gt_line_indices[path_pts[:, 0]] # already equals index to dataframe
tl_indices = tl_line_indices[path_pts[:, 1]]
# print tl_line_assignment, path_pts[:, 0], tl_indices
lines_df = line_gt.lines_df
# create assignment table for join
#basic_index = lines_df.tl_line.sort_values().unique() # this is not necessary, because gt_line_idx
basic_index = lines_df.gt_line_idx.sort_values().unique()
tl_line_assignment = pd.DataFrame({'gt_tl_line': basic_index, 'tl_line_update': -np.ones_like(basic_index)})
tl_line_assignment.loc[path_pts[:, 0], 'tl_line_update'] = tl_indices
# print tl_line_assignment, path_pts
# join lines_df on gt_line_idx
line_gt.lines_df['tl_line'] = lines_df.join(tl_line_assignment.set_index('gt_tl_line'), on='gt_line_idx')['tl_line_update']
return line_gt, path_pts
@@ -0,0 +1,289 @@
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from skimage.color import label2rgb
from ..transliteration.TransliterationSet import TransliterationSet
from ..transliteration.SignsStats import SignsStats
from ..evaluations.sign_tl_evaluation import compute_accuracy
from ..evaluations.line_tl_evaluation import eval_line_tl_alignment
from ..evaluations.sign_evaluation_prep import get_pred_boxes_df, get_gt_boxes_df
from ..evaluations.sign_evaluation_gt import prepare_segment_gt
from ..evaluations.sign_evaluation import eval_detector_on_collection
from ..evaluations.sign_evaluator import SignEvalBasic, SignEvalFast
from ..alignment.line_tl_alignment import compute_line_tl_alignment
from ..alignment.LineFragment import (LineFragment, compute_line_points, compute_line_polygon, plot_boxes)
from ..detection.line_detection import (prepare_transliteration, preprocess_line_input, apply_detector,
post_process_line_detections, compute_image_label_map)
from ..detection.detection_helpers import (visualize_net_output, radius_in_image, convert_detections_to_array,
label_map2image, vis_detections, coord_in_image)
#from ..detection.tablet_scale_estimation import print_scale_stats
from ..visualizations.line_visuals import (show_hough_transform_w_lines, show_line_segms, show_line_skeleton, show_probabilistic_hough)
from ..visualizations.line_tl_visuals import show_lines_tl_alignment, show_score_mats_with_paths
def gen_alignments(didx_list, dataset, bbox_anno, lines_anno, relative_path, saa_version, re_transform,
sign_model_version, model_fcn, device,
generate_and_save, show_sign_alignments, collection_subfolder, train_data_ext_file, lbl_list,
line_model_version='v007', use_precomp_lines=False, param_dict=None,
show_line_matching=False, verbose=True):
"""
Generate tl-line pairs for seq model training. Store pairs in file.
Additionally compute some useful filter criteria for generated pairs.
"""
# config tl_line matching
# 1: line length
# 2: use gt line anno (if available)
# 3: shortest path through score matrix
align_opt = [False, False, True]
visualize_tl_line_matching = show_line_matching
# setup evaluators
use_new_eval = True
num_classes = 240
eval_ovthresh = 0.5
eval_basic = SignEvalBasic(sign_model_version, saa_version, eval_ovthresh)
eval_fast = SignEvalFast(sign_model_version, saa_version, tp_thresh=eval_ovthresh, num_classes=num_classes)
# setup transliteration set
tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
# setup sign statistics
stats = SignsStats(tblSignHeight=128)
list_pred_boxes_df, list_gt_boxes_df = [], []
acc_array = np.zeros(len(didx_list))
naligned_array = np.zeros(len(didx_list))
for didx in tqdm(didx_list, desc=saa_version):
seg_im, seg_idx = dataset[didx]
# access meta
seg_rec = dataset.assigned_segments_df.loc[seg_idx]
image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
print(didx, image_name, view_desc)
# load transliteration dataframe
tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=verbose)
tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
#print(float(len_min) / len_max, num_vis_lines)
# boxes file
res_name = "{}{}".format(image_name, view_desc)
res_path = "{}results/results_ssd/{}/{}".format(relative_path, sign_model_version, saa_version)
boxes_file = "{}/{}_all_boxes.npy".format(res_path, res_name)
# load detections
all_boxes = np.load(boxes_file)
sign_detections = convert_detections_to_array(all_boxes)
# load and prepare annotations of segment
gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
with_star_crop=False) # depends on sign_detections!
if verbose:
print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
# make sure the segment image is large enough for the line detector
if seg_im.size[0] > 224 and seg_im.size[1] > 224:
if use_precomp_lines:
# to numpy
center_im = np.asarray(seg_im)
# lbl_ind
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
# lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
lbl_ind_x = np.load(lines_file).astype(int)
else:
# prepare input
inputs = preprocess_line_input(seg_im, 1, shift=0)
center_im = re_transform(inputs[4]) # to pil image
center_im = np.asarray(center_im) # to numpy
# apply network
#print(inputs.shape)
output = apply_detector(inputs, model_fcn, device)
# visualize_net_output(center_im, output, cunei_id=1, num_classes=2)
# plt.show()
# prepare output
outprob = np.mean(output, axis=0)
lbl_ind = np.argmax(outprob, axis=0)
lbl_ind_x = lbl_ind.copy()
lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0 # line detector dependent (VIP) # outprob.squeeze() # this fixes a bug!
lbl_ind_80 = lbl_ind.copy()
lbl_ind_80[np.max(outprob, axis=0) < 0.8] = 0 # outprob.squeeze() # this fixes a bug!
# only continue if there is a positive line detection
# (avoids unnecessary computation and an error in skimage hough_line_peaks)
if np.any(lbl_ind_x):
# for line detection apply postprocessing pipeline
(line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max, verbose=verbose)
if len(line_segs) > 0:
# compute overlay
seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
image_label_overlay = label2rgb(seg_canvas, image=center_im)
# using line annotations: gt_line_idx for hypo_lines
gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx, line_hypos.groupby('label').mean())
if len(gt_line_assignment) > 0:
# clean join on line_hypos
line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
## clean join on line_hypos_agg
# line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
if len(tl_df) > 0:
# abort if obvious transliteration / lines mismatch
if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
print("CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
continue
#### line-transliteration alignment problem ####
line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
segm_labels, stats, center_im, sign_detections,
visualize=visualize_tl_line_matching,
align_opt=align_opt)
# FINISH lines-tl alignment
# create line fragment (tl_line should be assigned before?!)
line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
# get assigned tl indices
assigned_tl_indices = line_frag.get_assigned_lines_idx()
# get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
# evaluate line-tl alignment using gt-line annotations; only quality indicator because unreliable
if len(gt_line_assignment) > 0 and verbose:
eval_line_tl_alignment(line_frag, lines_anno, seg_idx, num_vis_lines)
# common colormap
# color = plt.cm.jet(np.linspace(0,1,len(angles)))
cmap = plt.get_cmap('nipy_spectral')
color = cmap(np.linspace(0, 1, len(line_hypos)))
# estimate scale
if False:
if len(tl_df) == 0:
# use line detection estimates
num_lines = line_hypos.label.nunique()
len_max = line_hypos.groupby('label').mean().accum.max() / dist_interline_median
# get scales using different approaches
# use num_lines for scale estimation (NOT num_vis_lines!)
print_scale_stats(seg_rec, scale, lbl_ind_x, lbl_ind_80, num_lines, len_max,
line_hypos, dist_interline_median)
if False:
show_line_skeleton(lbl_ind_x, skeleton)
plt.show()
if False:
show_hough_transform_w_lines(lbl_ind_x, center_im, h, theta, d, line_hypos, color)
if len(line_segs) > 0:
if False:
show_probabilistic_hough(lbl_ind_x, center_im, line_segs, ls_labels, group2line, color)
if False:
show_line_segms(image_label_overlay, segm_labels)
if len(tl_df) > 0:
if False:
show_lines_tl_alignment(lbl_ind_x, center_im, line_hypos, color)
if False:
show_score_mats_with_paths(assigned_tl_indices, hypo_line_indices, tl_line_indices, line_frag)
if True:
if show_sign_alignments:
aligned_list, tablet_tl_df = line_frag.tab_visualize_gm_alignments(refined=True) # refined=True, does not help/hurt
else:
refined = False
if param_dict is not None:
if 'refined' in param_dict:
refined = param_dict['refined']
aligned_list, tablet_tl_df = line_frag.tab_get_gm_alignments(refined=refined,
param_dict=param_dict) # refined=True, does not help/hurt
if len(gt_boxes) > 0:
if use_new_eval:
if len(aligned_list) > 0:
all_boxes = [[el] for el in aligned_list]
if False:
# standard mAP eval
eval_basic.eval_segment(all_boxes, gt_boxes, gt_labels, seg_idx, verbose=verbose)
# fast evaluation
eval_fast.eval_segment(all_boxes, gt_boxes, gt_labels, seg_idx, verbose=verbose)
# get segment statistics of current segment [-1]
num_tp, num_fp, _, acc, mean_ap, global_ap = eval_fast.get_seg_summary(-1)
# save acc to array
acc_array[didx_list.index(didx)] = acc
# save naligned to array
naligned_array[didx_list.index(didx)] = num_tp + num_fp
else:
# prepare full collection evaluation
list_pred_boxes_df.append(get_pred_boxes_df([[el] for el in aligned_list], seg_idx))
list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
# get num aligned across all classes
naligned = np.sum([len(el) for i, el in enumerate(aligned_list) if i > 0])
if verbose and len(aligned_list) > 0:
# [METHOD B]: evaluate mAP and print stats for a single segment
# (these results can strongly differ from collection-wise evaluation)
acc, df_stats = compute_accuracy(gt_boxes, gt_labels, aligned_list, return_stats=True)
# save acc to array
acc_array[didx_list.index(didx)] = acc
ntfpos = df_stats.tp.sum() + df_stats.fp.sum()
# print ntfpos, naligned
# save naligned to array
naligned_array[didx_list.index(didx)] = ntfpos # naligned
if generate_and_save:
line_frag.tab_generate_training_data(collection_subfolder, train_data_ext_file,
image_name, image_path, scale, seg_idx, seg_bbox,
tablet_tl_df, lbl_list, append=True)
else:
print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
else:
print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
# make plots appear
plt.show()
# full collection eval
acc = 0
df_stats = []
if use_new_eval:
eval_fast.prepare_eval_collection()
df_stats, global_ap = eval_fast.eval_collection(verbose=verbose)
num_tp, num_fp, num_fp_global, acc = eval_fast.get_col_summary()
else:
if len(list_gt_boxes_df) > 0:
# [METHOD C]: compute mAP across all instances of individual classes
# (these results can strongly differ from segment-wise evaluation)
gt_boxes_df = pd.concat(list_gt_boxes_df, ignore_index=True)
pred_boxes_df = pd.concat(list_pred_boxes_df, ignore_index=True)
acc, df_stats = eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=None) # set fixed!
return acc, df_stats # acc_array, naligned_array
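The line-detector post-processing in the evaluation loop above averages the network outputs over the prepared inputs, takes the per-pixel argmax, and suppresses pixels whose top probability is below 0.7. A minimal standalone sketch of that step follows. The helper name `confident_label_map` is hypothetical, and the `(num_inputs, num_classes, H, W)` layout of `output` is an assumption inferred from how `np.mean(output, axis=0)` and `np.argmax(outprob, axis=0)` are used.

```python
import numpy as np


def confident_label_map(output, conf_thresh=0.7):
    """Fuse per-input class probabilities into one label map (hypothetical helper).

    Assumes `output` has shape (num_inputs, num_classes, H, W) and holds
    probabilities; pixels whose top probability is below `conf_thresh`
    are set to the background label 0.
    """
    outprob = np.mean(output, axis=0)      # (num_classes, H, W)
    labels = np.argmax(outprob, axis=0)    # (H, W)
    labels[np.max(outprob, axis=0) < conf_thresh] = 0
    return labels


# toy example: 2 identical inputs, 2 classes (0 = background, 1 = line), 1x3 image
probs_line = np.array([[0.9, 0.6, 0.2]])
output = np.stack([np.stack([1 - probs_line, probs_line])] * 2)
print(confident_label_map(output))  # -> [[1 0 0]]
```

The middle pixel is classified as line (0.6 > 0.4) but is dropped by the confidence threshold, which is why the evaluation code only continues when `np.any(lbl_ind_x)` still finds a positive pixel.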
@@ -0,0 +1,187 @@
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from skimage.color import label2rgb
from ..transliteration.TransliterationSet import TransliterationSet
from ..transliteration.SignsStats import SignsStats
from ..evaluations.sign_evaluation_gt import prepare_segment_gt
from ..alignment.line_tl_alignment import compute_line_tl_alignment
from ..alignment.LineFragment import LineFragment, plot_boxes
from ..detection.line_detection import prepare_transliteration, post_process_line_detections, compute_image_label_map
from ..utils.bbox_utils import convert_bbox_global2local, box_iou
from ..utils.nms import nms


def convert_sign_rec_to_array(seg_gen_annos, relative_bboxes, scale):
    """ Convert generated sign annotations into a detection array.

    Each row has the layout [ID, cx, cy, score, x1, y1, x2, y2, idx]; the last
    column maintains the data frame index of the annotation.
    """
    list_detections = []
    for anno_idx, anno_rec in seg_gen_annos.iterrows():
        # [ID, cx, cy, score, x1, y1, x2, y2, idx]
        temp = np.zeros(9)
        box = np.array(relative_bboxes[anno_idx]) * scale
        temp[0] = anno_rec.newLabel
        temp[1] = (box[2] + box[0]) / 2
        temp[2] = (box[3] + box[1]) / 2
        temp[3] = anno_rec.det_score
        temp[4:8] = box[0:4]
        temp[8] = anno_idx
        list_detections.append(temp)
    # stack
    detections_arr = np.vstack(list_detections)
    return detections_arr


def gen_cond_hypo_alignments(didx_list, dataset, bbox_anno, lines_anno, anno_df, relative_path, saa_version,
                             collection_subfolder, train_data_ext_file, lbl_list, generate_and_save,
                             min_dets_inline=2, ncompl_thresh=20, smooth_y=True, max_dist_det=3,
                             line_model_version='v007', visualize_hypos=False):
    # setup transliteration set
    tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
    # setup sign statistics
    stats = SignsStats(tblSignHeight=128)
    # for seg_im, seg_idx in dataset:
    for didx in tqdm(didx_list, desc=saa_version):
        seg_im, seg_idx = dataset[didx]
        # access meta
        seg_rec = dataset.assigned_segments_df.loc[seg_idx]
        image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
        res_name = "{}{}".format(image_name, view_desc)
        # load transliteration dataframe
        tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=True)
        if len(tl_df) > 0:  # only continue if transliteration is available
            tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
            print(float(len_min) / len_max, num_vis_lines)
            # select generated annos per segment
            seg_gen_annos = anno_df[anno_df.seg_idx == seg_idx]
            if False:
                # control completeness filter (redundant - additional filter inside create conditional hypos)
                filter_nms = False
                compl_thresh = -1  # 0, 2, 3, 4, 5, 6 disable: -1
                ncompl_thresh = -1  # 10, 15, 20 disable: -1
                # filter using nms
                if filter_nms:
                    seg_gen_annos = seg_gen_annos[seg_gen_annos.nms_keep]
                if compl_thresh > -1:
                    # filter using compl
                    seg_gen_annos = seg_gen_annos[seg_gen_annos.compl > compl_thresh]
                if ncompl_thresh > -1:
                    # filter using ncompl
                    seg_gen_annos = seg_gen_annos[seg_gen_annos.ncompl > ncompl_thresh]
            if len(seg_gen_annos) > 0:
                # convert to all boxes
                relative_bboxes = seg_gen_annos.bbox.apply(lambda x: convert_bbox_global2local(x, list(seg_bbox)))
                sign_detections = convert_sign_rec_to_array(seg_gen_annos, relative_bboxes, scale)
                # load and prepare annotations of segment
                gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
                                                         with_star_crop=False)  # depends on sign_detections!
                print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
                # make sure the segment image is large enough for the line detector
                if seg_im.size[0] > 224 and seg_im.size[1] > 224 and len(tl_df) > 0:
                    # convert segment image to numpy
                    center_im = np.asarray(seg_im)
                    # load precomputed line detections (lbl_ind)
                    line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version,
                                                                          saa_version)
                    lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
                    # lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
                    lbl_ind_x = np.load(lines_file).astype(int)
                    # only continue if there is a positive line detection
                    # (avoids unnecessary computation and an error in skimage hough_line_peaks)
                    if np.any(lbl_ind_x):
                        # for line detection apply postprocessing pipeline
                        (line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
                         h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines,
                                                                               len_min, len_max)
                        if len(line_segs) > 0:
                            # compute overlay
                            seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
                            image_label_overlay = label2rgb(seg_canvas, image=center_im)
                            # using line annotations: gt_line_idx for hypo_lines
                            gt_line_assignment = lines_anno.get_assignment_for_line_hypos(
                                seg_idx, line_hypos.groupby('label').mean())
                            if len(gt_line_assignment) > 0:
                                # clean join on line_hypos
                                line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'),
                                                             on='label')
                                ## clean join on line_hypos_agg
                                # line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
                            if len(tl_df) > 0:
                                # abort if obvious transliteration / lines mismatch
                                if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
                                    print("CANCEL segment [{}] : Due to obvious transliteration / lines mismatch"
                                          .format(seg_idx))
                                    continue
                                #### line-transliteration alignment problem ####
                                # for train use: align_opt=[False, True, False] (use line annos)
                                line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
                                                                                 segm_labels, stats, center_im,
                                                                                 sign_detections,
                                                                                 visualize=False,
                                                                                 align_opt=[False, False, True])  # CHANGE HERE
                                # FINISH lines-tl alignment
                                # create line fragment (tl_line should be assigned before?!)
                                line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im,
                                                         sign_detections)
                                # get assigned tl indices
                                assigned_tl_indices = line_frag.get_assigned_lines_idx()
                                # get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
                                hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
                                if visualize_hypos:
                                    # generate conditional hypotheses
                                    (tab_t_hypos, tab_t_anno_idx,
                                     tab_t_meta) = line_frag.tab_create_conditional_hypo_alignments(
                                        anno_df=anno_df, min_dets_inline=min_dets_inline,
                                        ncompl_thresh=ncompl_thresh, smooth_y=smooth_y,
                                        max_dist_det=max_dist_det)
                                    if len(tab_t_hypos) > 0:
                                        if False:
                                            # filter using nms
                                            nms_th = 0.6
                                            keep = nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=nms_th)
                                            tab_t_hypos = tab_t_hypos[keep]
                                        # visualize
                                        plot_boxes(tab_t_hypos[:, 4:8])
                                        plt.imshow(line_frag.input_im, cmap='gray')
                                # generate and save training data
                                if generate_and_save:
                                    line_frag.tab_generate_cond_hypo_training_data(
                                        collection_subfolder, train_data_ext_file,
                                        image_name, image_path, scale, seg_idx, seg_bbox,
                                        lbl_list, append=True, anno_df=anno_df,
                                        min_dets_inline=min_dets_inline, ncompl_thresh=ncompl_thresh,
                                        smooth_y=smooth_y, max_dist_det=max_dist_det)
                    else:
                        print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name,
                                                                                                    seg_idx))
                else:
                    print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
            else:
                print('No detections for {}[{}]!'.format(image_name, seg_idx))
        # make plots appear
        plt.show()
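`convert_sign_rec_to_array` shifts each generated box from tablet coordinates into segment coordinates, rescales it, and packs it into a row `[ID, cx, cy, score, x1, y1, x2, y2, idx]`. The sketch below reproduces that layout on a toy data frame. The helper names `bbox_global2local` and `annos_to_detections` are hypothetical stand-ins (not the repository's functions); the column names `newLabel`, `det_score` and `bbox` are taken from the code above. Unlike the original, which relies on `np.vstack` and is only called when `len(seg_gen_annos) > 0`, this sketch returns a `(0, 9)` array for an empty input.

```python
import numpy as np
import pandas as pd


def bbox_global2local(gbbox, seg_bbox):
    # shift a tablet-level box [x1, y1, x2, y2] into segment coordinates
    x, y = seg_bbox[:2]
    return (np.asarray(gbbox, dtype=float) - np.array([x, y, x, y])).tolist()


def annos_to_detections(annos, seg_bbox, scale):
    """Return one row per annotation: [ID, cx, cy, score, x1, y1, x2, y2, idx] (hypothetical helper)."""
    rows = []
    for idx, rec in annos.iterrows():
        box = np.array(bbox_global2local(rec.bbox, seg_bbox)) * scale
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        rows.append([rec.newLabel, cx, cy, rec.det_score, *box[:4], idx])
    # reshape keeps a (0, 9) shape when there are no annotations
    return np.array(rows, dtype=float).reshape(-1, 9)


# toy annotation with data frame index 42
annos = pd.DataFrame({'newLabel': [5], 'det_score': [0.8],
                      'bbox': [[110, 220, 130, 250]]}, index=[42])
print(annos_to_detections(annos, seg_bbox=[100, 200, 400, 500], scale=0.5))
```

Keeping the data frame index in the last column lets later steps map a detection row back to its annotation record.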
@@ -0,0 +1,136 @@
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from skimage.color import label2rgb
from ..transliteration.TransliterationSet import TransliterationSet
from ..transliteration.SignsStats import SignsStats
from ..evaluations.sign_evaluation_gt import prepare_segment_gt
from ..alignment.line_tl_alignment import compute_line_tl_alignment
from ..alignment.LineFragment import LineFragment, plot_boxes
from ..detection.line_detection import prepare_transliteration, post_process_line_detections, compute_image_label_map
from ..utils.nms import nms  # used by the (disabled) NMS filter below


def gen_null_hypo_alignments(didx_list, dataset, bbox_anno, lines_anno, relative_path, saa_version,
                             collection_subfolder, train_data_ext_file, lbl_list, generate_and_save,
                             line_model_version='v007', visualize_hypos=False):
    # setup transliteration set
    tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
    # setup sign statistics
    stats = SignsStats(tblSignHeight=128)
    # for seg_im, seg_idx in dataset:
    for didx in tqdm(didx_list, desc=saa_version):
        seg_im, seg_idx = dataset[didx]
        # access meta
        seg_rec = dataset.assigned_segments_df.loc[seg_idx]
        image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
        res_name = "{}{}".format(image_name, view_desc)
        # load transliteration dataframe
        tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=True)
        if len(tl_df) > 0:  # only continue if transliteration is available
            tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
            print(float(len_min) / len_max, num_vis_lines)
            # load and prepare annotations of segment
            gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
                                                     with_star_crop=False)  # depends on sign_detections!
            print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
            # no sign detections are passed to the alignment
            sign_detections = None
            # make sure the segment image is large enough for the line detector
            if seg_im.size[0] > 224 and seg_im.size[1] > 224 and len(tl_df) > 0:
                # convert segment image to numpy
                center_im = np.asarray(seg_im)
                # load precomputed line detections (lbl_ind)
                line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version,
                                                                      saa_version)
                lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
                # lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
                lbl_ind_x = np.load(lines_file).astype(int)
                # only continue if there is a positive line detection
                # (avoids unnecessary computation and an error in skimage hough_line_peaks)
                if np.any(lbl_ind_x):
                    # for line detection apply postprocessing pipeline
                    (line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
                     h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines,
                                                                           len_min, len_max)
                    if len(line_segs) > 0:
                        # compute overlay
                        seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
                        image_label_overlay = label2rgb(seg_canvas, image=center_im)
                        # using line annotations: gt_line_idx for hypo_lines
                        gt_line_assignment = lines_anno.get_assignment_for_line_hypos(
                            seg_idx, line_hypos.groupby('label').mean())
                        if len(gt_line_assignment) > 0:
                            # clean join on line_hypos
                            line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
                            ## clean join on line_hypos_agg
                            # line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
                        if len(tl_df) > 0:
                            # abort if obvious transliteration / lines mismatch
                            if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
                                print("CANCEL segment [{}] : Due to obvious transliteration / lines mismatch"
                                      .format(seg_idx))
                                continue
                            #### line-transliteration alignment problem ####
                            # for train use: align_opt=[False, True, False] (use line annos)
                            line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
                                                                             segm_labels, stats, center_im,
                                                                             sign_detections,
                                                                             visualize=False,
                                                                             align_opt=[True, False, False])  # CHANGE HERE
                            # FINISH lines-tl alignment
                            # create line fragment (tl_line should be assigned before?!)
                            line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im,
                                                     sign_detections)
                            # get assigned tl indices
                            assigned_tl_indices = line_frag.get_assigned_lines_idx()
                            # get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
                            hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
                            if visualize_hypos:
                                # generate null hypotheses
                                tab_t_hypos = line_frag.tab_create_null_hypo_alignments()
                                if len(tab_t_hypos) > 0:
                                    if False:
                                        # filter using nms
                                        nms_th = 0.6
                                        keep = nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=nms_th)
                                        tab_t_hypos = tab_t_hypos[keep]
                                    # visualize
                                    plot_boxes(tab_t_hypos[:, 4:8])
                                    plt.imshow(line_frag.input_im, cmap='gray')
                            # generate and save training data
                            if generate_and_save:
                                line_frag.tab_generate_null_hypo_training_data(collection_subfolder,
                                                                               train_data_ext_file,
                                                                               image_name, image_path, scale, seg_idx,
                                                                               seg_bbox,
                                                                               lbl_list, append=True)
                else:
                    print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name,
                                                                                                seg_idx))
            else:
                print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
        # show plots
        plt.show()
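Both hypothesis generators contain a disabled filter that calls `nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=0.6)` on the hypothesis array. The repository's own `..utils.nms` implementation is not shown here, so the following is only a standalone sketch of greedy non-maximum suppression under stated assumptions: boxes are `[x1, y1, x2, y2]` with continuous coordinates (no `+ 1` pixel convention, which the repository's version may use), and boxes whose IoU with an already kept box exceeds the threshold are discarded. The name `greedy_nms` is hypothetical.

```python
import numpy as np


def greedy_nms(boxes, scores, threshold=0.6):
    """Greedy non-maximum suppression (standalone sketch, hypothetical helper).

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    Returns the indices of kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # discard boxes that overlap the kept box by more than the threshold
        order = order[1:][iou <= threshold]
    return keep


hypos = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(hypos, scores))  # -> [0, 2]
```

Applied to a hypothesis array in the `[ID, cx, cy, score, x1, y1, x2, y2, ...]` layout, the call would be `tab_t_hypos[greedy_nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3])]`.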
@@ -0,0 +1,7 @@
### Dataset classes
- `lines_dataset.py`: used for line segmentation training
- `cunei_dataset.py`: used for sign classification training
- `cunei_dataset_ssd.py`: used for sign detector training
- `cunei_dataset_segments.py`: used for evaluation on full tablet images (image + bbox annotations)
- `segments_dataset.py`: used for evaluation on full tablet images (image only)
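The dataset classes read the `bbox` column of `data/annotations/bbox_annotations_<split>.csv` as a string such as `"[x1, y1, x2, y2]"` and expand it into numeric columns. The sketch below shows that parsing step on a toy in-memory CSV. The column layout is assumed from a comment inside `cunei_dataset.py`, the row values and the tablet name `TOY001` are made up, and `load_bbox_annotations` is a hypothetical helper name.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# toy CSV in the assumed layout of data/annotations/bbox_annotations_<split>.csv
csv_text = '''segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
0,TOY001,Obv,toy,1,1,"[10, 20, 40, 60]","[5, 5, 35, 45]"
'''


def load_bbox_annotations(buf, relative_path='../'):
    """Parse bbox strings into x1..y2 columns and build image paths (hypothetical helper)."""
    df = pd.read_csv(buf)
    # the bbox column is stored as a string such as "[x1, y1, x2, y2]"
    nd_bbox = np.array(df['bbox'].apply(literal_eval).tolist())
    df['x1'], df['y1'], df['x2'], df['y2'] = nd_bbox.T
    df['imageName'] = df['tablet_CDLI'] + '.jpg'
    df['image_path'] = '{}data/images/'.format(relative_path) + df['collection'] + '/' + df['imageName']
    return df


print(load_bbox_annotations(io.StringIO(csv_text))[['x1', 'y1', 'x2', 'y2', 'image_path']])
```

`literal_eval` only evaluates Python literals, so it is a safe way to turn the stored list strings back into lists.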
@@ -0,0 +1,237 @@
import numpy as np
import pandas as pd
from tqdm import tqdm
from ast import literal_eval
from PIL import Image
import torch.utils.data as data
from ..utils.bbox_utils import *
from ..utils.transform_utils import crop_pil_image, spatial_sample

DEBUG_MODE = False


class CuneiformCollection(data.Dataset):

    def __init__(self, params, transform=None, target_transform=None, relative_path='../', split='train',
                 top_k=-1, top_k_pick=-1, pad_to_square=True):
        self.gray_mean = params['gray_mean']
        self.context_pad = params['context_pad']
        if 'test' in split:
            self.context_pad = 0  # no padding needed
        self.num_classes = params['num_classes']
        self.min_align_ratio = 0.6
        if 'min_align_ratio' in params:
            self.min_align_ratio = params['min_align_ratio']
        # transforms for data preparation
        self.transform = transform
        self.target_transform = target_transform
        self.pad_to_square = pad_to_square
        self.compl_thresh, self.ncompl_thresh = -1, -1
        if 'compl_thresh' in params:
            self.compl_thresh = params['compl_thresh']
        if 'ncompl_thresh' in params:
            self.ncompl_thresh = params['ncompl_thresh']
        # load annotations
        annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, split)
        meta_df = pd.read_csv(annotation_file, engine='python')  # read annotation file
        # additional annos (investigate impact of additional train data)
        if 'train' in split and 'extra_collections' in params:
            list_annos = [meta_df]
            for collection in params['extra_collections']:
                annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
                anno_df = pd.read_csv(annotation_file, engine='python')  # read annotation file
                list_annos.append(anno_df)
            meta_df = pd.concat(list_annos, ignore_index=True)
        # add missing columns to meta_df
        nd_bbox = np.array(meta_df['bbox'].apply(literal_eval).tolist())  # convert to ndarray
        meta_df['x1'] = nd_bbox[:, 0]
        meta_df['y1'] = nd_bbox[:, 1]
        meta_df['x2'] = nd_bbox[:, 2]
        meta_df['y2'] = nd_bbox[:, 3]
        meta_df['imageName'] = meta_df['tablet_CDLI'] + '.jpg'
        meta_df['image_path'] = '{}data/images/'.format(relative_path) + meta_df['collection'] \
            + '/' + meta_df['imageName']
        ### load and prepare gen_df
        # append with gen alignments
        gen_cols = ['imageName', 'folder', 'image_path', 'label', 'train_label',
                    'x1', 'y1', 'x2', 'y2', 'width', 'height', 'segm_idx',
                    'line_idx', 'pos_idx', 'det_score', 'm_score', 'align_ratio', 'nms_keep', 'compl', 'ncompl']
        # segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
        collections_ext = [split]
        if 'train' in split:
            # OPT I : use csv file that contains list of generated boxes
            if 'gen_file' in params:
                gen_df = pd.read_csv(params['gen_file'], engine='python', header=None, delimiter=', ',
                                     names=gen_cols)  # delimiter might need to be removed?!
            # OPT II : load csv files for collection specific collections and concatenate
            elif 'gen_collections' in params:
                assert params.get('gen_folder') is not None, \
                    'When using gen_collections, user needs to provide gen_folder!'
                df_list = []
                for gen_coll in params['gen_collections']:
                    gen_file_path = "{}results/{}line_generated_bboxes_refined80_{}.csv".format(
                        relative_path, params['gen_folder'], gen_coll)
                    # raw string: avoids the invalid escape sequence warning for '\s'
                    gen_df = pd.read_csv(gen_file_path, delimiter=r',\s*', engine='python', header=None,
                                         names=gen_cols)
                    df_list.append(gen_df)
                gen_df = pd.concat(df_list, ignore_index=True)
            # prepare gen_df
            if ('gen_file' in params) or ('gen_collections' in params):
                # IMPORTANT: filter gen data according to align ratio
                # (copy, so that columns can be assigned without SettingWithCopyWarning)
                gen_df = gen_df[gen_df.align_ratio > self.min_align_ratio].copy()
                # IMPORTANT: fill nan values in a way that avoids filtering
                gen_df['compl'] = gen_df['compl'].fillna(50)
                gen_df['ncompl'] = gen_df['ncompl'].fillna(100)
                num_before_filter = len(gen_df)
                if self.compl_thresh > -1:
                    # filter using compl
                    gen_df = gen_df[gen_df.compl > self.compl_thresh].copy()  # 0, 2, 4, 5
                    print('Completeness {} :: Removed {} samples. [{}]'.format(
                        self.compl_thresh, num_before_filter - len(gen_df), len(gen_df)))
                elif self.ncompl_thresh > -1:
                    # filter using ncompl
                    gen_df = gen_df[gen_df.ncompl > self.ncompl_thresh].copy()  # 0, 2, 4, 5
                    print('Completeness (norm.) {} :: Removed {} samples. [{}]'.format(
                        self.ncompl_thresh, num_before_filter - len(gen_df), len(gen_df)))
                print('class sample count stats: ')
                print(gen_df.train_label.value_counts().describe())
                # add/update additional columns
                gen_df['collection'] = gen_df.folder.str.split('/').str[0]
                gen_df['generated'] = True
                gen_df['imageName'] = gen_df['imageName'].astype(str) + '.jpg'
                # identify all collections with generated annotations
                list_gen_collection = gen_df.collection.unique().tolist()
                collections_ext += list_gen_collection
                # concatenate
                meta_df = pd.concat([meta_df, gen_df], ignore_index=True)
        # drop outlier classes for now (dirty fix)
        class_outlier_select = meta_df.train_label < 240
        if (~class_outlier_select).any():
            print('Drop {} outlier samples!'.format(np.sum(~class_outlier_select)))
            meta_df = meta_df[class_outlier_select]
        # reset index
        self.meta_df = meta_df.reset_index(drop=True)
        # make sure there is width and height
        self.meta_df['width'] = self.meta_df['x2'] - self.meta_df['x1'] + 1
        self.meta_df['height'] = self.meta_df['y2'] - self.meta_df['y1'] + 1
        # only keep the top_k most frequent classes
        if top_k > 0:
            top_labels = self.meta_df.label.value_counts()[:top_k].index.values
            top_select = self.meta_df.label.isin(top_labels)
            self.meta_df = self.meta_df[top_select].reset_index()
            # optionally, only keep the samples of one of the top_k classes
            if top_k > top_k_pick >= 0:
                print(top_labels)
                print('Only select samples from class {}'.format(top_labels[top_k_pick]))
                class_select = self.meta_df.label == top_labels[top_k_pick]
                self.meta_df = self.meta_df[class_select].reset_index(drop=True)
        # all annotations are used
        self.osd_valid_ind = self.meta_df.index
        # crop pre-processing
        # save longest side of each sign
        self.meta_df['square'] = self.meta_df[['width', 'height']].max(axis=1)
        # for each tablet compute median of longest side, and assign it to each sign
        median_table = self.meta_df[self.meta_df.train_label > 0].groupby('imageName')[['square']].median()
        self.meta_df = self.meta_df.join(median_table, on='imageName', rsuffix='_md')
        # self.meta_df['square_new'] = self.meta_df[['square', 'square_md']].max(axis=1)
        # pre-load all images
        self.use_preload = True
        if self.use_preload:
            # memory index -> image path (and inverse); avoids shadowing the builtin map
            idx_to_path = dict(enumerate(self.meta_df['image_path'][self.osd_valid_ind].unique()))
            path_to_idx = {path: idx for idx, path in idx_to_path.items()}
            self.meta_df['mem_idx'] = self.meta_df['image_path'].map(path_to_idx)
            self.image_data_list = []
            for _, impath in tqdm(idx_to_path.items(), total=len(idx_to_path)):
                try:
                    # due to memory constraints, load as grayscale ('L') instead of .convert('RGB')
                    im_ref = Image.open(impath).convert('L')
                except IOError:
                    print('could not read image: {}'.format(impath))
                    raise
                self.image_data_list.append(im_ref)
        # setup finished
        print("Setup {} dataset spanning {} collections.".format(split, collections_ext))
        num_segs = self.meta_df['image_path'].nunique()
        print("Select {} bboxes from {} tablets.".format(len(self), num_segs))

    def __getitem__(self, index):
        # map index to csv index
        csv_idx = self.osd_valid_ind[index]
        rec = self.meta_df.iloc[csv_idx]
        impath = rec['image_path']
        target = rec['train_label']
        square = rec['square']
        square_md = rec['square_md']
        # load image data
        if self.use_preload:
            im_ref = self.image_data_list[rec['mem_idx']]
        else:
            try:
                # due to memory constraints, load as grayscale ('L') instead of .convert('RGB')
                im_ref = Image.open(impath).convert('L')
            except IOError:
                print('could not read image: {}'.format(impath))
                raise
        # bounding box meta
        bb = [rec['x1'], rec['y1'], rec['x2'], rec['y2']]
        # context crop
        context_pad = self.context_pad  # int(square * self.context_pad) #
        if self.pad_to_square:
            # if background, context_pad = 0
            if target == 0:
                context_pad = 0
            # if largest side of bbox is smaller than median of tablet, add additional context pad
            elif square_md > square:
                context_pad += (square_md - square) / 2.  # divide by 2, because w,h of im_pad grow by 2 * context_pad
        # crop sign (with context) from the tablet image
        im, bb_pad = crop_pil_image(im_ref, bb, context_pad=context_pad, pad_to_square=self.pad_to_square)
        # apply augmentation pipeline and convert from PIL to numpy
        if self.transform is not None:
            im = self.transform(im)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return im, target

    def __len__(self):
        return len(self.osd_valid_ind)


def test(params, split='train', top_k=-1, top_k_pick=-1, pad_to_square=True, relative_path='../../'):
    dataset = CuneiformCollection(params, relative_path=relative_path, split=split, top_k=top_k,
                                  top_k_pick=top_k_pick, pad_to_square=pad_to_square)
    return dataset
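`CuneiformCollection` enlarges the crop around small signs: it stores the longest box side as `square`, joins the per-tablet median of that value over foreground signs as `square_md`, and adds half the gap to the context pad because the crop's width and height each grow by `2 * context_pad`. The sketch below isolates that arithmetic on a toy data frame. The helper names `add_square_columns` and `effective_context_pad` are hypothetical; only the column names and formulas come from the code above.

```python
import pandas as pd


def add_square_columns(meta_df):
    """Add the longest box side ('square') and its per-tablet median ('square_md')."""
    df = meta_df.copy()
    df['width'] = df['x2'] - df['x1'] + 1
    df['height'] = df['y2'] - df['y1'] + 1
    df['square'] = df[['width', 'height']].max(axis=1)
    # median over foreground signs only (train_label > 0), joined back onto every row of the tablet
    median_table = df[df.train_label > 0].groupby('imageName')[['square']].median()
    return df.join(median_table, on='imageName', rsuffix='_md')


def effective_context_pad(base_pad, target, square, square_md):
    """Context pad used before cropping when pad_to_square is enabled (hypothetical helper)."""
    if target == 0:
        return 0  # background crops get no context
    if square_md > square:
        # width and height grow by 2 * pad, so half the gap closes it
        return base_pad + (square_md - square) / 2.0
    return base_pad


toy = pd.DataFrame({'imageName': ['a.jpg'] * 3, 'train_label': [3, 7, 0],
                    'x1': [0, 0, 0], 'y1': [0, 0, 0], 'x2': [9, 29, 99], 'y2': [19, 29, 9]})
out = add_square_columns(toy)
print(out[['square', 'square_md']])
print(effective_context_pad(4, 3, 20, 25.0))
```

If a tablet has no foreground signs, `square_md` is NaN; the comparison `square_md > square` is then false and the base pad is kept, matching the behaviour of the dataset class.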
@@ -0,0 +1,334 @@
import torch
import numpy as np
import pandas as pd
from PIL import Image
from ast import literal_eval
import os.path
from tqdm import tqdm
import torch.utils.data as data
from ..detection.sign_detection import crop_segment_from_tablet_im
# from utils.cython_bbox import bbox_overlaps
from ..utils.bbox_utils import clip_boxes
from ..utils.torchcv.transforms.crop_box import crop_box
from ..utils.torchcv.transforms.resize import resize
# helper functions
def convert_bbox_global2local(gbbox, seg_bbox):
x, y = seg_bbox[:2]
relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
return relative_bbox.tolist()
def get_segment_meta(segment_rec):
image_name = segment_rec.tablet_CDLI
# this should control which scale is used in consecutive processing
scale = segment_rec.scale #* self.rescale
seg_bbox = segment_rec.bbox
path_to_image = segment_rec.im_path
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
return image_name, scale, seg_bbox, path_to_image, view_desc
def bbox_ctr_overlaps(boxes1, boxes2):
# check for all combinations of boxes1 and boxes2 if ctrs of boxes2 are in boxes1
# centers of boxes2 do not depend on boxes1, so compute them once
center = (boxes2[:, :2] + boxes2[:, 2:]) / 2
overlaps_mat = np.zeros([boxes1.shape[0], boxes2.shape[0]])
for ii, box in enumerate(boxes1):
x, y, x2, y2 = box
# check if center is still inside tile_box, otherwise ignore box
# if center is not inside tile box,
# not possible to get IoU >= 0.5 --> treated as background anyway
mask = (center[:, 0] >= x) & (center[:, 0] <= x2) \
& (center[:, 1] >= y) & (center[:, 1] <= y2)
overlaps_mat[ii, :] = mask
return overlaps_mat
# Cuneiform SSD dataset
class CuneiformSegments(data.Dataset):
def __init__(self, collections=['train'], transform=None, relative_path='../',
only_annotated=True, only_assigned=True, preload_segments=True, use_gray_scale=True):
# merge multiple data sources in order to provide the following function:
# f(idx) -> image, boxes, labels
# uses gt annotations only
# if no annotations available boxes and labels are empty lists
# transforms for data preparation
self.transform = transform
self.preload_segments = preload_segments
self.use_gray_scale = use_gray_scale
### load and prepare list_sign_anno_df
# manual annotation files may be based on multiple collections
# for each collection
# store in list_sign_anno_df
# load bbox annotations
list_anno_collections = []
sign_anno_df_list = []
for collection in collections:
# load sign annotations
annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
# ATTENTION: only use gt annotations if collection is provided in collections parameter
if os.path.exists(annotation_file):
sign_anno_df = pd.read_csv(annotation_file, engine='python') # read annotation file
# add additional columns
sign_anno_df['generated'] = False
sign_anno_df['global_segm_idx'] = -1
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(literal_eval)
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(np.array) # convert to ndarray
# slice sign_anno_df if there are multiple different collections contained
for sub_collection in sign_anno_df.collection.unique():
# store collection name
list_anno_collections.append(sub_collection)
# store collection specific slice of data frame
sub_sign_anno_df = sign_anno_df[sign_anno_df.collection == sub_collection]
sign_anno_df_list.append(sub_sign_anno_df)
### extend collections
# create list of elementary collections
collections_ext = np.unique(list_anno_collections).tolist()
###################
# II) on collection level: load annotations and meta data
### load segment, sign meta information
# for each collection
# store in segments_df_list
# reduced set of columns - only keep what is needed and maintained
segments_df_columns = ['tablet_CDLI', 'view_desc', 'bbox', 'collection', 'scale', 'im_path']
segments_df_list = []
#sign_anno_df_list = []
for collection in collections_ext:
# load segment metadata
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
# convert string of list to list
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
# add image path column
file_names = tablet_segments_df['tablet_CDLI'] + '.jpg'
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + tablet_segments_df['collection'] + '/' + file_names
# get assigned segments (the assigned flag can be edited from outside without harm)
if only_assigned:
assigned_segments_df = tablet_segments_df[(tablet_segments_df.assigned == True)]
else:
assigned_segments_df = tablet_segments_df
# collect data frames in lists
segments_df_list.append(assigned_segments_df[segments_df_columns])
### assemble ssd_segments_df with new index
# search all segments with annotations
list_segments_df_anno = []
for collection in collections_ext:
coll_idx = collections_ext.index(collection)
list_segm_indices = []
# get all segment indices for this collection that contain annotations
if only_annotated:
# if there are gt annotations
if collection in list_anno_collections:
anno_coll_idx = list_anno_collections.index(collection)
# if the annotation list is not empty
if len(sign_anno_df_list[anno_coll_idx]) > 0:
# load their indices
segm_indices_anno = sign_anno_df_list[anno_coll_idx].segm_idx.unique()
# filter annotations without assigned segment
segm_indices_anno = segm_indices_anno[segm_indices_anno >= 0]
list_segm_indices.append(segm_indices_anno)
# append only segments with anno
if len(list_segm_indices) > 0:
# stack to obtain list of segment indices with annotations
segm_indices = np.unique(np.hstack(list_segm_indices))
# append
list_segments_df_anno.append(segments_df_list[coll_idx].loc[segm_indices])
else:
# append all segments from collection
list_segments_df_anno.append(segments_df_list[coll_idx])
# create new data frame ssd_segments_df
# concat dataframes and use reset_index to create column with old indices
ssd_segments_df = pd.concat(list_segments_df_anno).reset_index()
# rename column to segm_idx
ssd_segments_df = ssd_segments_df.rename(columns={ssd_segments_df.columns[0]: 'segm_idx'})
###################
# III) on segment level: load data and prepare dataset index
### assemble ssd_sign_anno_df and update ssd_segments_df
# additional column for ssd_sign_anno_df: global_segm_idx
# additional column for ssd_segments_df: num_anno
sign_anno_df_cols = ['tablet_CDLI', 'mzl_label', 'train_label', 'segm_idx', 'collection',
'generated', 'relative_bbox', 'global_segm_idx']
# segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
list_ssd_sign_anno_df = []
list_lines_annotated_per_segm = np.zeros(len(ssd_segments_df), dtype=bool)
list_num_anno_per_segm = np.zeros(len(ssd_segments_df), dtype=int)
# iterate over segments
for global_seg_idx, seg_rec in ssd_segments_df.iterrows():
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
res_name = "{}{}".format(image_name, view_desc)
segm_idx = seg_rec.segm_idx
collection = seg_rec.collection
# print(image_name, view_desc, segm_idx)
coll_idx = collections_ext.index(collection)
### if annotations available for segment, append to list
if collection in list_anno_collections:
anno_coll_idx = list_anno_collections.index(collection)
if len(sign_anno_df_list[anno_coll_idx]) > 0:
sign_anno_df = sign_anno_df_list[anno_coll_idx]
# select sign annos for segment
segm_select = sign_anno_df.segm_idx == segm_idx
if len(sign_anno_df[segm_select]) > 0:
# update data frame column
sign_anno_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
# collect information
sign_anno_seg = sign_anno_df[segm_select]
list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
# add columns to ssd_segments_df
ssd_segments_df['num_anno'] = np.array(list_num_anno_per_segm)
if len(list_ssd_sign_anno_df) > 0:
# assemble ssd_sign_anno_df (drop old index)
ssd_sign_anno_df = pd.concat(list_ssd_sign_anno_df, ignore_index=True)
else:
# create empty data frame with correct columns
ssd_sign_anno_df = pd.DataFrame(columns=sign_anno_df_cols)
###################
# IV) Preload: segment images
### preload segment images
# crop segment and optionally convert to gray scale
# IMPORTANT: preload segment crops without scaling, to save memory
image_data_list = []
if self.preload_segments:
# iterate over segments
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
res_name = "{}{}".format(image_name, view_desc)
# load composite image
pil_im = Image.open(image_path)
# crop segment
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
# convert to gray scale and store in list
if self.use_gray_scale:
# convert to gray scale
image_data_list.append(tablet_seg.convert('L'))
else:
image_data_list.append(tablet_seg)
###################
# V) Dataset index
sample2tile_list = ssd_segments_df.index.values
###################
# attach resulting data structures to class
self.collections = collections
self.collections_ext = collections_ext
self.ssd_segments_df = ssd_segments_df
self.ssd_sign_anno_df = ssd_sign_anno_df
self.image_data_list = image_data_list
# self.sign_anno_df_list = sign_anno_df_list
# self.segments_df_list = segments_df_list
self.sample2tile_list = sample2tile_list
# map from seg idx to dataset idx
self.sidx2didx = dict(zip(ssd_segments_df.segm_idx.values, range(len(ssd_segments_df))))
# setup finished
print("Setup dataset spanning {} collections with {} annotations [{} segments, {} indices]".format(
len(collections_ext), len(ssd_sign_anno_df), len(ssd_segments_df), len(sample2tile_list)))
def __getitem__(self, index):
# get segment
global_seg_idx = self.sample2tile_list[index]
seg_rec = self.ssd_segments_df.loc[global_seg_idx]
# load segment meta data
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
# get sign annos
select_segm = self.ssd_sign_anno_df.global_segm_idx == global_seg_idx
segm_annos = self.ssd_sign_anno_df[select_segm]
# get annotated boxes and their labels
if len(segm_annos) > 0:
seg_boxes = np.stack(segm_annos.relative_bbox)
labels = segm_annos.train_label.values
# convert to torch tensors
seg_boxes = torch.from_numpy(seg_boxes).float()
labels = torch.from_numpy(labels)
else:
seg_boxes = None
labels = None
# get segment image
if self.preload_segments:
pil_im = self.image_data_list[global_seg_idx]
else:
# load composite image
pil_im = Image.open(image_path)
# crop segment
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
if self.use_gray_scale:
# convert to gray scale
pil_im = tablet_seg.convert('L')
else:
pil_im = tablet_seg
# tensor functions adapted from kuangliu's code
# https://github.com/kuangliu/torchcv/tree/master/torchcv/transforms
# scale segment
im, boxes = resize(pil_im, seg_boxes, None, scale=scale)
# apply augmentation pipeline and convert from PIL to numpy
if self.transform is not None:
im, boxes, labels = self.transform(im, boxes, labels)
return im, boxes, labels
def get_seg_rec(self, index):
# get segment
global_seg_idx = self.sample2tile_list[index]
return self.ssd_segments_df.loc[global_seg_idx]
def __len__(self):
return len(self.sample2tile_list)
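# Sketch (not part of the original code): a more explicit way to build the
# dataset-level segment table whose first column is 'segm_idx'. Instead of
# renaming the first column after reset_index, it names the index with
# DataFrame.rename_axis first. The old per-collection indices then always end
# up in a column called 'segm_idx', however the CSV index column was named.
# The function name and the toy data in the usage note are hypothetical.
import pandas as pd


def sketch_concat_segments(segment_frames):
    """Concatenate per-collection segment frames and keep their old index as 'segm_idx'."""
    combined = pd.concat(segment_frames)
    # name the (possibly unnamed) index, then move it into a regular column;
    # the new RangeIndex serves as the global segment index
    return combined.rename_axis('segm_idx').reset_index()

# usage (toy data):
#   a = pd.DataFrame({'collection': ['a', 'a']}, index=[0, 3])
#   b = pd.DataFrame({'collection': ['b']}, index=[3])
#   sketch_concat_segments([a, b]).segm_idx.tolist()  -> [0, 3, 3]
# segm_idx is only unique within a collection, so a mapping keyed by segm_idx
# alone (such as sidx2didx above) keeps only the last of duplicate keys.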
@@ -0,0 +1,696 @@
import numpy as np
import pandas as pd
from PIL import Image
from ast import literal_eval
import os.path
from tqdm import tqdm
import torch.utils.data as data
from ..detection.sign_detection import *
# from utils.cython_bbox import bbox_overlaps
from ..utils.bbox_utils import clip_boxes
from ..utils.transform_utils import convert2binaryPIL
from ..utils.torchcv.transforms.crop_box import crop_box
from ..utils.torchcv.transforms.resize import resize
from ..utils.torchcv.transforms_lm.crop_box import crop_box_lm
from ..utils.torchcv.transforms_lm.resize import resize_lm
from .lines_dataset import collect_line_coords, create_line_trafo
from ..detection.line_detection import compute_image_label_map
# helper functions
def convert_bbox_global2local(gbbox, seg_bbox):
x, y = seg_bbox[:2]
relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
return relative_bbox.tolist()
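# Sketch (not part of the original code): convert_bbox_global2local shifts a
# box [x1, y1, x2, y2] given in tablet-image coordinates by the upper-left
# corner of the segment box. The result is therefore relative to the segment
# crop. The hypothetical inverse below adds the offset back. This is useful
# for mapping boxes found on a segment crop to the full tablet image.
import numpy as np


def sketch_bbox_local2global(lbbox, seg_bbox):
    """Inverse of convert_bbox_global2local: segment-relative -> tablet coordinates."""
    x, y = seg_bbox[:2]
    return (np.asarray(lbbox) + np.array([x, y, x, y])).tolist()

# e.g. with seg_bbox = [100, 50, 900, 700]:
#   convert_bbox_global2local([150, 80, 210, 140], seg_bbox) -> [50, 30, 110, 90]
#   sketch_bbox_local2global([50, 30, 110, 90], seg_bbox)    -> [150, 80, 210, 140]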
def get_segment_meta(segment_rec):
image_name = segment_rec.tablet_CDLI
# this controls which scale is used in subsequent processing
scale = segment_rec.scale #* self.rescale
seg_bbox = segment_rec.bbox
path_to_image = segment_rec.im_path
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
return image_name, scale, seg_bbox, path_to_image, view_desc
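# Sketch (not part of the original code): get_segment_meta turns a missing
# view_desc into an empty string by formatting the value and deleting the
# substring "nan". That would also strip "nan" from a genuine description that
# happens to contain it. The hypothetical helper below checks for missing
# values explicitly with pd.isna.
import pandas as pd


def sketch_view_desc_to_str(view_desc):
    """Return '' for missing values (None/NaN), otherwise the description as a string."""
    if pd.isna(view_desc):
        return ''
    return str(view_desc)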
def compute_tiles(imw, imh, scale, tile_shape=[600, 600], border_sz=100, w_step_sz=300, h_step_sz=400):
# TODO: improve using linspace and allow overlap to vary
# sign height should be around 130 px; however, sign length can be up to 300 px
# -> overlap along lines (300px) should be larger than between lines (200px)
# -> this means for step sizes: w_step_sz < h_step_sz
inv_scale = 1. / scale
tile_shape = np.array(tile_shape) * inv_scale
border_sz *= inv_scale
w_step_sz *= inv_scale
h_step_sz *= inv_scale
# tile_shape is [width, height]; overlap = tile extent - step size
tile_ol_w = tile_shape[0] - w_step_sz
tile_ol_h = tile_shape[1] - h_step_sz
w_list = np.arange(border_sz, imw - border_sz - tile_ol_w, step=w_step_sz)
h_list = np.arange(border_sz, imh - border_sz - tile_ol_h, step=h_step_sz)
# grid pts represent upper left corner of tile box
# tiles can be larger than image and need to be padded
XX, YY = np.meshgrid(w_list, h_list)
# compute bboxes
ul_corner = np.rint(np.stack([XX.ravel(), YY.ravel()], axis=1)).astype(int)
lr_corner = ul_corner + np.rint(tile_shape)
bboxes = np.hstack([ul_corner, lr_corner])
# make sure tiles inside image boundaries
bboxes = clip_boxes(bboxes, [imh, imw]) # [imh, imw] is correct order for this function
return bboxes, XX, YY
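# Sketch (not part of the original code): the sliding-window tiling of
# compute_tiles reduced to its core at scale 1. Tile origins start at
# border_sz and advance by the step sizes. Consecutive tiles overlap by tile
# size minus step, which is 300 px along lines and 200 px between lines with
# the defaults. The sketch makes two assumptions. First, tile_shape is taken
# as [width, height], as implied by lr_corner = ul_corner + tile_shape with
# ul_corner = [x, y]. Second, boxes are clipped with np.clip to [0, imw] and
# [0, imh], whereas the original delegates clipping to clip_boxes, whose exact
# bounds are not shown here.
import numpy as np


def sketch_tile_grid(imw, imh, tile_shape=(600, 600), border_sz=100, w_step_sz=300, h_step_sz=400):
    """Return integer tile boxes [x1, y1, x2, y2] covering an imw x imh image."""
    tile_w, tile_h = tile_shape
    # like compute_tiles: origins stop before (image size - border - overlap)
    xs = np.arange(border_sz, imw - border_sz - (tile_w - w_step_sz), w_step_sz)
    ys = np.arange(border_sz, imh - border_sz - (tile_h - h_step_sz), h_step_sz)
    xx, yy = np.meshgrid(xs, ys)
    ul = np.stack([xx.ravel(), yy.ravel()], axis=1)  # upper-left corners
    lr = ul + np.array([tile_w, tile_h])             # lower-right corners
    boxes = np.hstack([ul, lr])
    # keep tile boxes inside the image
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, imw)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, imh)
    return boxes

# e.g. sketch_tile_grid(1000, 800) -> [[100, 100, 700, 700], [400, 100, 1000, 700]]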
def bbox_ctr_overlaps(boxes1, boxes2):
# for all pairs of boxes1 and boxes2, check whether the center of the box in boxes2 lies inside the box in boxes1
overlaps_mat = np.zeros([boxes1.shape[0], boxes2.shape[0]])
for ii, box in enumerate(boxes1):
x, y, x2, y2 = box
# check if center is still inside tile box, otherwise ignore box
# if the center is not inside the tile box,
# IoU >= 0.5 is not possible --> treated as background anyway
center = (boxes2[:, :2] + boxes2[:, 2:]) / 2
mask = (center[:, 0] >= x) & (center[:, 0] <= x2) \
& (center[:, 1] >= y) & (center[:, 1] <= y2)
overlaps_mat[ii, :] = mask
return overlaps_mat
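# Sketch (not part of the original code): a loop-free variant of
# bbox_ctr_overlaps using NumPy broadcasting. The centers of boxes2 are
# computed once. Comparing them against every box in boxes1 yields the same
# (len(boxes1), len(boxes2)) matrix of 0.0/1.0 values as the loop above. Both
# inputs are assumed to be arrays of [x1, y1, x2, y2] rows. The function name
# is hypothetical.
import numpy as np


def sketch_bbox_ctr_overlaps(boxes1, boxes2):
    """Entry [i, j] is 1.0 if the center of boxes2[j] lies inside boxes1[i] (borders included)."""
    boxes1 = np.asarray(boxes1, dtype=float)
    boxes2 = np.asarray(boxes2, dtype=float)
    centers = (boxes2[:, :2] + boxes2[:, 2:]) / 2  # shape (M, 2)
    cx = centers[None, :, 0]                       # shape (1, M)
    cy = centers[None, :, 1]
    # (N, 1) box edges broadcast against (1, M) centers -> (N, M)
    inside = ((cx >= boxes1[:, 0:1]) & (cx <= boxes1[:, 2:3])
              & (cy >= boxes1[:, 1:2]) & (cy <= boxes1[:, 3:4]))
    return inside.astype(float)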
# Cuneiform SSD dataset
class CuneiformSSD(data.Dataset):
def __init__(self, collections=['train'], gen_file_path=None, gen_collections=[], gen_folder=None, transform=None,
relative_path='../', use_balanced_idx=True, tile_shape=[600, 600], use_linemaps=False,
remove_empty_tiles=False, min_align_ratio=0.6, filter_nms=False, compl_thresh=-1, ncompl_thresh=-1,
num_top_ncompl=0, min_ncompl_thresh=10):
# merge multiple data sources in order to form a single dataset that can be used for SSD style detector training
# provides the following mapping:
# f(idx) -> image, bboxes, labels
# or more general:
# f(idx) -> image, bboxes, labels, line_map
# join multiple levels of supervision: three cases for sign annotations
# 1) tablets completely annotated (no need to load line annotations nor line detections)
# 2) tablets partly annotated and line annotations available (no need to load line detections)
# 3) tablets partly annotated and line detections required
# transforms for data preparation
self.transform = transform
self.line_model_version = None
self.use_linemaps = use_linemaps
self.min_align_ratio = min_align_ratio
self.filter_nms = filter_nms
self.compl_thresh = compl_thresh
self.ncompl_thresh = ncompl_thresh
self.num_top_ncompl = num_top_ncompl
self.min_ncompl_thresh = min_ncompl_thresh
line_model_version = 'v007'
num_classes = 240
###################
# I) load generated and manual annotations
### load and prepare gen_df
# generated annotations may be based on multiple collections
gen_cols = ['imageName', 'folder', 'image_path', 'label', 'train_label',
'x1', 'y1', 'x2', 'y2', 'width', 'height', 'segm_idx',
'line_idx', 'pos_idx', 'det_score', 'm_score', 'align_ratio', 'nms_keep', 'compl', 'ncompl']
# OPT I : use csv file that contains list of generated boxes
if gen_file_path:
gen_file_path = "{}results{}".format(relative_path, gen_file_path)
gen_df = pd.read_csv(gen_file_path, engine='python', header=None, names=gen_cols)
# OPT II : load collection-specific csv files and concatenate
elif len(gen_collections) > 0:
assert gen_folder is not None, 'When using gen_collections, user needs to provide gen_folder!'
df_list = []
for gen_coll in gen_collections:
gen_file_path = "{}results/{}line_generated_bboxes_refined80_{}.csv".format(relative_path, gen_folder, gen_coll)
# regex delimiter for legacy support: parses both ',' and ', ' separated files (requires engine='python')
gen_df = pd.read_csv(gen_file_path, engine='python', delimiter=r',\s*', header=None, names=gen_cols)
df_list.append(gen_df)
gen_df = pd.concat(df_list, ignore_index=True)
# prepare gen_df
list_gen_collection = []
if gen_file_path or (len(gen_collections) > 0):
num_before_filter = len(gen_df)
# IMPORTANT: filter gen data according to align ratio
gen_df = gen_df[gen_df.align_ratio > self.min_align_ratio]
print('Align Ratio {} :: Removed {} samples. [{}]'.format(self.min_align_ratio, num_before_filter - len(gen_df), len(gen_df)))
num_before_filter = len(gen_df)
# only keep inlier classes [0, 240) (only required when using null hypotheses)
gen_df = gen_df[gen_df.train_label < num_classes]
print('Class Range {} :: Removed {} samples. [{}]'.format(num_classes, num_before_filter - len(gen_df), len(gen_df)))
# IMPORTANT: fill nan values in a way that avoids filtering
gen_df.nms_keep = gen_df.nms_keep.fillna(1).astype(bool)
gen_df.compl = gen_df.compl.fillna(50)
gen_df.ncompl = gen_df.ncompl.fillna(100)
num_before_filter = len(gen_df)
if self.filter_nms:
# filter using nms
gen_df = gen_df[gen_df.nms_keep]
print('NMS :: Removed {} samples. [{}]'.format(num_before_filter - len(gen_df), len(gen_df)))
num_before_filter = len(gen_df)
select_topn = False
if self.num_top_ncompl > 0:
# find the top num_top_ncompl samples of each class under a more relaxed ncompl condition
select_min_ncompl = (gen_df.ncompl > self.min_ncompl_thresh) # necessary condition
index_list = gen_df[select_min_ncompl].groupby('train_label').ncompl.nlargest(self.num_top_ncompl).index.values
select_topn = gen_df.index.isin(np.stack(index_list)[:, 1])
if self.compl_thresh > -1:
# filter using compl
gen_df = gen_df[gen_df.compl > self.compl_thresh] # 0, 2, 4, 5
print('Completeness {} :: Removed {} samples. [{}]'.format(self.compl_thresh, num_before_filter - len(gen_df), len(gen_df)))
elif self.ncompl_thresh > -1:
# filter using ncompl (but keep the per-class top-n samples, if selected)
gen_df = gen_df[(gen_df.ncompl > self.ncompl_thresh) | select_topn] # 0, 2, 4, 5
print('Completeness (norm.) {} :: Removed {} samples. [{}]'.format(self.ncompl_thresh, num_before_filter - len(gen_df), len(gen_df)))
print('class sample count stats: ')
print(gen_df.train_label.value_counts().describe())
# add additional columns
gen_df['collection'] = gen_df.folder.str.split('/').str[0]
gen_df['generated'] = True
gen_df['global_segm_idx'] = -1
gen_df['relative_bbox'] = gen_df[['x1', 'y1', 'x2', 'y2']].values.tolist()
gen_df['relative_bbox'] = gen_df['relative_bbox'].apply(np.array)
gen_df['mzl_label'] = gen_df['label']
gen_df['tablet_CDLI'] = gen_df['imageName']
# identify all collections with generated annotations
list_gen_collection = gen_df.collection.unique().tolist()
### load and prepare sign_anno_df_list
# manual annotation files may cover multiple collections
# for each collection
# store in sign_anno_df_list
# load bbox annotations
list_anno_collections = []
sign_anno_df_list = []
for collection in collections:
# load sign annotations
annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
# ATTENTION: only use gt annotations if collection is provided in collections parameter
if os.path.exists(annotation_file):
sign_anno_df = pd.read_csv(annotation_file, engine='python') # read annotation file
# add additional columns
sign_anno_df['generated'] = False
sign_anno_df['global_segm_idx'] = -1
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(literal_eval)
sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(np.array) # convert to ndarray
# only keep inlier classes [0, 240)
class_outlier_select = sign_anno_df.train_label < num_classes
if np.any(class_outlier_select):
print('Drop {} outlier class samples from {}!'.format(np.sum(~class_outlier_select), collection))
sign_anno_df = sign_anno_df[class_outlier_select]
# split sign_anno_df by collection, since one annotation file may contain several collections
for sub_collection in sign_anno_df.collection.unique():
# store collection name
list_anno_collections.append(sub_collection)
# store collection specific slice of data frame
sub_sign_anno_df = sign_anno_df[sign_anno_df.collection == sub_collection]
sign_anno_df_list.append(sub_sign_anno_df)
### extend collections
# create list of elementary collections
collections_ext = np.unique(list_gen_collection + list_anno_collections).tolist()
#collections_ext
###################
# II) on collection level: load segments meta data and line annotation (optional)
### load segment, line
# for each collection
# store in segments_df_list, line_anno_df_list
# reduced set of columns - only keep what is needed and maintained
# segments_df_columns = ['tablet_CDLI', 'view_desc', 'padded_bbox', 'collection', 'line_scale', 'scale',
# 'im_path',
# 'num_dets_hd', 'num_signs_visible']
segments_df_columns = ['tablet_CDLI', 'view_desc', 'bbox', 'collection', 'scale', 'im_path']
segments_df_list = []
line_anno_df_list = []
for collection in collections_ext:
# load segment metadata
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
# convert string of list to list
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
# add additional columns
tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
# get assigned segment (can be edited from outside without harm)
assigned_segments_df = tablet_segments_df[tablet_segments_df.assigned == True]
# load line annotations
annotation_file = '{}data/annotations/line_annotations_{}.csv'.format(relative_path, collection)
if os.path.exists(annotation_file):
line_anno_df = pd.read_csv(annotation_file, engine='python')
else:
line_anno_df = []
# collect data frames in lists
segments_df_list.append(assigned_segments_df[segments_df_columns])
line_anno_df_list.append(line_anno_df)
### assemble ssd_segments_df with new index
# search all segments with annotations
list_segments_df_anno = []
for collection in collections_ext:
coll_idx = collections_ext.index(collection)
#print(collection)
# get all segment indices for this collection that contain annotations
list_segm_indices = []
# if there are gt annotations
if collection in list_anno_collections:
anno_coll_idx = list_anno_collections.index(collection)
if len(sign_anno_df_list[anno_coll_idx]) > 0:
# load their indices
segm_indices_anno = sign_anno_df_list[anno_coll_idx].segm_idx.unique()
# filter annotations without assigned segment
segm_indices_anno = segm_indices_anno[segm_indices_anno >= 0]
list_segm_indices.append(segm_indices_anno)
# if there are generated annotations
if collection in list_gen_collection:
# select gen annotations by collection
col_gen_df = gen_df[gen_df.collection == collection]
# load their indices
segm_indices_anno = col_gen_df.segm_idx.unique()
list_segm_indices.append(segm_indices_anno)
# stack to obtain list of segment indices with annotations
segm_indices = np.unique(np.hstack(list_segm_indices))
# append only segments with anno
if len(segm_indices) > 0:
list_segments_df_anno.append(segments_df_list[coll_idx].loc[segm_indices])
# create new data frame ssd_segments_df
# concat dataframes and use reset_index to create column with old indices
ssd_segments_df = pd.concat(list_segments_df_anno).reset_index()
# rename column to segm_idx
ssd_segments_df = ssd_segments_df.rename(columns={ssd_segments_df.columns[0]: 'segm_idx'})
###################
# III) on segment level: load data and prepare dataset index
### assemble ssd_sign_anno_df and update ssd_segments_df
# make sure all annos have relative_bbox
# additional column for ssd_sign_anno_df: global_segm_idx
# add two columns to ssd_segments_df: num_anno and with_line_anno
# type of annotation: full, partly_w_line_anno, partly_w_line_dect
# sign_anno_df_cols = ['imageName', 'image_path', 'label', 'train_label', 'segm_idx', 'collection',
# 'generated', 'relative_bbox', 'global_segm_idx']
sign_anno_df_cols = ['tablet_CDLI', 'mzl_label', 'train_label', 'segm_idx', 'collection',
'generated', 'relative_bbox', 'global_segm_idx']
list_ssd_sign_anno_df = []
list_lines_annotated_per_segm = np.zeros(len(ssd_segments_df), dtype=bool)
list_num_anno_per_segm = np.zeros(len(ssd_segments_df), dtype=int)
# iterate over segments
for global_seg_idx, seg_rec in ssd_segments_df.iterrows():
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
res_name = "{}{}".format(image_name, view_desc)
collection = seg_rec.collection
segm_idx = seg_rec.segm_idx
coll_idx = collections_ext.index(collection)
### if annotations available for segment, append to list
if collection in list_anno_collections:
anno_coll_idx = list_anno_collections.index(collection)
if len(sign_anno_df_list[anno_coll_idx]) > 0:
sign_anno_df = sign_anno_df_list[anno_coll_idx]
# select sign annos for segment
segm_select = sign_anno_df.segm_idx == segm_idx
if len(sign_anno_df[segm_select]) > 0:
# update data frame column
sign_anno_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
# collect information
sign_anno_seg = sign_anno_df[segm_select]
list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
### if generated annotations available, append to list
if collection in list_gen_collection:
# select sign annos for segment AND collection
segm_select = (gen_df.segm_idx == segm_idx) & (gen_df.collection == seg_rec.collection)
if len(gen_df[segm_select]) > 0:
# update data frame columns
gen_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
# compute relative_bbox
relative_boxes = gen_df[segm_select].relative_bbox.apply(
lambda x: np.rint(convert_bbox_global2local(x, list(seg_bbox))).astype(int))
gen_df.loc[segm_select, 'relative_bbox'] = relative_boxes
# collect information
sign_anno_seg = gen_df[segm_select]
# accumulate, since a segment can have both manual and generated annotations
list_num_anno_per_segm[global_seg_idx] += len(sign_anno_seg)
list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
### check for line annotations
if len(line_anno_df_list[coll_idx]) > 0:
line_anno_df = line_anno_df_list[coll_idx]
# select line annos for segment
segm_select = line_anno_df.segm_idx == segm_idx
# if there are line annotations for segment
if len(line_anno_df[segm_select]) > 0:
# assume all lines are annotated and remember type of line data
list_lines_annotated_per_segm[global_seg_idx] = True
# add columns to ssd_segments_df
ssd_segments_df['num_anno'] = np.array(list_num_anno_per_segm)
ssd_segments_df['with_line_anno'] = list_lines_annotated_per_segm
# assemble ssd_sign_anno_df (drop old index)
ssd_sign_anno_df = pd.concat(list_ssd_sign_anno_df, ignore_index=True)
# this check is obsolete since the bug fix
#assert np.sum(ssd_sign_anno_df.groupby('global_segm_idx').collection.nunique() > 1) == 0
###################
# IV) Preload: segment images and line detections
### preload segment images
# crop segment and convert to gray scale
# IMPORTANT: preload segment crops without scaling, to save memory
image_data_list = []
# iterate over segments
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
res_name = "{}{}".format(image_name, view_desc)
# load composite image
pil_im = Image.open(image_path)
# crop segment
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
# convert to gray scale and store in list
image_data_list.append(tablet_seg.convert('L'))
### preload line detections
# could pre-compute line annotations->line map
# this is a speed memory trade-off
line_detection_dict = {}
line_map_dict = {}
# only required if line maps are used (to avoid false negatives with incomplete, generated annotations)
if self.use_linemaps:
# iterate over segments
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
res_name = "{}{}".format(image_name, view_desc)
# get collection idx
coll_idx = collections_ext.index(seg_rec.collection)
# get seg image shape
input_shape = np.array(image_data_list[global_seg_idx].size[::-1])
# if annotations are generated, need to create line map
#if seg_rec.collection in list_gen_collection:
# always use (generated) line detections; alternative conditions: not seg_rec.with_line_anno, or seg_rec.collection != 'train'
if True:
# either skeleton or lbl_ind
line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, seg_rec.collection)
lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
# lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
lbl_ind_x = np.load(lines_file).astype(int)
# store in dictionary
line_detection_dict[global_seg_idx] = lbl_ind_x
# create line map from detections -> PIL binary
lbl_im = create_line_map_from_line_det(line_detection_dict, global_seg_idx, scale, input_shape)
else:
# create line map from line annotations -> PIL binary
lbl_im = create_line_map_from_line_anno(line_anno_df_list, coll_idx, seg_rec.segm_idx, input_shape)
# resize to image size (done in __getitem__ instead of here)
# lbl_im = lbl_im.resize(input_shape[::-1])
# store in dictionary
line_map_dict[global_seg_idx] = lbl_im
###################
# V) Tiling
### compute ssd_tile_df
list_tile_boxes = []
list_tile_support = []
list_tile_seg_idx = []
# iterate over segments
for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
res_name = "{}{}".format(image_name, view_desc)
## compute tiles
# get segment shape
imw, imh = image_data_list[global_seg_idx].size
# compute tile boxes
tile_boxes, _, _ = compute_tiles(imw, imh, scale, tile_shape=tile_shape)
# append
list_tile_boxes.append(tile_boxes)
list_tile_seg_idx.append([global_seg_idx] * len(tile_boxes))
## check overlap of tile boxes and sign boxes
# get annotations
seg_sign_annos = ssd_sign_anno_df[ssd_sign_anno_df.global_segm_idx == global_seg_idx]
sign_bboxes = np.stack(seg_sign_annos.relative_bbox.values)
# OPT I: compute IOU
# tiles_sign_iou = bbox_overlaps(tile_boxes.astype(float), sign_bboxes.astype(float))
# tile_support = np.sum(tiles_sign_iou > 0.005, axis=1) # 0.01 or 0.005
# OPT II: compute ctr overlap (strict)
tiles_sign_ctrs = bbox_ctr_overlaps(tile_boxes.astype(float), sign_bboxes.astype(float))
tile_support = np.sum(tiles_sign_ctrs, axis=1).astype(int)
list_tile_support.append(tile_support)
# stack tile boxes
tile_boxes_arr = np.vstack(list_tile_boxes)
tile_global_seg_idx = np.hstack(list_tile_seg_idx).astype(int)
tile_support_arr = np.hstack(list_tile_support)
# create tile_df
tile_df = pd.DataFrame({'global_segm_idx': tile_global_seg_idx,
'tile_bbox': tile_boxes_arr.tolist(),
'num_anno': tile_support_arr})
# OPTIONAL: filter tiles without support
if remove_empty_tiles and not use_balanced_idx:
tile_df = tile_df[tile_df.num_anno > 0] # 0
tile_df = tile_df.reset_index(drop=True)
###################
# VI) Dataset index
## Balance sampling of tiles with anno per tile
# create a dataset index in which each tile occurs as often as it has annotations
# attention: tiles without support will be ignored!
use_balanced_idx = use_balanced_idx # good for debug
# 1) get tile factors
tile_factors = tile_df.num_anno.values
# 2) compute list to sample from
if use_balanced_idx:
sample2tile_list = []
for ii, tile_factor in enumerate(tile_factors):
sample2tile_list.extend([ii] * tile_factor)
else:
sample2tile_list = tile_df.index.values
###################
# attach resulting data structures to class
self.collections = collections
self.collections_ext = collections_ext
self.ssd_segments_df = ssd_segments_df
self.ssd_sign_anno_df = ssd_sign_anno_df
self.tile_df = tile_df
self.image_data_list = image_data_list
# self.line_detection_dict = line_detection_dict
self.line_map_dict = line_map_dict
self.line_anno_df_list = line_anno_df_list
# self.sign_anno_df_list = sign_anno_df_list
# self.segments_df_list = segments_df_list
self.sample2tile_list = sample2tile_list
# setup finished
print("Setup dataset spanning {} collections with {} annotations [{} segments, {} tiles, {} indices]".format(
len(collections_ext), len(ssd_sign_anno_df), len(ssd_segments_df), len(tile_df), len(sample2tile_list)))
def __getitem__(self, index):
# get tile
tile_index = self.sample2tile_list[index]
tile_rec = self.tile_df.loc[tile_index]
tile_bbox = tile_rec.tile_bbox
# get segment
global_seg_idx = tile_rec.global_segm_idx
seg_rec = self.ssd_segments_df.loc[global_seg_idx]
coll_idx = self.collections_ext.index(seg_rec.collection)
# load segment meta data
image_name, scale, seg_bbox, path_to_image, view_desc = get_segment_meta(seg_rec)
with_line_anno = seg_rec.with_line_anno
# get segment image
pil_im = self.image_data_list[global_seg_idx]
# get sign annos
select_segm = self.ssd_sign_anno_df.global_segm_idx == global_seg_idx
segm_annos = self.ssd_sign_anno_df[select_segm]
seg_boxes = np.stack(segm_annos.relative_bbox)
labels = segm_annos.train_label.values
are_generated = segm_annos.generated.any()
# OPT II: tensor functions adapted from kuangliu's code
# https://github.com/kuangliu/torchcv/tree/master/torchcv/transforms
# convert to torch tensors
seg_boxes = torch.from_numpy(seg_boxes).float()
labels = torch.from_numpy(labels)
if self.use_linemaps:
if are_generated:
# incomplete annotations -> use line detections to avoid false negatives
lbl_im = self.line_map_dict[global_seg_idx]
# resize to crop
lbl_im = lbl_im.resize(pil_im.size)
else:
# assume all ground truth signs are annotated
# provide dummy label map
lbl_im = Image.new('1', pil_im.size, 0)
if False:
from skimage.color import label2rgb
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
# plt.imshow(lbl_ind)
plt.imshow(label2rgb(np.asarray(lbl_im), np.asarray(pil_im)))
plt.show()
# crop tile
# print pil_im.size, seg_boxes.shape, labels.shape, tile_bbox
im, boxes, labels, linemap = crop_box_lm(pil_im, seg_boxes, labels, lbl_im, tile_bbox)
# scale tile
im, boxes, linemap = resize_lm(im, boxes, linemap, None, scale=scale)
# apply augmentation pipeline and convert from PIL to numpy
if self.transform is not None:
im, boxes, labels, linemap = self.transform(im, boxes, labels, linemap)
return im, boxes, labels, linemap
else:
# crop tile
#print pil_im.size, seg_boxes.shape, labels.shape, tile_bbox
im, boxes, labels = crop_box(pil_im, seg_boxes, labels, tile_bbox)
# scale tile
im, boxes = resize(im, boxes, None, scale=scale)
# apply augmentation pipeline and convert from PIL to numpy
if self.transform is not None:
im, boxes, labels = self.transform(im, boxes, labels)
return im, boxes, labels
def __len__(self):
return len(self.sample2tile_list)
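# Sketch (not part of the original code): the balanced dataset index of
# section VI) repeats each tile position as many times as the tile has
# annotations. Tiles with more signs are therefore sampled more often, and
# tiles without support never appear. np.repeat builds the same list without
# a Python loop. As in the original, positions are used as tile_df labels.
# This is valid because tile_df is only filtered when use_balanced_idx is
# False. The function name is hypothetical.
import numpy as np


def sketch_balanced_index(tile_factors):
    """Repeat tile position i exactly tile_factors[i] times."""
    tile_factors = np.asarray(tile_factors, dtype=int)
    return np.repeat(np.arange(len(tile_factors)), tile_factors)

# e.g. sketch_balanced_index([0, 2, 1]) -> array([1, 1, 2])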
# helper functions
def create_line_map_from_line_anno(line_anno_df_list, coll_idx, segm_idx, input_shape):
line_height = 3
# select line annotations
line_anno_df = line_anno_df_list[coll_idx]
seg_line_df = line_anno_df[line_anno_df.segm_idx == segm_idx]
# # collect all line coordinates
rr, cc, lbboxes = collect_line_coords(seg_line_df, scale=1 / 16.)
# compute line trafo
line_trafo = create_line_trafo(rr, cc, input_shape / 16)
# compute masks
line_mask = line_trafo < line_height
# convert to binary PIL image
lbl_im = convert2binaryPIL(line_mask)
return lbl_im
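# Illustration (hypothetical helper, not part of the repository): a numpy-only
# sketch of the line-map idea used in create_line_map_from_line_anno. A line
# polyline is rasterized, every pixel gets its Euclidean distance to the nearest
# line pixel, and pixels closer than `line_height` form the binary line mask.
# Assumption: the brute-force distance and the linear interpolation stand in for
# scipy.ndimage.distance_transform_edt and skimage.draw.line_aa, and are only
# practical for small maps.
import numpy as np


def sketch_line_mask(xs, ys, shape, line_height=3):
    # rasterize each polyline segment by sampling integer points along it
    rows, cols = [], []
    for i in range(len(xs) - 1):
        n = int(max(abs(xs[i + 1] - xs[i]), abs(ys[i + 1] - ys[i]))) + 1
        rows.append(np.rint(np.linspace(ys[i], ys[i + 1], n)).astype(int))
        cols.append(np.rint(np.linspace(xs[i], xs[i + 1], n)).astype(int))
    line_px = np.stack([np.concatenate(rows), np.concatenate(cols)], axis=1)
    # distance of every (row, col) pixel to its nearest line pixel
    grid = np.indices(shape).reshape(2, -1).T
    diffs = grid[:, None, :] - line_px[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)
    # threshold the distance map, as in `line_trafo < line_height`
    return dists.reshape(shape) < line_height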
def create_line_map_from_line_det(line_detection_dict, global_seg_idx, scale, input_shape):
# get line detection
lbl_ind = line_detection_dict[global_seg_idx]
# compute line map
lbl_ind = compute_image_label_map(lbl_ind, np.array(input_shape * scale, dtype=int), padding=5) # default:16, other padding=16 20 24
# convert to binary PIL image
lbl_im = convert2binaryPIL(lbl_ind)
return lbl_im
# run test
def test(collections=['train'], gen_collections=[], gen_folder=None, use_balanced_idx=True, use_linemaps=False,
remove_empty_tiles=False, min_align_ratio=0.2, relative_path='../../'):
ssd_dataset = CuneiformSSD(collections=collections, gen_file_path=None, gen_collections=gen_collections,
gen_folder=gen_folder, relative_path=relative_path,
use_balanced_idx=use_balanced_idx, tile_shape=[600, 600], use_linemaps=use_linemaps,
remove_empty_tiles=remove_empty_tiles, min_align_ratio=min_align_ratio)
return ssd_dataset
+286
@@ -0,0 +1,286 @@
import os
import numpy as np
import pandas as pd
from PIL import Image
from ast import literal_eval
from scipy import ndimage as ndi
from skimage.util import invert
from skimage.draw import line, line_aa
import torch.utils.data as data
from tqdm import tqdm
from ..utils.bbox_utils import clip_boxes
from ..utils.transform_utils import crop_pil_image
from ..detection.sign_detection import *
### helper functions
def collect_line_coords(seg_line_df, scale=1):
# group according to line idx
grouped = seg_line_df.groupby('line_idx')
# collect all line coordinates
rr_list, cc_list, lbbox_list = [], [], []
for i, line_rec in grouped:
xx = np.rint(line_rec.x.values * scale).astype(int)
yy = np.rint(line_rec.y.values * scale).astype(int)
lbbox = np.array([np.min(xx), np.min(yy), np.max(xx), np.max(yy)])
lbbox_list.append(lbbox)
for li in range(len(xx) - 1):
rr, cc, _ = line_aa(yy[li], xx[li], yy[li + 1], xx[li + 1])
# rr, cc = line(yy[li], xx[li], yy[li+1], xx[li+1])
rr_list.append(rr)
cc_list.append(cc)
# stack coordinates
rr = np.hstack(rr_list)
cc = np.hstack(cc_list)
lbboxes = np.stack(lbbox_list)
return rr, cc, lbboxes
def create_line_trafo(rr, cc, input_shape):
# create mask
line_mask = np.zeros(input_shape).astype(bool)
line_mask[rr, cc] = 1
# compute distance transform after inverting
line_trafo = ndi.distance_transform_edt(invert(line_mask))
return line_trafo
def compute_sampling_freq(line_trafo, sample_mask, sample_radius, expo=2):
sample_freq = line_trafo
# convert to probs
sample_freq = (-sample_freq / sample_radius + 1) ** expo
# sample_freq = -sample_freq/sample_radius + 1
# sample_freq = np.exp(-sample_freq/sample_radius * 2)
# set area that is not sampled from to 'zero'
sample_freq[sample_mask < 1] = 0
return sample_freq
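# Illustration (hypothetical helper, not part of the repository): the weighting
# in compute_sampling_freq is (1 - d / r) ** expo, where d is the distance to
# the nearest line pixel and r the sample radius. For an even `expo` the term
# becomes positive again beyond the radius, which is why pixels with d >= r
# are explicitly set to zero.
import numpy as np


def sketch_sampling_freq(dist, sample_radius, expo=2):
    freq = (1.0 - dist / float(sample_radius)) ** expo
    # mirror `sample_freq[sample_mask < 1] = 0` with sample_mask = dist < radius
    freq[dist >= sample_radius] = 0.0
    return freq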
def spatial_sample(sample_freq):
thresh = np.random.random_sample()
ylist, xlist = np.where(sample_freq > thresh)
select_idx = np.random.randint(len(xlist))
return xlist[select_idx], ylist[select_idx]
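# Illustration (hypothetical helper, not part of the repository): spatial_sample
# draws a uniform threshold and then picks uniformly among pixels whose
# frequency exceeds it. Pixels with a higher frequency survive more thresholds
# and are therefore favored. This sketch uses numpy's Generator API instead of
# the global np.random state. It assumes the map contains a pixel with
# frequency 1 (true for d = 0 above), so the candidate set is never empty.
import numpy as np


def sketch_spatial_sample(sample_freq, rng):
    thresh = rng.random()
    rows, cols = np.nonzero(sample_freq > thresh)
    k = rng.integers(len(cols))
    # return (x, y), i.e. (column, row), like spatial_sample
    return cols[k], rows[k]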
def spatial_sample_negative(sample_freq):
# too slow
if 0:
# remove samples close to border
border_mask = np.zeros_like(sample_freq, dtype=bool)
bdist = 150
border_mask[bdist:-bdist, bdist:-bdist] = True
# apply masks
ylist, xlist = np.where((sample_freq == 0) & (border_mask))
select_idx = np.random.randint(len(xlist))
return xlist[select_idx], ylist[select_idx]
# faster
if 1:
# remove samples close to border
border_mask = np.zeros_like(sample_freq, dtype=bool)
bdist = 150
border_mask[bdist:-bdist, bdist:-bdist] = True
x, y = 0, 0
# (line_map[x, y] is True) results in overlap with hard negative samples
for i in range(100):
# pick coordinate
select_idx = np.random.randint(np.prod(sample_freq.shape))
# back to matrix index
x, y = np.unravel_index(select_idx, sample_freq.shape)
if (sample_freq[x, y] == 0) and (border_mask[x, y] == True):
break
return y, x
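# Illustration (hypothetical helper, not part of the repository): the "faster"
# branch of spatial_sample_negative is rejection sampling. It draws random
# pixels until one has zero sampling frequency and lies at least `bdist` pixels
# from the border. Like the original, it gives up after `max_tries` and then
# returns the last draw, which is not guaranteed to be valid.
import numpy as np


def sketch_sample_negative(sample_freq, rng, bdist=150, max_tries=100):
    h, w = sample_freq.shape
    row, col = 0, 0
    for _ in range(max_tries):
        row, col = rng.integers(h), rng.integers(w)
        inside = bdist <= row < h - bdist and bdist <= col < w - bdist
        if inside and sample_freq[row, col] == 0:
            break
    # return (x, y) = (column, row)
    return col, row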
def pad_bboxes(lbboxes, context_pad):
# works in place, so there is no need to return anything
for bb in lbboxes:
bb[:2] = bb[:2] - context_pad
bb[2:4] = bb[2:4] + context_pad
# return lbboxes
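# Illustration (hypothetical helper, not part of the repository): padding and
# clipping line boxes in place, as CuneiformLines does with pad_bboxes followed
# by clip_boxes(lbboxes, input_shape). Assumption: clip_boxes limits x to
# [0, W - 1] and y to [0, H - 1] for a (H, W) shape, as the common Faster R-CNN
# helper does. Slice assignment writes back into the caller's array.
import numpy as np


def sketch_pad_and_clip(boxes, pad, shape):
    boxes[:, :2] -= pad
    boxes[:, 2:4] += pad
    height, width = shape[:2]
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, width - 1)
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, height - 1)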
def spatial_sample_line(sample_freq, lbbox):
thresh = np.random.random_sample()
ylist, xlist = np.where(sample_freq[lbbox[1]:lbbox[3], lbbox[0]:lbbox[2]] >= thresh)
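if len(xlist) == 0:
# no pixel above the threshold inside the line box: report it (np.random.randint below will then raise)
print(lbbox, sample_freq.shape)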
select_idx = np.random.randint(len(xlist))
return lbbox[0] + xlist[select_idx], lbbox[1] + ylist[select_idx]
### CuneiformLines class
class CuneiformLines(data.Dataset):
def __init__(self, dataset_params, transform=None, target_transform=None, relative_path='../', split='train'):
# annotation_path, params,
# set params
self.line_height = dataset_params['line_height']
self.sample_radius = dataset_params['sample_radius'] # self.line_height * 3
self.expo = dataset_params['expo']
if 'train' in split:
self.soft_bg_frac = dataset_params['soft_bg_frac'][0]
else:
self.soft_bg_frac = dataset_params['soft_bg_frac'][1]
self.crop_size = dataset_params['crop_size']
self.patch_size = dataset_params['patch_size']
# transforms for data preparation
self.transform = transform
self.target_transform = target_transform
# load line annotation
annotation_file = '{}data/annotations/line_annotations_{}.csv'.format(relative_path, split)
line_anno_df = pd.read_csv(annotation_file, engine='python')
# load segment metadata
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, split)
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
# convert string of list to list
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
# additional columns
tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
# select assigned
assigned_segments_df = tablet_segments_df[tablet_segments_df.assigned == True]
# pre-load segments and compute line and sampling maps
self.valid_indices = []
self.num_lines_list = []
self.image_data_list = []
self.line_map_list = []
self.sample_freq_list = []
lbboxes_list = []
for segment_idx, segment_rec in tqdm(assigned_segments_df.iterrows(), total=len(assigned_segments_df)):
imageName = segment_rec.tablet_CDLI
scale = segment_rec.scale
seg_bbox = segment_rec.bbox
path_to_image = segment_rec.im_path
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
# select line annotations
seg_line_df = line_anno_df[line_anno_df.segm_idx == segment_idx]
# check if any annotations available
if len(seg_line_df) > 0:
# print(split, imageName, view_desc)
### 1) load segment
# prepare input tablet
pil_im = Image.open(path_to_image)
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
# scale image
input_im = rescale_segment_single(tablet_seg, scale)
input_shape = input_im.size[::-1]
### 2) line map
# compute interpolated line coordinates
# collect all line coordinates
rr, cc, lbboxes = collect_line_coords(seg_line_df, scale=scale)
# pad with sample radius
pad_bboxes(lbboxes, self.sample_radius)
clip_boxes(lbboxes, input_shape)
# compute line trafo
line_trafo = create_line_trafo(rr, cc, input_shape)
# compute masks
line_mask = line_trafo < self.line_height
sample_mask = line_trafo < self.sample_radius
# compute frequency
sample_freq = compute_sampling_freq(line_trafo, sample_mask, self.sample_radius, self.expo)
### 3) save data
# append to list
self.valid_indices.append(segment_idx)
self.num_lines_list.append(len(seg_line_df.line_idx.unique()))
self.image_data_list.append(input_im)
self.line_map_list.append(line_mask)
self.sample_freq_list.append(sample_freq)
lbboxes_list.append(lbboxes)
# stack lbboxes
self.lbboxes = np.vstack(lbboxes_list)
self.line2mem_list = []
# for valid_idx, num_lines in zip(self.valid_indices, self.num_lines_list):
for mem_idx, num_lines in enumerate(self.num_lines_list):
self.line2mem_list.extend([mem_idx] * num_lines)
# Balance sampling with line length
# 1) get line factors by line width and normalisation
widths = self.lbboxes[:, 2] - self.lbboxes[:, 0]
# factor that makes the share of the shortest line at least 1
norm_factor_int = np.ceil(float(widths.sum()) / widths.min())
norm_widths = widths / float(widths.sum())
line_factors = np.rint(norm_factor_int * norm_widths).astype(int)
# 2) compute list to sample from
self.sample2line_list = []
for ii, line_factor in enumerate(line_factors):
self.sample2line_list.extend([ii] * line_factor)
# increase test set size to obtain more stable error
if split == 'test':
self.sample2line_list = self.sample2line_list * 5
# setup finished
print("Set up {} dataset with {} lines and {} samples".format(split, len(self.line2mem_list), len(self)))
def __getitem__(self, index):
# line_index = index
line_index = self.sample2line_list[index]
lbbox = self.lbboxes[line_index]
mem_idx = self.line2mem_list[line_index]
# get required data
segm_im = self.image_data_list[mem_idx]
line_map = self.line_map_list[mem_idx]
sample_freq = self.sample_freq_list[mem_idx]
if np.random.random() > self.soft_bg_frac:
# sample spatial location
# y, x = spatial_sample(sample_freq) # coordinates need to be inverted
y, x = spatial_sample_line(sample_freq, lbbox)
# compute target label
target = int(line_map[x, y])
else:
y, x = spatial_sample_negative(sample_freq)
# compute target label
target = int(line_map[x, y]) # should always be negative
# crop patch at sampled location (use PIL for that)
hw, hh = self.patch_size[0] / 2., self.patch_size[1] / 2.
bbox = [y - hw, x - hh, y + hw, x + hh]
# new fast
im, bb = crop_pil_image(segm_im, bbox, context_pad=0, pad_to_square=False)
# apply augmentation pipeline and convert from PIL to numpy
if self.transform is not None:
im = self.transform(im)
if self.target_transform is not None:
target = self.target_transform(target)
return im, target
def __len__(self):
# number of length-balanced line samples
return len(self.sample2line_list)
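# Illustration (hypothetical helper, not part of the repository): the two
# index lists built in CuneiformLines.__init__. line2mem maps each line to its
# preloaded segment. sample2line repeats each line index in proportion to the
# line width, so wider lines are sampled more often and the narrowest line
# appears at least once.
import numpy as np


def sketch_line_sampling_lists(num_lines_per_segment, line_widths):
    line2mem = []
    for mem_idx, n in enumerate(num_lines_per_segment):
        line2mem.extend([mem_idx] * n)
    widths = np.asarray(line_widths, dtype=float)
    norm_factor = np.ceil(widths.sum() / widths.min())
    factors = np.rint(norm_factor * widths / widths.sum()).astype(int)
    sample2line = []
    for line_idx, f in enumerate(factors):
        sample2line.extend([line_idx] * int(f))
    return line2mem, sample2line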
+149
@@ -0,0 +1,149 @@
import numpy as np
import pandas as pd
from PIL import Image
from ast import literal_eval
from tqdm import tqdm
import torch.utils.data as data
from ..detection.sign_detection import crop_segment_from_tablet_im, rescale_segment_single
from ..utils.torchcv.transforms.resize import resize
class CuneiformSegments(data.Dataset):
# lightweight version of cunei_dataset_segments:
# no annotation processing
# segment images are only preloaded if preload_segments=True
def __init__(self, transform=None, target_transform=None, collection='train', collections=[],
relative_path='../', rescale=1.0, only_assigned=True, preload_segments=False):
self.rescale = rescale
self.relative_path = relative_path
self.collection = collection
self.preload_segments = preload_segments
# transforms for data preparation
self.transform = transform
self.target_transform = target_transform
if len(collections) > 0:
# load segment metadata for multiple collections
df_list = []
for collection in collections:
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
df_list.append(tablet_segments_df)
# concatenate to single df
tablet_segments_df = pd.concat(df_list, ignore_index=True)
else:
# load segment metadata for single collection
annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
# convert string of list to list
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array) # convert to ndarray
# add additional columns
tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
# get assigned segments (the dataframe can be modified from outside without harm)
if only_assigned:
self.assigned_segments_df = tablet_segments_df[(tablet_segments_df.assigned == True)]
else:
self.assigned_segments_df = tablet_segments_df
# make available for outside
self.tablet_segments_df = tablet_segments_df
self.image_data_list = []
self.sample2seg_list = []
self.sidx2didx = []
self.setup_sample_list()
def setup_sample_list(self, updated_df=None):
if updated_df is not None:
self.assigned_segments_df = updated_df
### preload segment images
# crop segment and convert to gray scale
# IMPORTANT: preload segment crops without scaling to save memory
image_data_list = []
if self.preload_segments:
# iterate over segments
for seg_idx, seg_rec in tqdm(self.assigned_segments_df.iterrows(), total=len(self.assigned_segments_df)):
# load segment meta data
image_name, scale, seg_bbox, path_to_image, view_desc = self.get_segment_meta(seg_rec)
# prepare input tablet
pil_im = Image.open(path_to_image)
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
# store in list
image_data_list.append(tablet_seg)
self.image_data_list = image_data_list
self.sample2seg_list = self.assigned_segments_df.index.values
# map from seg idx to dataset idx
self.sidx2didx = dict(zip(self.sample2seg_list, range(len(self.sample2seg_list))))
# setup finished
print("Setup {} dataset with {} elements".format(self.collection, len(self)))
def __getitem__(self, index):
seg_idx = self.sample2seg_list[index]
seg_rec = self.assigned_segments_df.loc[seg_idx]
# load segment meta data
image_name, scale, seg_bbox, path_to_image, view_desc = self.get_segment_meta(seg_rec)
# specify target
target = seg_idx
# get segment image
if self.preload_segments:
tablet_seg = self.image_data_list[index]
else:
# prepare input tablet
pil_im = Image.open(path_to_image)
tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
# scale image
if 0:
# scale image
im = rescale_segment_single(tablet_seg, scale)
else:
# convert to gray scale
# tablet_seg = tablet_seg.convert('L')
# scale segment
im, _ = resize(tablet_seg, None, None, scale=scale)
# apply augmentation pipeline and convert from PIL to numpy
if self.transform is not None:
im = self.transform(im)
if self.target_transform is not None:
target = self.target_transform(target)
return im, target
def __len__(self):
# number of assigned segments
return len(self.assigned_segments_df)
def get_segment_meta(self, segment_rec):
image_name = segment_rec.tablet_CDLI
# this controls which scale is used in subsequent processing
scale = segment_rec.scale * self.rescale
seg_bbox = segment_rec.bbox
path_to_image = segment_rec.im_path
view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
return image_name, scale, seg_bbox, path_to_image, view_desc
+725
@@ -0,0 +1,725 @@
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib import cm
import PIL.Image as Image
from ..utils.transform_utils import crop_pil_image
from ..evaluations.config import cfg
from ..utils.bbox_utils import clip_boxes
def visualize_net_output_single(im, predicted, cunei_id=30, num_classes=None, min_prob=0.95):
# visualize output of single crop detections
if num_classes is None:
num_classes = cfg.TEST.NUM_CLASSES
# cross-image products
output = np.mean(predicted, axis=0)
# cross-channel products
lbl_ind = np.argmax(output, axis=0)
ctr_crop = predicted[0, ...]
plt.figure(figsize=(16, 24))
plt.subplot(4, 2, 1)
plt.imshow(im, cmap=cm.Greys_r)
plt.title('input')
plt.subplot(4, 2, 2)
plt.imshow(ctr_crop.squeeze()[cunei_id, ...])
plt.colorbar()
plt.title('class #{}'.format(cunei_id))
plt.subplot(4, 2, 3)
cmap = plt.get_cmap('Paired')
plt.imshow(lbl_ind, cmap=cmap, vmin=0, vmax=num_classes)
plt.colorbar()
plt.title('argmax class')
plt.subplot(4, 2, 4)
test = np.argmax(ctr_crop.squeeze(), axis=0)
test[np.max(ctr_crop.squeeze(), axis=0) < min_prob] = 0
plt.imshow(test, cmap=cmap, vmin=0, vmax=num_classes)
plt.colorbar()
plt.title('argmax class ({} confidence)'.format(min_prob))
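# Illustration (hypothetical helper, not part of the repository): the
# confidence-filtered label map shown in the last panel above. The argmax is
# taken over the class axis, and locations whose maximum probability is below
# `min_prob` are reset to class 0 (background).
import numpy as np


def sketch_confident_argmax(probs, min_prob=0.95):
    # probs: (num_classes, H, W)
    labels = np.argmax(probs, axis=0)
    labels[np.max(probs, axis=0) < min_prob] = 0
    return labels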
def _refine_detections(predicted):
# single image product from center crop
ctr_crop = predicted[4, ...]
# cross-image products
output = np.mean(predicted, axis=0)
max_output = np.max(predicted, axis=0)
uncertainty = np.var(predicted, axis=0)
# cross-channel products
lbl_ind = np.argmax(output, axis=0)
average_unc = np.mean(uncertainty, axis=0)
min_average_unc = np.min(average_unc)
max_average_unc = np.max(average_unc)
max_unc = np.max(uncertainty)
# save products
# sio.savemat('results/test_tablet_cuneiNet_{}_{}_scale_{}_{}.mat'.format(training_round, imageName, scale, negatives_used),
# {#'probs':ctr_crop,
# #'pred_labels': np.argmax(ctr_crop,axis=0),
# #'entropy': -np.sum(ctr_crop * np.log(ctr_crop)),
# 'predicted': predicted,
# 'avg_probs': output,
# 'avg_unc': average_unc,
# 'avg_pred_labels': lbl_ind})
return ctr_crop, output, max_output, uncertainty, lbl_ind, average_unc, min_average_unc, max_average_unc, max_unc
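# Illustration (hypothetical helper, not part of the repository): the
# cross-crop statistics computed in _refine_detections. `predicted` stacks the
# score maps of several shifted crops along axis 0. The mean over crops gives
# the averaged probabilities, and the variance averaged over classes gives the
# "shift-induced uncertainty" map.
import numpy as np


def sketch_crop_statistics(predicted):
    # predicted: (num_crops, num_classes, H, W)
    mean_probs = predicted.mean(axis=0)
    variance = predicted.var(axis=0)
    labels = mean_probs.argmax(axis=0)
    avg_uncertainty = variance.mean(axis=0)
    return mean_probs, labels, avg_uncertainty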
def visualize_net_output(im, predicted, cunei_id=30, num_classes=None):
# visualize output of 5 star crop detections
if num_classes is None:
num_classes = cfg.TEST.NUM_CLASSES
ctr_crop, output, max_output, uncertainty, \
lbl_ind, average_unc, min_average_unc, max_average_unc, max_unc = _refine_detections(predicted)
plt.figure(figsize=(16, 24))
plt.subplot(3, 2, 1)
plt.imshow(im, cmap=cm.Greys_r)
plt.title('input')
plt.subplot(3, 2, 2)
plt.imshow(ctr_crop.squeeze()[cunei_id, ...])
plt.colorbar()
plt.title('class #{}'.format(cunei_id))
plt.subplot(3, 2, 3)
cmap = plt.get_cmap('Paired')
plt.imshow(np.argmax(ctr_crop.squeeze(), axis=0), cmap=cmap, vmin=0, vmax=num_classes)
plt.colorbar()
plt.title('argmax class')
plt.subplot(3, 2, 4)
test = np.argmax(ctr_crop.squeeze(), axis=0)
test[np.max(ctr_crop.squeeze(), axis=0) < 0.95] = 0
plt.imshow(test, cmap=cmap, vmin=0, vmax=num_classes)
plt.colorbar()
plt.title('argmax class (0.95 confidence)')
#plt.subplot(4, 2, 5)
#cmap = plt.get_cmap('Paired')
#plt.imshow(lbl_ind, cmap=cmap, vmin=0, vmax=num_classes)
#plt.colorbar()
#plt.title('avg argmax class')
plt.subplot(3, 2, 5)
plt.imshow(average_unc, vmin=0, vmax=max_average_unc)
plt.colorbar()
plt.title('shift induced uncertainty')
plt.subplot(3, 2, 6)
# entropy
plt.imshow(-np.sum(ctr_crop.squeeze() * np.log(ctr_crop.squeeze()), axis=0))
plt.colorbar()
plt.title('entropy')
def _im_to_pyra_coords(pyra, boxes):
# boxes is N x 4 where each row is a box in the image specified
# by [x1 y1 x2 y2].
#
# Output is a list where entry i holds the boxes mapped to pyramid level i
boxes = boxes - 1
pyra_boxes = []
for level in range(pyra['num_levels']):
level_boxes = boxes * pyra['scales'][level]
level_boxes = np.round(level_boxes / pyra['stride'])
# add padding
level_boxes[:, 0] = level_boxes[:, 0] + pyra['padx']
level_boxes[:, 2] = level_boxes[:, 2] + pyra['padx']
level_boxes[:, 1] = level_boxes[:, 1] + pyra['pady']
level_boxes[:, 3] = level_boxes[:, 3] + pyra['pady']
pyra_boxes.append(level_boxes)
return pyra_boxes
def _pyra_to_im_coords(pyra, boxes):
# boxes is N x 5 where each row is a box in the format [x1 y1 x2 y2 pyra_level]
# where (x1, y1) is the upper-left corner of the box in pyramid level pyra_level
# and (x2, y2) is the lower-right corner of the box in pyramid level pyra_level
# Assumes 1-based indexing.
# pyramid to im scale factors for each scale
scales = pyra['stride'] / pyra['scales'][0]
# pyramid to im scale factors for each pyra level in boxes
if len(scales.shape) > 0:
scales = scales[boxes[:, -1].astype(int)][:, np.newaxis]
# Remove padding from pyramid boxes
boxes[:, 0] = boxes[:, 0] - pyra['padx']
boxes[:, 2] = boxes[:, 2] - pyra['padx']
boxes[:, 1] = boxes[:, 1] - pyra['pady']
boxes[:, 3] = boxes[:, 3] - pyra['pady']
im_boxes = boxes[:, :4] * scales
return im_boxes
def _pyramid_patch_box(x1, y1, feat_map_sz, pyra, lvl_idx, opt='A'):
# compute image patch box coordinates in original image
# should also work for all features of one image at once
# REQUIREMENTS:
# position of feature in feature map: x1, y1
# dimension of feature map: feat_map_sz
# scale of input image (relative to original scale): pyra, lvl_idx
# stride that is determined by network architecture: pyra
# OPTION A
if opt == 'A':
boxes = np.array([x1 - 0.5, y1 - 0.5, x1 + 0.5, y1 + 0.5]).transpose([1, 0])
boxes = np.concatenate([boxes, np.tile(lvl_idx, [len(x1), 1])], axis=1)
# OPTION B - more accurate
elif opt == 'B':
x_step = (feat_map_sz[1] - 1) / float(feat_map_sz[1])
y_step = (feat_map_sz[0] - 1) / float(feat_map_sz[0])
boxes = np.array(
[(x1 - 0.5) * x_step, (y1 - 0.5) * y_step, (x1 + 0.5) * x_step, (y1 + 0.5) * y_step]).transpose([1, 0])
boxes = np.concatenate([boxes, np.tile(lvl_idx, [len(x1), 1])], axis=1)
im_patch_box = np.floor(_pyra_to_im_coords(pyra, boxes))
return im_patch_box
def _pyramid_rf_box(im_sz, im_patch_box, rf_size, scales, lvl_idx):
# compute receptive field box coordinates in original image
# (given patch_box coordinates in original image)
# REQUIREMENTS:
# receptive field size determined by network architecture: rf_size [H, W]
# original image size: im_sz
# patch box size in original image: im_patch_box
# scale of input image relative to original image: scales, lvl_idx
scaled_rf_sz = rf_size / scales[lvl_idx]
im_rf_box = np.zeros_like(im_patch_box)
im_rf_box[:, 0] = im_patch_box[:, 0] - scaled_rf_sz[1] / 2.
im_rf_box[:, 1] = im_patch_box[:, 1] - scaled_rf_sz[0] / 2.
im_rf_box[:, 2] = im_patch_box[:, 2] + scaled_rf_sz[1] / 2.
im_rf_box[:, 3] = im_patch_box[:, 3] + scaled_rf_sz[0] / 2.
# clipping should not be required here
# im_rf_box = clip_boxes(im_rf_box, im_sz)
return np.round(im_rf_box)
def compute_bbox_grids(map_shape, im_shape, arch_type='alexnet'):
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
if arch_type == 'alexnet':
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
else:
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
# pyra = {'stride': 16, 'num_levels': 1, 'scales': np.array([1.0]),
# 'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
# Caffe blobs and OpenCV images are indexed as (H, W), i.e. in YX order.
# Since all bounding boxes use the XY convention, this must be taken into account when accessing images or blobs.
x = np.arange(0, map_shape[1])
y = np.arange(0, map_shape[0])
xv, yv = np.meshgrid(x, y, sparse=False, indexing='xy')
# print lbl_ind.shape, im.shape, xv.shape
# compute basic patch boxes
# each score in the score map corresponds to a single non-overlapping box
patch_boxes = _pyramid_patch_box(xv.flatten(), yv.flatten(), map_shape, pyra, 0, opt='A') + pyra['offset']
# compute receptive field sized boxes (overlapping)
# due to the way _pyramid_rf_box is implemented, one stride needs to be subtracted from rf_size
rf_sz = np.array(pyra['rf_size'], dtype=int)
rf_boxes = _pyramid_rf_box(im_shape, patch_boxes, rf_sz - pyra['stride'], pyra['scales'], 0)
return patch_boxes, rf_boxes
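# Illustration (hypothetical helper, not part of the repository): a worked
# version of compute_bbox_grids for the 'alexnet' settings above (stride 32,
# offset 113, receptive field 227, single pyramid level with scale 1 and no
# padding). Feature (0, 0) maps to the patch box [97, 97, 129, 129]. Growing it
# by (227 - 32) / 2 = 97.5 on each side and rounding gives roughly [0, 0, 226, 226].
import numpy as np


def sketch_feature_boxes(feat_x, feat_y, stride=32, offset=113, rf_size=227):
    fx = np.asarray(feat_x, dtype=float)
    fy = np.asarray(feat_y, dtype=float)
    # unit cell around each feature position, scaled to image pixels (option 'A')
    cells = np.stack([fx - 0.5, fy - 0.5, fx + 0.5, fy + 0.5], axis=1)
    patch_boxes = np.floor(cells * stride) + offset
    # grow by half of (rf_size - stride) on each side, as _pyramid_rf_box does
    half = (rf_size - stride) / 2.0
    rf_boxes = np.round(patch_boxes + np.array([-half, -half, half, half]))
    return patch_boxes, rf_boxes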
def label_map2image(feat_x, feat_y, map_shape, arch_type='alexnet'):
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
if arch_type == 'alexnet':
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
else:
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
# compute basic patch boxes
# each score in the score map corresponds to a single non-overlapping box
patch_boxes = _pyramid_patch_box(feat_x, feat_y, map_shape, pyra, 0, opt='A') + pyra['offset']
return patch_boxes
def radius_in_image(feat_radius, dim=0, arch_type='alexnet'):
# dim defines along which dimension to compute (only important, if rf_size not square)
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
if arch_type == 'alexnet':
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
else:
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
rf_sz = np.array(pyra['rf_size'], dtype=int)
# compute radii in image
patch_radius = feat_radius * pyra['stride']
rf_radius = patch_radius + (rf_sz[dim] - pyra['stride'])
return patch_radius, rf_radius
def coord_in_image(coord, add_rf=False, arch_type='alexnet'):
# stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
if arch_type == 'alexnet':
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]} # [227 rf] 195, 227
else:
pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]} # [227 rf] 195, 227
rf_sz = np.array(pyra['rf_size'], dtype=int)
# compute coordinate in image
im_coord = coord * pyra['stride'] + pyra['offset']
if add_rf:
im_coord += (rf_sz[0] - pyra['stride'])
return im_coord
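# Illustration (hypothetical helper, not part of the repository): the
# arithmetic of coord_in_image and radius_in_image for the 'alexnet' settings
# (stride 32, offset 113, receptive field 227). Feature coordinate 3 maps to
# pixel 3 * 32 + 113 = 209. A feature radius of 2 becomes a patch radius of 64
# and, after adding rf_size - stride = 195, a receptive-field radius of 259.
def sketch_feature_to_image(coord, feat_radius, stride=32, offset=113, rf_size=227):
    im_coord = coord * stride + offset
    patch_radius = feat_radius * stride
    rf_radius = patch_radius + (rf_size - stride)
    return im_coord, patch_radius, rf_radius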
def _bbox_transform_inv(boxes, deltas):
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
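# Illustration (hypothetical helper, not part of the repository): the delta
# decoding in _bbox_transform_inv for a single box. Centers shift by dx * w and
# dy * h, and sizes scale by exp(dw) and exp(dh). Because widths use the "+1"
# pixel convention but the decoded corners do not subtract 1, zero deltas turn
# [0, 0, 9, 9] into [0, 0, 10, 10].
import numpy as np


def sketch_apply_deltas(box, delta):
    x1, y1, x2, y2 = box
    w, h = x2 - x1 + 1.0, y2 - y1 + 1.0
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = delta
    pcx, pcy = dx * w + cx, dy * h + cy
    pw, ph = np.exp(dw) * w, np.exp(dh) * h
    return np.array([pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph])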
def nms(dets, thresh):
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
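# Illustration (hypothetical helper, not part of the repository): a loop-based
# restatement of the greedy NMS above. The highest-scoring box is kept, and
# every remaining box whose IoU with it exceeds `thresh` is dropped. The "+1"
# pixel convention is used for areas. Example: boxes [0, 0, 10, 10] and
# [1, 1, 11, 11] overlap with IoU 100 / 142, about 0.70.
import numpy as np


def sketch_nms(dets, thresh):
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = list(np.argsort(-scores))
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        remaining = []
        for j in order:
            iw = max(0.0, min(x2[i], x2[j]) - max(x1[i], x1[j]) + 1)
            ih = max(0.0, min(y2[i], y2[j]) - max(y1[i], y1[j]) + 1)
            inter = iw * ih
            if inter / (areas[i] + areas[j] - inter) <= thresh:
                remaining.append(j)
        order = remaining
    return keep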
def crop_bboxes_from_im(im, bboxes, context_pad=0, is_pil=False):
"""
Crop a bbox from the image for detection.
im: crop target
bboxes: bounding box coordinates as xmin, ymin, xmax, ymax.
"""
# iterate over boxes
im_crop_list = []
for i in range(bboxes.shape[0]):
# format bbox
bbox = np.round(bboxes[i, :]).astype(int)
# crop bbox from image
if context_pad <= 0:
im_crop = im[bbox[1]:bbox[3], bbox[0]:bbox[2]]
else:
if is_pil:
im_crop = np.asarray(crop_pil_image(im, bbox, context_pad=context_pad)[0])
else:
im_crop = np.asarray(crop_pil_image(Image.fromarray(im), bbox, context_pad=context_pad)[0]) # Image.fromarray(im, 'L')
# append to list
im_crop_list.append(im_crop)
return im_crop_list
def apply_bbox_regression(predicted_roi, rf_boxes, im_shape, num_classes=None, with_star_crop=True):
if num_classes is None:
num_classes = cfg.TEST.NUM_CLASSES
## if use_bbox_reg:
# select roi deltas
if with_star_crop:
roi_deltas = predicted_roi[4, ...].reshape([num_classes * 4, -1]).transpose()
else:
roi_deltas = predicted_roi.reshape([num_classes * 4, -1]).transpose()
# apply bounding-box regression deltas
pred_boxes = _bbox_transform_inv(rf_boxes, roi_deltas)
# make sure everything stays inside its limits
pred_boxes = clip_boxes(pred_boxes, im_shape)
return pred_boxes
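# Illustration (hypothetical helper, not part of the repository): the reshape
# in apply_bbox_regression. With star crops, index 4 is taken as the crop whose
# regression deltas are used (as in the original). Its (num_classes * 4, H, W)
# map is flattened to one row per location (row k = h * W + w) with all class
# deltas in the columns.
import numpy as np


def sketch_flatten_roi_deltas(predicted_roi, num_classes, center_idx=4):
    roi = predicted_roi[center_idx]  # (num_classes * 4, H, W)
    return roi.reshape(num_classes * 4, -1).transpose()  # (H * W, num_classes * 4)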
def _split_detections(detections, boxes, axis=1, nsplits=2, sid=1):
assert axis in (1, 2)
boxes_split = np.array_split(boxes, nsplits, axis=axis-1)
dets_split = np.array_split(detections, nsplits, axis=axis)
# reshape to original format
det_vec = dets_split[sid].reshape([dets_split[sid].shape[0], -1]).transpose()
box_vec = boxes_split[sid].reshape([-1, boxes_split[sid].shape[-1]])
return det_vec, box_vec
def split_detections(detections, pred_boxes, rf_boxes, lbl_map_shape,
split_axis='h', nsplits=2, sid=1, num_classes=cfg.TEST.NUM_CLASSES):
if split_axis == 'h':
# horizontal
axis = 1
elif split_axis == 'v':
# vertical
axis = 2
else:
raise ValueError("split_axis must be 'h' or 'v', got {!r}".format(split_axis))
# split detections
if cfg.TEST.BBOX_REG:
det_vec, box_vec = _split_detections(detections,
pred_boxes.reshape(list(lbl_map_shape) + [4 * num_classes]),
axis=axis, nsplits=nsplits, sid=sid)
else:
det_vec, box_vec = _split_detections(detections,
rf_boxes.reshape(list(lbl_map_shape) + [4]), # rf_boxes hold one box (4 coordinates) per location
axis=axis, nsplits=nsplits, sid=sid)
return det_vec, box_vec
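# Illustration (hypothetical helper, not part of the repository): what
# _split_detections does with np.array_split. Scores are (C, H, W) and split
# along `axis`. Boxes are (H, W, K) and split along `axis - 1`. Uneven splits
# give the extra rows to the first parts, so 5 rows split into parts of 3 and 2.
import numpy as np


def sketch_split_half(scores, boxes, axis=1, nsplits=2, sid=1):
    score_parts = np.array_split(scores, nsplits, axis=axis)
    box_parts = np.array_split(boxes, nsplits, axis=axis - 1)
    det_vec = score_parts[sid].reshape(score_parts[sid].shape[0], -1).transpose()
    box_vec = box_parts[sid].reshape(-1, box_parts[sid].shape[-1])
    return det_vec, box_vec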
def vis_detections(im, bboxes, scores=None, labels=None, thresh=0.3, max_vis=20, figs_sz=(14, 14), ax=None):
"""
Visualize bounding boxes on top of input image including labels / scores.
im: input image
bboxes: ndarray of bounding boxes
scores: list of scores with length equal bboxes.shape[0]
labels: list of integer labels with length equal bboxes.shape[0]
etc.
"""
if scores is None:
inds = np.arange(min(max_vis, bboxes.shape[0]))
else:
assert len(scores) == bboxes.shape[0]
# select the first max_vis boxes with a score above the threshold
inds = np.where(scores > thresh)[0][:max_vis]
nvis = len(inds)
# return if no bboxes to visualize
if nvis == 0:
return
# plot base figure
if ax is None:
fig, ax = plt.subplots(1, 1, figsize=figs_sz)
ax.imshow(im, cmap=cm.Greys_r)
# iterate over the selected bboxes and add them
for i in inds:
bbox = bboxes[i, :4]
# deal with scores
if scores is not None:
score = scores[i]
# deal with labels
if isinstance(labels, str):
# if label is string
cls_name = labels
title_txt = labels
else:
# else assume index array
assert len(labels) == bboxes.shape[0]
cls_name = '{:.0f}'.format(labels[i])
title_txt = 'X'
# plt.cla()
# plt.imshow(im, cmap = cm.Greys_r)
ax.add_patch(
plt.Rectangle((bbox[0], bbox[1]),
bbox[2] - bbox[0],
bbox[3] - bbox[1], fill=False,
edgecolor='blue', alpha=0.5, linewidth=2.0)
)
if scores is None:
ax.text(bbox[0], bbox[1] - 2, '{:s}'.format(cls_name),
bbox=dict(facecolor='blue', alpha=0.4), fontsize=8, color='white')
ax.set_title('{}'.format(cls_name))
else:
ax.text(bbox[0], bbox[1] - 2, '{:s} {:.2f}'.format(cls_name, score),
bbox=dict(facecolor='blue', alpha=0.4), fontsize=8, color='white')
ax.set_title('{} {:.3f}'.format(cls_name, score))
ax.set_title('{} detections with p({} | box) > {:.2f}'.format(nvis, title_txt, thresh), fontsize=14)
ax.xaxis.set_major_locator(ticker.MultipleLocator(200))
ax.yaxis.set_major_locator(ticker.MultipleLocator(200))
# plt.axis('off')
# plt.tight_layout()
# plt.draw()
def scale_detection_boxes(boxes, scale_factor):
    # scale boxes depending on scale factor
    return boxes * scale_factor
def correct_for_shift(boxes, correction):
    # correct shift due to oversampling
    # in order to correct ground truth boxes, e.g. center crop -> subtract half the shift from gt boxes
    # in order to correct detection boxes, e.g. center crop -> add half the shift to detection boxes
    return boxes + correction
def reverse_scaling(rf_boxes, pred_boxes, scaling=1):
    # if used, should be applied right before post-processing detections
    # reverse scaling of detection boxes
    rf_boxes = scale_detection_boxes(rf_boxes, scaling)
    pred_boxes = scale_detection_boxes(np.array(pred_boxes), scaling)
    return rf_boxes, pred_boxes
def reverse_shift_and_scaling(rf_boxes, pred_boxes, shift=0, scaling=1):
    # if used, should be applied right before post-processing detections
    # correct shift of detection boxes due to center crop
    rf_boxes = correct_for_shift(rf_boxes, shift)
    pred_boxes = correct_for_shift(np.array(pred_boxes), shift)
    # reverse scaling of detection boxes
    rf_boxes = scale_detection_boxes(rf_boxes, scaling)
    pred_boxes = scale_detection_boxes(np.array(pred_boxes), scaling)
    return rf_boxes, pred_boxes
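`reverse_shift_and_scaling` applies its two corrections in a fixed order: the crop offset is added first, while the boxes are still in the resized coordinate frame, and only then is the resize undone. A minimal numpy sketch of that order on a single box, assuming `shift` is measured in the resized image (the helper name `to_original_coords` and the numbers are hypothetical):

```python
import numpy as np


def to_original_coords(boxes, shift=0.0, scaling=1.0):
    """Map [x1, y1, x2, y2] boxes from a shifted, resized crop back to the original image.

    Same order as reverse_shift_and_scaling: add the crop offset first
    (still in resized coordinates), then undo the resize.
    """
    boxes = np.asarray(boxes, dtype=float)
    return (boxes + shift) * scaling


# a box detected in a center crop offset by 8 px inside a half-size image
box = np.array([[10.0, 20.0, 42.0, 52.0]])
print(to_original_coords(box, shift=8, scaling=2.0))
```

Swapping the two steps would give `box * 2.0 + 8`, i.e. `[28, 48, 92, 112]` instead of `[36, 56, 100, 120]`, which is why the shift is corrected before rescaling under this assumption.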
def post_process_detections(scores, pred_boxes, rf_boxes, num_classes=None, use_bbox_reg=None, nms_thresh=None):
    # apply nms and filter low confidence boxes
    # return list of good candidates
    # all detections are collected into:
    # all_boxes[cls][image] = N x 5 array of detections in
    # (x1, y1, x2, y2, score)
    if num_classes is None:
        num_classes = cfg.TEST.NUM_CLASSES
    if use_bbox_reg is None:
        use_bbox_reg = cfg.TEST.BBOX_REG
    if nms_thresh is None:
        nms_thresh = cfg.TEST.NMS
    score_min_thresh = cfg.TEST.SCORE_MIN_THRESH
    score_bg_thresh = cfg.TEST.SCORE_BG_THRESH
    num_images = 1
    all_boxes = [[[] for _ in range(num_images)]
                 for _ in range(num_classes)]
    for i in range(num_images):
        # load image and get detections from network
        # [.....]
        # skip j = 0, because it's the background class
        for j in range(1, num_classes):
            # selection of boxes before NMS
            inds = np.where((scores[:, j] > score_min_thresh) & (scores[:, 0] < score_bg_thresh))[0]
            cls_scores = scores[inds, j]
            if use_bbox_reg:
                cls_boxes = pred_boxes[inds, j * 4:(j + 1) * 4]  # bbox regression
            else:
                cls_boxes = rf_boxes[inds, :]  # without bbox regression
            cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
                .astype(np.float32, copy=False)
            # apply non-maximum suppression
            keep = nms(cls_dets, nms_thresh)
            cls_dets = cls_dets[keep, :]
            all_boxes[j][i] = cls_dets
    return all_boxes
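`post_process_detections` relies on an `nms` function imported elsewhere in the project (not shown in this file). As a hedged illustration of what such a function typically does, here is a standard greedy non-maximum suppression on the same N x 5 `[x1, y1, x2, y2, score]` layout. This is not the project's implementation; details such as the "+1 pixel" area convention may differ (this sketch treats coordinates as continuous), and `greedy_nms` is a hypothetical name.

```python
import numpy as np


def greedy_nms(dets, thresh):
    """Greedy NMS on an N x 5 array [x1, y1, x2, y2, score].

    Returns indices of the kept boxes, highest score first.
    """
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # drop boxes that overlap the current box more than thresh
        order = rest[iou <= thresh]
    return keep


dets = np.array([[0, 0, 10, 10, 0.9],
                 [1, 1, 11, 11, 0.8],    # IoU with box 0 is 81 / 119, about 0.68
                 [20, 20, 30, 30, 0.7]], dtype=np.float32)
print(greedy_nms(dets, 0.5))  # [0, 2]
```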
def get_all_bboxes(all_boxes):
    # take all_boxes (detections of the first image)
    # return per-class arrays of detections sorted by score, each row extended with its class label
    num_classes = len(all_boxes)
    dets_list = [[] for _ in range(num_classes)]
    for j in range(1, num_classes):
        if len(all_boxes[j][0]) > 0:
            # get boxes
            BB = all_boxes[j][0]
            confidence = all_boxes[j][0][:, -1]
            # sort boxes by confidence
            sorted_ind = np.argsort(-confidence)
            # sorted_scores = np.sort(-confidence)
            BB = BB[sorted_ind, :]
            # append together with class label
            dets_list[j] = np.concatenate([BB, np.tile(j, reps=(BB.shape[0], 1))], axis=1)
    # per-class arrays are returned separately (not concatenated)
    return dets_list
def get_detection_bboxes(detections, all_boxes):
    # take selected detection indices (detections) and all_boxes
    # return per-class arrays of the selected detections, each row extended with its class label
    num_classes = len(all_boxes)
    dets_list = [[] for _ in range(num_classes)]
    for j in range(1, num_classes):
        if len(detections[j][0]) > 0:
            # get boxes
            BB = all_boxes[j][0]
            confidence = all_boxes[j][0][:, -1]
            # sort boxes by confidence
            sorted_ind = np.argsort(-confidence)
            # sorted_scores = np.sort(-confidence)
            BB = BB[sorted_ind, :]
            # select detections (indices refer to sorted detections, since they are sorted in evaluate)
            inds = detections[j][0]
            # append together with class label
            dets_list[j] = np.concatenate([BB[inds, :], np.tile(j, reps=(inds.shape[0], 1))], axis=1)
    # per-class arrays are returned separately (not concatenated)
    return dets_list
def collect_detection_crops(input_im, dets_list, max_vis=5, context_pad=0):
    # take tablet image (input_im) and per-class lists of bboxes (dets_list)
    # return cropped patches (dets_crops), at most max_vis per class
    num_classes = len(dets_list)
    dets_crops = [[] for _ in range(num_classes)]
    for j in range(1, num_classes):
        if len(dets_list[j]) > 0:
            # get boxes
            cls_dets = dets_list[j]  # select class list
            bboxes = cls_dets[:, :4]  # remove any additional dims
            ncrops = min(max_vis, bboxes.shape[0])
            dets_crops[j] = crop_bboxes_from_im(input_im, bboxes[:ncrops, ...], context_pad)
    return dets_crops
def plot_crop_list(dets_crops, gt_crops, scores=None, k=8, cls_label='', figs_sz=(14, 4.5), context_pad=0):
    # plot co-detections of a single class
    # can handle dets_crops and gt_crops together or each on its own
    nvis = min(len(dets_crops), k)
    ngt = len(gt_crops)
    if nvis > 0:
        # slice crops and scores (scores are optional; without them titles show nan)
        top_list = dets_crops[:nvis]
        top_vals = scores[:nvis] if scores is not None else [float('nan')] * nvis
        # prepare subplots (nvis or nvis + 1)
        fig, axes = plt.subplots(1, nvis + (ngt > 0), figsize=figs_sz, squeeze=False)  # , gridspec_kw={'wspace': 1}
        axes = axes.ravel()
        # plot idx
        pid = 0
        # plot ground truth in front if available
        if ngt > 0:
            axes[pid].imshow(gt_crops[0], cmap=cm.Greys_r)
            axes[pid].set_yticks([])
            axes[pid].set_xticks([])
            axes[pid].set_title("gt [{}]".format(cls_label))
            pid += 1
        # iterate over top_list
        for i, imcrop in enumerate(top_list):
            axes[pid + i].imshow(imcrop, cmap=cm.Greys_r)
            axes[pid + i].set_yticks([])
            axes[pid + i].set_xticks([])
            bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8)
            # if there is no gt, add class label to title in first plot
            if pid + i == 0 and ngt == 0:
                axes[pid + i].set_title("class [{}] #{} p(x)={:.1f}".format(cls_label, i + 1, top_vals[i]))
            else:
                axes[pid + i].set_title("#{} p(x)={:.1f}".format(i + 1, top_vals[i]))
            if context_pad > 0:
                # numpy image shape is (height, width)
                imh, imw = imcrop.shape[:2]
                bbox = [context_pad, context_pad, imw - context_pad, imh - context_pad]
                axes[pid + i].add_patch(plt.Rectangle((bbox[0], bbox[1]),
                                                      bbox[2] - bbox[0], bbox[3] - bbox[1],
                                                      fill=False, edgecolor='blue', linestyle='-',
                                                      alpha=0.3, linewidth=2.0))
    elif ngt > 0:
        nvis = 1
        # plot ground truth only
        fig, axes = plt.subplots(1, nvis, figsize=figs_sz, squeeze=False)
        axes = axes.ravel()
        top_list = [gt_crops[0]] * nvis
        top_vals = [1] * len(top_list)
        for pid, imcrop in enumerate(top_list):
            # axes was flattened by ravel(), so index it with a single index
            axes[pid].imshow(imcrop, cmap=cm.Greys_r)
            bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8)
            axes[pid].set_title("gt [{}]".format(cls_label))
            axes[pid].set_yticks([])
            axes[pid].set_xticks([])
def convert_detections_to_array(all_boxes, img_idx=0, idx_column=None):
    # all_boxes[cls][image] = N x 5 array of detections in
    # (x1, y1, x2, y2, score)
    total_labels = len(all_boxes)  # all_boxes.shape[0]
    temp = [0, 0, 0, 0, 0, 0, 0, 0, 0]  # [ID, cx, cy, score, x1, y1, x2, y2, idx]
    detections_arr = np.zeros((0, 9))
    idx = 0
    # convert to CLS, cx, cy, score
    for i in range(total_labels):
        for box in all_boxes[i][img_idx]:
            temp[0] = i
            temp[1] = (box[2] + box[0]) / 2
            temp[2] = (box[3] + box[1]) / 2
            temp[3] = box[4]
            temp[4:8] = box[0:4]
            if idx_column is None:
                temp[8] = idx
            else:
                temp[8] = idx_column[idx]
            idx += 1
            detections_arr = np.vstack((detections_arr, temp))
    # SORT BY SCORE!?
    return detections_arr
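`convert_detections_to_array` grows its output with `np.vstack` once per detection, which copies the whole array each time. As a hedged alternative (not part of the original module, and without the `idx_column` option), the same row layout `[ID, cx, cy, score, x1, y1, x2, y2, idx]` can be built per class with array operations; `detections_to_array` is a hypothetical name.

```python
import numpy as np


def detections_to_array(all_boxes, img_idx=0):
    """Rows [class_id, cx, cy, score, x1, y1, x2, y2, idx], in class order then box order."""
    rows = []
    for cls_id, per_image in enumerate(all_boxes):
        # empty lists (e.g. the background class) become 0 x 5 arrays
        dets = np.asarray(per_image[img_idx], dtype=float).reshape(-1, 5)
        if dets.shape[0] == 0:
            continue
        cx = (dets[:, 0] + dets[:, 2]) / 2
        cy = (dets[:, 1] + dets[:, 3]) / 2
        cls_col = np.full(dets.shape[0], cls_id, dtype=float)
        rows.append(np.column_stack([cls_col, cx, cy, dets[:, 4], dets[:, :4]]))
    if not rows:
        return np.zeros((0, 9))
    arr = np.vstack(rows)
    # running index over all detections, as in the loop-based version
    return np.column_stack([arr, np.arange(arr.shape[0], dtype=float)])


all_boxes = [[[]],
             [np.array([[0, 0, 10, 20, 0.9]])],
             [np.array([[5, 5, 15, 15, 0.4], [0, 0, 4, 4, 0.2]])]]
arr = detections_to_array(all_boxes)
print(arr.shape)  # (3, 9)
```

Sorting by score, as the original's closing comment contemplates, would then be a single `arr[np.argsort(-arr[:, 3])]`.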
@@ -0,0 +1,519 @@
import numpy as np
import pandas as pd
import torch
from torchvision import transforms as trafos
from skimage import draw
from skimage.color import label2rgb
from skimage.transform import hough_line, hough_line_peaks, probabilistic_hough_line
from skimage.morphology import skeletonize, skeletonize_3d, thin, medial_axis, watershed
from scipy import ndimage as ndi
from scipy.spatial.distance import pdist, cdist, squareform
from ..detection.detection_helpers import label_map2image, coord_in_image
# prepare input for line detection
def preprocess_line_input(pil_im, scale, shift=None):
    """ produces five copies of the segment at slightly different offsets
    :param pil_im: tablet segment that is to be processed
    :param scale: scale which should be used for resizing
    :param shift: offset shift used to produce five-fold oversampling
    :return: 4D tensor with 5xCxHxW
    """
    if shift is None:
        shift = 0  # cfg.TEST.SHIFT
    # compute scaled size
    imw, imh = pil_im.size
    imw = int(imw * scale)
    imh = int(imh * scale)
    # determine crop size
    crop_sz = [int(imw - shift), int(imh - shift)]
    # tensor-space transforms
    ts_transform = trafos.Compose([
        trafos.ToTensor(),
        trafos.Normalize(mean=[0.5], std=[1]),  # normalize
    ])
    # compose transforms
    tablet_transform = trafos.Compose([
        trafos.Lambda(lambda x: x.convert('L')),  # convert to gray
        trafos.Resize((imh, imw)),  # resize according to scale
        trafos.FiveCrop((crop_sz[1], crop_sz[0])),  # oversample
        trafos.Lambda(
            lambda crops: torch.stack([ts_transform(crop) for crop in crops])),  # returns a 4D tensor
    ])
    # apply transforms
    im_list = tablet_transform(pil_im)
    return im_list
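`preprocess_line_input` oversamples with torchvision's `FiveCrop`, which per the torchvision documentation returns the top-left, top-right, bottom-left, bottom-right and center crops in that order, so `inputs[4]` is the center crop. The numpy-free sketch below only computes where those five crops sit for a given `shift`; `five_crop_boxes` is a hypothetical helper, and its center placement is assumed to follow torchvision's rounding of `(size - crop) / 2`.

```python
def five_crop_boxes(width, height, shift):
    """Boxes (x1, y1, x2, y2) of the five crops of size (width - shift) x (height - shift),
    in the order tl, tr, bl, br, center."""
    cw, ch = width - shift, height - shift
    left = int(round((width - cw) / 2.0))
    top = int(round((height - ch) / 2.0))
    return [(0, 0, cw, ch),                             # top-left
            (width - cw, 0, width, ch),                 # top-right
            (0, height - ch, cw, height),               # bottom-left
            (width - cw, height - ch, width, height),   # bottom-right
            (left, top, left + cw, top + ch)]           # center


for box in five_crop_boxes(300, 200, 8):
    print(box)
```

The center crop is offset by `shift / 2` on each axis, which matches the "add half the shift" note in `correct_for_shift`. With `shift=0`, as passed in `gen_line_detections`, all five crops cover the full resized segment.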
def apply_detector(inputs, model_fcn, device):
    with torch.no_grad():  # faster, less memory usage
        inputs = inputs.to(device)
        # apply network
        output = model_fcn(inputs)
        # convert to numpy
        output = output.data.cpu().numpy()
    return output
# prepare transliteration for line detection
def prepare_transliteration(tl_df, num_lines, stats):
    """
    ATTENTION: this filters the transliteration according to status!
    """
    # prepare transliteration for line detection
    if num_lines > 0:
        # only visible/not broken
        tl_df = tl_df[tl_df.status > 0]
        # compute line length
        tl_df = tl_df.groupby('line_idx').apply(compute_line_length_from_tl, stats)
        # get line statistics
        num_vis_lines = tl_df.line_idx.nunique()  # num visible lines (not broken lines)
        # len_lines = tl_df.groupby('line_idx').pos_idx.count()
        len_lines = tl_df.line_idx.value_counts()
        len_min, len_max = len_lines.min(), len_lines.max()
    else:
        # TODO: if no tl info available, use initial line detection results to set these parameters
        len_min, len_max = 4, 12
        num_vis_lines = 40
    return tl_df, num_vis_lines, len_min, len_max
# extract lines with hough transform
def compute_hough_transform(line_det_map1, line_det_map2, re_focus_angle=True):
    # focus theta on near-horizontal lines (cuneiform lines run horizontally)
    theta_range = np.linspace(np.deg2rad(83), np.deg2rad(97), 50)
    # theta_range = np.linspace(np.deg2rad(-90), np.deg2rad(90), 180)  # normal range
    # fallback if the angle range is not refocused below
    theta_range2 = theta_range
    # Classic straight-line Hough transform (usually angles from -90 to +90)
    h, theta, d = hough_line(line_det_map1, theta=theta_range)
    # debug
    # plt.imshow(np.log(1 + h), extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]], cmap='gray', aspect=1/1.5)
    # plt.show()
    # focus angle and re-run
    if re_focus_angle:
        # get peaks
        accum, angles, dists = hough_line_peaks(h, theta, d, min_distance=1, min_angle=16, num_peaks=50)
        # get median angle
        m_angle = np.median(np.rad2deg(angles))
        # modify theta
        theta_range = np.linspace(np.deg2rad(m_angle - 2), np.deg2rad(m_angle + 2), 50)
        theta_range2 = np.linspace(np.deg2rad(m_angle - 3), np.deg2rad(m_angle + 3), 50)
        # Classic straight-line Hough transform (usually angles from -90 to +90)
        h, theta, d = hough_line(line_det_map2, theta=theta_range)
    return h, theta, d, theta_range, theta_range2
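`compute_hough_transform` limits skimage's `hough_line` to angles between 83° and 97°: in the normal form rho = x cos(theta) + y sin(theta), a horizontal line has theta = 90°. The sketch below is a plain numpy accumulator, not skimage's implementation (whose rho binning differs), and shows a horizontal line at y = 5 peaking at theta = 90°, rho = 5. `hough_accumulator` is a hypothetical name.

```python
import numpy as np


def hough_accumulator(mask, thetas):
    """Vote counts acc[rho + diag, theta_idx] for lines rho = x*cos(theta) + y*sin(theta)."""
    ys, xs = np.nonzero(mask)                    # row index = y, column index = x
    diag = int(np.ceil(np.hypot(*mask.shape)))   # upper bound for |rho|
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((rhos.size, thetas.size), dtype=int)
    for t_idx, theta in enumerate(thetas):
        r = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        acc[:, t_idx] = np.bincount(r + diag, minlength=rhos.size)
    return acc, rhos


# 20 x 30 mask with a single horizontal line at y = 5
mask = np.zeros((20, 30), dtype=bool)
mask[5, :] = True
# near-horizontal angle range in 1 degree steps, similar to the range used above
thetas = np.deg2rad(np.arange(83, 98))
acc, rhos = hough_accumulator(mask, thetas)
r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
print(np.rad2deg(thetas[t_idx]), rhos[r_idx])  # peak at theta = 90 deg, rho = 5
```

Restricting `thetas` to a narrow band both speeds up voting and prevents spurious steep lines from producing peaks.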
# group lines together that are "close"
def shoelace_formula(points):
    ''' compute area of polygon according to the shoelace formula
    requires ordering of point coordinates
    https://en.wikipedia.org/wiki/Shoelace_formula
    :param points: 2xn matrix, where n is number of points (points need to be ordered!!)
    :return: area of polygon
    '''
    area = 0
    dmat = np.ones((2, 2))
    for i in range(points.shape[1]):
        dmat[:, 0] = points[:, i]
        dmat[:, 1] = points[:, (i+1) % points.shape[1]]
        area += np.linalg.det(dmat.transpose())
    return np.abs(area) / 2.
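`shoelace_formula` accumulates 2x2 determinants in a Python loop. The same quantity can be written with `np.roll`; this vectorized variant is a sketch that takes the same 2 x n layout of ordered vertices (`shoelace_area` is a hypothetical name), and the second example is the quadrilateral that `area_between_two_line_segments` builds from `[spt1, spt2, lpt2, lpt1]`.

```python
import numpy as np


def shoelace_area(points):
    """Area of a simple polygon given as a 2 x n array of ordered vertices."""
    x, y = points
    # sum of cross products of consecutive vertices, wrapping around at the end
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


square = np.array([[0, 2, 2, 0],
                   [0, 0, 2, 2]], dtype=float)
print(shoelace_area(square))  # 4.0

# band between y = 0 and y = 1 over x in [0, 10], vertex order [spt1, spt2, lpt2, lpt1]
band = np.array([[0, 10, 10, 0],
                 [0, 0, 1, 1]], dtype=float)
print(shoelace_area(band))  # 10.0
```

In `nearby_and_near_parallel_2`, two lines are merged when this area is below `interline_distance * interval_length / 2`. For the band above (interval length 10), a hypothetical interline distance of 4 gives a threshold of 20, so those two lines would be merged.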
def area_between_two_line_segments(spt1, spt2, lpt1, lpt2):
    # compute area between line segments
    # assume: line segments do not intersect and
    # assume: pts are ordered according to x-axis
    # this means a valid order would be [spt1, spt2, lpt2, lpt1]
    return shoelace_formula(np.stack([spt1, spt2, lpt2, lpt1], axis=1))
def nearby_and_near_parallel_2(l1, l2, interline_distance, interval=[0, 10]):
    # compute area between line segments over interval
    angle1, rad1 = l1
    angle2, rad2 = l2
    spt1, spt2 = line_pts_from_polar_line(angle1, rad1, x0=interval[0], x1=interval[1])
    lpt1, lpt2 = line_pts_from_polar_line(angle2, rad2, x0=interval[0], x1=interval[1])
    # use shoelace method
    area = area_between_two_line_segments(spt1, spt2, lpt1, lpt2)
    # check threshold
    interval_interline_area = interline_distance * np.abs(interval[1] - interval[0])
    # print(area, interval_interline_area / 2.)
    if area < interval_interline_area / 2.:
        return True
    else:
        return False
def nearby_and_near_parallel(l1, l2, interline_distance):
    # simple filter
    angle1, rad1 = l1
    angle2, rad2 = l2
    if np.abs(rad1 - rad2) < interline_distance/2. and np.abs(np.rad2deg(angle1-angle2)) < 1.0:
        return True
    else:
        return False
def do_intersect_in_interval(l1, l2, interval):
    # y = m * x + c, or in Hesse normal form (as returned by the Hough transform):
    # \rho = x \cos \theta + y \sin \theta
    # \rho (radius): perpendicular distance from origin to the line
    # \theta: angle between this perpendicular and the x-axis
    angle1, rad1 = l1
    angle2, rad2 = l2
    lower, upper = interval
    quotient = (np.cos(angle1) - np.cos(angle2))
    if quotient == 0:  # same angles
        if np.abs(rad1 - rad2) < 3:  # (almost) same radius
            return True
        else:
            return False
    else:
        # compute x coordinate of the intersection
        # (approximation: assumes sin(angle1) ~= sin(angle2), which holds for near-horizontal lines)
        x_intersect = (rad1 - rad2) / quotient
        # inside interval
        if (x_intersect >= lower) and (x_intersect <= upper):
            return True
        else:
            return False
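Subtracting the two normal-form equations gives rho1 - rho2 = x (cos t1 - cos t2) + y (sin t1 - sin t2), so `do_intersect_in_interval` drops the y term, which is reasonable when both angles are close to 90°. For comparison, the sketch below solves the 2x2 system exactly; both helper names are hypothetical, and the angles and radii are made-up values.

```python
import numpy as np


def intersect_x_exact(l1, l2):
    """x coordinate where two lines rho = x*cos(theta) + y*sin(theta) meet.

    l1, l2 are (theta, rho) pairs; returns None for (numerically) parallel lines.
    """
    (t1, r1), (t2, r2) = l1, l2
    a = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(a)) < 1e-12:
        return None
    x, _ = np.linalg.solve(a, np.array([r1, r2]))
    return x


def intersect_x_approx(l1, l2):
    """Approximation used in do_intersect_in_interval (ignores the sin terms)."""
    (t1, r1), (t2, r2) = l1, l2
    return (r1 - r2) / (np.cos(t1) - np.cos(t2))


t1, t2 = np.deg2rad(85.0), np.deg2rad(93.0)
l1, l2 = (t1, 100.0), (t2, 110.0)
x_exact = intersect_x_exact(l1, l2)
x_approx = intersect_x_approx(l1, l2)
print(round(x_exact, 1), round(x_approx, 1))  # the two differ by about 2 px here
```

For angles this far apart from each other the approximation is still within a few pixels, which is small compared to the `[W/8, 7W/8]` interval used in `post_process_line_detections`.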
def compute_group_labels_from_dists(X_dist):
    # assign group labels to lines
    # X_dist: condensed pairwise output of pdist, where a value > 0 marks two lines as neighbors
    # get squareform
    XX = squareform(X_dist)
    # set dummy labels
    labels = -np.ones(XX.shape[0])
    # label lines while checking for neighbourhood
    for ii in range(len(labels)):
        if labels[ii] == -1:
            labels[ii] = ii
        # each row of the squareform indicates potential neighbors
        for idx in np.where(XX[ii, :] > 0)[0]:
            labels[idx] = labels[ii]
    return labels
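`compute_group_labels_from_dists` turns a boolean neighbor relation (computed with `pdist` and a callable such as `do_intersect_in_interval`) into group labels with a single pass over the rows. A graph-based alternative, named plainly here as a swap-in rather than the original method, takes the connected components of the neighbor graph with SciPy, which groups chains of neighbors transitively by construction. `group_labels_connected` is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform


def group_labels_connected(X_dist):
    """Group items whose pairwise 'distance' is > 0 (flagged as neighbors) via connected components."""
    adjacency = csr_matrix((squareform(X_dist) > 0).astype(int))
    _, labels = connected_components(adjacency, directed=False)
    return labels


# toy 1-D "lines": neighbors if their positions differ by less than 1.5
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
X_dist = pdist(X, lambda u, v: float(abs(u[0] - v[0]) < 1.5))
labels = group_labels_connected(X_dist)
print(labels)  # items 0-2 share one label, items 3-4 another
```

Item 0 and item 2 are not direct neighbors here, but they end up in the same group through item 1.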
# associate lines with line segments
def line_pts_from_polar_line(angle, dist, x0=0, x1=10):
    # computes two points defining a line from polar line representation
    x0, x1 = x0 * np.ones_like(angle), x1 * np.ones_like(angle)  # x0 = np.zeros_like(angle)
    y0 = (dist - x0 * np.cos(angle)) / np.sin(angle)
    y1 = (dist - x1 * np.cos(angle)) / np.sin(angle)
    return (x0, y0), (x1, y1)
def line_params_from_pts(lpt1, lpt2):
    # compute parameters a, b for line representation y = a * x + b
    a = (lpt2[1] - lpt1[1])/float(lpt2[0] - lpt1[0])
    b = lpt1[1] - lpt1[0] * a
    return a, b
def normal_form_from_pts(p, q):
    # takes two points and computes the normal form (unit normal n_0, distance dist) of the line through them
    # https://de.wikipedia.org/wiki/Normalenform#Aus_der_Zweipunkteform
    # normal
    n = np.array([-(q[1]-p[1]),
                  q[0]-p[0]], dtype=float)
    # normalize
    n_0 = n / np.linalg.norm(n)
    # distance from origin
    dist = np.dot(n_0, p)
    return n_0, dist
def hess_normal_form_from_pts(p, q):
    n_0, dist = normal_form_from_pts(p, q)
    # angle in rad
    rad = np.arctan(n_0[1]/n_0[0])
    return rad, dist
def _offset_pt_to_normal_form_line(pt, n_0, dist):
    # signed distance of pt to the line
    return np.dot(n_0, pt) - dist
def _shift_pt_to_normal_form_line(pt, n_0, shift_dist):
    # move pt by shift_dist along the unit normal
    return pt + n_0 * shift_dist
def clip_pt_using_normal_form(pt, n_0, dist, min_dist):
    # compute offset
    offset_line = _offset_pt_to_normal_form_line(pt, n_0, dist)
    # check if correction necessary
    if np.abs(offset_line) > min_dist:
        # compute correction
        if offset_line >= 0:
            correction = offset_line - min_dist
        else:
            correction = offset_line + min_dist
        # apply correction
        pt = _shift_pt_to_normal_form_line(pt, n_0, -correction)
    return pt
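`normal_form_from_pts` describes a line by a unit normal `n0` and a distance `d`, so the signed distance of a point `p` is `dot(n0, p) - d`, and `clip_pt_using_normal_form` moves a point along `n0` until that distance is at most `min_dist`. A compact restatement of the same math with `np.clip` (hypothetical helper names `normal_form` and `clip_point`):

```python
import numpy as np


def normal_form(p, q):
    """Unit normal n0 and distance d such that the line through p, q is {x : dot(n0, x) = d}."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    n = np.array([-(q[1] - p[1]), q[0] - p[0]])
    n0 = n / np.linalg.norm(n)
    return n0, float(np.dot(n0, p))


def clip_point(pt, n0, d, min_dist):
    """Shift pt along n0 so that its signed distance to the line lies in [-min_dist, min_dist]."""
    pt = np.asarray(pt, float)
    offset = np.dot(n0, pt) - d
    clipped = np.clip(offset, -min_dist, min_dist)
    return pt - n0 * (offset - clipped)


n0, d = normal_form((0, 2), (4, 2))   # horizontal line y = 2
print(d)                               # 2.0
print(clip_point((3, 10), n0, d, 3))   # [3. 5.]
```

A point 8 px above the line is pulled back to 3 px above it, while points already within `min_dist` are left unchanged; this is how box corners are kept close to a text line in `clip_bbox_using_line`.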
def clip_bbox_using_line(bbox, line_pts_arr, min_dist=128/2.):
    # get normal form of line
    n_0, dist = normal_form_from_pts(line_pts_arr[0], line_pts_arr[1])
    # compute distance to line and decide if pt needs to be shifted
    pt_list = []
    # iterate over two bounding box coordinates
    for pt in [bbox[:2], bbox[2:]]:
        pt = clip_pt_using_normal_form(pt, n_0, dist, min_dist)
        pt_list.append(pt)
    return np.concatenate(pt_list)
def clip_bbox_using_line_segmentation(bbox, line_pts_arr, skeleton, min_dist=128/2.):
    # get normal form of line
    n_0, dist = normal_form_from_pts(line_pts_arr[0], line_pts_arr[1])
    # use bbox boundaries to crop pts from line segmentation
    # seg_line_pts = np.nonzero(skeleton[:, int(bbox[0]):int(bbox[2])])[0]
    # faster but more restrictive (probably worth the speedup)
    seg_line_pts = np.nonzero(skeleton[int(bbox[1]):int(bbox[3]), int(bbox[0]):int(bbox[2])])[0] + int(bbox[1])
    dist_delta = 0
    if len(seg_line_pts) > 3:
        # compute average y location of segmentation line pts
        seg_line_cy = np.mean(seg_line_pts)
        # determine local distance delta from linear model to skeleton
        pt = [(bbox[0] + bbox[2]) / 2., seg_line_cy]
        dist_delta = _offset_pt_to_normal_form_line(pt, n_0, dist)
    # correct normal form of line [n_0, dist] using delta, i.e. alter dist
    dist = dist + dist_delta
    # print(dist_delta, dist - dist_delta, dist)
    # compute distance to line and decide if pt needs to be shifted
    pt_list = []
    # iterate over two bounding box coordinates
    for pt in [bbox[:2], bbox[2:]]:
        pt = clip_pt_using_normal_form(pt, n_0, dist, min_dist)
        pt_list.append(pt)
    return np.concatenate(pt_list)
def dist_pt_line(pt, lpt1, lpt2):
    # compute squared 'perpendicular distance'
    # pt is point
    # lpt are line points
    # returns the squared (perpendicular) distance from point to line
    # assumes line representation of form y = a * x + b (not valid for vertical lines)
    (a, b) = line_params_from_pts(lpt1, lpt2)
    # from: energy based geometric model fitting (2010)
    # https://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line#Another_formula
    return (np.abs(pt[1]-a*pt[0]-b)/np.sqrt(a**2+1))**2
def dist_lineseg_line(spt1, spt2, lpt1, lpt2):
    # computes the distance between line (unbounded) and line segment (bounded)
    # spt are line segment points
    # lpt are line points
    # returns the minimum (squared) distance of the two segment endpoints to the line
    return min(dist_pt_line(spt1, lpt1, lpt2),
               dist_pt_line(spt2, lpt1, lpt2))
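`dist_pt_line` goes through the slope form y = a * x + b, which divides by `lpt2[0] - lpt1[0]` and therefore fails for vertical lines (harmless here, since only near-horizontal lines are considered). An equivalent formulation via the 2-D cross product avoids the slope; like the original, this sketch returns the squared distance. `sq_dist_pt_line` is a hypothetical name.

```python
import numpy as np


def sq_dist_pt_line(pt, lpt1, lpt2):
    """Squared perpendicular distance from pt to the infinite line through lpt1 and lpt2."""
    p, a, b = (np.asarray(v, float) for v in (pt, lpt1, lpt2))
    d = b - a
    # z-component of the cross product between the line direction and (p - a)
    cross = d[0] * (p[1] - a[1]) - d[1] * (p[0] - a[0])
    return cross ** 2 / np.dot(d, d)


print(sq_dist_pt_line((3, 4), (0, 0), (10, 0)))  # 16.0, i.e. distance 4
print(sq_dist_pt_line((3, 1), (0, 0), (0, 5)))   # 9.0, vertical line works too
```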
def assign_line_segments_to_lines(line_segs, line_hypos, x1=10):
    # get line pts from polar lines
    polar_lines = line_hypos.groupby('label').mean()[['angle', 'dist']].values
    line_pts = line_pts_from_polar_line(polar_lines[:, 0], polar_lines[:, 1], x1=x1)
    line_pts = np.transpose(np.concatenate(line_pts))
    # get line segments
    line_seg_pts = np.stack(line_segs).reshape(len(line_segs), -1)
    # compute distance between line segments and lines
    X2_dist = cdist(line_pts, line_seg_pts,
                    lambda lpts, spts: dist_lineseg_line(spts[:2], spts[2:], lpts[:2], lpts[2:]))
    # assign line segments to nearest line
    ls_labels = np.argmin(X2_dist, axis=0)
    return ls_labels
# associate line segments with segments
def associate_segments_with_lines(lbl_ind, line_segs, ls_labels, group2line):
    # create markers from line segments
    im_marker = np.zeros_like(lbl_ind)
    for line, li in zip(line_segs, ls_labels):
        p0, p1 = line
        rr, cc = draw.line(p0[1], p0[0], p1[1], p1[0])
        im_marker[rr, cc] = int(group2line[li]) + 1  # avoid background class
    # plt.imshow(im_marker)
    # use watershed to assign labels to segments
    distance = ndi.distance_transform_edt(lbl_ind)
    segm_labels = watershed(-distance, im_marker, mask=lbl_ind)
    return segm_labels, im_marker
# map segment lbls to image resolution (deal with network architecture with offset)
def compute_image_label_map(segm_labels, image_shape, padding=0):
    # collect patch boxes and their labels
    list_patch_boxes, list_patch_labels = [], []
    for lbl_idx in np.unique(segm_labels):
        if lbl_idx > 0:
            # for index compute coordinate boxes
            vx, vy = np.where(segm_labels == lbl_idx)
            patch_boxes = label_map2image(vy, vx, segm_labels.shape[::-1]).astype(int)
            # append
            list_patch_boxes.append(patch_boxes)
            list_patch_labels.append(patch_boxes.shape[0] * [lbl_idx])
            # vis_detections(center_im, patch_boxes, max_vis=200, labels="")
    patch_boxes = np.concatenate(list_patch_boxes, axis=0)
    patch_labels = np.concatenate(list_patch_labels, axis=0)
    # vis_detections(center_im, np.concatenate(list_patch_boxes, axis=0) , max_vis=1000, labels="")
    # create segmentation map from boxes and labels
    seg_canvas = np.zeros(image_shape[:2])
    for bb, lbl in zip(patch_boxes, patch_labels):
        pad = padding
        bb[:2] = bb[:2] - pad
        bb[2:] = bb[2:] + pad
        # print patch_box, patch_lbl
        seg_canvas[bb[1]:bb[3], bb[0]:bb[2]] = lbl
    return seg_canvas
def compute_line_length_from_tl(group, stats, b=128.):  # 128 / (2 * 32)
    # collect widths
    widths = np.zeros(len(group))
    for ii, (sidx, sign_rec) in enumerate(group.iterrows()):
        widths[ii] = stats.get_sign_width(sign_rec.lbl, sign_width=1) * b
    # compute offsets and line length
    sign_xpos = widths.cumsum() - (widths / 2.)
    line_len = widths.sum()
    # add columns to group
    group['prior_line_len'] = np.rint(line_len)
    group.loc[group.index, 'prior_sign_xoff'] = np.rint(sign_xpos)
    group.loc[group.index, 'prior_sign_width'] = np.rint(widths)
    return group
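`compute_line_length_from_tl` lays the signs of a transliterated line side by side: each sign gets a width (its relative width from `stats.get_sign_width` times `b = 128`), and its x-offset is the center of its slot, i.e. the cumulative width minus half its own width. A standalone sketch with made-up relative widths passed in directly instead of through `stats` (`sign_offsets` is a hypothetical name):

```python
import numpy as np


def sign_offsets(relative_widths, b=128.0):
    """Center x-offsets of consecutive signs and the total line length, rounded like the original."""
    widths = np.asarray(relative_widths, float) * b
    centers = widths.cumsum() - widths / 2.0  # center of each sign's slot
    return np.rint(centers), np.rint(widths.sum())


centers, line_len = sign_offsets([1.0, 0.5, 1.5])
print(centers, line_len)  # widths 128, 64, 192 -> centers 64, 160, 288; length 384
```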
##### full pipeline
def post_process_line_detections(lbl_ind_x, num_lines, len_min, len_max, verbose=True):
    # identify lines and merge them if too close together
    # line hypotheses are stored in line_hypos dataframe
    # line_hypos.label indicates which lines are grouped together (merged) -> line_hypos_agg
    # (0) perform skeletonization
    skeleton = skeletonize(lbl_ind_x)
    # skeleton = skeletonize_3d(lbl_ind)
    # skeleton = thin(lbl_ind)
    # (1) compute hough transform
    h, theta, d, theta_range, theta_range2 = compute_hough_transform(skeleton, skeleton)  # skeleton, lbl_ind_x,
    # (I) find peaks in hough transform
    num_peaks_factor = 1.9  # 1.5 1.6 v007: 1.9 v047: 2.5 # line detector dependent (VIP)
    hl_peak_threshold = (h.max() / float(len_max)) / 2. * len_min  # 2. # has impact on length of lines found
    accums, angles, dists = hough_line_peaks(h, theta, d, min_distance=1, min_angle=14,
                                             num_peaks=int(num_lines * num_peaks_factor),
                                             threshold=hl_peak_threshold)
    # ugly patch for hough_line_peaks shortcomings
    # in rare cases len(accums) != len(angles) or len(dists)
    if len(accums) != len(angles):
        angles = accums
        dists = accums
    # (II) check if lines intersect close to the center and group them accordingly
    interval = [lbl_ind_x.shape[1] * 1 / 8., lbl_ind_x.shape[1] * 7 / 8.]
    X_dist = pdist(np.stack([angles, dists], axis=1), lambda l1, l2: do_intersect_in_interval(l1, l2, interval))
    labels = compute_group_labels_from_dists(X_dist).astype(int)
    if verbose:
        print('detected groups: {} | num lines: {}.'.format(len(np.unique(labels)), num_lines))
    # collect lines in dataframe
    line_hypos = pd.DataFrame({'accum': accums, 'angle': angles, 'dist': dists, 'label': labels})
    line_hypos_agg = line_hypos.groupby('label').mean()
    # add group diff column
    diffs = line_hypos_agg.dist.sort_values().diff()
    # compute interline median
    dist_interline_median = diffs.median()
    # (III) check if remaining groups are very close
    X_dist = pdist(np.stack([line_hypos_agg.angle.values, line_hypos_agg.dist.values], axis=1),
                   lambda l1, l2: nearby_and_near_parallel_2(l1, l2, dist_interline_median, interval))
    updated_labels = compute_group_labels_from_dists(X_dist).astype(int)
    # update dataframe and grouping
    line_hypos.label.replace(to_replace=line_hypos_agg.index.values, value=updated_labels, inplace=True)
    # (IV) re-group with updated labels and get line meta for later usage
    line_hypos_agg = line_hypos.groupby('label').mean()
    # group lines and remember index (needs to be here)
    group2line = line_hypos_agg.index.values
    # add column group dist diff
    diffs = line_hypos_agg.dist.sort_values().diff()
    diffs.name = 'group_diff'  # set name before join
    line_hypos = line_hypos.join(diffs, on='label')
    # add column group angle diff
    angle_diff = line_hypos_agg.sort_values('dist').angle.apply(np.rad2deg).diff()
    angle_diff.name = 'group_angle_diff'  # set name before join
    line_hypos = line_hypos.join(angle_diff, on='label')
    # compute interline median
    dist_interline_median = diffs.median()
    if verbose:
        print('Update: detected groups: {} | num lines: {}.'.format(len(line_hypos_agg), num_lines))
    # (V) label line detection segments according to line hypos
    # compute probabilistic hough transform for lines
    # line_segs = probabilistic_hough_line(skeleton, threshold=6, line_length=15,
    #                                      line_gap=6, theta=basic_theta)
    line_length = 8  # v007: 8 v047: 5 # line detector dependent (VIP)
    if len_max < line_length:
        line_length = len_max
    line_segs = probabilistic_hough_line(skeleton, threshold=6, line_length=line_length,
                                         line_gap=6, theta=theta_range)
    if len(line_segs) > 0:
        # assign line segments to nearest line
        ls_labels = assign_line_segments_to_lines(line_segs, line_hypos)
        # associate segments with lines
        segm_labels, im_marker = associate_segments_with_lines(lbl_ind_x, line_segs, ls_labels, group2line)
    else:
        segm_labels, ls_labels = lbl_ind_x, None  # set segm_labels so that line_frag gets a shape
    return line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line, h, theta, d, skeleton
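The Hough peak search in `post_process_line_detections` is parameterized by two numbers: the maximum peak count `int(num_lines * 1.9)` and the accumulator threshold `(h.max() / len_max) / 2 * len_min`. One way to read the threshold is that, if the strongest peak corresponds to a line of `len_max` signs, a peak must collect at least half the votes expected for a line of `len_min` signs. A small sketch of that arithmetic (`hough_peak_params` is a hypothetical helper; the inputs are made up):

```python
def hough_peak_params(h_max, num_lines, len_min, len_max, num_peaks_factor=1.9):
    """Peak count and accumulator threshold as computed in post_process_line_detections."""
    threshold = (h_max / float(len_max)) / 2.0 * len_min
    num_peaks = int(num_lines * num_peaks_factor)
    return num_peaks, threshold


print(hough_peak_params(h_max=120, num_lines=20, len_min=4, len_max=12))  # (38, 20.0)
```

Because more peaks than expected lines are requested (factor 1.9), duplicate hypotheses for the same line are expected and are merged afterwards in steps (II) and (III).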
@@ -0,0 +1,70 @@
import numpy as np
from tqdm import tqdm
from ..datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta
from ..detection.line_detection import (prepare_transliteration, preprocess_line_input, apply_detector)
from ..utils.path_utils import make_folder
from skimage.morphology import skeletonize
def gen_line_detections(didx_list, dataset, saa_version, relative_path,
                        line_model_version, model_fcn, re_transform, device,
                        save_line_detections):
    # for seg_im, seg_idx in dataset:
    # iterate over segments
    for didx in tqdm(didx_list, desc=saa_version):
        # print(didx)
        seg_im, gt_boxes, gt_labels = dataset[didx]
        # access meta
        seg_rec = dataset.get_seg_rec(didx)
        image_name, scale, seg_bbox, _, view_desc = get_segment_meta(seg_rec)
        res_name = "{}{}".format(image_name, view_desc)
        # make sure the seg image is large enough for the line detector
        if seg_im.size[0] > 224 and seg_im.size[1] > 224:
            # prepare input
            inputs = preprocess_line_input(seg_im, 1, shift=0)
            center_im = re_transform(inputs[4])  # to pil image
            center_im = np.asarray(center_im)  # to numpy
            try:
                # apply network
                output = apply_detector(inputs, model_fcn, device)
                # visualize_net_output(center_im, output, cunei_id=1, num_classes=2)
                # plt.show()
                # prepare output
                outprob = np.mean(output, axis=0)
                lbl_ind = np.argmax(outprob, axis=0)
                lbl_ind_x = lbl_ind.copy()
                lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0  # 7
                lbl_ind_80 = lbl_ind.copy()
                lbl_ind_80[np.max(outprob, axis=0) < 0.8] = 0  # remove squeeze() from outprob in order to fix a bug!
                # save line detections
                if save_line_detections:
                    # line result folder
                    line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
                    make_folder(line_res_path)
                    # save lbl_ind_x
                    outfile = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
                    np.save(outfile, lbl_ind_x.astype(bool))
                    if False:
                        # compute skeleton
                        skeleton = skeletonize(lbl_ind_x)
                        # save skeleton
                        outfile = "{}/{}_skeleton.npy".format(line_res_path, res_name)
                        np.save(outfile, skeleton.astype(bool))
            except Exception as e:
                # usually CUDA error: out of memory
                print(res_name, e)
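`gen_line_detections` averages the network output over the five crops, takes the arg-max class per pixel and zeroes out pixels whose best probability is below 0.7; the result (`lbl_ind_x`) is what gets saved. Assuming `output` holds per-pixel class probabilities of shape (crops, classes, H, W), the same steps on a toy array look like this (`threshold_line_map` is a hypothetical name):

```python
import numpy as np


def threshold_line_map(output, min_prob=0.7):
    """Average crop-wise probabilities and keep only confident foreground labels.

    output: (n_crops, n_classes, H, W) per-pixel class probabilities.
    Returns an (H, W) label map; pixels whose best probability is below min_prob become 0.
    """
    outprob = output.mean(axis=0)            # (n_classes, H, W)
    lbl = outprob.argmax(axis=0)             # most likely class per pixel
    lbl[outprob.max(axis=0) < min_prob] = 0  # suppress uncertain pixels
    return lbl


# 5 identical crops, 2 classes (0 = background, 1 = line), a 1 x 3 map
output = np.zeros((5, 2, 1, 3))
output[:, 1, 0, :] = [0.9, 0.6, 0.2]
output[:, 0, 0, :] = [0.1, 0.4, 0.8]
print(threshold_line_map(output))  # [[1 0 0]]
```

Averaging pixel-wise is only meaningful if the crops are aligned; with `shift=0`, as passed to `preprocess_line_input` here, all five crops are the full segment, so the average equals any single crop.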
@@ -0,0 +1,184 @@
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
from ..datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta
from ..alignment.LineFragment import plot_boxes
from ..utils.path_utils import make_folder
from ..utils.torchcv.box_coder_retina import RetinaBoxCoder
from ..utils.torchcv.box_coder_fpnssd import FPNSSDBoxCoder
from ..utils.torchcv.box import box_nms
from ..utils.torchcv.evaluations.voc_eval import voc_eval
from ..evaluations.sign_evaluation_prep import (prepare_ssd_outputs_for_eval, prepare_ssd_gt_for_eval,
get_pred_boxes_df, get_gt_boxes_df)
from ..evaluations.sign_evaluation import eval_detector, eval_detector_on_collection
from ..evaluations.sign_evaluator import SignEvalBasic, SignEvalFast
def gen_ssd_detections(didx_list, dataset, saa_version, relative_path,
                       model_version, fpnssd_net, with_64, create_bg_class, device,
                       test_min_score_thresh, test_nms_thresh, eval_ovthresh,
                       save_detections, show_detections, with_4_aspects=False, verbose_mode=True, return_eval=False):
    list_pred_boxes_df, list_gt_boxes_df = [], []
    list_seg_ap, list_seg_name_with_anno = [], []
    # setup evaluators
    use_new_eval = True
    num_classes = 240
    # eval_basic = SignEvalBasic(model_version, saa_version, eval_ovthresh)
    eval_fast = SignEvalFast(model_version, saa_version, tp_thresh=eval_ovthresh, num_classes=num_classes)
    # iterate over segments
    for didx in tqdm(didx_list, desc=saa_version):
        # print(didx)
        seg_im, gt_boxes, gt_labels = dataset[didx]
        # access meta
        seg_rec = dataset.get_seg_rec(didx)
        image_name, scale, seg_bbox, _, view_desc = get_segment_meta(seg_rec)
        # for plots
        input_im = np.asarray(seg_im)
        # prepare box coder
        # box_coder = RetinaBoxCoder()
        box_coder = FPNSSDBoxCoder(input_size=seg_im.size, with_64=with_64, with_4_aspects=with_4_aspects, create_bg_class=create_bg_class)
        # prepare input
        inputs = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize(mean=[0.5], std=[1.0])])(seg_im)
        inputs = inputs.unsqueeze(0)
        with torch.no_grad():
            loc_preds, cls_preds = fpnssd_net(inputs.to(device))
            box_preds, label_preds, score_preds = box_coder.decode(
                loc_preds.cpu().data.squeeze(),
                F.softmax(cls_preds.squeeze(), dim=1).cpu().data,
                score_thresh=test_min_score_thresh, nms_thresh=test_nms_thresh)
        if show_detections:
            # plot prediction
            plt.figure(figsize=(10, 10))
            plot_boxes(box_preds, confidence=score_preds)
            plt.imshow(input_im, cmap='gray')
            plt.grid(True, color='w', linestyle=':')
            plt.show()
            # vis_detections(input_im, box_preds, scores=score_preds, labels=label_preds,
            #                thresh=0.01, max_vis=300, figs_sz=(15, 15))  # lbl2lbl[labels]
            # plt.show()
        # convert detections to all boxes format
        all_boxes = prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds)
        if save_detections:
            res_name = "{}{}".format(image_name, view_desc)
            res_path = "{}results/results_ssd/{}/{}".format(relative_path, model_version, saa_version)
            # check folder
            make_folder(res_path)
            if True:
                # Save detections
                # outfile = "{}/{}.npy".format(res_path, res_name)
                # np.save(outfile, scores)
                # save all_boxes
                outfile = "{}/{}_all_boxes.npy".format(res_path, res_name)
                np.save(outfile, all_boxes)
        if gt_boxes is not None:
            if 0:
                if verbose_mode:
                    # [METHOD A]: evaluate for a single segment (in tensor format)
                    print(voc_eval([box_preds.clone()], [label_preds.clone()], [score_preds.clone()],
                                   [gt_boxes.clone()], [gt_labels.clone()], None,
                                   iou_thresh=eval_ovthresh, use_07_metric=False)['map'])
            # convert gt to numpy format
            gt_boxes, gt_labels = prepare_ssd_gt_for_eval(gt_boxes, gt_labels)
            if use_new_eval:
                list_seg_name_with_anno.append(image_name + view_desc)
                if verbose_mode:
                    print(image_name, view_desc)
                # standard mAP eval
                # eval_basic.eval_segment(all_boxes, gt_boxes, gt_labels, seg_rec.segm_idx, verbose=verbose_mode)
                # fast evaluation
                eval_fast.eval_segment(all_boxes, gt_boxes, gt_labels, seg_rec.segm_idx, verbose=verbose_mode)
            else:
                if verbose_mode:
                    # [METHOD B]: evaluate mAP and print stats for a single segment
                    # (these results can strongly differ from collection-wise evaluation)
                    acc, df_stats = eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=eval_ovthresh)
                    # collect results
                    list_seg_ap.append(df_stats['ap'].mean())
                    list_seg_name_with_anno.append(image_name + view_desc)
                # prepare full collection evaluation
                list_pred_boxes_df.append(get_pred_boxes_df(all_boxes, seg_rec.segm_idx))
                list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_rec.segm_idx))
    # full collection eval
    if use_new_eval:
        eval_fast.prepare_eval_collection()
        df_stats, global_ap = eval_fast.eval_collection(verbose=verbose_mode)
        if return_eval:
            return global_ap, df_stats, eval_fast
        else:
            if verbose_mode:
                return eval_fast.list_seg_mean_ap, list_seg_name_with_anno
            else:
                return global_ap, df_stats
    else:
        acc = 0
        df_stats = pd.DataFrame()
        if len(list_gt_boxes_df) > 0:
            # [METHOD C]: compute mAP across all instances of individual classes
            # (these results can strongly differ from segment-wise evaluation)
            gt_boxes_df = pd.concat(list_gt_boxes_df, ignore_index=True)
            pred_boxes_df = pd.concat(list_pred_boxes_df, ignore_index=True)
            acc, df_stats = eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=eval_ovthresh)
        if verbose_mode:
            return list_seg_ap, list_seg_name_with_anno
        else:
            return acc, df_stats
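`gen_ssd_detections` passes `eval_ovthresh` to `voc_eval` as `iou_thresh` and to `SignEvalFast` as `tp_thresh`, so it acts as the overlap a detection needs with a ground-truth box to count as a true positive. Assuming the usual intersection-over-union definition (the evaluators' exact pixel conventions are not shown here), a minimal sketch with a hypothetical `box_iou` helper:

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes [x1, y1, x2, y2] (continuous coordinates)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)


gt = [0, 0, 10, 10]
pred = [5, 0, 15, 10]      # shifted by half a box width
iou = box_iou(gt, pred)    # 50 / 150 = 0.333...
print(iou >= 0.5)          # False: not a true positive at an overlap threshold of 0.5
```

A detection shifted by half a sign width is therefore rejected at 0.5, which illustrates why the choice of `eval_ovthresh` strongly affects the reported mAP.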
def get_detections(fpnssd_net, device, seg_im, with_64, with_4_aspects, create_bg_class,
                   test_nms_thresh, test_min_score_thresh):
    # prepare box coder
    # box_coder = RetinaBoxCoder()
    box_coder = FPNSSDBoxCoder(input_size=seg_im.size, with_64=with_64, with_4_aspects=with_4_aspects,
                               create_bg_class=create_bg_class)
    # prepare input
    inputs = transforms.Compose([transforms.Lambda(lambda x: x.convert('L')),
                                 transforms.ToTensor(),
                                 transforms.Normalize(mean=[0.5], std=[1.0])])(seg_im)
    inputs = inputs.unsqueeze(0)
    with torch.no_grad():
        loc_preds, cls_preds = fpnssd_net(inputs.to(device))
        box_preds, label_preds, score_preds = box_coder.decode(
            loc_preds.cpu().data.squeeze(),
            F.softmax(cls_preds.squeeze(), dim=1).cpu().data,
            score_thresh=test_min_score_thresh, nms_thresh=test_nms_thresh)
    # convert detections to all boxes format
    all_boxes = prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds)
    return all_boxes
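get_detections hands score thresholding and non-maximum suppression to `FPNSSDBoxCoder.decode`, whose implementation is outside this section. To show what `test_min_score_thresh` and `test_nms_thresh` typically control, here is a minimal NumPy sketch of standard greedy NMS for the boxes of a single class. The name `greedy_nms` is hypothetical, and this is the generic technique rather than the box coder's actual code. For example, it treats coordinates as continuous and omits the "+1" pixel convention used in the evaluation code.

```python
import numpy as np


def greedy_nms(boxes, scores, score_thresh=0.05, nms_thresh=0.3):
    """Greedy non-maximum suppression for the boxes of one class.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    Returns the indices of the kept boxes, ordered by decreasing score.
    """
    # drop low-confidence boxes first
    idx = np.flatnonzero(scores >= score_thresh)
    # process the remaining boxes from highest to lowest score
    order = idx[np.argsort(-scores[idx])]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # suppress boxes with IoU >= nms_thresh (cf. cfg.TEST.NMS)
        order = rest[iou < nms_thresh]
    return keep
```

Two heavily overlapping boxes collapse to the higher-scoring one. A box below the score threshold is discarded before NMS runs.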
@@ -0,0 +1,194 @@
import torch
from torch.autograd import Variable
import torchvision
from torchvision.transforms import Resize, FiveCrop, CenterCrop
from ..utils.transform_utils import crop_pil_image
def crop_segment_from_tablet_im(pil_im, seg_bbox, context_pad_frac=0):
    """
    :param pil_im: full tablet image to crop from
    :param seg_bbox: bbox coordinates [xmin, ymin, xmax, ymax]
    :param context_pad_frac: the fraction of the minimum side length of bbox to use as padding
    :return: cropped segment as PIL image and the bbox of the crop (as returned by crop_pil_image)
    """
    min_side = min((seg_bbox[2] - seg_bbox[0], seg_bbox[3] - seg_bbox[1]))
    context_pad = min_side * context_pad_frac
    # crop segment
    segment_crop, new_bbox = crop_pil_image(pil_im, seg_bbox, context_pad=context_pad, pad_to_square=False)
    return segment_crop, new_bbox
def rescale_segment_single(pil_im, scale):
    """ Produce PIL image of segment at selected scale
    :param pil_im: tablet segment that is to be processed
    :param scale: scale used for resizing
    :return: PIL image
    """
    # compute scaled size
    imw, imh = pil_im.size
    imw = int(imw * scale)
    imh = int(imh * scale)
    # compose transforms
    tablet_transform = torchvision.transforms.Compose([
        torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
        Resize((imh, imw)),  # resize according to scale
    ])
    # apply transforms
    input_im = tablet_transform(pil_im)
    return input_im
def preprocess_segment_single(pil_im, scale):
    """ produce tensor of segment at selected scale
    :param pil_im: tablet segment that is to be processed
    :param scale: scale used for resizing
    :return: 4D tensor of shape 1 x 1 x H x W
    """
    # compute scaled size
    imw, imh = pil_im.size
    imw = int(imw * scale)
    imh = int(imh * scale)
    # compose transforms
    tablet_transform = torchvision.transforms.Compose([
        torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
        Resize((imh, imw)),  # resize according to scale
        # tensor-space transforms
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
    ])
    # apply transforms
    input_tensor = tablet_transform(pil_im).unsqueeze(0)
    return input_tensor
def preprocess_segment_multi_scale(pil_im, scales):
    """ produces multiple copies of the segment at different scales
    :param pil_im: tablet segment that is to be processed
    :param scales: list of scales
    :return: list of 3D tensors with different shapes (according to scales)
    """
    # get original size
    imw, imh = pil_im.size
    # tensor-space transforms
    ts_transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
    ])
    # compose transforms
    tablet_transform = torchvision.transforms.Compose([
        torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
        # resize according to scales
        torchvision.transforms.Lambda(
            lambda crop: [Resize((int(imh * scale), int(imw * scale)))(crop) for scale in scales]),
        torchvision.transforms.Lambda(
            lambda scaled_crops: [ts_transform(crop) for crop in scaled_crops]),  # returns a list of 3D tensors
    ])
    # apply transforms
    im_list = tablet_transform(pil_im)
    return im_list
def preprocess_segment_for_eval(pil_im, scale, shift=0):
    """ produces five copies of the segment at slightly different offsets
    :param pil_im: tablet segment that is to be processed
    :param scale: scale which should be used for resizing
    :param shift: offset shift used to produce five-fold oversampling
    :return: 4D tensor of shape 5 x C x H x W (four corner crops and one center crop)
    """
    # compute scaled size
    imw, imh = pil_im.size
    imw = int(imw * scale)
    imh = int(imh * scale)
    # determine crop size
    crop_sz = [int(imh - shift), int(imw - shift)]
    # tensor-space transforms
    ts_transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
    ])
    # compose transforms
    tablet_transform = torchvision.transforms.Compose([
        torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
        Resize((imh, imw)),  # resize according to scale
        FiveCrop((crop_sz[0], crop_sz[1])),  # oversample
        torchvision.transforms.Lambda(
            lambda crops: torch.stack([ts_transform(crop) for crop in crops])),  # returns a 4D tensor
    ])
    # apply transforms
    input_tensor = tablet_transform(pil_im)
    return input_tensor
def predict_im_list(model, im_list, use_gpu, min_sz=227):
    """ applies model to list of 3D tensors (unlike a 4D tensor in predict())
    :param model: network module that is used for the prediction
    :param im_list: list of 3D tensors
    :param use_gpu: boolean that indicates whether GPU is available
    :param min_sz: minimum side length of input
    :return: list of numpy arrays (None for inputs smaller than min_sz)
    """
    # apply network model
    outputs = []
    # torch.no_grad() replaces the deprecated Variable(..., volatile=True) -> faster, less memory usage
    with torch.no_grad():
        for in_im in im_list:
            if (in_im.shape[1] >= min_sz) and (in_im.shape[2] >= min_sz):
                # prepare input
                if use_gpu:
                    in_im = in_im.cuda()
                output = model(in_im.unsqueeze(0))
                # convert to numpy
                outputs.append(output.cpu().numpy())
            else:
                outputs.append(None)
    return outputs
def predict(model, inputs, use_gpu, use_bbox_reg=False):
    """ applies model to 4D tensor (batch of images)
    :param model: network module that is used for the prediction
    :param inputs: 4D tensor (batch of images)
    :param use_gpu: boolean that indicates whether GPU is available
    :param use_bbox_reg: boolean that indicates whether to use bbox regression
    :return: tuple (predicted, predicted_roi) of numpy arrays; predicted_roi is an empty list
             if use_bbox_reg is False
    """
    # prepare input
    if use_gpu:
        inputs = inputs.cuda()
    # apply network model image by image
    # (output = model(inputs) on the full batch consumes too much memory)
    # torch.no_grad() replaces the deprecated Variable(..., volatile=True) -> faster, less memory usage
    with torch.no_grad():
        if use_bbox_reg:
            scores, bboxes = [], []
            for in_im in inputs:
                o1, o2 = model(in_im.unsqueeze(0))
                scores.append(o1)
                bboxes.append(o2)
            # concat and convert to numpy
            predicted = torch.cat(scores, dim=0).cpu().numpy()
            predicted_roi = torch.cat(bboxes, dim=0).cpu().numpy()
        else:
            scores = [model(in_im.unsqueeze(0)) for in_im in inputs]
            # concat and convert to numpy
            predicted = torch.cat(scores, dim=0).cpu().numpy()
            predicted_roi = []
    return predicted, predicted_roi
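preprocess_segment_for_eval first resizes the segment. It then applies FiveCrop with a crop that is `shift` pixels smaller than the resized image in each dimension, so the five crops are offset from one another by at most `shift` pixels. The sketch below uses a hypothetical helper, `five_crop_offsets`, to compute the crop size and the (top, left) offset of each crop. It follows the corner-plus-centre layout documented for torchvision's FiveCrop. The centre offset is rounded to the nearest pixel here, which may differ by one pixel from torchvision's own rounding.

```python
def five_crop_offsets(im_w, im_h, scale, shift):
    """Return the crop size (h, w) and the (top, left) offset of each of the five crops.

    Mirrors the size computation in preprocess_segment_for_eval: the segment is
    resized to (int(im_h * scale), int(im_w * scale)) and cropped to a size that
    is `shift` pixels smaller in each dimension.
    """
    h, w = int(im_h * scale), int(im_w * scale)  # resized segment
    ch, cw = int(h - shift), int(w - shift)      # crop size
    offsets = {
        'top_left': (0, 0),
        'top_right': (0, w - cw),
        'bottom_left': (h - ch, 0),
        'bottom_right': (h - ch, w - cw),
        'center': (int(round((h - ch) / 2.0)), int(round((w - cw) / 2.0))),
    }
    return (ch, cw), offsets
```

Take a 600 x 400 segment at scale 0.5 with the default `cfg.TEST.SHIFT = 24`. The resized segment is 300 x 200 and each crop is 276 x 176. The corner crops are shifted by 24 px and the centre crop by 12 px. Adding a crop's offset maps its detections back to segment coordinates.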
@@ -0,0 +1,121 @@
# --------------------------------------------------------
# Adapted from Ross Girshick's Fast/er R-CNN code
# --------------------------------------------------------
import os.path as osp
import numpy as np
# `pip install easydict` if you don't have it
from easydict import EasyDict as edict
__C = edict()
# Consumers can get config by:
#   from .config import cfg
cfg = __C
#
# Detector options [legacy support]
# These options are only used for the Basic evaluation method, if not specified in the eval scripts directly.
#
__C.TEST = edict()
# Number classes considered during testing
__C.TEST.NUM_CLASSES = 240
# Min score for any class
# (if no class score is larger than thresh, suppress box)
__C.TEST.SCORE_MIN_THRESH = 0.05 # 0.01
# Score threshold for ROI to be considered background
# (if bg score in (THRESH, 1], suppress box)
__C.TEST.SCORE_BG_THRESH = 0.7
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
__C.TEST.NMS = 0.3
# Test using bounding-box regressors (only works if a network trained for bbox_reg is evaluated)
__C.TEST.BBOX_REG = True
# Shift applied to the five different crops during oversampling
__C.TEST.SHIFT = 24
# Min overlap with ground truth box for positive detection (if IoU < this threshold, detection is a false positive)
__C.TEST.TP_MIN_OVERLAP = 0.5 # 0.4
# Data directory
__C.DATA_DIR = '/home/tobias/Datasets/cuneiform/'
# tablet directories
__C.DATA_TEST_DIR = __C.DATA_DIR + 'test_images/'
# FUNCTIONS for loading cfg
#
def _merge_a_into_b(a, b):
    """Merge config dictionary a into config dictionary b, clobbering the
    options in b whenever they are also specified in a.
    """
    if type(a) is not edict:
        return
    # items() works in Python 2 and 3 (iteritems() is Python 2 only)
    for k, v in a.items():
        # a must specify keys that are in b
        if k not in b:
            raise KeyError('{} is not a valid config key'.format(k))
        # the types must match, too
        old_type = type(b[k])
        if old_type is not type(v):
            if isinstance(b[k], np.ndarray):
                v = np.array(v, dtype=b[k].dtype)
            else:
                raise ValueError(('Type mismatch ({} vs. {}) '
                                  'for config key: {}').format(type(b[k]), type(v), k))
        # recursively merge dicts
        if type(v) is edict:
            try:
                _merge_a_into_b(a[k], b[k])
            except Exception:
                print('Error under config key: {}'.format(k))
                raise
        else:
            b[k] = v
def cfg_from_file(filename):
    """Load a config file and merge it into the default options."""
    import yaml
    with open(filename, 'r') as f:
        # yaml.load() without an explicit Loader is unsafe and raises a TypeError in PyYAML >= 6
        yaml_cfg = edict(yaml.safe_load(f))
    _merge_a_into_b(yaml_cfg, __C)
def cfg_from_list(cfg_list):
    """Set config keys via list (e.g., from command line)."""
    from ast import literal_eval
    assert len(cfg_list) % 2 == 0
    for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
        key_list = k.split('.')
        d = __C
        for subkey in key_list[:-1]:
            assert subkey in d
            d = d[subkey]
        subkey = key_list[-1]
        assert subkey in d
        try:
            value = literal_eval(v)
        except (ValueError, SyntaxError):
            # handle the case when v is a string literal
            value = v
        assert type(value) == type(d[subkey]), \
            'type {} does not match original type {}'.format(
                type(value), type(d[subkey]))
        d[subkey] = value
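cfg_from_list splits each dotted key and walks the nested config. It parses the value with `ast.literal_eval`, falls back to the raw string when parsing fails, and requires the new value to have exactly the type of the default it replaces. The sketch below reproduces that logic on plain nested dicts so it runs without easydict. `set_from_list` is a hypothetical name. It raises `KeyError`/`TypeError` instead of using `assert`, because asserts are skipped under `python -O`; that choice belongs to this sketch, not to the original.

```python
from ast import literal_eval


def set_from_list(cfg, cfg_list):
    """Apply ['SECTION.KEY', 'value', ...] overrides to a nested dict in place."""
    if len(cfg_list) % 2 != 0:
        raise ValueError('expected an even number of key/value entries')
    for key, raw in zip(cfg_list[0::2], cfg_list[1::2]):
        *parents, leaf = key.split('.')
        d = cfg
        # walk down to the dict that holds the leaf key
        for sub in parents:
            if sub not in d:
                raise KeyError('{} is not a valid config key'.format(key))
            d = d[sub]
        if leaf not in d:
            raise KeyError('{} is not a valid config key'.format(key))
        try:
            value = literal_eval(raw)  # '0.4' -> 0.4, 'True' -> True
        except (ValueError, SyntaxError):
            value = raw                # e.g. a path stays a plain string
        # the override must keep the type of the default value
        if type(value) is not type(d[leaf]):
            raise TypeError('type {} does not match original type {}'.format(
                type(value), type(d[leaf])))
        d[leaf] = value
    return cfg
```

A command-line override such as `TEST.NMS 0.4` thus becomes the float `0.4`. `TEST.NMS True` is rejected because a bool cannot replace a float default.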
@@ -0,0 +1,448 @@
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
from ast import literal_eval
import os.path
from ..alignment.LineFragment import compute_line_endpoints_by_hypo_idx
from ..detection.detection_helpers import radius_in_image
from ..detection.line_detection import line_params_from_pts, hess_normal_form_from_pts, dist_lineseg_line
class LineAnnotations(object):
    def __init__(self, collection_name, coll_scales=None, interline_dist=128/2., relative_path='../'):
        # basic paths
        self.num_classes = 2
        self.path_to_data_products = '{}data/annotations/'.format(relative_path)
        self.coll_scales = coll_scales
        self.interline_dist = interline_dist
        # load collection annotations
        self.anno_df = self.load_collection_annotations(collection_name)
        if len(self.anno_df) > 0:
            print('Load line annotations for {} dataset: {} found!'.format(collection_name,
                                                                            self.anno_df.segm_idx.nunique()))
        else:
            print('No line annotations for {} dataset'.format(collection_name))
    def load_collection_annotations(self, collection_name):
        # assemble annotation file path
        annotation_file = 'line_annotations_{}.csv'.format(collection_name)
        annotation_file_path = '{}{}'.format(self.path_to_data_products, annotation_file)
        # check if annotation file exists
        if os.path.isfile(annotation_file_path):
            # read annotation file
            anno_df = pd.read_csv(annotation_file_path, engine='python')
            # apply scale
            if self.coll_scales is not None:
                scale_vec = self.coll_scales[anno_df.segm_idx].values
                anno_df.x = (anno_df.x * scale_vec).round().astype(int)
                anno_df.y = (anno_df.y * scale_vec).round().astype(int)
            # assemble line segs
            anno_df = anno_df.groupby('segm_idx').apply(assemble_line_segments)
            ## 0) prepare meta data columns
            # add ls_x_separate column (depends on assemble_line_segments)
            anno_df = anno_df.groupby(['segm_idx', 'line_idx']).apply(add_x_minmax)
            anno_df = anno_df.groupby('segm_idx').apply(mark_x_seperate)
            # add dist and dist_avg column
            anno_df['dist'] = anno_df.line_segs.apply(set_line_param)
            anno_df = anno_df.groupby(['segm_idx', 'line_idx']).apply(set_mean)
            # print(anno_df)
            # add ls_vert_nb column (depends on assemble_line_segments)
            # anno_df = anno_df.groupby('segm_idx').apply(mark_vert_nb, self.interline_dist * 0.8)
            ## 1) group lines together
            # set inline
            # anno_df['inline'] = [np.intersect1d(*el) for el in anno_df[['ls_vert_nb', 'ls_x_separate']].values]
            # anno_df['inline'] = [np.empty(0, dtype=int)] * len(anno_df)
            anno_df['inline'] = pd.Series([np.empty(0, dtype=int)] * len(anno_df), index=anno_df.index)
            # further group line segments by order and ls_x_separate (should be respected when annotating data!)
            anno_df = anno_df.groupby('segm_idx').apply(group_ls_by_order, self.interline_dist * 5)  # * 3
            # assign actual line idx
            anno_df = anno_df.groupby('segm_idx').apply(assign_actual_line_index)
            ## 2) refine ordering
            # reset dist_avg based on gt_line_idx
            anno_df = anno_df.groupby(['segm_idx', 'gt_line_idx']).apply(set_mean)
            # assign actual line idx again
            anno_df = anno_df.groupby('segm_idx').apply(assign_actual_line_index)
            # return data frame
            return anno_df
        else:
            # return empty list (check later with len(.) to see if file exists)
            return []
    def select_df_by_segm_idx(self, segm_idx):
        assert len(self.anno_df) > 0, 'No annotations available!'
        # wrap pandas logic
        return self.anno_df[(self.anno_df.segm_idx == segm_idx)]
    def visualize_line_annotations(self, segm_idx, input_im, show_line_seg_idx=False):
        # plot line annotations
        # get segment data frame
        seg_line_df = self.select_df_by_segm_idx(segm_idx)
        # check if any anno
        if len(seg_line_df) > 0:
            # create basic plot
            fig, axes = plt.subplots(figsize=(10, 10))
            grouped = seg_line_df.groupby('line_idx')
            color = plt.cm.jet(np.linspace(0, 1, np.max(seg_line_df.line_idx) + 2))
            for i, line_rec in grouped:
                gt_line_idx = line_rec.gt_line_idx.values[0]
                line_idx = line_rec.line_idx.values[0]
                # print(line_rec)
                axes.plot(line_rec.x.values, line_rec.y.values, linewidth=5, color=color[gt_line_idx])
                axes.text(line_rec.x.values[0], line_rec.y.values[0], '{}'.format(gt_line_idx),
                          bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
                if show_line_seg_idx:
                    axes.text(line_rec.x.values[1], line_rec.y.values[1], '{}'.format(line_idx),
                              bbox=dict(facecolor='red', alpha=0.5), fontsize=8, color='white')
            # axes.set_yticks([])
            # axes.set_xticks([])
            # plot the image last so that the axes get overwritten (no need to remove ticks :)
            axes.imshow(input_im, cmap='gray')
            plt.show()
    def get_hypo_line_labeling_for_segm(self, segm_idx, line_hypos_agg, verbose=False):
        # select line segment ground truth
        seg_ls_df = self.select_df_by_segm_idx(segm_idx).copy()
        # from n points only n-1 segments -> remove empty ones
        seg_ls_df = seg_ls_df[seg_ls_df.line_segs.apply(len) > 0]
        # check if any annotations found
        if len(seg_ls_df) > 0:
            # assign hypo lines to gt line segments
            gt_line_segs = seg_ls_df.line_segs.values.tolist()
            gt_ls_lbl, gt_ls_dist = assign_lines_to_gt_line_segments(gt_line_segs, line_hypos_agg)
            # update dataframe
            seg_ls_df['hypo_line_lbl'] = gt_ls_lbl
            seg_ls_df['hypo_line_dist'] = np.sqrt(gt_ls_dist)
            # decide hypo line labels
            seg_ls_df = seg_ls_df.groupby(['gt_line_idx']).apply(decide_hypo_line_lbl)
        else:
            if verbose:
                print('No line ground truth available for segment idx [{}]!'.format(segm_idx))
        return seg_ls_df
    def get_assignment_for_line_hypos(self, segm_idx, line_hypos_agg):
        # create empty dummy for cases where no annotations available
        gt_line_assignment = pd.DataFrame()
        if len(self.anno_df) > 0:
            # get labelling
            seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
            if len(seg_ls_df) > 0:
                # in case of multiple annotations per hypo line, pick the one with smallest distance
                gt_line_assignment = seg_ls_df.sort_values('hypo_line_dist').groupby('hypo_line_lbl').head(1)[
                    ['gt_line_idx', 'hypo_line_lbl']]
                gt_line_assignment = gt_line_assignment.sort_values('gt_line_idx')
        return gt_line_assignment
    def visualize_hypo_line_assignments(self, segm_idx, line_hypos_agg, input_im):
        # get labelling
        seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
        gt_ls_lbl = seg_ls_df.hypo_line_lbl.values
        gt_line_segs = seg_ls_df.line_segs.values
        # visualize
        visualize_line_segments_with_labels(gt_line_segs, gt_ls_lbl, input_im)
    def visualize_gt_lines_with_assignments(self, segm_idx, line_hypos_agg, center_im):
        # gt assignment
        gt_line_assignment = self.get_assignment_for_line_hypos(segm_idx, line_hypos_agg)
        # get labelling
        seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
        gt_ls_lbl = seg_ls_df.gt_line_idx.values
        gt_line_segs = seg_ls_df.line_segs.values
        # get line hypo endpoints
        list_hypo_endpts = [np.fliplr(np.array(compute_line_endpoints_by_hypo_idx(hidx, line_hypos_agg)).reshape(2, 2)).ravel()
                            for hidx in gt_line_assignment.hypo_line_lbl.values]
        # get color map (plt.cm.spectral was removed in matplotlib 2.2; nipy_spectral replaces it)
        color = plt.cm.nipy_spectral(np.linspace(0, 1, np.max(gt_ls_lbl) + 1))  # len(np.unique(gt_ls_lbl))
        fig, axes = plt.subplots(1, 2, figsize=(15, 7))
        ax = axes.ravel()
        ax[0].imshow(center_im, cmap='gray')
        ax[0].set_title('Input image')
        ax[1].imshow(center_im * 0)
        for line, li in zip(gt_line_segs, gt_ls_lbl):
            p0, p1 = line
            ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[li], linewidth=2)
        ax[1].set_xlim((0, center_im.shape[1]))
        ax[1].set_ylim((center_im.shape[0], 0))
        ax[1].set_title('gt line segments and assigned line hypos')
        for idx, line_pts in enumerate(list_hypo_endpts):
            ax[1].plot(line_pts[::2], line_pts[1::2], '-', color=color[int(idx)], linewidth=2)
            ax[1].text(line_pts[0], line_pts[1], '{}'.format(idx),
                       bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
#### HELPERS
# create line segment column
def assemble_line_segments(group):
    # assemble line segments
    line_grouped = group.groupby('line_idx')
    line_segs = []
    # iterate over lines
    for lidx, lgroup in line_grouped:
        num_pts = len(lgroup)
        # iterate over segments
        for sidx in range(num_pts):
            # assemble segments
            if sidx == num_pts - 1:
                line_segs.append(())
            else:
                line_segs.append(((lgroup.iloc[sidx].x, lgroup.iloc[sidx].y),
                                  (lgroup.iloc[sidx + 1].x, lgroup.iloc[sidx + 1].y)))
    # assign to group
    group['line_segs'] = line_segs
    return group
# group line segments to line
def add_x_minmax(group):
    group['xmin'] = group.x.min()
    group['xmax'] = group.x.max()
    return group
def mark_x_seperate(group):
    # iterate line segments
    list_left_or_right = []
    for i, (ls_idx, line_seg) in enumerate(group.iterrows()):
        # collect indices of lines that lie entirely to the left
        index_left = group.line_idx[group.xmax < line_seg.xmin].unique()
        # collect indices of lines that lie entirely to the right
        index_right = group.line_idx[group.xmin > line_seg.xmax].unique()
        # concat and append to list
        list_left_or_right.append(np.concatenate([np.array(index_left), np.array(index_right)]))
    group['ls_x_separate'] = list_left_or_right
    return group
def set_line_param(line_seg):
    if len(line_seg) > 0:
        # use basic line equation
        # line_params = line_params_from_pts(line_seg[0], line_seg[1])
        # use Hesse normal form (incorporates the angle)
        line_params = hess_normal_form_from_pts(line_seg[0], line_seg[1])
        return line_params[1]  # only interested in the height
    else:
        return np.nan  # np.NaN was removed in NumPy 2.0
def set_mean(group):
    group['dist_avg'] = group.dist.mean()
    return group
def mark_vert_nb(group, interline_thresh):
    # iterate line segments
    list_vert_nb = []
    for i, (ls_idx, line_seg) in enumerate(group.iterrows()):
        # collect indices of lines that are vertically close (dist_avg within interline_thresh)
        index_vert_near = group.line_idx[(group.dist >= 0) &
                                         (np.abs(group.dist_avg - line_seg.dist_avg) < interline_thresh)].unique()
        list_vert_nb.append(np.array(index_vert_near))
    group['ls_vert_nb'] = list_vert_nb
    return group
# def make_inline_symmetric(group):
#     # iterate over line segments, and make symmetric reference of inline
#     for i, (sidx, line_seg) in enumerate(group.iterrows()):
#         if len(line_seg.inline) > 0:
#             select_inline = group.line_idx.isin(line_seg.inline)
#             group.loc[select_inline, 'inline'] = select_inline.sum() * [line_seg.inline]
#     # deal with type mismatch in column (found no better way :/)
#     inline_list = []
#     for el in group.inline.astype(list).values:
#         if isinstance(el, np.ndarray):
#             inline_list.append(el)
#         else:
#             inline_list.append(np.array([el]))
#     group['inline'] = inline_list
#     # return
#     return group
def group_ls_by_order(group, interline_thresh):
    last_lidx = -1
    last_xseparate = []
    # QUICK FIX: use this to deal with loc and list inserts (loc[idx] works rather than loc[idx, col]!!)
    group_inline = group.inline
    # iterate over the first row of each line_idx (aggregate indexed by line_idx)
    # .first() is dangerous (it returns the first non-null value per column), and .nth(0) no longer
    # returns a line_idx index in pandas >= 2.0, hence head(1) + set_index
    # https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group/49148885#49148885
    ls_agg = group.sort_values('line_idx').groupby('line_idx').head(1).set_index('line_idx', drop=False)
    for curr_lidx, ls_agg_rec in ls_agg.iterrows():
        if last_lidx != -1:
            # check if last line segment is x separate
            if np.any(np.isin(ls_agg_rec.ls_x_separate, last_lidx)):
                last_rec = ls_agg.loc[last_lidx]
                # check if last line segment on the left
                ls_left = (last_rec.xmax < ls_agg_rec.xmin)
                if ls_left:
                    # check if vertical distance is small
                    vert_dist_is_small = np.abs(last_rec.dist_avg - ls_agg_rec.dist_avg) < interline_thresh
                    if vert_dist_is_small:
                        # check if already inline
                        if last_lidx not in ls_agg_rec.inline:
                            # print('merge line segments {} with {}'.format(curr_lidx, last_lidx))
                            # create new inlines
                            # do not use ls_agg_rec.inline, since it does not get updated during loop
                            # new_inline = np.concatenate([ls_agg_rec.inline, np.array([last_lidx])])
                            # new_last_inline = np.concatenate([ls_agg_rec.inline, np.array([curr_lidx])])
                            new_inline = np.concatenate([group_inline.loc[group.line_idx == curr_lidx].values[0],
                                                         np.array([last_lidx])])
                            new_last_inline = np.concatenate([group_inline.loc[group.line_idx == last_lidx].values[0],
                                                              np.array([curr_lidx])])
                            # add to data frame (loc[idx] works rather than loc[idx, col]!!)
                            select_line_idx = (group.line_idx == curr_lidx)
                            group_inline.loc[select_line_idx] = [new_inline] * select_line_idx.sum()
                            select_line_idx = (group.line_idx == last_lidx)
                            group_inline.loc[select_line_idx] = [new_last_inline] * select_line_idx.sum()
        # set last values
        last_lidx = curr_lidx
        last_xseparate = ls_agg_rec.ls_x_separate
    return group
# finalize assignment
def assign_actual_line_index(group):
    # create new column
    group['gt_line_idx'] = np.ones(len(group), dtype=int) * -1
    # iterate over line segments and assign actual line idx (segs sorted by 1) y position 2) x position)
    new_idx = 0
    for sidx, line_seg in group.sort_values(['dist_avg', 'x']).iterrows():
        # check if index is already set
        if group.loc[sidx, 'gt_line_idx'] == -1:
            # assign index to line segment
            group.loc[group.line_idx == line_seg.line_idx, 'gt_line_idx'] = new_idx
            # assign same index to inline segments
            for lidx in line_seg.inline:
                group.loc[group.line_idx == lidx, 'gt_line_idx'] = new_idx
            # finally increment index
            new_idx += 1
        else:
            # if index is already set, extend it to all inline members
            curr_idx = group.loc[sidx, 'gt_line_idx']
            # assign same index to inline segments
            for lidx in line_seg.inline:
                group.loc[group.line_idx == lidx, 'gt_line_idx'] = curr_idx
    return group
# for evaluation, detected line hypotheses need to be assigned to ground-truth lines
def assign_lines_to_gt_line_segments(gt_line_segs, line_hypos_agg):
    # get line pts from polar lines
    line_pts = []
    for idx in range(len(line_hypos_agg)):
        # compute line endpoints
        line_pts.append(compute_line_endpoints_by_hypo_idx(idx, line_hypos_agg))
        # line_pts.append(line_frag.compute_line_endpoints(-1, hypo_idx=i))
    line_pts = np.vstack(line_pts)
    # swap the coordinate order of each endpoint
    line_pts = np.flip(line_pts.reshape((-1, 2, 2)), axis=2).reshape(-1, 4)
    # get line segments
    line_seg_pts = np.stack(gt_line_segs).reshape(len(gt_line_segs), -1)
    # compute distance between line segments and lines
    X2_dist = cdist(line_pts, line_seg_pts,
                    lambda lpts, spts: dist_lineseg_line(spts[:2], spts[2:], lpts[:2], lpts[2:]))
    # assign line segments to nearest line
    ls_labels = np.argmin(X2_dist, axis=0)
    ls_dist = np.min(X2_dist, axis=0)
    return ls_labels, ls_dist
def decide_hypo_line_lbl(group):
    # count hypo line labels
    uv, counts = np.unique(group.hypo_line_lbl, return_counts=True)
    # get idx to all largest
    largest_select = (np.max(counts) == counts)
    # check if tiebreak is required
    if largest_select.sum() > 1:
        # for each similar large group compute mean hypo_line_dist and pick largest
        tiebreak_df = group.groupby('hypo_line_lbl').hypo_line_dist.mean()
        most_freq_hypo_lbl = tiebreak_df[uv[largest_select]].idxmax()
    else:
        most_freq_hypo_lbl = uv[np.argmax(counts)]
    # assign most frequent label
    group['hypo_line_lbl'] = most_freq_hypo_lbl
    return group
# visualize
def visualize_line_segments_with_labels(gt_line_segs, gt_ls_lbl, center_im, line_hypo_endpts=None):
    # plt.cm.spectral was removed in matplotlib 2.2; nipy_spectral replaces it
    color = plt.cm.nipy_spectral(np.linspace(0, 1, np.max(gt_ls_lbl) + 1))
    fig, axes = plt.subplots(1, 2, figsize=(15, 7))
    ax = axes.ravel()
    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')
    # ax[1].imshow(lbl_ind_x, cmap='gray')
    # ax[1].set_title('line det')
    ax[1].imshow(center_im * 0)
    for line, li in zip(gt_line_segs, gt_ls_lbl):
        p0, p1 = line
        ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[li], linewidth=2)
        ax[1].text(p0[0], p0[1], '{}'.format(li),
                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
    ax[1].set_xlim((0, center_im.shape[1]))
    ax[1].set_ylim((center_im.shape[0], 0))
    ax[1].set_title('gt line segments and assigned line hypos')
    if line_hypo_endpts is not None:
        for idx, line_pts in enumerate(line_hypo_endpts):
            ax[1].plot(line_pts[::2], line_pts[1::2], '-', color=color[int(idx)], linewidth=2)
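assign_lines_to_gt_line_segments uses scipy's `cdist` with a callable metric to build a distance matrix between every line hypothesis and every ground-truth line segment. It then labels each segment with its nearest hypothesis via `argmin` over axis 0. `dist_lineseg_line` is imported from `line_detection`, and its exact definition is not part of this section. The sketch below substitutes a simple stand-in to show the assignment step: the mean squared perpendicular distance of the segment's endpoints to the (infinite) hypothesis line. Both function names are hypothetical. Lines are assumed to be given by two distinct points `[x0, y0, x1, y1]`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def sq_dist_segment_to_line(seg, line):
    """Mean squared perpendicular distance of a segment's two endpoints to an
    infinite line. seg and line are [x0, y0, x1, y1]; the line's points must differ."""
    p, q = np.asarray(line[:2], dtype=float), np.asarray(line[2:], dtype=float)
    d = q - p
    normal = np.array([-d[1], d[0]]) / np.hypot(d[0], d[1])  # unit normal of the line
    a, b = np.asarray(seg[:2], dtype=float), np.asarray(seg[2:], dtype=float)
    return 0.5 * (np.dot(a - p, normal) ** 2 + np.dot(b - p, normal) ** 2)


def assign_segments_to_lines(line_pts, seg_pts):
    """Label each segment (row of seg_pts) with the index of its nearest line (row of line_pts)."""
    # dist[i, j] = distance between line i and segment j
    dist = cdist(line_pts, seg_pts, lambda l, s: sq_dist_segment_to_line(s, l))
    return np.argmin(dist, axis=0), np.min(dist, axis=0)
```

For two horizontal hypotheses at y = 10 and y = 50, a segment drawn slightly below y = 10 is assigned to line 0, and one just above y = 50 to line 1. The original code takes the square root of the minimum distance when storing `hypo_line_dist`.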
@@ -0,0 +1,15 @@
# evaluate line-tl alignment using gt-line annotations
# this serves only as a quality indicator, because transliterations are unreliable
# (transliterations often miss lines or contain lines that are not visible on the image)
def eval_line_tl_alignment(line_frag, lines_anno, seg_idx, num_vis_lines):
    tl_asst_eval = line_frag.line_hypos_agg[['gt_line_idx', 'tl_line']].dropna()
    tl_asst_eval = tl_asst_eval[tl_asst_eval.tl_line >= 0]  # do not count unassigned lines
    print('LineHypos-TL assignment accuracy: {}'.format(
        (tl_asst_eval.gt_line_idx == tl_asst_eval.tl_line).mean()))
    # check if consistent gt and tl
    num_lines_tl = num_vis_lines  # line_frag.tl_df.line_idx.nunique()
    num_lines_gt = lines_anno.select_df_by_segm_idx(seg_idx).gt_line_idx.nunique()
    if num_lines_tl != num_lines_gt:
        print('line annotation - transliteration mismatch: {} vs {} lines '.format(num_lines_gt, num_lines_tl))
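eval_line_tl_alignment only counts line hypotheses that have both a ground-truth line index and a transliteration line. It also excludes hypotheses whose `tl_line` is negative (unassigned). The function below, `line_tl_accuracy`, is a hypothetical refactoring that returns the value instead of printing it. It computes the same accuracy on a plain DataFrame with the two columns used above, and returns NaN when no hypothesis qualifies.

```python
import numpy as np
import pandas as pd


def line_tl_accuracy(line_hypos_agg):
    """Fraction of assigned line hypotheses whose gt line index equals the aligned transliteration line."""
    df = line_hypos_agg[['gt_line_idx', 'tl_line']].dropna()  # drop hypotheses missing either index
    df = df[df.tl_line >= 0]                                   # negative tl_line = unassigned
    return (df.gt_line_idx == df.tl_line).mean()
```

In the example below, the hypothesis with `tl_line = -1` and the one without a ground-truth index are ignored. Two of the remaining three match, giving an accuracy of 2/3.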
@@ -0,0 +1,512 @@
import numpy as np
import pandas as pd
from tqdm import tqdm
from .config import cfg
from ..detection.detection_helpers import convert_detections_to_array
from ..utils.bbox_utils import box_iou
def voc_ap(rec, prec, use_07_metric=False):
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    Reference: Ross Girshick's Fast/er R-CNN code
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at both ends
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))
        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]
        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
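# Worked example for voc_ap (illustrative only, not called anywhere): five detections
# sorted by confidence and flagged TP, FP, TP, TP, FP against 4 ground-truth boxes give
#   rec  = [0.25, 0.25, 0.50, 0.75, 0.75]
#   prec = [1.00, 0.50, 0.67, 0.75, 0.60]
# Over the recall steps 0->0.25, 0.25->0.5, 0.5->0.75 and 0.75->1.0 (sentinel), the
# precision envelope is 1.0, 0.75, 0.75 and 0.0, so
#   voc_ap(rec, prec) = 0.25*1.0 + 0.25*0.75 + 0.25*0.75 + 0.25*0.0 = 0.625
# whereas use_07_metric=True averages the 11 interpolated precisions at t = 0, 0.1, ..., 1.0:
#   (3 * 1.0 + 5 * 0.75 + 3 * 0.0) / 11 = 6.75 / 11 ≈ 0.614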
# *BASIC* AP COMPUTATION (Fast RCNN style)
def evaluate_on_gt(gt_boxes, gt_labels, num_images, all_boxes, ovthresh=None, num_classes=None, use_07_metric=False):
    # Reference: Ross Girshick's Fast/er R-CNN code
    if ovthresh is None:
        ovthresh = cfg.TEST.TP_MIN_OVERLAP
    if num_classes is None:
        num_classes = cfg.TEST.NUM_CLASSES
    # all detections are collected into:
    #    all_boxes[cls][image] = N x 5 array of detections in
    #    (x1, y1, x2, y2, score)
    # (range instead of the Python 2-only xrange)
    all_tp = [[[] for _ in range(num_images)]
              for _ in range(num_classes)]
    all_fp = [[[] for _ in range(num_images)]
              for _ in range(num_classes)]
    det_stats = []
    total_num_tp = 0
    total_false_cls = np.zeros(num_classes)
    for j in range(1, num_classes):  # num_classes
        # if no detections for class available
        if len(all_boxes[j][0]) == 0:
            BB = np.empty((0, 4), dtype=np.float32)
            confidence = np.empty(0, dtype=np.float32)
        else:
            BB = all_boxes[j][0][:, :4]
            confidence = all_boxes[j][0][:, -1]
        # sort by confidence
        sorted_ind = np.argsort(-confidence)
        sorted_scores = np.sort(-confidence)
        BB = BB[sorted_ind, :]
        inds = np.where(gt_labels == j)[0]
        BBGT = gt_boxes[inds, :].astype(float)
        npos = BBGT.shape[0]
        det = [False] * npos
        if npos > 0:  # else if no gt boxes available for class, AP computation is not meaningful
            # go down dets and mark TPs and FPs
            nd = len(sorted_ind)
            tp = np.zeros(nd)
            fp = np.zeros(nd)
            cls_tp = []
            cls_fp = []
            for d in range(nd):
                bb = BB[d, :].astype(float)
                ovmax = -np.inf
                if BBGT.size > 0:
                    # compute overlaps
                    # intersection
                    ixmin = np.maximum(BBGT[:, 0], bb[0])
                    iymin = np.maximum(BBGT[:, 1], bb[1])
                    ixmax = np.minimum(BBGT[:, 2], bb[2])
                    iymax = np.minimum(BBGT[:, 3], bb[3])
                    iw = np.maximum(ixmax - ixmin + 1., 0.)
                    ih = np.maximum(iymax - iymin + 1., 0.)
                    inters = iw * ih
                    # union
                    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                           (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                           (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
                    overlaps = inters / uni
                    ovmax = np.max(overlaps)
                    jmax = np.argmax(overlaps)
                if ovmax > ovthresh:
                    if not det[jmax]:
                        tp[d] = 1.
                        det[jmax] = 1
                        cls_tp.append(d)
                    else:
                        # double detection (unlikely due to nms)
                        fp[d] = 1.
                        cls_fp.append(d)  # duplicates are recorded as FPs
                else:
                    fp[d] = 1.
                    cls_fp.append(d)
            # save tp detections
            all_tp[j][0] = np.array(cls_tp)
            # save fp detections
            all_fp[j][0] = np.array(cls_fp)
            # compute precision recall
            fp = np.cumsum(fp)
            tp = np.cumsum(tp)
            rec = tp / float(npos)
            # avoid divide by zero in case the first detection matches a difficult
            # ground truth
            prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
            ap = voc_ap(rec, prec, use_07_metric)
            # print(rec, prec, ap)
            num_tp = np.sum(det).astype(int)
            total_num_tp += num_tp
            det_stats.append([npos, nd, num_tp, nd - num_tp, ap, j])
        else:
            if len(BB) > 0:
                total_false_cls[j] += len(BB)
                # print('outlier class:', j, len(BB))
    select_nonzero = total_false_cls > 0
    # print(np.nonzero(select_nonzero), total_false_cls[select_nonzero])
    return all_tp, all_fp, det_stats, total_num_tp  # , total_false_cls
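# Worked example of the pixel-inclusive IoU used above (coordinates are treated as
# inclusive pixel indices, hence the "+ 1." terms): for bb = [0, 0, 9, 9] and a
# ground-truth box [5, 5, 14, 14], both boxes cover 10 x 10 = 100 pixels, the
# intersection is (9 - 5 + 1) x (9 - 5 + 1) = 25 pixels, the union is
# 100 + 100 - 25 = 175 pixels, and IoU = 25 / 175 ≈ 0.143, which makes the detection a
# false positive at the default cfg.TEST.TP_MIN_OVERLAP = 0.5.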
def df_evaluate_on_gt(gt_boxes_df, pred_boxes_df, ovthresh=None, num_classes=None, use_07_metric=False):
# Reference: Ross Girshick's Fast/er R-CNN code
if ovthresh is None:
ovthresh = cfg.TEST.TP_MIN_OVERLAP
if num_classes is None:
num_classes = cfg.TEST.NUM_CLASSES
num_images = gt_boxes_df.seg_idx.nunique()
# sort by confidence
pred_boxes_df = pred_boxes_df.sort_values('conf', ascending=False)
det = [False] * len(gt_boxes_df)
det_stats = []
total_num_tp = 0
for j in tqdm(xrange(1, num_classes)): # num_classes
cls_dets_df = pred_boxes_df[pred_boxes_df.cls == j]
cls_gt_df = gt_boxes_df[gt_boxes_df.cls == j]
# get bounding box and image ids
BB = cls_dets_df[['x1', 'y1', 'x2', 'y2']].values
image_ids = cls_dets_df.seg_idx.values
# confidence = cls_dets_df.conf.values
npos = len(cls_gt_df)
if npos > 0: # else if no gt boxes available for class, AP computation is not meaningful
# go down dets and mark TPs and FPs
nd = len(cls_dets_df)
tp = np.zeros(nd)
fp = np.zeros(nd)
for d in range(nd):
ovmax = -np.inf
# get bbox and seg_idx
bb = BB[d, :].astype(float)
seg_idx = image_ids[d]
# get gt boxes
seg_cls_gt_df = cls_gt_df[cls_gt_df.seg_idx == seg_idx]
BBGT = seg_cls_gt_df[['x1', 'y1', 'x2', 'y2']].values.astype(float)
if BBGT.size > 0:
# compute overlaps
# intersection
ixmin = np.maximum(BBGT[:, 0], bb[0])
iymin = np.maximum(BBGT[:, 1], bb[1])
ixmax = np.minimum(BBGT[:, 2], bb[2])
iymax = np.minimum(BBGT[:, 3], bb[3])
iw = np.maximum(ixmax - ixmin + 1., 0.)
ih = np.maximum(iymax - iymin + 1., 0.)
inters = iw * ih
# union
uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
overlaps = inters / uni
ovmax = np.max(overlaps)
jmax = np.argmax(overlaps)
if ovmax > ovthresh:
# map seg_cls idx to global idx
gidx = seg_cls_gt_df.index.values[jmax]
if not det[gidx]:
tp[d] = 1.
det[gidx] = 1
else:
# double detection (unlikely due to nms)
fp[d] = 1.
else:
fp[d] = 1.
# compute num tp before cumsum (!)
num_tp = np.sum(tp).astype(int)
# compute precision recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(npos)
# guard against division by zero (inherited from the VOC code, where detections
# matching a difficult ground truth count as neither TP nor FP)
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric)
# print rec, prec, ap
total_num_tp += num_tp
det_stats.append([npos, nd, num_tp, nd-num_tp, ap, j])
# print np.sum(det), total_num_tp
else:
if len(cls_dets_df) > 0:
if False: # turn on for debugging to see which classes are missing
print('outlier class:', j, len(BB))
return det_stats, total_num_tp
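The matching loop in `df_evaluate_on_gt` follows the PASCAL VOC recipe. Detections of one class are visited in descending confidence, and each is compared with the ground-truth boxes of the same segment. A ground-truth box can be claimed by at most one detection, so later hits on it count as false positives. The following is a minimal sketch of that logic for one class in one segment. It reuses the inclusive-pixel `+ 1` convention for widths and heights. `iou_one_to_many` and `greedy_match` are hypothetical names, not functions of this repository.

```python
import numpy as np

def iou_one_to_many(bb, gt):
    """IoU between one box [x1, y1, x2, y2] and an (M, 4) array (inclusive pixel coords)."""
    ixmin = np.maximum(gt[:, 0], bb[0])
    iymin = np.maximum(gt[:, 1], bb[1])
    ixmax = np.minimum(gt[:, 2], bb[2])
    iymax = np.minimum(gt[:, 3], bb[3])
    iw = np.maximum(ixmax - ixmin + 1.0, 0.0)
    ih = np.maximum(iymax - iymin + 1.0, 0.0)
    inters = iw * ih
    union = ((bb[2] - bb[0] + 1.0) * (bb[3] - bb[1] + 1.0)
             + (gt[:, 2] - gt[:, 0] + 1.0) * (gt[:, 3] - gt[:, 1] + 1.0) - inters)
    return inters / union

def greedy_match(det_boxes, det_scores, gt_boxes, ovthresh=0.5):
    """Return a 0/1 TP vector for the detections sorted by descending score."""
    order = np.argsort(-np.asarray(det_scores))
    det_boxes = np.asarray(det_boxes, dtype=float)[order]
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    claimed = np.zeros(len(gt_boxes), dtype=bool)
    tp = np.zeros(len(det_boxes))
    for d, bb in enumerate(det_boxes):
        if gt_boxes.size == 0:
            continue  # no ground truth in this segment: every detection is a FP
        overlaps = iou_one_to_many(bb, gt_boxes)
        j = int(np.argmax(overlaps))
        if overlaps[j] > ovthresh and not claimed[j]:
            tp[d] = 1.0
            claimed[j] = True  # a second hit on the same box will count as FP
    return tp

gt = [[0, 0, 9, 9]]
dets = [[0, 0, 9, 9], [1, 1, 10, 10], [50, 50, 60, 60]]
print(greedy_match(dets, [0.9, 0.8, 0.7], gt))  # [1. 0. 0.]
```

In the usage line, the second detection overlaps the same ground-truth box with IoU 81/119 ≈ 0.68. It is still a false positive because the higher-scoring first detection has already claimed that box.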
def eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=None, verbose=True):
# evaluate
num_imgs = 1
all_tp, all_fp, det_stats, total_num_tp = evaluate_on_gt(gt_boxes, gt_labels, num_imgs, all_boxes,
ovthresh=ovthresh)
total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
# print stats
pd.set_option('display.max_rows', 50)
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
if verbose:
print("total_tp", total_num_tp, "total_fp", total_num_fp,
"mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
"mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
acc = total_num_tp / float(total_num_tp + total_num_fp)
return acc, df_stats
def eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=None):
det_stats, total_num_tp = df_evaluate_on_gt(gt_boxes_df, pred_boxes_df, ovthresh=ovthresh)
total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
# print stats
pd.set_option('display.max_rows', 50)
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
print('RESULTS ON FULL COLLECTION :')
print("total_tp", total_num_tp, "total_fp", total_num_fp,
"acc", '{:0.3f}'.format(total_num_tp / float(total_num_tp + total_num_fp)),
"mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
"mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
acc = total_num_tp / float(total_num_tp + total_num_fp)
return acc, df_stats
# *FAST* AP COMPUTATION
# prepare AP computation
def add_max_det(group):
# add column to dataframe
group['max_det'] = False
# select detections marked as TP
tp_group = group[group.det_type == 3]
# only one can be TP, others are double detections
if len(tp_group) > 0:
# set max entry to true (single .loc call, avoids chained assignment)
group.loc[tp_group.score.idxmax(), 'max_det'] = True
return group
def add_det_type_column(eval_df, tp_thresh=0.5, bg_thresh=0.2):
# based on "Diagnosing Error in Object Detectors" by Hoiem et al.
# modifications:
# sim and other categories are merged, since every sign is considered similar
# bg_thresh is 0.2 instead of default 0.1
# determine detection types
type_list = []
for didx, det_rec in eval_df.iterrows():
overlap = det_rec.overlap
# class matches
if det_rec.pred == det_rec.true:
if overlap > tp_thresh:
type_list.append(3) # TP (3)
elif overlap > bg_thresh:
type_list.append(0) # FP: Loc(0) confusion
else:
type_list.append(2) # FP: BG(2) confusion
else:
if overlap > bg_thresh:
type_list.append(1) # FP: Sim/Oth(1) confusion
else:
type_list.append(2) # FP: BG(2) confusion
# add column to dataframe
eval_df['det_type'] = type_list
return eval_df
def prepare_eval_df(all_boxes, gt_boxes, gt_labels, seg_idx, tp_thresh, bg_thresh):
""" prepare eval_df that contains most information for average precision computation """
# convert all_boxes to ndarray (N x 9)
# [ID, cx, cy, score, x1, y1, x2, y2, idx] bbox = [4:8] ctr = [1:3]
sign_detections = convert_detections_to_array(all_boxes)
# compute ious between detections and gt_boxes
ious = box_iou(sign_detections[:, 4:8], gt_boxes)
# for each detection get best fit with gt box
index_gt = np.argmax(ious, axis=1)
overlap_gt = np.max(ious, axis=1)
label_gt = gt_labels[index_gt]
# collect in data frame
eval_df = pd.DataFrame(np.hstack([overlap_gt.reshape(-1, 1), label_gt.reshape(-1, 1),
sign_detections[:, [0, 3, 8]], index_gt.reshape(-1, 1)]),
columns=['overlap', 'true', 'pred', 'score', 'det_idx', 'gt_idx'])
# add column with segment index
eval_df['seg_idx'] = seg_idx
# add det_type column (0:LOC, 1:SIM, 2:BG, 3:TP)
eval_df = add_det_type_column(eval_df, tp_thresh, bg_thresh)
# compute max_det (in order to find double detections)
eval_df = eval_df.groupby('gt_idx').apply(add_max_det)
return eval_df
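`add_det_type_column` labels every detection row by row with `iterrows`. `add_max_det` then keeps only the highest-scoring TP per ground-truth box. The same labelling can be written with vectorized NumPy/pandas operations, which is usually much faster on large collections. The sketch below assumes `eval_df` has the `overlap`, `true`, `pred`, `score` and `gt_idx` columns built in `prepare_eval_df`. The function name `label_detections_vectorized` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def label_detections_vectorized(eval_df, tp_thresh=0.5, bg_thresh=0.2):
    """Add det_type (0: Loc, 1: Sim/Oth, 2: BG, 3: TP) and max_det columns."""
    df = eval_df.copy()
    same_cls = (df['pred'] == df['true']).values
    ov = df['overlap'].values
    # conditions are checked in order; the first match wins
    df['det_type'] = np.select(
        [same_cls & (ov > tp_thresh),     # right class, well localized: TP
         same_cls & (ov > bg_thresh),     # right class, poorly localized: Loc
         ~same_cls & (ov > bg_thresh)],   # wrong class on a sign: Sim/Oth
        [3, 0, 1],
        default=2)                        # little overlap with any sign: BG
    # among TPs hitting the same gt box, only the highest score stays a TP
    df['max_det'] = False
    tp_rows = df[df['det_type'] == 3]
    if len(tp_rows) > 0:
        best_rows = tp_rows.groupby('gt_idx')['score'].idxmax()
        df.loc[best_rows.values, 'max_det'] = True
    return df

eval_df = pd.DataFrame({
    'overlap': [0.9, 0.7, 0.3, 0.4, 0.1],
    'true':    [5, 5, 5, 7, 7],
    'pred':    [5, 5, 5, 5, 7],
    'score':   [0.8, 0.95, 0.6, 0.5, 0.4],
    'gt_idx':  [0, 0, 1, 2, 3],
})
out = label_detections_vectorized(eval_df)
print(out['det_type'].tolist())  # [3, 3, 0, 1, 2]
print(out['max_det'].tolist())   # [False, True, False, False, False]
```

Like `add_max_det`, this relies on `eval_df` having a unique index, which holds for the freshly constructed frame in `prepare_eval_df`.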
# AP computation
def compute_mean_ap(col_eval_df, gt_df, num_classes=240, class_list=None, verbose=True):
""" compute mean class AP """
# define list of classes to evaluate over
if class_list is None:
class_list = np.arange(1, num_classes) # range(1, num_classes)
col_eval_df = col_eval_df.sort_values('score', ascending=False)
if False:
# filter gt according to considered segments
bbox_anno = None
gt_df = bbox_anno.anno_df[bbox_anno.anno_df.segm_idx.isin(col_eval_df.seg_idx.unique())]
gt_df['cls'] = gt_df.train_label
# compute class counts
gt_counts = gt_df.cls.value_counts()
det_stats = []
for cls_idx in class_list:
# get class predictions
cls_det_df = col_eval_df[col_eval_df.pred == cls_idx]
# get gt number
if cls_idx in gt_counts.index:
npos = gt_counts[cls_idx]
else:
npos = 0
if npos > 0:
if 1:
tp_vec = (cls_det_df.det_type == 3) & (cls_det_df.max_det == True)
fp_vec = ~tp_vec
# fp_vec = (cls_det_df.det_type < 3) | (cls_det_df.max_det == False)
fp = np.cumsum(fp_vec.values)
tp = np.cumsum(tp_vec.values)
assert np.all(tp_vec != fp_vec), np.intersect1d(tp_vec, fp_vec)
else:
# without considering double detections
fp = np.cumsum(cls_det_df.det_type < 3)
tp = np.cumsum(cls_det_df.det_type == 3)
rec = tp / float(npos)
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, False)
# sum is used to map empty list to 0
det_stats.append([npos, len(cls_det_df), np.sum(tp[-1:]), np.sum(fp[-1:]), ap, cls_idx])
else:
if len(cls_det_df) > 0:
if False: # turn on for debugging to see which classes are missing
print('outlier class:', cls_idx, len(cls_det_df))
# convert to ndarray
det_stats = np.asarray(det_stats)
mean_ap = np.mean(det_stats[:, -2])
# return aps
if verbose:
print('mAP {:.4}'.format(mean_ap))
return det_stats
def compute_global_ap(col_eval_df, gt_df, num_classes=240, verbose=True):
""" compute global AP """
# sort according to score
col_eval_df = col_eval_df.sort_values('score', ascending=False)
# not strictly necessary, because predicted classes are already in range [1, num_classes)
cls_det_df = col_eval_df[col_eval_df.pred.isin(range(1, num_classes))]
if False:
# filter gt according to considered segments
bbox_anno = None
gt_df = bbox_anno.anno_df[bbox_anno.anno_df.segm_idx.isin(col_eval_df.seg_idx.unique())]
gt_df['cls'] = gt_df.train_label
# filter considered classes
gt_df = gt_df[gt_df.cls.isin(range(1, num_classes))]
# select number of gt positives
npos = len(gt_df)
# npos = len(bbox_anno.anno_df.train_label[bbox_anno.anno_df.train_label > 0])
ap = 0
if npos > 0:
if 1:
tp_vec = (cls_det_df.det_type == 3) & (cls_det_df.max_det == True)
fp_vec = ~tp_vec
fp = np.cumsum(fp_vec.values)
tp = np.cumsum(tp_vec.values)
assert np.all(tp_vec != fp_vec), np.intersect1d(tp_vec, fp_vec)
else:
# without considering double detections
fp = np.cumsum(cls_det_df.det_type < 3)
tp = np.cumsum(cls_det_df.det_type == 3)
rec = tp / float(npos)
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, False)
if False:
from sklearn.metrics import precision_recall_curve, auc
import matplotlib.pyplot as plt
# compute normalized PR curve
precision, recall, _ = precision_recall_curve(tp_vec, cls_det_df.score.values)
# plot pr curve
plt.figure()
plt.step(recall, precision, color='b', alpha=0.2, where='post')
# plt.step(rec, prec, color='b', alpha=0.2) # works, but rec values not normalized to [0, 1] range
# compare different ways to compute VOC AP (ie. area under the precision recall curve)
# the first two methods should produce the same results, but there are slight differences;
# when in doubt, use the original VOC AP code
# https://datascience.stackexchange.com/questions/25119/how-to-calculate-map-for-detection-task-for-the-pascal-voc-challenge
# https://github.com/rafaelpadilla/Object-Detection-Metrics
plt.title('voc ap: {:.3} | PR AUC: {:.3} | norm. PR AUC: {:.3}'.format(voc_ap(rec, prec, False),
auc(rec, prec),
auc(recall, precision)))
plt.show()
# return ap
if verbose:
print('global AP {:.4}'.format(ap))
return ap
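`voc_ap` is called throughout this file but is not defined in this excerpt. With `use_07_metric=False`, it presumably computes the all-point interpolated AP of the VOC2010+ devkit, as in the widely copied py-faster-rcnn helper of the same name. The sketch below shows that computation together with the cumulative TP/FP bookkeeping used above. `voc_ap_sketch` and `ap_from_tp_vector` are hypothetical names.

```python
import numpy as np

def voc_ap_sketch(rec, prec):
    """All-point interpolated AP: area under the monotone precision envelope."""
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    # make precision non-increasing when read from right to left
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # sum rectangle areas wherever recall increases
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def ap_from_tp_vector(is_tp, npos):
    """is_tp: TP flags of detections already sorted by descending score."""
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    rec = tp / float(npos)
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    return voc_ap_sketch(rec, prec)

# two ground-truth boxes; ranked detections are TP, FP, TP
print(ap_from_tp_vector([True, False, True], npos=2))  # 0.8333...
```

`compute_mean_ap` applies this once per class and averages the results. `compute_global_ap` applies it once to the pooled, score-sorted detections of all classes, so classes with many ground-truth signs weigh more in the global value.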
# FP categorization
def get_type_val_frac(fp_type_series, type_values=[0, 1, 2, 3], num_fp_thres=[5, 10, 25, 50, 100]):
# type_values = [0, 1, 2, 3]
# num_fp_thres = [5, 10, 25, 50, 100]
type_val_frac = np.zeros((len(num_fp_thres), len(type_values)))
for i, thres in enumerate(num_fp_thres):
type_counts = fp_type_series[:thres].value_counts(normalize=True, sort=True)
for j, val in enumerate(type_values):
val_check = type_counts.index.values == val
if np.any(val_check):
val_idx = np.argmax(val_check)
type_val_frac[i, j] = type_counts.iloc[val_idx]
return type_val_frac
@@ -0,0 +1,156 @@
import pandas as pd
import numpy as np
from ast import literal_eval
import os.path
from .config import cfg
from ..detection.detection_helpers import scale_detection_boxes, correct_for_shift, crop_bboxes_from_im
# class to wrap annotations
class BBoxAnnotations(object):
def __init__(self, collection_name, relative_path='../'):
# basic paths
self.data_root = cfg.DATA_TEST_DIR
self.num_classes = cfg.TEST.NUM_CLASSES
self.path_to_data_products = '{}data/annotations/'.format(relative_path)
# load collection annotations
self.anno_df = self.load_collection_annotations(collection_name)
if len(self.anno_df) > 0:
print('Load bbox annotations for {} dataset: {} found!'.format(collection_name,
self.anno_df.segm_idx.nunique()))
else:
print('No bbox annotations for {} dataset'.format(collection_name))
def load_collection_annotations(self, collection_name):
# assemble annotation file path
annotation_file = 'bbox_annotations_{}.csv'.format(collection_name)
annotation_file_path = '{}{}'.format(self.path_to_data_products, annotation_file)
# check if annotation file exists
if os.path.isfile(annotation_file_path):
# read annotation file
anno_df = pd.read_csv(annotation_file_path, engine='python')
# convert string of list to list
anno_df['relative_bbox'] = anno_df['relative_bbox'].apply(literal_eval)
anno_df['bbox'] = anno_df['bbox'].apply(literal_eval)
# return data frame
return anno_df
else:
# return empty list (check later with len(.) to see if file exists)
return []
def select_anno_df_by_segm_idx(self, segm_idx):
# wrap pandas logic
return self.anno_df[(self.anno_df.segm_idx == segm_idx)]
def select_anno_df_by_cdli_and_view(self, cdli, view):
# wrap pandas logic
return self.anno_df[(self.anno_df.tablet_CDLI == cdli) & (self.anno_df.view_desc == view)]
# static functions
def get_boxes_and_labels(anno_df):
# retrieves gt_boxes and gt_labels from anno_df
#gt_boxes = np.stack(anno_df.bbox.values)
if len(anno_df) > 0:
gt_boxes = np.stack(anno_df.relative_bbox.values) # use relative bbox
gt_labels = anno_df.train_label.values
else:
gt_boxes, gt_labels = np.array([]), np.array([]) # just dummy
return gt_boxes, gt_labels
def get_class_gt_boxes(gt_boxes, gt_labels, cls_id):
inds = np.where(gt_labels == cls_id)[0]
return gt_boxes[inds, :]
def apply_scaling_and_shift(gt_boxes, scaling=1, shift=0):
# if used, should be applied before calling eval
# apply scaling of detection boxes
gt_boxes = scale_detection_boxes(gt_boxes, scaling)
# apply shift of detection boxes due to center crop
gt_boxes = correct_for_shift(gt_boxes, shift)
return gt_boxes
def apply_scaling(gt_boxes, scaling=1):
# if used, should be applied before calling eval
# apply scaling of detection boxes
gt_boxes = scale_detection_boxes(gt_boxes, scaling)
return gt_boxes
def collect_gt_crops(gt_boxes, gt_labels, im, num_classes, max_vis=2):
# takes tablet image
# returns list of ground truth crops organized by class
gt_crops = [[] for _ in xrange(num_classes)]
for j in xrange(1, num_classes):
BBGT = get_class_gt_boxes(gt_boxes, gt_labels, j).astype(float)
npos = BBGT.shape[0]
if npos > 0:
# get boxes
bboxes = BBGT[:, :4] # remove any additional dims
ncrops = min(max_vis, bboxes.shape[0])
gt_crops[j] = crop_bboxes_from_im(im, bboxes[:ncrops, :])
return gt_crops
def prepare_segment_gt(segm_idx, segm_scale, bbox_anno, with_star_crop=False):
# select, rescale (and optionally shift) the ground truth of a single segment
# create empty lists in case no annotations available
gt_boxes, gt_labels = [], []
if len(bbox_anno.anno_df) > 0:
# select annotations for specific segment
sub_anno_df = bbox_anno.select_anno_df_by_segm_idx(segm_idx)
# get boxes and labels
gt_boxes, gt_labels = get_boxes_and_labels(sub_anno_df)
# adapt gt boxes to input format
if with_star_crop:
gt_boxes = apply_scaling_and_shift(gt_boxes, scaling=segm_scale, shift=-cfg.TEST.SHIFT / 2.)
else:
gt_boxes = apply_scaling(gt_boxes, scaling=segm_scale)
# return selected ground truth
return gt_boxes, gt_labels
# def get_pred_boxes_df(all_boxes, seg_idx):
# # iterate list
# list_boxes = []
# list_cls_idx = []
# for cls, boxes in enumerate(all_boxes):
# num_boxes = len(boxes[0])
# if num_boxes > 0:
# list_boxes.append(boxes)
# list_cls_idx.extend([cls] * num_boxes)
# # create df
# pred_boxes_df = pd.DataFrame() # []
# if len(list_boxes) > 0:
# pred_boxes_df = pd.DataFrame(np.hstack(list_boxes).reshape(-1, 5), columns=['x1', 'y1', 'x2', 'y2', 'conf'])
# pred_boxes_df['cls'] = list_cls_idx
# pred_boxes_df['seg_idx'] = seg_idx
#
# return pred_boxes_df
#
#
# def get_gt_boxes_df(gt_boxes, gt_labels, seg_idx):
# # create df
# gt_boxes_df = pd.DataFrame() # []
# if len(gt_boxes) > 0:
# gt_boxes_df = pd.DataFrame(gt_boxes, columns=['x1', 'y1', 'x2', 'y2'])
# gt_boxes_df['cls'] = gt_labels
# gt_boxes_df['seg_idx'] = seg_idx
# return gt_boxes_df
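`load_collection_annotations` reads boxes from the CSV as stringified Python lists and converts them back with `ast.literal_eval`. `get_boxes_and_labels` then stacks them into an `(N, 4)` array. The sketch below shows that round trip with an in-memory CSV. The three rows are made up; only the column names `segm_idx`, `train_label` and `relative_bbox` are taken from the code above. `get_boxes_and_labels_sketch` is a hypothetical name.

```python
import io
from ast import literal_eval
import numpy as np
import pandas as pd

# made-up CSV content; real files are named bbox_annotations_<collection>.csv
csv_text = '''segm_idx,train_label,relative_bbox
3,12,"[10, 20, 40, 60]"
3,7,"[50, 20, 90, 61]"
4,12,"[5, 5, 30, 30]"
'''

anno_df = pd.read_csv(io.StringIO(csv_text))
# turn "[x1, y1, x2, y2]" strings back into Python lists
anno_df['relative_bbox'] = anno_df['relative_bbox'].apply(literal_eval)

def get_boxes_and_labels_sketch(df):
    """Stack per-row box lists into an (N, 4) array; (0, 4) if there are no rows."""
    if len(df) == 0:
        return np.zeros((0, 4)), np.zeros((0,), dtype=int)
    return np.stack(df['relative_bbox'].values), df['train_label'].values

gt_boxes, gt_labels = get_boxes_and_labels_sketch(anno_df[anno_df.segm_idx == 3])
print(gt_boxes.shape, gt_labels.tolist())  # (2, 4) [12, 7]
```

Returning a `(0, 4)` array for an empty selection, rather than `np.array([])`, keeps column slicing such as `boxes[:, :4]` valid downstream.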
@@ -0,0 +1,101 @@
import pandas as pd
import numpy as np
def get_pred_boxes_df(all_boxes, seg_idx):
# iterate list
list_boxes = []
list_cls_idx = []
for cls, boxes in enumerate(all_boxes):
num_boxes = len(boxes[0])
if num_boxes > 0:
list_boxes.append(boxes)
list_cls_idx.extend([cls] * num_boxes)
# create df
pred_boxes_df = pd.DataFrame() # []
if len(list_boxes) > 0:
pred_boxes_df = pd.DataFrame(np.hstack(list_boxes).reshape(-1, 5), columns=['x1', 'y1', 'x2', 'y2', 'conf'])
pred_boxes_df['cls'] = list_cls_idx
pred_boxes_df['seg_idx'] = seg_idx
return pred_boxes_df
def get_gt_boxes_df(gt_boxes, gt_labels, seg_idx):
# create df
gt_boxes_df = pd.DataFrame() # []
if len(gt_boxes) > 0:
gt_boxes_df = pd.DataFrame(gt_boxes, columns=['x1', 'y1', 'x2', 'y2'])
gt_boxes_df['cls'] = gt_labels
gt_boxes_df['seg_idx'] = seg_idx
return gt_boxes_df
# SSD specific
def convert_detections_for_eval(pred_boxes, pred_labels, pred_scores, total_labels=240):
# convert from ssd detector format to all_boxes
all_boxes = [[] for _ in range(total_labels)]
for boxes, labels, scores in zip(pred_boxes, pred_labels, pred_scores):
for bbox, lbl, score in zip(boxes, labels, scores):
# temp: [ID, cx, cy, score, x1, y1, x2, y2, idx]
# copy data to _new_ all_boxes
box = np.zeros((1, 5))
box[0, :4] = bbox
box[0, 4] = score
all_boxes[int(lbl)].append(box)  # builtin int: the np.int alias was removed in NumPy 1.24
# for each class stack list of bounding boxes together
all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
return all_boxes
def prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds, num_classes=240):
if len(box_preds) > 0:
# convert PyTorch tensors to NumPy arrays for the VOC-style evaluation
pred_boxes = [b.numpy() for b in [box_preds]]
pred_labels = [label.numpy() for label in [label_preds]]
pred_scores = [score.numpy() for score in [score_preds]]
# convert to all boxes and stack tiles (better would be to have single tile for whole segment)
all_boxes = convert_detections_for_eval(pred_boxes, pred_labels, pred_scores, num_classes)
all_boxes = [[el] for el in all_boxes]
else:
# deal with case if there are not any detections
all_boxes = [[] for _ in range(num_classes)]
all_boxes = [[el] for el in all_boxes]
return all_boxes
def prepare_ssd_gt_for_eval(gt_boxes, gt_labels):
gt_boxes = [b.numpy() for b in [gt_boxes]]
gt_labels = [label.numpy() for label in [gt_labels]]
return gt_boxes[0], gt_labels[0]
# alignment specific
def convert_to_all_boxes(seg_gen_annos, relative_bboxes, scale, num_labels):
all_boxes = [[] for _ in range(num_labels)]
for anno_idx, anno_rec in seg_gen_annos.iterrows():
# [x1, y1, x2, y2, score]
box = np.zeros((1, 5))
box[0, :4] = np.array(relative_bboxes[anno_idx]) * scale
box[0, 4] = anno_rec.det_score
# assign to class
all_boxes[anno_rec.newLabel].append(box)
# for each class stack list of bounding boxes together
all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
return all_boxes
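The helpers in this file revolve around the `all_boxes` layout. This is a list indexed by class label whose entries are `(N, 5)` arrays of `[x1, y1, x2, y2, score]`, or empty lists for classes without detections. The sketch below builds that layout from flat per-detection arrays. It then flattens the result back into the one-row-per-detection DataFrame format produced by `get_pred_boxes_df`. It uses the builtin `int()` to index by label, since the `np.int` alias no longer exists in recent NumPy. The function names `to_all_boxes` and `to_long_df` are hypothetical.

```python
import numpy as np
import pandas as pd

def to_all_boxes(boxes, labels, scores, num_labels):
    """Group flat detections into a per-class list of (N, 5) arrays."""
    all_boxes = [[] for _ in range(num_labels)]
    for bbox, lbl, score in zip(boxes, labels, scores):
        all_boxes[int(lbl)].append(np.append(np.asarray(bbox, dtype=float), score))
    # stack rows per class; classes without detections stay empty lists
    return [np.stack(rows) if len(rows) > 0 else [] for rows in all_boxes]

def to_long_df(all_boxes, seg_idx):
    """Flatten per-class arrays into one DataFrame row per detection."""
    frames = []
    for cls, arr in enumerate(all_boxes):
        if len(arr) > 0:
            df = pd.DataFrame(arr, columns=['x1', 'y1', 'x2', 'y2', 'conf'])
            df['cls'] = cls
            frames.append(df)
    if not frames:
        return pd.DataFrame(columns=['x1', 'y1', 'x2', 'y2', 'conf', 'cls', 'seg_idx'])
    out = pd.concat(frames, ignore_index=True)
    out['seg_idx'] = seg_idx
    return out

all_boxes = to_all_boxes([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]],
                         labels=[3, 1, 3], scores=[0.9, 0.8, 0.4], num_labels=5)
print([len(b) for b in all_boxes])  # [0, 1, 0, 2, 0]
print(to_long_df(all_boxes, seg_idx=7))
```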
@@ -0,0 +1,187 @@
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from .sign_evaluation_prep import (get_pred_boxes_df, get_gt_boxes_df)
from .sign_evaluation import (eval_detector, eval_detector_on_collection, prepare_eval_df,
compute_global_ap, compute_mean_ap)
# *BASIC* EVALUATION
class SignEvalBasic(object):
def __init__(self, model_version, collection_name, eval_ovthresh=0.5):
self.model_version = model_version
self.coll_name = collection_name
self.eval_ovthresh = eval_ovthresh
self.list_seg_mean_ap = []
self.list_df_stats = []
#self.list_seg_name_with_anno = []
self.list_pred_boxes_df = []
self.list_gt_boxes_df = []
self.gt_boxes_df = pd.DataFrame()
self.pred_boxes_df = pd.DataFrame()
def eval_segment(self, all_boxes, gt_boxes, gt_labels, seg_idx, verbose=True):
# evaluate and print stats
acc, df_stats = eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=self.eval_ovthresh, verbose=verbose)
# collect results
self.list_seg_mean_ap.append(df_stats['ap'].mean())
self.list_df_stats.append(df_stats)
#list_seg_name_with_anno.append(image_name + view_desc)
# prepare full collection evaluation
if type(all_boxes[0]) is np.ndarray:
self.list_pred_boxes_df.append(get_pred_boxes_df([el.tolist() for el in all_boxes], seg_idx))
self.list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
else:
self.list_pred_boxes_df.append(get_pred_boxes_df(all_boxes, seg_idx))
self.list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
def prepare_eval_collection(self):
self.gt_boxes_df = pd.concat(self.list_gt_boxes_df, ignore_index=True)
self.pred_boxes_df = pd.concat(self.list_pred_boxes_df, ignore_index=True)
def eval_collection(self, verbose=True):
self.prepare_eval_collection()
acc, df_stats = eval_detector_on_collection(self.gt_boxes_df, self.pred_boxes_df, ovthresh=self.eval_ovthresh)
return acc, df_stats
# *FAST* EVALUATION
class SignEvalFast(object):
def __init__(self, model_version, collection_name, tp_thresh=0.5, bg_thresh=0.2, num_classes=240):
self.model_version = model_version
self.coll_name = collection_name
self.tp_thresh = tp_thresh
self.bg_thresh = bg_thresh
self.num_classes = num_classes
self.list_seg_mean_ap = []
self.list_eval_df = []
self.list_gt_boxes_df = []
self.list_seg_global_ap = []
self.col_eval_df = pd.DataFrame()
self.gt_boxes_df = pd.DataFrame()
def eval_segment(self, all_boxes, gt_boxes, gt_labels, seg_idx, verbose=True):
# get eval_df
eval_df = prepare_eval_df(all_boxes, gt_boxes, gt_labels, seg_idx, self.tp_thresh, self.bg_thresh)
# get gt_df
gt_df = get_gt_boxes_df(gt_boxes, gt_labels, seg_idx)
mean_ap, global_ap, mean_ap_align = 0., 0., 0.
if len(eval_df) > 0 and len(gt_df[gt_df.cls > 0]) > 0:
# eval
det_stats = compute_mean_ap(eval_df, gt_df, self.num_classes, verbose=False)
global_ap = compute_global_ap(eval_df, gt_df, self.num_classes, verbose=False)
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
mean_ap = np.mean(df_stats.ap)
# mean_ap_align = np.mean(df_stats.ap[df_stats.ap.nonzero()[0]])
mean_ap_align = np.mean(df_stats.ap[df_stats.num_det > 0]) # only consider classes with detections
if verbose:
print('mAP {:.4} | global AP: {:.4} | mAP (align): {:.4}'.format(mean_ap, global_ap, mean_ap_align))
print("total_tp: {} | total_fp: {} [{}] | acc: {:.2}".format(*get_summary(eval_df, gt_df)))
else:
if verbose:
print('mAP {:.4} | global AP: {:.4} | mAP (align): {}'.format(mean_ap, global_ap, mean_ap_align))
print("total_tp: {} | total_fp: {} [{}] | acc: {:.2}".format(0, 0, 0, 0.))
# append
self.list_seg_mean_ap.append(mean_ap)
self.list_seg_global_ap.append(global_ap)
self.list_eval_df.append(eval_df)
self.list_gt_boxes_df.append(gt_df)
def prepare_eval_collection(self, verbose=False):
if len(self.col_eval_df) == 0:
if len(self.list_eval_df) > 0: # only concat if there is anything to concat
# concat dataframes
self.col_eval_df = pd.concat(self.list_eval_df)
self.gt_boxes_df = pd.concat(self.list_gt_boxes_df, ignore_index=True)
if verbose:
print(self.col_eval_df.det_type.value_counts())
print("num det:", len(self.col_eval_df))
print("num TP (without double detections):",
len(self.col_eval_df[(self.col_eval_df.max_det == True)
& (self.col_eval_df.det_type == 3)]))
def eval_collection(self, verbose=True):
# concat dataframes
self.prepare_eval_collection()
global_ap = 0
df_stats = pd.DataFrame()
if len(self.gt_boxes_df) > 0:
# full collection eval
det_stats = compute_mean_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes, verbose=False)
global_ap = compute_global_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes, verbose=False)
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
mean_ap = np.mean(det_stats[:, -2])
mean_ap_align = np.mean(df_stats.ap[df_stats.num_det > 0]) # only consider classes with detections
if verbose:
print('{} | {}'.format(self.coll_name, self.model_version))
print('RESULTS ON FULL COLLECTION :')
print('mAP {:.4} | global AP: {:.4} | mAP (align): {:.4}'.format(mean_ap, global_ap, mean_ap_align))
print("total_tp: {} | total_fp: {} [{}] | prec: {:.3}".format(*self.get_col_summary()))
return df_stats, global_ap
def eval_collection_class_freq(self, freq_classes_list):
# freq_classes_list: sorted list of most frequent classes (in descending order)
# concat dataframes
self.prepare_eval_collection()
# compute mAP for different sets of topk most frequent classes
topk_list = [2, 4, 8, 16, 32, 64, 128, 192, 256]
topk_mAP_list = []
for topk in topk_list:
print("over {} most freq classes".format(topk))
det_stats = compute_mean_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes,
class_list=freq_classes_list[:topk])
mean_ap = np.mean(det_stats[:, -2])
topk_mAP_list.append(mean_ap)
# plot
plt.figure()
plt.plot(topk_list, topk_mAP_list, "o-")
plt.title('{} - {}'.format(self.coll_name, self.model_version))
plt.ylabel('mAP')
plt.xlabel('topk')
# plt.xscale('log')
def get_seg_summary(self, didx):
""" didx: index of segment in list of segments to evaluate """
num_tp, num_fp, num_fp_global, acc = get_summary(self.list_eval_df[didx], self.list_gt_boxes_df[didx])
mean_ap = self.list_seg_mean_ap[didx]
global_ap = self.list_seg_global_ap[didx]
return num_tp, num_fp, num_fp_global, acc, mean_ap, global_ap
def get_col_summary(self):
num_tp, num_fp, num_fp_global, acc = get_summary(self.col_eval_df, self.gt_boxes_df)
return num_tp, num_fp, num_fp_global, acc
def get_summary(col_eval_df, gt_boxes_df):
if len(gt_boxes_df) > 0 and len(col_eval_df) > 0:
select_tp = (col_eval_df.det_type == 3) & (col_eval_df.max_det == True)
select_fp = (~select_tp) & col_eval_df.pred.isin(gt_boxes_df.cls.unique())
num_tp = select_tp.sum()
num_fp = select_fp.sum()
num_fp_global = (~select_tp).sum()
return num_tp, num_fp, num_fp_global, num_tp / float(num_tp + num_fp)
else:
return 0, 0, 0, 0.
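`get_summary` reports a precision-style ratio. A detection counts as TP only if it is a TP (`det_type == 3`) and also the highest-scoring one for its ground-truth box (`max_det`). The FP count in the denominator is restricted to predicted classes that occur in the ground truth, and the unrestricted count is returned separately as `num_fp_global`. The sketch below reproduces the same bookkeeping on made-up data and adds an explicit guard for a zero denominator. `summarize_sketch` is a hypothetical name.

```python
import pandas as pd

def summarize_sketch(eval_df, gt_df):
    """Return (num_tp, num_fp, num_fp_global, precision) in the spirit of get_summary."""
    if len(eval_df) == 0 or len(gt_df) == 0:
        return 0, 0, 0, 0.0
    is_tp = (eval_df['det_type'] == 3) & eval_df['max_det']
    in_gt_classes = eval_df['pred'].isin(gt_df['cls'].unique())
    num_tp = int(is_tp.sum())
    num_fp = int((~is_tp & in_gt_classes).sum())   # FPs of classes present in GT
    num_fp_global = int((~is_tp).sum())            # all FPs
    denom = num_tp + num_fp
    return num_tp, num_fp, num_fp_global, (num_tp / float(denom) if denom > 0 else 0.0)

eval_df = pd.DataFrame({'det_type': [3, 3, 0, 2],
                        'max_det': [True, False, False, False],
                        'pred': [5, 5, 5, 9]})
gt_df = pd.DataFrame({'cls': [5, 5, 7]})
print(summarize_sketch(eval_df, gt_df))  # (1, 2, 3, 0.3333333333333333)
```

The second row is a double detection. It has `det_type` 3 but not `max_det`, so it counts as a FP. The last row predicts class 9, which is absent from the ground truth, so it appears only in `num_fp_global`.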
@@ -0,0 +1,129 @@
import numpy as np
import pandas as pd
import editdistance
import Levenshtein
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction
from .sign_evaluation import evaluate_on_gt
# deprecated (should be handled by sign_evaluator)
def get_eval_stats(gt_boxes, gt_labels, aligned_list):
# evaluate
num_imgs = 1
all_tp, all_fp, det_stats, total_num_tp = evaluate_on_gt(gt_boxes, gt_labels,
num_imgs, [[el] for el in aligned_list])
total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
# print stats
pd.set_option('display.max_rows', 50)
df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
print("total_tp", total_num_tp, "total_fp", total_num_fp,
# here precision = accuracy
"acc", '{:0.2f}'.format(total_num_tp / float(total_num_tp + total_num_fp)),
"mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
"mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
acc = total_num_tp / float(total_num_tp + total_num_fp)
return acc, df_stats
# deprecated (should be handled by sign_evaluator)
def compute_accuracy(gt_boxes, gt_labels, aligned_list, return_stats=False):
# only run if gt available
if len(gt_boxes) > 0:
acc, df_stats = get_eval_stats(gt_boxes, gt_labels, aligned_list)
if return_stats:
return acc, df_stats
else:
return acc
else:
return -1
def convert_alignments_for_eval(detections, total_labels=240):
# convert from RANSAC format (Nx9) to all_boxes
all_boxes = [[] for _ in range(total_labels)]
for temp in detections:
# temp: [ID, cx, cy, score, x1, y1, x2, y2, idx]
# copy data to _new_ all_boxes
box = np.zeros((1, 5))
box[0, :4] = temp[4:8]
box[0, 4] = temp[3]
all_boxes[int(temp[0])].append(box)  # builtin int: the np.int alias was removed in NumPy 1.24
# for each class stack list of bounding boxes together
all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
return all_boxes
# SCORE FUNCTIONS
def compute_bleu_score(candidate_words, reference_words):
reference = [reference_words]
candidate = candidate_words
# compute score
# use uniform n-gram weights up to the sequence length when it is shorter than 4 tokens,
# to work around https://github.com/nltk/nltk/issues/1554
hyp_lengths = len(reference_words)
weights = (0.25, 0.25, 0.25, 0.25)
if hyp_lengths < 4:
if hyp_lengths == 0:
weights = (0, )
else:
weights = (1 / float(hyp_lengths), ) * hyp_lengths
chencherry = SmoothingFunction()
score = sentence_bleu(reference, candidate, weights=weights, smoothing_function=chencherry.method1)
return score
def compute_levenshtein(candidate, reference, normalize=True):
edist = 0
if len(reference) > 0:
# strict normalization in [0,1] range
edist = editdistance.eval(reference, candidate)
if normalize:
edist = float(edist) / max(len(reference), len(candidate))
return edist
def compute_cer(candidate, reference):
# character error rate (see also WER)
# character accuracy 1 - CER
edist = 0
if len(reference) > 0:
edist = editdistance.eval(reference, candidate)
edist = float(edist) / len(reference)
return edist
def compute_levenshtein_ops(candidate, reference, normalize=True):
# https://rawgit.com/ztane/python-Levenshtein/master/docs/Levenshtein.html
ops_dict = {'insert': 0, 'delete': 1, 'replace': 2}
# print candidate, reference
edist = 0
edit_ops = np.zeros(len(ops_dict))
if len(reference) > 0:
# convert to string for Levenshtein function
candidate_str = u''.join([unichr(lbl) for lbl in candidate])
reference_str = u''.join([unichr(lbl) for lbl in reference])
# compute ed ops
ops_df = pd.DataFrame(Levenshtein.editops(candidate_str, reference_str), columns=['type', 'ixA', 'ixB'])
edist = len(ops_df)
# collect types
for op, ii in ops_dict.iteritems():
edit_ops[ii] = len(ops_df[ops_df.type == op])
if normalize:
edist = float(edist) / max(len(reference), len(candidate))
return edist, edit_ops
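`compute_levenshtein` divides the edit distance by the length of the longer sequence, which keeps the result in [0, 1]. `compute_cer` divides by the reference length only, so a candidate with many spurious signs can yield a CER above 1. Both call the third-party `editdistance` package. The sketch below swaps in a plain dynamic-programming Levenshtein distance (a substitute, not that library) so the difference can be checked without extra dependencies. Sequences are lists of integer sign labels, as above. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def normalized_levenshtein(candidate, reference):
    """Distance / max(len) as in compute_levenshtein; 0 for an empty reference."""
    if len(reference) == 0:
        return 0
    return float(levenshtein(reference, candidate)) / max(len(reference), len(candidate))

def cer(candidate, reference):
    """Character (here: sign) error rate: distance / len(reference)."""
    if len(reference) == 0:
        return 0
    return float(levenshtein(reference, candidate)) / len(reference)

reference = [12, 7, 33]
candidate = [12, 7, 7, 33, 40, 41, 42]   # four inserted signs
print(cer(candidate, reference))                     # 4 / 3, i.e. above 1
print(normalized_levenshtein(candidate, reference))  # 4 / 7
```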
@@ -0,0 +1,7 @@
### Network architectures
- `linenet.py` : a modified AlexNet used for line segmentation
- `mobilenetv2_mod03.py` : a modified MobileNetV2 used as backbone for the sign detector
- `mobilenetv2_fpn.py` : an FPN (feature pyramid network) wrapper for the backbone architecture
- `trained_model_loader.py` : contains functions to load the sign detector and line segmentation models;
describes how the detectors are assembled from their parts.
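In a feature pyramid network, the coarse top-level map is upsampled and added to a 1×1-projected lateral map from an earlier, higher-resolution backbone stage. In `mobilenetv2_fpn.py` this corresponds to the `p4 = self._upsample_add(p5, self.latlayer1(c4))` step, whose body is not part of this excerpt. Below is a framework-free NumPy sketch of the idea. It assumes nearest-neighbour upsampling by an integer factor (FPN implementations commonly use nearest or bilinear interpolation). It also uses a channel-mixing matrix product in place of the 1×1 convolution. All names and sizes are illustrative.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) map: the same channel mix at every pixel."""
    return np.einsum('oc,chw->ohw', weight, x) + bias[:, None, None]

def upsample_add(top, lateral):
    """Nearest-neighbour upsample `top` to `lateral`'s size (integer factor assumed) and add."""
    _, h, w = lateral.shape
    sy, sx = h // top.shape[1], w // top.shape[2]
    up = top.repeat(sy, axis=1).repeat(sx, axis=2)
    return up + lateral

rng = np.random.default_rng(0)
c4 = rng.standard_normal((32, 14, 14))    # higher-resolution backbone stage
p5 = rng.standard_normal((256, 7, 7))     # coarser top-down map
w_lat, b_lat = rng.standard_normal((256, 32)), np.zeros(256)
p4 = upsample_add(p5, conv1x1(c4, w_lat, b_lat))
print(p4.shape)  # (256, 14, 14)
```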
@@ -0,0 +1,192 @@
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.nn.functional as F
# HELPER FUNCTIONS
def initialize_weights(model):
for m in model.modules():
if isinstance(m, nn.Conv2d):
# n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
# m.weight.data.normal_(0, math.sqrt(2. / n))
# init.xavier_normal(m.weight.data)
# init.kaiming_normal(m.weight.data)
init.normal_(m.weight.data, std=0.01)
# check if bias = True
if hasattr(m.bias, 'data'):
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
m.weight.data.normal_(0, 0.005)
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
# check if affine = True
if hasattr(m.bias, 'data'):
m.weight.data.fill_(1)
m.bias.data.zero_()
def copy_layer_params(target, source):
""" Copy layer parameters from source to target; size of arrays needs to match! """
target.weight.data.copy_(source.weight.data.view(target.weight.size()))
target.bias.data.copy_(source.bias.data.view(target.bias.size()))
# HELPER MODULES
class LRN(nn.Module):
def __init__(self, local_size=1, alpha=1.0, beta=0.75, ACROSS_CHANNELS=False):
super(LRN, self).__init__()
self.ACROSS_CHANNELS = ACROSS_CHANNELS
if self.ACROSS_CHANNELS:
# make it work with pytorch 0.2.X # hacky!!! should be ConstantPadding
# self.average = nn.Sequential(
# nn.ReplicationPad3d(padding=(0, 0, 0, 0, int((local_size - 1.0) / 2), int((local_size - 1.0) / 2))),
# nn.AvgPool3d(kernel_size=(local_size, 1, 1), stride=1),
# )
self.average = nn.AvgPool3d(kernel_size=(local_size, 1, 1),
stride=1,
padding=(int((local_size - 1.0) / 2), 0, 0))
else:
self.average = nn.AvgPool2d(kernel_size=local_size,
stride=1,
padding=int((local_size - 1.0) / 2))
self.alpha = alpha
self.beta = beta
def forward(self, x):
if self.ACROSS_CHANNELS:
div = x.pow(2).unsqueeze(1)
div = self.average(div).squeeze(1)
div = div.mul(self.alpha).add(1.0).pow(self.beta)
else:
div = x.pow(2)
div = self.average(div)
div = div.mul(self.alpha).add(1.0).pow(self.beta)
x = x.div(div)
return x
class Softmax3D(nn.Module):
def forward(self, input_):
batch_size = input_.size()[0]
output_ = torch.stack([F.softmax(input_[i], dim=0) for i in range(batch_size)], 0)  # softmax over channels of each (C, H, W) map
return output_
# MAIN MODULES
class LineNet(nn.Module):
def __init__(self, num_classes=1000, input_channels=3):
super(LineNet, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(input_channels, 64, kernel_size=11, stride=4, padding=0),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
LRN(alpha=1e-4, beta=0.75, local_size=1),
nn.Conv2d(64, 256, kernel_size=5, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
LRN(alpha=1e-4, beta=0.75, local_size=1),
nn.Conv2d(256, 384, kernel_size=3, padding=1),
nn.BatchNorm2d(384, affine=False, momentum=.1),
nn.ReLU(inplace=True),
nn.Conv2d(384, 384, kernel_size=3, padding=1),
nn.BatchNorm2d(384, affine=False, momentum=.1),
nn.ReLU(inplace=True),
nn.Conv2d(384, 256, kernel_size=3, padding=1),
nn.BatchNorm2d(256, affine=False, momentum=.1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
)
self.fc6 = nn.Linear(256 * 6 * 6, 512)
self.score = nn.Linear(512, 240)
self.classifier = nn.Sequential(
self.fc6,
nn.ReLU(inplace=True),
nn.Dropout(),
self.score,
)
self.line_score = nn.Linear(512, num_classes)
self.line_classifier = nn.Sequential(
self.fc6,
nn.ReLU(inplace=True),
nn.Dropout(),
self.line_score,
)
initialize_weights(self)
def legacy_forward(self, x):
x = self.features(x)
x = x.view(x.size(0), 256 * 6 * 6) # x = x.view(x.size(0), -1)
x = self.classifier(x)
return x
def forward(self, x):
x = self.features(x)
x = x.view(x.size(0), 256 * 6 * 6) # x = x.view(x.size(0), -1)
x = self.line_classifier(x)
return x
class LineNetFCN(nn.Module):
    def __init__(self, original_model, num_classes=240):
        super(LineNetFCN, self).__init__()
        # simple assign, no copy! (could use copy.deepcopy(), but assume original_model is not used anymore)
        self.features = original_model.features
        # alternative: create a new module and load the feature weights
        # original_net = CuneiNet(input_channels=1)
        # self.features = original_net.features
        # self.features.load_state_dict(original_model.features.state_dict())
        # softmax over the class (channel) dimension at every spatial location
        self.softmax = nn.Softmax2d()  # Softmax3D(),
        # create FCN head: fc6 becomes a 6x6 conv, line_score becomes a 1x1 conv
        self.classifier = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=6, padding=0),
            nn.ReLU(inplace=True),
            # nn.Dropout(), do NOT use 1d dropout here!
            nn.Dropout2d(),
            nn.Conv2d(512, num_classes, kernel_size=1, padding=0),
            # self.softmax  # not applied here; the softmax is applied in forward()
        )
        # perform net surgery
        self.net_surgery(original_model)

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        x = self.softmax(x)
        # batch_size = x.size()[0]
        # x = torch.stack([F.softmax(x[i]) for i in range(batch_size)], 0)
        return x

    def get_conv_features(self, x):
        x = self.features(x)
        return x

    def get_fc_features(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

    def net_surgery(self, original_model):
        """ Perform net surgery: copy the weights of the linear layers in
        original_model.line_classifier into the matching conv layers of self.classifier
        (both heads have the same layer order, so indices line up).
        """
        for i, l1 in enumerate(original_model.line_classifier):
            if isinstance(l1, nn.Linear):
                l2 = self.classifier[i]
                # l2.weight.data.copy_(l1.weight.data.view(l2.weight.size()))
                # l2.bias.data.copy_(l1.bias.data.view(l2.bias.size()))
                copy_layer_params(l2, l1)
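The net surgery in `LineNetFCN` relies on the fact that a fully connected layer applied to a flattened C x 6 x 6 feature map computes the same thing as a 6x6 convolution whose weights are the reshaped linear weights. A minimal sketch of that equivalence, assuming `copy_layer_params` (not shown in this listing) performs such a reshape-and-copy; `linear_to_conv` is a hypothetical stand-in and the layer sizes are reduced to keep it fast:

```python
import torch
import torch.nn as nn


def linear_to_conv(linear, in_channels, kernel_size):
    """Hypothetical helper: build a Conv2d equivalent to `linear`
    on a (in_channels, kernel_size, kernel_size) input."""
    conv = nn.Conv2d(in_channels, linear.out_features, kernel_size=kernel_size, padding=0)
    with torch.no_grad():
        # Linear weight [out, C*k*k] -> conv weight [out, C, k, k];
        # same channel-major order that x.view(N, -1) uses when flattening
        conv.weight.copy_(linear.weight.view(conv.weight.size()))
        conv.bias.copy_(linear.bias)
    return conv


torch.manual_seed(0)
C, K, OUT = 8, 6, 16  # reduced from 256, 6, 512 for illustration
fc = nn.Linear(C * K * K, OUT)
conv = linear_to_conv(fc, C, K)

x = torch.randn(2, C, K, K)
y_fc = fc(x.view(x.size(0), -1))  # [2, OUT]
y_conv = conv(x)                  # [2, OUT, 1, 1]
print(torch.allclose(y_fc, y_conv.view(2, -1), atol=1e-5))  # True

# on a larger input the conv head slides over the image and yields a score map
print(conv(torch.randn(1, C, 10, 12)).shape)  # torch.Size([1, 16, 5, 7])
```

This is why the FCN head can score every window of a full tablet image in one pass instead of cropping 6x6 feature patches one by one.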
@@ -0,0 +1,93 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import math


class MobileNetV2FPN(nn.Module):
    def __init__(self, original_model, num_classes=240, width_mult=1, with_p4=False):
        super(MobileNetV2FPN, self).__init__()
        # simple assign, no copy! (could use copy.deepcopy(), but assume original_model is not used anymore)
        self.features = original_model.features
        self.conv6 = nn.Conv2d(512, 256, kernel_size=3, stride=2, padding=1)
        # Top-down layers
        self.toplayer = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
        self.with_p4 = with_p4
        if self.with_p4:
            # Lateral layers
            self.latlayer1 = nn.Conv2d(int(32*width_mult), 256, kernel_size=1, stride=1, padding=0)
            # Smooth layers
            self.smooth1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        # init weights (exclude features) TODO
        self._initialize_weights(['conv6', 'toplayer', 'latlayer1', 'smooth1'])

    def forward(self, x):
        for i in range(3):
            x = self.features[i](x)
        c4 = self.features[3](x)  # 14 x 32*width_mult (expansion factor does not affect output of block)
        x = self.features[4](c4)
        x = self.features[5](x)
        x = self.features[6](x)  # 7 x 160*width_mult
        c5 = self.features[7](x)  # 7 x 512
        p6 = self.conv6(c5)
        # Top-down
        p5 = self.toplayer(c5)
        if self.with_p4:
            p4 = self._upsample_add(p5, self.latlayer1(c4))
            p4 = self.smooth1(p4)
            return p4, p5, p6
        else:
            return p5, p6

    def _upsample_add(self, x, y):
        '''Upsample and add two feature maps.
        Args:
          x: (Tensor) top feature map to be upsampled.
          y: (Tensor) lateral feature map.
        Returns:
          (Tensor) added feature map.
        Note: in PyTorch, when the input size is odd, the feature map upsampled
        with `F.interpolate(..., scale_factor=2, mode='nearest')`
        may not match the size of the lateral feature map.
        e.g.
        original input size: [N,_,15,15] ->
        conv2d feature map size: [N,_,8,8] ->
        upsampled feature map size: [N,_,16,16]
        So we use bilinear upsampling to an explicit output size instead.
        '''
        _, _, H, W = y.size()
        # F.upsample is deprecated; F.interpolate is its drop-in replacement
        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False) + y

    def _initialize_weights(self, name_list):
        for name, m in self.named_modules():
            # only init modules in name_list
            if name in name_list:
                # exclude self.features, Mobile_blocks
                if isinstance(m, nn.Conv2d):
                    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                    m.weight.data.normal_(0, math.sqrt(2. / n))
                    if m.bias is not None:
                        m.bias.data.zero_()
                elif isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
                elif isinstance(m, nn.Linear):
                    n = m.weight.size(1)
                    m.weight.data.normal_(0, 0.01)
                    m.bias.data.zero_()
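The size mismatch described in the `_upsample_add` docstring is easy to reproduce. A small sketch with arbitrary channel counts and random tensors, using `F.interpolate` (the current name of the deprecated `F.upsample`):

```python
import torch
import torch.nn.functional as F

# a lateral map with an odd size, e.g. derived from a 15x15 input
lateral = torch.randn(1, 4, 15, 15)
# the coarser top map after a stride-2 conv: ceil(15 / 2) = 8
top = torch.randn(1, 4, 8, 8)

# doubling the size does not recover 15x15 ...
doubled = F.interpolate(top, scale_factor=2, mode='nearest')
print(doubled.shape)  # torch.Size([1, 4, 16, 16])

# ... so upsample to the lateral map's exact size before adding, as _upsample_add does
_, _, H, W = lateral.size()
merged = F.interpolate(top, size=(H, W), mode='bilinear', align_corners=False) + lateral
print(merged.shape)  # torch.Size([1, 4, 15, 15])
```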
@@ -0,0 +1,160 @@
import torch.nn as nn
import math


def conv_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        self.use_res_connect = self.stride == 1 and inp == oup

        self.conv = nn.Sequential(
            # pw
            nn.Conv2d(inp, inp * expand_ratio, 1, 1, 0, bias=False),
            nn.BatchNorm2d(inp * expand_ratio),
            nn.ReLU6(inplace=True),
            # dw
            nn.Conv2d(inp * expand_ratio, inp * expand_ratio, 3, stride, 1, groups=inp * expand_ratio, bias=False),
            nn.BatchNorm2d(inp * expand_ratio),
            nn.ReLU6(inplace=True),
            # pw-linear
            nn.Conv2d(inp * expand_ratio, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)
class MobileBlock(nn.Module):
    # introduced to simplify creation of FPN
    def __init__(self, residual_setting, input_channel, output_channel):
        super(MobileBlock, self).__init__()
        t, c, n, s = residual_setting
        block_seq = []
        for i in range(n):
            if i == 0:
                block_seq.append(InvertedResidual(input_channel, output_channel, s, t))
            else:
                block_seq.append(InvertedResidual(input_channel, output_channel, 1, t))
            input_channel = output_channel
        self.mobile_block = nn.Sequential(*block_seq)
        self.output_channel = output_channel

    def forward(self, x):
        return self.mobile_block(x)
class MobileNetV2(nn.Module):
    def __init__(self, n_class=1000, input_size=224, input_dim=1, width_mult=1., arch_opt=1):
        super(MobileNetV2, self).__init__()
        # setting of inverted residual blocks
        self.interverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 2],
            # [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 1],
            # [6, 320, 1, 1],
        ]
        # set arch option
        self.arch_opt = arch_opt

        # building first layer
        assert input_size % 32 == 0
        input_channel = int(32 * width_mult)
        # self.last_channel = int(1280 * width_mult) if width_mult > 1.0 else 1280
        if self.arch_opt == 1:
            self.last_channel = int(512 * width_mult) if width_mult > 1.0 else 512
        elif self.arch_opt == 2:
            self.last_channel = int(256 * width_mult) if width_mult > 1.0 else 256
        else:
            raise ValueError('arch_opt must be 1 or 2, got {}'.format(arch_opt))
        self.features = [conv_bn(input_dim, input_channel, 2)]
        # building inverted residual blocks
        for ii, residual_setting in enumerate(self.interverted_residual_setting):
            t, c, n, s = residual_setting
            output_channel = int(c * width_mult)
            new_block = MobileBlock(residual_setting, input_channel, output_channel)
            self.features.append(new_block)
            input_channel = new_block.output_channel
        # building last several layers
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        if self.arch_opt == 1:
            # kernel size must be an int: use floor division (input_size / 32 is a float in Python 3)
            self.features.append(nn.AvgPool2d(input_size // 32, stride=1))
            # need stride=1 for FCN (because default stride is kernel_size)
            # self.features.append(nn.MaxPool2d(kernel_size=input_size // 32, stride=1))
            # building classifier
            self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Linear(self.last_channel, n_class),
            )
        elif self.arch_opt == 2:
            # building classifier
            self.classifier = nn.Sequential(
                nn.Linear(self.last_channel * 7 * 7, 384),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(384, n_class),
            )
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)

        self._initialize_weights()

    def forward(self, x):
        for layer in self.features:
            x = layer(x)
        # x = x.view(-1, self.last_channel)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
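`MobileNetV2` asserts `input_size % 32 == 0` because the stem convolution and the first four inverted-residual stages each halve the resolution (the last two stages use stride 1), so the final map is `input_size / 32` pixels on a side, which is also the `AvgPool2d` kernel size. The hypothetical helper below applies the standard output-size formula for a 3x3 convolution with padding 1 to the `s` column of `interverted_residual_setting`; for a 224-pixel input it reproduces the sizes 14 (c4) and 7 (c5) noted in the comments of `MobileNetV2FPN.forward`.

```python
def conv_out_size(size, kernel_size=3, stride=1, padding=1):
    # standard convolution output size: floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1


def stage_sizes(input_size, stage_strides=(2, 2, 2, 2, 1, 1)):
    """Spatial size after the stem conv and after each inverted-residual stage.

    stage_strides mirrors the `s` column of interverted_residual_setting;
    only the first block of a stage uses that stride, the others use 1,
    and 1x1 convolutions never change the spatial size.
    """
    sizes = [conv_out_size(input_size, stride=2)]  # conv_bn stem, stride 2
    for s in stage_strides:
        sizes.append(conv_out_size(sizes[-1], stride=s))
    return sizes


sizes = stage_sizes(224)
print(sizes)                           # [112, 56, 28, 14, 7, 7, 7]
print(sizes[3], sizes[-1], 224 // 32)  # 14 7 7  (features[3] = c4, last stage = c5)
```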
@@ -0,0 +1,99 @@
import torch
import os
from .linenet import LineNet, LineNetFCN
from .mobilenetv2_mod03 import MobileNetV2
from .mobilenetv2_fpn import MobileNetV2FPN
from ..utils.torchcv.models.net import FPNSSD
from ..utils.torchcv.models.rpn_net import RPN


def get_cunei_net_basic(model_version, device, arch_type, arch_opt=1, width_mult=0.5,
                        relative_path='../../', num_classes=240, num_c=1):
    # create classifier model
    basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
                            arch_opt=arch_opt)
    # load pretrained weights (map_location allows loading GPU-trained weights on a CPU-only machine)
    weights_path = '{}results/weights/cuneiNet_basic_{}.pth'.format(relative_path, model_version)
    basic_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
    # deploy to device and switch to evaluation mode
    basic_net.to(device)
    basic_net.eval()  # ATTENTION!
    return basic_net
def get_line_net_fcn(model_version, device, relative_path='../../', num_classes=2, num_c=1):
    # choose model filename
    weights_path = '{}results/weights/lineNet_basic_{}.pth'.format(relative_path, model_version)
    assert os.path.exists(weights_path), "File '{}' not found!".format(weights_path)
    # load model definition
    model_ft = LineNet(num_classes=num_classes, input_channels=num_c)
    # load model weights (map_location allows loading GPU-trained weights on a CPU-only machine)
    model_ft.load_state_dict(torch.load(weights_path, map_location=device), strict=False)
    # create fully-convolutional version (convolutionalize)
    model_fcn = LineNetFCN(model_ft, num_classes)
    # deploy model to device
    model_fcn = model_fcn.to(device)
    # switch model to evaluation mode
    # model_fcn.train(False)
    model_fcn.eval()
    return model_fcn
def get_fpn_ssd_net(model_version, device, arch_type, with_64, arch_opt=1, width_mult=0.5,
                    relative_path='../../', num_classes=240, num_c=1, rnd_init_model=False):
    # create classifier model
    basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
                            arch_opt=arch_opt)
    # create FPN model with classifier model (with_64 is passed on as with_p4)
    fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64)
    # create full detector net
    fpnssd_net = FPNSSD(fpn_net, num_classes)
    if not rnd_init_model:
        # load pretrained weights
        weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)
        fpnssd_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
    # deploy to device and switch to evaluation mode
    fpnssd_net.to(device)
    fpnssd_net.eval()
    return fpnssd_net
def get_rpn_net(model_version, device, arch_type, with_64, arch_opt=1, width_mult=0.5,
                relative_path='../../', num_classes=240, num_c=1):
    # create classifier model
    basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
                            arch_opt=arch_opt)
    # create FPN model with classifier model (with_64 is passed on as with_p4)
    fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64)
    # create full detector net
    rpn_net = RPN(fpn_net, num_classes, with_64)
    # load pretrained weights (map_location allows loading GPU-trained weights on a CPU-only machine)
    weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)
    rpn_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
    # deploy to device and switch to evaluation mode
    rpn_net.to(device)
    rpn_net.eval()
    return rpn_net
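All loaders above follow the same pattern: build the architecture, load a saved `state_dict`, move the model to `device`, and switch to evaluation mode. `torch.load` restores tensors on the device they were saved from, so weights saved on a GPU only load on a CPU-only machine when `map_location` is given. A minimal, self-contained sketch of the round trip, using a throwaway `nn.Linear` and a temporary file (the file name is hypothetical) in place of the real `results/weights/*.pth` files:

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device('cpu')  # assumption: any torch.device works the same way

model = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, 'demo_weights.pth')  # hypothetical file name
    torch.save(model.state_dict(), weights_path)

    # rebuild the architecture, then load the weights onto the requested device
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(weights_path, map_location=device))
    restored.to(device)
    restored.eval()  # inference mode: disables dropout, uses running batch-norm statistics

print(torch.equal(model.weight, restored.weight))  # True
```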
@@ -0,0 +1,32 @@
# -*- coding: utf-8 -*-
"""
@author: tdencker
"""
import pandas as pd


class SignsStats(object):
    def __init__(self, tblSignHeight=128, stats_csv_file='../../data/unicode_sign_stats.csv'):
        self.tblSignHeight = tblSignHeight
        self.sign_df = None
        # load stats file
        self.load_stats_from_file(stats_csv_file)

    def load_stats_from_file(self, stats_csv_file):
        # Load sign stats
        sign_df = pd.read_csv(stats_csv_file)
        sign_df = sign_df.set_index('train_lbl')
        # assign
        self.sign_df = sign_df

    def get_sign_width(self, train_lbl, sign_width=None):
        """ Return width of sign from stats; fall back to sign_width
        (default: tblSignHeight) if train_lbl has no stats entry. """
        # check default sign width
        if sign_width is None:
            sign_width = self.tblSignHeight
        if train_lbl in self.sign_df.index:
            sign_width = self.sign_df.width.loc[train_lbl]
        return sign_width
@@ -0,0 +1,51 @@
import os

import pandas as pd
from ..utils.path_utils import *


class TransliterationSet:
    def __init__(self, collections=None, relative_path='../../'):
        # avoid a mutable default argument
        if collections is None:
            collections = []
        # load list of coll_tl_df
        list_coll_tl_df = []
        for collection in collections:
            coll_tl_file = '{}data/transliterations/transliterations_{}.csv'.format(relative_path, collection)
            # check if transliteration exists
            if os.path.isfile(coll_tl_file):
                print('Transliteration file {} found!'.format(coll_tl_file))
                # load transliteration
                coll_tl_df = pd.read_csv(coll_tl_file)
                # select subset of columns (copy, so adding columns below does not write to a slice)
                coll_tl_df = coll_tl_df[['segm_idx', 'tablet_CDLI', 'train_label', 'mzl_label',
                                         'line_idx', 'pos_idx', 'status']].copy()
                coll_tl_df['lbl'] = coll_tl_df['train_label']
                coll_tl_df['mzl_lbl'] = coll_tl_df['mzl_label']
            else:
                print('Transliteration file {} NOT found!'.format(coll_tl_file))
                coll_tl_df = pd.DataFrame()
            # append coll_tl_df to list
            list_coll_tl_df.append(coll_tl_df)
        # make accessible
        self.collections = collections
        self.list_coll_tl_df = list_coll_tl_df

    def get_tl_df(self, seg_rec, verbose=True):
        # init empty tl
        num_lines = 0
        tl_df = pd.DataFrame()
        # select corresponding coll_tl_df
        collection = seg_rec.collection
        coll_idx = self.collections.index(collection)
        coll_tl_df = self.list_coll_tl_df[coll_idx]
        # check if transliterations available
        if len(coll_tl_df) > 0:
            # select corresponding tl_df slice in coll_df
            tl_df = coll_tl_df[coll_tl_df.segm_idx == seg_rec.name]
            # compute number lines
            num_lines = tl_df.line_idx.nunique()
        # report if transliteration is missing
        if len(tl_df) == 0:
            if verbose:
                print('No transliteration found for {}!'.format(seg_rec.tablet_CDLI))
        return tl_df, num_lines
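`get_tl_df` selects the rows of a collection's transliteration table that belong to one segment and counts its distinct text lines. The same two pandas operations on a tiny hand-made table (hypothetical values; only the column names follow the listing, and `segm_idx = 0` plays the role of `seg_rec.name`):

```python
import pandas as pd

# hypothetical transliteration rows: one row per sign, for two segments
coll_tl_df = pd.DataFrame({
    'segm_idx': [0, 0, 0, 1, 1],
    'line_idx': [1, 1, 2, 1, 1],
    'pos_idx': [0, 1, 0, 0, 1],
    'train_label': [12, 7, 3, 12, 40],
})

segm_idx = 0  # stands in for seg_rec.name
tl_df = coll_tl_df[coll_tl_df.segm_idx == segm_idx]  # boolean-mask row selection
num_lines = tl_df.line_idx.nunique()                  # number of distinct lines
print(len(tl_df), num_lines)  # 3 2
```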
@@ -0,0 +1,57 @@
import pandas as pd

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3: chr() covers the full Unicode range


def _to_unicode(s):
    # decode byte strings (Python 2 str); a Python 3 str is already unicode
    return s.decode('utf8') if isinstance(s, bytes) else s


def load_cunei_mzl_df(path_to_csv='./cunei_mzl.csv', filter=False):
    cunei_mzl_df = pd.read_csv(path_to_csv, index_col=0)
    # avoid mzl idx without codepoint (copy, so the column assignment below does not write to a slice)
    cunei_mzl_df = cunei_mzl_df[cunei_mzl_df.num_cpts > 0].copy()
    # deal with multiple versions
    # cunei_mzl_df = cunei_mzl_df.groupby('MesZL', sort=False, as_index=False).first()
    # create composite sign
    cunei_mzl_df['comp_script'] = cunei_mzl_df[['script_0', 'script_1', 'script_2']].fillna('').apply(
        lambda x: ''.join(x), axis=1)
    # decode to unicode (for matching with oracc utf8)
    cunei_mzl_df.comp_script = cunei_mzl_df.comp_script.apply(_to_unicode)
    if filter:
        # avoid mzl idx without codepoint
        cunei_mzl_df = cunei_mzl_df[cunei_mzl_df.num_cpts > 0]
        # deal with multiple versions
        cunei_mzl_df = cunei_mzl_df.groupby('MesZL', sort=False, as_index=False).first()
    return cunei_mzl_df


# def get_unicode(mzl_idx, cunei_mzl_df):
#     select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
#     if select_mzl_idx.any():
#         cpt_hex = cunei_mzl_df.codepoint_0[select_mzl_idx].str[2:].values[0]  # get hex
#         cpt_int = int(cpt_hex, 16)  # convert to int
#         return unichr(cpt_int)
#     else:
#         return mzl_idx


def get_unicode_comp(mzl_idx, cunei_mzl_df):
    # also handle composite signs by concatenation
    select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
    if select_mzl_idx.any():
        cunei_rec = cunei_mzl_df[select_mzl_idx]
        out_str = ''
        # num_cpts is a one-element Series here; range() needs a plain int
        for i in range(int(cunei_rec.num_cpts.iloc[0])):
            cpt_hex = cunei_rec['codepoint_{}'.format(i)].str[2:].values[0]  # get hex
            cpt_int = int(cpt_hex, 16)  # convert to int
            out_str += unichr(cpt_int)
        return out_str
    else:
        return mzl_idx


def get_sign_name(mzl_idx, cunei_mzl_df):
    select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
    if select_mzl_idx.any():
        cunei_rec = cunei_mzl_df[select_mzl_idx]
        # return cunei_rec['Sign Name'].str.decode('utf8').item()
        # decode if needed and drop any parenthesised suffix of the sign name
        sign_name = cunei_rec['Sign Name'].apply(_to_unicode)
        return sign_name.str.split('(').str[0].item()
    else:
        return mzl_idx
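`get_unicode_comp` turns the `codepoint_*` columns of a sign into characters by stripping a two-character prefix, parsing the rest as hexadecimal and concatenating the results. Assuming the codepoints are stored as `U+` strings (the `str[2:]` slice suggests this, but the CSV itself is not shown), a Python 3 sketch of the same conversion for a two-codepoint composite sign; `codepoints_to_str` is a hypothetical helper:

```python
def codepoints_to_str(codepoints):
    """Hypothetical helper: ['U+12038', 'U+1202D'] -> concatenated characters."""
    # strip the 'U+' prefix, parse the hex digits, map to a character
    # (chr() in Python 3 plays the role of Python 2's unichr())
    return ''.join(chr(int(cpt[2:], 16)) for cpt in codepoints)


sign = codepoints_to_str(['U+12038', 'U+1202D'])
print(len(sign), [hex(ord(ch)) for ch in sign])  # 2 ['0x12038', '0x1202d']
```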
@@ -0,0 +1,28 @@
import json
import numpy as np


def get_label_list(path_to_lbl_file='../../data/newLabels.json'):
    # get list that maps old -> new
    # load label list
    with open(path_to_lbl_file) as json_data:
        lbl_list = json.load(json_data)
    return lbl_list


def get_lbl2lbl(path_to_lbl_file):
    # get array that maps new -> old
    # (looking up lbl_list.index(new_lbl) on the list works as well)
    # load label list
    lbl_list = np.asarray(get_label_list(path_to_lbl_file))
    # print(np.unique(lbl_list))
    # reverse (assume mapping is unique); integer dtype so the result can be used as an index
    lbl2lbl = np.zeros(len(np.unique(lbl_list)), dtype=int)  # 240
    for (i, val) in enumerate(lbl_list):
        lbl2lbl[val] = i  # new -> old
    # since mapping is not unique for 0, need to set manually to background
    lbl2lbl[0] = 0
    return lbl2lbl
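`get_lbl2lbl` inverts the old-to-new label list: entry `i` of `lbl_list` holds the new label of old label `i`, and the result holds, for every new label, the old label that maps to it. Several old labels map to the background label 0, which is why that entry is reset by hand. A small sketch with a made-up mapping (assuming, as the original does, that the new labels are the contiguous range 0..K-1), followed by an equivalent vectorized NumPy assignment:

```python
import numpy as np

# made-up old -> new mapping: old labels 0 and 3 both collapse to background 0
lbl_list = np.asarray([0, 2, 1, 0, 3])

# loop version, as in get_lbl2lbl
lbl2lbl = np.zeros(len(np.unique(lbl_list)), dtype=int)
for i, val in enumerate(lbl_list):
    lbl2lbl[val] = i  # new -> old
lbl2lbl[0] = 0  # background is not unique, set manually

# vectorized version: only index 0 repeats here, and it is reset afterwards anyway
lbl2lbl_vec = np.zeros_like(lbl2lbl)
lbl2lbl_vec[lbl_list] = np.arange(len(lbl_list))
lbl2lbl_vec[0] = 0

print(lbl2lbl, lbl2lbl_vec)  # [0 2 1 4] [0 2 1 4]
```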
@@ -0,0 +1,200 @@
# --------------------------------------------------------
# Parts of code adapted from Ross Girshick's Fast/er R-CNN code
# --------------------------------------------------------
import numpy as np


def unique_boxes(boxes, scale=1.0):
    """Return indices of unique boxes."""
    v = np.array([1, 1e3, 1e6, 1e9])
    hashes = np.round(boxes * scale).dot(v)
    _, index = np.unique(hashes, return_index=True)
    return np.sort(index)


def xywh_to_xyxy(boxes):
    """Convert [x y w h] box format to [x1 y1 x2 y2] format."""
    return np.hstack((boxes[:, 0:2], boxes[:, 0:2] + boxes[:, 2:4] - 1))


def xyxy_to_xywh(boxes):
    """Convert [x1 y1 x2 y2] box format to [x y w h] format."""
    return np.hstack((boxes[:, 0:2], boxes[:, 2:4] - boxes[:, 0:2] + 1))
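def convert_bbox_global2local(gbbox, seg_bbox):
    # shift [x1, y1, x2, y2] by the segment's top-left corner (x, y);
    # list(...) * 2 repeats [x, y] to [x, y, x, y] (on a NumPy array, * 2 would scale the values instead)
    relative_bbox = np.array(gbbox) - np.array(list(seg_bbox[:2]) * 2)
    return relative_bbox.tolist()


def convert_bbox_local2global(lbbox, seg_bbox):
    # inverse of convert_bbox_global2local
    global_bbox = np.array(lbbox) + np.array(list(seg_bbox[:2]) * 2)
    return global_bbox.tolist()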
def validate_boxes(boxes, width=0, height=0):
    """Check that a set of boxes is valid."""
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    assert (x1 >= 0).all()
    assert (y1 >= 0).all()
    assert (x2 >= x1).all()
    assert (y2 >= y1).all()
    assert (x2 < width).all()
    assert (y2 < height).all()
def filter_small_boxes(boxes, min_size):
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    # keep boxes whose width and height are both at least min_size
    keep = np.where((w >= min_size) & (h >= min_size))[0]
    return keep
def clip_boxes(boxes, im_shape):
    """
    Clip boxes to image boundaries (modifies boxes in place and returns it).
    im_shape is (height, width), i.e. the first two entries of a NumPy image's shape.
    usage for a single box: bb_new = clip_boxes(bb[np.newaxis, :], [imh, imw]).squeeze()
    """
    # 0 <= x1 <= im_shape[1] - 1
    boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
    # 0 <= y1 <= im_shape[0] - 1
    boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
    # 0 <= x2 <= im_shape[1] - 1
    boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
    # 0 <= y2 <= im_shape[0] - 1
    boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
    return boxes


# def clip_boxes(boxes, im_shape):
#     """Clip boxes to image boundaries."""
#     # x1 >= 0
#     boxes[:, 0::4] = np.maximum(boxes[:, 0::4], 0)
#     # y1 >= 0
#     boxes[:, 1::4] = np.maximum(boxes[:, 1::4], 0)
#     # x2 < im_shape[1]
#     boxes[:, 2::4] = np.minimum(boxes[:, 2::4], im_shape[1] - 1)
#     # y2 < im_shape[0]
#     boxes[:, 3::4] = np.minimum(boxes[:, 3::4], im_shape[0] - 1)
#     return boxes
def intersection_over_union(Reframe, GTframe):
    # by Oemer
    x1 = Reframe[0]
    y1 = Reframe[1]
    width1 = Reframe[2] - Reframe[0]
    height1 = Reframe[3] - Reframe[1]

    x2 = GTframe[0]
    y2 = GTframe[1]
    width2 = GTframe[2] - GTframe[0]
    height2 = GTframe[3] - GTframe[1]

    endx = max(x1 + width1, x2 + width2)
    startx = min(x1, x2)
    width = width1 + width2 - (endx - startx)

    endy = max(y1 + height1, y2 + height2)
    starty = min(y1, y2)
    height = height1 + height2 - (endy - starty)

    if width <= 0 or height <= 0:
        ratio = 0
    else:
        Area = width * height
        Area1 = width1 * height1
        Area2 = width2 * height2
        ratio = Area * 1. / (Area1 + Area2 - Area)
    # return IOU
    return ratio  # Reframe,GTframe
def bb_intersection_over_union(box_a, box_b):
    # adapted from https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(box_a[0], box_b[0])
    yA = max(box_a[1], box_b[1])
    xB = min(box_a[2], box_b[2])
    yB = min(box_a[3], box_b[3])

    # compute the area of intersection rectangle
    inter_area = (xB - xA + 1) * (yB - yA + 1)

    # compute the area of both the prediction and ground-truth
    # rectangles
    box_a_area = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    box_b_area = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    if (xB - xA + 1) <= 0 or (yB - yA + 1) <= 0:
        iou = 0
    else:
        iou = inter_area / float(box_a_area + box_b_area - inter_area)

    # return the intersection over union value
    return iou
def box_iou(box1, box2):
    '''Compute the intersection over union of two sets of boxes.

    TD: modified to be legacy compatible (inclusive pixel coordinates, i.e. +1 on widths and heights).

    The box order must be (xmin, ymin, xmax, ymax).

    Args:
      box1: (ndarray) bounding boxes, sized [N,4].
      box2: (ndarray) bounding boxes, sized [M,4].

    Return:
      (ndarray) iou, sized [N,M].

    Reference:
      https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
    '''
    N = box1.shape[0]
    M = box2.shape[0]

    lt = np.maximum(box1[:, None, :2], box2[:, :2])  # [N,M,2]
    rb = np.minimum(box1[:, None, 2:], box2[:, 2:])  # [N,M,2]

    wh = np.clip((rb - lt + 1.), 0, None)  # [N,M,2]
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

    area1 = (box1[:, 2] - box1[:, 0] + 1.) * (box1[:, 3] - box1[:, 1] + 1.)  # [N,]
    area2 = (box2[:, 2] - box2[:, 0] + 1.) * (box2[:, 3] - box2[:, 1] + 1.)  # [M,]
    iou = inter / (area1[:, None] + area2 - inter)
    return iou
def box_iou_org(box1, box2):
    '''Compute the intersection over union of two sets of boxes.

    The box order must be (xmin, ymin, xmax, ymax).

    Args:
      box1: (ndarray) bounding boxes, sized [N,4].
      box2: (ndarray) bounding boxes, sized [M,4].

    Return:
      (ndarray) iou, sized [N,M].

    Reference:
      https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
    '''
    N = box1.shape[0]
    M = box2.shape[0]

    lt = np.maximum(box1[:, None, :2], box2[:, :2])  # [N,M,2]
    rb = np.minimum(box1[:, None, 2:], box2[:, 2:])  # [N,M,2]

    wh = np.clip((rb - lt), 0, None)  # [N,M,2]
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])  # [N,]
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])  # [M,]
    iou = inter / (area1[:, None] + area2 - inter)
    return iou
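The helpers above mix two pixel conventions: `box_iou` (like `bb_intersection_over_union`) treats `[x1, y1, x2, y2]` as inclusive pixel indices and adds 1 to widths and heights, while `box_iou_org` and `intersection_over_union` treat them as continuous coordinates. A quick check of how much this matters for two 10x10-pixel boxes, re-implementing the vectorized formula in a few lines (`pairwise_iou` and its `plus_one` flag are illustrative, not parameters of the original functions):

```python
import numpy as np


def pairwise_iou(box1, box2, plus_one=True):
    """IoU between [N,4] and [M,4] arrays of (x1, y1, x2, y2) boxes -> [N,M]."""
    off = 1.0 if plus_one else 0.0
    lt = np.maximum(box1[:, None, :2], box2[:, :2])  # [N,M,2] top-left of intersection
    rb = np.minimum(box1[:, None, 2:], box2[:, 2:])  # [N,M,2] bottom-right of intersection
    wh = np.clip(rb - lt + off, 0, None)
    inter = wh[:, :, 0] * wh[:, :, 1]
    area1 = (box1[:, 2] - box1[:, 0] + off) * (box1[:, 3] - box1[:, 1] + off)
    area2 = (box2[:, 2] - box2[:, 0] + off) * (box2[:, 3] - box2[:, 1] + off)
    return inter / (area1[:, None] + area2 - inter)


a = np.array([[0, 0, 9, 9]], dtype=float)    # pixels 0..9 -> 10x10
b = np.array([[5, 5, 14, 14]], dtype=float)  # pixels 5..14 -> 10x10
print(pairwise_iou(a, b))                  # inclusive: 25 / 175 ~ 0.1429
print(pairwise_iou(a, b, plus_one=False))  # continuous: 16 / 146 ~ 0.1096
```

For small boxes such as cuneiform signs at low resolution the gap is noticeable, so IoU thresholds are best compared only between functions that share a convention.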
@@ -0,0 +1,38 @@
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import numpy as np


def nms(dets, scores, threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes.

    dets: [N,4] array of (x1, y1, x2, y2) boxes in inclusive pixel coordinates; scores: [N] array.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    # scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # process boxes in order of decreasing score
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        # keep the highest-scoring remaining box
        i = order[0]
        keep.append(i)
        # intersection of box i with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # drop every box that overlaps box i by more than threshold
        inds = np.where(ovr <= threshold)[0]
        order = order[inds + 1]  # +1 because ovr was computed against order[1:]

    return keep
@@ -0,0 +1,44 @@
import os


# file names, folders, paths

def make_folder(res_path):
    # create folder, if it does not exist
    if not os.path.exists(res_path):
        os.makedirs(res_path)


def prepare_data_gen_folder(relative_path, sign_model_version, collection_name, res_folder_name='results'):
    # create path to file that stores generated training data
    res_path = '{}pytorch/{}/{}'.format(relative_path, res_folder_name, sign_model_version)
    train_data_ext_file = '{}/line_generated_bboxes_{}.csv'.format(res_path, collection_name)
    collection_subfolder = '{}/images/'.format(collection_name)
    # create folder, if necessary
    make_folder(res_path)
    # remove generated file, if it exists
    if os.path.isfile(train_data_ext_file):
        os.remove(train_data_ext_file)
    return train_data_ext_file, collection_subfolder, res_path


def prepare_data_gen_folder_slim(collection_name, res_path_base):
    # create path to file that stores generated training data
    train_data_ext_file = '{}/line_generated_bboxes_{}.csv'.format(res_path_base, collection_name)
    collection_subfolder = '{}/images/'.format(collection_name)
    # create folder, if necessary
    make_folder(res_path_base)
    # remove generated file, if it exists
    if os.path.isfile(train_data_ext_file):
        os.remove(train_data_ext_file)
    return train_data_ext_file, collection_subfolder


def clean_cdli(cdli_str):
    # remove Vs, Rs
    out_str = cdli_str.replace("Vs", "")
    out_str = out_str.replace("Rs", "")
    return out_str
@@ -0,0 +1,384 @@
import time
from collections import OrderedDict
import copy
from tqdm import tqdm

import numpy as np

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn.modules.module import _addindent
import torchvision
from torchvision.transforms import *

import matplotlib.pyplot as plt


# HELPER FUNCTIONS

def weights_init(m):
    if isinstance(m, nn.Linear):
        m.weight.data.normal_(0, 0.01)  # 0.005
        m.bias.data.zero_()
def torch_summarize(model, show_weights=True, show_parameters=True):
    # code found here: https://stackoverflow.com/questions/42480111/model-summary-in-pytorch
    """Summarizes torch model by showing trainable parameters and weights."""
    tmpstr = model.__class__.__name__ + ' (\n'
    for key, module in model._modules.items():
        # if it contains layers, call it recursively (with the same flags) to get params and weights
        if type(module) in [
            torch.nn.modules.container.Container,
            torch.nn.modules.container.Sequential
        ]:
            modstr = torch_summarize(module, show_weights, show_parameters)
        else:
            modstr = module.__repr__()
        modstr = _addindent(modstr, 2)

        params = sum([np.prod(p.size()) for p in module.parameters()])
        weights = tuple([tuple(p.size()) for p in module.parameters()])

        tmpstr += ' (' + key + '): ' + modstr
        if show_weights:
            tmpstr += ', weights={}'.format(weights)
        if show_parameters:
            tmpstr += ', parameters={}'.format(params)
        tmpstr += '\n'

    tmpstr = tmpstr + ')'
    return tmpstr
def summary(mymodule, input_size):
    # code from PR by isaykatsman https://github.com/pytorch/pytorch/pull/3043
    def register_hook(module):
        def hook(module, input, output):
            if module._modules:  # only want base layers
                return
            class_name = str(module.__class__).split('.')[-1].split("'")[0]
            module_idx = len(summary)
            m_key = '%s-%i' % (class_name, module_idx + 1)
            summary[m_key] = OrderedDict()
            summary[m_key]['input_shape'] = list(input[0].size())
            summary[m_key]['input_shape'][0] = None
            if output.__class__.__name__ == 'tuple':
                summary[m_key]['output_shape'] = list(output[0].size())
            else:
                summary[m_key]['output_shape'] = list(output.size())
            summary[m_key]['output_shape'][0] = None

            params = 0
            # iterate through parameters and count num params
            for name, p in module._parameters.items():
                # skip unset parameters, e.g. the bias of a Conv2d created with bias=False
                if p is None:
                    continue
                params += torch.numel(p.data)
                summary[m_key]['trainable'] = p.requires_grad

            summary[m_key]['nb_params'] = params

        if not isinstance(module, torch.nn.Sequential) and \
                not isinstance(module, torch.nn.ModuleList) and \
                not (module == mymodule):
            hooks.append(module.register_forward_hook(hook))

    # check if there are multiple inputs to the network
    if isinstance(input_size[0], (list, tuple)):
        x = [Variable(torch.rand(1, *in_size)) for in_size in input_size]
    else:
        x = Variable(torch.randn(1, *input_size))

    # create properties
    summary = OrderedDict()
    hooks = []
    # register hook
    mymodule.apply(register_hook)
    # make a forward pass (unpack the inputs if the network takes several)
    if isinstance(x, list):
        mymodule(*x)
    else:
        mymodule(x)
    # remove these hooks
    for h in hooks:
        h.remove()

    # print out neatly
    def get_names(module, name, acc):
        if not module._modules:
            acc.append(name)
        else:
            for key in module._modules.keys():
                p_name = key if name == "" else name + "." + key
                get_names(module._modules[key], p_name, acc)

    names = []
    get_names(mymodule, "", names)

    col_width = 25  # should be >= 12
    summary_width = 61

    def crop(s):
        return s[:col_width] if len(s) > col_width else s

    print('_' * summary_width)
    print('{0: <{3}} {1: <{3}} {2: <{3}}'.format(
        'Layer (type)', 'Output Shape', 'Param #', col_width))
    print('=' * summary_width)
    total_params = 0
    trainable_params = 0
    for (i, l_type), l_name in zip(enumerate(summary), names):
        d = summary[l_type]
        total_params += d['nb_params']
        if 'trainable' in d and d['trainable']:
            trainable_params += d['nb_params']
        print('{0: <{3}} {1: <{3}} {2: <{3}}'.format(
            crop(l_name + ' (' + l_type[:-2] + ')'), crop(str(d['output_shape'])),
            crop(str(d['nb_params'])), col_width))
        if i < len(summary) - 1:
            print('_' * summary_width)
    print('=' * summary_width)
    print('Total params: ' + str(total_params))
    print('Trainable params: ' + str(trainable_params))
    print('Non-trainable params: ' + str((total_params - trainable_params)))
    print('_' * summary_width)
def visualize_model(model, dataloader, re_transform, device, num_images=6):
    was_training = model.training
    images_so_far = 0
    fig = plt.figure(figsize=(10, 10))

    # switch batchnorm and dropout to eval mode
    model.eval()
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            # compute predictions using the model
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(preds[j].item()))
                ax.imshow(re_transform(inputs.cpu().data[j].clone()), cmap=plt.cm.Greys_r)

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
def prepare_embedding(model_feature, dataloader, re_transform, device):
    # switch batchnorm and dropout to eval mode
    model_feature.eval()
    f_list = []
    i_list = []
    l_list = []
    with torch.no_grad():
        # inputs, labels = next(iter(dataloaders['train']))
        for inputs, labels in dataloader:
            em_sz = inputs.shape[0]
            # append labels
            l_list.append(labels)
            # undo transform, convert to RGB, and convert back to tensor
            t_list = []
            for t in inputs:
                t_list.append(torchvision.transforms.ToTensor()(re_transform(t.clone()).convert('RGB')))
            # append images
            i_list.append(torch.stack(t_list))
            # compute feature
            inputs = inputs.to(device)
            # append features
            f_list.append(model_feature(inputs).view(em_sz, -1).data)
    return torch.cat(f_list), torch.cat(l_list).numpy(), torch.cat(i_list)
def prepare_prcurves(model, dataloader, device):
    # create softmax over the class dimension (calling nn.Softmax() without dim is deprecated)
    softmax = nn.Softmax(dim=1)
    # loop over dataset with dataloader
    p_list = []
    l_list = []
    with torch.no_grad():
        # inputs, labels = next(iter(dataloaders['train']))
        for inputs, labels in dataloader:
            # append labels
            l_list.append(labels)
            # prepare input
            inputs = inputs.to(device)
            # apply network model
            output = model(inputs)
            # compute softmax
            predicted = softmax(output)
            # append class probabilities
            p_list.append(predicted.data.cpu())
    # concat to tensors
    return torch.cat(p_list), torch.cat(l_list)
def preprocess_tablet_im(pil_im, scale, shift=5.0):
    # compute scaled size
    imw, imh = pil_im.size
    imw = int(imw * scale)
    imh = int(imh * scale)
    # determine crop size
    crop_sz = [int(imh - shift), int(imw - shift)]
    # tensor-space transforms
    ts_transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
    ])
    # compose transforms
    tablet_transform = torchvision.transforms.Compose([
        torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
        Resize((imh, imw)),  # resize according to scale
        FiveCrop((crop_sz[0], crop_sz[1])),  # oversample
        torchvision.transforms.Lambda(
            lambda crops: torch.stack([ts_transform(crop) for crop in crops])),  # returns a 4D tensor
    ])
    # apply transforms
    im_list = tablet_transform(pil_im)
    return im_list
def predict(model, im_list, device, use_bbox_reg=False):
    inputs = im_list
    with torch.no_grad():  # faster, less memory usage
        # prepare input
        inputs = inputs.to(device)
        # apply network model one image at a time
        # output = model(inputs)  # consumes too much memory
        output = []
        for in_im in inputs:
            output.append(model(in_im.unsqueeze(0)))
        output = torch.cat(output, dim=0)
        # convert to numpy
        predicted = output.data.cpu().numpy()
        # free memory?!
        # del output

    # TODO: integrate bbox regression
    # (result_roi is never filled yet, so use_bbox_reg=True currently fails in np.stack)
    result_roi = []
    # stack detections to single tensor
    predicted_roi = []
    if use_bbox_reg:
        predicted_roi = np.stack(result_roi).squeeze()
    return predicted, predicted_roi
# TRAINER HELPER

def get_tensorboard_writer(logs_folder='runs_new', comment=''):
    # init logger
    import os
    import socket
    from datetime import datetime
    from tensorboardX import SummaryWriter
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    log_dir = os.path.join(logs_folder, current_time + '_' + socket.gethostname() + comment)
    writer = SummaryWriter(log_dir=log_dir)  # comment='_{}'.format(weights_path.split('/')[1].split('.')[0])
    return writer
# TRAINER FUNCTIONS
def train_model(model, criterion, optimizer, scheduler, writer, dataloaders, dataset_sizes, device, num_epochs=25, test_every=10):
    ''' generic trainer function '''
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    best_epoch = 0
    for epoch in tqdm(range(num_epochs)):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training phase; the validation phase ('dev') only runs every test_every epochs
        phases = ['train', 'dev']
        if epoch % test_every != 0:
            phases = ['train']
        for phase in phases:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                    # else:
                    #     for name, param in model.named_parameters():
                    #         writer.add_histogram(name, param.clone().cpu().data.numpy(), epoch)
                # statistics: the criterion returns the batch mean, so weight it by the
                # batch size (fixes the legacy bug of summing unweighted batch means)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels).item()
            if phase == 'train':
                # since PyTorch 1.1, scheduler.step() should be called after optimizer.step()
                scheduler.step()
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / float(dataset_sizes[phase])
            # write to logger
            writer.add_scalar('data/{}/loss'.format(phase), epoch_loss, epoch)
            writer.add_scalar('data/{}/acc'.format(phase), epoch_acc, epoch)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            print('{} Number correct: {} '.format(phase, running_corrects))
            # deep copy the model
            if phase == 'dev' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                best_epoch = epoch
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f} at {}'.format(best_acc, best_epoch))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
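`train_model` divides the accumulated `running_loss` by `dataset_sizes[phase]`. For `epoch_loss` to be a per-sample mean, each batch's mean loss must therefore be weighted by its batch size. Take two batches with mean losses 1.0 (4 samples) and 3.0 (2 samples). The per-sample mean is 10/6 ≈ 1.67. Summing the batch means and dividing by 6 gives only 4/6 ≈ 0.67. The hypothetical helpers below make this arithmetic explicit, along with the phase schedule controlled by `test_every`.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Per-sample mean loss from per-batch mean losses.

    Each batch mean is weighted by its batch size before dividing by the
    total number of samples.
    """
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


def phases_for_epoch(epoch, test_every):
    """Phases run in a given epoch: 'dev' only every `test_every` epochs."""
    return ['train', 'dev'] if epoch % test_every == 0 else ['train']
```

The best weights are only ever selected at epochs where the `dev` phase runs, that is, at multiples of `test_every`.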
@@ -0,0 +1,134 @@
import torch
def change_box_order(boxes, order):
'''Change box order between (xmin,ymin,xmax,ymax) and (xcenter,ycenter,width,height).
Args:
boxes: (tensor) bounding boxes, sized [N,4].
order: (str) either 'xyxy2xywh' or 'xywh2xyxy'.
Returns:
(tensor) converted bounding boxes, sized [N,4].
'''
assert order in ['xyxy2xywh','xywh2xyxy']
a = boxes[:,:2]
b = boxes[:,2:]
if order == 'xyxy2xywh':
return torch.cat([(a+b)/2,b-a], 1)
return torch.cat([a-b/2,a+b/2], 1)
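`change_box_order` converts between corner form and centre/size form: the centre is the corner midpoint, and the size is the corner difference. The following NumPy mirror is an illustration only (the repository works on PyTorch tensors) and shows that the two directions are inverses.

```python
import numpy as np


def change_box_order_np(boxes, order):
    """NumPy mirror of change_box_order for [N, 4] float arrays."""
    assert order in ('xyxy2xywh', 'xywh2xyxy')
    a, b = boxes[:, :2], boxes[:, 2:]
    if order == 'xyxy2xywh':
        # (xmin, ymin, xmax, ymax) -> (xcenter, ycenter, width, height)
        return np.concatenate([(a + b) / 2, b - a], axis=1)
    # (xcenter, ycenter, width, height) -> (xmin, ymin, xmax, ymax)
    return np.concatenate([a - b / 2, a + b / 2], axis=1)
```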
def box_clamp(boxes, xmin, ymin, xmax, ymax):
'''Clamp boxes.
Args:
boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [N,4].
xmin: (number) min value of x.
ymin: (number) min value of y.
xmax: (number) max value of x.
ymax: (number) max value of y.
Returns:
(tensor) clamped boxes.
'''
boxes[:,0].clamp_(min=xmin, max=xmax)
boxes[:,1].clamp_(min=ymin, max=ymax)
boxes[:,2].clamp_(min=xmin, max=xmax)
boxes[:,3].clamp_(min=ymin, max=ymax)
return boxes
def box_select(boxes, xmin, ymin, xmax, ymax):
'''Select boxes in range (xmin,ymin,xmax,ymax).
Args:
boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [N,4].
xmin: (number) min value of x.
ymin: (number) min value of y.
xmax: (number) max value of x.
ymax: (number) max value of y.
Returns:
(tensor) selected boxes, sized [M,4].
(tensor) selected mask, sized [N,].
'''
mask = (boxes[:,0]>=xmin) & (boxes[:,1]>=ymin) \
& (boxes[:,2]<=xmax) & (boxes[:,3]<=ymax)
boxes = boxes[mask,:]
return boxes, mask
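`box_clamp` and `box_select` treat boxes that extend past a window differently. Clamping shrinks them to the window, modifying the tensor in place via `clamp_`. Selection drops any box that is not fully inside the window. The NumPy sketch below (a hypothetical helper) contrasts the two on a copy, so the input is left untouched.

```python
import numpy as np


def clamp_and_select(boxes, xmin, ymin, xmax, ymax):
    """Return (clamped copy, fully-contained boxes, selection mask)."""
    clamped = boxes.copy()
    clamped[:, [0, 2]] = clamped[:, [0, 2]].clip(xmin, xmax)
    clamped[:, [1, 3]] = clamped[:, [1, 3]].clip(ymin, ymax)
    mask = ((boxes[:, 0] >= xmin) & (boxes[:, 1] >= ymin)
            & (boxes[:, 2] <= xmax) & (boxes[:, 3] <= ymax))
    return clamped, boxes[mask], mask
```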
def box_iou(box1, box2):
'''Compute the intersection over union of two sets of boxes.
The box order must be (xmin, ymin, xmax, ymax).
Args:
box1: (tensor) bounding boxes, sized [N,4].
box2: (tensor) bounding boxes, sized [M,4].
Return:
(tensor) iou, sized [N,M].
Reference:
https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
'''
N = box1.size(0)
M = box2.size(0)
lt = torch.max(box1[:,None,:2], box2[:,:2]) # [N,M,2]
rb = torch.min(box1[:,None,2:], box2[:,2:]) # [N,M,2]
wh = (rb-lt).clamp(min=0) # [N,M,2]
inter = wh[:,:,0] * wh[:,:,1] # [N,M]
area1 = (box1[:,2]-box1[:,0]) * (box1[:,3]-box1[:,1]) # [N,]
area2 = (box2[:,2]-box2[:,0]) * (box2[:,3]-box2[:,1]) # [M,]
iou = inter / (area1[:,None] + area2 - inter)
return iou
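`box_iou` broadcasts the `[N,1,2]` corners of `box1` against the `[M,2]` corners of `box2` to get all pairwise intersections at once. In the worked example, boxes `[0,0,10,10]` and `[5,5,15,15]` overlap in a 5×5 square. Their IoU is therefore 25 / (100 + 100 − 25) = 1/7. The NumPy mirror below is for illustration only.

```python
import numpy as np


def box_iou_np(box1, box2):
    """Pairwise IoU of [N,4] and [M,4] boxes in (xmin, ymin, xmax, ymax) order."""
    lt = np.maximum(box1[:, None, :2], box2[:, :2])  # [N,M,2] top-left of overlap
    rb = np.minimum(box1[:, None, 2:], box2[:, 2:])  # [N,M,2] bottom-right of overlap
    wh = np.clip(rb - lt, 0, None)                   # zero if boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]                  # [N,M]
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    return inter / (area1[:, None] + area2 - inter)
```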
def box_nms(bboxes, scores, threshold=0.5):
'''Non-maximum suppression.
Args:
bboxes: (tensor) bounding boxes, sized [N,4].
scores: (tensor) confidence scores, sized [N,].
threshold: (float) overlap threshold.
Returns:
keep: (tensor) selected indices.
Reference:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/py_cpu_nms.py
'''
x1 = bboxes[:,0]
y1 = bboxes[:,1]
x2 = bboxes[:,2]
y2 = bboxes[:,3]
areas = (x2-x1 + 1) * (y2-y1 + 1)
_, order = scores.sort(0, descending=True)
keep = []
while order.numel() > 0:
if order.numel() == 1:
i = order.item()
keep.append(i)
break
i = order[0].item()
keep.append(i)
xx1 = x1[order[1:]].clamp(min=x1[i].item())
yy1 = y1[order[1:]].clamp(min=y1[i].item())
xx2 = x2[order[1:]].clamp(max=x2[i].item())
yy2 = y2[order[1:]].clamp(max=y2[i].item())
w = (xx2-xx1 + 1).clamp(min=0)
h = (yy2-yy1 + 1).clamp(min=0)
inter = w * h
overlap = inter / (areas[i] + areas[order[1:]] - inter)
ids = (overlap <= threshold).nonzero().squeeze()
if ids.numel() == 0:
break
order = order[ids+1]
return torch.tensor(keep, dtype=torch.long)
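`box_nms` is greedy. It keeps the highest-scoring remaining box, discards every remaining box that overlaps it by more than `threshold`, and repeats. Note that it measures overlap with the inclusive-pixel `+1` convention, unlike `box_iou`. The NumPy sketch below follows the same steps for illustration. Stable sorting for equal scores is an assumption of the sketch.

```python
import numpy as np


def box_nms_np(bboxes, scores, threshold=0.5):
    """Greedy NMS on [N,4] (xmin, ymin, xmax, ymax) boxes; returns kept indices."""
    x1, y1, x2, y2 = bboxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)           # inclusive-pixel areas
    order = np.argsort(-scores, kind='stable')      # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        w = np.clip(np.minimum(x2[rest], x2[i]) - np.maximum(x1[rest], x1[i]) + 1, 0, None)
        h = np.clip(np.minimum(y2[rest], y2[i]) - np.maximum(y1[rest], y1[i]) + 1, 0, None)
        inter = w * h
        overlap = inter / (areas[i] + areas[rest] - inter)
        order = rest[overlap <= threshold]          # drop boxes suppressed by i
    return np.array(keep, dtype=np.int64)
```

In the test case, the first two boxes overlap with IoU 100/142 ≈ 0.70, so only the higher-scoring of them survives.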
@@ -0,0 +1,246 @@
'''Encode object boxes and labels.'''
import math
import torch
import itertools
import time
import numpy as np
from .meshgrid import meshgrid
from .box import box_iou, box_nms, change_box_order
class FPNSSDBoxCoder:
def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
self.num_anchors = 12 # len(aspect_ratios) * len(scale_ratios): 4 x 3 by default, 3 x 4 with with_4_scales
# self.anchor_areas = (32 * 32., 64 * 64., 128 * 128., 256 * 256., 341 * 341., 426 * 426., 512 * 512.)
# self.aspect_ratios = (1 / 2., 1 / 1., 2 / 1.)
# self.scale_ratios = (1., pow(2, 1 / 3.), pow(2, 2 / 3.))
# compute num boxes for 500x500 patch
# 500/16(stride) -> 32
# 500/32(stride) -> 16
# 500/64(stride) -> 8
# (16^2 + 8^2) * num_anchors -> for 12: 3840
# (32^2 + 16^2 + 8^2) * num_anchors -> for 12: 16128
self.with_64 = with_64
if self.with_64:
self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
else:
self.anchor_areas = [128 * 128., 256 * 256.]
if with_4_aspects:
self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
else:
self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.] # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
if with_4_scales:
assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
else:
self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
self.input_size = torch.tensor(input_size).float()
self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)
self.create_bg_class = create_bg_class
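The comments in `__init__` count the anchor boxes for a 500×500 patch. Each feature-map level has `ceil(size / stride)` cells per side and 12 anchors per cell. The strides are 32 and 64 by default, or 16, 32 and 64 with `with_64`. The hypothetical helper below reproduces that arithmetic for a square input.

```python
import math


def num_anchor_boxes(input_size, with_64=False, anchors_per_cell=12):
    """Total anchors for a square input, mirroring _get_anchor_boxes' fm_sizes."""
    first = 4 if with_64 else 5        # stride 2**4 = 16 or 2**5 = 32 for the first level
    num_fms = 3 if with_64 else 2
    total = 0
    for i in range(num_fms):
        fm = math.ceil(input_size / 2 ** (i + first))
        total += fm * fm * anchors_per_cell
    return total
```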
def _get_anchor_wh(self):
'''Compute anchor width and height for each feature map.
Returns:
anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
'''
anchor_wh = []
for s in self.anchor_areas:
for ar in self.aspect_ratios: # w/h = ar
h = math.sqrt(s / ar)
w = ar * h
for sr in self.scale_ratios: # scale
anchor_h = h * sr
anchor_w = w * sr
anchor_wh.append([anchor_w, anchor_h])
num_fms = len(self.anchor_areas)
return torch.tensor(anchor_wh).view(num_fms, -1, 2)
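For an anchor area `s` and aspect ratio `ar = w/h`, `_get_anchor_wh` solves `w * h = s` with `w = ar * h`. This gives `h = sqrt(s / ar)`. Each scale ratio `sr` then multiplies both sides, so the area grows by `sr**2`. The hypothetical helper below checks these identities.

```python
import math


def anchor_wh(area, aspect_ratio, scale_ratio=1.0):
    """Width and height of one anchor, as computed in _get_anchor_wh."""
    h = math.sqrt(area / aspect_ratio)
    w = aspect_ratio * h
    return w * scale_ratio, h * scale_ratio
```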
def _get_anchor_boxes(self, input_size):
'''Compute anchor boxes for each feature map.
Args:
input_size: (tensor) model input size of (w,h).
Returns:
anchor_boxes: (tensor) anchor boxes for each feature map. Each of size [#anchors,4],
where #anchors = fmw * fmh * #anchors_per_cell
'''
num_fms = len(self.anchor_areas)
anchor_wh = self._get_anchor_wh()
# fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)] # p3 -> p7 feature map sizes
if self.with_64: # num_fms == 3:
fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)] # p4 -> p6 feature map sizes
else: # num_fms == 2:
fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)] # p5 -> p6 feature map sizes
boxes = []
for i in range(num_fms):
fm_size = fm_sizes[i]
grid_size = input_size / fm_size
fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
xy = meshgrid(fm_w, fm_h) + 0.5 # [fm_h*fm_w, 2]
xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
box = torch.cat([xy - wh / 2., xy + wh / 2.], 3) # [x,y,x,y]
boxes.append(box.view(-1, 4))
return torch.cat(boxes, 0)
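`_get_anchor_boxes` places anchor centres at the middle of each feature-map cell, `(j + 0.5) * grid_size`. Here `grid_size = input_size / fm_size` equals the stride only when the input divides evenly. For a 500-pixel input at stride 32, the grid is 31.25. The sketch below assumes the repository's `meshgrid(fm_w, fm_h)` enumerates cells row by row with x varying fastest, as the subsequent `.view(fm_h, fm_w, 1, 2)` implies. `anchor_centres` is a hypothetical helper.

```python
import numpy as np


def anchor_centres(input_w, input_h, stride):
    """Cell-centre (x, y) coordinates for one feature map, row-major, x fastest."""
    fm_w = int(np.ceil(input_w / stride))
    fm_h = int(np.ceil(input_h / stride))
    grid_w, grid_h = input_w / fm_w, input_h / fm_h
    xs = (np.arange(fm_w) + 0.5) * grid_w
    ys = (np.arange(fm_h) + 0.5) * grid_h
    xx, yy = np.meshgrid(xs, ys)  # both [fm_h, fm_w]
    return np.stack([xx.ravel(), yy.ravel()], axis=1)
```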
def encode(self, boxes, labels):
'''Encode target bounding boxes and class labels.
SSD-style coding rules (no variance scaling is applied here):
tx = (x - anchor_x) / anchor_w
ty = (y - anchor_y) / anchor_h
tw = log(w / anchor_w)
th = log(h / anchor_h)
Args:
boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj,4].
labels: (tensor) object class labels, sized [#obj,].
Returns:
loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
cls_targets: (tensor) encoded class labels, sized [#anchors,].
Reference:
https://github.com/chainer/chainercv/blob/master/chainercv/links/model/ssd/multibox_coder.py
'''
def argmax(x):
'''Find the max value index(row & col) of a 2D tensor.'''
v, i = x.max(0)
j = v.max(0)[1].item()
return (i[j], j)
# before_ts = time.time()
anchor_boxes = self.anchor_boxes
ious = box_iou(anchor_boxes, boxes) # [#anchors, #obj]
index = torch.empty(anchor_boxes.size(0), dtype=torch.long).fill_(-1) # TD: for every anchorbox
masked_ious = ious.clone()
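# TD: greedy one-to-one matching: repeatedly take the highest remaining (anchor, gt) iou pair so every gt box with non-zero overlap gets an anchor; may be slow for many gt boxes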
while True:
# TD: this should be run for every gt box with fitting anchor
i, j = argmax(masked_ious)
if masked_ious[i, j] < 1e-6:
break
index[i] = j
# TD: zero row and column
masked_ious[i, :] = 0
masked_ious[:, j] = 0
# TD: deal with anchor boxes that have not been assigned yet
mask = (index < 0) & (ious.max(1)[0] >= 0.5)
if mask.any():
index[mask] = ious[mask].max(1)[1] # TD: assign to the best gt box if iou is at least 0.5
# TD: unmatched anchors keep index -1; the clamp maps them to boxes[0] as a dummy loc target (their class target is set to background below)
boxes = boxes[index.clamp(min=0)] # negative index not supported
boxes = change_box_order(boxes, 'xyxy2xywh')
anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
loc_targets = torch.cat([loc_xy, loc_wh], 1)
if self.create_bg_class:
# TD: as above, the clamp gives unmatched anchors labels[0]; this is overwritten with background (0) below
cls_targets = 1 + labels[index.clamp(min=0)]
else:
# if background class 0 already exists in labels
cls_targets = labels[index.clamp(min=0)]
# ok here index -1 targets are set to zero anyways
cls_targets[index < 0] = 0
# print('time spent encoding: {}'.format(time.time() - before_ts))
return loc_targets, cls_targets
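`FPNSSDBoxCoder.encode` assigns gt boxes to anchors in two passes. The first pass is greedy one-to-one matching on the highest remaining IoU. The second pass gives every still-unassigned anchor its best gt box when that IoU is at least 0.5. The NumPy sketch below (a hypothetical `match_anchors`) reproduces this. The test shows a consequence: the greedy pass can give an anchor a gt box other than its own best match.

```python
import numpy as np


def match_anchors(ious, pos_thresh=0.5):
    """ious: [#anchors, #obj]. Returns gt index per anchor, -1 if unmatched."""
    n_anchors = ious.shape[0]
    index = np.full(n_anchors, -1, dtype=int)
    if ious.size == 0:
        return index
    masked = ious.copy()
    # pass 1: greedy one-to-one matching on the largest remaining iou
    while True:
        i, j = np.unravel_index(np.argmax(masked), masked.shape)
        if masked[i, j] < 1e-6:
            break
        index[i] = j
        masked[i, :] = 0
        masked[:, j] = 0
    # pass 2: remaining anchors take their best gt box if iou >= pos_thresh
    mask = (index < 0) & (ious.max(axis=1) >= pos_thresh)
    index[mask] = ious[mask].argmax(axis=1)
    return index
```

Anchor 2 in the first test has IoU 0.55 with gt 0 but ends up matched to gt 1, because gt 0 was already taken by anchor 1.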
def decode(self, loc_preds, cls_preds, score_thresh=0.6, nms_thresh=0.45):
'''Decode predicted loc/cls back to real box locations and class labels.
Args:
loc_preds: (tensor) predicted loc, sized [#anchors,4].
cls_preds: (tensor) predicted conf, sized [#anchors,#classes].
score_thresh: (float) threshold for object confidence score.
nms_thresh: (float) threshold for box nms.
Returns:
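boxes: (tensor) bbox locations, sized [#obj,4].
labels: (tensor) class labels, sized [#obj,].
scores: (tensor) confidence scores, sized [#obj,].
All three are empty lists if no box passes score_thresh.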
'''
anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
xy = loc_preds[:, :2] * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
wh = loc_preds[:, 2:].exp() * anchor_boxes[:, 2:]
box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
boxes = []
labels = []
scores = []
num_classes = cls_preds.size(1)
if self.create_bg_class:
for i in range(num_classes - 1):
score = cls_preds[:, i + 1] # class i corresponds to (i+1) column
mask = score > score_thresh
if not mask.any():
continue
box = box_preds[mask]
score = score[mask]
# print(box.size())
# print(score.size())
keep = box_nms(box, score, nms_thresh)
boxes.append(box[keep])
labels.append(torch.empty_like(keep).fill_(i))
scores.append(score[keep])
else:
for i in range(1, num_classes):
score = cls_preds[:, i] # background is column 0, so class i corresponds to column i
mask = score > score_thresh
if not mask.any():
continue
box = box_preds[mask]
score = score[mask]
# print(box.size())
# print(score.size())
keep = box_nms(box, score, nms_thresh)
boxes.append(box[keep])
labels.append(torch.empty_like(keep).fill_(i))
scores.append(score[keep])
# concatenate if not empty
if len(boxes) > 0:
boxes = torch.cat(boxes, 0)
labels = torch.cat(labels, 0)
scores = torch.cat(scores, 0)
return boxes, labels, scores
def decode_boxes(self, loc_preds):
anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
xy = loc_preds[:, :2] * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
wh = loc_preds[:, 2:].exp() * anchor_boxes[:, 2:]
box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
boxes = box_preds
return boxes
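`decode` and `decode_boxes` invert the encoding used in `encode`. The centre is shifted by `loc_xy * anchor_wh`, and the size is scaled by `exp(loc_wh)`. The result is then converted back to corner form. The NumPy round trip below illustrates this with hypothetical helpers and one anchor.

```python
import numpy as np


def encode_loc(boxes_xywh, anchors_xywh):
    """(x, y, w, h) boxes -> (tx, ty, tw, th) offsets relative to anchors."""
    xy = (boxes_xywh[:, :2] - anchors_xywh[:, :2]) / anchors_xywh[:, 2:]
    wh = np.log(boxes_xywh[:, 2:] / anchors_xywh[:, 2:])
    return np.concatenate([xy, wh], axis=1)


def decode_loc(loc, anchors_xywh):
    """(tx, ty, tw, th) offsets -> (xmin, ymin, xmax, ymax) boxes."""
    xy = loc[:, :2] * anchors_xywh[:, 2:] + anchors_xywh[:, :2]
    wh = np.exp(loc[:, 2:]) * anchors_xywh[:, 2:]
    return np.concatenate([xy - wh / 2, xy + wh / 2], axis=1)
```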
def test():
box_coder = FPNSSDBoxCoder()
print(box_coder.anchor_boxes.size())
boxes = torch.tensor([[0, 0, 100, 100], [100, 100, 200, 200]], dtype=torch.float)
labels = torch.tensor([0, 1], dtype=torch.long)
loc_targets, cls_targets = box_coder.encode(boxes, labels)
print(loc_targets.size(), cls_targets.size())
# test()
@@ -0,0 +1,169 @@
'''Encode object boxes and labels.'''
import math
import torch
import numpy as np
from .meshgrid import meshgrid
from .box import box_iou, box_nms, change_box_order
class RetinaBoxCoder:
def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
self.num_anchors = 12
# self.anchor_areas = (32*32., 64*64., 128*128., 256*256., 512*512.) # p3 -> p7
# self.aspect_ratios = (1/2., 1/1., 2/1.)
# self.scale_ratios = (1., pow(2,1/3.), pow(2,2/3.))
self.with_64 = with_64
if self.with_64:
self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
else:
self.anchor_areas = [128 * 128., 256 * 256.]
if with_4_aspects:
self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
else:
self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.] # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
if with_4_scales:
assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
else:
self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
self.input_size = torch.tensor(input_size).float()
self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)
self.create_bg_class = create_bg_class
def _get_anchor_wh(self):
'''Compute anchor width and height for each feature map.
Returns:
anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
'''
anchor_wh = []
for s in self.anchor_areas:
for ar in self.aspect_ratios: # w/h = ar
h = math.sqrt(s / ar)
w = ar * h
for sr in self.scale_ratios: # scale
anchor_h = h * sr
anchor_w = w * sr
anchor_wh.append([anchor_w, anchor_h])
num_fms = len(self.anchor_areas)
return torch.Tensor(anchor_wh).view(num_fms, -1, 2)
def _get_anchor_boxes(self, input_size):
'''Compute anchor boxes for each feature map.
Args:
input_size: (tensor) model input size of (w,h).
Returns:
anchor_boxes: (tensor) anchor boxes of all feature maps concatenated, sized [#anchors,4],
where #anchors = sum over feature maps of fmw * fmh * #anchors_per_cell
'''
num_fms = len(self.anchor_areas)
anchor_wh = self._get_anchor_wh()
# fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)] # p3 -> p7 feature map sizes
if self.with_64: # num_fms == 3:
fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)] # p4 -> p6 feature map sizes
else: # num_fms == 2:
fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)] # p5 -> p6 feature map sizes
boxes = []
for i in range(num_fms):
fm_size = fm_sizes[i]
grid_size = input_size / fm_size
fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
xy = meshgrid(fm_w, fm_h) + 0.5 # [fm_h*fm_w, 2]
xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
box = torch.cat([xy - wh / 2., xy + wh / 2.], 3) # [x,y,x,y]
boxes.append(box.view(-1, 4))
return torch.cat(boxes, 0)
def encode(self, boxes, labels):
'''Encode target bounding boxes and class labels.
We obey the Faster RCNN box coder:
tx = (x - anchor_x) / anchor_w
ty = (y - anchor_y) / anchor_h
tw = log(w / anchor_w)
th = log(h / anchor_h)
Args:
boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj, 4].
labels: (tensor) object class labels, sized [#obj,].
Returns:
loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
cls_targets: (tensor) encoded class labels, sized [#anchors,].
'''
anchor_boxes = self.anchor_boxes
ious = box_iou(anchor_boxes, boxes)
max_ious, max_ids = ious.max(1)
boxes = boxes[max_ids]
boxes = change_box_order(boxes, 'xyxy2xywh')
anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
loc_targets = torch.cat([loc_xy, loc_wh], 1)
if self.create_bg_class:
cls_targets = 1 + labels[max_ids]
else:
# if background class 0 already exists in labels
cls_targets = labels[max_ids]
cls_targets[max_ious < 0.5] = 0 # WATCH OUT HERE, this is just for testing!!
# ignore = (max_ious > 0.4) & (max_ious < 0.5) # ignore ious between [0.4,0.5]
# cls_targets[ignore] = -1 # mark ignored to -1
return loc_targets, cls_targets
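Unlike the greedy matching in `FPNSSDBoxCoder`, `RetinaBoxCoder.encode` assigns every anchor to the gt box with its maximum IoU. It then sets the class target to background (0) when that IoU is below 0.5. The hypothetical NumPy helper below uses the same IoU matrix as the greedy example. Here anchor 2 keeps its best gt box (label 3, shifted to 4 by the background class).

```python
import numpy as np


def max_iou_targets(ious, labels, pos_thresh=0.5, create_bg_class=True):
    """ious: [#anchors, #obj]; labels: [#obj]. Returns (cls_targets, max_ids)."""
    max_ids = ious.argmax(axis=1)
    max_ious = ious.max(axis=1)
    # shift labels by one when class 0 is reserved for background
    cls = labels[max_ids] + (1 if create_bg_class else 0)
    cls[max_ious < pos_thresh] = 0
    return cls, max_ids
```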
def decode(self, loc_preds, cls_preds, input_size, score_thresh=0.5, nms_thresh=0.5):
'''Decode outputs back to bounding box locations and class labels.
Args:
loc_preds: (tensor) predicted locations, sized [#anchors, 4].
cls_preds: (tensor) predicted class labels, sized [#anchors, #classes].
input_size: (tuple) model input size of (w,h).
Returns:
boxes: (tensor) decode box locations, sized [#obj,4].
labels: (tensor) class labels for each box, sized [#obj,].
'''
CLS_THRESH = score_thresh
NMS_THRESH = nms_thresh
input_size = torch.Tensor(input_size)
# anchor_boxes = self._get_anchor_boxes(input_size) # xywh
anchor_boxes = change_box_order(self._get_anchor_boxes(input_size), 'xyxy2xywh')
loc_xy = loc_preds[:, :2]
loc_wh = loc_preds[:, 2:]
xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
wh = loc_wh.exp() * anchor_boxes[:, 2:]
boxes = torch.cat([xy - wh / 2, xy + wh / 2], 1) # [#anchors,4]
score, labels = cls_preds.sigmoid().max(1) # [#anchors,]
ids = score > CLS_THRESH
ids = ids.nonzero().squeeze(1) # [#obj,], stays 1-D even for a single detection
keep = box_nms(boxes[ids], score[ids], threshold=NMS_THRESH)
return boxes[ids][keep], labels[ids][keep] # , score[ids][keep]
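`RetinaBoxCoder.decode` scores each anchor by its best sigmoid class probability and keeps anchors above `score_thresh`. In PyTorch, `nonzero()` returns a `[K, 1]` tensor. Squeezing every dimension turns a single hit into a 0-dim tensor, which then breaks the `[N,4]` indexing in `box_nms`. `squeeze(1)` keeps the result 1-D. The NumPy sketch below (hypothetical `select_detections`) uses `np.flatnonzero`, which is always 1-D.

```python
import numpy as np


def select_detections(cls_logits, score_thresh=0.5):
    """cls_logits: [#anchors, #classes]. Returns (ids, labels, scores) above threshold."""
    scores = 1.0 / (1.0 + np.exp(-cls_logits))   # sigmoid per class
    labels = scores.argmax(axis=1)
    best = scores.max(axis=1)
    ids = np.flatnonzero(best > score_thresh)   # 1-D even for a single hit
    return ids, labels[ids], best[ids]
```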
def decode_boxes(self, loc_preds):
anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
loc_xy = loc_preds[:, :2]
loc_wh = loc_preds[:, 2:]
xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
wh = loc_wh.exp() * anchor_boxes[:, 2:]
box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
boxes = box_preds
return boxes
@@ -0,0 +1,178 @@
'''Encode object boxes and labels.'''
import math
import torch
import numpy as np
from .meshgrid import meshgrid
from .box import box_iou, box_nms, change_box_order
class RetinaBoxCoder:
def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
self.num_anchors = 12
# self.anchor_areas = (32*32., 64*64., 128*128., 256*256., 512*512.) # p3 -> p7
# self.aspect_ratios = (1/2., 1/1., 2/1.)
# self.scale_ratios = (1., pow(2,1/3.), pow(2,2/3.))
self.with_64 = with_64
if self.with_64:
self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
else:
self.anchor_areas = [128 * 128., 256 * 256.]
if with_4_aspects:
self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
else:
self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.] # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
if with_4_scales:
assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
else:
self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
self.input_size = torch.tensor(input_size).float()
self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)
self.create_bg_class = create_bg_class
def _get_anchor_wh(self):
'''Compute anchor width and height for each feature map.
Returns:
anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
'''
anchor_wh = []
for s in self.anchor_areas:
for ar in self.aspect_ratios: # w/h = ar
h = math.sqrt(s / ar)
w = ar * h
for sr in self.scale_ratios: # scale
anchor_h = h * sr
anchor_w = w * sr
anchor_wh.append([anchor_w, anchor_h])
num_fms = len(self.anchor_areas)
return torch.Tensor(anchor_wh).view(num_fms, -1, 2)
def _get_anchor_boxes(self, input_size):
'''Compute anchor boxes for each feature map.
Args:
input_size: (tensor) model input size of (w,h).
Returns:
anchor_boxes: (tensor) anchor boxes of all feature maps concatenated, sized [#anchors,4],
where #anchors = sum over feature maps of fmw * fmh * #anchors_per_cell
'''
num_fms = len(self.anchor_areas)
anchor_wh = self._get_anchor_wh()
# fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)] # p3 -> p7 feature map sizes
if self.with_64: # num_fms == 3:
fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)] # p4 -> p6 feature map sizes
else: # num_fms == 2:
fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)] # p5 -> p6 feature map sizes
boxes = []
for i in range(num_fms):
fm_size = fm_sizes[i]
grid_size = input_size / fm_size
fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
xy = meshgrid(fm_w, fm_h) + 0.5 # [fm_h*fm_w, 2]
xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
box = torch.cat([xy - wh / 2., xy + wh / 2.], 3) # [x,y,x,y]
boxes.append(box.view(-1, 4))
return torch.cat(boxes, 0)
def encode(self, boxes, labels, linemap):
'''Encode target bounding boxes and class labels.
We obey the Faster RCNN box coder:
tx = (x - anchor_x) / anchor_w
ty = (y - anchor_y) / anchor_h
tw = log(w / anchor_w)
th = log(h / anchor_h)
Args:
boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj, 4].
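labels: (tensor) object class labels, sized [#obj,].
linemap: (array-like) line detection map of the input, indexed as [y, x]; anchors whose centre lies on a non-zero pixel and whose max iou is below 0.35 are marked as ignored (-1).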
Returns:
loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
cls_targets: (tensor) encoded class labels, sized [#anchors,].
'''
anchor_boxes = self.anchor_boxes
ious = box_iou(anchor_boxes, boxes)
max_ious, max_ids = ious.max(1)
boxes = boxes[max_ids]
# need to check if anchor_box center has positive linemap
anchor_ctrs = torch.zeros((anchor_boxes.shape[0], 2)).int()
anchor_ctrs[:, 0] = (anchor_boxes[:, 2] + anchor_boxes[:, 0]) / 2
anchor_ctrs[:, 1] = (anchor_boxes[:, 3] + anchor_boxes[:, 1]) / 2
linemap_val = np.asarray(linemap)[anchor_ctrs[:, 1], anchor_ctrs[:, 0]]
boxes = change_box_order(boxes, 'xyxy2xywh')
anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
loc_targets = torch.cat([loc_xy, loc_wh], 1)
if self.create_bg_class:
cls_targets = 1 + labels[max_ids]
else:
# if background class 0 already exists in labels
cls_targets = labels[max_ids]
cls_targets[max_ious < 0.5] = 0 # WATCH OUT HERE, this is just for testing!!
# ignore = (max_ious > 0.4) & (max_ious < 0.5) # ignore ious between [0.4,0.5]
# cls_targets[ignore] = -1 # mark ignored to -1
# ignore anchors centred on a line detection whose iou is below 0.35
ignore = torch.from_numpy(linemap_val.astype(np.uint8) > 0) & (max_ious < 0.35) # 0.5
cls_targets[ignore] = -1 # mark ignored to -1
return loc_targets, cls_targets
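The linemap variant of `encode` looks up each anchor's integer centre in the line map. Anchors that sit on a detected line but overlap no gt box well (max IoU below 0.35) are marked `-1` (ignored) instead of background. This presumably avoids penalising the detector on lines whose signs were not annotated. The NumPy sketch below is a hypothetical helper. It assumes the linemap is a 2D array with the same resolution as the anchor coordinates.

```python
import numpy as np


def apply_linemap_ignore(anchor_boxes, max_ious, cls_targets, linemap, ignore_thresh=0.35):
    """Mark anchors centred on a line pixel with max iou < ignore_thresh as -1."""
    cx = ((anchor_boxes[:, 0] + anchor_boxes[:, 2]) / 2).astype(int)
    cy = ((anchor_boxes[:, 1] + anchor_boxes[:, 3]) / 2).astype(int)
    on_line = np.asarray(linemap)[cy, cx].astype(bool)   # linemap indexed [y, x]
    out = cls_targets.copy()
    out[on_line & (max_ious < ignore_thresh)] = -1
    return out
```

Positive anchors (IoU ≥ 0.5) are never affected, because the ignore threshold lies below the positive threshold.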
def decode(self, loc_preds, cls_preds, input_size, score_thresh=0.5, nms_thresh=0.5):
'''Decode outputs back to bounding box locations and class labels.
Args:
loc_preds: (tensor) predicted locations, sized [#anchors, 4].
cls_preds: (tensor) predicted class labels, sized [#anchors, #classes].
input_size: (tuple) model input size of (w,h).
Returns:
boxes: (tensor) decode box locations, sized [#obj,4].
labels: (tensor) class labels for each box, sized [#obj,].
'''
CLS_THRESH = score_thresh
NMS_THRESH = nms_thresh
input_size = torch.Tensor(input_size)
# anchor_boxes = self._get_anchor_boxes(input_size) # xywh
anchor_boxes = change_box_order(self._get_anchor_boxes(input_size), 'xyxy2xywh')
loc_xy = loc_preds[:, :2]
loc_wh = loc_preds[:, 2:]
xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
wh = loc_wh.exp() * anchor_boxes[:, 2:]
boxes = torch.cat([xy - wh / 2, xy + wh / 2], 1) # [#anchors,4]
score, labels = cls_preds.sigmoid().max(1) # [#anchors,]
ids = score > CLS_THRESH
ids = ids.nonzero().squeeze(1) # [#obj,], stays 1-D even for a single detection
keep = box_nms(boxes[ids], score[ids], threshold=NMS_THRESH)
return boxes[ids][keep], labels[ids][keep] # , score[ids][keep]
def decode_boxes(self, loc_preds):
anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
loc_xy = loc_preds[:, :2]
loc_wh = loc_preds[:, 2:]
xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
wh = loc_wh.exp() * anchor_boxes[:, 2:]
box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
boxes = box_preds
return boxes
@@ -0,0 +1,358 @@
'''Compute PASCAL_VOC MAP.
Reference:
https://github.com/chainer/chainercv/blob/master/chainercv/evaluations/eval_detection_voc.py
'''
from __future__ import division
import six
import itertools
import numpy as np
from collections import defaultdict
def voc_eval(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
gt_difficults=None, iou_thresh=0.5, use_07_metric=True):
'''Wrap VOC evaluation for PyTorch.'''
pred_bboxes = [xy2yx(b).numpy() for b in pred_bboxes]
pred_labels = [label.numpy() for label in pred_labels]
pred_scores = [score.numpy() for score in pred_scores]
gt_bboxes = [xy2yx(b).numpy() for b in gt_bboxes]
gt_labels = [label.numpy() for label in gt_labels]
return eval_detection_voc(
pred_bboxes, pred_labels, pred_scores, gt_bboxes,
gt_labels, gt_difficults, iou_thresh, use_07_metric)
def xy2yx(boxes):
    '''Convert box (xmin,ymin,xmax,ymax) to (ymin,xmin,ymax,xmax).
    Returns a new tensor; the input boxes are not modified.'''
    return boxes[:, [1, 0, 3, 2]]
def bbox_iou(bbox_a, bbox_b):
'''Calculate the Intersection over Union (IoU) between bounding boxes.
Args:
bbox_a (array): An array whose shape is :math:`(N, 4)`.
:math:`N` is the number of bounding boxes.
The dtype should be :obj:`numpy.float32`.
bbox_b (array): An array similar to :obj:`bbox_a`,
whose shape is :math:`(K, 4)`.
The dtype should be :obj:`numpy.float32`.
Returns:
array:
An array whose shape is :math:`(N, K)`. \
An element at index :math:`(n, k)` contains IoUs between \
:math:`n` th bounding box in :obj:`bbox_a` and :math:`k` th bounding \
box in :obj:`bbox_b`.
'''
# top left
tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
# bottom right
br = np.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])
area_i = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
return area_i / (area_a[:, None] + area_b - area_i)
def eval_detection_voc(
pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
gt_difficults=None,
iou_thresh=0.5, use_07_metric=False):
"""Calculate average precisions based on evaluation code of PASCAL VOC.
This function evaluates predicted bounding boxes obtained from a dataset
which has :math:`N` images by using average precision for each class.
The code is based on the evaluation code used in PASCAL VOC Challenge.
Args:
pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`
sets of bounding boxes.
Its index corresponds to an index for the base dataset.
Each element of :obj:`pred_bboxes` is a set of coordinates
of bounding boxes. This is an array whose shape is :math:`(R, 4)`,
where :math:`R` corresponds
to the number of bounding boxes, which may vary among images.
The second axis corresponds to
:math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.
pred_labels (iterable of numpy.ndarray): An iterable of labels.
Similar to :obj:`pred_bboxes`, its index corresponds to an
index for the base dataset. Its length is :math:`N`.
pred_scores (iterable of numpy.ndarray): An iterable of confidence
scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,
its index corresponds to an index for the base dataset.
Its length is :math:`N`.
gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth
bounding boxes
whose length is :math:`N`. An element of :obj:`gt_bboxes` is a
bounding box whose shape is :math:`(R, 4)`. Note that the number of
bounding boxes in each image does not need to be the same as the number
of corresponding predicted boxes.
gt_labels (iterable of numpy.ndarray): An iterable of ground truth
labels which are organized similarly to :obj:`gt_bboxes`.
gt_difficults (iterable of numpy.ndarray): An iterable of boolean
arrays which are organized similarly to :obj:`gt_bboxes`.
This tells whether the
corresponding ground truth bounding box is difficult or not.
By default, this is :obj:`None`. In that case, this function
considers all bounding boxes to be not difficult.
iou_thresh (float): A prediction is correct if its Intersection over
Union with the ground truth is above this value.
use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric
for calculating average precision. The default value is
:obj:`False`.
Returns:
dict:
The keys, value-types and the description of the values are listed
below.
* **ap** (*numpy.ndarray*): An array of average precisions. \
The :math:`l`-th value corresponds to the average precision \
for class :math:`l`. If class :math:`l` does not exist in \
either :obj:`pred_labels` or :obj:`gt_labels`, the corresponding \
value is set to :obj:`numpy.nan`.
* **map** (*float*): The average of Average Precisions over classes.
"""
prec, rec = calc_detection_voc_prec_rec(
pred_bboxes, pred_labels, pred_scores,
gt_bboxes, gt_labels, gt_difficults,
iou_thresh=iou_thresh)
ap = calc_detection_voc_ap(prec, rec, use_07_metric=use_07_metric)
return {'ap': ap, 'map': np.nanmean(ap)}
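`eval_detection_voc` first turns the ranked, matched detections into precision/recall curves and then integrates them into an average precision per class. `calc_detection_voc_ap` and the rest of `calc_detection_voc_prec_rec` are not shown in this excerpt. The sketch below assumes they mirror the ChainerCV code referenced in the module docstring. Detections are sorted by score, with match flags 1 = TP, 0 = FP and -1 = matched a difficult gt (ignored). AP is either the 11-point VOC2007 interpolation or the area under the precision envelope. The helper names are hypothetical.

```python
import numpy as np


def prec_rec(scores, match, n_pos):
    """Precision/recall curves for one class from per-detection match flags."""
    order = np.argsort(-np.asarray(scores), kind='stable')
    m = np.asarray(match)[order]
    tp = np.cumsum(m == 1)
    fp = np.cumsum(m == 0)
    with np.errstate(divide='ignore', invalid='ignore'):
        prec = tp / (tp + fp)   # nan while only difficult matches were seen
    rec = tp / n_pos if n_pos > 0 else None
    return prec, rec


def voc_ap(prec, rec, use_07_metric=False):
    """Average precision from precision/recall arrays of one class."""
    if use_07_metric:
        # 11-point interpolation: max precision at recall >= t, t = 0, 0.1, ..., 1
        ap = 0.0
        for t in np.linspace(0.0, 1.0, 11):
            hits = rec >= t
            ap += (np.max(np.nan_to_num(prec)[hits]) if hits.any() else 0.0) / 11
        return ap
    mpre = np.concatenate(([0.0], np.nan_to_num(prec), [0.0]))
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]  # monotone precision envelope
    i = np.where(mrec[1:] != mrec[:-1])[0]          # points where recall changes
    return np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
```

The test uses three detections (TP, FP, TP) against three gt boxes. It gives precision [1, 1/2, 2/3] and recall [1/3, 1/3, 2/3]. The 11-point AP is 6/11 and the area-based AP is 5/9.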
def calc_detection_voc_prec_rec(
pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
gt_difficults=None,
iou_thresh=0.5):
"""Calculate precision and recall based on evaluation code of PASCAL VOC.
This function calculates precision and recall of
predicted bounding boxes obtained from a dataset which has :math:`N`
images.
The code is based on the evaluation code used in PASCAL VOC Challenge.
Args:
pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`
sets of bounding boxes.
Its index corresponds to an index for the base dataset.
Each element of :obj:`pred_bboxes` is a set of coordinates
of bounding boxes. This is an array whose shape is :math:`(R, 4)`,
where :math:`R` corresponds
to the number of bounding boxes, which may vary among images.
The second axis corresponds to
:math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.
pred_labels (iterable of numpy.ndarray): An iterable of labels.
Similar to :obj:`pred_bboxes`, its index corresponds to an
index for the base dataset. Its length is :math:`N`.
pred_scores (iterable of numpy.ndarray): An iterable of confidence
scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,
its index corresponds to an index for the base dataset.
Its length is :math:`N`.
gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth
bounding boxes
whose length is :math:`N`. An element of :obj:`gt_bboxes` is a
bounding box whose shape is :math:`(R, 4)`. Note that the number of
bounding boxes in each image does not need to be the same as the number
of corresponding predicted boxes.
gt_labels (iterable of numpy.ndarray): An iterable of ground truth
labels which are organized similarly to :obj:`gt_bboxes`.
gt_difficults (iterable of numpy.ndarray): An iterable of boolean
arrays which are organized similarly to :obj:`gt_bboxes`.
This tells whether the
corresponding ground truth bounding box is difficult or not.
By default, this is :obj:`None`. In that case, this function
considers all bounding boxes to be not difficult.
iou_thresh (float): A prediction is correct if its Intersection over
Union with the ground truth is at least this value.
Returns:
tuple of two lists:
This function returns two lists: :obj:`prec` and :obj:`rec`.
* :obj:`prec`: A list of arrays. :obj:`prec[l]` is precision \
for class :math:`l`. If class :math:`l` does not exist in \
either :obj:`pred_labels` or :obj:`gt_labels`, :obj:`prec[l]` is \
set to :obj:`None`.
* :obj:`rec`: A list of arrays. :obj:`rec[l]` is recall \
for class :math:`l`. If :obj:`gt_labels` contains no box of class \
:math:`l` that is not marked as difficult, :obj:`rec[l]` is \
set to :obj:`None`.
"""
pred_bboxes = iter(pred_bboxes)
pred_labels = iter(pred_labels)
pred_scores = iter(pred_scores)
gt_bboxes = iter(gt_bboxes)
gt_labels = iter(gt_labels)
if gt_difficults is None:
gt_difficults = itertools.repeat(None)
else:
gt_difficults = iter(gt_difficults)
n_pos = defaultdict(int)
score = defaultdict(list)
match = defaultdict(list)
for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \
six.moves.zip(
pred_bboxes, pred_labels, pred_scores,
gt_bboxes, gt_labels, gt_difficults):
if gt_difficult is None:
gt_difficult = np.zeros(gt_bbox.shape[0], dtype=bool)
for l in np.unique(np.concatenate((pred_label, gt_label)).astype(int)):
pred_mask_l = pred_label == l
pred_bbox_l = pred_bbox[pred_mask_l]
pred_score_l = pred_score[pred_mask_l]
# sort by score
order = pred_score_l.argsort()[::-1]
pred_bbox_l = pred_bbox_l[order]
pred_score_l = pred_score_l[order]
gt_mask_l = gt_label == l
gt_bbox_l = gt_bbox[gt_mask_l]
gt_difficult_l = gt_difficult[gt_mask_l]
n_pos[l] += np.logical_not(gt_difficult_l).sum()
score[l].extend(pred_score_l)
if len(pred_bbox_l) == 0:
continue
if len(gt_bbox_l) == 0:
match[l].extend((0,) * pred_bbox_l.shape[0])
continue
# VOC evaluation follows integer typed bounding boxes.
pred_bbox_l = pred_bbox_l.copy()
pred_bbox_l[:, 2:] += 1
gt_bbox_l = gt_bbox_l.copy()
gt_bbox_l[:, 2:] += 1
iou = bbox_iou(pred_bbox_l, gt_bbox_l)
gt_index = iou.argmax(axis=1)
# set -1 if there is no matching ground truth
gt_index[iou.max(axis=1) < iou_thresh] = -1
del iou
selec = np.zeros(gt_bbox_l.shape[0], dtype=bool)
for gt_idx in gt_index:
if gt_idx >= 0:
if gt_difficult_l[gt_idx]:
match[l].append(-1)
else:
if not selec[gt_idx]:
match[l].append(1)
else:
match[l].append(0)
selec[gt_idx] = True
else:
match[l].append(0)
for iter_ in (
pred_bboxes, pred_labels, pred_scores,
gt_bboxes, gt_labels, gt_difficults):
if next(iter_, None) is not None:
raise ValueError('Length of input iterables need to be same.')
n_fg_class = max(n_pos.keys()) + 1
prec = [None] * n_fg_class
rec = [None] * n_fg_class
for l in n_pos.keys():
score_l = np.array(score[l])
match_l = np.array(match[l], dtype=np.int8)
order = score_l.argsort()[::-1]
match_l = match_l[order]
tp = np.cumsum(match_l == 1)
fp = np.cumsum(match_l == 0)
# If an element of fp + tp is 0,
# the corresponding element of prec[l] is nan.
prec[l] = tp / (fp + tp)
# If n_pos[l] is 0, rec[l] is None.
if n_pos[l] > 0:
rec[l] = tp / n_pos[l]
return prec, rec
def calc_detection_voc_ap(prec, rec, use_07_metric=False):
"""Calculate average precisions based on evaluation code of PASCAL VOC.
This function calculates average precisions
from given precisions and recalls.
The code is based on the evaluation code used in PASCAL VOC Challenge.
Args:
prec (list of numpy.array): A list of arrays.
:obj:`prec[l]` indicates precision for class :math:`l`.
If :obj:`prec[l]` is :obj:`None`, this function returns
:obj:`numpy.nan` for class :math:`l`.
rec (list of numpy.array): A list of arrays.
:obj:`rec[l]` indicates recall for class :math:`l`.
If :obj:`rec[l]` is :obj:`None`, this function returns
:obj:`numpy.nan` for class :math:`l`.
use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric
for calculating average precision. The default value is
:obj:`False`.
Returns:
~numpy.ndarray:
This function returns an array of average precisions.
The :math:`l`-th value corresponds to the average precision
for class :math:`l`. If :obj:`prec[l]` or :obj:`rec[l]` is
:obj:`None`, the corresponding value is set to :obj:`numpy.nan`.
"""
n_fg_class = len(prec)
ap = np.empty(n_fg_class)
for l in six.moves.range(n_fg_class):
if prec[l] is None or rec[l] is None:
ap[l] = np.nan
continue
if use_07_metric:
# 11 point metric
ap[l] = 0
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec[l] >= t) == 0:
p = 0
else:
p = np.max(np.nan_to_num(prec[l])[rec[l] >= t])
ap[l] += p / 11
else:
# correct AP calculation
# first append sentinel values at the end
mpre = np.concatenate(([0], np.nan_to_num(prec[l]), [0]))
mrec = np.concatenate(([0], rec[l], [1]))
mpre = np.maximum.accumulate(mpre[::-1])[::-1]
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap[l] = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
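To make the two AP variants concrete, here is a small standalone sketch (not part of the repository) that derives precision and recall from a ranked list of matches the way `calc_detection_voc_prec_rec` does for a single class (1 = true positive, 0 = false positive), then applies both the all-point integration and the 11-point VOC 2007 metric used by `calc_detection_voc_ap`. The helper names and the toy detections are illustrative assumptions.

```python
import numpy as np


def prec_rec_from_matches(match, n_pos):
    """Cumulative precision/recall for detections already sorted by descending score."""
    match = np.asarray(match)
    tp = np.cumsum(match == 1)
    fp = np.cumsum(match == 0)
    return tp / (tp + fp), tp / n_pos


def voc_ap(prec, rec, use_07_metric=False):
    """Average precision for a single class, following the VOC conventions."""
    prec = np.nan_to_num(np.asarray(prec, dtype=float))
    rec = np.asarray(rec, dtype=float)
    if use_07_metric:
        # 11-point metric: mean of the best precision reached at recall >= t
        ap = 0.0
        for t in np.arange(0.0, 1.1, 0.1):
            ap += (prec[rec >= t].max() if np.any(rec >= t) else 0.0) / 11
        return ap
    # all-point metric: add sentinels, take the precision envelope
    # (running max from the right), then sum rectangle areas where recall changes
    mpre = np.concatenate(([0.0], prec, [0.0]))
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    i = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]))


# four ranked detections (TP, FP, TP, TP) against four ground-truth boxes
prec, rec = prec_rec_from_matches([1, 0, 1, 1], n_pos=4)
ap_all = voc_ap(prec, rec)                      # 0.625
ap_07 = voc_ap(prec, rec, use_07_metric=True)   # 6.75 / 11, about 0.614
```

The all-point value is the area under the monotone precision envelope; the 11-point value samples that envelope at fixed recall levels, which is why the two numbers differ slightly.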
@@ -0,0 +1 @@
@@ -0,0 +1,84 @@
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from ..one_hot_embedding import one_hot_embedding
class FocalLoss(nn.Module):
def __init__(self, num_classes):
super(FocalLoss, self).__init__()
self.num_classes = num_classes
def _focal_loss(self, x, y):
'''Focal loss.
This is the sigmoid focal loss of Lin et al., "Focal Loss for Dense Object Detection" (ICCV 2017).
With BCELoss, the background should not be counted in num_classes.
Args:
x: (tensor) predictions, sized [N,D].
y: (tensor) targets, sized [N,].
Return:
(tensor) focal loss.
'''
alpha = 0.25 # balance param
gamma = 2 # focus param
size_average = False
t = one_hot_embedding(y, self.num_classes) # y-1
p = x.sigmoid()
pt = torch.where(t > 0, p, 1 - p) # pt = p if t > 0 else 1-p
w = (1 - pt).pow(gamma)
w = torch.where(t > 0, alpha * w, (1 - alpha) * w)
loss = F.binary_cross_entropy_with_logits(x, t, w, reduction='mean' if size_average else 'sum')
# according to https://github.com/c0nn3r/RetinaNet/blob/master/focal_loss.py
# logpt = - F.cross_entropy(x, y)
# pt = torch.exp(logpt)
# focal_loss = -((1 - pt) ** gamma) * logpt
# loss = alpha * focal_loss
# averaging (or not) loss
# if size_average:
# loss = loss.mean()
# else:
# loss = loss.sum()
return loss
def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
'''Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
Args:
loc_preds: (tensor) predicted locations, sized [batch_size, #anchors, 4].
loc_targets: (tensor) encoded target locations, sized [batch_size, #anchors, 4].
cls_preds: (tensor) predicted class confidences, sized [batch_size, #anchors, #classes].
cls_targets: (tensor) encoded target labels, sized [batch_size, #anchors].
Returns:
(tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + FocalLoss(cls_preds, cls_targets).
'''
batch_size, num_boxes = cls_targets.size()
pos = cls_targets > 0 # [N,#anchors]
num_pos = max(pos.sum().item(), 1)  # avoid division by zero when no anchor is positive
# ===============================================================
# loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
# ===============================================================
mask = pos.unsqueeze(2).expand_as(loc_preds) # [N,#anchors,4]
loc_loss = F.smooth_l1_loss(loc_preds[mask], loc_targets[mask], reduction='sum')
# ===============================================================
# cls_loss = FocalLoss(cls_preds, cls_targets)
# ===============================================================
pos_neg = cls_targets > -1 # exclude ignored anchors
mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)
cls_loss = self._focal_loss(masked_cls_preds, cls_targets[pos_neg])
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item() / num_pos, cls_loss.item() / num_pos), end=' | ')
loss = (loc_loss + cls_loss) / num_pos
return loss
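The weighting in `_focal_loss` can be checked outside PyTorch. The NumPy sketch below reproduces it (sigmoid probabilities, the `alpha`/`gamma` weights, and a numerically stable binary cross-entropy on logits, summed as with `reduction='sum'`). With `gamma=0` and `alpha=0.5` it collapses to half the plain BCE, which is a convenient sanity check. The function name and the toy inputs are illustrative assumptions, not repository code.

```python
import numpy as np


def focal_loss_sum(logits, onehot, alpha=0.25, gamma=2.0):
    """Summed sigmoid focal loss with the same weighting as FocalLoss._focal_loss."""
    p = 1.0 / (1.0 + np.exp(-logits))
    pt = np.where(onehot > 0, p, 1.0 - p)                   # probability of the correct outcome
    w = (1.0 - pt) ** gamma                                  # small for easy, confident anchors
    w = np.where(onehot > 0, alpha * w, (1.0 - alpha) * w)   # class-balance term
    # stable BCE with logits: max(x, 0) - x * t + log(1 + exp(-|x|))
    bce = np.maximum(logits, 0) - logits * onehot + np.log1p(np.exp(-np.abs(logits)))
    return float(np.sum(w * bce))


logits = np.zeros((2, 3))
onehot = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
plain_half_bce = focal_loss_sum(logits, onehot, alpha=0.5, gamma=0.0)  # 0.5 * 6 * log(2)
focal = focal_loss_sum(logits, onehot)                                  # 0.875 * log(2)
```

The `(1 - pt) ** gamma` factor is what makes the loss "focal": a confidently correct anchor contributes almost nothing, so the many easy background anchors do not swamp the gradient.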
@@ -0,0 +1,71 @@
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
class SSDLoss(nn.Module):
def __init__(self, num_classes):
super(SSDLoss, self).__init__()
self.num_classes = num_classes
def _hard_negative_mining(self, cls_loss, pos):
'''Return a mask selecting, per image, the highest-loss negative anchors, 3x as many as the positive anchors.
Args:
cls_loss: (tensor) cross entropy loss between cls_preds and cls_targets, sized [N,#anchors].
pos: (tensor) positive class mask, sized [N,#anchors].
Return:
(tensor) boolean mask of the selected negatives, sized [N,#anchors].
'''
cls_loss = cls_loss * (pos.float() - 1)
_, idx = cls_loss.sort(1) # sort by negative losses
_, rank = idx.sort(1) # [N,#anchors]
num_neg = 3*pos.sum(1) # [N,]
neg = rank < num_neg[:,None] # [N,#anchors]
return neg
def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
'''Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
Args:
loc_preds: (tensor) predicted locations, sized [N, #anchors, 4].
loc_targets: (tensor) encoded target locations, sized [N, #anchors, 4].
cls_preds: (tensor) predicted class confidences, sized [N, #anchors, #classes].
cls_targets: (tensor) encoded target labels, sized [N, #anchors].
Returns:
(tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + CrossEntropyLoss(cls_preds, cls_targets).
'''
pos = cls_targets > 0 # [N,#anchors]
batch_size = pos.size(0)
num_pos = pos.sum().item()
#===============================================================
# loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
#===============================================================
mask = pos.unsqueeze(2).expand_as(loc_preds) # [N,#anchors,4]
loc_loss = F.smooth_l1_loss(loc_preds[mask], loc_targets[mask], reduction='sum')
#===============================================================
# cls_loss = CrossEntropyLoss(cls_preds, cls_targets)
#===============================================================
# TD: cross_entropy expects targets in [0, num_classes), so ignored anchors (label -1)
# are clamped to 0 here and their loss is zeroed out below
cls_loss = F.cross_entropy(cls_preds.view(-1, self.num_classes),
cls_targets.clamp(min=0).view(-1), reduction='none')  # [N*#anchors,]
cls_loss = cls_loss.view(batch_size, -1)
cls_loss[cls_targets < 0] = 0 # set ignored loss to 0
neg = self._hard_negative_mining(cls_loss, pos) # [N,#anchors]
cls_loss = cls_loss[pos|neg].sum()
if num_pos > 0: # TD mod to prevent div by zero
print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
loss = (loc_loss+cls_loss)/num_pos
else:
print('num_pos zero exception')
loss = (loc_loss+cls_loss)/1.
return loss
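`_hard_negative_mining` relies on a double-sort trick: sorting the negated losses gives an ordering, and sorting that ordering again gives each anchor's rank, so `rank < 3 * num_pos` selects the highest-loss negatives of every image without a Python loop. A NumPy sketch of the same idea; the function name and the toy losses are illustrative.

```python
import numpy as np


def hard_negative_mask(cls_loss, pos, neg_pos_ratio=3):
    """Boolean [N, A] mask of the neg_pos_ratio * num_pos highest-loss negatives per image."""
    # positives get loss 0, negatives get -loss: ascending order = hardest negatives first
    neg_loss = cls_loss * (pos.astype(float) - 1)
    order = np.argsort(neg_loss, axis=1, kind="stable")
    rank = np.argsort(order, axis=1, kind="stable")  # position of each anchor in that order
    num_neg = neg_pos_ratio * pos.sum(axis=1)
    return rank < num_neg[:, None]


cls_loss = np.array([[0.1, 2.0, 0.5, 3.0, 0.2, 1.0, 0.05, 0.3]])
pos = np.array([[False, False, False, True, False, False, False, False]])
neg = hard_negative_mask(cls_loss, pos)
```

With one positive, the three negatives with losses 2.0, 1.0 and 0.5 are selected, while the positive anchor (loss 3.0) is excluded because its mined loss is zeroed first.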
@@ -0,0 +1,38 @@
import torch
def meshgrid(x, y, row_major=True):
'''Return meshgrid in range x & y.
Args:
x: (int) first dim range.
y: (int) second dim range.
row_major: (bool) row major or column major.
Returns:
(tensor) meshgrid, sized [x*y,2]
Example:
>> meshgrid(3,2)
0 0
1 0
2 0
0 1
1 1
2 1
[torch.FloatTensor of size 6x2]
>> meshgrid(3,2,row_major=False)
0 0
0 1
0 2
1 0
1 1
1 2
[torch.FloatTensor of size 6x2]
'''
a = torch.arange(0, x, dtype=torch.float) # TD: make it float (0.4.1)
b = torch.arange(0, y, dtype=torch.float) # TD: make it float (0.4.1)
xx = a.repeat(y).view(-1, 1)
yy = b.view(-1, 1).repeat(1, x).view(-1, 1)
return torch.cat([xx, yy], 1) if row_major else torch.cat([yy, xx], 1)
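For reference, the same grid can be built with NumPy's `tile`/`repeat`. In torchcv-style anchor encoders (an assumption about how this helper is used elsewhere in the repository), such a grid is shifted by 0.5 and multiplied by the feature-map stride to obtain anchor centres in image coordinates; `meshgrid_np` and the stride of 8 below are illustrative.

```python
import numpy as np


def meshgrid_np(x, y, row_major=True):
    """NumPy counterpart of `meshgrid`: an [x*y, 2] array of grid coordinates."""
    xx = np.tile(np.arange(x, dtype=float), y)    # 0..x-1, repeated y times
    yy = np.repeat(np.arange(y, dtype=float), x)  # each of 0..y-1, repeated x times
    return np.stack([xx, yy], 1) if row_major else np.stack([yy, xx], 1)


grid = meshgrid_np(3, 2)
centers = (grid + 0.5) * 8  # hypothetical stride-8 feature map -> pixel centres
```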
@@ -0,0 +1,53 @@
import torch
import torch.nn as nn
# this should import a specific architecture for cuneiform sign detection
class FPNSSD(nn.Module):
num_anchors = 12
def __init__(self, fpn_model, num_classes):
super(FPNSSD, self).__init__()
self.fpn = fpn_model
self.num_classes = num_classes
self.loc_head = self._make_head(self.num_anchors * 4)
self.cls_head = self._make_head(self.num_anchors * self.num_classes)
def forward(self, x):
loc_preds = []
cls_preds = []
fms = self.fpn(x)
for fm in fms:
loc_pred = self.loc_head(fm)
cls_pred = self.cls_head(fm)
loc_pred = loc_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1,
4)  # [N,A*4,H,W] -> [N,H,W,A*4] -> [N,H*W*A,4] with A = num_anchors (12)
cls_pred = cls_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1,
self.num_classes)  # [N,A*NC,H,W] -> [N,H,W,A*NC] -> [N,H*W*A,NC]
loc_preds.append(loc_pred)
cls_preds.append(cls_pred)
return torch.cat(loc_preds, 1), torch.cat(cls_preds, 1)
def _make_head(self, out_planes):
layers = []
for _ in range(4):
layers.append(nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1))
layers.append(nn.ReLU(True))
layers.append(nn.Conv2d(256, out_planes, kernel_size=3, stride=1, padding=1))
return nn.Sequential(*layers)
def freeze_bn(self):
'''Freeze BatchNorm layers.'''
for layer in self.modules():
if isinstance(layer, nn.BatchNorm2d):
layer.eval()
# def test():
# net = FPNSSD(21)
# loc_preds, cls_preds = net(torch.randn(1, 3, 512, 512))
# print(loc_preds.size(), cls_preds.size())
# test()
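The `permute(0, 2, 3, 1).reshape(N, -1, C)` step in `forward` fixes the order in which anchors are flattened: spatial cell first, then anchor index. The NumPy sketch below (random data; batch, feature-map and class sizes are arbitrary assumptions) checks where anchor `a` at cell `(i, j)` ends up, using `num_anchors = 12` as in `FPNSSD`.

```python
import numpy as np

num_anchors, num_classes = 12, 3   # 12 as in FPNSSD.num_anchors; 3 classes is arbitrary
N, H, W = 2, 4, 5                  # hypothetical batch size and feature-map size

rng = np.random.default_rng(0)
cls_map = rng.standard_normal((N, num_anchors * num_classes, H, W))  # head output, NCHW
# same layout change as cls_pred.permute(0, 2, 3, 1).reshape(N, -1, num_classes)
cls_pred = cls_map.transpose(0, 2, 3, 1).reshape(N, -1, num_classes)


def flat_index(i, j, a):
    """Row of cls_pred that holds anchor a at feature-map cell (i, j)."""
    return (i * W + j) * num_anchors + a
```

Because the outputs of all pyramid levels are then concatenated along dimension 1, the anchor generator presumably has to enumerate anchors in the same level, cell, anchor order for predictions and targets to line up.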
@@ -0,0 +1,54 @@
import torch
import torch.nn as nn
# this should import a specific architecture for cuneiform sign detection
class RPN(nn.Module):
num_anchors = 12
def __init__(self, fpn_model, num_classes, with_64):
super(RPN, self).__init__()
self.fpn = fpn_model
self.num_classes = num_classes
self.with_p4 = int(with_64)
self.loc_head = self._make_head(self.num_anchors * 4)
self.cls_head = self._make_head(self.num_anchors * self.num_classes)
def forward(self, x):
loc_preds = []
cls_preds = []
fms = self.fpn(x)
for fm in fms:
loc_pred = self.loc_head(fm)
cls_pred = self.cls_head(fm)
loc_pred = loc_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1,
4)  # [N,A*4,H,W] -> [N,H,W,A*4] -> [N,H*W*A,4] with A = num_anchors (12)
cls_pred = cls_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1,
self.num_classes)  # [N,A*NC,H,W] -> [N,H,W,A*NC] -> [N,H*W*A,NC]
loc_preds.append(loc_pred)
cls_preds.append(cls_pred)
return torch.cat(loc_preds, 1), torch.cat(cls_preds, 1), fms[self.with_p4]
def _make_head(self, out_planes):
layers = []
for _ in range(4):
layers.append(nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1))
layers.append(nn.ReLU(True))
layers.append(nn.Conv2d(256, out_planes, kernel_size=3, stride=1, padding=1))
return nn.Sequential(*layers)
def freeze_bn(self):
'''Freeze BatchNorm layers.'''
for layer in self.modules():
if isinstance(layer, nn.BatchNorm2d):
layer.eval()
# def test():
# net = FPNSSD(21)
# loc_preds, cls_preds = net(torch.randn(1, 3, 512, 512))
# print(loc_preds.size(), cls_preds.size())
# test()
@@ -0,0 +1,15 @@
import torch
def one_hot_embedding(labels, num_classes):
'''Embed labels as one-hot vectors.
Args:
labels: (LongTensor) class labels, sized [N,].
num_classes: (int) number of classes.
Returns:
(tensor) encoded labels, sized [N,#classes].
'''
y = torch.eye(num_classes, device=labels.device) # [D,D]
return y[labels] # [N,D]
@@ -0,0 +1,22 @@
import torch
def center_crop(img, boxes, size):
'''Crops the given PIL Image at the center.
Args:
img: (PIL.Image) image to be cropped.
boxes: (tensor) object boxes, sized [#obj,4].
size (tuple): desired output size of (w,h).
Returns:
img: (PIL.Image) center cropped image.
boxes: (tensor) center cropped boxes.
'''
w, h = img.size
ow, oh = size
i = int(round((h - oh) / 2.))
j = int(round((w - ow) / 2.))
img = img.crop((j, i, j + ow, i + oh))
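boxes -= torch.Tensor([j, i, j, i])
# clamp_ modifies the strided views in place; plain clamp() returned a new tensor that was discarded
boxes[:, 0::2].clamp_(min=0, max=ow - 1)
boxes[:, 1::2].clamp_(min=0, max=oh - 1)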
return img, boxes
@@ -0,0 +1,25 @@
import math
import torch
import random
from PIL import Image
from ..box import box_iou, box_clamp
def crop_box(img, boxes, labels, box):
x, y, x2, y2 = box
w = x2 - x
h = y2 - y
img = img.crop((x, y, x2, y2))
# check if center is still inside tile_box, otherwise ignore box
# (if center is not inside tile box, not possible to get IoU >= 0.5 --> treated as background anyways)
center = (boxes[:, :2] + boxes[:, 2:]) / 2
mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
if mask.any():
boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
boxes = box_clamp(boxes, 0, 0, w, h)
labels = labels[mask]
else:
boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
labels = torch.tensor([0], dtype=torch.long)
return img, boxes, labels
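`crop_box` (and the random crop helpers further down) keep a box only if its centre falls inside the crop, then shift it into crop coordinates and clip it to the crop. A NumPy sketch of that filtering step; `box_clamp` is replaced by explicit clipping here, on the assumption that it clips x to `[0, w]` and y to `[0, h]` as in torchcv. `crop_boxes` and the toy boxes are illustrative.

```python
import numpy as np


def crop_boxes(boxes, labels, tile):
    """Keep boxes whose centre lies in tile = (x, y, x2, y2); return them in tile coordinates."""
    x, y, x2, y2 = tile
    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
    if not mask.any():
        # same fallback as crop_box: a single empty background box
        return np.zeros((1, 4)), np.zeros(1, dtype=int)
    out = boxes[mask] - np.array([x, y, x, y], dtype=float)
    out[:, 0::2] = out[:, 0::2].clip(0, x2 - x)  # clip x coordinates to the tile width
    out[:, 1::2] = out[:, 1::2].clip(0, y2 - y)  # clip y coordinates to the tile height
    return out, labels[mask]


boxes = np.array([[10, 10, 30, 30], [90, 90, 130, 130], [0, 0, 10, 10]], dtype=float)
labels = np.array([1, 2, 3])
kept, kept_labels = crop_boxes(boxes, labels, (20, 20, 120, 120))
```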
@@ -0,0 +1,23 @@
import torch
import random
from PIL import Image
def pad(img, target_size):
'''Pad image with zeros to the specified size.
Args:
img: (PIL.Image) image to be padded.
target_size: (tuple) target size of (ow,oh).
Returns:
img: (PIL.Image) padded image.
Reference:
`tf.image.pad_to_bounding_box`
'''
w, h = img.size
canvas = Image.new('RGB', target_size)
canvas.paste(img, (0,0)) # paste on the left-up corner
return canvas
@@ -0,0 +1,23 @@
import torch
import random
from PIL import Image
def pad(img, target_size):
'''Pad image with zeros to the specified size.
Args:
img: (PIL.Image) image to be padded.
target_size: (tuple) target size of (ow,oh).
Returns:
img: (PIL.Image) padded image.
Reference:
`tf.image.pad_to_bounding_box`
'''
w, h = img.size
canvas = Image.new('L', target_size)
canvas.paste(img, (0, 0)) # paste on the left-up corner
return canvas
@@ -0,0 +1,64 @@
'''This random crop strategy is described in paper:
[1] SSD: Single Shot MultiBox Detector
'''
import math
import torch
import random
from PIL import Image
# from torchcv.utils.box import box_iou, box_clamp
from ..box import box_iou, box_clamp
def random_crop(
img, boxes, labels,
min_scale=0.3,
max_aspect_ratio=2.):
'''Randomly crop a PIL image.
Args:
img: (PIL.Image) image.
boxes: (tensor) bounding boxes, sized [#obj, 4].
labels: (tensor) bounding box labels, sized [#obj,].
min_scale: (float) minimal image width/height scale.
max_aspect_ratio: (float) maximum width/height aspect ratio.
Returns:
img: (PIL.Image) cropped image.
boxes: (tensor) object boxes.
labels: (tensor) object labels.
'''
imw, imh = img.size
params = [(0, 0, imw, imh)] # crop roi (x,y,w,h) out
for min_iou in (0, 0.1, 0.3, 0.5, 0.7, 0.9):
for _ in range(100):
scale = random.uniform(min_scale, 1)
aspect_ratio = random.uniform(
max(1 / max_aspect_ratio, scale * scale),
min(max_aspect_ratio, 1 / (scale * scale)))
w = int(imw * scale * math.sqrt(aspect_ratio))
h = int(imh * scale / math.sqrt(aspect_ratio))
x = random.randrange(imw - w + 1)  # +1: w can equal imw, and randrange(0) would raise ValueError
y = random.randrange(imh - h + 1)
roi = torch.tensor([[x, y, x + w, y + h]], dtype=torch.float)
ious = box_iou(boxes, roi)
if ious.min() >= min_iou:
params.append((x, y, w, h))
break
x, y, w, h = random.choice(params)
img = img.crop((x, y, x + w, y + h))
center = (boxes[:, :2] + boxes[:, 2:]) / 2
mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
& (center[:, 1] >= y) & (center[:, 1] <= y + h)
if mask.any():
boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
boxes = box_clamp(boxes, 0, 0, w, h)
labels = labels[mask]
else:
boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
labels = torch.tensor([0], dtype=torch.long)
return img, boxes, labels
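The aspect-ratio bounds in `random_crop` keep the crop inside the image: sampling `aspect_ratio` from `[max(1/r, s^2), min(r, 1/s^2)]` gives `s * sqrt(ar) <= 1` and `s / sqrt(ar) <= 1`. A full-size crop (`w == imw`) is therefore possible, and a call of the form `random.randrange(imw - w)` would then raise `ValueError`; the sketch below samples the offset from the inclusive range instead. `sample_crop` is a hypothetical helper, not part of the repository.

```python
import math
import random


def sample_crop(imw, imh, min_scale=0.3, max_aspect_ratio=2.0, rng=random):
    """Sample a crop (x, y, w, h) with random_crop's size rule and an inclusive offset range."""
    scale = rng.uniform(min_scale, 1)
    # the scale**2 bounds guarantee scale*sqrt(ar) <= 1 and scale/sqrt(ar) <= 1
    aspect_ratio = rng.uniform(max(1 / max_aspect_ratio, scale * scale),
                               min(max_aspect_ratio, 1 / (scale * scale)))
    w = int(imw * scale * math.sqrt(aspect_ratio))
    h = int(imh * scale / math.sqrt(aspect_ratio))
    x = rng.randrange(imw - w + 1)  # +1 so that a full-width crop (w == imw) is allowed
    y = rng.randrange(imh - h + 1)
    return x, y, w, h
```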
@@ -0,0 +1,55 @@
'''This random crop strategy is described in paper:
[1] SSD: Single Shot MultiBox Detector
'''
import math
import torch
import random
from PIL import Image
# from torchcv.utils.box import box_iou, box_clamp
from ..box import box_iou, box_clamp
def random_crop_tile(
img, boxes, labels,
scale_range=[0.8, 1],
max_aspect_ratio=2.):
'''Randomly crop a PIL image.
Args:
img: (PIL.Image) image.
boxes: (tensor) bounding boxes, sized [#obj, 4].
labels: (tensor) bounding box labels, sized [#obj,].
scale_range: [float,float] range [min, max] from which the crop width/height scale is sampled.
max_aspect_ratio: (float) maximum width/height aspect ratio.
Returns:
img: (PIL.Image) cropped image.
boxes: (tensor) object boxes.
labels: (tensor) object labels.
'''
imw, imh = img.size
scale = random.uniform(scale_range[0], scale_range[1])
aspect_ratio = random.uniform(
max(1 / max_aspect_ratio, scale * scale),
min(max_aspect_ratio, 1 / (scale * scale)))
w = int(imw * scale * math.sqrt(aspect_ratio))
h = int(imh * scale / math.sqrt(aspect_ratio))
x = random.randrange(imw - w + 1)  # +1: w equals imw when scale is 1, and randrange(0) would raise ValueError
y = random.randrange(imh - h + 1)
img = img.crop((x, y, x + w, y + h))
center = (boxes[:, :2] + boxes[:, 2:]) / 2
mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
& (center[:, 1] >= y) & (center[:, 1] <= y + h)
if mask.any():
boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
boxes = box_clamp(boxes, 0, 0, w, h)
labels = labels[mask]
else:
boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
labels = torch.tensor([0], dtype=torch.long)
return img, boxes, labels
@@ -0,0 +1,56 @@
import torch
import random
import torchvision.transforms as transforms
from PIL import Image
def random_distort(
img,
brightness_delta=32 / 255.,
contrast_delta=0.5,
saturation_delta=0.5,
hue_delta=0.1):
'''A color related data augmentation used in SSD.
Args:
img: (PIL.Image) image to be color augmented.
brightness_delta: (float) shift of brightness, range from [1-delta,1+delta].
contrast_delta: (float) shift of contrast, range from [1-delta,1+delta].
saturation_delta: (float) shift of saturation, range from [1-delta,1+delta].
hue_delta: (float) shift of hue, range from [-delta,delta].
Returns:
img: (PIL.Image) color augmented image.
'''
def brightness(img, delta):
if random.random() < 0.5:
img = transforms.ColorJitter(brightness=delta)(img)
return img
def contrast(img, delta):
if random.random() < 0.5:
img = transforms.ColorJitter(contrast=delta)(img)
return img
def saturation(img, delta):
if random.random() < 0.5:
img = transforms.ColorJitter(saturation=delta)(img)
return img
def hue(img, delta):
if random.random() < 0.5:
img = transforms.ColorJitter(hue=delta)(img)
return img
img = brightness(img, brightness_delta)
if random.random() < 0.5:
img = contrast(img, contrast_delta)
img = saturation(img, saturation_delta)
img = hue(img, hue_delta)
else:
img = saturation(img, saturation_delta)
img = hue(img, hue_delta)
img = contrast(img, contrast_delta)
return img
@@ -0,0 +1,28 @@
import torch
import random
from PIL import Image
def random_flip(img, boxes):
'''Randomly flip PIL image.
If boxes is not None, flip boxes accordingly.
Args:
img: (PIL.Image) image to be flipped.
boxes: (tensor) object boxes, sized [#obj,4].
Returns:
img: (PIL.Image) randomly flipped image.
boxes: (tensor) randomly flipped boxes.
'''
if random.random() < 0.5:
img = img.transpose(Image.FLIP_LEFT_RIGHT)
w = img.width
if boxes is not None:
xmin = w - boxes[:,2]
xmax = w - boxes[:,0]
boxes[:,0] = xmin
boxes[:,2] = xmax
return img, boxes
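Horizontal flipping maps an x coordinate to `w - x`, and because it also swaps which edge is the minimum, the new `xmin` comes from the old `xmax`. A NumPy sketch; unlike `random_flip`, which writes the flipped coordinates back into the tensor it was given, this version returns a copy. `hflip_boxes` is an illustrative name.

```python
import numpy as np


def hflip_boxes(boxes, width):
    """Mirror [xmin, ymin, xmax, ymax] boxes horizontally in an image of the given width."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2]  # new xmin from old xmax
    flipped[:, 2] = width - boxes[:, 0]  # new xmax from old xmin
    return flipped


boxes = np.array([[10.0, 5.0, 30.0, 25.0]])
flipped = hflip_boxes(boxes, width=100)
```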
@@ -0,0 +1,33 @@
import torch
import random
from PIL import Image
def random_paste(img, boxes, max_ratio=4, fill=0):
'''Randomly paste the input image on a larger canvas.
If boxes is not None, adjust boxes accordingly.
Args:
img: (PIL.Image) image to be pasted.
boxes: (tensor) object boxes, sized [#obj,4].
max_ratio: (int) maximum ratio of expansion.
fill: (tuple) the RGB value to fill the canvas.
Returns:
canvas: (PIL.Image) canvas with image pasted.
boxes: (tensor) adjusted object boxes.
'''
w, h = img.size
ratio = random.uniform(1, max_ratio)
ow, oh = int(w*ratio), int(h*ratio)
canvas = Image.new('RGB', (ow,oh), fill)
x = random.randint(0, ow - w)
y = random.randint(0, oh - h)
canvas.paste(img, (x,y))
if boxes is not None:
boxes = boxes + torch.tensor([x,y,x,y], dtype=torch.float)
return canvas, boxes
@@ -0,0 +1,60 @@
import torch
import random
from PIL import Image
def resize(img, boxes, size, max_size=1000, scale=None, random_interpolation=False):
'''Resize the input PIL image to given size.
If boxes is not None, resize boxes accordingly.
Args:
img: (PIL.Image) image to be resized.
boxes: (tensor) object boxes, sized [#obj,4].
size: (tuple or int)
- if is tuple, resize image to the size.
- if is int, resize the shorter side to the size while maintaining the aspect ratio.
max_size: (int) when size is int, limit the image longer size to max_size.
This is essential to limit the usage of GPU memory.
scale: (float) if not None, size is ignored and both sides are scaled by this factor.
random_interpolation: (bool) randomly choose a resize interpolation method.
Returns:
img: (PIL.Image) resized image.
boxes: (tensor) resized boxes.
Example:
>> img, boxes = resize(img, boxes, 600) # resize shorter side to 600
>> img, boxes = resize(img, boxes, (500,600)) # resize image size to (500,600)
>> img, _ = resize(img, None, (500,600)) # resize image only
'''
w, h = img.size
if scale is None:
if isinstance(size, int):
size_min = min(w,h)
size_max = max(w,h)
sw = sh = float(size) / size_min
if sw * size_max > max_size:
sw = sh = float(max_size) / size_max
ow = int(w * sw + 0.5)
oh = int(h * sh + 0.5)
else:
ow, oh = size
sw = float(ow) / w
sh = float(oh) / h
else:
ow = int(w * scale)
oh = int(h * scale)
sw, sh = scale, scale
method = random.choice([
Image.BOX,
Image.NEAREST,
Image.HAMMING,
Image.BICUBIC,
Image.LANCZOS,
Image.BILINEAR]) if random_interpolation else Image.BILINEAR
img = img.resize((ow,oh), method)
if boxes is not None:
boxes = boxes * torch.tensor([sw,sh,sw,sh])
return img, boxes
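How `resize` chooses the output size can be isolated into a small pure-Python function. With an integer `size`, the shorter side is scaled to `size` unless that would push the longer side past `max_size`, in which case the longer side is pinned to `max_size`; the returned factors are the ones multiplied into the boxes as `[sw, sh, sw, sh]`. `resize_params` is a hypothetical helper that mirrors the logic above without touching an image.

```python
def resize_params(w, h, size, max_size=1000, scale=None):
    """Return (ow, oh, sw, sh) computed the same way as in `resize`."""
    if scale is not None:
        return int(w * scale), int(h * scale), scale, scale
    if isinstance(size, int):
        s = float(size) / min(w, h)       # shorter side -> size
        if s * max(w, h) > max_size:      # but cap the longer side at max_size
            s = float(max_size) / max(w, h)
        return int(w * s + 0.5), int(h * s + 0.5), s, s
    ow, oh = size
    return ow, oh, float(ow) / w, float(oh) / h
```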
@@ -0,0 +1,36 @@
import torch
import random
from PIL import Image
def scale_jitter(img, boxes, sizes, max_size=1400):
'''Randomly scale image shorter side to one of the sizes.
If boxes is not None, resize boxes accordingly.
Args:
img: (PIL.Image) image to be resized.
boxes: (tensor) object boxes, sized [#obj,4].
sizes: (tuple) scale sizes.
max_size: (int) limit the image longer size to max_size.
Returns:
img: (PIL.Image) resized image.
boxes: (tensor) resized boxes.
'''
w, h = img.size
size_min = min(w,h)
size_max = max(w,h)
size = random.choice(sizes)
sw = sh = float(size) / size_min
if sw * size_max > max_size:
sw = sh = float(max_size) / size_max
ow = int(w * sw + 0.5)
oh = int(h * sh + 0.5)
img = img.resize((ow,oh), Image.BILINEAR)
if boxes is not None:
boxes = boxes * torch.tensor([sw,sh,sw,sh])
return img, boxes
@@ -0,0 +1,27 @@
import math
import torch
import random
from PIL import Image
from ..box import box_iou, box_clamp
def crop_box_lm(img, boxes, labels, linemap, box):
x, y, x2, y2 = box
w = x2 - x
h = y2 - y
img = img.crop((x, y, x2, y2))
linemap = linemap.crop((x, y, x2, y2))
# check if center is still inside tile_box, otherwise ignore box
# (if center is not inside tile box, not possible to get IoU >= 0.5 --> treated as background anyways)
center = (boxes[:, :2] + boxes[:, 2:]) / 2
mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
if mask.any():
boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
boxes = box_clamp(boxes, 0, 0, w, h)
labels = labels[mask]
else:
boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
labels = torch.tensor([0], dtype=torch.long)
return img, boxes, labels, linemap
@@ -0,0 +1,26 @@
import torch
import random
from PIL import Image
def pad_lm(img, linemap, target_size):
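'''Pad image and line map with zeros to the specified size.
Args:
img: (PIL.Image) image to be padded.
linemap: (PIL.Image) binary line map, padded in the same way as img.
target_size: (tuple) target size of (ow,oh).
Returns:
img: (PIL.Image) padded image.
linemap: (PIL.Image) padded binary line map.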
Reference:
`tf.image.pad_to_bounding_box`
'''
w, h = img.size
canvas = Image.new('L', target_size)
canvas.paste(img, (0, 0)) # paste on the left-up corner
canvas_line = Image.new('1', target_size)
canvas_line.paste(linemap, (0, 0)) # paste on the left-up corner
return canvas, canvas_line
@@ -0,0 +1,56 @@
'''This random crop strategy is described in paper:
[1] SSD: Single Shot MultiBox Detector
'''
import math
import torch
import random
from PIL import Image
# from torchcv.utils.box import box_iou, box_clamp
from ..box import box_iou, box_clamp
def random_crop_tile_lm(
img, boxes, labels, linemap,
scale_range=[0.8, 1],
max_aspect_ratio=2.):
'''Randomly crop a PIL image.
Args:
img: (PIL.Image) image.
boxes: (tensor) bounding boxes, sized [#obj, 4].
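labels: (tensor) bounding box labels, sized [#obj,].
linemap: (PIL.Image) binary line map, cropped with the same window as img.
scale_range: [float,float] range [min, max] from which the crop width/height scale is sampled.
max_aspect_ratio: (float) maximum width/height aspect ratio.
Returns:
img: (PIL.Image) cropped image.
boxes: (tensor) object boxes.
labels: (tensor) object labels.
linemap: (PIL.Image) cropped line map.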
'''
imw, imh = img.size
scale = random.uniform(scale_range[0], scale_range[1])
aspect_ratio = random.uniform(
max(1 / max_aspect_ratio, scale * scale),
min(max_aspect_ratio, 1 / (scale * scale)))
w = int(imw * scale * math.sqrt(aspect_ratio))
h = int(imh * scale / math.sqrt(aspect_ratio))
x = random.randrange(imw - w + 1)  # +1: w equals imw when scale is 1, and randrange(0) would raise ValueError
y = random.randrange(imh - h + 1)
img = img.crop((x, y, x + w, y + h))
linemap = linemap.crop((x, y, x + w, y + h))
center = (boxes[:, :2] + boxes[:, 2:]) / 2
mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
& (center[:, 1] >= y) & (center[:, 1] <= y + h)
if mask.any():
boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
boxes = box_clamp(boxes, 0, 0, w, h)
labels = labels[mask]
else:
boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
labels = torch.tensor([0], dtype=torch.long)
return img, boxes, labels, linemap
@@ -0,0 +1,62 @@
import torch
import random
from PIL import Image
def resize_lm(img, boxes, linemap, size, max_size=1000, scale=None, random_interpolation=False):
'''Resize the input PIL image to given size.
If boxes is not None, resize boxes accordingly.
Args:
img: (PIL.Image) image to be resized.
boxes: (tensor) object boxes, sized [#obj,4].
linemap: (PIL.Image) binary line map, always resized with nearest-neighbour interpolation.
size: (tuple or int)
- if is tuple, resize image to the size.
- if is int, resize the shorter side to the size while maintaining the aspect ratio.
max_size: (int) when size is int, limit the image longer size to max_size.
This is essential to limit the usage of GPU memory.
scale: (float) if not None, size is ignored and both sides are scaled by this factor.
random_interpolation: (bool) randomly choose a resize interpolation method for img.
Returns:
img: (PIL.Image) resized image.
boxes: (tensor) resized boxes.
linemap: (PIL.Image) resized line map.
Example:
>> img, boxes, linemap = resize_lm(img, boxes, linemap, 600)  # resize shorter side to 600
>> img, boxes, linemap = resize_lm(img, boxes, linemap, (500,600))  # resize image size to (500,600)
>> img, _, linemap = resize_lm(img, None, linemap, (500,600))  # resize image and line map only
'''
w, h = img.size
if scale is None:
if isinstance(size, int):
size_min = min(w, h)
size_max = max(w, h)
sw = sh = float(size) / size_min
if sw * size_max > max_size:
sw = sh = float(max_size) / size_max
ow = int(w * sw + 0.5)
oh = int(h * sh + 0.5)
else:
ow, oh = size
sw = float(ow) / w
sh = float(oh) / h
else:
ow = int(w * scale)
oh = int(h * scale)
sw, sh = scale, scale
method = random.choice([
Image.BOX,
Image.NEAREST,
Image.HAMMING,
Image.BICUBIC,
Image.LANCZOS,
Image.BILINEAR]) if random_interpolation else Image.BILINEAR
img = img.resize((ow, oh), method)
linemap = linemap.resize((ow, oh), Image.NEAREST)
if boxes is not None:
boxes = boxes * torch.tensor([sw, sh, sw, sh])
return img, boxes, linemap
@@ -0,0 +1,340 @@
import numbers
import numpy as np
from PIL import Image
from PIL import ImageOps
import random
from random import randint
import torch.functional as F
# from skimage.transform import warp, AffineTransform
from bbox_utils import intersection_over_union
# own stuff
def convert2binaryPIL(lbl_ind):
# convert to PIL binary '1' without dither
lbl_im = Image.fromarray(np.uint8(lbl_ind))
fn = (lambda x: 255 if x > 0 else 0)
lbl_im = lbl_im.convert('L').point(fn, mode='1')
return lbl_im
def pad2square(bb, context_pad_ratio=0, context_pad=0, take_long_side=True):
    # -- extract square patches using ground truth bounding boxes
    # take_long_side=True grows the short side to the length of the long side,
    # take_long_side=False shrinks the long side to the length of the short side;
    # bb ([xmin, ymin, xmax, ymax]) is modified in place and returned
    # assert (context_pad >= 0 and context_pad_ratio == 0) or (context_pad_ratio >= 0 and context_pad == 0)
    width = bb[2] - bb[0]
    height = bb[3] - bb[1]
    diff = width - height
    width_is_smaller = 0 > diff
    height_is_smaller = 0 < diff
    if take_long_side:
        # take long side
        if context_pad == 0:
            if width_is_smaller:
                context_pad = np.round(context_pad_ratio * height)
            else:
                context_pad = np.round(context_pad_ratio * width)
        bb[0] = bb[0] - context_pad - (width_is_smaller * np.ceil(0.5 * (height - width)))
        bb[2] = bb[2] + context_pad + (width_is_smaller * np.floor(0.5 * (height - width)))
        bb[1] = bb[1] - context_pad - (height_is_smaller * np.ceil(0.5 * (width - height)))
        bb[3] = bb[3] + context_pad + (height_is_smaller * np.floor(0.5 * (width - height)))
    else:
        # take short side
        if context_pad == 0:
            if width_is_smaller:
                context_pad = np.round(context_pad_ratio * width)
            else:
                context_pad = np.round(context_pad_ratio * height)
        bb[0] = bb[0] - context_pad - (height_is_smaller * np.ceil(0.5 * (height - width)))
        bb[2] = bb[2] + context_pad + (height_is_smaller * np.floor(0.5 * (height - width)))
        bb[1] = bb[1] - context_pad - (width_is_smaller * np.ceil(0.5 * (width - height)))
        bb[3] = bb[3] + context_pad + (width_is_smaller * np.floor(0.5 * (width - height)))
    return bb
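# Worked example (a standalone sketch, not the module's API; the helper name is
# hypothetical and context padding is omitted): with take_long_side=True, pad2square grows
# the short side by the side-length difference, split as ceil(diff/2) before and
# floor(diff/2) after, so odd differences still yield an exact square. For
# bb = [10, 20, 31, 60] (width 21, height 40): xmin = 10 - ceil(9.5) = 0 and
# xmax = 31 + floor(9.5) = 40, giving a 40 x 40 box.
import numpy as np
def _example_square_long_side(bb):
    """Hypothetical helper: return a copy of bb grown to a square along its short side."""
    bb = list(bb)
    width = bb[2] - bb[0]
    height = bb[3] - bb[1]
    if width < height:
        bb[0] -= int(np.ceil(0.5 * (height - width)))
        bb[2] += int(np.floor(0.5 * (height - width)))
    elif height < width:
        bb[1] -= int(np.ceil(0.5 * (width - height)))
        bb[3] += int(np.floor(0.5 * (width - height)))
    return bb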
# BBOX sampling / cropping functions
def crop_image(im, bb, context_pad=0, pad_to_square=False, mean_values=(0, 0, 0)):
    """
    Crop a window from a numpy image (H x W x 3) for detection. If `pad_to_square`
    is set, the window is made square while preserving the aspect ratio, surrounding
    context is included according to `context_pad`, and regions that fall outside
    the image are filled with `mean_values`.
    bb: bounding box coordinates as [xmin, ymin, xmax, ymax].
    """
    # copy list and use as ndarray
    bb = np.array(bb, dtype=int)
    # numpy images are indexed (row, col), i.e. (height, width)
    imh, imw = im.shape[:2]
    # pad to square while preserving aspect ratio
    if pad_to_square:
        bb = pad2square(bb, context_pad=context_pad)
        # -- check whether bbox lies inside the image
        # pad: [left, top, right, bottom]
        pad = [0, 0, 0, 0]
        if bb[0] < 0:
            pad[0] = abs(bb[0])
            bb[0] = 0
        if bb[1] < 0:
            pad[1] = abs(bb[1])
            bb[1] = 0
        if bb[2] > imw:
            pad[2] = bb[2] - imw
            bb[2] = imw
        if bb[3] > imh:
            pad[3] = bb[3] - imh
            bb[3] = imh
        # -- crop, then fill the out-of-image borders with the channel mean;
        # each border is sized from the current (already partially padded) crop
        im = im[bb[1]:bb[3], bb[0]:bb[2], :]
        channel_mean = np.reshape(mean_values, (1, 1, 3)).astype(np.uint8)
        if pad[0] > 0:
            pad_left = np.tile(channel_mean, (im.shape[0], pad[0], 1))
            im = np.concatenate((pad_left, im), axis=1)
        if pad[1] > 0:
            pad_up = np.tile(channel_mean, (pad[1], im.shape[1], 1))
            im = np.concatenate((pad_up, im), axis=0)
        if pad[2] > 0:
            pad_right = np.tile(channel_mean, (im.shape[0], pad[2], 1))
            im = np.concatenate((im, pad_right), axis=1)
        if pad[3] > 0:
            pad_down = np.tile(channel_mean, (pad[3], im.shape[1], 1))
            im = np.concatenate((im, pad_down), axis=0)
        return im, bb.tolist()
    else:
        if context_pad > 0:
            # context padding without squaring is only supported by crop_pil_image
            raise NotImplementedError("use crop_pil_image for context padding")
        # return simple crop
        return im[bb[1]:bb[3], bb[0]:bb[2], :]
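# Alternative sketch (an assumption, not how crop_image is implemented): the four
# concatenations in crop_image can also be expressed with np.pad, padding each channel
# with its own constant. `pad` follows crop_image's order [left, top, right, bottom];
# the helper name is hypothetical.
import numpy as np
def _example_pad_with_channel_mean(im, pad, mean_values=(0, 0, 0)):
    """Hypothetical helper: pad an H x W x C image with one constant per channel."""
    left, top, right, bottom = pad
    channels = [
        np.pad(im[..., c], ((top, bottom), (left, right)), mode='constant',
               constant_values=mean_values[c])
        for c in range(im.shape[2])
    ]
    return np.stack(channels, axis=-1)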
def crop_pil_image(im, bb, context_pad=0, pad_to_square=False, fill_values=None):
    """
    Crop a window from a PIL image for detection. Include surrounding context
    according to `context_pad`. If `pad_to_square` is set, the window is made
    square while preserving the aspect ratio.
    bb: bounding box coordinates as [xmin, ymin, xmax, ymax].
    fill_values: fill color for regions outside the image (None: zeros).
    Returns the cropped image and the crop window as a list.
    """
    # copy list and use as ndarray
    bb = np.array(bb, dtype=int)
    # PIL reports size as (width, height)
    imw, imh = im.size
    # pad to square while preserving aspect ratio
    if pad_to_square:
        bb = pad2square(bb, context_pad=context_pad)
        if fill_values is None:
            # if the window extends beyond the image, Pillow fills the outside with zeros
            im = im.crop((bb[0], bb[1], bb[2], bb[3]))
        else:
            # check whether bbox lies inside the image
            # pad: [left, top, right, bottom]
            pad = [0, 0, 0, 0]
            if bb[0] < 0:
                pad[0] = abs(bb[0])
                bb[0] = 0
            if bb[1] < 0:
                pad[1] = abs(bb[1])
                bb[1] = 0
            if bb[2] > imw:
                pad[2] = bb[2] - imw
                bb[2] = imw
            if bb[3] > imh:
                pad[3] = bb[3] - imh
                bb[3] = imh
            # crop box
            im = im.crop((bb[0], bb[1], bb[2], bb[3]))
            # fill the out-of-image borders with fill_values
            im = ImageOps.expand(im, border=(pad[0], pad[1], pad[2], pad[3]), fill=tuple(fill_values))
        return im, bb.tolist()
    else:
        if context_pad > 0:
            bb[0] = max(bb[0] - context_pad, 0)
            bb[2] = min(bb[2] + context_pad, imw)
            bb[1] = max(bb[1] - context_pad, 0)
            bb[3] = min(bb[3] + context_pad, imh)
        # return simple crop
        return im.crop((bb[0], bb[1], bb[2], bb[3])), bb.tolist()
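# Sketch of the clip-then-expand idea behind crop_pil_image with fill_values set
# (standalone; the helper name is hypothetical): x coordinates are clipped against the
# image width, y coordinates against the height, and ImageOps.expand adds the clipped
# amounts back as a (left, top, right, bottom) border in the fill color.
from PIL import Image, ImageOps
def _example_crop_with_fill(img, bb, fill):
    """Hypothetical helper: crop bb = [xmin, ymin, xmax, ymax], filling outside pixels."""
    imw, imh = img.size  # PIL size is (width, height)
    pad = [max(0, -bb[0]), max(0, -bb[1]), max(0, bb[2] - imw), max(0, bb[3] - imh)]
    clipped = (max(bb[0], 0), max(bb[1], 0), min(bb[2], imw), min(bb[3], imh))
    crop = img.crop(clipped)
    return ImageOps.expand(crop, border=tuple(pad), fill=tuple(fill)), pad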
def spatial_sample(im_pad, bb, spatial_sample_rng, rnd_scale_ratio=0.05):
    im = im_pad
    imh, imw = im.shape[:2]
    im_bb = [0, 0, imw, imh]
    # make ground truth box square, and use its dimensions
    bb_gt = list(bb)
    w = bb[2] - bb[0]
    h = bb[3] - bb[1]
    if w > h:
        bb_gt[1] = int(bb_gt[1] - np.ceil(0.5 * (w - h)))
        bb_gt[3] = int(bb_gt[3] + np.floor(0.5 * (w - h)))
        h = w
    else:
        bb_gt[0] = int(bb_gt[0] - np.ceil(0.5 * (h - w)))
        bb_gt[2] = int(bb_gt[2] + np.floor(0.5 * (h - w)))
        w = h
    # add random scaling to the test bbox (shrink by up to rnd_scale_ratio, grow by up to twice that)
    # treating the dimensions differently would make the aspect ratio fluctuate a little (due to resizing afterwards!)
    wrange = round(rnd_scale_ratio * w)
    hrange = round(rnd_scale_ratio * h)
    # ensure the square test box fits into im_pad in both dimensions
    w = min(w + random.randint(-wrange, 2 * wrange), imw - 1, imh - 1)
    h = w  # min(h + random.randint(hrange, 2*hrange), imh - 1)
    # set IoU range according to provided label
    min_IoU = spatial_sample_rng[0]
    max_IoU = spatial_sample_rng[1]
    max_iter = 500
    curr_iter = 0
    ratio = 0.0
    # rejection sampling: draw random positions until the IoU with the ground truth falls into the range
    while curr_iter < max_iter and (ratio >= max_IoU or ratio <= min_IoU):
        curr_iter += 1
        # bbox sampling
        jxy = [randint(0, im_bb[2] - w), randint(0, im_bb[3] - h)]
        bb_test = list([jxy[0], jxy[1], w + jxy[0], h + jxy[1]])
        # check if new box fits criteria
        if min(bb_test) >= 0 and bb_test[2] <= im.shape[1] and bb_test[3] <= im.shape[0]:
            ratio = intersection_over_union(bb_test, bb_gt)
    if max_IoU >= ratio >= min_IoU:
        im = im[bb_test[1]:bb_test[3], bb_test[0]:bb_test[2], :]
        # new_bb_gt = [bb_gt[0] - bb_test[0], bb_gt[1] - bb_test[1], bb_gt[2] - bb_test[0], bb_gt[3] - bb_test[1]]
        new_bb_gt = bb_gt
    else:
        # no suitable box found: keep the full (padded) image
        new_bb_gt = bb_gt
    # DEBUG_MODE = False
    # if DEBUG_MODE:
    #     print("tricky box", w, h, imw, imh)
    return im, new_bb_gt, bb_test
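# Sketch of the rejection-sampling idea in spatial_sample (standalone; the helper names are
# hypothetical, and _example_iou uses continuous coordinates, which may differ from the
# convention of bbox_utils.intersection_over_union): place a fixed-size square at random
# positions until its IoU with the ground-truth box lies strictly inside (min_iou, max_iou).
import random
def _example_iou(a, b):
    """Hypothetical helper: IoU of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
def _example_sample_box(gt, size, imw, imh, iou_range, rng=random, max_iter=500):
    """Hypothetical helper: return (box, iou) for the first box inside iou_range, else (None, None)."""
    min_iou, max_iou = iou_range
    for _ in range(max_iter):
        x = rng.randint(0, imw - size)
        y = rng.randint(0, imh - size)
        box = [x, y, x + size, y + size]
        iou = _example_iou(box, gt)
        if min_iou < iou < max_iou:
            return box, iou
    return None, None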
# TRANSFORMS
class MyRandomZoom(object):
    def __init__(self, scale_range, interpolation=Image.BILINEAR):
        self.scale_range = scale_range
        self.interpolation = interpolation
    def __call__(self, img):
        scale = np.random.uniform(*self.scale_range)
        new_size = (int(img.height * scale), int(img.width * scale))
        return F.resize(img, new_size, self.interpolation)
class MyFuzzyZoom(object):
    """
    :param target_size: (2-tuple) height, width
    :param scale_range: (2-tuple) range of the random scale factor applied to target_size
    :param interpolation: ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional)
    """
    def __init__(self, target_size, scale_range, interpolation=Image.BILINEAR):
        self.target_size = target_size
        self.scale_range = scale_range
        self.interpolation = interpolation
    @staticmethod
    def get_params(scale_range):
        return np.random.uniform(*scale_range)
    def __call__(self, img):
        scale = self.get_params(self.scale_range)
        new_size = (int(self.target_size[0] * scale), int(self.target_size[1] * scale))
        return F.resize(img, new_size, self.interpolation)
class MyRandomChoiceZoom(object):
    def __init__(self, scales, p=None, interpolation=Image.BILINEAR):
        self.scales = scales
        self.interpolation = interpolation
        self.p = p
    def __call__(self, img):
        scale = np.random.choice(self.scales, replace=True, p=self.p)
        new_size = (int(img.height * scale), int(img.width * scale))
        return F.resize(img, new_size, self.interpolation)
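# Sketch (an assumption: Pillow only, without torchvision; the helper name is hypothetical):
# the zoom transforms above build new_size as (height, width) because torchvision's resize
# expects that order, whereas PIL's Image.resize takes (width, height). The helper below
# performs the same kind of random zoom directly with PIL, drawing the scale from a numpy
# Generator.
import numpy as np
from PIL import Image
def _example_random_zoom_pil(img, scale_range, rng, interpolation=Image.BILINEAR):
    """Hypothetical helper: scale both sides by one random factor from scale_range."""
    scale = rng.uniform(*scale_range)
    new_w, new_h = int(img.width * scale), int(img.height * scale)
    return img.resize((new_w, new_h), interpolation)  # PIL order: (width, height)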
class MyRandomCenteredRotation(object):
    """
    Args:
        degrees (sequence or float or int): Range of degrees to select from.
            If degrees is a number instead of sequence like (min, max), the range of degrees
            will be (-degrees, +degrees).
        translation_range (2-tuple): Range of pixels to select from.
            The center of rotation is shifted from the image center by offsets sampled from this range.
        resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional):
            An optional resampling filter, PIL.Image.BILINEAR by default.
            See http://pillow.readthedocs.io/en/3.4.x/handbook/concepts.html#filters
            If the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
    """
    def __init__(self, degrees, translation_range=(-3, 3), resample=Image.BILINEAR):
        if isinstance(degrees, numbers.Number):
            if degrees < 0:
                raise ValueError("If degrees is a single number, it must be positive.")
            self.degrees = (-degrees, degrees)
        else:
            if len(degrees) != 2:
                raise ValueError("If degrees is a sequence, it must be of len 2.")
            self.degrees = degrees
        self.translation_range = translation_range
        self.resample = resample
    def __call__(self, img):
        angle = np.random.uniform(*self.degrees)
        translated_center = None
        if self.translation_range:
            # the rotation center is given as (x, y), i.e. in (width, height) order
            translated_center = (
                np.random.uniform(*self.translation_range) + int(img.width / 2),
                np.random.uniform(*self.translation_range) + int(img.height / 2)
            )
        return F.rotate(img, angle, resample=self.resample, expand=False, center=translated_center)
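# Sketch (Pillow only; the helper name is hypothetical): rotation about a randomly shifted
# center. PIL's Image.rotate takes the center as (x, y) measured from the upper-left corner,
# and with expand=False the output keeps the input size, which is what the transform above
# relies on.
import numpy as np
from PIL import Image
def _example_rotate_jittered_center(img, max_angle, shift_range, rng):
    """Hypothetical helper: rotate by a random angle about a jittered image center."""
    angle = rng.uniform(-max_angle, max_angle)
    cx = img.width / 2 + rng.uniform(*shift_range)
    cy = img.height / 2 + rng.uniform(*shift_range)
    out = img.rotate(angle, resample=Image.BILINEAR, expand=False, center=(cx, cy))
    return out, angle, (cx, cy)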
class UnNormalize(object):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std
    def __call__(self, tensor):
        """
        Args:
            tensor (Tensor): Normalized tensor image of size (C, H, W).
        Returns:
            Tensor: Un-normalized image (the input tensor is modified in place).
        """
        for t, m, s in zip(tensor, self.mean, self.std):
            t.mul_(s).add_(m)
            # The normalize code -> t.sub_(m).div_(s)
        return tensor
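# Sketch (a numpy stand-in for the torch class above; the helper name is hypothetical):
# normalization computes (x - mean) / std per channel, so un-normalization is
# x * std + mean. Broadcasting the per-channel values over (C, H, W) replaces the loop and
# returns a new array instead of modifying the input in place.
import numpy as np
def _example_unnormalize(arr, mean, std):
    """Hypothetical helper: invert per-channel normalization of a (C, H, W) array."""
    mean = np.asarray(mean, dtype=float).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=float).reshape(-1, 1, 1)
    return arr * std + mean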
@@ -0,0 +1,62 @@
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, cdist, squareform
def show_lines_tl_alignment(lbl_ind_x, center_im, line_hypos, color):
    # Generating figure 1
    fig, axes = plt.subplots(1, 2, figsize=(15, 6),  # (25, 10)
                             subplot_kw={'adjustable': 'box'})
    ax = axes.ravel()
    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')
    ax[0].set_axis_off()
    ax[1].imshow(lbl_ind_x, cmap='gray')
    for idx, line_rec in line_hypos.groupby('label').mean().iterrows():
        angle = line_rec.angle
        dist = line_rec.dist
        # Hough normal form x*cos(angle) + y*sin(angle) = dist, evaluated at x = 0 and x = image width
        y0 = (dist - 0 * np.cos(angle)) / np.sin(angle)
        y1 = (dist - lbl_ind_x.shape[1] * np.cos(angle)) / np.sin(angle)
        ax[1].plot((0, lbl_ind_x.shape[1]), (y0, y1), '-', color=color[int(idx)], linewidth=2)
        ax[1].text(0, y0, '{}'.format(int(line_rec.tl_line)),
                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
    ax[1].set_xlim((0, lbl_ind_x.shape[1]))
    ax[1].set_ylim((lbl_ind_x.shape[0], 0))
    ax[1].set_axis_off()
    ax[1].set_title('Detected lines / Assigned tl line idx')
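# Sketch of the line drawing above (the helper name is hypothetical): a Hough line with
# parameters (angle, dist) satisfies x * cos(angle) + y * sin(angle) = dist, the convention
# documented for skimage.transform.hough_line, so y = (dist - x * cos(angle)) / sin(angle).
# Evaluating it at x = 0 and x = image width gives the two plotted endpoints; near-vertical
# lines (sin(angle) close to 0) cannot be drawn this way.
import numpy as np
def _example_hough_line_endpoints(angle, dist, width):
    """Hypothetical helper: y at x = 0 and x = width for a line in Hough normal form."""
    s = np.sin(angle)
    if abs(s) < 1e-6:
        raise ValueError("near-vertical line: y is not a function of x")
    y0 = (dist - 0 * np.cos(angle)) / s
    y1 = (dist - width * np.cos(angle)) / s
    return y0, y1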
def show_score_mats_with_paths(assigned_tl_indices, hypo_line_indices, tl_line_indices, line_frag):
    # Generating figure 1
    fig, axes = plt.subplots(1, 3, figsize=(15, 6),
                             subplot_kw={'adjustable': 'box'})
    ax = axes.ravel()
    # weak score
    X_dist = cdist(assigned_tl_indices.reshape(-1, 1), assigned_tl_indices.reshape(-1, 1),
                   lambda a_idx, b_idx: line_frag.compute_weak_score(a_idx.squeeze(), b_idx.squeeze()))
    ax[0].imshow(X_dist, cmap='gray')
    ax[0].set_title('weak score')
    print(np.diag(X_dist))
    # ransac score
    # X_dist = cdist(assigned_tl_indices.reshape(-1, 1), assigned_tl_indices.reshape(-1, 1),
    X_dist = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
                   lambda a_idx, b_idx: line_frag.compute_ransac_score(a_idx.squeeze(), b_idx.squeeze(),
                                                                       max_dist_thresh=2, dist_weight=1))  # 5/5, 4/1
    ax[1].imshow(X_dist, cmap='gray_r')
    ax[1].set_title('ransac score')
    print(np.diag(X_dist))
    # line matching score
    X_dist = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
                   lambda a_idx, b_idx: line_frag.compute_line_matching_score(a_idx.squeeze(), b_idx.squeeze()))
    ax[2].imshow(X_dist, cmap='gray_r')  # vmin=0, vmax=1
    ax[2].set_title('line matching score')
    print(np.diag(X_dist))
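# Sketch of the cdist pattern used above (toy score; the helper name is hypothetical):
# reshaping index vectors to (n, 1) lets scipy's cdist call an arbitrary Python function on
# every pair of indices and collect the results in a float matrix. This is convenient but
# evaluates the callable once per pair in Python, so it is slow for large inputs.
import numpy as np
from scipy.spatial.distance import cdist
def _example_pairwise_scores(a_indices, b_indices, score_fn):
    """Hypothetical helper: matrix of score_fn(i, j) for all index pairs."""
    a = np.asarray(a_indices).reshape(-1, 1)
    b = np.asarray(b_indices).reshape(-1, 1)
    # each argument arrives as a length-1 row; unwrap it to a plain int index
    return cdist(a, b, lambda u, v: score_fn(int(u[0]), int(v[0])))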
@@ -0,0 +1,100 @@
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage as ndi
def show_line_skeleton(lbl_ind_x, skeleton):
    # display results
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4), sharex=True, sharey=True,
                             subplot_kw={'adjustable': 'box'})
    ax = axes.ravel()
    ax[0].imshow(lbl_ind_x, cmap=plt.cm.gray)
    ax[0].axis('off')
    ax[0].set_title('original', fontsize=20)
    ax[1].imshow(skeleton, cmap=plt.cm.gray)
    ax[1].axis('off')
    ax[1].set_title('skeleton', fontsize=20)
    # label 8-connected skeleton components
    ax[2].imshow(ndi.label(skeleton, structure=np.ones((3, 3)))[0], cmap='nipy_spectral')
    ax[2].axis('off')
    ax[2].set_title('labeled skeleton', fontsize=20)
    fig.tight_layout()
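# Sketch (the helper name is hypothetical): ndi.label with structure=np.ones((3, 3)) uses
# 8-connectivity, so skeleton pixels that touch only diagonally stay in one component; the
# default structure is 4-connected and would split such a thin diagonal line into pieces.
import numpy as np
from scipy import ndimage as ndi
def _example_count_components(mask, eight_connected=True):
    """Hypothetical helper: number of connected components in a binary mask."""
    structure = np.ones((3, 3)) if eight_connected else None
    _, num = ndi.label(mask, structure=structure)
    return num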
def show_hough_transform_w_lines(lbl_ind_x, center_im, h, theta, d, line_hypos, color):
    # Generating figure 1
    fig, axes = plt.subplots(1, 3, figsize=(15, 6),
                             subplot_kw={'adjustable': 'box'})  # (25, 15)
    ax = axes.ravel()
    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')
    ax[0].set_axis_off()
    # log-scaled Hough accumulator: angle on the x-axis, distance on the y-axis
    ax[1].imshow(np.log(1 + h),
                 extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]],
                 cmap='gray', aspect=1 / 1.5)
    ax[1].set_title('Hough transform')
    ax[1].set_xlabel('Angles (degrees)')
    ax[1].set_ylabel('Distance (pixels)')
    ax[1].axis('image')
    ax[2].imshow(lbl_ind_x, cmap='gray')
    for idx, line_rec in line_hypos.groupby('label').mean().iterrows():
        angle = line_rec.angle
        dist = line_rec.dist
        y0 = (dist - 0 * np.cos(angle)) / np.sin(angle)
        y1 = (dist - lbl_ind_x.shape[1] * np.cos(angle)) / np.sin(angle)
        ax[2].plot((0, lbl_ind_x.shape[1]), (y0, y1), '-', color=color[int(idx)], linewidth=2)
    ax[2].set_xlim((0, lbl_ind_x.shape[1]))
    ax[2].set_ylim((lbl_ind_x.shape[0], 0))
    ax[2].set_axis_off()
    ax[2].set_title('Detected lines')
    # ax[2].imshow(lbl_ind, cmap='gray')
    # ax[2].set_title('Input image')
    # ax[2].set_axis_off()
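# Sketch of what the accumulator `h` plotted above contains. Assumption: h, theta and d
# appear to come from a Hough transform such as skimage.transform.hough_line; this numpy
# version (hypothetical helper name) is a simplified stand-in with integer-rounded
# distances, not the library's implementation. Every foreground pixel votes for all
# (dist, theta) pairs with dist = x*cos(theta) + y*sin(theta); straight lines show up as peaks.
import numpy as np
def _example_hough_accumulator(binary, thetas):
    """Hypothetical helper: vote counts indexed by (rounded distance, theta)."""
    ys, xs = np.nonzero(binary)
    diag = int(np.ceil(np.hypot(*binary.shape)))
    dists = np.arange(-diag, diag + 1)
    acc = np.zeros((len(dists), len(thetas)), dtype=np.int64)
    for j, t in enumerate(thetas):
        d_idx = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag
        np.add.at(acc[:, j], d_idx, 1)  # acc[:, j] is a view, so acc itself is updated
    return acc, dists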
def show_probabilistic_hough(lbl_ind_x, center_im, line_segs, ls_labels, group2line, color):
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    ax = axes.ravel()
    ax[0].imshow(center_im, cmap='gray')
    ax[0].set_title('Input image')
    # ax[1].imshow(lbl_ind_x, cmap='gray')
    # ax[1].set_title('line det')
    ax[1].imshow(lbl_ind_x * 0)
    for line, li in zip(line_segs, ls_labels):
        p0, p1 = line
        ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[int(group2line[li])], linewidth=2)
        ax[1].text(p0[0], p0[1], '{}'.format(group2line[li]),
                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
    ax[1].set_xlim((0, lbl_ind_x.shape[1]))
    ax[1].set_ylim((lbl_ind_x.shape[0], 0))
    ax[1].set_title('Probabilistic Hough')
def show_line_segms(image_label_overlay, segm_labels):
    fig, axes = plt.subplots(1, 2, figsize=(15, 9))  # 25, 15
    ax = axes.ravel()
    ax[0].imshow(image_label_overlay, cmap='gray')
    ax[0].set_title('Input image')
    ax[1].imshow(segm_labels)
    ax[1].set_title('Line segments')

Some files were not shown because too many files have changed in this diff.